Re^7: How to parse not closed HTML tags that don't have any attributes?

by haukex (Archbishop)
on Mar 08, 2021 at 16:06 UTC


in reply to Re^6: How to parse not closed HTML tags that don't have any attributes?
in thread How to parse not closed HTML tags that don't have any attributes?

> In your code, this is a hash, which gets returned from the subroutine as a pointer to a hash. If I understand correctly, inside the hash are three hashes ("Address", "Company" and "Phone").

Yes, that's correct, though in Perl we call them "references" instead of "pointers" (one of the differences being they're automatically memory-managed and garbage collected, with the exception of circular references). The full technical description is that sub get_data returns a reference to the hash %data, a hash that is newly allocated for each call to the sub, and whose values are references to other anonymous hashes. This is also called a "hash of hashes" or HoH, though the data structures can get arbitrarily complex.
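To make the terminology concrete, here is a minimal sketch of a sub that returns a reference to a freshly allocated hash of hashes. The sub name get_data and the three top-level keys come from this thread; the inner keys and values are made-up sample data, not the original poster's code.

```perl
use warnings;
use strict;

# Returns a reference to a new hash whose values are
# references to anonymous hashes (a "hash of hashes").
sub get_data {
    my %data = (
        Company => { companyname => "Randomcompany" },
        Address => { city => "Randomcity", zip => "45678" },
        Phone   => { Telephone => "0123-4 56 78 90" },
    );
    return \%data;    # %data is allocated anew on every call
}

my $first  = get_data();
my $second = get_data();

# The arrow operator dereferences; between subscripts it is optional.
print $first->{Company}->{companyname}, "\n";    # Randomcompany
print $first->{Address}{zip}, "\n";              # 45678

# Each call hands back a different hash, so changing one
# leaves the other untouched.
$first->{Address}{zip} = "00000";
print $second->{Address}{zip}, "\n";             # 45678

# Top-level keys of the structure, sorted for stable output.
print join(", ", sort keys %$first), "\n";       # Address, Company, Phone
```

Because nothing else holds on to %data once the sub returns, the hash lives exactly as long as the last reference to it.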

> learn about arrays, hashes, nested hashes, references to hashes etc.

Further reading: perldata, the Perl Data Structures Cookbook (perldsc) and perlreftut.

> Wouldn't it be much quicker to throw data inside the subroutine not into a hash of hashes, but directly into a single, not deep array instead, and return a reference to that array?

Sure, that would certainly be an option. Personally, I just like retaining as much information from the original data as possible, since this usually makes future enhancements much easier. For example, keeping the structure means you could easily also dump the data to JSON.
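As a rough illustration of the JSON point, the nested structure can be handed to an encoder as-is. This sketch assumes the core module JSON::PP (bundled with Perl since 5.14); the sample data is made up, and any other JSON module would work along similar lines.

```perl
use warnings;
use strict;
use JSON::PP;    # core module since Perl 5.14

my %data = (
    Company => { companyname => "Randomcompany" },
    Address => { city => "Randomcity", zip => "45678" },
);

# canonical() sorts the hash keys so the output is reproducible.
my $json_text = JSON::PP->new->canonical->encode(\%data);
print $json_text, "\n";

# Decoding restores the same nested structure.
my $copy = JSON::PP->new->decode($json_text);
print $copy->{Address}{city}, "\n";    # Randomcity
```

With a flat array, the information about which value belongs to which section would have to be reconstructed by position before this kind of export is possible.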

> figure out a way to get Text::CSV running, by "unwrapping" the reference to a hash of hashes, getting a reference for each of the three included hashes, turning every of these hashes into an array, combining the arrays into one array, getting a reference to this array, and then calling Text::CSV with this reference.

One option of several to make the dereferencing a little easier might be Data::Diver.

use warnings;
use strict;
use Data::Diver qw/Dive/;
use Text::CSV;

my @data = (
    {
        Company => { companyname => "Randomcompany" },
        Address => { city => "Randomcity",
            street_and_nr => "SampleStreet 123", zip => "45678" },
        Phone => { Telephone => "0123-4 56 78 90" },
    },
    {
        Company => { companyname => "Other Company" },
        Address => { address => "Someplace 42\n12345 City" },
        Phone => { Telefax => "333", Telephone => "+1 234 567 8900" },
    }
);

my $csv = Text::CSV->new({binary=>1, auto_diag=>2, eol=>$/ });
$csv->print(select, ['Company','Address','Phone','Fax']);
for my $rec (@data) {
    my $addr = Dive($rec, 'Address', 'address')
        || Dive($rec, 'Address', 'street_and_nr')
            ."\n".Dive($rec, 'Address', 'zip')
            ." ".Dive($rec, 'Address', 'city');
    $addr =~ s/\n/, /g;
    my @cols = (
        scalar Dive($rec, 'Company', 'companyname'),
        $addr,
        scalar Dive($rec, 'Phone', 'Telephone'),
        scalar Dive($rec, 'Phone', 'Telefax'),
    );
    $csv->print(select, \@cols);
}
__END__

Company,Address,Phone,Fax
Randomcompany,"SampleStreet 123, 45678 Randomcity","0123-4 56 78 90",
"Other Company","Someplace 42, 12345 City","+1 234 567 8900",333

Note that I use scalar because Dive is documented to return an empty list if it doesn't find anything, and an empty list inside a list assigned to an array simply disappears, so the following elements of the array would shift down accordingly. scalar forces a single return value, e.g. undef, so that this doesn't happen. It's not needed for $addr because the Dive calls there are already evaluated in scalar context.
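The column shift is easy to reproduce without Data::Diver. In this sketch, lookup is a hypothetical stand-in for Dive (not part of any module) that uses a bare return, which yields the empty list in list context and undef in scalar context.

```perl
use warnings;
use strict;

# Hypothetical stand-in for Dive: a bare "return" gives the empty
# list in list context and undef in scalar context.
sub lookup {
    my ($hash, $key) = @_;
    return unless exists $hash->{$key};
    return $hash->{$key};
}

my %phone = ( Telephone => "0123-4 56 78 90" );    # no Telefax key

# List context: the missing value vanishes and later columns shift left.
my @shifted = ( lookup(\%phone, 'Telefax'), lookup(\%phone, 'Telephone') );

# Scalar context: the missing value becomes undef and keeps its slot.
my @aligned = ( scalar lookup(\%phone, 'Telefax'),
                scalar lookup(\%phone, 'Telephone') );

print scalar(@shifted), "\n";                               # 1
print scalar(@aligned), "\n";                               # 2
print defined($aligned[0]) ? "defined" : "undef", "\n";     # undef
```

In a CSV row, the first variant would silently put the telephone number into the fax column's neighbour, which is exactly the kind of error that is hard to spot later.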

In $csv->print(select, \@cols), select gets the current default output handle, usually STDOUT, but you could just as well pass a filehandle here to write to an output file (see "open" Best Practices).
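Data::Diver is described above as one option of several. For comparison, here is a sketch of the same lookups using plain hash dereferencing. The helper name build_row is made up for illustration; the record layout follows the sample data in the script above. A hash element access always yields exactly one value, so no scalar is needed, and the "|| {}" fallbacks avoid autovivifying missing sections in the record.

```perl
use warnings;
use strict;

# Hypothetical helper: flattens one record into a four-column row.
sub build_row {
    my ($rec) = @_;

    # Fall back to empty hashes so a missing section is not
    # autovivified inside $rec by the lookups below.
    my $company = $rec->{Company} || {};
    my $address = $rec->{Address} || {};
    my $phone   = $rec->{Phone}   || {};

    # Prefer a ready-made address, else assemble it from its parts.
    my $addr = $address->{address}
        || "$address->{street_and_nr}\n$address->{zip} $address->{city}";
    $addr =~ s/\n/, /g;

    return [ $company->{companyname}, $addr,
             $phone->{Telephone}, $phone->{Telefax} ];
}

my $row = build_row({
    Company => { companyname => "Randomcompany" },
    Address => { city => "Randomcity",
        street_and_nr => "SampleStreet 123", zip => "45678" },
    Phone   => { Telephone => "0123-4 56 78 90" },
});

print scalar(@$row), "\n";    # 4
print $row->[1], "\n";        # SampleStreet 123, 45678 Randomcity
```

The returned array reference has the same shape as \@cols in the script above, so it could be passed to $csv->print in the same way.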

Re^8: How to parse not closed HTML tags that don't have any attributes?
by Rantanplan (Novice) on Mar 09, 2021 at 13:39 UTC

    Many thanks Haukex, for your invaluable help!!

    After a little learning about how to correctly do the UTF-8 decoding, the script is now giving me human-readable output. And, with the Mojo::DOM parsing and the Data::Diver dereferencing, everything works like a charm!

    Many thanks also to everyone else here, for your great input! Much appreciated! :-)
