PerlMonks  

Troubles with simple parsing

by uksza (Canon)
on Dec 19, 2004 at 02:11 UTC ( [id://415927]=perlquestion )

uksza has asked for the wisdom of the Perl Monks concerning the following question:

Hello!

I'm a newbie and I still have trouble with simple things. I know that practice is the way to perfection, so I'm working through every exercise I can find.
And I've found something like this:

The file articles.txt contains:

    [Athlon 4000+]
    price=300 euro
    produce=AMD
    description=Fast

    [Celeron 3000]
    price=200 euro
    produce=Intel
    description=Slower

How do I parse this into:

    %hash = ( "item" => "Athlon 4000+", "price" => "300 euro", "produce" => "AMD", "description" => "Fast" ); etc...

I've written some terribly stupid code:

    #!/usr/bin/perl
    use warnings;
    use strict;

    my $file = "articles.txt";
    open( FILE, "< $file" );
    my @all = <FILE>;
    close FILE;

    my $temp;
    my ( @art, @produce, @prices, @descriptions );
    foreach $temp (@all) {
        if ( $temp =~ m/\[(.*)\]/ ) { push @art, $1; }
    }
    foreach $temp (@all) {
        if ( $temp =~ m/produce=(.*)/ ) { push @produce, $1; }
    }
    foreach $temp (@all) {
        if ( $temp =~ m/price=(.*)/ ) { push @prices, $1; }
    }
    foreach $temp (@all) {
        # the data file spells this key "description", without a trailing "s"
        if ( $temp =~ m/description=(.*)/ ) { push @descriptions, $1; }
    }

and then I try to do something like this with it:

    my %hash = (
        "art"          => ( shift @art ),
        "produce"      => ( shift @produce ),
        "price"        => ( shift @prices ),
        "descriptions" => ( shift @descriptions ),
    );

I know it is totally stupid - this is the worst code I ever wrote, but I have no idea how to do things like that.
Could any compassionate monk whisper me a hint?
Thanks a lot

Uksza

Replies are listed 'Best First'.
Re: Troubles with simple parsing
by ikegami (Patriarch) on Dec 19, 2004 at 02:34 UTC

    1) There are modules (search CPAN for "INI") that can help you here. Let's set that aside, given that this is an exercise.

    2) I don't think you want a hash. Don't you want a list (read array) of records (read hash)?

        my @list;
        while (@art) {
            my $record = {
                "art"          => shift(@art),
                "produce"      => shift(@produce),
                "price"        => shift(@prices),
                "descriptions" => shift(@descriptions),
            };
            push(@list, $record);
        }

        print("We have ", scalar(@list), " models for sale.\n");
        print("\n");
        print("They are:\n");
        foreach (@list) {
            print("Model: ", $_->{"art"}, "\n");
            print("Produced by: ", $_->{"produce"}, "\n");
            print("Price: ", $_->{"price"}, "\n");    # the value already carries its currency ("300 euro")
            print("Desc: ", $_->{"descriptions"}, "\n");
            print("\n");
        }

    Update: Typed printf where I should have typed print. Fixed

      hey!
      Thanks for your reply!

      1) Do you mean App::Serializer::Ini?
      2) You're right!! I knew it, but I didn't know how to do it ;-)
Re: Troubles with simple parsing
by graff (Chancellor) on Dec 19, 2004 at 03:34 UTC
    Some hints:
    • Check the "perlvar" man page to read about the "INPUT_RECORD_SEPARATOR" variable, "$/". You can set it to read each block of lines (up to the next blank line) instead of one line at a time, so each element of your "@all" array contains a full config record.
    • Look up the "split" function (perldoc -f split). After reading each block of lines into each array element, separate the lines ( split /\n/), then split each non-initial line into "attribute" and "value" ( split /=/).
    • You can load your array of hashes while reading the file, so you don't end up with two copies of the data in memory and you don't need to do repeated loops over the same set of data.

    At the risk of short-cutting your exploration of the essential man pages, it could end up looking like this:

        my @AoH_attribs;
        open( FILE, "<$file" );
        {
            local $/ = '';    # empty string is "magic" for "paragraph-mode"
            while (<FILE>)    # read a group of lines into $_
            {
                my %hash = ();
                my $itemname = ( /^\[(.*?)\]/ ) ? $1 : 'NO_KEY';
                next if ( !/=/ or $itemname eq 'NO_KEY' );
                $hash{item} = $itemname;
                for my $line ( split /\n/ ) {
                    next unless ( $line =~ /=/ );
                    my ( $attrib, $value ) = split /=/, $line;
                    $hash{$attrib} = $value;
                }
                push @AoH_attribs, { %hash };    # note use of curly braces
            }
        }
        close FILE;

        # now, @AoH_attribs contains a list of hash refs, one for each input block;
        # you can access each hash like this:
        for my $itemref ( @AoH_attribs ) {
            my %itemhash = %$itemref;
            print "$_ = $itemhash{$_}\n" for ( sort keys %itemhash );
            print "\n";
        }

    There are other ways as well.

    (update: fixed syntax error in code)
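
    A small caveat on the "split /=/" hint above, as a hedged sketch rather than part of the original reply: if a value ever contains an equals sign itself, an unlimited split cuts the value short. Passing a limit of 2 as the third argument to split keeps everything after the first "=" together. The input line below is hypothetical and does not come from the sample file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input line: the value itself contains an equals sign.
my $line = 'description=Fast (clock=2.4GHz)';

# Without a limit, split breaks on every '=', so the value is cut short.
my ( $attrib_a, $value_a ) = split /=/, $line;

# With a limit of 2, only the first '=' separates key from value.
my ( $attrib_b, $value_b ) = split /=/, $line, 2;

print "no limit: $attrib_a => $value_a\n";
print "limit 2:  $attrib_b => $value_b\n";
```

    For the sample data in the question both forms give the same result, so the limit only matters if the values can contain "=".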

Re: Troubles with simple parsing
by bgreenlee (Friar) on Dec 19, 2004 at 03:12 UTC

    Your code is also not very efficient in that you're looping through the contents of the file four times. You also don't need to read the whole file into a list before processing it. Finally, you can take a shortcut and instead of checking individually for each valid key (e.g. "produce", "price", etc.), you can just take whatever is before the equals sign as a key and whatever is after as the value.

        open(FILE,"<$file") or die $!;  # always check for the error condition!
        while (my $line = <FILE>) {
            if ($line =~ m/\[(.*)\]/) {
                push @art, { item => $1 };
            } elsif ($line =~ m/^(\w+)\s*=\s*(.*?)\s*$/) {  # the \s*'s ignore leading and trailing whitespace.
                $art[-1]->{$1} = $2;  # [-1] references the last item in the array; i.e. what you just pushed on there
            }
        }
        close FILE;

    -b
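
    One edge case worth considering with the $art[-1] approach: if a key=value line shows up before any [item] header, @art is still empty and assigning through $art[-1] raises a run-time error. The following is a minimal sketch of one way to guard against that, assuming such stray lines should simply be set aside. The input text, the in-memory filehandle, and the @orphans array are illustrative assumptions, not part of the original reply.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input: a key=value line appears before any [item] header.
my $text = "price=100 euro\n[Athlon 4000+]\nprice=300 euro\n";

# Read from a string through an in-memory filehandle so the sketch is self-contained.
open( my $fh, '<', \$text ) or die "Cannot open in-memory file: $!";

my ( @art, @orphans );
while ( my $line = <$fh> ) {
    if ( $line =~ m/^\[(.*)\]/ ) {
        push @art, { item => $1 };
    }
    elsif ( $line =~ m/^(\w+)\s*=\s*(.*?)\s*$/ ) {
        if (@art) { $art[-1]{$1} = $2 }       # attach to the current item
        else      { push @orphans, $line }    # no item seen yet: set aside
    }
}
close $fh;

print scalar(@art), " item(s), ", scalar(@orphans), " stray line(s)\n";
```

    Whether stray lines should be collected, warned about, or treated as fatal depends on how much the input file can be trusted.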

Re: Troubles with simple parsing
by dws (Chancellor) on Dec 19, 2004 at 05:56 UTC

    Much depends on what you mean by "etc...". By calling your data structure "%hash" (which isn't very descriptive) and stopping right when things got interesting (i.e., how do you handle multiple values that share the key "item"), you've left the problem in an ambiguous state. Part of advancing past newbyness is learning to drive out ambiguity when thinking through what you need to do.

    Since in the data you've shown, several strings will end up sharing the "item" key, I suspect that you want individual anonymous hashes, one per group, each having a single "item" key. If this is true, then you want to parse the text into a structure that looks like

        my $data = [
            { item => 'Athlon 4000+', price => '300 euro', ... },
            { item => 'Celeron 3000', price => '200 euro', ... },
            ...
        ];

    Does this help?
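
    As a hedged illustration of why this shape is convenient, here is a small sketch that walks and filters a structure like the one described above. The literal records just repeat the sample data from the question; the loop and the grep lookup are assumptions about how one might use the result, not something the original reply prescribes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same shape as the structure above, filled in from the sample data in the question.
my $data = [
    { item => 'Athlon 4000+', price => '300 euro', produce => 'AMD',   description => 'Fast' },
    { item => 'Celeron 3000', price => '200 euro', produce => 'Intel', description => 'Slower' },
];

# Walk the records in file order.
for my $record (@$data) {
    print "$record->{item}: $record->{price} ($record->{produce})\n";
}

# Find every record made by a given producer.
my @amd = grep { $_->{produce} eq 'AMD' } @$data;
print "AMD items: ", join( ', ', map { $_->{item} } @amd ), "\n";
```

    Because the outer container is an array, the records keep the order in which they appeared in the file.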

Re: Troubles with simple parsing
by Anonymous Monk on Dec 19, 2004 at 04:34 UTC
    Maybe it is a requirement that item must be a key whose value is a chip name (Athlon 4000+). Otherwise, the chip name could itself be the key to an anonymous hash of the three other attributes, which produces the result with minimal code.
        #!/usr/bin/perl
        use strict;
        use warnings;
        use Data::Dumper;

        my %chips;
        {
            local $/ = "";
            while (<DATA>) {
                chomp;
                # Capture chip name in $1 and delete line with chip name
                s/^\[([^\]]+)\]\n// or die "error: ...";
                $chips{$1} = { split /[\n=]/ };
            }
        }
        print Dumper \%chips;

        __DATA__
        [Athlon 4000+]
        price=300 euro
        produce=AMD
        description=Fast

        [Celeron 3000]
        price=200 euro
        produce=Intel
        description=Slower

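
    A brief sketch of what the hash-of-hashes layout makes easy, assuming the chip names are unique. The %chips hash is written out literally here for illustration; it mirrors what the code above would build from the sample data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hash of hashes in the shape produced above, written out literally for illustration.
my %chips = (
    'Athlon 4000+' => { price => '300 euro', produce => 'AMD',   description => 'Fast' },
    'Celeron 3000' => { price => '200 euro', produce => 'Intel', description => 'Slower' },
);

# Direct lookup by chip name, no searching needed.
my $price = $chips{'Athlon 4000+'}{price};
print "Athlon 4000+ costs $price\n";

# Hash keys are unordered, so sort when a stable listing is wanted.
my @names = sort keys %chips;
print "$_ is made by $chips{$_}{produce}\n" for @names;
```

    The trade-off is that the original file order is lost, and a second block with the same chip name would silently replace the first.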
Re: Troubles with simple parsing
by CountZero (Bishop) on Dec 19, 2004 at 11:37 UTC
    On a more general level (the other Monks have already answered your question), remember that whenever you are working with data-structures such as array of hashes; hash of hashes, ... you can make good use of Data::Dumper to show you the data in your structures. Many times, errors which would totally stump you become obvious when you look at how the data is actually stored.
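
    A minimal sketch of that advice, assuming an array of hashes like the ones discussed in this thread. Data::Dumper ships with Perl; the two settings shown are optional package variables of that module, and the records are just the sample data from the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;    # stable key order makes dumps easier to compare
$Data::Dumper::Indent   = 1;    # simpler, fixed indentation

my @list = (
    { item => 'Athlon 4000+', price => '300 euro' },
    { item => 'Celeron 3000', price => '200 euro' },
);

# Pass a reference so the whole array is dumped as one structure.
my $dump = Dumper( \@list );
print $dump;
```

    Dumping a reference (\@list) rather than the array itself keeps the output as a single $VAR1 structure instead of one variable per element.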

    CountZero

    "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

Re: Troubles with simple parsing
by uksza (Canon) on Dec 19, 2004 at 15:42 UTC
    Thanks, everybody, for your replies!
    Now I'm a little smarter ;-)
    Thanks a lot for everything

    Uksza
    P.S.
    TIMTOWTDI is a sacred saying ;-)

Node Type: perlquestion [id://415927]
Approved by Zaxo