PerlMonks  

Re: Help with parsing a file

by GrandFather (Saint)
on May 28, 2022 at 23:59 UTC [id://11144261]


in reply to Help with parsing a file

There are a bunch of issues with your code that I'll mention in passing to help you toward Perlish style programming instead of C style. The first issue is an almost complete lack of optional white space, which I find hard to read. Use white space as you would for writing prose - that's probably what people read most of and what brains are trained to parse, so keep it simple for brains.

An immediate issue is that you don't show how you parse your input data, so we can't tell what is in $row. That means we don't know what is in @face_ac, and the line pushing @temp into it looks dubious to me. So let's throw all of that away to start with and build something new.

First, we want this to be a small self contained correct example so we start off with strictures and some baked in data. There is a hint that you know this, but always use strictures (use strict; use warnings; - see The strictures, according to Seuss).

    use strict;
    use warnings;

    my $fileStr = <<STR;
    F001 1.2
    F101 3.2
    solvent1 0
    solvent2 3

    F001 2.2
    F101 7.2
    solvent1 5
    solvent2 0
    STR

    open my $fIn, '<', \$fileStr or die "Couldn't open \$fileStr: $!\n";

This adds strictures, provides sample data as though it were in an external file and opens an input file handle to it. Now set up a loop to parse the input data. Perl allows us to tell it what constitutes an end of line character sequence so we take advantage of that to read the data one record at a time:

    # Look for the empty line between records
    local $/ = "\n\n";

    while (defined (my $record = <$fIn>)) {
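As an aside, separate from the script being built here, the effect of setting $/ can be seen in isolation. The sketch below is only an illustration: the sample text and the names $text, $in and @records are made up and are not part of the original code.

```perl
use strict;
use warnings;

# Hypothetical sample text: two blocks separated by one empty line.
my $text = "a 1\nb 2\n\nc 3\nd 4\n";

# Open a read handle on the string itself (an in-memory file).
open my $in, '<', \$text or die "Couldn't open in-memory file: $!\n";

my @records;
{
    # local restores the previous value of $/ when this block ends.
    local $/ = "\n\n";
    @records = <$in>;    # each read now returns a whole record
}
close $in;

printf "%d records\n", scalar @records;
```

Each element of @records is one complete block, and the first one still carries the "\n\n" separator on its end.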

Parse the lines. Note that %recordData is declared inside the loop because we don't need it outside the loop or before the loop. Always declare variables in the smallest scope and initialize them when they are declared if appropriate (arrays and hashes are empty by default so usually they don't need to be initialized). You are familiar with split already, but grep and map may be new. Pop off and skim their documentation. In this case we are using grep to remove empty lines and map to generate a key value pair for each line. Then we use grep to build a list of solvents and a list of Fs:

        my %recordData = map {split /\s+/, $_} grep {length $_} split "\n", $record;
        my @solvents   = grep {/^solvent\d+/} keys %recordData;
        my @fractions  = grep {/^F\d+/} keys %recordData;
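For anyone new to grep and map, the same chain can be unrolled one step at a time. This is a standalone sketch, not part of the script; the sample record and the intermediate names @lines, @pairs and %data are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical record text with the record separator still attached.
my $record = "F001 1.2\nsolvent1 0\n\n";

# split breaks the record into lines; grep keeps only non-empty ones.
my @lines = grep {length $_} split "\n", $record;

# map returns one flat list: key, value, key, value, ...
my @pairs = map {split /\s+/, $_} @lines;

# Assigning a flat list to a hash pairs the items up as key => value.
my %data = @pairs;

print "$_ = $data{$_}\n" for sort keys %data;
```

The one-line version in the script does exactly this without naming the intermediate lists.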

Now we can find the solvent with the zero value. We assume there is one and only one. There could be error checking around this, but I'm skipping it for now. Note that grep operates on a list and generates a list, so $zeroSolvent needs to be in list context (hence the parentheses) so that the first element of the list generated by grep is assigned to it:

        my ($zeroSolvent) = grep {!$recordData{$_}} @solvents;
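The difference the parentheses make can be shown with a small standalone sketch. The data values and the names $first and $count are made up for the illustration.

```perl
use strict;
use warnings;

my %recordData = (solvent1 => 0, solvent2 => 3);    # made-up values
my @solvents   = qw(solvent1 solvent2);

# List assignment: $first receives the first element grep returns.
my ($first) = grep {!$recordData{$_}} @solvents;

# Scalar assignment: grep returns the number of matching elements.
my $count = grep {!$recordData{$_}} @solvents;

print "first = $first, count = $count\n";    # first = solvent1, count = 1
```

Leaving the parentheses off in the script would therefore store 1 in $zeroSolvent instead of the solvent name.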

and now we can generate the report for the record:

        print "${zeroSolvent}_$_ => $recordData{$_}\n" for @fractions;
    }

That prints:

    solvent1_F101 => 3.2
    solvent1_F001 => 1.2
    solvent2_F101 => 7.2
    solvent2_F001 => 2.2
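The order of the F lines within each record comes from keys, which returns hash keys in no guaranteed order, so it can differ between runs. If a stable order matters, one option is to sort the keys before printing. A minimal sketch with stand-in data (the hash contents and the @lines array are only for illustration):

```perl
use strict;
use warnings;

# Stand-in data for one parsed record.
my %recordData  = (F001 => 1.2, F101 => 3.2, solvent1 => 0, solvent2 => 3);
my $zeroSolvent = 'solvent1';

my @fractions = grep {/^F\d+/} keys %recordData;

# Sorting the keys gives the same order on every run.
my @lines;
for my $fraction (sort @fractions) {
    push @lines, "${zeroSolvent}_$fraction => $recordData{$fraction}";
}

print "$_\n" for @lines;
```

With the sort in place this always prints the F001 line before the F101 line.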

The script fragments above, concatenated together, are:

    use strict;
    use warnings;

    my $fileStr = <<STR;
    F001 1.2
    F101 3.2
    solvent1 0
    solvent2 3

    F001 2.2
    F101 7.2
    solvent1 5
    solvent2 0
    STR

    open my $fIn, '<', \$fileStr or die "Couldn't open \$fileStr: $!\n";

    # Look for the empty line between records
    local $/ = "\n\n";

    while (defined (my $record = <$fIn>)) {
        my %recordData = map {split /\s+/, $_} grep {length $_} split "\n", $record;
        my @solvents   = grep {/^solvent\d+/} keys %recordData;
        my @fractions  = grep {/^F\d+/} keys %recordData;
        my ($zeroSolvent) = grep {!$recordData{$_}} @solvents;

        print "${zeroSolvent}_$_ => $recordData{$_}\n" for @fractions;
    }
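The script assumes exactly one solvent per record has a zero value and skips the error checking. One possible shape for that check is sketched below. The helper name zero_solvent and the wording of the message are hypothetical, not something from the original code.

```perl
use strict;
use warnings;

# Hypothetical helper: return the single zero-valued solvent in a record,
# or die with a message saying how many were found instead.
sub zero_solvent {
    my (%recordData) = @_;
    my @zero = grep {!$recordData{$_}} grep {/^solvent\d+/} keys %recordData;

    die "Expected exactly one zero-valued solvent, found " . @zero . "\n"
        if @zero != 1;

    return $zero[0];
}

my $found = zero_solvent(solvent1 => 0, solvent2 => 3);
print "$found\n";    # solvent1

# eval traps the die so a bad record can be reported without ending the run.
my $result = eval { zero_solvent(solvent1 => 5, solvent2 => 3) };
print "Check failed: $@" if !defined $result;
```

One caveat with the ! test: a value read from a file as the string "0.0" is true in Perl, so if the data can contain such values a numeric comparison (== 0) may be the safer test.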

There may be follow up questions. :-D

This is not the solution that a person with experience in other programming languages might come up with first off, but it's worth exploring in detail because tools such as grep and map can clean up code something wonderful (they can also obscure code something dreadful).

Update: I should note that "${zeroSolvent}_$_ => $recordData{$_}\n" uses variable interpolation. Perl expands the contents of variables used inside double quoted strings. The ${zeroSolvent} bit lets us use the variable zeroSolvent with an underscore character following it in the string without Perl seeing zeroSolvent_ as the variable name instead.
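A small standalone sketch of the brace form next to plain concatenation (the variable names $fraction, $label and $same are only for illustration):

```perl
use strict;
use warnings;

my $zeroSolvent = 'solvent1';
my $fraction    = 'F001';

# The braces end the variable name before the underscore.
my $label = "${zeroSolvent}_$fraction";

# Plain concatenation builds the same string without needing braces.
my $same = $zeroSolvent . '_' . $fraction;

print "$label\n";    # solvent1_F001
print "$same\n";     # solvent1_F001
```

Without the braces Perl would read the name as zeroSolvent_, which under use strict is rejected at compile time because no such variable has been declared.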

Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond

Re^2: Help with parsing a file
by Odar (Novice) on May 29, 2022 at 18:57 UTC

    Thank you very much for your help and for pointing out my mistakes and omissions. I am learning a lot from this type of feedback. I have added some additional description about the format of the data file (e.g. the data blocks are not separated by an empty line but by three lines of text with an empty line above and below) and also updated my code to show how I was trying to parse the file.
