PerlMonks  

Feedback request for text file manipulator

by NovMonk (Chaplain)
on Mar 25, 2004 at 14:31 UTC ( #339743=perlquestion )

NovMonk has asked for the wisdom of the Perl Monks concerning the following question:

Greetings, Esteemed Monks,

I have been playing around today with a little program that will read in a series of lines from an input file (using __DATA__ below for simplicity), use some things on those lines to create some new lines, and write the whole thing back out to a new file. So for an input file like this:

line1
line2
n10total
n01chocolate;more stuff
n01vanilla;more stuff
etc...

I can get this:

line1
line2
n10total
n01chocolate;more stuff
n01vanilla;more stuff

g total choco vanil
g ----- ----- -----
p   x     x     x

Suppose I now want it to be a little smarter. I want it to look at how many letters/spaces are before the ; on each n01 line, and make the g and p lines conform to that, so I have instead:

g total chocolate vanilla
g ----- --------- -------
p   x       x        x

and I want to keep the whole thing from getting over a page width of 142 characters. Here's my code thus far. I'd appreciate comments on how I could make what I'm currently doing better, and on whether my thoughts below about how I want to tackle my new objective are anywhere near the right track. Note, I'm not looking for solutions, but feedback on what I've got and where I might look next to build on my understanding.

#!/usr/bin/perl
use warnings;
use strict;

my (@text, @gcard, @pcard);
my ($ntext, $ftext, $gtext, $ptext);

open (OUT, ">ban.test") or die "Can't open ban.test for write: $!\n";

while (<DATA>){
    if (/^n10/) {
        @text = "g total";
        @gcard = "g -----";
        @pcard = "p   x  ";
        print OUT "$_";
    }
    elsif (/^n01/) {
        $ntext = (split (/;/)) [0];
        $ntext =~ s/n01//g;
        $ftext = substr($ntext,0,5);
        $gtext = "-----";
        $ptext = "  x  ";
        push(@text, $ftext);
        push(@gcard, $gtext);
        push(@pcard, $ptext);
        print OUT "$_";
    }
    else {
        print OUT "$_";
    }
}
print OUT "\n";
print OUT "@text\n";
print OUT "@gcard\n";
print OUT "@pcard\n";

__DATA__
line1
line2
n10total
n01chocolate;more stuff
n01vanilla;more stuff

I'm thinking first I want to know how many n01 lines there are (if over a certain number, I probably want to restrict each ----- to something like 5 or 6 characters). To get that, I'm thinking I might read the input file into an array and count the n01 lines as they go by, then have my while loop above run through that array.
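As a rough illustration of the counting idea above: once the lines are in an array, grep in scalar context gives the number of n01 lines. This is only a sketch. The @lines array stands in for the real input file, and $max_columns is a hypothetical threshold that does not appear in the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample lines stand in for the real input file (an assumption for this sketch).
my @lines = (
    "line1\n",
    "line2\n",
    "n10total\n",
    "n01chocolate;more stuff\n",
    "n01vanilla;more stuff\n",
);

# grep in scalar context returns the number of matching elements.
my $n01_count = grep { /^n01/ } @lines;

# $max_columns is a hypothetical threshold, not a value from the original post.
my $max_columns = 20;
my $truncate    = $n01_count > $max_columns ? 1 : 0;

print "n01 lines: $n01_count\n";
print $truncate ? "restrict each column to 5 or 6 characters\n"
                : "no truncation needed\n";
```

The same while loop could then iterate over @lines instead of the filehandle.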

Otherwise, I want to know how many characters are in each $ntext as they are read, and adjust the number of characters for $gtext and $ptext accordingly. I think getting the number of characters would be something like

$x = length ($gtext);
and then using $x as the length of my strings for all 3 variables, which would then look something like this:

$ftext = substr($ntext,0,$x);

I realize that will require me to change how I'm dealing with $gtext and $ptext, but that doesn't seem too difficult. I imagine if the $ntext is over a certain length, I'll want to truncate it, same as if there are too many n01 lines. I suspect that will be putting some subroutines in my future, if nothing else.
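One possible shape for the subroutine hinted at above, as a sketch only: it truncates a label with substr, measures the result with length, and builds the dash and x cells to the same width. The names column_cells and $max_width are made up for illustration and are not from the original code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# $max_width is a hypothetical cap on column width, chosen for illustration.
my $max_width = 8;

# Return the (possibly truncated) label, a dash cell, and an "x" cell,
# all of the same width.
sub column_cells {
    my ($label) = @_;
    my $text   = substr($label, 0, $max_width);   # shorter labels pass through unchanged
    my $width  = length($text);
    my $dashes = '-' x $width;                    # repeat '-' $width times
    my $mark   = sprintf("%-*s", $width, 'x');    # 'x' left-justified, space-padded
    return ($text, $dashes, $mark);
}

for my $label (qw(chocolate vanilla)) {
    my ($text, $dashes, $mark) = column_cells($label);
    print "[$text] [$dashes] [$mark]\n";
}
```

Measuring the truncated text rather than a fixed string keeps all three cells in step with each other.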

I'm sure there are dozens of nifty modules that could do all this for me, but my aim is to do a bit of learning as I reinvent the wheel.

I'm sure it's obvious that I'm very new to this-- but I hope it's also apparent that I'm progressing. Thanks as always for the wise admonishments.

NovMonk

Replies are listed 'Best First'.
Re: Feedback request for text file manipulator
by Roy Johnson (Monsignor) on Mar 25, 2004 at 15:48 UTC
    Overall, it looks like you're writing pretty sensibly. Just a couple of minor things I noticed:

    These assignments:

    @text = "g total";
    should really have the right side in parentheses, IMO. It just makes me cringe to have mismatched assignments.
    $ntext = (split (/;/)) [0]; $ntext =~ s/n01//g;
    Do you really want a global replacement? It would be clearer to do this as one extraction:
    ($ntext) = /^n01([^;]+)/; # or /^n01(.+?);/;
    You've got useless (though not harmful) quoting here:
    print OUT "$_";
    and it could be written as just:
    print OUT;
    And since it's part of every branch of your if, you might as well just move it outside the whole conditional and get rid of the else clause.
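Put together, the suggestions above might look roughly like the sketch below. It makes two assumptions to stay self-contained: the input comes from an in-memory @input list instead of the DATA handle, and output goes to STDOUT instead of ban.test.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# In-memory sample input (an assumption; the original reads from <DATA>).
my @input = (
    "line1\n",
    "line2\n",
    "n10total\n",
    "n01chocolate;more stuff\n",
    "n01vanilla;more stuff\n",
);

my @text;

for (@input) {
    if (/^n10/) {
        @text = ("g total");            # parentheses make the list assignment explicit
    }
    elsif (/^n01([^;]+)/) {
        push @text, substr($1, 0, 5);   # one capture replaces the split plus s///g
    }
    print;                              # prints $_ once, for every branch
}
print "\n@text\n";
```

With a real output filehandle, the bare print becomes print OUT; as described above.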

    About your plans for future development. Instead of reading all the lines into an array, I would recommend you stop keeping track of @gcard and @pcard, since they never really change. They're entirely dependent on corresponding elements in @text. So just accumulate @text as usual (without any truncation), and put all your column-width processing into the printing section after the while loop. Calculate the sum of the lengths of text, determine how wide each column can be, and then format all your output accordingly.
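A minimal sketch of that printing section, under stated assumptions: the labels are already collected, columns are joined by single spaces after a leading g or p, and the column cap is simply lowered until the line fits. The helpers line_length and fit_width are hypothetical names, and this shrinking strategy is just one way to pick a width.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $page_width = 142;    # page width from the original question

# Length of an output line if every label is cut to at most $width characters.
sub line_length {
    my ($width, @labels) = @_;
    my $total = 1;                          # the leading "g" or "p"
    for my $label (@labels) {
        my $len = length($label);
        $len = $width if $len > $width;
        $total += 1 + $len;                 # separating space plus the column
    }
    return $total;
}

# Largest column cap (at least 1) that keeps the line within $limit.
sub fit_width {
    my ($limit, @labels) = @_;
    my $width = 0;
    for my $label (@labels) {
        $width = length($label) if length($label) > $width;
    }
    $width-- while $width > 1 && line_length($width, @labels) > $limit;
    return $width;
}

my @labels = qw(total chocolate vanilla);
my $width  = fit_width($page_width, @labels);
my @cols   = map { substr($_, 0, $width) } @labels;

print join(' ', 'g', @cols), "\n";
print join(' ', 'g', map { '-' x length($_) } @cols), "\n";
print join(' ', 'p', map { sprintf('%-*s', length($_), 'x') } @cols), "\n";
```

Because the dash and x lines are derived from @cols at print time, there is nothing like @gcard or @pcard to keep in sync.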


    The PerlMonk tr/// Advocate
      Thanks a lot for the clear explanations. This in particular sent me to the various manuals and references until I could translate for myself why they do what you suggest:

      ($ntext) = /^n01([^;]+)/; # or /^n01(.+?);/;

      Tried to msg you with my thanks, but the space in your user name sent my comment to "roy" instead. Hopefully this will reach you.

      Pax,

      NovMonk

      Oh-- I msg'ed duff directly-- but he deserves public thanks as well, so here it is.

Re: Feedback request for text file manipulator
by duff (Parson) on Mar 25, 2004 at 16:11 UTC

    My comments:

    • While valid perl, it's not really best to have @array = "scalar".
    • You're storing stuff in @gcard that you can generate from the information in @text
    • @text, @gcard, and @pcard, $ntext, etc. aren't very descriptive variable names
    • You mention reading the input file into an array. It looks to me from what you've posted that you don't need to read the input into an array, but rather generate a data structure from the processed input. e.g., build a hash for each of your n01 lines where the keys are like "chocolate" and the values are whatever you've represented by "x" in your code.

    I know you said you weren't looking for solutions but here's a different way of doing essentially the same thing that will hopefully give you some clues:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my (@header,%data);
    while (<DATA>) {
        if (/^n10/) { push @header, 'total'; }    # will it always be total?
        elsif (/^n01([^;]+);/) { push @header, $1; $data{$1} = "x"; }
        print;
    }
    $data{'total'} = "x";    # maybe compute a sum here instead :)
    print join(' ', "g", @header), "\n";
    print join(' ', "g", map { '-' x length($_) } @header), "\n";
    print join(' ', "p", map { sprintf("%*s ", length($_)-1, $data{$_}) } @header), "\n";

    __DATA__
    line1
    line2
    n10total
    n01chocolate;more stuff
    n01vanilla;more stuff

Node Type: perlquestion [id://339743]
Approved by Corion