Good points.

but what if the final line is supposed to be processed by some other piece of code? You can't just ungetc a readline...

You are correct that there is no "unget" or "un-read" for a line that has already been read. There are various ways of handling that sort of situation. When the process() sub needs to deal with the first line, I pass that line as a parameter to process(). Usually these sorts of things are record oriented: something has to be done with each record that is read, and the process() sub's job is to assemble a complete record. If you want the code that "does something to the record" to live in the main driver, just have process() return a structure, or modify a struct ref that is passed in. You can't use Perl's single-statement "if" modifier in that situation, but I don't see any issue here at all.
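A minimal sketch of that arrangement, assuming the same `<CALL` / `<EOR` record markers as the program further down. The names process() and read_records(), and the in-memory sample text, are illustrative only. The driver hands the line it already read to process(), which returns the assembled record instead of acting on it:

```perl
use strict;
use warnings;

# Hypothetical helper: assemble one record, starting from the line the
# driver has already read, and return it as an array ref of lines.
sub process {
    my ($fh, $first_line) = @_;
    chomp $first_line;
    my @lines = ($first_line);
    while (my $line = <$fh>) {
        chomp $line;
        last if $line =~ /^<EOR/;        # end-of-record marker
        push @lines, $line;
    }
    return \@lines;
}

# Driver: the code that "does something to the record" lives here.
sub read_records {
    my ($fh) = @_;
    my @records;
    while (my $line = <$fh>) {
        next unless $line =~ /^<CALL/;   # start-of-record marker
        push @records, process($fh, $line);
    }
    return \@records;
}

# Made-up sample input, read through an in-memory file handle.
my $text = "junk\n<CALL:5>W6LVW\n<BAND:3>80M\n<EOR>\n<CALL:5>K7CAR\n<EOR>\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my $records = read_records($fh);
print scalar(@$records), " records read\n";
```

Because process() takes the file handle as a parameter, the same code works with DATA, a disk file, or a string.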

Also, note that your process_record is making use of a global variable, DATA, and three of your four examples will throw an undef warning if the end-of-file is reached before the closing line is seen.

As far as global DATA goes, I have no issue with that for a short (<1 page) piece of code. In a larger program I would pass a lexical file handle to the sub. Note: You can make a lexical file handle out of DATA like this: my $fh = *DATA; print while (<$fh>); Pass $fh to the sub.

In almost all of the situations I deal with, throwing an error for malformed file input is the correct behaviour. This is usually a good thing, since it means the input file needs to be fixed. It is rare for me to throw away or silently ignore a malformed record. Of course "rare" does not mean "never". It could certainly be argued that the program that doesn't throw an undef warning is the one in error! Of course, the programs I demoed can be modified to have either behaviour.
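As a rough illustration of the "throw an error" behaviour, here is a sketch of a reader that dies when end-of-file arrives before the closing `<EOR>` line. The name read_record_strict() and the error text are assumptions made for this example, not part of the programs discussed in the thread:

```perl
use strict;
use warnings;

# Hypothetical strict reader: returns the joined record text, or dies if
# end-of-file is reached before the <EOR> marker is seen.
sub read_record_strict {
    my ($fh, $first_line) = @_;
    chomp $first_line;
    my $data = $first_line;
    while (defined(my $line = <$fh>)) {
        return $data if $line =~ /^<EOR/;   # complete record
        chomp $line;
        $data .= $line;
    }
    die "Malformed input: end-of-file before <EOR> in record starting '$first_line'\n";
}

# Made-up truncated input: the record never sees its <EOR> line.
my $truncated = "<BAND:3>80M\n";
open my $fh, '<', \$truncated or die "Cannot open in-memory file: $!";
my $ok = eval { read_record_strict($fh, "<CALL:5>W6LVW\n"); 1 };
print "Rejected: $@" unless $ok;
```

To get the lenient behaviour instead, replace the die with `return $data;` or `return;`, depending on whether a partial record should be kept or dropped.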

I think a state machine type approach would be better, because it is more flexible and can handle the above cases specially, if needed.

I guess we disagree. I don't see any case for "more flexible". However, having said that, there is no real quibble on my part with having a state variable approach. Using a sub() to keep track of the "inside record" state is very clean. I actually think the Perl flip-flop operator is very cool. No problem with that either! When I use it, I have to go to Grandfather's classic post and look at the various start/end regex situations.
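For comparison, a sketch of the flip-flop version, again assuming `<CALL` starts a record and `<EOR` ends it. The name records_by_flip_flop() is made up for this example. In scalar context the range operator returns a sequence number, and perlop documents that the last value in a range has "E0" appended, which is how the closing line is detected here:

```perl
use strict;
use warnings;

# Collect the lines of each record using the range (flip-flop) operator.
sub records_by_flip_flop {
    my ($fh) = @_;
    my (@records, @current);
    while (my $line = <$fh>) {
        chomp $line;
        # True from a line matching /^<CALL/ through the next /^<EOR/ line.
        if (my $seq = ($line =~ /^<CALL/ .. $line =~ /^<EOR/)) {
            if ($seq =~ /E0$/) {         # last line of the range: the <EOR> line
                push @records, [@current];
                @current = ();
            }
            else {
                push @current, $line;
            }
        }
    }
    return \@records;
}

# Made-up sample input, read through an in-memory file handle.
my $text = "header\n<CALL:5>W6LVW\n<BAND:3>80M\n<EOR>\nnoise\n<CALL:5>K7CAR\n<EOR>\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my $records = records_by_flip_flop($fh);
print join('|', @$_), "\n" for @$records;
```

Note that this sketch silently drops a record that is still open at end-of-file, and the flip-flop keeps its state between calls, so a truncated file would leave it "inside a record" for the next call.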

I often have to write "one-off" programs to convert weird file formats. I will attach such a program that I wrote a few days ago. For such a thing, efficiency doesn't matter, "general purpose" doesn't matter - I will never see a file like this again. My job was to convert this file as part of a larger project. This is not "perfect" but it did its job.

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dump qw(pp);   # debugging aids, not used below
use Data::Dumper;
$|=1;

# Each record starts at a <CALL line and runs to the next <EOR line.
while (my $line = <DATA>)
{
    process_record ($line) if $line =~ /^<CALL/;
}

sub process_record
{
    my $line = shift;
    chomp $line;
    my $data = $line;
    # Join the rest of the record onto the first line.
    while ( $line = <DATA>)
    {
        last if $line =~ /^<EOR/;
        chomp $line;
        $data .= $line;
    }
    # <NAME:length>value pairs become hash keys and values.
    my %hash = $data =~ /<(\w+):\d+>([\w. ]+)/g;
    print_Cabrillo_QSO (\%hash);
}

sub print_Cabrillo_QSO
{
    my $Qref = shift;
    print "QSO: ";
    # FREQ is in MHz; Cabrillo wants whole kHz (3.816 => 3816).
    printf "%.0f ", $Qref->{FREQ} * 1000;
    print "PH ";
    my $date = $Qref->{QSO_DATE};   # 20190504 => 2019-05-04
    $date =~ s/(\d\d\d\d)(\d\d)(\d\d)/$1-$2-$3/;
    print "$date ";
    my $time = $Qref->{TIME_ON};
    $time =~ s/^(\d\d\d\d).*/$1/;   # keep HHMM, drop seconds
    print "$time ";
    print "W7RN 59 NVSTO ";
    printf "%15s ",$Qref->{CALL};
    print "59 ";
    $Qref->{COMMENT}=~ s/ +//g;     #assume next field is <
    print $Qref->{COMMENT};
    # my $qth = $Qref->{QTH};
    #$qth //= '';
    #print $qth;
    print "\n";
}

=Prints
QSO: 3816 PH 2019-05-05 0659 W7RN 59 NVSTO           W6LVW 59 CO
QSO: 3816 PH 2019-05-05 0657 W7RN 59 NVSTO           K7CAR 59 UTWSH
=cut

__DATA__
This ADIF file was created by MacLoggerDX
<PROGRAMID:11>MacLoggerDX<PROGRAMVERSION:4>6.22<ADIF_VER:5>3.0.7
<EOH>
<CALL:5>W6LVW<NAME:18>Michael J Sparling<QTH:8>MONUMENT<STATE:2>CO<CNTY:7>El Paso<QSO_DATE:8>20190505<TIME_ON:6>065952<QSO_DATE_OFF:8>20190505<TIME_OFF:6>070013
<FREQ_RX:5>3.816<FREQ:5>3.816<BAND:3>80M<BAND_RX:3>80M<MODE:3>SSB<SUBMODE:3>LSB
<TX_PWR:3>100<ANT_AZ:4>86.8<RST_SENT:2>59<RST_RCVD:2>59
<DXCC:3>291<COUNTRY:13>United States<GRIDSQUARE:6>DM79nb<LAT:11>N039 04.562<LON:11>W104 53.096
<MY_GRIDSQUARE:6>DM09ei<OPERATOR:4>K5XI<MY_RIG:11>Elecraft K3<COMMENT:2>CO<EMAIL:19>mickspa@comcast.net
<EOR>
<CALL:5>K7CAR<NAME:13>Kent B O Sell<QTH:9>Hillsboro<STATE:2>OR<CNTY:10>Washington<QSO_DATE:8>20190505<TIME_ON:6>065758<QSO_DATE_OFF:8>20190505<TIME_OFF:6>065814
<FREQ_RX:5>3.816<FREQ:5>3.816<BAND:3>80M<BAND_RX:3>80M<MODE:3>SSB<SUBMODE:3>LSB
<TX_PWR:3>100<ANT_AZ:3>124<RST_SENT:2>59<RST_RCVD:2>59<QSL_VIA:10>eQSL, LoTW
<DXCC:3>291<COUNTRY:13>United States<GRIDSQUARE:6>DM44ik<LAT:11>N034 25.359<LON:11>W111 19.869
<MY_GRIDSQUARE:6>DM09ei<OPERATOR:4>K5XI<MY_RIG:11>Elecraft K3<COMMENT:6>UT WSH<EMAIL:17>kent@premier1.net
<EOR>

In reply to Re^3: processing file content as string vs array by Marshall
in thread processing file content as string vs array by vinoth.ree
