http://qs321.pair.com?node_id=11100133


in reply to Re^2: processing file content as string vs array
in thread processing file content as string vs array

Good points.

but what if the final line is supposed to be processed by some other piece of code? You can't just ungetc a readline...

You are correct that there is no "unget" or "un-read" for a line that has already been read. There are various ways of handling that sort of situation. When the process() sub needs to deal with the first line, I pass that first line as a parameter to process(). Usually these sorts of things are record oriented: something has to be done with a record that was read, and the process() sub's job is to assemble a complete record. If you want the code that "does something to the record" to be in the main driver, then just have process() return a structure, or modify a struct ref that is passed in. You can't use Perl's single-action "if" (the statement-modifier form) in that situation, but I don't see any issue here at all.
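A minimal sketch of the "return a structure" variant, under some assumptions: the sub name read_record, the in-memory sample text, and the @records array are made up for illustration, and the record delimiters (<CALL ... <EOR>) are borrowed from the program at the end of this post. The driver passes the already-read first line in, and gets a hash ref back.

```perl
use strict;
use warnings;

# Hypothetical helper: assemble one record, starting with the line the
# driver has already read, and return it as a hash ref.
sub read_record {
    my ($fh, $first_line) = @_;
    chomp(my $data = $first_line);
    while (my $line = <$fh>) {
        last if $line =~ /^<EOR/;      # closing line ends the record
        chomp $line;
        $data .= $line;
    }
    my %record = $data =~ /<(\w+):\d+>([\w. ]+)/g;   # field name => value
    return \%record;
}

# Made-up sample input, read through an in-memory file handle.
my $text = "header\n<CALL:5>W6LVW<STATE:2>CO\n<FREQ:5>3.816\n<EOR>\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my @records;
while (my $line = <$fh>) {
    next unless $line =~ /^<CALL/;
    my $record = read_record($fh, $line);
    push @records, $record;            # the driver decides what to do with it
}
close $fh;

print "$_->{CALL} $_->{FREQ}\n" for @records;
```

Because the driver now needs the return value, the one-line "process() if ..." form turns into a small block, which is the only real cost.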

Also, note that your process_record is making use of a global variable, DATA, and three of your four examples will throw an undef warning if the end-of-file is reached before the closing line is seen.

As far as global DATA goes, I have no issue with that for a short (<1 page) piece of code. In a larger program I would pass a lexical file handle to the sub. Note: You can make a lexical file handle out of DATA like this: my $fh = *DATA; print while (<$fh>); Then pass $fh to the sub.

In almost all of the situations I deal with, throwing an error for malformed file input is the correct behaviour. This is usually a good thing: the input file needs to be fixed. It is rare for me to throw away or silently ignore a malformed record. Of course "rare" does not mean "never". It could certainly be argued that the program that doesn't throw an undef warning is in error! Of course the programs I demoed can be modified to have either behaviour.
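If you want an explicit error rather than relying on an undef warning, one way to sketch it is below. The sub name read_record_strict and the sample input are assumptions for illustration; the idea is simply to die when end-of-file arrives before the closing line.

```perl
use strict;
use warnings;

# Hypothetical strict reader: returns the joined record text, or dies
# if the file ends before the closing <EOR> line is seen.
sub read_record_strict {
    my ($fh, $first_line) = @_;
    chomp(my $data = $first_line);
    while (my $line = <$fh>) {
        return $data if $line =~ /^<EOR/;
        chomp $line;
        $data .= $line;
    }
    die "Malformed input: end of file reached before <EOR>\n";
}

# Made-up input with no closing <EOR> line.
my $bad = "<CALL:5>W6LVW\n<FREQ:5>3.816\n";
open my $fh, '<', \$bad or die "Cannot open in-memory file: $!";

my $first  = <$fh>;
my $record = eval { read_record_strict($fh, $first) };
my $error  = $@;
print defined $record ? "record: $record\n" : "rejected: $error";
```

Dropping the eval makes the whole program stop at the bad record, which is the behaviour I usually want.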

I think a state machine type approach would be better, because it is more flexible and can handle the above cases specially, if needed.

I guess we disagree. I don't see any case for "more flexible". However, having said that, there is no real quibble on my part with having a state variable approach. Using a sub() to keep track of the "inside record" state is very clean. I actually think the Perl flip-flop operator is very cool. No problem with that either! When I use it, I have to go to Grandfather's classic post and look at the various start/end regex situations.
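For reference, a small sketch of the flip-flop approach using the same start/end patterns as my program below. The sample lines and the @inside array are made up for illustration. In scalar context the .. operator becomes true on the line that matches the left pattern and stays true through the line that matches the right pattern, so both delimiter lines are included.

```perl
use strict;
use warnings;

# Made-up sample input, read through an in-memory file handle.
my $text = join "\n", 'junk', '<CALL:5>W6LVW', '<FREQ:5>3.816', '<EOR>', 'more junk', '';
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my @inside;
while (my $line = <$fh>) {
    chomp $line;
    # Flip-flop: true from the <CALL line through the <EOR line, inclusive.
    if ($line =~ /^<CALL/ .. $line =~ /^<EOR/) {
        push @inside, $line;
    }
}
close $fh;

print "$_\n" for @inside;
```

Whether the start and end lines should be kept or skipped is exactly the detail I have to look up each time.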

I often have to write "one-off" programs to convert weird file formats. I will attach such a program that I wrote a few days ago. For such a thing, efficiency doesn't matter, "general purpose" doesn't matter - I will never see a file like this again. My job was to convert this file as part of a larger project. This is not "perfect" but it did its job.

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
$|=1;

while (my $line = <DATA>)
{
    process_record ($line) if $line =~ /^<CALL/;
}

# Assemble one record: join the <CALL line and every following line
# up to (but not including) the <EOR> line, then print it.
sub process_record
{
    my $line = shift;
    chomp $line;
    my $data = $line;
    while ( $line = <DATA>)
    {
        last if $line =~ /^<EOR/;
        chomp $line;
        $data .= $line;
    }
    my %hash = $data =~ /<(\w+):\d+>([\w. ]+)/g;   # field name => value
    print_Cabrillo_QSO (\%hash);
}

sub print_Cabrillo_QSO
{
    my $Qref = shift;
    print "QSO: ";
    printf "%.0f ", $Qref->{FREQ} * 1000;   # ADIF FREQ is in MHz, Cabrillo wants kHz
    print "PH ";
    my $date = $Qref->{QSO_DATE};
    # 20190504 => 2019-05-04
    $date =~ s/(\d\d\d\d)(\d\d)(\d\d)/$1-$2-$3/;
    print "$date ";
    my $time = $Qref->{TIME_ON};
    $time =~ s/^(\d\d\d\d).*/$1/;            # keep HHMM, drop seconds
    print "$time ";
    print "W7RN 59 NVSTO ";
    printf "%15s ",$Qref->{CALL};
    print "59 ";
    $Qref->{COMMENT}=~ s/ +//g; #assume next field is <
    print $Qref->{COMMENT};
    # my $qth = $Qref->{QTH};
    #$qth //= '';
    #print $qth;
    print "\n";
}

=Prints

QSO: 3816 PH 2019-05-05 0659 W7RN 59 NVSTO           W6LVW 59 CO
QSO: 3816 PH 2019-05-05 0657 W7RN 59 NVSTO           K7CAR 59 UTWSH

=cut

__DATA__
This ADIF file was created by MacLoggerDX
<PROGRAMID:11>MacLoggerDX<PROGRAMVERSION:4>6.22<ADIF_VER:5>3.0.7
<EOH>
<CALL:5>W6LVW<NAME:18>Michael J Sparling<QTH:8>MONUMENT<STATE:2>CO<CNTY:7>El Paso<QSO_DATE:8>20190505<TIME_ON:6>065952<QSO_DATE_OFF:8>20190505<TIME_OFF:6>070013
<FREQ_RX:5>3.816<FREQ:5>3.816<BAND:3>80M<BAND_RX:3>80M<MODE:3>SSB<SUBMODE:3>LSB
<TX_PWR:3>100<ANT_AZ:4>86.8<RST_SENT:2>59<RST_RCVD:2>59
<DXCC:3>291<COUNTRY:13>United States<GRIDSQUARE:6>DM79nb<LAT:11>N039 04.562<LON:11>W104 53.096
<MY_GRIDSQUARE:6>DM09ei<OPERATOR:4>K5XI<MY_RIG:11>Elecraft K3<COMMENT:2>CO<EMAIL:19>mickspa@comcast.net
<EOR>
<CALL:5>K7CAR<NAME:13>Kent B O Sell<QTH:9>Hillsboro<STATE:2>OR<CNTY:10>Washington<QSO_DATE:8>20190505<TIME_ON:6>065758<QSO_DATE_OFF:8>20190505<TIME_OFF:6>065814
<FREQ_RX:5>3.816<FREQ:5>3.816<BAND:3>80M<BAND_RX:3>80M<MODE:3>SSB<SUBMODE:3>LSB
<TX_PWR:3>100<ANT_AZ:3>124<RST_SENT:2>59<RST_RCVD:2>59<QSL_VIA:10>eQSL, LoTW
<DXCC:3>291<COUNTRY:13>United States<GRIDSQUARE:6>DM44ik<LAT:11>N034 25.359<LON:11>W111 19.869
<MY_GRIDSQUARE:6>DM09ei<OPERATOR:4>K5XI<MY_RIG:11>Elecraft K3<COMMENT:6>UT WSH<EMAIL:17>kent@premier1.net
<EOR>