PerlMonks  

help: extracting multiple lines from file based on match in one line

by my_perl (Initiate)
on Nov 16, 2004 at 22:15 UTC ( [id://408271]=perlquestion )

my_perl has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have a file with different types of records in it, in the format shown below.
I need to extract all the records that have var2 = 3 into a
different file and delete them from the existing one. I know that every record that has var2 = 3 is 11 lines long.
So every time I find var2 = 3, I need to move the line before it (the record header) plus the 10 lines starting at the match into the new file, and delete them from the original file.
Please help :)
<Rec 1>
var2 = 5
some text
some text
some text
some text
</Rec 1>
<rec 2>
var2 = 3
some text
some text
some text
some text
some text
some text
some text
some text
</Rec 2>
<Rec 3>
var2 = 7
some text
some text
some text
some text
some text
some text
</Rec 3>
<rec 4>
var2 = 3
some text
some text
some text
some text
some text
some text
some text
some text
</Rec 4>
So the output from this data would be:
file_1
<Rec 1>
var2 = 5
some text
some text
some text
some text
</Rec 1>
<Rec 3>
var2 = 7
some text
some text
some text
some text
some text
some text
</Rec 3>
file_2
<rec 2>
var2 = 3
some text
some text
some text
some text
some text
some text
some text
some text
</Rec 2>
<rec 4>
var2 = 3
some text
some text
some text
some text
some text
some text
some text
some text
</Rec 4>

Replies are listed 'Best First'.
Re: help: extracting multiple lines from file based on match in one line
by CountZero (Bishop) on Nov 16, 2004 at 22:48 UTC
    use strict;
    my $counter = 1;
    $/ = "</Rec $counter>";
    my $first_file;
    my $second_file;
    while (my $record = <DATA>) {
        $first_file  .= $record unless $record =~ m/var2 = 3/;
        $second_file .= $record if $record =~ m/var2 = 3/;
        $counter++;
        $/ = "</Rec $counter>";
    }
    print "FIRST FILE: $first_file\nSECOND FILE: $second_file\n";

    CountZero

    "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

Re: help: extracting multiple lines from file based on match in one line
by olivierp (Hermit) on Nov 16, 2004 at 23:55 UTC
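    This won't delete lines from your original file, but it at least splits it into two separate files.
    Note that it relies on your input being formatted like your example data.
    use strict;
    open my $of1, '>', 'var2_equal_to_3.txt'     or die "Cannot write var2_equal_to_3.txt: $!";
    open my $of2, '>', 'var2_not_equal_to_3.txt' or die "Cannot write var2_not_equal_to_3.txt: $!";
    my ($flipflop, $flag, $prev);
    while (<DATA>) {
        chomp;
        $flag = 1 if m/var2\s+=\s+3/i;
        # Sequence number within the current record: 1 on the header line,
        # 2 on the var2 line, and ending in "E0" on the closing tag.
        $flipflop = /<rec\s+[0-9]+>/i .. /<\/rec\s+[0-9]+>/i;
        if ($flipflop == 1) {
            # Hold the header back until the var2 line tells us which file
            # it belongs in; otherwise it would land in the wrong file.
            $prev = $_;
            next;
        }
        my $out = $flag ? $of1 : $of2;
        print $out "$prev\n" if $flipflop == 2;
        print $out "$_\n";
        $flag = $prev = undef if $flipflop =~ /E0$/;
    }
    close $of1;
    close $of2;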
    HTH
    --
    Olivier
Re: help: extracting multiple lines from file based on match in one line
by TedPride (Priest) on Nov 17, 2004 at 10:01 UTC
    The problem is that you don't want to read the whole file into memory at once - yet the original file has to be modified. The following is a solution:
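    use strict;
    my $match = 'var2 = 3';
    my $in    = 'in.txt';
    my $tmp   = 'temp.txt';
    my $ext   = 'extract.txt';
    my $last  = '';    # most recent non-matching line, held back one step

    open(my $inh,  '<', $in)  or die "Cannot read $in: $!";
    open(my $tmph, '>', $tmp) or die "Cannot write $tmp: $!";
    open(my $exth, '>', $ext) or die "Cannot write $ext: $!";
    while (<$inh>) {
        if (index($_, $match) == -1) {
            # No match: flush the held line and hold this one instead,
            # in case it is the header of a matching record.
            print $tmph $last;
            $last = $_;
            next;
        }
        # Match: the held line is the record header. Copy it, the matching
        # line and the next 9 lines (11 in total) to the extract file.
        print $exth $last, $_;
        $last = '';
        for (1 .. 9) {
            my $line = <$inh>;
            print $exth $line if defined $line;
        }
    }
    print $tmph $last;    # flush the final held line when the file ends on a non-matching record
    close($inh);
    close($tmph);
    close($exth);
    unlink($in);
    rename($tmp, $in) or die "Cannot rename $tmp to $in: $!";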
    The modified version of the original file is written to a temp file, then the original file is deleted and the temp file is renamed. The script works - I tested it using your sample input data.
      I tried this, and it worked perfect
      Thanks a bunch :)
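The same temp-file idea can be sketched without counting lines, by buffering one record at a time and deciding where it goes once its close tag arrives. This is only a sketch under assumptions: the helper name split_records is made up for illustration, and it assumes every record ends with a line of the form `</Rec N>`, as in the sample data. It uses in-memory filehandles so it can be run as-is.

```perl
use strict;
use warnings;

# Hypothetical helper: copy each record from $in to either $keep or
# $extract, holding only one record in memory at a time.
sub split_records {
    my ($in, $keep, $extract, $match) = @_;
    my @record;
    while (my $line = <$in>) {
        push @record, $line;
        next unless $line =~ m{^</rec\s+\d+>}i;    # assumed close-tag shape
        # The record is complete: route it by whether any line has $match.
        my $out = (grep { index($_, $match) != -1 } @record) ? $extract : $keep;
        print {$out} @record;
        @record = ();
    }
    print {$keep} @record;    # trailing lines that never reached a close tag
}

# Demo on a shortened copy of the sample data, using in-memory filehandles.
my $data = "<Rec 1>\nvar2 = 5\nsome text\n</Rec 1>\n"
         . "<rec 2>\nvar2 = 3\nsome text\n</Rec 2>\n";
my ($kept, $extracted) = ('', '');
open my $in,   '<', \$data      or die "Cannot open in-memory input: $!";
open my $keep, '>', \$kept      or die "Cannot open in-memory output: $!";
open my $ext,  '>', \$extracted or die "Cannot open in-memory output: $!";
split_records($in, $keep, $ext, 'var2 = 3');
close $_ for $in, $keep, $ext;
print "kept:\n$kept", "extracted:\n$extracted";
```

With real files, the handles would be opened on the input, a temp file and the extract file, followed by the same unlink and rename steps as in the script above.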
Re: help: extracting multiple lines from file based on match in one line
by punch_card_don (Curate) on Nov 17, 2004 at 14:52 UTC
    If you don't mind reading the whole file into memory, then arrays are your friend.
    $record_length = 11;
    $string_sought = 'whatever';
    open (SOURCEFILE, "file_and_path") or die "Cannot open file_and_path: $!";
    @SOURCEFILE_LINES = <SOURCEFILE>;
    close(SOURCEFILE);
    # A while loop is used here because assigning to the loop variable of a
    # foreach over a range does not skip iterations.
    $i = 0;
    while ($i <= $#SOURCEFILE_LINES) {
        # $i is a record header if the line after it holds the sought string
        if ($i < $#SOURCEFILE_LINES and $SOURCEFILE_LINES[$i+1] =~ m/$string_sought/) {
            for $j (0 .. $record_length-1) {
                push @TYPE_1_LINES, $SOURCEFILE_LINES[$i+$j] if $i+$j <= $#SOURCEFILE_LINES;
            }
            $i = $i+$record_length;    # skip past the record just copied
        }
        else {
            push @OTHER_LINES, $SOURCEFILE_LINES[$i];
            $i++;
        }
    }
    # now write your two arrays to separate files, overwriting SOURCEFILE if you like with @OTHER_LINES.
Re: help: extracting multiple lines from file based on match in one line
by graff (Chancellor) on Nov 18, 2004 at 02:13 UTC
    Now that your basic problem is solved, I'm wondering: whose idea was it to invent this file format, and why was it made to be just sort of like -- but significantly different from -- XML?

    If the tagging for the data structure went like this, it would qualify as valid XML:

    <rec id="4">
    varFoo = bar
    some text
    some text
    ...
    </rec>
    Not only would you have the option of using some very handy and powerful XML modules and tools on the data, but you would also find it easy to do "one-liner" stuff using the constant-string close tag as the input record separator -- e.g.:
    # command line perl script to put all "var2 = 3" chunks into a separate file:
    perl -ne 'BEGIN{ $/="</rec>\n" } print if /var2 = 3/' input > var2_3.output
    # just do the opposite (change "if" to "unless") to save the other chunks elsewhere
    As it is, with a space in every close tag, and all close tags being different, your data is not XML, and it's a pain in the neck.
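For illustration, here is roughly what the record-separator approach could look like as a script that sorts both kinds of record in a single pass. It is a sketch only: it assumes the data has been re-tagged with a constant `</rec>` close tag as suggested above, and the helper name partition_records and the demo data are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: read "</rec>\n"-terminated chunks from $fh and
# sort them into matching and non-matching lists in one pass.
sub partition_records {
    my ($fh, $pattern) = @_;
    local $/ = "</rec>\n";    # each read now returns one whole record
    my (@match, @rest);
    while (my $record = <$fh>) {
        if ($record =~ $pattern) { push @match, $record }
        else                     { push @rest,  $record }
    }
    return (\@match, \@rest);
}

# Demo data in the assumed re-tagged format, read via an in-memory filehandle.
my $data = join '',
    qq{<rec id="1">\nvar2 = 5\nsome text\n</rec>\n},
    qq{<rec id="2">\nvar2 = 3\nsome text\n</rec>\n},
    qq{<rec id="3">\nvar2 = 7\nsome text\n</rec>\n};
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my ($match, $rest) = partition_records($fh, qr/^var2 = 3$/m);
close $fh;
print "matching: ", scalar @$match, ", other: ", scalar @$rest, "\n";
```

Using local on $/ keeps the changed separator confined to the helper, so other reads in the same program still work line by line.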
