help: extracting multiple lines from file based on match in one line

my_perl has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have a file containing different types of records. I need to extract all the records that have var2 = 3 into a different file and delete them from the existing one. I know that every record that has var2 = 3 is 11 lines long.
So every time I find var2 = 3, I need to copy the line before it (the record header) plus the matching line and the 9 lines after it, 11 lines in all, into the new file, and delete them from the original file.
Please help :)
The file looks like this:
<Rec 1>
var2 = 5
some text
some text
some text
some text
</Rec 1>
<rec 2>
var2 = 3
some text
some text
some text
some text
some text
some text
some text
some text
</Rec 2>
<Rec 3>
var2 = 7
some text
some text
some text
some text
some text
some text
</Rec 3>
<rec 4>
var2 = 3
some text
some text
some text
some text
some text
some text
some text
some text
</Rec 4>
So the output from this data would be:
file_1 (the original file, with the var2 = 3 records removed):
<Rec 1>
var2 = 5
some text
some text
some text
some text
</Rec 1>
<Rec 3>
var2 = 7
some text
some text
some text
some text
some text
some text
</Rec 3>
file_2 (the extracted var2 = 3 records):
<rec 2>
var2 = 3
some text
some text
some text
some text
some text
some text
some text
some text
</Rec 2>
<rec 4>
var2 = 3
some text
some text
some text
some text
some text
some text
some text
some text
</Rec 4>


Re: help: extracting multiple lines from file based on match in one line by CountZero (Bishop) on Nov 16, 2004 at 22:48 UTC
```perl
use strict;
# Records are read from a __DATA__ section (not shown here).
# Every record ends in "</Rec N>", so use that as the input record
# separator and bump N after each record.
my $counter = 1;
local $/ = "</Rec $counter>";
my $first_file;
my $second_file;
while (my $record = <DATA>) {
    $first_file  .= $record unless $record =~ m/var2 = 3/;
    $second_file .= $record if     $record =~ m/var2 = 3/;
    $counter++;
    $/ = "</Rec $counter>";
}
print "FIRST FILE: $first_file\nSECOND FILE: $second_file\n";
```

CountZero

"If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law
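CountZero's reply works because `$/` (the input record separator) can be any string, not just a newline. The sketch below, assuming records are numbered 1, 2, 3, ... and each one ends in `</Rec N>`, wraps the same idea in a hypothetical helper `split_by_counter` that reads from an in-memory filehandle, so it can be tried without a file. It uses `local $/` so the changed separator does not leak into the rest of the program.

```perl
use strict;
use warnings;

# Hypothetical helper: split $text into records that end in "</Rec N>",
# with N counting up from 1, and sort them by whether they match $pattern.
sub split_by_counter {
    my ($text, $pattern) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    my ($kept, $extracted) = ('', '');
    my $counter = 1;
    local $/ = "</Rec $counter>";    # restored automatically when the sub returns
    while (my $record = <$fh>) {
        if ($record =~ $pattern) { $extracted .= $record }
        else                     { $kept      .= $record }
        $counter++;
        $/ = "</Rec $counter>";      # the next record ends in the next number
    }
    close $fh;
    return ($kept, $extracted);
}

my $sample = "<Rec 1>\nvar2 = 5\nsome text\n</Rec 1>\n"
           . "<rec 2>\nvar2 = 3\nsome text\n</Rec 2>\n";
my ($kept, $extracted) = split_by_counter($sample, qr/var2 = 3/);
print "FIRST FILE:\n$kept\nSECOND FILE:\n$extracted\n";
```

One quirk of this approach: the newline after each closing tag is read as the first character of the following record, so every record after the first starts with "\n" and the final newline of the input ends up as a short chunk of its own.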
Re: help: extracting multiple lines from file based on match in one line by olivierp (Hermit) on Nov 16, 2004 at 23:55 UTC
This wouldn't delete lines from your original file, but at least it splits it into two separate ones. Note that it relies on your input being formatted like your example data.

```perl
use strict;
# Records are read from a __DATA__ section (not shown here).
open OF1, '>', 'var2_equal_to_3.txt'     or die "Cannot open var2_equal_to_3.txt: $!";
open OF2, '>', 'var2_not_equal_to_3.txt' or die "Cannot open var2_not_equal_to_3.txt: $!";
my ($flipflop, $flag, $prev);
while (<DATA>) {
    chomp;
    $flag = 1 if m/var2\s+=\s+3/i;
    # scalar "..": 1 on the header, 2 on the next line, "NE0" on the close tag
    $flipflop = /<rec\s+[0-9]+>/i .. /<\/rec\s+[0-9]+>/i;
    if ($flipflop == 1) {
        # hold the header back until the next line shows which file it belongs in
        $prev = $_;
        next;
    }
    my $out = $flag ? \*OF1 : \*OF2;
    print $out "$prev\n" if $flipflop == 2;
    print $out "$_\n";
    $flag = $prev = undef if $flipflop =~ /E0$/;
}
close OF1;
close OF2;
```

HTH -- Olivier
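The reply above leans on what the scalar range ("flip-flop") operator returns: a sequence number starting at 1 on the line where the start pattern matches, counting up on each following line, with "E0" appended on the line where the end pattern matches, and an empty string outside a range. That is why `$flipflop == 1` picks out the header, `$flipflop == 2` the var2 line, and `/E0$/` the closing tag. A small sketch to see those values, using a hypothetical helper `flipflop_trace` with the same two patterns:

```perl
use strict;
use warnings;

# Hypothetical helper: return what the scalar range ("flip-flop") operator
# yields for each line, using the same start and end patterns as the reply.
# Caveat: each ".." keeps its own state between calls of the sub, so feed it
# input that ends on a closing tag.
sub flipflop_trace {
    my @values;
    for (@_) {
        my $state = /<rec\s+[0-9]+>/i .. /<\/rec\s+[0-9]+>/i;
        push @values, $state;
    }
    return @values;
}

my @lines = ('<Rec 1>', 'var2 = 5', '</Rec 1>',
             '<rec 2>', 'var2 = 3', 'some text', '</Rec 2>');
print join(' ', map { "[$_]" } flipflop_trace(@lines)), "\n";
# prints: [1] [2] [3E0] [1] [2] [3] [4E0]
```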
Re: help: extracting multiple lines from file based on match in one line by TedPride (Priest) on Nov 17, 2004 at 10:01 UTC
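The problem is that you don't want to read the whole file into memory at once, yet the original file has to be modified. The following is a solution:

```perl
use strict;
my $match = 'var2 = 3';
my $in    = 'in.txt';
my $tmp   = 'temp.txt';
my $ext   = 'extract.txt';
open(my $inh,  '<', $in)  or die "Cannot read $in: $!";
open(my $tmph, '>', $tmp) or die "Cannot write $tmp: $!";
open(my $exth, '>', $ext) or die "Cannot write $ext: $!";
my $last = '';    # previous line, held back in case it is the header of a match
while (<$inh>) {
    if (index($_, $match) == -1) {
        print $tmph $last;    # the held-back line was not a header: keep it
        $last = $_;
        next;
    }
    # header + matching line + the next 9 lines = one 11-line record
    print $exth $last, $_;
    $last = '';
    for (1 .. 9) {
        defined(my $line = <$inh>) or last;
        print $exth $line;
    }
}
print $tmph $last;    # don't lose the final held-back line
close($inh);
close($tmph) or die "Cannot write $tmp: $!";
close($exth) or die "Cannot write $ext: $!";
unlink($in);
rename($tmp, $in) or die "Cannot rename $tmp to $in: $!";
```

The modified version of the original file is written to a temp file, then the original file is deleted and the temp file is renamed. The script works; I tested it using your sample input data.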
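The write-to-a-temp-file-then-rename step can be pulled out into a reusable piece. The sketch below is one possible variant, not TedPride's code: the helper name `rewrite_via_temp` is hypothetical, and it swaps the fixed name `temp.txt` for a unique file from the core `File::Temp` module, created in the same directory as the original so the final `rename` stays on one filesystem.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Temp qw(tempfile);

# Hypothetical helper: call $worker with a read handle on $path and a write
# handle on a new temporary file in the same directory, then move the
# temporary file over $path.
# Caveat: the new file gets File::Temp's restrictive default permissions,
# not those of the original file.
sub rewrite_via_temp {
    my ($path, $worker) = @_;
    open my $in, '<', $path or die "Cannot read $path: $!";
    my ($out, $tmp) = tempfile(DIR => dirname($path), UNLINK => 0);
    $worker->($in, $out);
    close $in;
    close $out or die "Cannot write $tmp: $!";
    rename $tmp, $path or die "Cannot rename $tmp to $path: $!";
    return;
}
```

To use it with the reply above, the `while` loop would become the body of `$worker`, reading from the first handle and printing kept lines to the second; the extract file would still be opened separately.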
Re^2: help: extracting multiple lines from file based on match in one line by my_perl (Initiate) on Nov 17, 2004 at 21:03 UTC
I tried this, and it worked perfectly. Thanks a bunch :)
Re: help: extracting multiple lines from file based on match in one line by punch_card_don (Curate) on Nov 17, 2004 at 14:52 UTC
If you don't mind reading the whole file into memory, then arrays are your friend.

```perl
use strict;
my $record_length = 11;
my $string_sought = 'whatever';    # e.g. 'var2 = 3'
open(my $source, '<', 'file_and_path') or die "Cannot open file_and_path: $!";
my @SOURCEFILE_LINES = <$source>;
close($source);
my (@TYPE_1_LINES, @OTHER_LINES);
my $i = 0;
while ($i <= $#SOURCEFILE_LINES) {
    # line $i is a header if the line after it contains the sought string
    # and there is room left for a whole record
    if ($i + $record_length - 1 <= $#SOURCEFILE_LINES
        && $SOURCEFILE_LINES[$i + 1] =~ m/\Q$string_sought\E/) {
        push @TYPE_1_LINES, @SOURCEFILE_LINES[$i .. $i + $record_length - 1];
        $i += $record_length;    # skip past the extracted record
    }
    else {
        # not the start of a sought record; this also finishes off the last
        # few lines, where there is no room for another whole record
        push @OTHER_LINES, $SOURCEFILE_LINES[$i];
        $i++;
    }
}
# now write your two arrays to separate files, overwriting SOURCEFILE
# with @OTHER_LINES if you like.
```
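The last step the reply leaves open, writing the two arrays out, could look roughly like this. `write_lines` is a hypothetical helper; it assumes each element still ends in "\n", as it does when read with `<$source>` in list context.

```perl
use strict;
use warnings;

# Hypothetical helper: write a list of lines (each already ending in "\n")
# to $path, replacing whatever the file contained before.
# Returns the number of lines written.
sub write_lines {
    my ($path, @lines) = @_;
    open my $fh, '>', $path or die "Cannot write $path: $!";
    print {$fh} @lines;
    close $fh or die "Cannot close $path: $!";
    return scalar @lines;
}

# Usage with the arrays built by the loop above (file names are placeholders):
# write_lines('type_1_records.txt', @TYPE_1_LINES);
# write_lines('file_and_path',      @OTHER_LINES);   # overwrites the source
```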
Re: help: extracting multiple lines from file based on match in one line by graff (Chancellor) on Nov 18, 2004 at 02:13 UTC
Now that your basic problem is solved, I'm wondering: whose idea was it to invent this file format, and why was it made to be just sort of like -- but significantly different from -- XML? If the tagging for the data structure went like this, it would qualify as well-formed XML:

```xml
<rec id="4">
varFoo = bar
some text
some text
...
</rec>
```

Not only would you have the option of using some very handy and powerful XML modules and tools on the data, but you would also find it easy to do "one-liner" stuff using the constant-string close tag as the input record separator, e.g.:

```shell
# command-line perl script to put all "var2 = 3" chunks into a separate file:
perl -ne 'BEGIN{ $/="</rec>\n" } print if /var2 = 3/' input > var2_3.output
# just do the opposite (change "if" to "unless") to save the other chunks elsewhere
```

As it is, with a space in every close tag, and all close tags being different, your data is not XML, and it's a pain in the neck.
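With the constant close tag, both output files can also be written in one pass instead of running the one-liner twice. A minimal sketch, assuming the XML-like layout suggested above where every record ends in the line `</rec>`; the helper name `route_records` and the output file names are hypothetical.

```perl
use strict;
use warnings;

# Sketch for the XML-like layout: every record ends in the constant line
# "</rec>\n", so each read returns one whole record. Matching records go to
# $match_fh, all others to $other_fh, in a single pass over the input.
sub route_records {
    my ($in, $match_fh, $other_fh, $pattern) = @_;
    local $/ = "</rec>\n";
    while (my $record = <$in>) {
        print { $record =~ $pattern ? $match_fh : $other_fh } $record;
    }
    return;
}

# Example with real files (names are placeholders):
# open my $in,  '<', 'input'         or die $!;
# open my $yes, '>', 'var2_3.output' or die $!;
# open my $no,  '>', 'other.output'  or die $!;
# route_records($in, $yes, $no, qr/var2 = 3/);
```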