Sorting and substituting a data file, one pass

tsk1979 has asked for the wisdom of the Perl Monks concerning the following question:

I can do this with two passes over the file, but I am looking for a way to do it in one pass, since the files are very big. This is what I intend to do. Imagine a text file with the following data:

Some garbage
More garbage
data -start <some string> \
     -intermediate <some string> \
     -intermediate <some string> \
     .
     .
     -end <some string>
Some garbage
More garbage
data -start <some string> \
     -intermediate <some string> \
     -intermediate <some string> \
     .
     .
     -end <some string>
Some garbage
More garbage
data -start <some string> \
     -intermediate <some string> \
     -intermediate <some string> \
     .
     .
     -end <some string>
.
.
.

I want the output file to contain

data -start <string> -end <string>
data -start <string> -end <string>
data -start <string> -end <string>
.
.
.

The catch? After removing the intermediates, there will be lots of duplicates, which I want to remove. In my current flow, I read in the file, write the results to an array, and then remove duplicates from the array. This two-pass process seems to be a waste of time. If I can get a one-pass algorithm, it will be great!


Re: Sorting and substituting a data file, one pass by ikegami (Patriarch) on Jun 21, 2010 at 06:05 UTC
my %seen;
while (<>) {
    if (s/\\\n\z/ /) {
        my $next = <>;
        if (defined($next)) {
            if ($next =~ /^\s*-end\s/) {
                $_ .= $next;
            } elsif ($next =~ /(\\\n)\z/) {
                $_ .= "\\\n";
            } else {
                $_ .= "\n";
            }
            redo;
        }
    }
    print if /^data\s/ && !$seen{$_}++;
}
Re^2: Sorting and substituting a data file, one pass by tsk1979 (Scribe) on Jun 21, 2010 at 09:38 UTC
This should be the solution, but I am stumped by `<>`. Won't it stop and wait for user input at every step?
Re^3: Sorting and substituting a data file, one pass by Anonymous Monk on Jun 21, 2010 at 10:42 UTC
Only if you neither pass filenames on the command line nor redirect a file into the program:

`$ perl myprogram.pl file1 file2 file3`
`$ myprogram.pl < file4`
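
To illustrate the point about `<>`: the diamond operator reads from each file named in `@ARGV` in turn and falls back to STDIN only when `@ARGV` is empty. The following is a minimal sketch, not part of the original replies; the temporary file and its contents are hypothetical stand-ins for a real data file.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical stand-in for a real data file (an assumption for this demo).
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "first line\nsecond line\n";
close $fh;

# Same effect as running: perl myprogram.pl <path>
# <> opens each name in @ARGV in turn; STDIN is used only if @ARGV is empty.
local @ARGV = ($path);

my @lines;
while (my $line = <>) {
    chomp $line;
    push @lines, $line;
}

print scalar(@lines), " lines read without prompting\n";
```

With a filename in `@ARGV`, the loop ends at end-of-file and never waits on the terminal.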
Re: Sorting and substituting a data file, one pass by CountZero (Bishop) on Jun 21, 2010 at 06:27 UTC
Go through your data file line by line, assembling your `data -start <string> -end <string>` as you go along. Once each item is assembled, store it as the key of a hash (the value can be anything you like, or left empty). Duplicate hash keys will disappear automatically, and you can then sort the keys.

CountZero

"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James
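
The hash-based approach described in this reply could be sketched roughly as follows. This is an illustration under stated assumptions, not CountZero's own code: the subroutine name `unique_records` and the sample data are hypothetical, each record is assumed to begin with a line starting with `data`, continuation lines are assumed to end in a backslash, and whitespace is trimmed so that records with different numbers of intermediate lines compare equal.

```perl
use strict;
use warnings;

# Hypothetical helper: read records from an open filehandle in one pass,
# keep only the "data -start ..." and "-end ..." parts, and return the
# unique records in sorted order.
sub unique_records {
    my ($fh) = @_;
    my %seen;
    my $record;    # the record currently being assembled, if any

    while (my $line = <$fh>) {
        chomp $line;
        # Strip a trailing backslash and remember whether the line continues.
        my $continues = ($line =~ s/\s*\\\z//);
        $line =~ s/^\s+|\s+\z//g;    # trim leading and trailing whitespace

        if ($line =~ /^data\s/) {
            $record = $line;                 # start of a new record
        }
        elsif (defined $record && $line =~ /^-end\s/) {
            $record .= " $line";             # append the -end part
        }
        # Intermediate and garbage lines are ignored.

        if (defined $record && !$continues) {
            $seen{$record} = 1;              # duplicate keys collapse here
            undef $record;
        }
    }
    $seen{$record} = 1 if defined $record;   # record cut off by end of file
    return sort keys %seen;
}

# Hypothetical sample input, read through an in-memory filehandle.
my $sample = <<'END';
Some garbage
data -start B \
     -intermediate x \
     -end Z
More garbage
data -start A \
     -intermediate y \
     -intermediate z \
     -end Q
data -start B \
     -end Z
END

open my $in, '<', \$sample or die "cannot open in-memory file: $!";
my @records = unique_records($in);
print "$_\n" for @records;
```

The file is read only once, but every unique record is held in memory until the final sort. If sorted output is not needed, each record can instead be printed the first time it is seen.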

