parsing with regex

2501 has asked for the wisdom of the Perl Monks concerning the following question:

Re: parsing with regex by dfog (Scribe) on Nov 16, 2001 at 03:51 UTC
If everything is in a single variable, you could do it in one line, like this:

    #!perl
    my $Data = <<HTMLend;
    <HR>
    1 is good<BR>
    useless data<BR>
    useless data<BR>
    useless data<BR>
    useless data<BR>
    <HR>
    2 is not good <BR>
    useless data<BR>
    useless data<BR>
    useless data<BR>
    useless data<BR>
    <HR>
    3 is good<BR>
    useless data<BR>
    useless data<BR>
    useless data<BR>
    useless data<BR>
    <HR>
    4 is not good <BR>
    useless data<BR>
    useless data<BR>
    useless data<BR>
    useless data<BR>
    <HR>
    HTMLend

    # Split into blocks on <HR>, then keep only the blocks containing "is good".
    my @Results = grep { $_ =~ /is good/i } split(/<HR>/, $Data);
    $" = "\n\n";    # separator used when an array is interpolated into a string
    print "@Results";

Dave
Re: parsing with regex by Sifmole (Chaplain) on Nov 16, 2001 at 02:46 UTC
    #!/usr/bin/perl -w
    use strict;

    my %foo;
    while (<DATA>) {
        $foo{$1}++ if (/(\d+) is good/);
    }
    print "$_ :: $foo{$_} \n" foreach (keys %foo);

    __DATA__
    <HR>
    1 is good<BR>
    useless data<BR>
    useless data<BR>
    useless data<BR>
    useless data<BR>
    <HR>
    2 is not good <BR>
    useless data<BR>
    useless data<BR>
    useless data<BR>
    useless data<BR>
    <HR>
    3 is good<BR>
    useless data<BR>
    useless data<BR>
    useless data<BR>
    useless data<BR>
    <HR>
    4 is not good <BR>
    useless data<BR>
    useless data<BR>
    useless data<BR>
    useless data<BR>
    <HR>
Re: Re: parsing with regex by 2501 (Pilgrim) on Nov 16, 2001 at 03:11 UTC
Sorry, I probably wasn't clear enough. I would need everything between the horizontal rules for the good entries, which would be:

    1 is good<BR>
    data unique to 1<BR>
    data unique to 1<BR>
    data unique to 1<BR>
    data unique to 1<BR>
    3 is good<BR>
    data unique to 3<BR>
    data unique to 3<BR>
    data unique to 3<BR>
    data unique to 3<BR>

Thank you,
2501
Re: Re: Re: parsing with regex by Sifmole (Chaplain) on Nov 16, 2001 at 18:39 UTC
Okay, how about this?

    #!/usr/bin/perl -w
    use strict;

    my @foo;
    $/ = "";    # paragraph mode: one read returns everything up to the first blank line (here, all of DATA)
    $_ = <DATA>;

    # Repeatedly cut out each "N is good" block, up to and including the next <HR>.
    while (s/(\d+ is good.*?)<HR>//s) {
        push @foo, $1;
    }
    print $_, "\n--------------\n" foreach (@foo);

    __DATA__
    <HR>
    1 is good<BR>
    useless data<BR>
    useless data<BR>
    useless data<BR>
    useless data<BR>
    <HR>
    2 is not good <BR>
    useless data<BR>
    useless data<BR>
    useless data<BR>
    useless data<BR>
    <HR>
    3 is good<BR>
    useless data<BR>
    useless data<BR>
    useless data<BR>
    useless data<BR>
    <HR>
    4 is not good <BR>
    useless data<BR>
    useless data<BR>
    useless data<BR>
    useless data<BR>
    <HR>
Re: parsing with regex by mitd (Curate) on Nov 16, 2001 at 09:02 UTC
Well, I'll have a go:

    #!/bin/perl -w
    use strict;

    # slurp it up
    undef $/;
    my $slurp = <DATA>;

    # one nice string, might as well split it
    # adding a little whitespace gobble and case protection
    my @stuff = split(/\s*<\s*[Hh][Rr]\s*>\s*/, $slurp);

    # and spit it out
    foreach (@stuff) {
        print $_, "\n";
    }

    __DATA__
    <HR>
    1 is good<BR>
    useless data<BR>
    useless data<BR>
    useless data<BR>
    useless data<BR>
    <HR>
    2 is not good <BR>
    useless data<BR>
    useless data<BR>
    useless data<BR>
    useless data<BR>
    <HR>
    3 is good<BR>
    useless data<BR>
    useless data<BR>
    useless data<BR>
    useless data<BR>
    <HR>
    4 is not good <BR>
    useless data<BR>
    useless data<BR>
    useless data<BR>
    useless data<BR>
    <HR>

mitd-Made in the Dark
'Interactive! Paper tape is interactive! If you don't believe me I can show you my paper cut scars!'
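A split on the rule tags yields every block, good or not, so a filter is still needed to keep only the "is good" ones. Here is a minimal sketch of that combination. The helper name `good_blocks` and the sample string are made up for illustration, and the `/i` flag stands in for the `[Hh][Rr]` character classes.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): split on <HR> tags, tolerating
# whitespace and case differences, then keep only blocks starting "N is good".
sub good_blocks {
    my ($html) = @_;
    my @blocks = split /\s*<\s*hr\s*>\s*/i, $html;
    # The leading empty field and the "is not good" blocks fail this test.
    return grep { /^\d+\s+is\s+good/ } @blocks;
}

# Made-up sample data with a few tag variations.
my $sample = "<HR>\n1 is good<BR>\ndata unique to 1<BR>\n<hr>\n"
           . "2 is not good <BR>\ndata unique to 2<BR>\n< HR >\n"
           . "3 is good<BR>\ndata unique to 3<BR>\n<HR>\n";

print "$_\n--------------\n" for good_blocks($sample);
```

Anchoring the filter with `^` means a stray "5 is good" buried in the middle of some other block would not cause that block to be kept.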
Re: parsing with regex by YuckFoo (Abbot) on Nov 16, 2001 at 03:39 UTC
I'm sure someone will post an efficient regex, but in the meantime you can try this. It's still a bit ugly: the lines are joined, then split on the HR tags.

YuckFoo

    #!/usr/bin/perl
    use strict;

    my ($line, @keep);

    # Join all input lines into one string, then split it into blocks on <HR>.
    for $line ((split(/<HR>\s+/s, join('', (<DATA>))))) {
        if ($line =~ m{\d+\s+is\s+good}) {
            push (@keep, $line);
        }
    }

    for $line (@keep) {
        print "$line\n";
    }

    __DATA__
    <HR>
    1 is good<BR>
    useless data<BR>
    useless data<BR>
    useless data<BR>
    useless data<BR>
    <HR>
    2 is not good <BR>
    useless data<BR>
    useless data<BR>
    useless data<BR>
    useless data<BR>
    <HR>
    3 is good<BR>
    useless data<BR>
    useless data<BR>
    useless data<BR>
    useless data<BR>
    <HR>
    4 is not good <BR>
    useless data<BR>
    useless data<BR>
    useless data<BR>
    useless data<BR>
    <HR>
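For comparison, here is one possible single-regex version of the join-then-split idea. It is only a sketch and has not been benchmarked, so no claim is made that it is faster. The helper name `good_blocks_regex` and the sample data are assumptions for illustration. The lookahead `(?=<HR>)` leaves the closing tag unconsumed so it can open the next block.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): collect every "N is good" block
# with one global match instead of a split followed by a filter.
sub good_blocks_regex {
    my ($html) = @_;
    # /s lets . cross newlines, /i ignores tag case, and /g in list context
    # returns the captured group from every match.
    my @blocks = $html =~ /<HR>\s*(\d+\s+is\s+good.*?)\s*(?=<HR>)/gis;
    return @blocks;
}

# Made-up sample data in the same shape as the question's example.
my $sample = <<'END_HTML';
<HR>
1 is good<BR>
data unique to 1<BR>
<HR>
2 is not good <BR>
data unique to 2<BR>
<HR>
3 is good<BR>
data unique to 3<BR>
<HR>
END_HTML

print "$_\n--------------\n" for good_blocks_regex($sample);
```

Note that a block is only returned if a closing `<HR>` follows it, so a final block without a trailing rule would be skipped.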
Re: parsing with regex by mkmcconn (Chaplain) on Nov 17, 2001 at 01:02 UTC
Everyone is so much quicker than me. Oh well, here's my attempt for what it's worth, constructed to handle some (by no means all) HTML-legal variations in the text.

    #!/usr/bin/perl -w
    use strict;

    $/ = '';
    my %h;
    while (<DATA>) {
        while ( s/((\d+) is good.+?)<(?:hr|HR)>//s ) {
            my $good = $1;
            my $key  = $2;
            # Turn each <BR> variant (and adjacent whitespace) into a "|" separator.
            $good =~ s/\n?\s?<(?:BR|br).?.?>\n?/|/g;
            my @pot = split /\|/, $good;
            shift @pot;    # drop the "N is good" heading
            $h{$key} = [@pot];
        }
    }

    use Data::Dumper;
    print Data::Dumper->Dump([\%h], [qw(*h)]);

    __DATA__
    <HR>
    1 is good<BR>
    useless data<BR>useless data<BR>
    useless data <BR>useless data<BR>
    <hr>
    2 is good<BR>
    useless data<br>
    useless data<BR>
    useless data<br>
    useless data<BR>
    <hr>
    3 is not good <BR>
    useless data <br />useless data<br />useless data<BR>
    useless data<BR>
    <HR>
    4 is good<BR>
    useless data<BR>useless data<BR>
    useless data<br>useless data<BR>
    <HR>
    5 is not good <BR>
    useless data<BR>
    useless data<BR>
    useless data<BR>
    useless data<BR>
    <HR>

By the way, you asked your question very well and complete with a good data example. It's appreciated.

(better Data::Dumper, thanks to sacked and tilly).

mkmcconn
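The tag variations can also be absorbed by the pattern itself rather than by listing alternatives. The sketch below assumes a block has already been cut out between two rules. The helper name `parse_block` and the sample blocks are hypothetical. It uses `/i` for case and `\s*\/?` for the `<br/>` and `<br />` forms, and like the code above it covers only some of the legal HTML variations.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): turn one "N is good<BR>..." block
# into (N, [fields]); returns an empty list for any other kind of block.
sub parse_block {
    my ($block) = @_;
    # /i covers <br>, <BR> and <Br>; \s*\/? covers <br/> and <br />.
    # The \s* on either side trims whitespace around each field.
    my @fields = split /\s*<br\s*\/?>\s*/i, $block;
    my $head = shift @fields;    # the "N is good" heading
    return unless defined $head && $head =~ /^\s*(\d+)\s+is\s+good/i;
    return ($1, \@fields);
}

# Made-up blocks mixing several <BR> spellings.
my %h;
for my $block ("1 is good<BR>\nalpha<br>beta <br />gamma<Br/>\n",
               "2 is not good <BR>\ndelta<BR>\n") {
    my ($key, $fields) = parse_block($block) or next;
    $h{$key} = $fields;
}

print "$_: @{ $h{$_} }\n" for sort keys %h;
```

Splitting directly on the tag pattern avoids the intermediate "|" separator, so field text that happens to contain a pipe character is left intact.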

