I was asked today to extract some data from some HTML files ("hey Murray you can probably do this pretty quickly with Perl..."). Once I found out it was definitely a one-off, I got to have some fun :)
The HTML was pretty horrible: lots of tags and no line breaks in the whole file. I slurped it into a scalar and started hacking a regex to grab the first wanted bit, thinking I would need three regexes, one for each piece of wanted data. Then I noticed that the data was basically in pairs with blobs of HTML in between, so the regexes would be simpler if I removed the HTML first...
A quick Super Search (I'm lazy) and I had a regex to strip HTML (thanks Juerd++). Then the lights went fully on and less than 30 seconds later I had everything I wanted:
$temp =~ s/<(?:[^>'"]*|(['"]).*?\1)*>/\n/gs;
my %temp = grep length, split /\n/, $temp;
print "$temp{'Serial Number:'}, $temp{'Model:'}, $temp{'CPU:'}\n";
Yeah, I know I was lucky: there were no blank fields, and there was an even number of "bits" before the data bits and none after, so every label ended up as a hash key with its value right behind it. But it was meant to be a quick and dirty one-off :) My colleague was suitably impressed with the speed of the response too!
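For anyone wanting to try the same trick, here is a minimal self-contained sketch with one small safety net for the "even number of bits" assumption. The sample HTML and the sub name extract_pairs are made up for illustration (the real files are not shown here), and it still assumes there are no blank fields and no stray text between the tags:

```perl
use strict;
use warnings;

# Hypothetical sample input; the real files looked different.
my $html = '<table><tr><td class="k">Serial Number:</td><td>ABC123</td></tr>'
         . '<tr><td>Model:</td><td>X200</td></tr>'
         . '<tr><td>CPU:</td><td>2.4GHz</td></tr></table>';

# extract_pairs is an illustrative name, not part of any module.
sub extract_pairs {
    my ($text) = @_;

    # Turn every tag into a newline (same tag-stripping regex as above).
    $text =~ s/<(?:[^>'"]*|(['"]).*?\1)*>/\n/gs;

    # Keep only the non-empty pieces left between the tags.
    my @bits = grep { length } split /\n/, $text;

    # An odd count means labels and values would be misaligned.
    die "odd number of fields (" . scalar(@bits) . "), cannot pair them up\n"
        if @bits % 2;

    my %pairs = @bits;    # list assignment: label, value, label, value, ...
    return \%pairs;
}

my $data = extract_pairs($html);
print "$data->{'Serial Number:'}, $data->{'Model:'}, $data->{'CPU:'}\n";
```

The check only catches a mismatch in the total count; a blank field in the middle would still shift everything after it, so for anything longer-lived than a one-off a real HTML parser is the safer bet.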
--
Do not seek to follow in the footsteps of the wise. Seek what they sought. -Basho