I know that ... some other module might have easier way to do this. But for now, I want to learn and apply HTML::Parser and regex ...
Ok, so you're committed to drilling all those holes in your head just to prove to yourself for sure that drilling holes in your head is a bad idea. Here's one approach:
c:\@Work\Perl\monks>perl -wMstrict -le
"use warnings;
use strict;
;;
use Regexp::Common;
;;
use Data::Dump qw(dd);
;;
my @lines = (
'Summary</h1><table border=\"1\"><tr><th>Employee John Doe</th><th>-0.82</th>',
'Summary</h1><table border=\"1\"><tr><th> Employee Fred D. Poe </th><th> -5.03 </th>',
'Summary</h1><table border=\"1\"><tr><th>Employee Billy-Bob Toe</th><th> </th>',
'Summary</h1><table border=\"1\"><tr><th>Employee</th><th>999</th>',
'<th>Employee Prince </th><th> 123</th>',
'<th>Employee O</th><th> 1.23 </th>',
);
;;
my $rx_name = qr{ \S+? (?: \s+ \S+)*? }xms;
my $rx_th_open = qr{ \s* < th > \s* }xms;
my $rx_th_close = qr{ \s* < / th > \s* }xms;
;;
my %per_employee;
;;
LINE:
for my $line (@lines) {
    my $parsed =
        my ($name, $amount) = $line =~ m{
            $rx_th_open Employee \s+ ($rx_name) $rx_th_close
            $rx_th_open ($RE{num}{real})? $rx_th_close
        }xms;
    ;;
    if (not $parsed) {
        warn qq{'$line' failed to parse};
        next LINE;
    }
    ;;
    $amount = 'no amount' unless defined $amount;
    $per_employee{$name} = $amount;
}
;;
dd \%per_employee;
"
'Summary</h1><table border="1"><tr><th>Employee</th><th>999</th>' failed to parse at -e line 1.
{
"Billy-Bob Toe" => "no amount",
"Fred D. Poe" => "-5.03",
"John Doe" => "-0.82",
O => "1.23",
Prince => 123,
}
(Note that the $rx_name regex for an actual, human name is very naive. (Update: See off-site Falsehoods Programmers Believe About Names.))
Update: Significant changes to example code: $rx_th_open and $rx_th_close regexes made more elegant (?); added rudimentary error handling; added corner and error test cases.
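If the cmd.exe quoting in the one-liner gets in the way, the same approach might look roughly like this as a standalone script. This is only a sketch under a few assumptions: parse_row() is a helper name made up for illustration, and $rx_real is a simplified stand-in for $RE{num}{real} (no exponents, no thousands separators) so that the script needs nothing outside core Perl. If you have Regexp::Common installed, use its pattern instead.

```perl
use strict;
use warnings;

# Assumption: simplified stand-in for $RE{num}{real} from Regexp::Common
# (optional sign, digits, optional fraction; no exponent support).
my $rx_real     = qr{ [-+]? (?: \d+ (?: [.] \d* )? | [.] \d+ ) }xms;
my $rx_name     = qr{ \S+? (?: \s+ \S+)*? }xms;   # same naive name pattern
my $rx_th_open  = qr{ \s* < th > \s* }xms;
my $rx_th_close = qr{ \s* < / th > \s* }xms;

# parse_row() is a hypothetical helper, not part of any module.
# Returns (name, amount) on success and an empty list on failure.
sub parse_row {
    my ($line) = @_;

    my ($name, $amount) = $line =~ m{
        $rx_th_open Employee \s+ ($rx_name) $rx_th_close
        $rx_th_open ($rx_real)?             $rx_th_close
    }xms or return;

    # An empty second cell still matches; report it explicitly.
    return ($name, defined $amount ? $amount : 'no amount');
}

my @lines = (
    'Summary</h1><table border="1"><tr><th>Employee John Doe</th><th>-0.82</th>',
    '<th>Employee Billy-Bob Toe</th><th> </th>',
    'Summary</h1><table border="1"><tr><th>Employee</th><th>999</th>',
);

my %per_employee;

LINE:
for my $line (@lines) {
    my ($name, $amount) = parse_row($line);

    if (not defined $name) {
        warn qq{'$line' failed to parse\n};
        next LINE;
    }

    $per_employee{$name} = $amount;
}

print "$_ => $per_employee{$_}\n" for sort keys %per_employee;
```

Pulling the match into a sub keeps the error handling in the loop and makes it easy to throw more corner cases at the regex. It does nothing to make regex-parsing of HTML any less fragile.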
Give a man a fish: <%-{-{-{-<