pissflaps has asked for the wisdom of the Perl Monks concerning the following question:
Hello monks! I was placed in charge of a ticket-alert system written in Perl and cannot get past half of this code.
I have been trying to split a string of lines into variables representing each delimited word within the line. If that is unclear, maybe a visual representation will help:
A flat-file DB system is sitting on an HTML page. Tickets are formatted like this:
(All on one line)
<TR><TD> 371540 | </TD><TD>4/07/2011 | </TD><TD>08:03 | </TD><TD>11:03 | </TD><TD>2 | </TD><TD>Company Name(MAIN SITE) | </TD><TD>DB PURGE | </TD><TD> </TD></TR> <TR><TD>
which was generated from splitting the input values for creating the tickets. I assign the input into an array and regex off the markup and spaces.
I'm now left with something like this:
371540|4/07/2011|08:03|11:03|2|Company Name (MAIN SITE)|DB PURGE|
and want to assign variables to each word in this format:
$ticket,$DateAdded,$STime,$ETime,$Pri,$SiteName,$Comments
This way, I can access the variables and email alerts based on the time variables to be compared to the current time.
Where I'm having trouble seems to be around the following segment of code. Any help or advice would be GREATLY appreciated, since I am very new to Perl and have been debugging this script line-by-line with warnings and just can't figure out some of the functions I'm applying.
#!C:/Perl/bin/perl.exe
use CGI qw(:standard);
use CGI::Carp qw(fatalsToBrowser);
use Net::SMTP;
use Data::Dumper::Simple;
use warnings;
use diagnostics;
open(DB, '<', "C:/Inetpub/wwwroot/DBase.htm") || die "Error: $!\n";
our ($i, $line);
my @arr = <DB>;
splice (@arr, 0, 14); #this trims off all the html setup and table markup
close (DB);
foreach $line(@arr){
$line =~ s/\|//g;
$line =~ s/<\/TD><\/TR><TR><TD>/\n/g;
$line =~ s/<\/TD><TD>/\|/g;
$line =~ s/ //g;
$line =~ s/<\/TD><\/TR>//g;
$line =~ s/<TR><TD>//g;
chomp($line);
}
my $lines = \@arr;
my ($Ehour, $Emin);
for $i (0 .. $#arr){
$line[$i] = $arr[$i];
}
my @lines = ($line[0], $line[1], $line[2], $line[3], $line[4], $line[5], $line[6]);
print (Dumper (@lines)); #shows all lines are in the array properly.
my ($ticket,$DateAdded,$STime,$ETime,$Pri,$SiteName,$Comments)
= split(/\|/,$line[0]);
print "$line->[0]\n"; #prints "371225 |3/23/2011|16:34 | 19:34 |2 |Company Name ||
You can see that I'm able to split a single line into the variables, but I want to iterate over every line in the $line string to place these variables onto the data.
Am I totally setting myself up for failure, or is there a better way to do this?
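The alerting step described above comes down to turning DateAdded and ETime into a timestamp and comparing it with the current time. A rough sketch using the core Time::Local module, assuming the dates are month/day/year in the server's local time zone and that an alert should fire 30 minutes before the end time (the sub names and the 30-minute window are made up for illustration):

```perl
use strict;
use warnings;
use Time::Local qw(timelocal);

# Convert a ticket's date (M/DD/YYYY) and time (HH:MM) into epoch seconds, local time.
sub ticket_epoch {
    my ($date, $time) = @_;
    my ($mon, $mday, $year) = split m{/}, $date;
    my ($hour, $min)        = split /:/, $time;
    return timelocal(0, $min, $hour, $mday, $mon - 1, $year);
}

# True if the end time is at most $window seconds after $now (or already past).
sub alert_due {
    my ($date, $etime, $now, $window) = @_;
    return ticket_epoch($date, $etime) - $now <= $window;
}

my ($ticket, $DateAdded, $STime, $ETime, $Pri, $SiteName, $Comments) =
    split /\|/, '371540|4/07/2011|08:03|11:03|2|Company Name (MAIN SITE)|DB PURGE|';

# time() is the current epoch; 30 * 60 is the hypothetical alert window in seconds.
print "Ticket $ticket is due\n" if alert_due($DateAdded, $ETime, time, 30 * 60);
```

Passing the current time in as an argument, rather than calling time inside the sub, keeps the comparison easy to check against known values.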
Re: Spliting a delimited string into variables
by jethro (Monsignor) on Apr 07, 2011 at 17:07 UTC
You print $line->[0] which is quite different from $line[0]. How this prints "371225..." is beyond me.
You also do lots of things that look strange to my eye, but that is to be expected ;-). For example, to copy @arr into @line (for which you used a loop), "@line = @arr;" would have been enough. The same goes for "my @lines= ($line[0]..."
Also, using lots of variables called $line, $lines, @line, @lines turns any program into an entry for the obfuscation contest. Differentiate your variables better.
What you have been looking for might be this:
# starting with @arr having all the lines
my @ticker;
my @names=('ticket','dateadded','stime','etime','pri','SiteName','comments');
my $i=0;
foreach my $line (@arr) {
my @items= split(/\|/, $line);
print "ETime is $items[3]\n";
$ticker[$i]{$names{$_}}= $items[$_] for scalar @names;
$i++;
}
# now $ticker[3]{SiteName} is SiteName of the fourth line
Note that @ticker and $i are only necessary if you want to operate on the data after the loop. If you just want to print or store the stuff, do that in the foreach loop above. You can also just use the split line you have in your script to fill the variables with the data inside the loop instead of using @items, if it suits you better.
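As posted, the loop above will not do what it describes: $names{$_} indexes a hash that was never declared (the field names live in the array @names), and "for scalar @names" runs once with the element count instead of iterating over the indices. A minimal corrected sketch, assuming the intent was an array of hashes with one hash per ticket line; the second sample ticket is made up for illustration:

```perl
use strict;
use warnings;

# Sample input standing in for @arr (the second ticket is hypothetical).
my @arr = (
    '371540|4/07/2011|08:03|11:03|2|Company Name (MAIN SITE)|DB PURGE|',
    '371541|4/07/2011|09:15|10:15|1|Other Company|REBOOT|',
);

my @names = qw(ticket dateadded stime etime pri SiteName comments);
my @ticker;

foreach my $line (@arr) {
    my @items = split /\|/, $line;
    my %record;
    # Hash slice: pair each field name with the item at the same index.
    @record{@names} = @items[0 .. $#names];
    push @ticker, \%record;
}

# $ticker[1]{SiteName} is now the SiteName of the second line.
print "Ticket $_->{ticket} at $_->{SiteName} ends at $_->{etime}\n" for @ticker;
```

Using push removes the need for the $i counter, and the hash slice replaces the inner loop over indices.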
This is really close to where I was going. I can't seem to initialize {$names{$_}} due to the curly braces. Is this assigning everything into a hash table? If so, I'm receiving an error about either $names or $items that it isn't initialized, which is very similar to the errors my original code pulled. Is there maybe a different notation to assign everything into a hash while still initializing the variables?
my %hash= (0, 'ticket', 1 => 'dateadded');
{$names{$_}}, on the other hand, is not a really useful expression in Perl. %{$names{$_}} refers to a hash whose reference is stored as a value in another hash (i.e. %names): a multidimensional hash, or HashOfHashes in perlspeak.
If what you wanted was a Hash of Hashes, the syntax looks like this:
$names{$_}{key}= 'value';
#or to initialize with a list:
%{$names{$_}} = ('key1' => 'value1','key2' => 'value2');
If you want further information about HoH (Hash of Hashes), you might want to read perldsc and perllol.
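To make the difference concrete, here is a small sketch of a hash of hashes applied to the ticket data, assuming the ticket number is unique and can serve as the outer key (the name %tickets is made up for this example):

```perl
use strict;
use warnings;

my @names = qw(ticket dateadded stime etime pri SiteName comments);
my %tickets;

for my $line ('371540|4/07/2011|08:03|11:03|2|Company Name (MAIN SITE)|DB PURGE|') {
    my @items = split /\|/, $line;
    # Outer key: ticket number. Inner hash: field name => value.
    $tickets{ $items[0] }{ $names[$_] } = $items[$_] for 0 .. $#names;
}

# One field of one ticket, then all field names stored for that ticket.
print "$tickets{371540}{SiteName}\n";
print join(', ', sort keys %{ $tickets{371540} }), "\n";
```

The inner hashes are created automatically on first assignment, so nothing needs to be initialized beyond declaring %tickets.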
my $line="371540|4/07/2011|08:03|11:03|2|Company Name (MAIN SITE)|DB PURGE|";
if ($line =~ /(\d+)\|([0-9]{1,2}\/[0-9]{1,2}\/[0-9]{4})\|([0-9]{1,2}\:[0-9]{2})\|([0-9]{1,2}\:[0-9]{2})\|(\d+)\|((?:\w+)\s\w+\s(?:\()?\w+\s\w+\))\|(\w+(?:\s)?\w+)/ )
{
$item1=$1;
$item2=$2;
...
$item7=$7
}
Note that the regexp has to be adapted.
The parentheses capture their content.
The first pair of parentheses is stored in $1,
the second pair in $2,
etc...
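The numbered variables can also be skipped: in list context a successful match returns the captured groups, so they can go straight into named variables. A minimal sketch with a deliberately looser pattern than the one above, assuming no field ever contains a literal | (the sub name parse_ticket is made up for illustration):

```perl
use strict;
use warnings;

# Returns the seven captured fields in list context, or an empty list on failure.
sub parse_ticket {
    my ($line) = @_;
    return $line =~ m{\A(\d+)\|([\d/]+)\|(\d{1,2}:\d{2})\|(\d{1,2}:\d{2})\|(\d+)\|([^|]*)\|([^|]*)\|?\z};
}

my $line = '371540|4/07/2011|08:03|11:03|2|Company Name (MAIN SITE)|DB PURGE|';

if ( my @fields = parse_ticket($line) ) {
    my ($ticket, $DateAdded, $STime, $ETime, $Pri, $SiteName, $Comments) = @fields;
    print "Ticket $ticket for $SiteName runs $STime to $ETime\n";
}
else {
    warn "Line did not match the expected ticket format: $line\n";
}
```

Compared with a plain split, the regexp also validates the line, so malformed rows can be reported instead of silently producing undefined variables.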
Re: Spliting a delimited string into variables
by sundialsvc4 (Abbot) on Apr 07, 2011 at 17:36 UTC
Personally, I am a very big fan of using high-level tools such as HTML::Parser to do as much “heavy lifting” as possible.
My rationale is that: any HTML document does have a known structure (even if it is not obliged to adhere to it strictly, in actual practice), and that, “anytime you are dealing with a complex document having any known structure, the best way to deal with such a thing is to use a parser.”
There are many, many good parsing engines in Perl. One that I recently had the privilege of beating to a bloody pulp (wink... it proceeded to do everything I asked it to, and more!!) was Parse::RecDescent. (I am still in awe of its author!) But in this case, the source-language is “simply HTML,” and HTML-specific tools abound.
All parsing engines are, so to speak, “engines that are really, really good at character-twiddling and which know the lay of the land.” You rely upon them to go about their business and to call your code at strategic points, and to return data structures to you at those times.
This is, IMHO, a much stronger strategy than “regex hell,” which often yields solutions that work fine in initial test-cases but then require constant twiddling and head-banging. Let the CPAN-authors do as much banging on your behalf as possible. It will not, of course, eliminate the considerable amount of work that still remains to be done, but it might well make that work vastly easier.
HTH ...
I've got to second the vote for HTML::Parser or similar parsing engines.
A long time ago, before RSS feeds, I wrote a program to parse various newspaper websites and did the regexes by hand. I had 24 different rules for 90+ papers. When I rewrote it, I got it down to 9 rules, mainly based on web page design, since I used a parsing engine.
You're going to save yourself a ton of work since if the data changes you're going to have to rewrite your regexes each time.
To disagree, one doesn't have to be disagreeable - Barry Goldwater
Thank you for such an informative response! I'll be sure to look into more about Parse::RecDescent, but for now that may be too daunting to pick up for a novice. Is there an example using HTML::Parser you could describe for using in this situation? I'm unfamiliar with basically any module outside of CGI. :(
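For the table rows shown in the question, an HTML::Parser version might look roughly like the sketch below. It assumes HTML::Parser is installed from CPAN (it is not a core module) and that every ticket is one TR with one TD per field; the second sample row is made up for illustration. The parser calls the three handlers as it meets start tags, text, and end tags.

```perl
use strict;
use warnings;
use HTML::Parser;

# Sample markup in the shape shown in the question (second row is hypothetical).
my $html = <<'HTML';
<TR><TD> 371540 | </TD><TD>4/07/2011 | </TD><TD>08:03 | </TD><TD>11:03 | </TD><TD>2 | </TD><TD>Company Name(MAIN SITE) | </TD><TD>DB PURGE | </TD><TD> </TD></TR>
<TR><TD> 371541 | </TD><TD>4/07/2011 | </TD><TD>09:15 | </TD><TD>10:15 | </TD><TD>1 | </TD><TD>Other Company | </TD><TD>REBOOT | </TD><TD> </TD></TR>
HTML

my (@rows, $cell);   # @rows: one array ref per TR; $cell: text of the TD being read

my $parser = HTML::Parser->new(
    api_version => 3,
    # Tag names arrive lowercased.
    start_h => [ sub {
        my ($tag) = @_;
        push @rows, [] if $tag eq 'tr';   # a new table row begins
        $cell = ''     if $tag eq 'td';   # start collecting a cell
    }, 'tagname' ],
    # Text may arrive in several chunks, so append rather than assign.
    text_h => [ sub { $cell .= $_[0] if defined $cell }, 'dtext' ],
    end_h => [ sub {
        my ($tag) = @_;
        return unless $tag eq 'td' && defined $cell;
        $cell =~ s/^\s+|[\s|]+$//g;       # trim spaces and the trailing |
        push @rows, [] unless @rows;      # tolerate a TD outside any TR
        push @{ $rows[-1] }, $cell;
        undef $cell;
    }, 'tagname' ],
);
$parser->parse($html);
$parser->eof;

for my $row (@rows) {
    my ($ticket, $DateAdded, $STime, $ETime, $Pri, $SiteName, $Comments) = @$row;
    print "$ticket: $SiteName, $STime-$ETime\n";
}
```

To read the real page instead of the sample string, $parser->parse_file("C:/Inetpub/wwwroot/DBase.htm") could replace the parse and eof calls.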
Re: Spliting a delimited string into variables
by Nikhil Jain (Monk) on Apr 07, 2011 at 17:11 UTC
use strict;
use warnings;
use Data::Dumper;
my @data = ("371540|4/07/2011|08:03|11:03|2|Company Name (MAIN SITE)|DB PURGE1|",
"371540|4/07/2011|08:03|11:03|2|Company Name (MAIN SITE)|DB PURGE1|",
"371540|4/07/2011|08:03|11:03|2|Company Name (MAIN SITE)|DB PURGE1|",
"371540|4/07/2011|08:03|11:03|2|Company Name (MAIN SITE)|DB PURGE1|",
"371540|4/07/2011|08:03|11:03|2|Company Name (MAIN SITE)|DB PURGE1|");
print Dumper(@data);
foreach my $field (@data){
my ($ticket,$DateAdded,$STime,$ETime,$Pri,$SiteName,$Comments) = split(/\|/,$field);
print "$ticket,$DateAdded,$STime,$ETime,$Pri,$SiteName,$Comments\n";
}
}
Re: Spliting a delimited string into variables
by pissflaps (Initiate) on Apr 20, 2011 at 16:16 UTC
I've since somewhat resolved the issues I was having and resorted to learning all about making an array of arrays while in a for loop. Thank you all for your magnificent help and resources! I couldn't have figured any of this out without you.
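For anyone landing here later, the array-of-arrays approach can be sketched as follows. This is an illustration rather than the code actually used in the final script; the second sample ticket and the name @tickets are made up:

```perl
use strict;
use warnings;

# Cleaned-up lines as produced by the substitutions in the original post.
my @arr = (
    '371540|4/07/2011|08:03|11:03|2|Company Name (MAIN SITE)|DB PURGE|',
    '371541|4/07/2011|09:15|10:15|1|Other Company|REBOOT|',
);

# Each element of @tickets is a reference to an array holding one line's fields.
my @tickets;
for my $i (0 .. $#arr) {
    $tickets[$i] = [ split /\|/, $arr[$i] ];
}

# Unpack each inner array into the named variables from the question.
for my $t (@tickets) {
    my ($ticket, $DateAdded, $STime, $ETime, $Pri, $SiteName, $Comments) = @$t;
    print "$ticket ends at $ETime ($SiteName)\n";
}

# Individual fields are reachable by row and column index.
print "Second ticket, fourth field: $tickets[1][3]\n";
```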