Splitting a delimited string into variables

pissflaps has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks! I was placed in charge of a ticket-alert system written in Perl and cannot get past half of this code. I have been trying to split each line of a string into variables, one per delimited field in the line. If that is unclear, maybe a visual representation will help: a flat-file DB system is sitting on an HTML page. Tickets are formatted like this (all on one line):

<TR><TD> 371540 |  </TD><TD>4/07/2011 |  </TD><TD>08:03 |  </TD><TD>11:03 |  </TD><TD>2 |  </TD><TD>Company Name(MAIN SITE)    |  </TD><TD>DB PURGE |  </TD><TD> </TD></TR> <TR><TD>

which was generated from splitting the input values for creating the tickets. I assign the input into an array and regex off the markup and spaces. I'm now left with something like this:

371540|4/07/2011|08:03|11:03|2|Company Name (MAIN SITE)|DB PURGE|

and want to assign variables to each field in this format:

$ticket,$DateAdded,$STime,$ETime,$Pri,$SiteName,$Comments

This way, I can access the variables and email alerts based on the time variables to be compared to the current time. Where I'm having trouble seems to be around the following segment of code. Any help or advice would be GREATLY appreciated, since I am very new to Perl and have been debugging this script line-by-line with warnings and just can't figure out some of the functions I'm applying.

#!C:/Perl/bin/perl.exe
use CGI qw(:standard);
 use CGI::Carp qw(fatalsToBrowser);
use Net::SMTP;
use Data::Dumper::Simple;
use warnings;
use diagnostics;

open(DB, '<', "C:/Inetpub/wwwroot/DBase.htm") || die "Error: $!\n";
        
        our ($i, $line);
        my @arr = <DB>;
        splice (@arr, 0, 14); #this trims off all the html setup and table markup
        
close (DB);    

    foreach $line(@arr){
         $line =~ s/\|//g;
         $line =~ s/<\/TD><\/TR><TR><TD>/\n/g;
         $line =~ s/<\/TD><TD>/\|/g;
         $line =~ s/&nbsp;//g;
         $line =~ s/<\/TD><\/TR>//g;
         $line =~ s/<TR><TD>//g;
         chomp($line);
        }
    
my $lines = \@arr;
my ($Ehour, $Emin);
        for $i (0 .. $#arr){
         $line[$i] = $arr[$i];
         }
         my @lines = ($line[0], $line[1], $line[2], $line[3], $line[4], $line[5], $line[6]);
        
        print (Dumper (@lines)); #shows all lines are in the array properly.
         
    my ($ticket,$DateAdded,$STime,$ETime,$Pri,$SiteName,$Comments)
= split(/\|/,$line[0]);

print "$line->[0]\n"; #prints "371225 |3/23/2011|16:34 | 19:34 |2 |Company Name ||

You can see that I'm able to split a single line into the variables, but I want to iterate over every line in the $line string to place these variables onto the data. Am I totally setting myself up for failure, or is there a better way to do this?
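
For the alerting step described above, the date and time fields have to become something comparable with the current time. A minimal sketch using the core Time::Local module, assuming dates are month/day/year (the sample 3/23/2011 suggests so) and times are 24-hour local time; `ticket_epoch` and `is_overdue` are hypothetical helper names, not part of the original script:

```perl
use strict;
use warnings;
use Time::Local qw(timelocal);

# Convert "M/D/YYYY" and "HH:MM" fields into epoch seconds (local time).
# Assumes month/day/year order; the helper names are illustrative only.
sub ticket_epoch {
    my ($date, $time) = @_;
    my ($mon, $mday, $year) = split m{/}, $date;
    my ($hour, $min)        = split /:/, $time;
    return timelocal(0, $min, $hour, $mday, $mon - 1, $year);
}

# A ticket counts as overdue once "now" has passed its end time.
# $now is optional and defaults to the current time.
sub is_overdue {
    my ($date, $etime, $now) = @_;
    $now = time unless defined $now;
    return ticket_epoch($date, $etime) < $now;
}

my $line = '371540|4/07/2011|08:03|11:03|2|Company Name (MAIN SITE)|DB PURGE|';
my ($ticket, $DateAdded, $STime, $ETime, $Pri, $SiteName, $Comments)
    = split /\|/, $line;

print "Ticket $ticket is past its end time\n"
    if is_overdue($DateAdded, $ETime);
```

Passing an explicit `$now` makes the check easy to exercise with fixed times; an email would be sent where the `print` is.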


Replies are listed 'Best First'.
Re: Spliting a delimited string into variables by jethro (Monsignor) on Apr 07, 2011 at 17:07 UTC
You print `$line->[0]`, which is quite different from $line[0]. How this prints "371225..." is beyond me.

You also do lots of things that look strange to my eye, but that is to be expected ;-). For example, to copy @arr into @line (for which you used a loop), "@line = @arr;" would have been enough. The same goes for "my @lines= ($line[0]...". Also, using lots of variables called $line, $lines, @line and @lines turns any program into an entry for the obfuscation contest. Differentiate your variables better.

What you have been looking for might be this:

# starting with @arr having all the lines
my @ticker;
my @names=('ticket','dateadded','stime','etime','pri','SiteName','comments');
my $i=0;
foreach my $line (@arr) {
    my @items= split(/\|/, $line);
    print "ETime is $items[3]\n";
    $ticker[$i]{$names{$_}}= $items[$_] for scalar @names;
    $i++;
}
# now $ticker[3]{SiteName} is SiteName of the fourth line

Note that @ticker and $i are only necessary if you want to operate on the data after the loop. If you just want to print or store the stuff, do that in the foreach loop above. You can also just use the split line you have in your script to fill the variables with the data inside the loop, instead of using @items, if that suits you better.
Re^2: Spliting a delimited string into variables by pissflaps (Initiate) on Apr 07, 2011 at 19:32 UTC
This is really close to where I was going. I can't seem to initialize `{$names{$_}}` due to the curly braces. Is this assigning everything into a hash table? If so, I'm receiving an error about either $names or $items that it isn't initialized, which is very similar to the errors my original code pulled. Is there maybe a different notation to assign everything into a hash while still initializing the variables?
Re^3: Spliting a delimited string into variables by jethro (Monsignor) on Apr 08, 2011 at 10:47 UTC
A hash can be initialized from a list; each pair of values in the list forms a key and a value of the hash. Often '=>' is used as an alias for ',' to make this obvious:

my %hash= (0, 'ticket', 1 => 'dateadded');

`{$names{$_}}`, on the other hand, is not a really useful expression in Perl. `%{$names{$_}}` refers to a hash whose reference is stored as a value in another hash (i.e. %names): a multidimensional hash, or HashOfHashes in perlspeak. If what you wanted was a hash of hashes, the syntax looks like this:

$names{$_}{key}= 'value';
#or to initialize with a list:
%{$names{$_}} = ('key1' => 'value1','key2' => 'value2');

If you want further information about HoH (hash of hashes), you might want to read perldsc and perllol.
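
The loop from the earlier reply looks up `$names{$_}`, which reads from a hash %names that is never declared; that would be consistent with the uninitialized-value warning reported above. A hedged sketch of what the loop appears to intend, indexing the array @names instead and iterating over `0 .. $#names`. The two lines in @arr are sample data adapted from the question:

```perl
use strict;
use warnings;

# Sample input (assumed): lines already stripped of markup, as in the question.
my @arr = (
    '371540|4/07/2011|08:03|11:03|2|Company Name (MAIN SITE)|DB PURGE|',
    '371225|3/23/2011|16:34|19:34|2|Company Name||',
);

my @names = ('ticket', 'dateadded', 'stime', 'etime', 'pri', 'SiteName', 'comments');
my @ticker;

foreach my $line (@arr) {
    my @items = split /\|/, $line;
    my %record;
    # $names[$_] is an element of the array @names;
    # $names{$_} would look in an unrelated hash %names.
    # split drops trailing empty fields, so missing ones default to ''.
    $record{ $names[$_] } = $items[$_] // '' for 0 .. $#names;
    push @ticker, \%record;
}

print "SiteName of the first line: $ticker[0]{SiteName}\n";
print "ETime of the second line: $ticker[1]{etime}\n";
```

Using `push` removes the need for a separate `$i` counter; `$ticker[3]{SiteName}` would still address the fourth line once @arr holds that many.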
Re^4: Spliting a delimited string into variables by pissflaps (Initiate) on Apr 08, 2011 at 21:23 UTC
Re^5: Spliting a delimited string into variables by jethro (Monsignor) on Apr 11, 2011 at 08:55 UTC
Re^2: Spliting a delimited string into variables by Anonymous Monk on Nov 28, 2013 at 14:49 UTC
Hi,

Another, simpler way to make it work is to use regex captures, as follows:

my $line="371540|4/07/2011|08:03|11:03|2|Company Name (MAIN SITE)|DB PURGE|";
if ($line =~ /(\d+)\|([0-9]{1,2}\/[0-9]{1,2}\/[0-9]{4})\|([0-9]{1,2}\:[0-9]{2})\|([0-9]{1,2}\:[0-9]{2})\|(\d+)\|((?:\w+)\s\w+\s(?:\()?\w+\s\w+\))\|(\w+(?:\s)?\w+)/ ) {
    $item1=$1;
    $item2=$2;
    ...
    $item7=$7
}

Note that the regexp has to be adapted. Each pair of parentheses captures its content: the first pair is remembered as $1, the second as $2, and so on.
Re: Spliting a delimited string into variables by sundialsvc4 (Abbot) on Apr 07, 2011 at 17:36 UTC
Personally, I am a very big fan of using high-level tools such as HTML::Parser to do as much “heavy lifting” as possible. My rationale is that: any HTML document does have a known structure (even if it is not obliged to adhere to it strictly, in actual practice), and that, “anytime you are dealing with a complex document having any known structure, the best way to deal with such a thing is to use a parser.”

There are many, many good parsing engines in Perl. One that I recently had the privilege of beating to a bloody pulp (wink... it proceeded to do everything I asked it to, and more!!) was Parse::RecDescent. (I am still in awe of its author!) But in this case, the source-language is “simply HTML,” and HTML-specific tools abound.

All parsing engines are, so to speak, “engines that are really, really good at character-twiddling and which know the lay of the land.” You rely upon them to go about their business and to call your code at strategic points, and to return data structures to you at those times. This is, IMHO, a much stronger strategy than “regex hell,” which often yields solutions that work fine in initial test-cases but then require constant twiddling and head-banging. Let the CPAN-authors do as much banging on your behalf as possible. It will not, of course, eliminate the considerable amount of work that still remains to be done, but it might well make that work vastly easier. HTH ...
Re^2: Spliting a delimited string into variables by Popcorn Dave (Abbot) on Apr 07, 2011 at 20:03 UTC
I've got to second the vote for HTML::Parser or similar parsing engines. A long time ago, before RSS feeds, I wrote a program to parse various newspaper websites and did the regexes by hand. I had 24 different rules for 90+ papers. When I rewrote it, I got it down to 9 rules, mainly based on web page design, since I used a parsing engine. You're going to save yourself a ton of work, since if the data changes you're going to have to rewrite your regexes each time.

To disagree, one doesn't have to be disagreeable - Barry Goldwater
Re^2: Spliting a delimited string into variables by pissflaps (Initiate) on Apr 07, 2011 at 19:55 UTC
Thank you for such an informative response! I'll be sure to look into more about Parse::RecDescent, but for now that may be too daunting to pick up for a novice. Is there an example using HTML::Parser you could describe for using in this situation? I'm unfamiliar with basically any module outside of CGI. :(
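
As a rough illustration of the HTML::Parser approach suggested above, the sketch below registers start, text and end handlers and collects the text of each table cell, so no regex has to match the markup itself. It assumes the rows look like the sample in the question and that the padding after each pipe is an `&nbsp;` entity (a guess based on the substitutions in the original script); `clean`, `@rows` and `@cells` are names invented for this example:

```perl
use strict;
use warnings;
use HTML::Parser;

# One table row in the shape shown in the question. The &nbsp; entities are an
# assumption based on the substitutions in the original script.
my $html = '<TR><TD> 371540 |&nbsp; </TD><TD>4/07/2011 |&nbsp; </TD>'
         . '<TD>08:03 |&nbsp; </TD><TD>11:03 |&nbsp; </TD><TD>2 |&nbsp; </TD>'
         . '<TD>Company Name(MAIN SITE) |&nbsp; </TD><TD>DB PURGE |&nbsp; </TD>'
         . '<TD> </TD></TR>';

my (@rows, @cells);
my $in_cell = 0;

# Strip the pipe, non-breaking spaces and surrounding whitespace from a cell.
sub clean {
    my ($text) = @_;
    $text =~ s/\xA0/ /g;
    $text =~ s/\|//g;
    $text =~ s/^\s+|\s+$//g;
    return $text;
}

my $parser = HTML::Parser->new(
    api_version => 3,
    # Tag names arrive lower-cased by default.
    start_h => [ sub {
        my ($tag) = @_;
        if    ($tag eq 'tr') { @cells = () }
        elsif ($tag eq 'td') { $in_cell = 1; push @cells, '' }
    }, 'tagname' ],
    # 'dtext' passes text with entities such as &nbsp; already decoded.
    text_h => [ sub {
        my ($text) = @_;
        $cells[-1] .= $text if $in_cell;
    }, 'dtext' ],
    end_h => [ sub {
        my ($tag) = @_;
        if    ($tag eq 'td') { $in_cell = 0 }
        elsif ($tag eq 'tr') { push @rows, [ map { clean($_) } @cells ] if @cells }
    }, 'tagname' ],
);

$parser->parse($html);
$parser->eof;

for my $row (@rows) {
    my ($ticket, $DateAdded, $STime, $ETime, $Pri, $SiteName, $Comments) = @$row;
    print "$ticket ends at $ETime ($SiteName)\n";
}
```

Each entry in `@rows` is an array reference holding one ticket's cells, so the list assignment in the final loop takes the place of the `split` on pipes.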
Re: Spliting a delimited string into variables by Nikhil Jain (Monk) on Apr 07, 2011 at 17:11 UTC
You can try something like:

use strict;
use warnings;
use Data::Dumper;

my @data = ("371540|4/07/2011|08:03|11:03|2|Company Name (MAIN SITE)|DB PURGE1|",
            "371540|4/07/2011|08:03|11:03|2|Company Name (MAIN SITE)|DB PURGE1|",
            "371540|4/07/2011|08:03|11:03|2|Company Name (MAIN SITE)|DB PURGE1|",
            "371540|4/07/2011|08:03|11:03|2|Company Name (MAIN SITE)|DB PURGE1|",
            "371540|4/07/2011|08:03|11:03|2|Company Name (MAIN SITE)|DB PURGE1|");
print Dumper(@data);
foreach my $field (@data){
    my ($ticket,$DateAdded,$STime,$ETime,$Pri,$SiteName,$Comments) = split(/\|/,$field);
    print"$ticket,$DateAdded,$STime,$ETime,$Pri,$SiteName,$Comments\n";
}
Re^2: Spliting a delimited string into variables by pissflaps (Initiate) on Apr 07, 2011 at 19:35 UTC
Thank you! This is very close to where I was originally intending to head with the script. I cannot surmise how to access each element to have the variable assigned to it, though. When I run this segment, I fail initializing `$field` within the split. I'm not sure how I got this error because everything is localized in the loop, right? Would I access these variables with `$ticket[0], $ticket[1]`? I could easily adapt the rest of the script if this is the case.
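
On the scoping question: variables declared with `my` inside the `foreach` block are fresh scalars on each pass and go out of scope when the loop ends, and `$ticket[0]` would refer to an unrelated array @ticket. One way to keep every line's fields is to store them in a hash keyed by ticket number. A minimal sketch, assuming ticket numbers are unique; `%by_ticket` and `@fields` are names chosen for illustration, and the second data line is made up:

```perl
use strict;
use warnings;

# Sample data: the first line follows the reply above, the second is invented.
my @data = (
    '371540|4/07/2011|08:03|11:03|2|Company Name (MAIN SITE)|DB PURGE1|',
    '371541|4/07/2011|09:15|12:15|1|Company Name (MAIN SITE)|DB PURGE2|',
);
my @fields = qw(ticket DateAdded STime ETime Pri SiteName Comments);

my %by_ticket;
foreach my $line (@data) {
    my %record;
    # Hash slice: assign all seven keys from the split list in one statement.
    @record{@fields} = split /\|/, $line;
    $by_ticket{ $record{ticket} } = \%record;
}

# The data is still reachable after the loop has finished.
print "Ticket 371541 ends at $by_ticket{371541}{ETime}\n";
for my $id (sort keys %by_ticket) {
    print "$id: $by_ticket{$id}{Comments}\n";
}
```

If the work can be done one ticket at a time, the scalars from the `split` can simply be used inside the loop body, and no extra structure is needed.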
Re: Spliting a delimited string into variables by pissflaps (Initiate) on Apr 20, 2011 at 16:16 UTC
I've since somewhat resolved the issues I was having and resorted to learning all about making an array of arrays while in a for loop. Thank you all for your magnificent help and resources! I couldn't have figured any of this out without you.
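
The array-of-arrays approach mentioned here might look like the following sketch. It is a guess at the general shape, not the poster's actual code; the index variables such as `$ETIME` are hypothetical, and the sample lines are adapted from the question:

```perl
use strict;
use warnings;

# Sample input (assumed): lines already stripped of markup.
my @arr = (
    '371540|4/07/2011|08:03|11:03|2|Company Name (MAIN SITE)|DB PURGE|',
    '371225|3/23/2011|16:34|19:34|2|Company Name||',
);

# Each element of @tickets is a reference to an array holding one line's fields.
my @tickets;
for my $i (0 .. $#arr) {
    $tickets[$i] = [ split /\|/, $arr[$i] ];
}

# Column positions, named for readability.
my ($TICKET, $DATE, $STIME, $ETIME, $PRI, $SITE, $COMMENTS) = (0 .. 6);

for my $row (@tickets) {
    print "Ticket $row->[$TICKET] runs $row->[$STIME] to $row->[$ETIME]\n";
}
print "Second ticket's site: $tickets[1][$SITE]\n";
```

Compared with an array of hashes, this keeps the fields in positional order at the cost of having to remember which column is which.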

