PerlMonks  

split function against a HTML code

by greatshots (Pilgrim)
on Nov 26, 2007 at 05:50 UTC ( [id://652920]=perlquestion )

greatshots has asked for the wisdom of the Perl Monks concerning the following question:

monks,

#!/usr/bin/perl
use strict;
use warnings;

my $string = qq[<th width="200px">KPI</th><th width="100px">Type</th><th width="75px">Data Type</th>];
my @result = split('>', $string);
print "Check :", $#result, "\n";

When I execute the above code, I get the expected result (i.e., the split on '>' works as expected). But if I store the line below in a file, read it back from the file, and apply the same split, it does not work as expected.

<th width="200px">KPI</th><th width="100px">Type</th><th width="75px">Data Type</th>

I have no clue what is happening. Could anyone explain it to me?

Replies are listed 'Best First'.
Re: split function against a HTML code
by atcroft (Abbot) on Nov 26, 2007 at 06:38 UTC

    Using Data::Dumper and your code, I was able to reproduce what I interpreted as your reason for believing the code did not work: when the line was read in from a file, it was split into a total of 7 segments (the inline version having been split into 6).

    When read in from a file, unless cleaned up (via chomp() or similar), the line contains a line-ending sequence (such as "\n", "\r\n", or "\r", depending on platform). This line ending, appearing after the last '>', results in an extra segment in the split() results. Try chomping the string and see if that gives you the results you desire. (You may also want to look at the content of the array with Data::Dumper or a similar module to get a better idea of what is occurring.)

    Hope that helps.
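
    A minimal sketch of the effect described above, assuming the file holds the single line from the question followed by a "\n". An in-memory filehandle stands in for the real file (an assumption made so the example needs nothing on disk), and $Data::Dumper::Useqq is switched on so the stray newline is printed as an escape.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Useqq = 1;    # show control characters as escapes such as "\n"

# Stand-in for the file contents: the markup from the question plus a newline.
my $file_contents = qq[<th width="200px">KPI</th><th width="100px">Type</th><th width="75px">Data Type</th>\n];

# Read one line through an in-memory filehandle instead of a real file.
open my $fh, '<', \$file_contents or die "Cannot open in-memory file: $!";
my $line = <$fh>;
close $fh;

# Split with the newline still attached: it survives as a final field.
my @raw = split '>', $line;

# Chomp first, then split: the trailing empty field is dropped by split.
chomp(my $clean = $line);
my @chomped = split '>', $clean;

print "Without chomp: ", scalar(@raw),     " fields\n";
print "With chomp:    ", scalar(@chomped), " fields\n";
print Dumper(\@raw);
```

    split drops trailing empty fields, which is why the chomped string yields one field fewer: once the newline is gone, nothing follows the last '>'.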

Re: split function against a HTML code
by localfilmmaker (Novice) on Nov 26, 2007 at 06:20 UTC
    How are you reading in the text of the file? And how are you doing the split of that text? You should be doing something like this:
    my $file = 'blah.html';

    # Slurp the file contents into one big string
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $str = do { local $/; <$fh>; };
    close $fh;

    # Split on '>'
    my @results = split '>', $str;
    print "Check :", $#results, "\n";
    On another note, what is it you are trying to accomplish by splitting on '>'? Are you trying to get at each HTML tag, or a certain tag? Is this HTML trustworthy? Your code may not give you the result you expect if the HTML isn't valid and is missing a > or has an extra > in there. Just something to consider.
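
    To illustrate the point about an extra '>', here is a small sketch. The title attribute and its value are hypothetical and not taken from the original file; they only show how a '>' inside an attribute shifts the field boundaries.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical cells: the second one carries a '>' inside an attribute value.
my $clean_cell = q[<th title="a b">KPI</th>];
my $odd_cell   = q[<th title="a > b">KPI</th>];

my @clean_fields = split '>', $clean_cell;
my @odd_fields   = split '>', $odd_cell;

# The same logical cell now yields a different number of fields.
print "clean: ", scalar(@clean_fields), " fields\n";
print "odd:   ", scalar(@odd_fields),   " fields\n";
print "  [$_]\n" for @odd_fields;
```

    Any code that assumes, say, "the second field is the cell text" breaks on the second input, which is one reason an HTML parser is usually the safer tool.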
      open ( FH , "sample.html") or die "Unable to open the file :$!:";
      while ( <FH> ) {
          chomp;
          my $line = $_;
          if ( $line =~ />KPI</i ) {
              print "$line\n----------------------\n";
              my @line_split = ('<',$line);
              print "LINE_SPLIT :",@line_split,"-----",$#line_split,"\n";
              my @result = map ( $_ ne '' , @line_split);
              print "Join :",join('-',@result);
              close (FH);
          }
      }
      I tried the above code. But the split does not split the input line as expected. :-/
        my @line_split = ('<',$line);
        Looks like you neglected to call the split function at all. It should be: my @line_split = split('<', $line);
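
        A sketch of the corrected steps, assuming the intent was to drop empty fields before joining. The sample line is a stand-in for one line of sample.html. Besides the missing split call, note that map with `$_ ne ''` returns a list of true/false values rather than the fields themselves; grep is the function that filters.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for one line read from sample.html (newline included, then chomped).
my $line = qq[<th width="200px">KPI</th><th width="100px">Type</th><th width="75px">Data Type</th>\n];
chomp $line;

# Actually call split. The line starts with '<', so the first field is empty.
my @line_split = split /</, $line;

# grep keeps the elements for which the test is true; map would return the test results.
my @result = grep { $_ ne '' } @line_split;

print "LINE_SPLIT count: ", scalar(@line_split), "\n";
print "Join: ", join('-', @result), "\n";
```

        In the original loop, close(FH) also sits inside the while block, so reading stops after the first matching line; moving it after the loop is probably what was intended.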
      Are you trying to get at each HTML tag?
      Yes.
Re: split function against a HTML code
by erroneousBollock (Curate) on Nov 26, 2007 at 06:21 UTC
    I'll put aside the fact that you're parsing HTML by hand (please use something like HTML::TreeBuilder).

    As you don't post your code I can't tell whether you're successfully writing/reading the file. (please check the return values of open/close.)

    Also you don't actually state what's not happening.

    If you're getting two different results from two instances of the same data (where one is inline and the other comes from a file), they're not the same :-).

    Look at:

    • do you really have what you think in your variables? (use Data::Dumper)
    • are you using reasonable encodings? (utf8?)

    -David
