Getting a chunk of an HTML string?

by Anonymous Monk
on Oct 13, 2001 at 03:48 UTC

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

So I'm trying to write a script that automatically grabs a random quote from the IMDB site. I know that the format of the random quote page is always
    ...page goes for a while...
  <h1>Random Movie Quote</h1>
    ...quote goes here...
  <P><form method=get action="/Games/randomquote.html">
    ...and page keeps going...
I get the site using LWP::Simple's get() command, and store it in a scalar. I then try to regexp the scalar, by saying
$content =~ s#.*<h1>Random Movie Quote</h1>(.*)<P><form method=get action="/Games/randomquote.html">.*#$1#;
which I assumed would get the quote, which is always in between those two strings. However, it always grabs the entire $content string back again. I assume this means it couldn't find a match, but I'm not sure. Any helpful hints? What's the stupendously obvious thing I'm overlooking? (I did try searching the archives and didn't find anything helpful)

Replies are listed 'Best First'.
Re: Getting a chunk of an HTML string?
by wog (Curate) on Oct 13, 2001 at 03:56 UTC
    Your problem is a result of . matching every character but newline, except if the /s option is used. See perlre for more details. Fixing this morphs your code into:

    $content =~ s#.*<h1>Random Movie Quote</h1>(.*)<P><form method=get action="/Games/randomquote.html">.*#$1#s;
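A minimal sketch of the difference the /s option makes. The $content string here is a made-up stand-in for the fetched page (the real page is not reproduced), and the pattern is shortened to end at `<P><form` for readability.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the fetched page, with the quote on its own lines.
my $content = join "\n",
    '<html>',
    '<h1>Random Movie Quote</h1>',
    'Example quote line one',
    'Example quote line two',
    '<P><form method=get action="/Games/randomquote.html">',
    '</html>';

# Without /s, "." stops at a newline, so (.*) cannot span the quote's lines.
my $without_s = ($content =~ m#<h1>Random Movie Quote</h1>(.*)<P><form#) ? 1 : 0;

# With /s, "." also matches newlines, so the same pattern succeeds.
my $with_s = ($content =~ m#<h1>Random Movie Quote</h1>(.*)<P><form#s) ? 1 : 0;

print "without /s matched: $without_s\n";
print "with /s matched: $with_s\n";
```

If the quote and both markers happened to sit on one line, the version without /s would match too, which is why this kind of bug can look intermittent.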

    However, for extracting a string from some text, you are better off matching and just using the extracted string, rather than trying to substitute out everything else in one step:

    if ($content =~ m#<h1>Random Movie Quote</h1>(.*)<P><form method=get action="/Games/randomquote.html">#s) {
        $content = $1; # we matched, replace $content with $1.
                       # though it might be clearer to put it
                       # in a different variable.
    } else {
        # we didn't match, complain.
    }

    (Note that similarly you can check the return value of s/// to see if a substitution actually took place.)
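A small sketch of checking the return value of s///, using made-up sample text rather than the real page. It also reproduces the symptom from the question: when the pattern does not match, the string comes back unchanged.

```perl
use strict;
use warnings;

my $text = "alpha\nbeta\ngamma";   # made-up sample data

# s/// returns the number of substitutions made, or a false value if none.
my $hit = $text;
my $hit_count = ($hit =~ s#.*beta\n(.*)#$1#s);

# "delta" does not occur, so nothing is replaced and $miss is left as it was.
my $miss = $text;
my $miss_count = ($miss =~ s#.*delta\n(.*)#$1#s);

print $hit_count  ? "replaced, now: $hit\n" : "no match\n";
print $miss_count ? "replaced\n"            : "no match, string unchanged\n";
```

Testing the return value is what separates "the substitution ran and kept everything" from "the pattern never matched".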

Re: Getting a chunk of an HTML string?
by thatguy (Parson) on Oct 13, 2001 at 04:04 UTC
    It would also be beneficial to check out Death to Dot Star! by Ovid, as well as the Tutorials section: good examples, principles, and exercises to build your regex skillz.

    -phill
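As one illustration of why a bare `.*` deserves caution (a sketch on made-up input, not a summary of the linked node): when the closing marker appears more than once, a greedy `(.*)` runs to the last occurrence, while a non-greedy `(.*?)` stops at the first. The `\s*` trimming and the shortened `<P><form` marker are assumptions added for this example.

```perl
use strict;
use warnings;

# Made-up input in which the closing marker "<P><form" appears twice.
my $content = "<h1>Random Movie Quote</h1>\nSample quote\n"
            . "<P><form a>\nother stuff\n<P><form b>\n";

# Greedy: (.*) grabs as much as possible, up to the LAST "<P><form".
my ($greedy) = $content =~ m#<h1>Random Movie Quote</h1>\s*(.*)<P><form#s;

# Non-greedy: (.*?) grabs as little as possible, up to the FIRST "<P><form".
my ($lazy) = $content =~ m#<h1>Random Movie Quote</h1>\s*(.*?)\s*<P><form#s;

print "greedy capture: [$greedy]\n";
print "lazy capture:   [$lazy]\n";
```

Capturing into a separate variable also leaves $content intact, so the original page is still available if the match needs to be retried with a different pattern.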
