PerlMonks  

Lingua Splitter

by downer (Monk)
on Nov 04, 2007 at 21:45 UTC

downer has asked for the wisdom of the Perl Monks concerning the following question:

This is a quick and easy one. I am trying to grab all the words of an HTML page, in order. Of course, rather than use my own heuristics, I'd like to use an established package. Here is my code:

    # $contents = contents of HTML page
    my $parsed = $scrubber->scrub($contents);
    my $splitter = new Lingua::EN::Splitter;
    my @words = $splitter->words($parsed);
    foreach my $x (@words) {
        print "$x\n";
    }

The print statement just gives ARRAY(0x1864870). The same happens if I try to print in quotes. What am I doing wrong?

Re: Lingua Splitter
by FunkyMonk (Chancellor) on Nov 04, 2007 at 22:01 UTC
    This year-old bug report says qq{The "words" and "paragraphs" methods return references but the documentation portrays them as returning lists}, so try
    my @words = @{ $splitter->words($parsed) };

    or

    my $words = $splitter->words($parsed);
    foreach my $x (@$words) {
        print "$x\n";
    }
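
    To see why the original loop printed ARRAY(0x...), here is a minimal sketch that runs without the module installed. The words_ref sub is a hypothetical stand-in for $splitter->words (it is not part of Lingua::EN::Splitter); the only thing it shares with the real method, per the bug report, is that it returns an array reference rather than a list.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $splitter->words(...): it returns a
# reference to an array of words, not a list of words.
sub words_ref {
    my ($text) = @_;
    return [ grep { length } split /\W+/, $text ];
}

my $text = 'Grab all the words, in order.';

# Assigning the reference to an array gives a one-element array
# whose single element is the reference itself.
my @wrong = words_ref($text);
print scalar(@wrong), " element: $wrong[0]\n";   # stringifies as ARRAY(0x...)

# Dereferencing with @{ ... } yields the actual words.
my @words = @{ words_ref($text) };
print "$_\n" for @words;
```

    If you are not sure what a method hands back, ref($value) returns 'ARRAY' for an array reference, which makes for a quick check before looping.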
