Unwrapping the Dictionary

by hacker (Priest)
on Aug 21, 2007 at 03:08 UTC ( [id://633986]=perlquestion )

hacker has asked for the wisdom of the Perl Monks concerning the following question:

I've got a small project brewing and have been adding features over the last 1.5 weeks. The most recent feature is a dict.org lookup for each word in my taxonomy.

The code looks like this (and works great):

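    sub define_term {
        my $definition = shift;
        my $dict       = Net::Dict->new('test.dict.org');

        # define() returns a reference to a list of [database, definition]
        # pairs; [1][1] picks the definition text of the second entry.
        my $result = $dict->define("$definition")->[1][1];

        # Nothing found: retry with the last character dropped
        # (a crude way to turn a plural into a singular).
        if (!$result) {
            $result = $dict->define(substr($definition, 0, -1))->[1][1];
        }

        if ($result) {
            print $cgi->h3("Definition of $definition");
            print $cgi->hr({-size=>'1'});
            print $cgi->pre("$result");
        }
    }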

This returns a definition that looks like the following:

    bean
         n 1: any of various edible seeds of plants of the family
              Leguminosae [syn: {edible bean}]
         2: any of various seeds or fruits suggestive of beans
         3: any of various leguminous plants grown for their edible
            seeds and pods [syn: {bean plant}]
         4: informal terms for a human head [syn: {attic}, {bonce}, {noodle},
            {noggin}, {dome}]
         v : hit on the head, esp. with a pitched baseball

These words are automagically wrapped in the response that comes back from dict.org, and that presentation isn't exactly ideal for my needs. I tried the various ::Wrap and unwrap modules, and none seem to do the job that I need.

I tried using Text::Flow::Wrap, Text::Wrap, Text::Wrap::Smart, and some hand-rolled regexes... without much success. It seems these modules are all designed to WRAP text, not UNwrap text.

In the above case, I'm trying to get a result that looks like the following (PM will wrap this somewhat, but what I'm trying to do is keep each numbered item on its own line, unwrapping each \d:.*, essentially):

    bean
         n 1: any of various edible seeds of plants of the family Leguminosae [syn: {edible bean}]
         2: any of various seeds or fruits suggestive of beans
         3: any of various leguminous plants grown for their edible seeds and pods [syn: {bean plant}]
         4: informal terms for a human head [syn: {attic}, {bonce}, {noodle}, {noggin}, {dome}]
         v : hit on the head, esp. with a pitched baseball

Can others suggest a better way of doing what I'm trying to do?

Replies are listed 'Best First'.
Re: Unwrapping the Dictionary
by BrowserUk (Patriarch) on Aug 21, 2007 at 03:45 UTC

    This might suffice. It relies on the continuation lines starting (after the leading whitespace) with at least 3 non-space, non-':' characters.

        #! perl -slw
        use strict;

        my $def =<<'EOD';
        bean
             n 1: any of various edible seeds of plants of the family
                  Leguminosae [syn: {edible bean}]
             2: any of various seeds or fruits suggestive of beans
             3: any of various leguminous plants grown for their edible
                seeds and pods [syn: {bean plant}]
             4: informal terms for a human head [syn: {attic}, {bonce}, {noodle},
                {noggin}, {dome}]
             v : hit on the head, esp. with a pitched baseball
        EOD

        $def =~ s[\n\s+(?=[^\s:]{3})][ ]smg or warn 'no match';

        print $def;

        __END__
        c:\test>junk
        bean
             n 1: any of various edible seeds of plants of the family Leguminosae [syn: {edible bean}]
             2: any of various seeds or fruits suggestive of beans
             3: any of various leguminous plants grown for their edible seeds and pods [syn: {bean plant}]
             4: informal terms for a human head [syn: {attic}, {bonce}, {noodle}, {noggin}, {dome}]
             v : hit on the head, esp. with a pitched baseball

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
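
For anyone reusing the idea: below is a minimal sketch of a related variant, not BrowserUk's code. Rather than testing what a continuation line looks like, it tests what the start of a new sense looks like (an optional part-of-speech word, optional digits, then a colon) and joins every other indented line to the line before it. The three-character lookahead in the substitution above leaves a continuation line unjoined if it happens to start with a one- or two-letter word such as "of"; this variant is meant to cover that case. The sub name unwrap_definition and the marker pattern are assumptions drawn only from the sample output in this thread, and other dict.org databases may well format their entries differently.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (the name is an assumption; it is not part of Net::Dict).
# Joins wrapped continuation lines onto the preceding line, leaving lines
# that start a new sense ("n 1:", "2:", "v :") on their own.
sub unwrap_definition {
    my ($text) = @_;

    # \n(?>[ \t]+)  a newline plus ALL of its leading indent; the atomic
    #               group stops the engine from giving back spaces, which
    #               would otherwise let the lookahead succeed by accident
    # (?! ... )     not followed by a sense marker:
    #               optional lowercase word, optional digits, colon, space
    $text =~ s/\n(?>[ \t]+)(?!(?:[a-z]+[ \t]+)?\d*[ \t]*:[ \t])/ /g;

    return $text;
}

# Sample text as it appears in this thread.
my $def = <<'EOD';
bean
     n 1: any of various edible seeds of plants of the family
          Leguminosae [syn: {edible bean}]
     2: any of various seeds or fruits suggestive of beans
     3: any of various leguminous plants grown for their edible
        seeds and pods [syn: {bean plant}]
     4: informal terms for a human head [syn: {attic}, {bonce}, {noodle},
        {noggin}, {dome}]
     v : hit on the head, esp. with a pitched baseball
EOD

print unwrap_definition($def);
```

In define_term the call would presumably go just before the output, e.g. `$result = unwrap_definition($result);` ahead of the `$cgi->pre(...)` line. A continuation line that itself looks like a marker (a lowercase word, a space, then a colon) would be left unjoined, so it is worth checking against a few real entries first.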
