PerlMonks  

Towards a Text::Wrap::Meaningful?

by Cody Fendant (Hermit)
on Dec 11, 2010 at 13:04 UTC ( [id://876588]=perlquestion )

Cody Fendant has asked for the wisdom of the Perl Monks concerning the following question:

This is something I've been thinking about for a while because I used Perl to post classic literature to Twitter.

Text::Wrap wraps text at a given number of columns. But in order to improve the human-readability of long text divided into shorter sections, I wanted to wrap the text at the most meaningful point while still adhering to a numerical limit. So here's some code which attempts that.

The text is from Henry James, who's known for long, long sentences. So we take those long sentences and we look for a point where we can break them up at punctuation -- after all, the author himself has told us there's a pause at that point.

That's the first pass. If that fails, as it does in the text given here (there's a 150-character passage right there in the first line with no punctuation at all), we try to break at a conjunction: a word like 'and' or 'but' which forms a syntactic break point, the start of a sub-clause for instance.

If that fails we have no option but to break on whitespace. A conjunction-free, unpunctuated sentence is included at the end to test that.

So, comments please? I don't like the use of dummy variables to hold the success or failure of the successive passes, but I can't see any other way to do it. I'd appreciate the thoughts of my fellow Monks.

#!/usr/local/bin/perl
use strict;
use warnings;
use Data::Dumper;

my @phrases;
my %conjunctions = map { $_ => 1 } qw (than that which who and but);

while ( my $line = <DATA> ) {

    # contains sentences; first catch your sentence
    chomp($line);
    while ( length($line) > 140 ) {
        my $punctuation_split = 0;
        my $conjunction_split = 0;

        # split the text of the first 140 chars into words
        # using a negative LIMIT so a trailing space isn't ignored
        my @words = split( /\s+/, substr( $line, 0, 140 ), -1 );

        # find a punctuated word to split on,
        # going backward so the string will be as long as possible
        for ( my $i = $#words ; $i > -1 ; $i-- ) {
            if ( $words[$i] =~ /[,:;]$/ ) {
                push( @phrases, join( ' ', @words[ 0 .. $i ] ) );
                $line = join( ' ', @words[ ( $i + 1 ) .. $#words ] )
                  . substr( $line, 140 );
                $punctuation_split = 1;
                last;
            }
        }
        unless ($punctuation_split) {

            # find a conjunction like 'that' to split on,
            # going backward as before; stop at word 1, because splitting
            # before word 0 would push an empty phrase and never shorten $line
            for ( my $i = $#words ; $i > 0 ; $i-- ) {
                if ( $conjunctions{ $words[$i] } ) {
                    push( @phrases, join( ' ', @words[ 0 .. ( $i - 1 ) ] ) );
                    $line = join( ' ', @words[ $i .. $#words ] )
                      . substr( $line, 140 );
                    $conjunction_split = 1;
                    last;
                }
            }
        }
        unless ( $punctuation_split || $conjunction_split ) {

            # no meaningful split has been found,
            # split at the last space within the limit
            for ( my $i = 140 ; $i > 0 ; $i-- ) {
                if ( substr( $line, $i, 1 ) eq ' ' ) {
                    push( @phrases, substr( $line, 0, $i ) );
                    substr( $line, 0, $i + 1 ) = '';    # drop the space too
                    last;
                }
            }
        }
    }
    push @phrases, $line;
}
print Dumper( \@phrases );
__DATA__
The Golden Bowl.
The Prince had always liked his London, when it had come to him; he was one of the modern Romans who find by the Thames a more convincing image of the truth of the ancient state than any they have left by the Tiber.
Brought up on the legend of the City to which the world paid tribute, he recognised in the present London much more than in contemporary Rome the real dimensions of such a case.
If it was a question of an Imperium, he said to himself, and if one wished, as a Roman, to recover a little the sense of that, the place to do so was on London Bridge, or even, on a fine afternoon in May, at Hyde Park Corner.
It was not indeed to either of those places that these grounds of his predilection, after all sufficiently vague, had, at the moment we are concerned with him, guided his steps; he had strayed, simply enough, into Bond Street, where his imagination, working at comparatively short range, caused him now and then to stop before a window in which objects massive and lumpish, in silver and gold, in the forms to which precious stones contribute, or in leather, steel, brass, applied to a hundred uses and abuses, were as tumbled together as if, in the insolence of the Empire, they had been the loot of far-off victories.
The young man's movements, however, betrayed no consistency of attention--not even, for that matter, when one of his arrests had proceeded from possibilities in faces shaded, as they passed him on the pavement, by huge beribboned hats, or more delicately tinted still under the tense silk of parasols held at perverse angles in waiting victorias.
And the Prince's undirected thought was not a little symptomatic, since, though the turn of the season had come and the flush of the streets begun to fade, the possibilities of faces, on the August afternoon, were still one of the notes of the scene.
He was too restless--that was the fact--for any concentration, and the last idea that would just now have occurred to him in any connection was the idea of pursuit.
Bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork.
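On the marker variables: one possible way to do without them is to put each pass in its own sub that returns the split, or nothing, and to take the first pass that succeeds. The following is only a sketch under assumptions, not the code above: the name wrap_meaningful, the @passes list and the (head, tail) return convention are made up for illustration, and the passes use regexes instead of the word array.

```perl
use strict;
use warnings;

my $CONJ = qr/(?:than|that|which|who|and|but)\b/;

# Each pass gets the line and the limit. It returns (head, tail) if it
# found a break point, or an empty list if it did not.
my @passes = (

    # 1. break after the last word in the window ending in , : or ;
    sub {
        my ( $line, $limit ) = @_;
        my $max = $limit - 1;
        return $line =~ /^(.{0,$max}[,:;])\s+(.*)\z/s ? ( $1, $2 ) : ();
    },

    # 2. break before the last conjunction in the window,
    #    so the conjunction starts the next phrase
    sub {
        my ( $line, $limit ) = @_;
        my $max = $limit - 1;
        return $line =~ /^(.{0,$max}\S)\s+($CONJ.*)\z/s ? ( $1, $2 ) : ();
    },

    # 3. break at the last whitespace in the window
    sub {
        my ( $line, $limit ) = @_;
        my $max = $limit - 1;
        return $line =~ /^(.{0,$max}\S)\s+(.*)\z/s ? ( $1, $2 ) : ();
    },
);

sub wrap_meaningful {
    my ( $line, $limit ) = @_;
    my @phrases;
  CHUNK:
    while ( length($line) > $limit ) {
        for my $pass (@passes) {
            if ( my ( $head, $tail ) = $pass->( $line, $limit ) ) {
                push @phrases, $head;
                $line = $tail;
                next CHUNK;    # first successful pass wins
            }
        }

        # no usable whitespace in the window: cut at the limit
        push @phrases, substr( $line, 0, $limit, '' );
    }
    return ( @phrases, $line );
}
```

Calling wrap_meaningful( $line, 140 ) for each line read from DATA would then replace the inner while loop. Adding or reordering passes only means editing the list.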

Replies are listed 'Best First'.
Re: Towards a Text::Wrap::Meaningful? (simpler)
by tye (Sage) on Dec 12, 2010 at 10:12 UTC

    I'd probably do that more like this:

    #!/usr/local/bin/perl
    use strict;

    my @phrases;
    my $conj= '\b(?:than|that|which|who|and|but)\b';
    while( <DATA> ) {
        chomp;
        while( 140 < length($_) ) {
            if(     s/^(.{1,139}\W)\s+//            # Break after punctuation
                ||  s/^(.{1,136}$conj\S*)\s+//      # Break after conjunction
                ||  s/^(.{1,139}\S)\s+//            # Break on space
            ) {
                push @phrases, $1;
            } elsif( s/^(.{139})// ) {              # Break in word
                push @phrases, "$1-";
            } else {
                die "Impossible!";
            }
        }
        push @phrases, $_;
    }
    print $_, $/ for @phrases;
    __DATA__
    The Golden Bowl.
    The Prince had always liked his London, when it had come to him; he was one of the modern Romans who find by the Thames a more convincing image of the truth of the ancient state than any they have left by the Tiber.
    Brought up on the legend of the City to which the world paid tribute, he recognised in the present London much more than in contemporary Rome the real dimensions of such a case.
    If it was a question of an Imperium, he said to himself, and if one wished, as a Roman, to recover a little the sense of that, the place to do so was on London Bridge, or even, on a fine afternoon in May, at Hyde Park Corner.
    It was not indeed to either of those places that these grounds of his predilection, after all sufficiently vague, had, at the moment we are concerned with him, guided his steps; he had strayed, simply enough, into Bond Street, where his imagination, working at comparatively short range, caused him now and then to stop before a window in which objects massive and lumpish, in silver and gold, in the forms to which precious stones contribute, or in leather, steel, brass, applied to a hundred uses and abuses, were as tumbled together as if, in the insolence of the Empire, they had been the loot of far-off victories.
    The young man's movements, however, betrayed no consistency of attention--not even, for that matter, when one of his arrests had proceeded from possibilities in faces shaded, as they passed him on the pavement, by huge beribboned hats, or more delicately tinted still under the tense silk of parasols held at perverse angles in waiting victorias.
    And the Prince's undirected thought was not a little symptomatic, since, though the turn of the season had come and the flush of the streets begun to fade, the possibilities of faces, on the August afternoon, were still one of the notes of the scene.
    He was too restless--that was the fact--for any concentration, and the last idea that would just now have occurred to him in any connection was the idea of pursuit.
    Bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork.

    Producing:

    The Golden Bowl.
    The Prince had always liked his London, when it had come to him;
    he was one of the modern Romans who find by the Thames a more convincing image of the truth of the ancient state than
    any they have left by the Tiber.
    Brought up on the legend of the City to which the world paid tribute,
    he recognised in the present London much more than in contemporary Rome the real dimensions of such a case.
    If it was a question of an Imperium, he said to himself, and if one wished, as a Roman, to recover a little the sense of that,
    the place to do so was on London Bridge, or even, on a fine afternoon in May, at Hyde Park Corner.
    It was not indeed to either of those places that these grounds of his predilection, after all sufficiently vague, had,
    at the moment we are concerned with him, guided his steps; he had strayed, simply enough, into Bond Street, where his imagination,
    working at comparatively short range, caused him now and then to stop before a window in which objects massive and lumpish,
    in silver and gold, in the forms to which precious stones contribute, or in leather, steel, brass, applied to a hundred uses and abuses,
    were as tumbled together as if, in the insolence of the Empire, they had been the loot of far-off victories.
    The young man's movements, however, betrayed no consistency of attention--not even, for that matter,
    when one of his arrests had proceeded from possibilities in faces shaded, as they passed him on the pavement, by huge beribboned hats,
    or more delicately tinted still under the tense silk of parasols held at perverse angles in waiting victorias.
    And the Prince's undirected thought was not a little symptomatic, since,
    though the turn of the season had come and the flush of the streets begun to fade, the possibilities of faces, on the August afternoon,
    were still one of the notes of the scene.
    He was too restless--that was the fact--for any concentration,
    and the last idea that would just now have occurred to him in any connection was the idea of pursuit.
    Bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork
    bork bork bork bork bork bork bork bork bork.

    Or, cutting the max width in half, produces the following output that fits much better in my browser:

    The Golden Bowl.
    The Prince had always liked his London, when it had come to him;
    he was one of the modern Romans who
    find by the Thames a more convincing image of the truth of the ancient
    state than any they have left by the Tiber.
    Brought up on the legend of the City to which the world paid tribute,
    he recognised in the present London much more than
    in contemporary Rome the real dimensions of such a case.
    If it was a question of an Imperium, he said to himself,
    and if one wished, as a Roman, to recover a little the sense of that,
    the place to do so was on London Bridge, or even,
    on a fine afternoon in May, at Hyde Park Corner.
    It was not indeed to either of those places that
    these grounds of his predilection, after all sufficiently vague, had,
    at the moment we are concerned with him, guided his steps;
    he had strayed, simply enough, into Bond Street,
    where his imagination, working at comparatively short range,
    caused him now and then to stop before a window in which
    objects massive and lumpish, in silver and gold,
    in the forms to which precious stones contribute, or in leather,
    steel, brass, applied to a hundred uses and abuses,
    were as tumbled together as if, in the insolence of the Empire,
    they had been the loot of far-off victories.
    The young man's movements, however,
    betrayed no consistency of attention--not even, for that matter,
    when one of his arrests had proceeded from possibilities in faces
    shaded, as they passed him on the pavement, by huge beribboned hats,
    or more delicately tinted still under the tense silk of parasols held
    at perverse angles in waiting victorias.
    And the Prince's undirected thought was not a little symptomatic,
    since,
    though the turn of the season had come and
    the flush of the streets begun to fade, the possibilities of faces,
    on the August afternoon, were still one of the notes of the scene.
    He was too restless--that was the fact--for any concentration,
    and the last idea that
    would just now have occurred to him in any connection was the idea of
    pursuit.
    Bork bork bork bork bork bork bork bork bork bork bork bork bork bork
    bork bork bork bork bork bork bork bork bork bork bork bork bork bork
    bork bork bork bork bork bork bork bork bork.

    - tye        
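The three widths in the script above (140, 139 and 136) depend on each other, so changing the maximum width means changing all of them. A hedged sketch of how they might be derived from one parameter follows. The sub name wrap_at is hypothetical, and the offsets (limit minus 1, limit minus 4) are an assumption chosen only because they give back 139 and 136 when the limit is 140; the thread does not say which numbers were used for the narrower run.

```perl
use strict;
use warnings;

my $conj = '\b(?:than|that|which|who|and|but)\b';

# wrap_at() is a hypothetical wrapper around the same substitutions,
# with the widths computed from $limit instead of hard-coded.
sub wrap_at {
    my ( $limit, @lines ) = @_;
    die "limit must be at least 5" if $limit < 5;
    my $punct = $limit - 1;    # 139 when $limit is 140
    my $cj    = $limit - 4;    # 136 when $limit is 140 (assumed relation)
    my @phrases;
    for my $line (@lines) {
        my $rest = $line;
        while ( $limit < length $rest ) {
            if (   $rest =~ s/^(.{1,$punct}\W)\s+//          # after punctuation
                || $rest =~ s/^(.{1,$cj}$conj\S*)\s+//       # after conjunction
                || $rest =~ s/^(.{1,$punct}\S)\s+// )        # on space
            {
                push @phrases, $1;
            }
            elsif ( $rest =~ s/^(.{$punct})// ) {            # in word
                push @phrases, "$1-";
            }
            else {
                die "Impossible!";
            }
        }
        push @phrases, $rest;
    }
    return @phrases;
}
```

With this, wrap_at( 140, @lines ) behaves like the hard-coded version and wrap_at( 70, @lines ) is the half-width variant. Note that the conjunction branch can run a character or so past the limit when a long conjunction such as "which" lands at the very end of the window.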

      Very much tighter than mine, without the marker variables. Nice. I think the only feature of mine not reproduced is that I put the conjunction into the second half, i.e. you get

      a more convincing image of the truth of the ancient state than
      any they have left by the Tiber.

      in yours but

      a more convincing image of the truth of the ancient state
      than any they have left by the Tiber.

      in mine. Plus I'm wary of the \W in your "break on punctuation" picking up unwanted things. But thanks!
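Both remarks could probably be handled inside the patterns themselves. The sketch below is illustrative only: first_piece is a made-up helper, the limit of 20 is chosen just to keep the samples short, and the sample strings are invented. It shows what \W can pick up (the full stop of an abbreviation), how a narrower class such as the [,:;] from the original post avoids that, and how a lookahead leaves the conjunction at the start of the next piece.

```perl
use strict;
use warnings;

# first_piece() is a hypothetical helper: it returns the piece that the
# pattern would split off the front of $text, or undef if nothing matches.
sub first_piece {
    my ( $text, $pattern ) = @_;
    return $text =~ $pattern ? $1 : undef;
}

my $limit = 20;            # small limit so the samples stay short
my $max   = $limit - 1;

my $any_nonword = qr/^(.{1,$max}\W)\s+/;       # any non-word character
my $only_pauses = qr/^(.{1,$max}[,:;])\s+/;    # comma, colon, semicolon only
my $before_conj =                              # stop before the conjunction
  qr/^(.{0,$max}\S)\s+(?=(?:than|that|which|who|and|but)\b)/;

my $abbrev = 'He met Mr. James (the novelist) and left';
my $clause = 'ancient state than any they have left';

print first_piece( $abbrev, $any_nonword ) // '(no break)', "\n";
print first_piece( $abbrev, $only_pauses ) // '(no break)', "\n";
print first_piece( $clause, $before_conj ) // '(no break)', "\n";
```

The first call breaks after "Mr.", which is the kind of unwanted match the \W version allows. The second finds no break at all, so a later pass would have to handle that line. The third returns "ancient state", leaving "than" for the following piece.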

Re: Towards a Text::Wrap::Meaningful?
by Tux (Canon) on Dec 12, 2010 at 10:24 UTC

    The only CPAN module for text wrapping I ever found useful was Text-Format+NWrap, which sadly isn't on CPAN anymore. You might find a lot of good ideas in that though. Last update was in March 1998.


    Enjoy, Have FUN! H.Merijn

      For what it's worth, it looks like the functionality of the ::NWrap module was all handled by Text::Format anyway. According to Text::Format's page:

      Text::NWrap requires Text::Format since it uses Text::Format->format to do the actual wrapping but gives you the interface of Text::Wrap

      So even if ::NWrap has gone away, you should be able to use Text::Format to do all that you wanted.

Re: Towards a Text::Wrap::Meaningful?
by pobocks (Chaplain) on Dec 13, 2010 at 08:26 UTC

    Just out of field-related curiosity (I'm a Library Science Grad student with a B.A. in British/American Lit), why were you posting classics to Twitter? I'm betting that's an interesting use case.

    for(split(" ","tsuJ rehtonA lreP rekcaH")){print reverse . " "}print "\b.\n";

      It was really just a fun thing to do. My degree is in English, and I've always been interested in language processing by computers: Markov Chaining, sentence parsing and so on.

      The contrast between the enforced brevity of Twitter and the long meandering sentences of writers like James and Proust led me to create Proustr, which posted "Swann's Way" over about a year and a half.
