http://qs321.pair.com?node_id=614837

Cap'n Steve has asked for the wisdom of the Perl Monks concerning the following question:

I'd like to "nicely" truncate some text, which means not cutting any words in half. I haven't been able to find a module that does what I want, and this is what I've tried so far without success:
sub truncate_string($$) {
    my $string = shift;
    my $maxlength = shift;
    $string =~ s/^(.{,$maxlength})\b/$1.../s;
    return $string;
}

Re: How do I truncate a string while preserving words?
by Tomte (Priest) on May 11, 2007 at 08:03 UTC

    Try

    $string =~ s/^(.{0,$maxlength})\b.*$/$1.../s;
    that should work as expected (at least as far as the presented code goes...).

    Edit: explanation: you can omit the upper bound, but not the lower bound (which is what you did). From perlre:

    {n}?    Match exactly n times
    {n,}?   Match at least n times
    {n,m}?  Match at least n but not more than m times

    and you have to match the rest of the input (.*$) for it to be replaced too - otherwise you would simply insert the ellipsis (...) without truncating the input.

    regards,
    tomte



      Darn it, I swear I remember being able to omit the minimum number of matches. I wonder what it was doing with my regular expression since it didn't issue a warning. Thanks for pointing out the other error, too, I hadn't even thought about that yet.
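A runnable version of Tomte's substitution, as a sketch. On what the original pattern was doing: as far as I know, Perls of this era did not treat `{,N}` as a quantifier, so the braces were matched as literal text (a literal `{,N}` after the first character), the substitution silently failed to match, and no warning was issued. Perl 5.34 and later accept `{,N}` as shorthand for `{0,N}`. Two tweaks below are assumptions, not part of Tomte's reply: strings that already fit are returned unchanged, and whitespace in front of the cut is trimmed before the ellipsis is appended.

```perl
use strict;
use warnings;

# Sketch: Tomte's regex plus two assumed tweaks
# (skip strings that already fit, trim whitespace before the ellipsis).
sub truncate_string {
    my ($string, $maxlength) = @_;

    # Assumption: nothing to cut if the string already fits
    return $string if length($string) <= $maxlength;

    # Greedy .{0,N} backtracks until it ends on a word boundary;
    # .*$ swallows the rest so it is removed rather than kept
    $string =~ s/^(.{0,$maxlength})\b.*$/$1/s;

    # Assumption: drop whitespace left in front of the cut
    $string =~ s/\s+\z//;

    return "$string...";
}

print truncate_string('The quick brown fox jumps', 12), "\n";
```

The three dots come on top of `$maxlength`, and if the first word alone is longer than the limit, only the ellipsis is left.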
Re: How do I truncate a string while preserving words?
by daviddd (Acolyte) on May 11, 2007 at 15:06 UTC

    What about the standard module Text::Wrap?

    Here's a promising-sounding snippet from its docs:

    Text::Wrap::wrap() is a very simple paragraph formatter. It formats a single paragraph at a time by breaking lines at word boundaries.

    Or if that's not sufficient, Text::Autoformat is capable of more complex things.

      Maybe I read it wrong, but it looks like those modules are geared toward displaying all the text, just divided into lines. I wanted to only display the first part, kind of like a preview.
        Try Text::Autoformat and split to get the first line.
        use strict;
        use warnings;
        use Text::Autoformat;

        my $string = 'A lengthy text with more than xxx characters to demonstrate truncate';
        print "Before: $string\n";
        print 'Length: ' , length($string) , "\n";

        # The & makes Perl call the sub below instead of the built-in truncate()
        my $truncated = &truncate($string,20);
        print "After: $truncated\n";
        print 'Length: ' , length($truncated) , "\n";

        ###########################################################
        sub truncate {
            my $string = shift @_;
            my $length = 75;
            $length = shift if @_;
            $string = autoformat($string, { left => 0, right => $length , widow => 0, });
            ($string) = split(/\n/,$string); # Just the first element
            return $string;
        }
        ###########################################################
        Just my two cents. Hope this helps
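The same take-the-first-line trick also works with the core Text::Wrap module mentioned above, which avoids a CPAN dependency. This is a sketch, not code from the thread: the sub name `truncate_with_wrap` is hypothetical, and it relies on the documented behaviour that wrap() emits lines of at most `$Text::Wrap::columns - 1` characters, hence the `+ 1`.

```perl
use strict;
use warnings;
use Text::Wrap ();

# Hypothetical helper: the first line of Text::Wrap's output as a "preview".
sub truncate_with_wrap {
    my ($string, $maxlength) = @_;

    # wrap() keeps every line to at most $columns - 1 characters
    local $Text::Wrap::columns  = $maxlength + 1;
    # don't let wrap() turn runs of spaces into tabs
    local $Text::Wrap::unexpand = 0;

    my $wrapped = Text::Wrap::wrap('', '', $string);
    my ($first) = split /\n/, $wrapped;
    return defined $first ? $first : '';
}

print truncate_with_wrap('The quick brown fox jumps', 12), "\n";
```

A first word longer than the limit gets split mid-word, because `$Text::Wrap::huge` defaults to `'wrap'`.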
Re: How do I truncate a string while preserving words?
by DrHyde (Prior) on May 11, 2007 at 09:06 UTC
    1. Does your string contain a space in the first N characters? If not, abort.
    2. Truncate at N+1 characters
    3. Truncate at the last space in the string (use the rindex function)
    sub truncate_string {
        my($string, $maxlength) = @_;
        $string = substr($string, 0, $maxlength+1);
        die("Can't truncate, no spaces\n") if(index($string, ' ') == -1);
        return substr($string, 0, rindex($string, ' ') - 1);
    }
    All code is, of course, untested.
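A possible corrected version of these steps, as a sketch under two assumptions: strings that already fit are returned as-is, and no ellipsis is wanted. The main change is dropping the `- 1`: `rindex` returns the index of the space, which is already the length of the text before it, so subtracting one also loses the last character of the kept word.

```perl
use strict;
use warnings;

sub truncate_string {
    my ($string, $maxlength) = @_;

    # Assumption: nothing to cut if the string already fits
    return $string if length($string) <= $maxlength;

    # Keep N+1 characters so a space just past the limit still counts
    my $head = substr($string, 0, $maxlength + 1);

    my $cut = rindex($head, ' ');
    die "Can't truncate, no spaces\n" if $cut == -1;

    # $cut is the index of the space, i.e. the length of the text before it
    return substr($head, 0, $cut);
}

print truncate_string('The quick brown fox jumps', 12), "\n";
```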
      You needn't cut it if the string is not longer than maxlength. Just return the original string then.
      return substr($string, 0, rindex($string, ' ') - 1);
      I'd cut it on any whitespace, or at least, in front of the last partial word, like this:
      $string =~ s/\s*[\w\-]*$/.../; return $string;
      I'm assuming that a hyphen is a part of a word. Your idea may differ from mine.

      p.s. Hmm, that'll leave any trailing nonword/nonspace characters, for example punctuation characters, intact, making the string just one character too long.

      Perhaps this is better?

      $string =~ s/\s*(?:[\w\-]+|\W)$/.../;
      or
      $string =~ s/\s*[\w\-]+$/.../ or $string =~ s/\s*\W$/.../;
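A sketch combining the suggestions above (a combination, not code from either post): return short strings unchanged, cut to N+1 characters as in the parent's steps, then apply the combined substitution so that a trailing partial word, or a single trailing non-word character, is replaced by the ellipsis. Like the regexes above, it assumes hyphens are part of words.

```perl
use strict;
use warnings;

sub truncate_string {
    my ($string, $maxlength) = @_;

    # Nothing to do if it already fits
    return $string if length($string) <= $maxlength;

    # One extra character tells us whether the cut lands inside a word
    $string = substr($string, 0, $maxlength + 1);

    # Replace the trailing partial word (hyphens count as word characters),
    # or a single trailing non-word character, plus any whitespace before it
    $string =~ s/\s*(?:[\w\-]+|\W)$/.../;
    return $string;
}

print truncate_string('The quick, brown fox', 9), "\n";
```

If the first N+1 characters are a single word, everything is replaced and only `...` remains.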
Re: How do I truncate a string while preserving words?
by rkrieger (Friar) on May 11, 2007 at 10:31 UTC

    This assumes you're working on a paragraph of text and also want to break within words. If that's not what you meant in your OP, this probably isn't what you're looking for.

    From a quick CPAN search: would Text::Hyphen be something for you? Judging from its list of changes, it also seems to support languages other than English.

Re: How do I truncate a string while preserving words?
by cdarke (Prior) on May 11, 2007 at 10:23 UTC
    A different approach could be to use formats (see perlform), perhaps through formline and $^A. A format field of ^<<<<<<<< (etc.) should split a line at the correct place, and the characters it is allowed to break on can be adjusted using $: ($FORMAT_LINE_BREAK_CHARACTERS).
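A minimal sketch of the formline approach, assuming the default `$:` (space, newline and hyphen as break characters). The sub name `truncate_with_format` is hypothetical. The picture is a single `^<<<` field exactly `$max` characters wide; formline appends the formatted text to the accumulator `$^A`, padded with spaces to the field width, so the padding is stripped afterwards.

```perl
use strict;
use warnings;

# Hypothetical helper built on formline() and the $^A accumulator.
sub truncate_with_format {
    my ($string, $max) = @_;

    # One ^<<< field exactly $max characters wide ('^' counts as one)
    my $picture = '^' . ('<' x ($max - 1));

    # Start from an empty accumulator and restore it afterwards
    local $^A = '';

    # A ^ field takes as much text as fits, breaking on the characters in $:
    formline($picture, $string);

    # The field is padded to its full width; drop that padding
    (my $line = $^A) =~ s/\s+\z//;
    return $line;
}

print truncate_with_format('The quick brown fox jumps', 12), "\n";
```

With the default `$:` a line may also end just after a hyphen, and a first word longer than the field is cut mid-word.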
Re: How do I truncate a string while preserving words?
by Moron (Curate) on May 14, 2007 at 13:38 UTC
    Although others have solved the problem with the presented code, I'd have separated out each piece of functional behaviour into its own rule, for maintainability and so that a later change to one rule doesn't affect the others, rather than implementing multiple functions in a single regexp, e.g.:
    sub truncate_string($$) {
        my ( $string, $max ) = @_;

        # always do nothing if already short enough
        ( length( $string ) <= $max ) and return $string;

        # only the first $max + 1 characters matter from here on
        my $head = substr( $string, 0, $max + 1 );

        # issue warning if forced to chop a word anyway (no whitespace to cut at)
        if ( $head !~ /\s/ ) {
            warn "cannot truncate string on word boundary";
            return substr( $string, 0, $max );
        }

        # the cut already falls on whitespace: just drop that trailing whitespace
        $head =~ s/\s+\z// and return $head;

        # otherwise truncate on word boundary, dropping the partial last word
        $head =~ s/\s*\S+\z// and return $head;

        die; # unreachable
    }