http://qs321.pair.com?node_id=315468

dino has asked for the wisdom of the Perl Monks concerning the following question:

It's been a very, very long time since I've visited here, so apologies in advance for any major blunders or for asking an obvious question. Q: Basically, I want to truncate a big string, but I want to do it 'in place' for efficiency reasons. What is the most efficient way to do this, please?

Replies are listed 'Best First'.
Re: Efficienty truncating a long string
by BrowserUk (Patriarch) on Dec 18, 2003 at 09:53 UTC

    A quick test shows that substr is pretty intelligent about the way it operates.

    # OS reports memory use 3336k
    my $s = ' ' x 1_000_000;
    # OS reports memory use 4320k
    $s = substr $s, 0, 999_999;
    # OS continues to report 4320k

    If substr were acting as a copy operator, the memory would have to grow again to accommodate the copy. That it doesn't, even in this non-lvalue usage, tends to indicate that the code has the smarts to recognise when the destination of a substr assignment is the same as the source, and that it performs a simple in-situ adjustment to the length of the SV, which is about as fast as it is possible to be.
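A rough way to look at this from inside Perl, rather than through the OS memory figures, is to read the scalar's CUR (current string length) and LEN (allocated buffer size) fields with the core B module. This is only a sketch: the helper name cur_and_len is made up for illustration, and what LEN reports after the assignment depends on the perl version and build, so no particular value is promised here.

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper: return (CUR, LEN) for the scalar behind $ref.
# CUR is the string length in bytes, LEN the size of the allocated buffer.
sub cur_and_len {
    my ($ref) = @_;
    my $sv = B::svref_2object($ref);
    return ( $sv->CUR, $sv->LEN );
}

my $n = 1_000_000;    # held in a variable so the string is built at run time
my $s = ' ' x $n;

my @before = cur_and_len( \$s );
$s = substr $s, 0, $n - 1;    # the non-lvalue form used in the test above
my @after = cur_and_len( \$s );

printf "before: CUR=%d LEN=%d\n", @before;
printf "after:  CUR=%d LEN=%d\n", @after;
```

If LEN is the same on both lines, the buffer was not reallocated for the truncation; if it differs, your perl took a different path for this assignment.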


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
    Hooray!

      Not sure, but I think this happens because of the OS's copy-on-write memory pages, not because of substr.

        That's an interesting thought, but it doesn't appear to be the case.

        # 3332k;
        $s = ' ' x 1_000_000;
        # 4316k;
        $s = substr $s, 0, 999_000;
        # 4316k;
        $s .= '?' x 2000;
        # 4316k;

        Had that been the case, I would have expected to see memory growth when I appended to the copied scalar, but this doesn't happen. (On win32 anyway.)

        Conversely, if it were a copy-on-write phenomenon, then assigning the truncated substring to another scalar would likewise defer the copy until the new scalar was modified, which doesn't happen.

        # 3336k;
        $s = ' ' x 1_000_000;
        # 4320k;
        $t = substr $s, 0, 999_999;
        # 5308k;

        Tracing through the sources, I can't see any explicit step taken in pp_substr or sv_setpvn to avoid copying when the source and target are the same. However, the address of the target is known to the code at this point, and a call is made to SvGROW to ensure that the target (in this case the same as the source) is large enough; it is here that any extra memory allocation would be performed. In this case, the target SV is the same as the source, and as the "growth" required is actually shrinkage, no allocation is necessary.

        The actual copy of the data is (eventually) performed using the C-library call memmove().

        This is the memcpy() look-alike that has the extra nous to deal with overlapping copies. In the case of a simple truncation, the logic -- which I don't have access to, but can guess at -- probably results in simply copying a single null byte to the insertion point.

        What actually happens also depends on the C runtime used, but this is an obvious optimisation that probably exists in all versions of memmove().



Re: Efficienty truncating a long string
by tune (Curate) on Dec 18, 2003 at 09:25 UTC
    I think substr is the most efficient.

    --
    tune
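For anyone who wants numbers rather than opinions, here is a minimal sketch using the core Benchmark module to compare three ways of cutting a string back to a fixed length. The sizes, the iteration count and the variable names are arbitrary choices for illustration. Each round first appends a few characters so there is something to truncate; that append cost is the same for every variant, so only the relative figures mean anything, and they will vary with perl version and platform.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $keep  = 100_000;     # length to truncate back to (arbitrary)
my $extra = 'x' x 10;    # appended each round so there is a tail to cut off
my $count = 20_000;      # iterations per variant; raise for steadier numbers

my $by_lvalue   = ' ' x $keep;
my $by_four_arg = ' ' x $keep;
my $by_reassign = ' ' x $keep;

cmpthese( $count, {
    # lvalue substr: assign '' to everything from offset $keep onwards
    lvalue => sub {
        $by_lvalue .= $extra;
        substr( $by_lvalue, $keep ) = '';
    },
    # four-argument substr: replace the tail with ''
    four_arg => sub {
        $by_four_arg .= $extra;
        substr( $by_four_arg, $keep, length($by_four_arg) - $keep, '' );
    },
    # plain substr assigned back to the same variable
    reassign => sub {
        $by_reassign .= $extra;
        $by_reassign = substr $by_reassign, 0, $keep;
    },
} );
```

With a count this small Benchmark may warn that there were too few iterations for a reliable result; increasing $count addresses that at the cost of a longer run.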

Re: Efficienty truncating a long string
by PodMaster (Abbot) on Dec 18, 2003 at 09:28 UTC
      Well, I've looked at substr and can understand its normal non-lvalue usage, but as far as I can see it's a copying operator, returning a substring of the original. I want to truncate a string without the copying overhead. The lvalue version seems to be OK, but I don't quite understand how to use it to truncate a string. Any suggestions?
        I don't know whether substr is optimized not to bother returning anything in void context (it probably is), but this is how you do it:
        my $long_string = 'string' x 5;
        print $long_string,$/;
        substr( $long_string, 6 ) = ''; # truncate to 6 characters
        #
        # same thing, only using 4 arg substr
        # substr( $long_string, 6, length($long_string) - 6, '' );
        #
        print $long_string,$/;
        __END__
        stringstringstringstringstring
        string
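One thing to watch with the lvalue form: if the offset lies beyond the end of the string, substr raises a "substr outside of string" exception instead of quietly doing nothing. A small wrapper can guard against that. The name truncate_in_place is made up for this sketch and is not part of Perl.

```perl
use strict;
use warnings;

# Hypothetical helper: cut the string behind $ref down to at most $max
# characters, leaving it untouched if it is already short enough.
sub truncate_in_place {
    my ( $ref, $max ) = @_;
    substr( $$ref, $max ) = '' if length($$ref) > $max;
    return;
}

my $long_string = 'string' x 5;

truncate_in_place( \$long_string, 6 );
print $long_string, "\n";    # prints "string"

truncate_in_place( \$long_string, 100 );    # already shorter: left alone
print $long_string, "\n";    # prints "string"
```

Passing a reference means the caller's scalar is modified directly, so the helper itself never copies the string.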

        MJD says "you can't just make shit up and expect the computer to know what you mean, retardo!"
        I run a Win32 PPM repository for perl 5.6.x and 5.8.x -- I take requests (README).
        ** The third rule of perl club is a statement of fact: pod is sexy.