My fellow monks,
I have an interesting text processing task before me (not homework). I need to open a file, copy the first 4 lines through unchanged, then on every remaining line duplicate each character except '^' and '#', and write the result to a new file.

On an input file of:

andromeda:davidj perl_test > cat f.txt
^this^
^is^
^a^
^test^
^david#jenkins^
^ cinea#jenkins ^

the output should be:

andromeda:davidj perl_test > cat out.txt
^this^
^is^
^a^
^test^
^ddaavviidd#jjeennkkiinnss^
^  cciinneeaa#jjeennkkiinnss  ^

I currently have the following code which works perfectly well:

#!/usr/bin/perl

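use strict;
use warnings;

# Three-argument open, with a check so a missing file fails loudly
open(FILE, "<", "f.txt")   or die "Cannot open f.txt: $!";
open(OUT,  ">", "out.txt") or die "Cannot open out.txt: $!";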
while(<FILE>) {
    my $str = "";
    chomp $_;
    if( 1 .. 4 ) {
        print OUT "$_\n";
        next;
    }
    while( $_ =~ m/(.)/g ) {
        if( $1 =~ m/(\^|\#)/ ) {
            $str .= "$1";
        } else {
            $str .= "$1$1";
        }
    }
    print "$str\n";
    print OUT "$str\n";
}
close(FILE);
close(OUT);

I didn't like the idea of creating a temporary string, so I have the following which modifies the text as it is processing it, and also works perfectly well:

#!/usr/bin/perl

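use strict;
use warnings;

# Three-argument open, with a check so a missing file fails loudly
open(FILE, "<", "f.txt")   or die "Cannot open f.txt: $!";
open(OUT,  ">", "out.txt") or die "Cannot open out.txt: $!";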
while(<FILE>) {
    chomp $_;
    if( 1 .. 4 ) {
        print OUT "$_\n";
        next;
    }

    for( my $i = 0; $i < length($_); $i++ ) {
        if( substr($_, $i, 1) =~ m/(\^|\#)/ ) {
            substr($_, $i, 1) = "$1";
        } elsif( substr($_, $i, 1) =~ m/(.)/ ) {
            substr($_, $i, 1) = "$1$1";
            $i++;
        }
    }
    print OUT "$_\n";
}
close(FILE);
close(OUT);

I don't like this solution because it breaks the cardinal rule of not modifying a for loop counter inside the loop. (Not that I'm any kind of coding purist, mind you :)

I benchmarked both solutions with 250000 iterations on a file of 1750 lines, each line no more than 50 characters. I had expected the temporary string to be quicker, but as the table below reads (a higher rate is faster), the in-place version comes out about 21% ahead.

andromeda:davidj perl_test > perl test.pl
              Rate 2nd string   In place
2nd string 28969/s         --       -17%
In place   35112/s        21%         --
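
The test.pl that produced this table is not shown. As a rough sketch only, a comparison of this shape can be set up with the core Benchmark module's cmpthese. The sub names second_string and in_place, the stand-in data, and the negative count of -1 (run each candidate for at least one CPU second) are assumptions for illustration, not the original harness, and the rates it prints will differ from those above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Assumed stand-in data: 1750 short lines, roughly as described in the post.
my @lines = ("^david#jenkins^") x 1750;

# Approach 1: build the result in a second string.
sub second_string {
    my ($in) = @_;
    my $str = "";
    while ( $in =~ m/(.)/g ) {
        $str .= ( $1 eq '^' || $1 eq '#' ) ? $1 : "$1$1";
    }
    return $str;
}

# Approach 2: grow a copy of the string in place with 4-argument substr.
sub in_place {
    my ($s) = @_;
    for ( my $i = 0; $i < length($s); $i++ ) {
        my $c = substr( $s, $i, 1 );
        next if $c eq '^' || $c eq '#';
        substr( $s, $i, 1, "$c$c" );
        $i++;    # step over the copy just inserted
    }
    return $s;
}

# Negative count: run each candidate for at least 1 CPU second.
cmpthese( -1, {
    '2nd string' => sub { second_string($_) for @lines },
    'In place'   => sub { in_place($_) for @lines },
} );
```

Both subs work on strings in memory, so the sketch times only the character doubling and leaves file I/O out of the comparison.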

Now to my curiosity: Both of these solutions work and I am satisfied with using either of them. What I'd like to have, purely for the educational value, is a more "Perlish" way of doing this, and/or a more efficient way.
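
For reference, one candidate for a more "Perlish" form is a single global substitution with a negated character class. This is a minimal sketch, not a benchmarked answer: the helper name double_chars is made up for illustration, and the sample lines are held in an array so the example runs without f.txt.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: double every character except '^' and '#'.
sub double_chars {
    my ($line) = @_;
    # [^^#\n] matches any one character other than '^', '#', or newline,
    # so the line ending passes through untouched and chomp is not needed.
    $line =~ s/([^^#\n])/$1$1/g;
    return $line;
}

# Demo on the sample lines from the post; the first 4 are copied as-is.
my @input = ( "^this^\n", "^is^\n", "^a^\n", "^test^\n",
              "^david#jenkins^\n", "^ cinea#jenkins ^\n" );
my $n = 0;
for my $line (@input) {
    $n++;
    print $n <= 4 ? $line : double_chars($line);
}
```

Inside the original while(<FILE>) loop the same substitution could be applied to $_ directly after the 1 .. 4 check, which avoids both the temporary string and the hand-adjusted loop counter.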

As always, thank you for your assistance,

davidj

In reply to modifying a string in place by davidj
