Re: Fast Replacement
by muba (Priest) on Jun 14, 2013 at 05:21 UTC
The reason it is slow is that on every call to index, Perl has to scan the string from the start, checking each character to see whether it is an "!". Again and again and again. So you could pass the $index variable as the third argument to tell Perl not to bother with the characters it has already checked:
while ( index($group, "!", $index) > -1 and $index<50000 ) {
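A minimal sketch of that idea, assuming the original loop replaced one "!" per iteration (the question's code is not shown in this thread, so `$limit` and `$replaced` are hypothetical names). The key point is that $index is updated from the return value of index, so each search resumes where the previous one stopped:

```perl
use strict;
use warnings;

my $group    = "ab!cd!ef!gh";
my $limit    = 2;    # hypothetical cap; the thread uses 50000
my $index    = 0;    # where the next search starts
my $replaced = 0;

while ( $replaced < $limit
        and ( $index = index( $group, "!", $index ) ) > -1 ) {
    substr( $group, $index, 1, "\n" );    # overwrite the "!" in place
    $index++;                             # resume after the replaced character
    $replaced++;
}

print $group, "\n";
```

Each character is then examined at most once, instead of once per replacement.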
Alternatively, because TIMTOWTDI:
use strict;
use warnings;
my $string = "abc!def!ghi!jkl!mno!pqr!stu!vwx!yz";
my $limit = 3; # maximum number of fields, so only the first 2 "!" are replaced
$string = join("\n", split(/!/, $string, $limit));
print $string;
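The LIMIT argument of split counts fields rather than separators, so replacing exactly N separators needs a limit of N + 1. A small sketch of that adjustment, using a hypothetical helper name `replace_first_n`:

```perl
use strict;
use warnings;

# Hypothetical helper: replace the first $n occurrences of "!" with "\n".
# split's LIMIT is a maximum field count, so $n separators need $n + 1 fields.
sub replace_first_n {
    my ( $string, $n ) = @_;
    return $string if $n < 1;
    return join "\n", split /!/, $string, $n + 1;
}

my $string = "abc!def!ghi!jkl";
my $result = replace_first_n( $string, 2 );
print $result, "\n";    # "abc" and "def" on their own lines, then "ghi!jkl"
```

With a positive limit, split keeps trailing empty fields, so a "!" at the very end of the string is still replaced.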
Re: Fast Replacement
by hdb (Monsignor) on Jun 14, 2013 at 06:32 UTC
The call to index returns the position of the found character, so you can replace it directly using substr if you capture the output from index. Additionally, as pointed out by muba, you do not need to start from the beginning every time; you can save time by starting from the last position found.
use strict;
use warnings;
my $group = "a!" x 50001;
my $count = 0;
my $pos = 0;
substr $group, $pos, 1, "\n" while( ($pos = index( $group, "!", $pos )) > -1 and $count++ < 50000 );
print $group;
Re: Fast Replacement
by choroba (Cardinal) on Jun 14, 2013 at 07:26 UTC
If you are reading $group from a file, you can probably replace the exclamation marks while reading the file. Something like:
{ local $/ = '!';
while (<>) {
chomp;
print "$_\n";
last if 50_000 <= $.;
}
}
print <>;
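The same record-separator trick can be tried without a file by reading from an in-memory filehandle. This is only a sketch: the sample data, `$limit`, and the `$out` buffer are assumptions made for illustration, and the output is collected in a string instead of being printed directly:

```perl
use strict;
use warnings;

my $data  = "a!b!c!d!e";
my $limit = 2;      # hypothetical cap; the thread uses 50_000
my $out   = '';

open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";
{
    local $/ = '!';                     # records now end at each "!"
    while ( my $record = <$fh> ) {
        chomp $record;                  # chomp strips the current $/, i.e. the "!"
        $out .= "$record\n";
        last if $limit <= $.;           # $. counts records read so far
    }
}
{
    local $/;                           # slurp mode: take the remainder unchanged
    my $rest = <$fh>;
    $out .= $rest if defined $rest;
}
close $fh;

print $out, "\n";
```

As in the snippet above, a final record that does not end in "!" still gets a newline appended if it is read inside the loop.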
Re: Fast Replacement
by davido (Cardinal) on Jun 14, 2013 at 04:22 UTC
If I'm reading correctly (Update: I wasn't reading correctly), you're substituting all occurrences of "!" with a "\n" newline as long as it falls within the portion of the string that comes before the 50,000th position.
substr( $group, 0, 50000 ) =~ tr/!/\n/;
That's about the best I can come up with. You're using substr as an lvalue so that the change propagates back to $group, but is constrained to just the range specified. And you're using tr/// which is faster than s/// for single-character transliteration, where a search pattern isn't required.
Update: Bah, I can see already that I misread what you're doing. Looks more like you want to replace the first 50k "!" characters with newlines, not all "!" characters that reside in the first 50k positions. Pardon me. ;)
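The difference between the two readings is easy to see on a short string. The sketch below uses a limit of 4 in place of 50000; the s///e form is chosen for clarity rather than speed (it visits every "!" in the string), and the variable names are made up for the example:

```perl
use strict;
use warnings;

my $limit = 4;                  # stands in for 50000
my $input = "a!b!c!d!e!";

# Reading 1: every "!" that falls within the first $limit characters.
my $by_position = $input;
substr( $by_position, 0, $limit ) =~ tr/!/\n/;

# Reading 2: the first $limit occurrences of "!", wherever they are.
my $by_count = $input;
my $seen     = 0;
$by_count =~ s/!/ $seen++ < $limit ? "\n" : "!" /ge;

# tr/\n// with an empty replacement list only counts, it does not modify.
my $n_position = ( $by_position =~ tr/\n// );
my $n_count    = ( $by_count    =~ tr/\n// );

print "by position: $n_position replaced\n";
print "by count:    $n_count replaced\n";
```

Here the positional reading replaces two "!" characters and the counting reading replaces four.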
Update 2: Here's a version that will substitute all '!' characters with \n, up to 50k times. After that, it will no longer match. It will be faster, but not necessarily legible:
$group =~ s/!(??{ ( $myregexp::count++ < 50000 ) ? '' : '(?!)' })/\n/g;
Re: Fast Replacement (0.000025s)
by BrowserUk (Patriarch) on Jun 14, 2013 at 10:20 UTC
Try it this way. This replaces all the '!'s in the first 50k bytes of the string with newlines in 25 microseconds:
$x = '1234!' x 11000;;
say length $x;;
55000
$t=time;
substr( $x, 0, 50e3 ) =~ tr[!][\n];
printf "%.9f\n", time() - $t;;
0.000025034
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
While I normally greatly respect your insight, appreciate your input, and value your code, in this case I feel I have to point out that sathishselvam doesn't seem to want to replace every "!" occurring in the first 50k bytes of the input string, but rather the first 50k occurrences of "!". davido seems to agree with me on this one.
Looking again I see you're right.
But still, rather than invoking the regex engine 50,000 times, better to search for the position of the 50,000th ! and then replace in one pass.
#! perl -slw
use strict;
use Time::HiRes qw[ time ];
my $s = '1234!' x 55e3;
my $start = time;
my( $p, $c ) = ( 0, 50e3 );
1 while $c-- and $p = 1 + index $s, '!', $p; # $c-- rather than --$c, so index runs 50,000 times, not 49,999
substr( $s, 0, $p ) =~ tr[!][\n];
printf "Took %f seconds\n", time() - $start;
__END__
C:\test>junk71;;
Took 0.011771 seconds
C:\test>junk71;;
Took 0.009690 seconds
That could probably be sped up with a binary chop for the position, but it hardly seems worth it.
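A sketch of the same find-then-transliterate approach wrapped in a sub (the name `newline_first_n` is hypothetical). It also stops cleanly when the string holds fewer than $n "!" characters, a case the benchmark string never runs into:

```perl
use strict;
use warnings;

# Hypothetical helper: locate the end of the $n-th "!" with index,
# then transliterate everything up to that point in a single pass.
sub newline_first_n {
    my ( $s, $n ) = @_;
    my $end = 0;                        # one past the last "!" to replace
    while ( $n-- > 0 ) {
        my $found = index $s, '!', $end;
        last if $found < 0;             # fewer than $n occurrences: stop early
        $end = $found + 1;
    }
    substr( $s, 0, $end ) =~ tr/!/\n/ if $end > 0;
    return $s;
}

my $demo = newline_first_n( "1!2!3!4", 2 );
print $demo, "\n";
```

The sub works on a copy of its argument, so the caller's string is left untouched; for very large strings, passing a reference would avoid that copy.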
Re: Fast Replacement
by gurpreetsingh13 (Scribe) on Jun 14, 2013 at 04:44 UTC
Assuming you want to replace all occurrences within the file, why not just use the perl command line?
perl -pi -e 's/!/\n/g' <filename>