PerlMonks  

Re^2: Massive regexp search and replace

by albert.llorens (Initiate)
on Feb 10, 2005 at 13:31 UTC ([id://429705])


in reply to Re: Massive regexp search and replace
in thread Massive regexp search and replace

Thanks, Hena. I will try what you suggest and see if it reduces processing time sufficiently.

As for your assumptions, a sample list of replacement patterns (REGEX), one pattern/replacement pair per line, could be:
    \b([a-z])([a-z]*)ung\b	\u$1\l$2ung
    Treecontrol	Tree Control
    [Tt]abreiter	Reiterelement
    [Tt]ile	Teilbild
And a sample input text (INPUT) for the replacements could be:
Die Segnung ist gestern erfolgt. Die segnung ist gestern erfolgt. Die Rechnung wird geschickt. Die rechnung wird geschickt. Die Treecontrol. Die Tabreiter. Die tabreiter. Die Tile. Die tile.
I wonder if this changes anything in what you suggest...
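To make the sample concrete, here is a minimal sketch of how such a list could be applied with Perl's s///ee. It assumes the rules are applied in list order and that each replacement is wrapped in double quotes, so that $1, $2, \u and \l are interpolated for every match. The apply_rules helper and the inline @rules array are hypothetical stand-ins for reading the REGEX file.

```perl
use strict;
use warnings;

# Hypothetical helper: apply [pattern, replacement] pairs in order.
# The replacement is wrapped in double quotes; with /ee the first /e yields
# that quoted string and the second /e evaluates it, interpolating $1, \u, \l.
sub apply_rules {
    my ($text, @rules) = @_;
    for my $rule (@rules) {
        my ($pattern, $replacement) = @$rule;
        my $code = qq{"$replacement"};
        $text =~ s/$pattern/$code/gee;
    }
    return $text;
}

# The sample rules from the post, written inline instead of read from a file.
my @rules = (
    [ '\b([a-z])([a-z]*)ung\b', '\u$1\l$2ung' ],
    [ 'Treecontrol',            'Tree Control' ],
    [ '[Tt]abreiter',           'Reiterelement' ],
    [ '[Tt]ile',                'Teilbild' ],
);

my $input = 'Die segnung ist gestern erfolgt. Die Treecontrol. Die tabreiter. Die tile.';
print apply_rules($input, @rules), "\n";
# prints: Die Segnung ist gestern erfolgt. Die Tree Control. Die Reiterelement. Die Teilbild.
```

Note that the first rule only fires on words that start with a lowercase letter, so an already capitalized "Rechnung" passes through unchanged.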

Re^3: Massive regexp search and replace
by Hena (Friar) on Feb 10, 2005 at 14:05 UTC
    Well, all the direct (literal) text translations might be handled faster... but unless there are a lot of them compared to the real patterns, it probably won't help (it might actually be slower). Whether it actually helps would have to be tested, as this is pure speculation :).

    Basically, make two hashes instead of one. Something like this:
        my (%simple, %regex);
        while (<REGEX>) {
            chomp;
            my ($key, $value) = split /\t/, $_, 2;
            if ($key =~ /^\w+$/) {
                $simple{$key} = $value;           # plain word: exact lookup
            } else {
                $regex{$key} = "\"$value\"";      # quoted so s///ee interpolates $1, \u, \l
            }
        }
        while (<INPUT>) {
            foreach my $key (keys %regex) {       # note: hash order is unpredictable
                s/$key/$regex{$key}/gee;
            }
            # word lookup; "Treecontrol." (punctuation attached) will not match
            my @line = map { exists $simple{$_} ? $simple{$_} : $_ } split ' ', $_;
            print OUT "@line\n";
        }
    Note that in the given examples, you could write out the '[Tt]ile' pattern as separate 'Tile' and 'tile' rows, which would move it from the slower pattern group to the faster one. But as I said, I'm not sure how much this would help.
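A different technique for the literal group, sketched under the assumption that %simple holds literal word => replacement pairs: instead of splitting each line into words, join all literal keys into one alternation and run a single substitution per line, looking each hit up in the hash. Because it matches on \b word boundaries rather than whitespace-separated tokens, a word with trailing punctuation such as "Treecontrol." is still replaced. The hash contents and the replace_literals name are illustrative only.

```perl
use strict;
use warnings;

# Hypothetical literal rules (the "fast" group, with [Tt]ile written out).
my %simple = (
    Treecontrol => 'Tree Control',
    Tile        => 'Teilbild',
    tile        => 'Teilbild',
);

# One alternation of all literal keys, longest first; quotemeta escapes
# any regex metacharacters that might appear in a key.
my $alternation = join '|',
    map { quotemeta $_ } sort { length $b <=> length $a } keys %simple;
my $literal_re = qr/\b($alternation)\b/;

# Single pass over the line; each matched word is replaced from %simple.
sub replace_literals {
    my ($line) = @_;
    $line =~ s/$literal_re/$simple{$1}/g;
    return $line;
}

print replace_literals("Die Treecontrol. Die Tile. Die tile.\n");
```

Whether one large alternation beats the split-and-lookup loop depends on the data and the perl version, so this is worth benchmarking rather than assuming.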
Re^3: Massive regexp search and replace
by hsinclai (Deacon) on Feb 10, 2005 at 14:07 UTC
    Expanding on Hena's idea, I wonder if it would be even more efficient to use Tie::File to go through the file, writing replacements as you go (untested):
        use Tie::File;
        my %regex;    # filled as in Hena's code
        my $inputfile = "samplein.txt";
        replacer($inputfile);
        sub replacer {
            my ($file) = @_;
            tie my @currentfile, 'Tie::File', $file or die "Cannot tie $file: $!";
            # foreach aliases each line, so editing $inputline writes it back
            foreach my $inputline (@currentfile) {
                foreach my $key (keys %regex) {
                    $inputline =~ s/$key/$regex{$key}/gee;
                }
            }
            untie @currentfile;
        }
        ## Totally untested

    Seems like the write operation would be faster with Tie::File
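A self-contained sketch of the in-place behavior this relies on: Tie::File presents each line of the file as an array element, and a foreach over the tied array aliases those elements, so a substitution on the loop variable is stored back to the file (the Tie::File documentation shows the same for (@array) { s/.../.../ } idiom). The %regex contents, the rewrite_file name and the temporary file are assumptions for the demo; it says nothing about whether this is faster than a plain read/write loop, which would need benchmarking.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Tie::File;

# Hypothetical rule in the quoted form that s///ee expects.
my %regex = ('[Tt]ile' => '"Teilbild"');

# Rewrite a file in place, line by line, through a tied array.
sub rewrite_file {
    my ($file) = @_;
    tie my @lines, 'Tie::File', $file or die "Cannot tie $file: $!";
    for my $line (@lines) {                  # $line aliases the tied element
        for my $key (keys %regex) {
            $line =~ s/$key/$regex{$key}/gee;
        }
    }
    untie @lines;                            # flushes any deferred writes
}

# Demo on a throwaway file that is deleted when the program exits.
my ($fh, $path) = tempfile(UNLINK => 1);
print $fh "Die Tile.\nDie tile.\n";
close $fh;
rewrite_file($path);
```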
