PerlMonks

Re^2: If there a way to find the location of the first difference between two strings?

by flexvault (Monsignor)
on Mar 28, 2012 at 16:08 UTC (id://962219)


in reply to Re: If there a way to find the location of the first difference between two strings?
in thread If there a way to find the location of the first difference between two strings?

jaredor,

Thanks for your input. As you may have noticed, in the original post I used "use bytes" to eliminate concerns about UCA (the Unicode Collation Algorithm).

perl -e 'use bytes;$s1="abcd";$s2="abcz";$dif=$cmp=$s1 cmp $s2;print "$dif\t$cmp\n";'

The performance hit for using UCA is just too great: in some of my tests, performance was degraded by as much as 10,000%. As for "bit-wise operations on strings", I have a math background. I started programming in machine language, and later used assembler, Basic, Fortran, C, and many others, until I had the good fortune of being introduced to Perl.

To explain why performance is so critical: I have been writing a "pure-perl" database engine to replace Oracle's BerkeleyDB and MySQL in all of our products, and our goal was to come within 20% of the performance of the Oracle products. As it turns out, our clients will see improved performance when we switch them over, and we will be able to provide database support on any platform that Perl runs on (an area where Perl excels!).

I have been very impressed with the performance of Perl since 5.8.x. Profiling the code (with -d:NYTProf) showed that the routine I asked about is called 14,595,348 times in a test that writes 100K records, so even a slight improvement is welcome. Thanks to the PM answers, I got a 376% increase in performance. Great!
(Note: Some of our clients have databases with billions of records.)

When I wrote "Perl performance just gets better and better!", my intent was to show that Perl has improved over the years. It was the first time I had an actual test case to run on several versions of Perl, from 5.6.1 to 5.12.2. Since then I have tested with 5.14.2, with even better results. I don't know why Perl performance is improving for this type of work, but I can demonstrate that it is. I have also incorrectly used the term "modern Perl" in the past, since I didn't realize that a module "Modern::Perl" existed.

Thank you and Good Luck!

"Well done is better than well said." - Benjamin Franklin

Re^3: If there a way to find the location of the first difference between two strings?
by chromatic (Archbishop) on Apr 02, 2012 at 06:58 UTC
    I didn't realize that a module "Modern::Perl" existed.

    It's just a silly little shortcut to enable new (and should-have-been-on-by-default) features in the most recent releases of Perl 5. "Modern" is deliberately vague.

Re^3: If there a way to find the location of the first difference between two strings?
by jaredor (Priest) on Apr 02, 2012 at 06:44 UTC

    Thanks for the background flexvault, I doubt I would have posted anything had I known you were doing something with database keys. I thought you might be writing some sort of diff routine for a homebrew editor or some such. (I should have checked you out anyway to see that you've way too much history and mojo to need to be told about iterators.)

    I looked more at JavaFan's and jwkrahn's solutions than at your initial statement of the problem, so I overlooked your use of the bytes pragma. I guess I'm conditioned to look for the -M and -m options. I've never used the bytes pragma, which seems to make all strings just byte vectors. Modding out by endianness, do you think there's some sort of bit-wise C idiom out there to capitalize on the fact that one and only one of ($s1 & ($s1 ^ $s2)) or ($s2 & ($s1 ^ $s2)) will have the "high order bit"? You might be able to avoid a regexp by, e.g., craftily using bit shifts. But I'm unfamiliar with issues such as whether using numerical ordering in database keys affects performance when the keys might have a different lexicographic ordering.
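    The XOR observation above can be sketched in Perl. This is a minimal sketch, assuming byte strings (as with the "use bytes" one-liner in this thread) and Perl's default string semantics for "^" (the "bitwise" feature not enabled; strings with code points above 0xFF are not supported by string XOR in recent Perls). The helper names first_diff and diff_and_order are hypothetical, not taken from the thread. XORing the strings turns every matching byte into "\0", so the length of the leading run of NULs is the offset of the first difference; the byte at that offset then decides the ordering, which is the same fact the "high order bit" idea relies on.

```perl
use strict;
use warnings;

# Hypothetical helper: 0-based byte offset of the first difference
# between two byte strings, or undef if the strings are identical.
sub first_diff {
    my ($s1, $s2) = @_;
    return if $s1 eq $s2;

    # String XOR: equal bytes become "\0". A shorter operand is treated
    # as if padded with zero bits, so extra trailing bytes survive.
    my $x = $s1 ^ $s2;

    # Count the leading NULs; this match always succeeds (maybe empty).
    $x =~ /^\0*/;
    my $pos = $+[0];

    # If one string is a prefix of the other (e.g. "abc" vs "abc\0"),
    # the first difference is at the end of the shorter string.
    my $min = length($s1) < length($s2) ? length($s1) : length($s2);
    return $pos < $min ? $pos : $min;
}

# Hypothetical helper: returns (offset, order), where order is -1, 0 or 1
# like cmp on byte strings. The byte at the first difference decides the
# order; running out of bytes sorts first.
sub diff_and_order {
    my ($s1, $s2) = @_;
    my $pos = first_diff($s1, $s2);
    return (undef, 0) unless defined $pos;
    my $b1 = $pos < length($s1) ? ord(substr($s1, $pos, 1)) : -1;
    my $b2 = $pos < length($s2) ? ord(substr($s2, $pos, 1)) : -1;
    return ($pos, $b1 <=> $b2);
}

my ($where, $order) = diff_and_order("abcd", "abcz");
print "$where\t$order\n";    # prints 3, a tab, then -1
```

    For "abcd" vs "abcz" the order value matches the -1 that "cmp" gives in the one-liner above, and the offset comes for free. Whether this beats a tuned regexp or loop for key comparison is something only a benchmark on real keys can show.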

    I don't think you need to apologize for using "modern Perl" in a general sense. chromatic puts that include at the top of his responses in PM and it's good PR for his excellent book, Modern Perl 2011-2012 Edition, but knowledgeable folk such as yourself are given lots of latitude by students such as myself, who learn a lot whenever you produce a "modern Perl" example.
