Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:
#!/usr/bin/perl -w
use strict;

my $var = "xxx:12345 yyy:54321 zzz:13245";
my @items = split("\:", $var);
@items = split(" ", $items[1]);
print "$items[0]\n";
xxx, yyy, zzz are basically completely random.
What I am looking for is a way to grab the value of xxx without having to use multiple splits.
Thanks in advance.
Replies are listed 'Best First'.
Re: split question
by Anonymous Monk on Sep 20, 2002 at 12:58 UTC
$var =~ m/.+\:(.+?)\s.+\:(.+?)\s.+\:(.+?)/;
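A quick way to see what that pattern captures is to run it in list context against the sample string. The sketch below is only an illustration: the sub names are made up here, and the second pattern is an assumed variant that is not from the thread. Because the last `(.+?)` is lazy and nothing follows it, it is satisfied by a single character.

```perl
use strict;
use warnings;

# Return the three captures of the pattern exactly as posted above.
sub captures_as_posted {
    my ($s) = @_;
    my @c = $s =~ m/.+\:(.+?)\s.+\:(.+?)\s.+\:(.+?)/;
    return @c;
}

# Assumed variant (not from the thread): match runs of non-space
# characters explicitly, so nothing depends on lazy matching.
sub captures_nonspace {
    my ($s) = @_;
    my @c = $s =~ m/\S+:(\S+)\s+\S+:(\S+)\s+\S+:(\S+)/;
    return @c;
}

my $var = "xxx:12345 yyy:54321 zzz:13245";
print join(",", captures_as_posted($var)), "\n";   # 12345,54321,1
print join(",", captures_nonspace($var)), "\n";    # 12345,54321,13245
```

For the original question only the first capture matters, and both patterns return 12345 there.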
by RMGir (Prior) on Sep 20, 2002 at 13:20 UTC
I compared your regex (called greedy) with a version using the +? non-greedy quantifier, both against your string (where both give correct results) and against a much longer string, where the non-greedy version matches the first 3 codes and the greedy version matches the first code and the last 2.
As you can see, the non-greedy version runs considerably faster, since it doesn't wind up trying as many alternatives (a.k.a. backtracking).
Here's the comparison code:
Those results were with 5.6.1 on Cygwin; your results may vary.
-- Mike
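RMGir's comparison code is not reproduced in this copy of the thread. As a rough sketch of how such a comparison could be set up with the core Benchmark module, the following uses two assumed patterns and made-up test data. Since the patterns are guesses, which codes the greedy form picks here may differ from what RMGir describes, and no timings are claimed.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: 50 "key:code" pairs separated by single spaces.
my $long = join ' ', map { "k$_:" . (100 + $_) } 1 .. 50;

# Greedy prefixes: each .+ grabs as much as it can, then backtracks.
sub greedy {
    my @c = $_[0] =~ /.+:(\d+)\s.+:(\d+)\s.+:(\d+)/;
    return @c;
}

# Non-greedy prefixes: each .+? stops at the first colon that works.
sub lazy {
    my @c = $_[0] =~ /.+?:(\d+)\s.+?:(\d+)\s.+?:(\d+)/;
    return @c;
}

print "greedy: ", join(",", greedy($long)), "\n";   # the last three codes
print "lazy:   ", join(",", lazy($long)), "\n";     # the first three codes

# Run each sub for at least one CPU second and print a comparison table.
cmpthese(-1, {
    greedy => sub { my @c = greedy($long) },
    lazy   => sub { my @c = lazy($long) },
});
```

Both subs are run against the same string, so the table isolates the effect of the quantifier choice on this particular input.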
by sauoq (Abbot) on Sep 20, 2002 at 20:31 UTC
One way to do it with split:
-sauoq "My two cents aren't worth a dime.";
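sauoq's snippet is not preserved in this copy. One possible single-split approach, offered only as a sketch (the sub name `first_value` is invented here), is to split on "colon or space" and take the second field with a list slice:

```perl
use strict;
use warnings;

# Return the value that follows the first colon, using a single split.
# Splitting on "colon or space" yields (name, value, rest-of-string);
# the limit of 3 stops split from doing more work than needed.
sub first_value {
    my ($s) = @_;
    return (split /[: ]/, $s, 3)[1];
}

my $var = "xxx:12345 yyy:54321 zzz:13245";
print first_value($var), "\n";    # 12345
```

This assumes the name itself contains neither a colon nor a space.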
Re: split question
by helgi (Hermit) on Sep 20, 2002 at 13:36 UTC
This results in the following output:
Regards, Helgi Briem
Re: split question
by George_Sherston (Vicar) on Sep 20, 2002 at 13:04 UTC
For the particular case you are dealing with, and if you are sure that the form of the input data will always be "space, non-space characters, colon, digits", then you could use a regex thus:
The output is:
The disadvantage of this is that it depends on regular input: it won't tell you if there is a breakdown in the input, but will just spit out rubbish. Better to get a module for general use.
§ George Sherston
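The regex from this reply is not preserved here. A minimal sketch of the general idea, assuming every item really does look like "name:digits" (the sub name `parse_pairs` and the hash name are made up for illustration), is a global match in list context:

```perl
use strict;
use warnings;

# Collect every "name:digits" pair in one pass with a global match.
# In list context, m//g returns all captures: (name, value, name, value, ...).
sub parse_pairs {
    my ($s) = @_;
    my @flat = $s =~ /(\S+):(\d+)/g;
    return @flat;
}

my $var = "xxx:12345 yyy:54321 zzz:13245";
my %value_of = parse_pairs($var);
print "$_ => $value_of{$_}\n" for sort keys %value_of;
```

As the reply warns, this does not validate anything: an item such as "xxx:12a45" would quietly yield 12 rather than raise an error.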
Re: split question
by BrowserUk (Patriarch) on Sep 20, 2002 at 19:26 UTC
If, as both your words and code imply, you are only after xxx and xxx always has length 3, then:
print substr($var,0,3);
is much simpler and faster than any regex. If the length of xxx can vary then
is almost as simple and still more efficient than a regex. (And easier to get right first time :)
If you actually want to get xxx, yyy, and zzz and they are always length 3, then
The use of a regex
Cor! Like yer ring! ... HALO dammit! ... 'Ave it yer way! Hal-lo, Mister la-de-da. ... Like yer ring!
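The snippets for the variable-length and three-name cases are missing from this copy. The following is a sketch of how `index` and `substr` could cover both; it is an illustration with invented sub names, not BrowserUk's original code.

```perl
use strict;
use warnings;

# Text before the first colon, for a name of any length.
# Returns undef when the string contains no colon.
sub name_before_colon {
    my ($s) = @_;
    my $p = index($s, ':');
    return $p < 0 ? undef : substr($s, 0, $p);
}

# Fixed-width case: every name is exactly 3 characters and sits
# immediately before a colon. $p is tested before it is used.
sub names_fixed_width {
    my ($s) = @_;
    my @names;
    my $p = index($s, ':');
    while ($p >= 0) {
        push @names, substr($s, $p - 3, 3) if $p >= 3;
        $p = index($s, ':', $p + 1);
    }
    return @names;
}

my $var = "xxx:12345 yyy:54321 zzz:13245";
print name_before_colon($var), "\n";               # xxx
print join(",", names_fixed_width($var)), "\n";    # xxx,yyy,zzz
```

Note that these return the names (xxx, yyy, zzz), which is how this reply reads the question, rather than the numbers after the colons.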
by blakem (Monsignor) on Sep 20, 2002 at 20:43 UTC
"a little harder to get right first time"
hehe, I guess so. Your while condition doesn't fail until *after* we've printed out the substr using a value of -1 for $p. Therefore you get a phantom match of '324' given the sample input.
You might also be surprised at how this benchmarks against a well-crafted regex. The regex engine has some clever optimizations under the hood. This benchmark surprised me as well... I tossed in a sexeger solution that I thought would perform well, since we are looking for stuff in front of a known character. Anyway, it didn't perform as well as either of the other solutions, but the regex did win the race:
And here is the Benchmark code
-Blake
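Neither the loop under discussion nor the benchmark is preserved in this copy. The sketch below is a hypothetical reconstruction of a loop with the same flaw (the test on $p happens only after `substr` has already run with $p at -1), next to an assumed regex alternative. The sub names are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical loop with the flaw described above: the condition is
# checked after the body, so substr also runs once with $p == -1.
sub names_buggy {
    my ($s) = @_;
    my @names;
    my $p = -1;
    do {
        $p = index($s, ':', $p + 1);
        push @names, substr($s, $p - 3, 3);   # offset is -4 on the last pass
    } while ($p != -1);
    return @names;
}

# Assumed regex alternative: any 3 characters directly before a colon.
sub names_regex {
    my ($s) = @_;
    my @names = $s =~ /(.{3}):/g;
    return @names;
}

my $var = "xxx:12345 yyy:54321 zzz:13245";
print join(",", names_buggy($var)), "\n";   # xxx,yyy,zzz,324
print join(",", names_regex($var)), "\n";   # xxx,yyy,zzz
```

The phantom '324' comes from `substr($var, -4, 3)`, which counts back from the end of the string.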
by BrowserUk (Patriarch) on Sep 20, 2002 at 21:28 UTC
That'll teach me to try and one-line my original solution. :( For what it's worth, I didn't say that the last one would be more efficient, but I did say it would work ;(. The original was
but I didn't like the double test against -1, so I tried to get rid of it. Don't know how I missed that it printed the extra one. A case of seeing what I wanted to see, I guess.
I'm not that surprised that doing the looping inside the regex engine is more efficient than at user level. I'm guessing that it makes a single pass looking for fixed anchors like the : when the /g option is used. I am surprised how much more efficient it is.
Nice benchmark BTW. Something I need to get better at.
Cor! Like yer ring! ... HALO dammit! ... 'Ave it yer way! Hal-lo, Mister la-de-da. ... Like yer ring!