
Simple line parse question

by jimmy.pl (Initiate)
on Aug 07, 2010 at 00:53 UTC

jimmy.pl has asked for the wisdom of the Perl Monks concerning the following question:

Calling all perlmonks.

I want to compose a single word from the 3rd and 4th words of an input line, e.g.:

input: "a b c d e f g" output: "cd"

My current solution is using split and join like so:

my $result = join("", (split(" ", $line, 5))[2,3])

I've tried many different ways, but this one seems to perform the fastest. Can anyone tell me a better/faster way to do this?

Cheers

Replies are listed 'Best First'.
Re: Simple line parse question
by AnomalousMonk (Archbishop) on Aug 07, 2010 at 01:33 UTC

    Don't know about speed, but my own preference would be for something along the lines of:

    >perl -wMstrict -le
      "my $s = 'aa b: CCC DD. eee, ff ggg?';
       my $word = qr{ [[:alpha:]]+ }xms;
       my @words = $s =~ m{ $word }xmsg;
       my $result = join q{}, @words[2,3];
       print qq{'$result'};
      "
    'CCCDD'

    This allows better definition and control of what a 'word' is.

    Updates:

    1. One can also avoid the intermediate  @words array as in the OP with the slightly faster
          my $result = join q{}, ($s =~ m{ $word }xmsg)[2,3];
    2. Improved code example slightly to try to show that naive splitting on whitespace might produce unintended results. Better, IMO, to define and extract the thing itself rather than try to define and eliminate everything you're not interested in.
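
    A minimal sketch of the difference described in update 2, reusing the sample string from the one-liner above. The variable names $by_split and $by_match are made up for illustration; the split call mirrors the OP's approach and the match mirrors the [[:alpha:]]+ definition of a word.

```perl
use strict;
use warnings;

# Sample line with punctuation attached to some words.
my $s = 'aa b: CCC DD. eee, ff ggg?';

# Whitespace split (the OP's approach): punctuation stays glued to the field.
my $by_split = join q{}, (split ' ', $s, 5)[2,3];

# Matching runs of letters: only the alphabetic part of each word is kept.
my $by_match = join q{}, ($s =~ m{ [[:alpha:]]+ }xmsg)[2,3];

print "split: '$by_split'\n";   # split: 'CCCDD.'
print "match: '$by_match'\n";   # match: 'CCCDD'
```

    Which result is "right" depends on what counts as a word in the real input; for purely whitespace-separated data the two agree.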

Re: Simple line parse question
by GrandFather (Saint) on Aug 07, 2010 at 01:45 UTC

    Why do you think the solution you have provided is inadequate to the task? Maybe if you tell us something of the bigger problem we can help you find a better higher level solution?

    True laziness is hard work
Re: Simple line parse question
by nvivek (Vicar) on Aug 07, 2010 at 04:28 UTC

    Yeah, you could do it with split and join, but that needs two functions. Instead, you can use a simple regular expression. Check the following code; I have taken a single space as the delimiter between words in the string.

    use strict;
    use warnings;
    my $string = "one two three four five six";
    $string =~ /^(\w+ ){2}(\w+) (\w+)/;
    # $1 holds the last repetition of the {2} group ("two "),
    # so the third and fourth words end up in $2 and $3.
    print "$2$3"; # prints "threefour"
Re: Simple line parse question
by Marshall (Canon) on Aug 07, 2010 at 10:26 UTC
    Can anyone tell be a better/faster way to do this?

    To me "better" means more clear. The number one goal of software should be clarity..."hey, is it easy to understand what this code does?"

    Performance is usually a secondary goal. However strange as it may be, if your code is clear, you will often achieve high performance.

    Search for "benchmark" and you will find ways to measure the performance of version X vs Y.
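
    As a rough sketch of what such a measurement could look like, here is the core Benchmark module comparing the OP's split/join against one regex variant. The regex and the %candidates hash are illustrative choices, not code from this thread, and no timings are claimed here since they depend on the machine and Perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $line = "a b c d e f g";   # sample input taken from the question

# Candidates under test; both return "cd" for the sample line.
my %candidates = (
    split_join => sub { join "", (split " ", $line, 5)[2,3] },
    regex      => sub { $line =~ /^\S+\s+\S+\s+(\S+)\s+(\S+)/ ? "$1$2" : "" },
);

# Run each candidate for at least 1 CPU second and print a comparison table.
cmpthese(-1, \%candidates);
```

    It is worth checking that every candidate produces the same output before comparing speeds, otherwise the table compares different jobs.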

    Your code:
    join("", (split(" ", $line, 5))[2,3])
    is not easy to understand. Do not mistake fewer lines as meaning higher performance.

    I think the following is clear and works well. Don't be shy about giving some intermediate variable a name.

    #!/usr/bin/perl -w
    use strict;

    my $input = "a b c d e f g";
    my @words = split(/\s+/,$input);
    print @words[2,3], "\n";

    __END__
    Prints:
    cd
Re: Simple line parse question
by jimmy.pl (Initiate) on Aug 07, 2010 at 11:14 UTC

    Thanks for the comments.

    My goal is basically to achieve similar timing to the following in awk:

    echo "a b c d e f g" | awk '{print $3$4}'

    I find it hard to believe that my split&join solution is the fastest perl has to offer to achieve this. This one little line of code in my script is actually turning out to be quite the performance hotspot. So i thought, why not ask here to see if there's a faster way that i'm unaware of. I've already tried the following, but they're all slower than my split&join:

    1: ... | perl -ne 'printf("%s%s", (split(" ", $_, 5))[2,3]);'
    2: ... | perl -ne 'print /(?:\S+ ){2}(\S+) (\S+)/'
    3: ... | perl -ane 'print "$F[2]$F[3]";'
    4: I even wrote my own subroutine using index/substr to extract what i need ...

    I guess i'm hoping someone will introduce me to a new technique. We can't let the awk'ers have this one so easily can we?
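
    The index/substr subroutine mentioned in item 4 is not shown in the thread. Purely as an illustration of that kind of approach, a sketch might look like the following; the name third_and_fourth is hypothetical, and it assumes fields are separated by exactly one space and that the line has already been chomped (awk and split " " are more forgiving about runs of whitespace).

```perl
use strict;
use warnings;

# Hypothetical helper: concatenate fields 3 and 4 of a line whose fields
# are separated by exactly one space. Returns '' if there are fewer than
# four fields.
sub third_and_fourth {
    my ($line) = @_;
    my $start = 0;
    # Skip past the first two separators to reach the start of field 3.
    for (1 .. 2) {
        my $sp = index($line, ' ', $start);
        return '' if $sp < 0;
        $start = $sp + 1;
    }
    my $mid = index($line, ' ', $start);       # separator after field 3
    return '' if $mid < 0;
    my $end = index($line, ' ', $mid + 1);     # separator after field 4
    $end = length $line if $end < 0;           # field 4 is the last field
    return substr($line, $start, $mid - $start)
         . substr($line, $mid + 1, $end - $mid - 1);
}

print third_and_fourth("a b c d e f g"), "\n";   # prints "cd"
```

    Whether something like this beats split in practice would have to be measured; each index and substr call is still a separate Perl op, so a speedup is not guaranteed.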

      jimmy.pl:

      We can't let the awk'ers have this one so easily can we?

      Keep in mind that awk is a more specialized tool than perl, so it's really not important if awk can do some things faster than perl. It's fine to care about runtime speed, but it can waste your time. Until a program must be faster, spending time optimizing it is simply a waste of your own time. If you enjoy working overtime, then have at it. But I find it better to spend that time with family, friends, goofing off, etc.

      Remember: first make it work. Then make it work correctly. Next, check if it meets requirements. If, and only if, it fails to meet speed requirements, make it faster.

      ...roboticus

      Assembly language: Fun and runs fastest! I haven't had to use it since around 1995.

      C/C++: Fun and runs fast! I use it for everything I need to make faster.

      Perl: Fun and fastest to write! Fast enough runtime for 95+% of everything I do.

        I think that roboticus is "on it"!

        From my experience, the coding efficiency of Perl vs C is in the range of 3x-10x:1. Recoding a 5 page C program into a one page Perl program that achieves the same functionality would not be a surprising result.

        The Perl program will run at something like <1/3 the speed of the C program, but often (and VERY often), this does not matter at all! Perl OO vs say C++ is a different thing and it has an additional performance penalty.

        My only slight "nit" with this would be about assembly. In the past decade, the C "super optimizing" compilers have become so good, that you have to be a real guru at ASM to beat them. It is possible to do for very focused tasks, but it is certainly not easy! Some folks can actually wind up writing slower ASM code than the compiler can do.

      I find it hard to believe that my split&join solution is the fastest perl has to offer to achieve this.

      Believe, is that Swahili for Benchmark?

      Whoa!
      This is very "awk_weird"
      Give us an input file and an expected result.

        You can generate the input yourself. For example:

        xxx@xxx:~/test/perl$ seq 100 1000000 | perl -ne 'print int(rand($_)), "\n"' | xargs -n10 echo > a
        xxx@xxx:~/test/perl$ wc -l a
        99991 a
        xxx@xxx:~/test/perl$ for i in {1..100}; do cat a; done > b
        xxx@xxx:~/test/perl$ wc -l b
        9999100 b
        xxx@xxx:~/test/perl$ cat b | time -p awk '{print $3$4}' > /dev/null
        real 8.78
        user 7.89
        sys 0.38
        xxx@xxx:~/test/perl$ cat b | time -p perl -ne 'print join("", (split(" ", $_, 5))[2,3]),"\n";' > /dev/null
        real 13.78
        user 12.93
        sys 0.32
