
Hi All,

I want to find a way to parse a string in a really performant way. This is what I came up with.
I have found two methods so far, but there must be a better way to do it.
The string is position delimited, i.e. from the 3rd to the 16th character it contains something, and so forth. I want only the text in these fields, not the blanks.
Let me show you:

our @data=<DATA>;
 
# some code comes here...
sub dosubstr {
   foreach my $i ( 0..$#data ) {
      my $line=$data[$i];
      my $jcpu=substr($line,2,16);
      my $j=substr($line,18,48);
      my $s=substr($line,290,16);
      $jcpu =~ s/\s//g;
      $s =~ s/\s//g;
      $j =~ s/\s//g;
      #  warn "$jcpu $j $s";
      # ..store the values in a hash, but that is not important here
      }
   }
 
sub doregex {
   foreach my $i ( 0..$#data ) {
      my $line=$data[$i];
      $line =~ m/^04(?=(\S+)).{16}(?=(\S+)).{40}.{216}.{16}(?=(\S+)).{16}/ ;
      my $jcpu=$1;
      my $j=$2;
      my $s=$3;
      #  warn "$jcpu $j $s";
      # ..store the values in a hash, but that is not important here
      }
   } 

__DATA__
04A12345          RELEASE
 
                A12345          RELEASE         A12345
04FTOP            DD_BUIL+
 
        
        FTOP            DD_REKL+        FTOP
04FTOP            DD_PLAN+
 
                FTOP            DD_REKL+        FTOP

Now, in a simple benchmark, the doregex function is 3 times faster than dosubstr. But it just looks so complex, doesn't it?
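
A comparison of this kind can be sketched with the core Benchmark module. This is a minimal sketch under assumptions, not the original benchmark script: the synthetic record, the sub names by_substr and by_regex, and the iteration count are made up for illustration, and only the offsets and the regex are taken from the code above. The relative timings it prints will depend on the Perl version and on the data.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Synthetic 306-character record (an assumption, not real data):
# "04", a 16-char field, a 48-char field, 224 chars of filler, a 16-char field.
my $record = '04'
           . sprintf('%-16s', 'A12345')
           . sprintf('%-48s', 'RELEASE')
           . (' ' x 224)
           . sprintf('%-16s', 'A12345');
my @lines = ($record) x 1000;

# Cut the three fields out by offset, then strip the padding.
sub by_substr {
    my ($line) = @_;
    my @fields = (
        substr($line, 2, 16),
        substr($line, 18, 48),
        substr($line, 290, 16),
    );
    s/\s//g for @fields;
    return @fields;
}

# Capture the non-blank text at the start of each field via lookaheads.
# In list context the match returns ($1, $2, $3), or an empty list on failure.
sub by_regex {
    my ($line) = @_;
    return $line =~ m/^04(?=(\S+)).{16}(?=(\S+)).{40}.{216}.{16}(?=(\S+)).{16}/;
}

# Run each variant 200 times over all lines and print a comparison table.
cmpthese(200, {
    substr => sub { my @f; @f = by_substr($_) for @lines; },
    regex  => sub { my @f; @f = by_regex($_)  for @lines; },
});
```

Passing a negative count such as -3 to cmpthese makes it run each variant for at least that many CPU seconds, which tends to give steadier numbers than a fixed iteration count.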
So, I am asking for the wisdom of my fellow monks to make it more performant.
I am talking about data of thousands of lines, and every second counts, as my operators don't like to wait for webpages :-) Thanks in advance,
Update: fixed the substr values, per Abigail-II's comment.
---------------------------
Dr. Mark Ceulemans
Senior Consultant
BMC, Belgium

In reply to fast string parser: regex versus substr by mce

	PerlMonks