Beefy Boxes and Bandwidth Generously Provided by pair Networks
more useful options
 
PerlMonks  

Trying to avoid line noise (brain cramp)

by Ovid (Cardinal)
on Mar 09, 2001 at 16:59 UTC ( [id://63232]=perlquestion: print w/replies, xml ) Need Help??

Ovid has asked for the wisdom of the Perl Monks concerning the following question:

People often tell me that Perl resembles line noise and they don't want to use it as a result. I've written a small script to analyze some SQL statements and I've written "line noise" and am soliciting advice on how to clean the code.

I have a sample of SQL from the MS-SQL profiler. I need to identify all unique "INSERT INTO Photo..." statements. My thought was to read the file, select lines that have the approriate sql and break them into a three part array with the middle element containing the field names. Then I push a reference to that array onto another array. Next, I sort that array on the field names and print unique SQL inserts (tracking them by stuffing the first instance of each combination of field names in a hash). Here's the code:

use warnings; use strict; my ( @results, %fields ); while(<>){ s/\0//g; if (/(?<!from )INSERT INTO Photo/) { my @s = ( /([^(]+)(\([^)]+\))(.*)/ ); push @results, \@s; } } my @final = sort{ $a->[1] cmp $b->[1] } @results; open OUT, ">results.txt" or die $!; foreach( @final ) { my $fieldNames = $_->[1]; if ( ! exists $fields{ $fieldNames } ) { $fields{ $fieldNames } = 1; print OUT @{ $_ } } } close OUT;
Here's some sample input (heavily edited because the SQL statements are HUGE):
exec new_int_id 'documents', 'did' go INSERT documents ( did, DocID, Template ) VALUES ( 3093, '000000000000 +000000003093', 'Photo' ) go SELECT * FROM documents WHERE did=3093 go INSERT INTO Photo (AccessLevelID, ANRNumber, ColorID, ContactSheetID) +VALUES (1, '225058', 1, 4) go exec new_int_id 'Photo', 'PhotoID' go exec new_str_id 'documents', 'DocId' go INSERT INTO Photo (QualityID, Remark, UniqueID, VolNum) VALUES (1, 'St +. Nicolaasfeest -v. Moorsel', '211357', '1') go
My code works fine, but it's a great example of the "line noise" problem. Unfortuanately, I really can't work my mind around the problem of cleaning the code to the point where it's comprehensible to someone with a low level of Perl knowledge. I could break the regex out with the /x modifier, but that's only a partial solution. Any advice would be appreciated.

Cheers,
Ovid

Join the Perlmonks Setiathome Group or just click on the the link and check out our stats.

Replies are listed 'Best First'.
Re: Trying to avoid line noise (brain cramp)
by Tyke (Pilgrim) on Mar 09, 2001 at 17:18 UTC
    This isn't going to be much help, but regular expressions are always going to look like line noise to someone who doesn't grok them.

    If you're writing for someone with little Perl knowledge, then all you can do is

    1. Write clean tight code. (which you have)
    2. Document it (which you were about to), even if it's on a line by line basis.


    _rant_on_
    I'm personally getting to the stage where I am p**d off about pandering to developpers who want to use a programming language without making an effort to learn, not just the syntax, but also the idioms. If regular expressions are part of the language, then use them as they were meant to be used. If conditional statement modifiers are there, then what's the point of pretending that they aren't?
    _rant_off_

    Just about the only thing that you can do is to teach the perl-deprived; and one of the most immediate ways to do that is to fully document the code.

    Just my two centimes.

    -- I knock my pate and fancy wit will come Knock as I please, there's no one at home a pontiff paraphrased
Re: Trying to avoid line noise (brain cramp)
by Corion (Patriarch) on Mar 09, 2001 at 17:23 UTC

    The main problem I see is, that you are using idiomatic Perl to solve the problem, which makes the solution elegant but requires Perl knowledge to understand the stuff.

    I see only one good solution to your problem, which does not change the code, because I see only minor things to change there (in fact, I'd change the die part to something like ... or die "couldn't open output file '$filename' : $!\n".

    I would put a comment that describes what each of the idiomatic constructs does, like this :

    my ( @results, %fields ); while(<>){ # Strip all NULL characters out of the current line s/\0//g; if (/(?<!from )INSERT INTO Photo/) { # Split that line into three parts and save a reference to i +t my @s = ( /([^(]+)(\([^)]+\))(.*)/ ); push @results, \@s; } } my @final = sort{ $a->[1] cmp $b->[1] } @results; open OUT, ">results.txt" or die $!; foreach( @final ) { # Print only one occurence of each statement my $fieldNames = $_->[1]; if ( ! exists $fields{ $fieldNames } ) { $fields{ $fieldNames } = 1; print OUT @{ $_ } } } close OUT;

    This will of course need some learning of Perl on the recipient side, but I think that those comment lines should help the "casual reader" to understand the program flow (if the "casual reader" knows english that is.

Re: Trying to avoid line noise (brain cramp)
by clemburg (Curate) on Mar 09, 2001 at 18:38 UTC

    Avoiding line noise in Perl is mainly about how to handle regexes, IMHO.

    A good tip here is to use variables in the regexes:

    # original: /([^(]+)(\([^)]+\))(.*)/ my $not_paren = "[^(]+"; my $paren = "\\("; my $rest = ".*"; # now reads: $string =~ /($not_paren)($paren$not_paren$paren)($rest)/;

    This will do a good job to make the code readable to other programmers, in my experience.

    Christian Lemburg
    Brainbench MVP for Perl
    http://www.brainbench.com

      I like your code, but there's a small error:
      # original: /([^(]+)(\([^)]+\))(.*)/ my $not_paren = "[^(]+"; my $leftparen = "\\("; my $rightparen = "\\)"; my $rest = ".*"; # now reads: $string =~ /($not_paren)($leftparen$not_paren$rightparen)($rest)/;

      Cheers,
      Ovid

      Join the Perlmonks Setiathome Group or just click on the the link and check out our stats.

        Ah, yes, sorry. So much for not testing everything.

        Christian Lemburg
        Brainbench MVP for Perl
        http://www.brainbench.com

        Here's one more way to de-line-noise-ify the code:
        my $leftparen = quotemeta "("; my $rightparen = quotemeta ")";
        This is more legible than "\\(" or "\Q(\E". -- Frag.
      On the same subject, don't forget about the "x"-modifier for regexen. It basically allows you to split a regex into multiple lines and comment on every line.

      Allow me to quote from perlman:perlfaq6, Section 1


      /x lets you turn this:

      s{<(?:[^>'"]*|".*?"|'.*?')+>}{}gs;
      into this:

      s{ < # opening angle bracket (?: # Non-backreffing grouping paren [^>'"] * # 0 or more things that are neither > nor +' nor " | # or else ".*?" # a section between double quotes (stingy +match) | # or else '.*?' # a section between single quotes (stingy +match) ) + # all occurring one or more times > # closing angle bracket }{}gsx; # replace with nothing, i.e. delete
      It's still not quite so clear as prose, but it is very useful for describing the meaning of each part of the pattern.

      Again, hope that helps,
      Brother Ade

Re: Trying to avoid line noise (brain cramp)
by mirod (Canon) on Mar 09, 2001 at 17:58 UTC

    Instead of one big regexp that splits the line you could consume it chunk by chunk, which lets you break the regexp into smaller pieces and give names to the fields.

    In any case you will have to deal with the fact that (s and )s have to be escaped so it will look ugly no matter what you try. Is there a Perl 6 RFC that would allow us to change that character?

    Update: I have worked on the display part too, and figured that as naming things is good I should go all the way and use a hash to store the 3 parts of the SQL instruction. Plus I am not very familiar with hash slices so it was a good opportunity to use one ;--)

    #!/bin/perl -w use strict; my ( @results, %fields ); my $TABLE= 'Photo'; while(<DATA>){ s/\0//g; if ( s/(?<!from )(INSERT INTO $TABLE\s*)//) { my $table= $1; # easy to get +it here my $fields= $1 if( s/(\([^(]+\))//); # the complex +one my $values= $_; # just grab th +e rest push @results, { table => $table, # naming thing +s is good! fields => $fields, values => $values }; } } my @final = sort{ $a->{fields} cmp $b->{fields} } @results; # the so +rt is easier to understand open OUT, ">results.txt" or die $!; foreach my $insert ( @final ) { my $fieldNames = $insert->{fields}; if ( ! exists $fields{ $fieldNames } ) { $fields{ $fieldNames } = 1; print @$insert{qw(table fields values)}; # and n +ow the hash slice } } close OUT; __DATA__ go INSERT documents ( did, DocID, Template ) VALUES ( 3093, '000000000000 +000000003093', 'Photo' ) go SELECT * FROM documents WHERE did=3093 go INSERT INTO Photo (AccessLevelID, ANRNumber, ColorID, ContactSheetID) +VALUES (1, '225058', 1, 4) go exec new_int_id 'Photo', 'PhotoID' go exec new_str_id 'documents', 'DocId' go INSERT INTO Photo (QualityID, Remark, UniqueID, VolNum) VALUES (1, 'St +. Nicolaasfeest -v. Moorsel', '211357', '1') go
Re: Trying to avoid line noise (brain cramp)
by Masem (Monsignor) on Mar 09, 2001 at 17:40 UTC
    A couple more comments:

    Comment, comment, comment. If you interdisperse enough plain english (or whatever spoken language) amongst your code, it will look less like line noise and more like reasonable coding. In particular for lines that may seem to be 'line noise' due to just how perl handles them, in this case the regex lines, and the sort call. It makes it easy for the non-perl programmers to understand the logic here.

    Avoid using the 'shortcut' identifiers like $_ and $1, at least, assign other more descriptive variable names to these values, and if you have to specifically say what those are, comment them as well. It might add overhead, but if you need to design for maintainence and usability of the code, it's probably worth the tradeoff.

    For functions like open, exists, etc, where the parans around the arguments aren't necessary, it might help those more used to C and other languages to use parens to group arguements appropriately, such that they look less like 'loose words' and more like function calls.


    Dr. Michael K. Neylon - mneylon-pm@masemware.com || "You've left the lens cap of your mind on again, Pinky" - The Brain
      If you try to avoid $_, $' and the likes, you can't avoid the "English.pm" module.

      So, after use English;, instead of $_ you can use $ARG, for $´ you have $PREMATCH, $. is $INPUT_LINE_NUMBER... you get the idea

      See perlvar for details.

      HTH
      Brother Ade

Re: Trying to avoid line noise (brain cramp)
by japhy (Canon) on Mar 09, 2001 at 19:20 UTC
    <plug type="shameless">You could suggest they check out the OGRE for an explanation of the regex, too. ;) </plug>

    But honestly, you might want to avoid the regex altogether. My solution might be construed as worse line noise, though, but it depends on how you comment it.

    use warnings; use strict; my (@results, @final, %seen); while(<>){ tr/\0//d; # strip NULs # if we're INSERTing into Photo (but not from)... if (/(?<!from )INSERT INTO Photo/) { # get: BEFORE (INSIDE) AFTER from string my @s = split /[()]/; push @results, [ $s[0], "($s[1])", $s[2] ]; } } # sort by the values in parens @final = sort { $a->[1] cmp $b->[1] } @results; open OUT, ">results.txt" or die $!; for my $line (@final) { my $fieldNames = $line->[1]; # if we've never seen these values, print if (not $seen{$fieldNames}) { print OUT @$line; $seen{$fieldName}++; } } close OUT;
    Of course, you could split more trickily, but it looks more like line noise then.
    # split BEFORE a ( or AFTER a ) my @s = split /(?=\()|(?<=\))/; push @results, \@s; ### that even lets you get away with leaving out @s push @results, [ split /(?=\()|(?<=\))/ ];
    And you could condense the "seen" logic to:
    # display the line if we've never seen the fields print OUT @$line if not $seen{$line->[1]}++;
    But that might be too idiomatic for lazy people to spend time understanding.

    japhy -- Perl and Regex Hacker
Re: Trying to avoid line noise (brain cramp)
by ChOas (Curate) on Mar 09, 2001 at 17:23 UTC
    Hi!

    I am sorry, but..

    I might have misunderstood the question...
    would this work: ?
    #!/usr/bin/perl -w use strict; my %DataBase; while(<>) { $DataBase{$_}++ if /(?<!from )INSERT INTO Photo/; }; open OUT, ">results.txt" or die $!; print OUT for sort keys %DataBase; close OUT;


    GreetZ!,
      ChOas

    print "profeth still\n" if /bird|devil/;
Re: Trying to avoid line noise (brain cramp)
by satchboost (Scribe) on Mar 10, 2001 at 01:11 UTC

    One issue I would be concerned with is why you have non-PERL coders looking at PERL code. As a test tools developer for a major corporation, we have allowed our users to write "privates" for their own testing needs. We also provided an English-like scripting language (gotta love eval!!) for them to automate most of their activities. Because of this, we have maybe 3 or 4 users who actually bother trying to write their own privates and the rest simply use the provided scripting language.

    I guess what I'm saying is that if you treat PERL as a scripting language that anyone with half a brain should be able to read without a problem (like BASIC), then you're going to run into problems. If you treat it as a full-fledged programming language that happens to be interpreted and not compiled, then your users will treat it with the appropriate respect. A given user wouldn't try and read your C, FORTRAN, or COBOL code, would they?

      I have a boss that want to "see how it works" and although he is not a Perl coder, he does program on Progress DBs.
      My point is that sometimes you don't have a choice about showing the code. So I use the "Comment liberally" rule.

      And yes, though he is my boss, he's a user. (duh)

Re: Trying to avoid line noise (brain cramp)
by markjugg (Curate) on Mar 12, 2001 at 07:09 UTC
    Out of all the ideas posted above, the one that resonates most with me so far is the idea of using a hash instead of an array for part of it. I would start here:
    my @s = ( /([^(]+)(\([^)]+\))(.*)/ );
    Using a hash would clarify what's being matched here (I'm not sure myself. ). Maybe it would be something like this (using a hash slice).
    my @s{qw/table fields values/}) = ( /([^(]+)(\([^)]+\))(.*)/ );
    that means that later on in the code you would be comparing {fields} instead of [1], which would be more intuitive. That s usually a trigger point for me-- If I'm doing much with the arrays by calling out specific positions like that, I usually ask myself "would be clearer to use a hash here?".

    Perhaps related, one of my favorite uses for hash slices at the moment is to put the results of a DBI selectrow_array statement into a hash in a single step like:

    @row{qw/name company title/} = $DBH->selectrow_array("...");

    -mark

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://63232]
Approved by root
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others exploiting the Monastery: (4)
As of 2024-04-23 15:54 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found