Trying to avoid line noise (brain cramp)

Ovid has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Trying to avoid line noise (brain cramp) by Tyke (Pilgrim) on Mar 09, 2001 at 17:18 UTC
This isn't going to be much help, but regular expressions are always going to look like line noise to someone who doesn't grok them. If you're writing for someone with little Perl knowledge, then all you can do is Write clean tight code. (which you have) Document it (which you were about to), even if it's on a line by line basis. _rant_on_ I'm personally getting to the stage where I am p*d off about pandering to developpers who want to use a programming language without making an effort to learn, not just the syntax, but also the idioms. If regular expressions are part of the language, then use them as they were meant to be used. If conditional statement modifiers are there, then what's the point of pretending that they aren't? _rant_off_* Just about the only thing that you can do is to teach the perl-deprived; and one of the most immediate ways to do that is to fully document the code. Just my two centimes. `-- I knock my pate and fancy wit will come Knock as I please, there's no one at home a pontiff paraphrased` [download]	[reply] [d/l]
Re: Trying to avoid line noise (brain cramp) by Corion (Patriarch) on Mar 09, 2001 at 17:23 UTC
The main problem I see is, that you are using idiomatic Perl to solve the problem, which makes the solution elegant but requires Perl knowledge to understand the stuff. I see only one good solution to your problem, which does not change the code, because I see only minor things to change there (in fact, I'd change the `die` part to something like `... or die "couldn't open output file '$filename' : $!\n"`. I would put a comment that describes what each of the idiomatic constructs does, like this : my ( @results, %fields ); while(<>){ # Strip all NULL characters out of the current line s/\0//g; if (/(?<!from )INSERT INTO Photo/) { # Split that line into three parts and save a reference to i +t my @s = ( /([^(]+)($[^)]+$)(.*)/ ); push @results, \@s; } } my @final = sort{ $a->[1] cmp $b->[1] } @results; open OUT, ">results.txt" or die $!; foreach( @final ) { # Print only one occurence of each statement my $fieldNames = $_->[1]; if ( ! exists $fields{ $fieldNames } ) { $fields{ $fieldNames } = 1; print OUT @{ $_ } } } close OUT; [download] This will of course need some learning of Perl on the recipient side, but I think that those comment lines should help the "casual reader" to understand the program flow (if the "casual reader" knows english that is.	[reply] [d/l]
Re: Trying to avoid line noise (brain cramp) by clemburg (Curate) on Mar 09, 2001 at 18:38 UTC
Avoiding line noise in Perl is mainly about how to handle regexes, IMHO. A good tip here is to use variables in the regexes: `# original: /([^(]+)($[^)]+$)(.)/ my $not_paren = "[^(]+"; my $paren = "\\("; my $rest = "."; # now reads: $string =~ /($not_paren)($paren$not_paren$paren)($rest)/;` [download] This will do a good job to make the code readable to other programmers, in my experience. Christian Lemburg Brainbench MVP for Perl http://www.brainbench.com	[reply] [d/l]
Re: Re: Trying to avoid line noise (brain cramp) by Ovid (Cardinal) on Mar 09, 2001 at 18:45 UTC
I like your code, but there's a small error: `# original: /([^(]+)($[^)]+$)(.)/ my $not_paren = "[^(]+"; my $leftparen = "\$"; my $rightparen = "\$"; my $rest = "."; # now reads: $string =~ /($not_paren)($leftparen$not_paren$rightparen)($rest)/;` [download] Cheers, Ovid Join the Perlmonks Setiathome Group or just click on the the link and check out our stats.	[reply] [d/l]
Re: Re: Re: Trying to avoid line noise (brain cramp) by clemburg (Curate) on Mar 09, 2001 at 18:47 UTC
Ah, yes, sorry. So much for not testing everything. Christian Lemburg Brainbench MVP for Perl http://www.brainbench.com	[reply]
Re: Re: Re: Trying to avoid line noise (brain cramp) by frag (Hermit) on Mar 09, 2001 at 20:51 UTC
Here's one more way to de-line-noise-ify the code: `my $leftparen = quotemeta "("; my $rightparen = quotemeta ")";` [download] This is more legible than "\\(" or "\Q(\E". -- Frag.	[reply] [d/l]
Re: Re: Trying to avoid line noise (brain cramp) by BrotherAde (Pilgrim) on Mar 09, 2001 at 18:53 UTC
On the same subject, don't forget about the "x"-modifier for regexen. It basically allows you to split a regex into multiple lines and comment on every line. Allow me to quote from perlman:perlfaq6, Section 1 /x lets you turn this: `s{<(?:[^>'"]\|".?"\|'.?')+>}{}gs;` [download] into this: `s{ < # opening angle bracket (?: # Non-backreffing grouping paren [^>'"] # 0 or more things that are neither > nor +' nor " \| # or else ".?" # a section between double quotes (stingy +match) \| # or else '.?' # a section between single quotes (stingy +match) ) + # all occurring one or more times > # closing angle bracket }{}gsx; # replace with nothing, i.e. delete` [download] It's still not quite so clear as prose, but it is very useful for describing the meaning of each part of the pattern. Again, hope that helps, Brother Ade	[reply] [d/l] [select]
Re: Trying to avoid line noise (brain cramp) by mirod (Canon) on Mar 09, 2001 at 17:58 UTC
Instead of one big regexp that splits the line you could consume it chunk by chunk, which lets you break the regexp into smaller pieces and give names to the fields. In any case you will have to deal with the fact that `(`s and `)`s have to be escaped so it will look ugly no matter what you try. Is there a Perl 6 RFC that would allow us to change that character? Update: I have worked on the display part too, and figured that as naming things is good I should go all the way and use a hash to store the 3 parts of the SQL instruction. Plus I am not very familiar with hash slices so it was a good opportunity to use one ;--) #!/bin/perl -w use strict; my ( @results, %fields ); my $TABLE= 'Photo'; while(<DATA>){ s/\0//g; if ( s/(?<!from )(INSERT INTO $TABLE\s)//) { my $table= $1; # easy to get +it here my $fields= $1 if( s/($[^(]+$)//); # the complex +one my $values= $_; # just grab th +e rest push @results, { table => $table, # naming thing +s is good! fields => $fields, values => $values }; } } my @final = sort{ $a->{fields} cmp $b->{fields} } @results; # the so +rt is easier to understand open OUT, ">results.txt" or die $!; foreach my $insert ( @final ) { my $fieldNames = $insert->{fields}; if ( ! exists $fields{ $fieldNames } ) { $fields{ $fieldNames } = 1; print @$insert{qw(table fields values)}; # and n +ow the hash slice } } close OUT; __DATA__ go INSERT documents ( did, DocID, Template ) VALUES ( 3093, '000000000000 +000000003093', 'Photo' ) go SELECT FROM documents WHERE did=3093 go INSERT INTO Photo (AccessLevelID, ANRNumber, ColorID, ContactSheetID) +VALUES (1, '225058', 1, 4) go exec new_int_id 'Photo', 'PhotoID' go exec new_str_id 'documents', 'DocId' go INSERT INTO Photo (QualityID, Remark, UniqueID, VolNum) VALUES (1, 'St +. Nicolaasfeest -v. Moorsel', '211357', '1') go [download]	[reply] [d/l]
Re: Trying to avoid line noise (brain cramp) by Masem (Monsignor) on Mar 09, 2001 at 17:40 UTC
A couple more comments: Comment, comment, comment. If you interdisperse enough plain english (or whatever spoken language) amongst your code, it will look less like line noise and more like reasonable coding. In particular for lines that may seem to be 'line noise' due to just how perl handles them, in this case the regex lines, and the sort call. It makes it easy for the non-perl programmers to understand the logic here. Avoid using the 'shortcut' identifiers like $_ and $1, at least, assign other more descriptive variable names to these values, and if you have to specifically say what those are, comment them as well. It might add overhead, but if you need to design for maintainence and usability of the code, it's probably worth the tradeoff. For functions like open, exists, etc, where the parans around the arguments aren't necessary, it might help those more used to C and other languages to use parens to group arguements appropriately, such that they look less like 'loose words' and more like function calls. Dr. Michael K. Neylon - mneylon-pm@masemware.com \|\| "You've left the lens cap of your mind on again, Pinky" - The Brain	[reply]
Re: Re: Trying to avoid line noise (brain cramp) by BrotherAde (Pilgrim) on Mar 09, 2001 at 18:43 UTC
If you try to avoid $_, $' and the likes, you can't avoid the "English.pm" module. So, after `use English;`, instead of $_ you can use $ARG, for $� you have $PREMATCH, $. is $INPUT_LINE_NUMBER... you get the idea See perlvar for details. HTH Brother Ade	[reply] [d/l]
Re: Trying to avoid line noise (brain cramp) by japhy (Canon) on Mar 09, 2001 at 19:20 UTC
`<plug type="shameless">`You could suggest they check out the OGRE for an explanation of the regex, too. `;) </plug>` But honestly, you might want to avoid the regex altogether. My solution might be construed as worse line noise, though, but it depends on how you comment it. use warnings; use strict; my (@results, @final, %seen); while(<>){ tr/\0//d; # strip NULs # if we're INSERTing into Photo (but not from)... if (/(?<!from )INSERT INTO Photo/) { # get: BEFORE (INSIDE) AFTER from string my @s = split /[()]/; push @results, [ $s[0], "($s[1])", $s[2] ]; } } # sort by the values in parens @final = sort { $a->[1] cmp $b->[1] } @results; open OUT, ">results.txt" or die $!; for my $line (@final) { my $fieldNames = $line->[1]; # if we've never seen these values, print if (not $seen{$fieldNames}) { print OUT @$line; $seen{$fieldName}++; } } close OUT; [download] Of course, you could split more trickily, but it looks more like line noise then. `# split BEFORE a ( or AFTER a ) my @s = split /(?=$)\|(?<=$)/; push @results, \@s; ### that even lets you get away with leaving out @s push @results, [ split /(?=$)\|(?<=$)/ ];` [download] And you could condense the "seen" logic to: `# display the line if we've never seen the fields print OUT @$line if not $seen{$line->[1]}++;` [download] But that might be too idiomatic for lazy people to spend time understanding. `japhy` -- Perl and Regex Hacker	[reply] [d/l] [select]
Re: Trying to avoid line noise (brain cramp) by ChOas (Curate) on Mar 09, 2001 at 17:23 UTC
Hi! I am sorry, but.. I might have misunderstood the question... would this work: ? `#!/usr/bin/perl -w use strict; my %DataBase; while(<>) { $DataBase{$_}++ if /(?<!from )INSERT INTO Photo/; }; open OUT, ">results.txt" or die $!; print OUT for sort keys %DataBase; close OUT;` [download] GreetZ!, ChOas print "profeth still\n" if /bird\|devil/;	[reply] [d/l]
Re: Trying to avoid line noise (brain cramp) by satchboost (Scribe) on Mar 10, 2001 at 01:11 UTC
One issue I would be concerned with is why you have non-PERL coders looking at PERL code. As a test tools developer for a major corporation, we have allowed our users to write "privates" for their own testing needs. We also provided an English-like scripting language (gotta love eval!!) for them to automate most of their activities. Because of this, we have maybe 3 or 4 users who actually bother trying to write their own privates and the rest simply use the provided scripting language. I guess what I'm saying is that if you treat PERL as a scripting language that anyone with half a brain should be able to read without a problem (like BASIC), then you're going to run into problems. If you treat it as a full-fledged programming language that happens to be interpreted and not compiled, then your users will treat it with the appropriate respect. A given user wouldn't try and read your C, FORTRAN, or COBOL code, would they?	[reply]
Re: Re: Trying to avoid line noise (brain cramp) by AltGrendel (Acolyte) on Mar 10, 2001 at 07:24 UTC
I have a boss that want to "see how it works" and although he is not a Perl coder, he does program on Progress DBs. My point is that sometimes you don't have a choice about showing the code. So I use the "Comment liberally" rule. And yes, though he is my boss, he's a user. (duh)	[reply]
Re: Trying to avoid line noise (brain cramp) by markjugg (Curate) on Mar 12, 2001 at 07:09 UTC
Out of all the ideas posted above, the one that resonates most with me so far is the idea of using a hash instead of an array for part of it. I would start here: `my @s = ( /([^(]+)($[^)]+$)(.)/ );` [download] Using a hash would clarify what's being matched here (I'm not sure myself. ). Maybe it would be something like this (using a hash slice). `my @s{qw/table fields values/}) = ( /([^(]+)($[^)]+$)(.)/ );` [download] that means that later on in the code you would be comparing {fields} instead of [1], which would be more intuitive. That s usually a trigger point for me-- If I'm doing much with the arrays by calling out specific positions like that, I usually ask myself "would be clearer to use a hash here?". Perhaps related, one of my favorite uses for hash slices at the moment is to put the results of a DBI selectrow_array statement into a hash in a single step like: `@row{qw/name company title/} = $DBH->selectrow_array("...");` [download] -mark	[reply] [d/l] [select]


more useful options
	PerlMonks