Re: Cleaning Whitespace from Source Code
by davido (Cardinal) on Dec 08, 2004 at 22:35 UTC
It's not specifically related to just whitespace, but have you looked at Perl::Tidy? It will beautify and format your code for you.
Cleaning up whitespace in Perl code with regexps alone is problematic, since Perl code can contain both essential and nonessential whitespace. For example, unless you're using a real parser, it's difficult to tell the difference between a here-doc (whose whitespace is presumably significant) and ordinary code (where some whitespace is presumably insignificant).
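To make the here-doc problem concrete, here is a minimal sketch. The sample program and the variable names ($source, $cleaned) are invented for illustration; the point is only that a blanket substitution cannot tell the here-doc body from the code around it.

```perl
use strict;
use warnings;

# A tiny made-up program whose here-doc body relies on column alignment.
my $source = <<'END_SOURCE';
my   $x   =   1;
print <<"EOT";
Name      Value
x         $x
EOT
END_SOURCE

# Naive cleanup: squeeze every run of spaces or tabs down to one space.
(my $cleaned = $source) =~ s/[ \t]+/ /g;

# The assignment is tidied as intended, but the table inside the
# here-doc loses its alignment as well, which changes the program's output.
print $cleaned;
```

The first line of the sample is cleaned up harmlessly, while the two table rows inside the here-doc are altered too, so the "cleaned" program no longer prints what the original did.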
Re: Cleaning Whitespace from Source Code
by Eimi Metamorphoumai (Deacon) on Dec 08, 2004 at 22:33 UTC
Have you looked at Perltidy? It can do a lot of source code reformatting, and if it doesn't do what you want, it should at least be a good starting point for modification.
Re: Cleaning Whitespace from Source Code
by graff (Chancellor) on Dec 09, 2004 at 04:12 UTC
It's not clear (at least, not to me) what you mean by "cleaning whitespace from source code". The first interpretation that comes to my mind would be something like "removing all whitespace that is not syntactically significant to the compiler/interpreter" for each programming language in question.
If this is what you mean, there's still the question of what you want to do with comments in the source code (remove them entirely, or just normalize whitespace?); presumably, you'll need to be able to identify the beginnings and endings of quoted strings, so you can leave the enclosed whitespace as is (assuming you don't want to change the output of the program as a side effect of "cleaning" the source code). In any case, you're probably going to need something like Parse::RecDescent, which is, effectively, the perl version of "yacc".
It'll be a challenge, and I wish you luck, but once you work out how the rules are stated, and figure out the rules you need for each language, switching from one rule set to another should be pretty trivial.
If you mean something else by "cleaning whitespace", it would be hard for me to guess what that might be.
s{
    (                              # group 1: the whole match
        ' (?: \\. | [^'\\]+ )* '   # 'string' (backslash escapes honored)
      | " (?: \\. | [^"\\]+ )* "   # "string"
      | (?:^|\r*\n)[^\S\n]*        # line break plus leading spaces, kept as-is
      | ([^\S\n]+)                 # group 2: a run of trivial spaces
    )
}{
    defined($2) ? ' ' : $1
}gsex;
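A usage sketch for a substitution of this kind follows. The wrapper name squeeze_spaces and the sample inputs are hypothetical, and the pattern is written out again here (with backslashes excluded from the string character classes so that escaped quotes stay inside the string) so the example runs on its own. It handles only plain single- and double-quoted strings; here-docs, q//, comments and regex literals are not recognized.

```perl
use strict;
use warnings;

# Hypothetical wrapper: collapses runs of spaces/tabs outside of quoted
# strings to one space, leaving indentation and string contents alone.
sub squeeze_spaces {
    my ($code) = @_;
    $code =~ s{
        (
            ' (?: \\. | [^'\\]+ )* '   # single-quoted string
          | " (?: \\. | [^"\\]+ )* "   # double-quoted string
          | (?:^|\r*\n)[^\S\n]*        # line break plus indentation
          | ([^\S\n]+)                 # group 2: other horizontal space
        )
    }{
        defined($2) ? ' ' : $1
    }gsex;
    return $code;
}

my $in  = qq{my   \$s  =  "a   b";\n    print   \$s;\n};
my $out = squeeze_spaces($in);
print $out;
```

With this input, the spacing inside "a   b" and the four-space indent before print are preserved, while the other runs of spaces collapse to a single space.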
Re: Cleaning Whitespace from Source Code
by Fletch (Bishop) on Dec 08, 2004 at 23:59 UTC
Not perl, but many flavours of *NIX come with a program indent which reformats C code. If it's not available out of the box on your particular platform there's also a GNU version.
Re: Cleaning Whitespace from Source Code
by Prior Nacre V (Hermit) on Dec 09, 2004 at 08:55 UTC
As already stated above, it is unclear exactly what you are trying to achieve.
The following is probably too simplistic as a complete solution but the two methods described may be useful in parts of your final script.
- $/ is used here to remove duplicate blank lines
- y/// (alias tr///) is used here to remove duplicate spaces
(These are not their only uses.)
[ ~/tmp ] $ cat spacey_text
1 12 23 32 21 1
a b c d e
1 blank:
2 blanks:
3 blanks:
end_blanks
[ ~/tmp ] $ perl -we 'use strict; { local $/ = ""; while (<>) { $_ =~ y/ / /s; print $_; } }' spacey_text
1 12 23 32 21 1
a b c d e
1 blank:
2 blanks:
3 blanks:
end_blanks
[ ~/tmp ] $
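The same two techniques can be sketched as a standalone script. The sample text below is made up, and an in-memory filehandle stands in for the spacey_text file so the example is self-contained; treat it as an illustration of paragraph mode and tr///s rather than a finished tool.

```perl
use strict;
use warnings;

# Made-up sample with runs of spaces and runs of blank lines.
my $text = "a    b   c\n\n\n\nd  e\n\n\nend\n";

my $result = '';
{
    local $/ = "";    # paragraph mode: consecutive blank lines count as one
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    while (my $para = <$fh>) {
        $para =~ tr/ / /s;    # squeeze each run of spaces to a single space
        $result .= $para;
    }
    close $fh;
}
print $result;
```

Note that tr/ / /s touches every run of spaces, including those inside strings and indentation, so on real source code it has the same limitations discussed elsewhere in this thread.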
Re: Cleaning Whitespace from Source Code
by zentara (Archbishop) on Dec 09, 2004 at 11:58 UTC
As others have pointed out, you can only strip leading and trailing whitespace, unless you want to risk messing up the code. Do a google search for the C-C++ Beautifier HOW-TO.
It's very good. Indent and bcpp have options for specifying how much internal whitespace you want.
Basically, use perltidy for Perl, indent for C, bcpp for C++, and htmltidy for HTML. What I have is a little Perl script that strips all leading whitespace (and possibly leading line numbers followed by zero or one colon, or a semicolon typed by mistake); then I run the result through the above-mentioned beautifiers. And yes, occasionally I need to manually fix something when the thing breaks on an odd line or two.
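A pre-processing step like the one described might look roughly like this. It is not the actual script; the helper name strip_leader and the exact pattern are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: removes leading spaces/tabs and, if present, a
# leading line number followed by an optional colon or semicolon.
# Caveat: a line of real code that begins with digits (for example the
# "1;" that ends a Perl module) would be mangled too, which is one reason
# a manual check afterwards can still be needed.
sub strip_leader {
    my ($line) = @_;
    $line =~ s/^[ \t]*(?:\d+[:;]?[ \t]*)?//;
    return $line;
}

my @numbered = ("  12: my \$x = 1;\n", "13;   print \$x;\n", "\tfoo();\n");
print strip_leader($_) for @numbered;
```

The stripped lines would then be fed to perltidy, indent, or bcpp to restore consistent indentation.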
I'm not really a human, but I play one on earth.
flash japh
Re: Cleaning Whitespace from Source Code
by jonadab (Parson) on Dec 09, 2004 at 01:37 UTC
This isn't a Perl solution, but I'd just hook some
custom elisp functions up to cperl-mode for this. The
advantage of this approach is that the functions can look
at the syntax information in the text properties, so that
you can for example choose not to alter the whitespace
inside strings or comments, without writing any code to
parse where strings and comments begin or end -- one less
wheel to reinvent. The disadvantage is that you have to
know elisp, in addition to Perl.
"In adjectives, with the addition of inflectional endings, a changeable long vowel (Qamets or Tsere) in an open, propretonic syllable will reduce to Vocal Shewa. This type of change occurs when the open, pretonic syllable of the masculine singular adjective becomes propretonic with the addition of inflectional endings."
— Pratico & Van Pelt, BBHG, p68
Re: Cleaning Whitespace from Source Code
by teabag (Pilgrim) on Dec 09, 2004 at 15:30 UTC
perltidy -dws yourscript.pl
will create yourscript.pl.tdy with deleted whitespace.
You can try -mangle to delete newlines and make your script smaller, but it will make your script look absolutely awful. ;)
See "perltidy -h" or "man perltidy" for more useful options.
kind regards
teabag
Thank god there's only one way to peel a banana - Teabag