
Cleaning Whitespace from Source Code

by hackdaddy (Hermit)
on Dec 08, 2004 at 22:24 UTC (#413352=perlquestion)

hackdaddy has asked for the wisdom of the Perl Monks concerning the following question:

I am creating a Perl script to remove whitespace from source code using regular expressions.

Without reinventing the wheel, are there any existing Perl scripts, modules, or tools for cleaning whitespace from source code?

What is the best way to create a program to solve this problem? It must have a specific set of rules that can be turned on and off.


Update: The script must support and have rule sets for different languages such as C, C++, C#, etc.

Replies are listed 'Best First'.
Re: Cleaning Whitespace from Source Code
by davido (Cardinal) on Dec 08, 2004 at 22:35 UTC

    It's not specifically related to just whitespace, but have you looked at Perl::Tidy? It will beautify and format your code for you.

    Cleaning up whitespace in Perl code with regexps alone is problematic, since Perl code can contain both essential and nonessential whitespace. For example, unless you're using a real parser, it's difficult to tell the difference between a here-doc (which presumably has significant whitespace) and code (which presumably has some insignificant whitespace).
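To make the here-doc problem concrete, here is a minimal sketch. The sample snippet and the naive substitution are made up for illustration and are not anyone's actual cleanup script.

```perl
use strict;
use warnings;

# Hypothetical sample: a here-doc whose internal spacing is part of the data.
my $source = <<'END_SOURCE';
if ($verbose)   {
    print <<"USAGE";
    usage:   prog   [options]
USAGE
}
END_SOURCE

# Naive "cleanup": collapse every run of spaces or tabs to a single space.
(my $cleaned = $source) =~ s/[ \t]+/ /g;

print $cleaned;
```

The substitution cannot tell the here-doc body from the code around it, so the usage text the program would print changes along with the formatting.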


Re: Cleaning Whitespace from Source Code
by Eimi Metamorphoumai (Deacon) on Dec 08, 2004 at 22:33 UTC
    Have you looked at Perltidy? It can do a lot of source code reformatting, and if it doesn't do what you want, it should at least be a good starting point for modification.
Re: Cleaning Whitespace from Source Code
by graff (Chancellor) on Dec 09, 2004 at 04:12 UTC
    It's not clear (at least, not to me) what you mean by "cleaning whitespace from source code". The first interpretation that comes to my mind would be something like "removing all whitespace that is not syntactically significant to the compiler/interpreter" for each programming language in question.

    If this is what you mean, there's still the question of what you want to do with comments in the source code (remove them entirely, or just normalize whitespace?); presumably, you'll need to be able to identify the beginnings and endings of quoted strings, so you can leave the enclosed whitespace as is (assuming you don't want to change the output of the program as a side effect of "cleaning" the source code). In any case, you're probably going to need something like Parse::RecDescent, which is, effectively, the perl version of "yacc".

    It'll be a challenge, and I wish you luck, but once you work out how the rules are stated, and figure out the rules you need for each language, switching from one rule set to another should be pretty trivial.

    If you mean something else by "cleaning whitespace", it would be hard for me to guess what that might be.
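As a rough sketch of how switchable rule sets might be laid out: the rule names, the per-language lists, and the clean() helper below are all hypothetical. The rules work line by line and know nothing about strings or comments, so this shows only the on/off mechanism, not a safe cleaner.

```perl
use strict;
use warnings;

# Hypothetical rules: each takes one line (without its newline) and returns the cleaned line.
my %rule = (
    strip_trailing => sub { my $l = shift; $l =~ s/[ \t]+$//; $l },
    tabs_to_spaces => sub { my $l = shift; $l =~ s/\t/    /g; $l },
);

# Hypothetical per-language rule sets; edit a list to turn a rule on or off.
my %rule_set = (
    c    => [qw(strip_trailing tabs_to_spaces)],
    perl => [qw(strip_trailing)],
);

# Apply the rules enabled for $lang, in order, to every line of $text.
sub clean {
    my ($lang, $text) = @_;
    my @rules = @rule{ @{ $rule_set{$lang} } };
    my @lines = split /\n/, $text, -1;    # -1 keeps a trailing empty field
    for my $line (@lines) {
        $line = $_->($line) for @rules;
    }
    return join "\n", @lines;
}

print clean(c => "int x;\t \n\treturn x;  \n");
```

Keeping the rules in one hash and the language choices in another means a new language is just another list of rule names.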

      Parser, smarser. Just use a simple regex. As a start:

      s{
          (                           # $1 for whole match
              ' (?: \\. | [^']+ )* '  # 'string'
          |   " (?: \\. | [^"]+ )* "  # "string"
          |   (?:^|\r*\n)[^\S\n]*     # leading spaces
          |   ([^\S\n]+)              # $2 is trivial spaces
          )
      }{
          $2 ? ' ' : $1
      }gsex

      - tye        
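A small usage sketch of the substitution above, applied to a made-up two-line input. The sample text and variable names are assumptions, and the regex comments are reworded; the pattern itself is unchanged.

```perl
use strict;
use warnings;

# Hypothetical input: runs of spaces between tokens and inside a string literal.
my $code = <<'END_CODE';
my  $s   =  "a   b";
    print   $s ;
END_CODE

(my $clean = $code) =~ s{
    (                             # group 1: the whole match
        ' (?: \\. | [^']+ )* '    # single-quoted string, kept as is
    |   " (?: \\. | [^"]+ )* "    # double-quoted string, kept as is
    |   (?:^|\r*\n)[^\S\n]*       # line break plus leading indentation, kept
    |   ([^\S\n]+)                # group 2: any other run of blanks
    )
}{ $2 ? ' ' : $1 }gsex;

print $clean;
```

Runs of blanks between tokens collapse to one space, while indentation and the contents of quoted strings are left alone. The pattern knows nothing about comments, here-docs, or regex literals, so an apostrophe in a comment would be treated as the start of a string.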

Re: Cleaning Whitespace from Source Code
by Fletch (Chancellor) on Dec 08, 2004 at 23:59 UTC

    Not perl, but many flavours of *NIX come with a program indent which reformats C code. If it's not available out of the box on your particular platform there's also a GNU version.

Re: Cleaning Whitespace from Source Code
by Prior Nacre V (Hermit) on Dec 09, 2004 at 08:55 UTC

    As already stated above, it is unclear exactly what you are trying to achieve.

    The following is probably too simplistic as a complete solution but the two methods described may be useful in parts of your final script.

    • $/ is used here to remove duplicate blank lines
    • y/// (alias tr///) is used here to remove duplicate spaces

    (These are not their only uses.)

    [ ~/tmp ] $ cat spacey_text
    1 12  23   32  21 1
    a b c d e
    1 blank:

    2 blanks:


    3 blanks:



    end_blanks
    [ ~/tmp ] $ perl -we 'use strict; { local $/ = ""; while (<>) { $_ =~ y/ / /s; print $_; } }' spacey_text
    1 12 23 32 21 1
    a b c d e
    1 blank:

    2 blanks:

    3 blanks:

    end_blanks
    [ ~/tmp ] $
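The same two techniques written out as a short script, as a sketch. It reads from an in-memory string rather than a file so that it is self-contained; the sample text and variable names are made up.

```perl
use strict;
use warnings;

# Hypothetical input standing in for a file: extra spaces and extra blank lines.
my $text = "a  b   c\n\n\n\nd    e\n";
open my $in, '<', \$text or die "cannot open in-memory file: $!";

my $out = '';
{
    local $/ = "";              # paragraph mode: a run of blank lines ends one record
    while (my $para = <$in>) {
        $para =~ tr/ / /s;      # squeeze each run of spaces to a single space
        $out .= $para;
    }
}
print $out;
```

Note that the squeeze applies to every run of spaces, including indentation and spaces inside string literals, so on real source code it would need to be restricted to the places where that is safe.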



Re: Cleaning Whitespace from Source Code
by zentara (Archbishop) on Dec 09, 2004 at 11:58 UTC
    As others have pointed out, you can only strip leading and trailing whitespace, unless you want to risk messing up the code. Do a google search for the C-C++ Beautifier HOW-TO. It's very good. Indent and bcpp have options for specifying how much internal whitespace you want.

    Basically, for Perl use perltidy, for C use indent, for C++ use bcpp, and htmltidy for HTML. What I have is a little Perl script that strips all leading whitespace (and possibly leading line numbers followed by zero or one colon, or a semicolon typo), then I run it through the above-mentioned beautifiers. And yes, occasionally I need to manually fix something, when the thing breaks on an odd line or two.
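The script itself is not shown, so the following is only a guess at what such a pre-pass could look like. The strip_prefix name and the exact pattern are assumptions.

```perl
use strict;
use warnings;

# Strip leading blanks and an optional pasted line number ("12", "12:" or "12;").
# Assumes the line has already been chomped.
sub strip_prefix {
    my ($line) = @_;
    $line =~ s/^[ \t]*(?:\d+[:;]?[ \t]*)?//;
    return $line;
}

print strip_prefix($_), "\n"
    for '  10: int main(void) {', '11   return 0;', "\t12; }";
```

A line of code that genuinely starts with a number would lose it too, which is one reason the output may still need fixing by hand before it goes to the beautifier.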

    I'm not really a human, but I play one on earth. flash japh
Re: Cleaning Whitespace from Source Code
by jonadab (Parson) on Dec 09, 2004 at 01:37 UTC
    This isn't a Perl solution, but I'd just hook some custom elisp functions up to cperl-mode for this. The advantage of this approach is that the functions can look at the syntax information in the text properties, so that you can for example choose not to alter the whitespace inside strings or comments, without writing any code to parse where strings and comments begin or end -- one less wheel to reinvent. The disadvantage is that you have to know elisp, in addition to Perl.

    "In adjectives, with the addition of inflectional endings, a changeable long vowel (Qamets or Tsere) in an open, propretonic syllable will reduce to Vocal Shewa. This type of change occurs when the open, pretonic syllable of the masculine singular adjective becomes propretonic with the addition of inflectional endings."  — Pratico & Van Pelt, BBHG, p68
Re: Cleaning Whitespace from Source Code
by teabag (Pilgrim) on Dec 09, 2004 at 15:30 UTC
    perltidy -dws
    will create output with whitespace deleted.
    You can try -mangle to delete newlines and make your script smaller,
    but it will make your script look absolutely awful. ;)

    See "perltidy -h" or "man perltidy" for more useful options.

    kind regards

    Thank god there's only one way to peel a banana - Teabag

Node Type: perlquestion [id://413352]
Approved by Eimi Metamorphoumai