PerlMonks  

Range of chars except one

by Marcello (Hermit)
on Jan 19, 2002 at 20:59 UTC (id://140088)

Marcello has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I use the following regexp to remove all unwanted characters (ord 0..31)
my $startCh = chr(0);
my $endCh   = chr(31);
$text =~ s/[$startCh-$endCh]//g;
This works fine, but I want to preserve the newlines (ord 10). Is there any way to specify elements in the regexp which should NOT be included in the specified range?
I know I could do
$startCh = chr(0);
$endCh   = chr(9);
$text =~ s/[$startCh-$endCh]//g;
$startCh = chr(11);
$endCh   = chr(31);
$text =~ s/[$startCh-$endCh]//g;
but then I run into problems when I want to exclude another character.

TIA!

Re: Range of chars except one
by japhy (Canon) on Jan 19, 2002 at 21:16 UTC
    You'll be able to do that soonish. I'm working on a patch to Perl that will allow you to do character class set operations: s/[$start-$end^^$keep]//. That is, if $start = "a", and $end = "z", and $keep = "aeiou", the character class would be everything from "a" to "z" except "a", "e", "i", "o", and "u".
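    For readers on a newer Perl: as far as I know, the ^^ syntax described above never shipped in that form, but Perl 5.18 added extended bracketed character classes, (?[ ... ]), in which the - operator subtracts one set from another. A minimal sketch, assuming Perl 5.18 or later (the feature was marked experimental before 5.36, hence the conditional warnings line); strip_controls is a hypothetical name:

```perl
use 5.018;      # (?[ ]) needs Perl 5.18 or later
use strict;
use warnings;
# Only older Perls warn that (?[ ]) is experimental; silence that there.
no if $] < 5.036, warnings => 'experimental::regex_sets';

# Delete \x00-\x1f except newline: the set [\x00-\x1f] minus the set [\n].
sub strip_controls {
    my ($text) = @_;
    $text =~ s/(?[ [\x00-\x1f] - [\n] ])+//g;
    return $text;
}

print strip_controls("one\ttwo\nthree\x07"), "\n";   # prints "onetwo", a newline, then "three"
```

    Adding another character to keep is just another set in the subtraction, e.g. [\n] + [\t].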

    _____________________________________________________
    Jeff[japhy]Pinyan: Perl, regex, and perl hacker.
    s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

Re: Range of chars except one
by Juerd (Abbot) on Jan 19, 2002 at 21:49 UTC
    Until japhy's patch is implemented (japhy++ btw), you'll have to use two ranges (you can do that within a single character class). For clarity, I won't use variables, but the literal characters (\x00 .. \x1f). (I know \x00 can be written as \0, but I like consistency.)

    $text =~ s/[\x00-\x09\x0b-\x1f]+//g;   # + for speed
    # or
    $text =~ tr/\x00-\x09\x0b-\x1f//d;     # tr/// for even more speed
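    To check the speed remarks on your own machine, here is a minimal sketch using the core Benchmark module. It first confirms that the s/// and tr/// forms give identical results; the sample string, the sub names, and the RUN_BENCH environment switch are made up for illustration:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Both delete \x00-\x1f except \x0a (newline): the two ranges skip \x0a.
sub strip_s  { my ($t) = @_; $t =~ s/[\x00-\x09\x0b-\x1f]+//g; return $t }
sub strip_tr { my ($t) = @_; $t =~ tr/\x00-\x09\x0b-\x1f//d;   return $t }

# Sample input: every 7-bit ASCII character, repeated.
my $sample = join '', map { chr($_) } 0 .. 127;
$sample x= 100;

die "results differ\n" unless strip_s($sample) eq strip_tr($sample);

# Timings are machine-dependent; RUN_BENCH is a hypothetical opt-in switch.
cmpthese(-1, {
    's///'  => sub { strip_s($sample) },
    'tr///' => sub { strip_tr($sample) },
}) if $ENV{RUN_BENCH};
```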

    Another solution, but quite a bit slower, would be using a negative look-ahead assertion:
    Update (200201191932+0100): danger has a better solution: negative look-behind. This saves a lot of time, because only the characters that match the first range are subject to the assertion.
    # OLD: $text =~ s/(?!\n)[\x00-\x1f]//g;
    $text =~ s/[\x00-\x1f](?<!\n)//g;   # (Can't use + now)
    # or, if you need to exclude another character, let's say \n and \r:
    # OLD: $text =~ s/(?![\r\n])[\x00-\x1f]//g;
    $text =~ s/[\x00-\x1f](?<![\r\n])//g;
    # Note: it's usually better to avoid \r and \n and use
    # literals like \cM and \cJ, or \015, \012, or \x0d, \x0a.
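    A minimal, runnable sketch of the look-behind idea, assuming the goal from the question (delete control characters unless they are \r or \n); strip_keep_crlf is a hypothetical name. Grouping the character and its assertion in (?:...) lets + apply to the whole unit again:

```perl
use strict;
use warnings;

# Delete control characters (\x00-\x1f) unless they are \r or \n.
# The look-behind re-checks the character just matched; wrapping the pair
# in (?:...) lets + remove whole runs of unwanted characters at once.
sub strip_keep_crlf {
    my ($text) = @_;
    $text =~ s/(?:[\x00-\x1f](?<![\r\n]))+//g;
    return $text;
}

my $out = strip_keep_crlf("a\tb\r\nc\x00d");
print $out eq "ab\r\ncd" ? "ok\n" : "unexpected\n";
```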


    HTH. (Warning: untested code)

    2;0 juerd@ouranos:~$ perl -e'undef christmas' Segmentation fault 2;139 juerd@ouranos:~$

Re: Range of chars except one
by danger (Priest) on Jan 19, 2002 at 23:08 UTC

    Two ways come to mind. First, like Juerd's, use an assertion (lookbehind in this example):

    my $delete  = join '', 'a'..'z';   # remove all lower case letters
    my $exclude = 'aeiou';             # excluding vowels
    $_ = 'Just Another Perl Hacker';
    s/(?:[$delete](?<![$exclude]))+//g;
    print;

    Or, slightly more complex, remove the exclusion characters from the deletion set first. This should prove more efficient, especially if you are repeatedly applying the deletion regex after you create it:

    my $delete  = join '', 'a'..'z';   # remove all lower case letters
    my $exclude = 'aeiou';
    $delete =~ s/[$exclude]+//g;       # excluding vowels
    $_ = 'Just Another Perl Hacker';
    s/[$delete]+//g;
    print;

    Either way, it is relatively simple to add or remove characters from your exclusion set.

    Update: By "slightly more complex" I mean that it entails an extra step on your part --- in truth, I think it is the logically simpler version.
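    Applied to the original question, danger's second approach might look like the minimal sketch below. The @keep list (newline and tab) and the sample string are hypothetical; quotemeta (via \Q...\E) is used so that no character in the set can act as a character-class metacharacter:

```perl
use strict;
use warnings;

# Start from the full set chr(0)..chr(31), drop the characters to keep,
# then build one character class from what is left.
my @keep   = ("\n", "\t");                       # hypothetical exclusion list
my %keep   = map { $_ => 1 } @keep;
my $delete = join '', grep { !$keep{$_} } map { chr($_) } 0 .. 31;

# Compile once; \Q...\E escapes every character of $delete inside [...].
my $class = qr/[\Q$delete\E]+/;

my $text = "tab\there\nbell\x07 null\x00 cr\r";
$text =~ s/$class//g;
print $text;
```

    Exempting another character is then a one-word change to @keep.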

Re: Range of chars except one
by synapse0 (Pilgrim) on Jan 20, 2002 at 03:06 UTC
    I'm surprised no one's mentioned tr///. You'll still have to give it two ranges, but it works well. It also has the added advantage (so far as I know) of being more efficient than s///.
    You can give it the hex values, for example:
    $string =~ tr/\x00-\x04\x07-\x09//d;

    This'll get rid of the range \x00 through \x04, leave characters \x05 and \x06, and delete the range \x07 through \x09.
    Of course, it's too early for me to think straight, so the ranges given are arbitrary (they don't match your question), but you get the idea.

    -SynZero
      This can be made even simpler. Remember that if a character is listed in a tr/// searchlist more than once, only the first occurrence is meaningful. This means you can do something like:
      $string =~ tr/\n\000-\037/\n/d;
      (Note I'm using octal here). This specifies that the newline character is replaced by itself, while everything else in that range is deleted.

      There are three advantages to this method:

    • It's (IMHO) cleaner and easier to read.
    • It's easier to modify, e.g. to exempt additional characters from being deleted.
    • It doesn't care what character is used for newline on the current platform. For example, I seem to remember that \n and \r have their meanings reversed on one platform (Macintosh?) because the platform standard is to use CR for line breaks.
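      To illustrate the second advantage, here is a minimal sketch that also exempts tab; strip_controls is a hypothetical name. Each exempted character goes at the front of the searchlist and is mapped to itself, so its later appearance inside \000-\037 is ignored:

```perl
use strict;
use warnings;

# Only the first occurrence of a character in a tr/// searchlist counts.
# \n and \t come first and map to themselves; the rest of \000-\037 has
# no replacement, so /d deletes it. To exempt more characters, add them
# to both lists in the same order.
sub strip_controls {
    my ($s) = @_;
    $s =~ tr/\n\t\000-\037/\n\t/d;
    return $s;
}

my $clean = strip_controls("col1\tcol2\r\nnext\x1bline");
print $clean, "\n";
```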
        True, I thought about tr/// too... but "a range of chars except one" could also be read as "a range of chars PLUS one" (I like to turn things around). tr/// also offers a complement modifier (/c).
        So when I need to delete characters, it seems simpler to list the ones I actually want to keep, which keeps me from forgetting some:
        $_ = "123,Let's keep letters!\nAnd new lines...";
        tr/A-Za-z \n//cd;
        print;
        will print:
        Lets keep letters
        And new lines

        Creating that range could be easier...

        It's just my two cents.
        Freddo
      I was thinking along the same lines, but the fact that Marcello is using variables rather than constants to define the range implies that his search ranges may vary. To use tr here with differing ranges, he must use eval. As we read in perlop,
      Because the transliteration table is built at compile time, neither the SEARCHLIST nor the REPLACEMENTLIST are subjected to double quote interpolation. That means that if you want to use variables, you must use an eval():
      eval "tr/$oldlist/$newlist/";
      die $@ if $@;
      # or
      eval "tr/$oldlist/$newlist/, 1" or die $@;
      HTH,

      Josh
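      Following the eval advice above, a minimal sketch that compiles the tr/// once and reuses it; make_deleter is a hypothetical helper, and spelling the searchlist as \x{..} escapes is an assumption chosen so that no raw control character or tr/// metacharacter ends up in the eval'd source:

```perl
use strict;
use warnings;

# Hypothetical helper: build a deleter for code points $start..$end except
# those in @keep. tr/// tables are fixed at compile time, so the searchlist
# is spelled out as \x{..} escapes and compiled once with a string eval;
# the returned closure can then be called repeatedly without recompiling.
sub make_deleter {
    my ($start, $end, @keep) = @_;
    my %keep   = map { $_ => 1 } @keep;
    my $search = join '', map { sprintf '\x{%x}', $_ }
                          grep { !$keep{$_} } $start .. $end;
    # $_[0] is aliased to the caller's variable, so tr/// edits it in place.
    my $deleter = eval "sub { \$_[0] =~ tr/$search//d }" or die $@;
    return $deleter;
}

my $strip = make_deleter(0, 31, ord "\n");   # the question's case: keep \n
my $text  = "a\tb\nc\x01d";
$strip->($text);
```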
Re: Range of chars except one
by petdance (Parson) on Jan 20, 2002 at 00:28 UTC
    All of the suggestions that have been posted are good ones, and you'll note that they're nothing like what you were trying. Perl is its own idiosyncratic self.

    Perl has an entirely different feel from what you're used to now (I'm guessing VB or C). Let it wash over you and enjoy it.

    xoxo,
    Andy
    --
    <megaphone> Throw down the gun and tiara and come out of the float! </megaphone>

Re: Range of chars except one
by I0 (Priest) on Jan 20, 2002 at 00:26 UTC
    $text =~ s/[^\x20-\xff\x0a]//g;
      Not all data is ASCII. Your piece of code will remove all Unicode characters above \xff, which may or may not be what you want.

      2;0 juerd@ouranos:~$ perl -e'undef christmas' Segmentation fault 2;139 juerd@ouranos:~$
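      A minimal sketch of the difference, using a made-up sample string containing the euro sign (U+20AC): the negated class deletes it along with the tab, while naming only the unwanted ranges leaves it alone:

```perl
use strict;
use warnings;

# I0's negated class keeps only \x20-\xff plus \n, so anything above \xff
# is deleted too. Listing just the unwanted control ranges keeps it.
my $text = "price: 5\x{20AC}\tnet\n";

(my $negated = $text) =~ s/[^\x20-\xff\x0a]//g;
(my $ranged  = $text) =~ s/[\x00-\x09\x0b-\x1f]//g;

binmode STDOUT, ':encoding(UTF-8)';
print $negated, $ranged;
```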

Re: Range of chars except one
by Anonymous Monk on Jan 21, 2002 at 03:12 UTC
    I'm surprised no one has suggested a more complex range; try something like this:
    $startCh    = chr(0);
    $midstopCh  = chr(9);
    $midstartCh = chr(11);
    $endCh      = chr(31);
    $text =~ s/[$startCh-$midstopCh,$midstartCh-$endCh]//g;
    which should work just fine, although I haven't tested it for speed. You can just keep specifying more ranges for the characters you wish to exclude, although it becomes rather clunky if you have a large number of ranges.

    CAVEMAN
      Beware of ',' (comma) in this regexp. It should not be there; otherwise, all commas will be stripped from $text as well.

      Update:

      I'm surprised no one has suggested a more complex range

      Actually, Juerd's reply does suggest a solution that uses a complex range.

      --
      Ilya Martynov (http://martynov.org/)
