Range of chars except one

Marcello has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I use the following regexp to remove all unwanted characters (ord 0..31)

my $startCh = chr(0);
my $endCh = chr(31);

$text =~ s/[$startCh-$endCh]//g;

This works fine, but I want to preserve the newlines (ord 10). Is there any way to specify elements in the regexp which should NOT be included in the specified range?
I know I could do

$startCh = chr(0);
$endCh = chr(9);

$text =~ s/[$startCh-$endCh]//g;

$startCh = chr(11);
$endCh = chr(31);

$text =~ s/[$startCh-$endCh]//g;

but then I run into problems when I want to exclude another character.

TIA!


Replies are listed 'Best First'.
Re: Range of chars except one by japhy (Canon) on Jan 19, 2002 at 21:16 UTC
You'll be able to do that soonish. I'm working on a patch to Perl that will allow you to do character class set operations: `s/[$start-$end^^$keep]//`. That is, if `$start = "a"`, `$end = "z"`, and `$keep = "aeiou"`, the character class would be everything from "a" to "z" except "a", "e", "i", "o", and "u".
_____________________________________________________
Jeff`[japhy]`Pinyan: Perl, regex, and perl hacker.
`s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;`
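For readers on a newer Perl: the `^^` syntax above is a proposal, not something you can type today. A different feature that covers the same ground, extended bracketed character classes written `(?[ ... ])`, has been available since Perl 5.18 (flagged experimental until 5.36). The sketch below assumes such a Perl; the helper name `strip_controls_keep_newline` is made up for illustration.

```perl
use strict;
use warnings;
no warnings 'experimental::regex_sets';   # needed on 5.18..5.34, where the feature still warns

# Hypothetical helper: delete control characters (ord 0..31) except newline.
sub strip_controls_keep_newline {
    my ($text) = @_;
    # (?[ A - B ]) matches every character of class A that is not in class B.
    $text =~ s/(?[ [\x00-\x1f] - [\n] ])+//g;
    return $text;
}

print strip_controls_keep_newline("one\ttwo\x01\nthree\r\n");
```

To keep more characters, add them to the subtracted class, for example `[\n\t]`.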
Re: Range of chars except one by Juerd (Abbot) on Jan 19, 2002 at 21:49 UTC
Until japhy's patch is implemented (japhy++ btw), you'll have to use two ranges (you can do that within a single character class). For clarity, I won't use variables, but the literal characters (\x00 .. \x1f). (I know \x00 can be written as \0, but I like consistency.)

$text =~ s/[\x00-\x09\x0b-\x1f]+//g;   # + for speed
# or
$text =~ tr/\x00-\x09\x0b-\x1f//d;     # tr/// for even more speed

Another solution, but quite a bit slower, would be using a negative look-ahead assertion.

Update (200201191932+0100): danger has a better solution: negative look-behind. This saves a lot of time, because only the characters that match the first range are subject to the assertion.

# OLD: $text =~ s/(?!\n)[\x00-\x1f]//g;
$text =~ s/[\x00-\x1f](?<!\n)//g;      # (Can't use + now)

# or, if you need to exclude another character, let's say \n and \r:
# OLD: $text =~ s/(?![\r\n])[\x00-\x1f]//g;
$text =~ s/[\x00-\x1f](?<![\r\n])//g;

# Note: it's usually better to avoid \r and \n and use
# literals like \cM and \cJ or \015, \012 or \x0d, \x0a.

HTH. (Warning: untested code)

2;0 juerd@ouranos:~$ perl -e'undef christmas'
Segmentation fault
2;139 juerd@ouranos:~$
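A quick way to convince yourself that the two-range class, the tr/// form, and the look-behind form agree is to run them side by side. This is only a sketch; the three helper names are invented for the comparison, and all of them target ord 0..31 while keeping newline (ord 10).

```perl
use strict;
use warnings;

# Two ranges inside one character class.
sub strip_with_regex {
    my ($text) = @_;
    $text =~ s/[\x00-\x09\x0b-\x1f]+//g;
    return $text;
}

# The same two ranges as a tr/// search list; /d deletes what is found.
sub strip_with_tr {
    my ($text) = @_;
    $text =~ tr/\x00-\x09\x0b-\x1f//d;
    return $text;
}

# Full range, then reject the match if the character just consumed was \n.
sub strip_with_lookbehind {
    my ($text) = @_;
    $text =~ s/[\x00-\x1f](?<!\n)//g;
    return $text;
}

my $sample = "tab\there\x00\nbell\x07 CR\r\n";
my @results = map { $_->($sample) }
    \&strip_with_regex, \&strip_with_tr, \&strip_with_lookbehind;

print(($results[0] eq $results[1] && $results[1] eq $results[2])
    ? "all three agree\n" : "results differ\n");
```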
Re: Range of chars except one by danger (Priest) on Jan 19, 2002 at 23:08 UTC
Two ways come to mind. First, like Juerd's, use an assertion (lookbehind in this example):

my $delete  = join '', 'a'..'z';   # remove all lower case letters
my $exclude = 'aeiou';             # excluding vowels
$_ = 'Just Another Perl Hacker';
s/(?:[$delete](?<![$exclude]))+//g;
print;

Or, slightly more complex, remove the exclusion characters from the deletion set first. This should prove more efficient, especially if you are repeatedly applying the deletion regex after you create it:

my $delete  = join '', 'a'..'z';   # remove all lower case letters
my $exclude = 'aeiou';
$delete =~ s/[$exclude]+//g;       # excluding vowels
$_ = 'Just Another Perl Hacker';
s/[$delete]+//g;
print;

Either way, it is relatively simple to add or remove characters from your exclusion set.

Update: By "slightly more complex" I mean that it entails an extra step on your part --- in truth, I think it is the logically simpler version.
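The "subtract first, then build the class" idea can be wrapped up so the regex is compiled once and reused. A sketch, with assumptions stated: `build_deleter` is a hypothetical helper, both arguments are plain strings of characters (not ranges), and each character is emitted as a `\x{..}` escape so that `]`, `^`, `-`, or `\` in either set cannot upset the character class.

```perl
use strict;
use warnings;

# Hypothetical helper: return a compiled regex matching runs of characters
# that are in $delete but not in $exclude.
sub build_deleter {
    my ($delete, $exclude) = @_;
    my %keep  = map { $_ => 1 } split //, $exclude;
    my @chars = grep { !$keep{$_} } split //, $delete;
    return qr/(?!)/ unless @chars;    # nothing left to delete: never match
    my $class = join '', map { sprintf '\\x{%x}', ord($_) } @chars;
    return qr/[$class]+/;
}

# The original question: drop ord 0..31 but keep newline.
my $re = build_deleter(join('', map { chr } 0 .. 31), "\n");
(my $clean = "a\tb\x01\nc\r\n") =~ s/$re//g;
print $clean;
```

Excluding another character later means adding it to the second argument, for example `"\n\t"`.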
Re: Range of chars except one by synapse0 (Pilgrim) on Jan 20, 2002 at 03:06 UTC
I'm surprised no one's mentioned tr///. You'll still have to give it two ranges, but it works well. It also has the added advantage (so far as I know) of being more efficient than s///. You can give it the hex values, for example:

$string =~ tr/\x00-\x04\x07-\x09//d;

This will get rid of the range \x00 through \x04, leave characters \x05 and \x06, and delete the range \x07 through \x09. Of course, it's too early for me to think straight, so the ranges given are bogus, but you get the idea.

-SynZero
Re: Re: Range of chars except one by kjherron (Pilgrim) on Jan 20, 2002 at 11:29 UTC
This can be made even simpler. Remember that if a character is listed in a tr/// searchlist more than once, only the first occurrence is meaningful. This means you can do something like:

$string =~ tr/\n\000-\037/\n/d;

(Note I'm using octal here.) This specifies that the newline character is replaced by itself, while everything else in that range is deleted. There are three advantages to this method:

1. It's (IMHO) cleaner and easier to read.
2. It's easier to modify, e.g. to exempt additional characters from being deleted.
3. It doesn't care what character is used for newline on the current platform. For example, I seem to remember that \n and \r have their meanings reversed on one platform (Macintosh?) because the platform standard is to use CR for line breaks.
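To illustrate the "easier to modify" point, here is a sketch that exempts both newline and tab. The helper name `keep_newline_and_tab` is invented for the example; the technique is the one described above, with each kept character listed first and mapped to itself.

```perl
use strict;
use warnings;

# Hypothetical helper: delete ord 0..31 except newline and tab.
sub keep_newline_and_tab {
    my ($text) = @_;
    # \n and \t come first in the search list and map to themselves;
    # the rest of \000-\037 has no replacement, so /d removes it.
    $text =~ tr/\n\t\000-\037/\n\t/d;
    return $text;
}

print keep_newline_and_tab("col1\tcol2\x00\r\nnext\x07line\n");
```

Every exempted character has to appear in both lists, in the same position, ahead of the range.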
Re: Re: Re: Range of chars except one by Anonymous Monk on Jan 21, 2002 at 08:47 UTC
True, I thought about tr/// too... but "a range of chars except one" could also be read as "a range of chars PLUS one" (I like to turn things around). tr/// also offers a complement modifier. So when I need to delete chars, it seems simpler to write the ones I actually want to keep, so that I don't forget any:

$_ = "123,Let's keep letters!\nAnd new lines...";
tr/A-Za-z \n//cd;
print;

will print:

Lets keep letters
And new lines

Creating that range could be easier... It's just my two cents.

Freddo
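Applied to the original question, the complement approach might look like the sketch below. It assumes that "wanted" means printable ASCII plus newline, which is narrower than what was asked: DEL and every character above \x7e are deleted too. The helper name is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: keep newline and printable ASCII, delete everything else.
sub keep_printable_and_newline {
    my ($text) = @_;
    # /c complements the search list, /d deletes what then matches.
    $text =~ tr/\n\x20-\x7e//cd;
    return $text;
}

print keep_printable_and_newline("tab\tgone, bell\x07 gone\n");
```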
Re: Re: Range of chars except one by jlf (Scribe) on Jan 20, 2002 at 06:08 UTC
I was thinking along the same lines, but the fact that Marcello is using variables rather than constants to define the range implies that his search ranges may vary. To use tr here with differing ranges, he must use eval. As we read in perlop:

"Because the transliteration table is built at compile time, neither the SEARCHLIST nor the REPLACEMENTLIST are subjected to double quote interpolation. That means that if you want to use variables, you must use an eval():"

eval "tr/$oldlist/$newlist/";
die $@ if $@;

eval "tr/$oldlist/$newlist/, 1" or die $@;

HTH, Josh
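One way to pay the eval cost only once is to compile the tr/// into a closure. This is a sketch under stated assumptions: `make_stripper` is a hypothetical helper, its arguments are tr-style lists passed in single quotes so the escapes reach tr intact, and they must come from trusted code because they are spliced into source text (a `/` in either list would break the generated code).

```perl
use strict;
use warnings;

# Hypothetical helper: build a sub that deletes everything in $search
# except the characters listed in $keep.
sub make_stripper {
    my ($search, $keep) = @_;
    # Kept characters go first and map to themselves; /d deletes the rest.
    my $code = "sub { (my \$t = shift) =~ tr/$keep$search/$keep/d; \$t }";
    my $sub  = eval $code or die "cannot compile tr: $@";
    return $sub;
}

my $strip = make_stripper('\000-\037', '\n');
print $strip->("a\tb\x01\nc\r\n");
```

The eval runs when `make_stripper` is called; each later call to `$strip->(...)` is an ordinary compiled tr///.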
Re: Range of chars except one by petdance (Parson) on Jan 20, 2002 at 00:28 UTC
All of the suggestions that have been posted are good ones, and you'll note that they're nothing like what you were trying. Perl is its own idiosyncratic self. Perl has an entirely different feel from what you're used to now (I'm guessing VB or C). Let it wash over you and enjoy it.

xoxo,
Andy
--
<megaphone> Throw down the gun and tiara and come out of the float! </megaphone>
Re: Range of chars except one by I0 (Priest) on Jan 20, 2002 at 00:26 UTC
$text =~ s/[^\x20-\xff\x0a]//g;
Re: Re: Range of chars except one by Juerd (Abbot) on Jan 20, 2002 at 01:06 UTC
Not all data is ASCII. Your piece of code will remove all Unicode characters above \xff, which may or may not be what you want.
Re: Range of chars except one by Anonymous Monk on Jan 21, 2002 at 03:12 UTC
I'm surprised no one has suggested a more complex range. Try something like this:

$startCh = chr(0);
$midstopCh = chr(9);
$midstartCh = chr(11);
$endCh = chr(31);
$text =~ s/[$startCh-$midstopCh,$midstartCh-$endCh]//g;

which should work just fine, although I haven't tested it for speed. You can just keep specifying more ranges for those chars you wish to exclude, although it becomes rather clunky looking if you have a large number of ranges.

CAVEMAN
Re: Re: Range of chars except one by IlyaM (Parson) on Jan 21, 2002 at 05:04 UTC
Beware of the ',' (comma) in this regexp. It should not be there. Otherwise all commas will be stripped from $text.

Update: "I'm surprised no one has suggested a more complex range." Actually, Juerd's reply does suggest a solution that uses a complex range.

--
Ilya Martynov (http://martynov.org/)
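A comma-free version that still takes its ranges from variables could be sketched like this. `ranges_to_regex` is a hypothetical helper; it takes pairs of code points and writes them as `\x{..}` escapes, so no separator and no literal control character ends up inside the brackets.

```perl
use strict;
use warnings;

# Hypothetical helper: turn [low, high] code point pairs into one compiled class.
sub ranges_to_regex {
    my @ranges = @_;
    my $class = join '', map { sprintf '\\x{%x}-\\x{%x}', @$_ } @ranges;
    return qr/[$class]+/;
}

# ord 0..9 and 11..31; 10 (newline) is left out of both ranges.
my $re = ranges_to_regex([0, 9], [11, 31]);

(my $clean = "a,b\tc\x1f\nd") =~ s/$re//g;
print $clean;    # the comma and the newline survive
```

Excluding one more character means splitting a range around it, for example `[0, 8], [11, 31]` to keep tab as well.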
