Re: Range of chars except one
by japhy (Canon) on Jan 19, 2002 at 21:16 UTC
You'll be able to do that soonish. I'm working on a patch to Perl that will allow you to do character class set operations: s/[$start-$end^^$keep]//. That is, if $start = "a", and $end = "z", and $keep = "aeiou", the character class would be everything from "a" to "z" except "a", "e", "i", "o", and "u".
_____________________________________________________
Jeff[japhy]Pinyan:
Perl,
regex,
and perl
hacker.
s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;
Re: Range of chars except one
by Juerd (Abbot) on Jan 19, 2002 at 21:49 UTC
Until japhy's patch is implemented (japhy++ btw), you'll have to use two ranges (you can do that within a single character class). For clarity, I won't use variables but the literal characters (\x00 .. \x1f). (I know \x00 can be written as \0, but I like consistency.)
$text =~ s/[\x00-\x09\x0b-\x1f]+//g; # + for speed
# or
$text =~ tr/\x00-\x09\x0b-\x1f//d; # tr/// for even more speed
Another solution, though quite a bit slower, is a negative look-ahead assertion:
Update (200201191932+0100) danger has a better solution: negative look-behind. This saves a lot of time, because only the characters that match the range are subject to the assertion.
# OLD:$text =~ s/(?!\n)[\x00-\x1f]//g;
$text =~ s/[\x00-\x1f](?<!\n)//g; # (Can't use + now)
# or (if you need to exclude another character,
# let's say \n and \r)
# OLD:$text =~ s/(?![\r\n])[\x00-\x1f]//g;
$text =~ s/[\x00-\x1f](?<![\r\n])//g;
# Note: it's usually better to avoid \r and \n and use
# literals like \cM and \cJ or \015, \012 or \x0d, \x0a.
HTH. (Warning: untested code)
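A quick way to check these variants against each other is to run them on one sample string. This is only a minimal sketch, assuming the goal is to strip the control characters \x00 through \x1f while keeping the newline; the sample string and variable names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample: control characters mixed with text and one newline.
my $sample = "a\x00b\tc\nd\x1fe";

# Two explicit ranges that skip \x0a.
(my $by_ranges = $sample) =~ s/[\x00-\x09\x0b-\x1f]+//g;

# The same two ranges with tr///d.
(my $by_tr = $sample) =~ tr/\x00-\x09\x0b-\x1f//d;

# Whole range, with a look-ahead that refuses to match at a newline.
(my $by_lookahead = $sample) =~ s/(?!\x0a)[\x00-\x1f]//g;

# Whole range, with a look-behind that rejects a newline just consumed.
(my $by_lookbehind = $sample) =~ s/[\x00-\x1f](?<!\x0a)//g;

# Each variant should leave only the letters and the newline.
print join(",", map { $_ eq "abc\nde" ? "ok" : "differs" }
    $by_ranges, $by_tr, $by_lookahead, $by_lookbehind), "\n";
```

All four should agree on this input; which one is fastest depends on the data, so benchmark before relying on the speed remarks.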
2;0 juerd@ouranos:~$ perl -e'undef christmas'
Segmentation fault
2;139 juerd@ouranos:~$
Re: Range of chars except one
by danger (Priest) on Jan 19, 2002 at 23:08 UTC
Two ways come to mind. First, like Juerd's, use an assertion
(lookbehind in this example):
my $delete = join '','a'..'z'; # remove all lower case letters
my $exclude = 'aeiou'; # excluding vowels
$_ = 'Just Another Perl Hacker';
s/(?:[$delete](?<![$exclude]))+//g;
print;
Or, slightly more complex, remove the exclusion characters from the
deletion set first. This should prove more efficient, especially if
you are repeatedly applying the deletion regex after you create it:
my $delete = join '','a'..'z'; # remove all lower case letters
my $exclude = 'aeiou';
$delete =~ s/[$exclude]+//g; # excluding vowels
$_ = 'Just Another Perl Hacker';
s/[$delete]+//g;
print;
Either way, it is relatively simple to add or remove characters from
your exclusion set.
Update: By "slightly more complex" I mean that it entails an extra
step on your part --- in truth, I think it is the logically simpler
version.
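If the deletion regex really is applied repeatedly, it can be built once and compiled with qr//. The following is a rough sketch of that idea, not danger's exact code: the helper name strip_set and the hash-based filtering are assumptions added here, and quotemeta is used so that characters such as ']' or '^' in the deletion set stay literal. It also assumes at least one character remains after the exclusion, since an empty class would not compile.

```perl
use strict;
use warnings;

my $delete  = join '', 'a' .. 'z';   # candidates for deletion
my $exclude = 'aeiou';               # characters to keep

# Filter the excluded characters out of the deletion set, then escape
# what is left so it is safe inside a bracketed character class.
my %keep  = map { $_ => 1 } split //, $exclude;
my $class = join '', map { quotemeta($_) }
                     grep { !$keep{$_} } split //, $delete;

# Compile the class once; every later call reuses the compiled regex.
my $strip = qr/[$class]+/;

# Hypothetical helper: return a copy of the string with the set removed.
sub strip_set {
    my ($string) = @_;
    $string =~ s/$strip//g;
    return $string;
}

print strip_set($_), "\n" for 'Just Another Perl Hacker', 'character class';
```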
Re: Range of chars except one
by synapse0 (Pilgrim) on Jan 20, 2002 at 03:06 UTC
I'm surprised no one's mentioned tr///. You'll still have to give it two ranges, but it works well. It also has the added advantage (so far as I know) of being more efficient than s///.
You can give it the hex values, for example:
$string =~ tr/\x00-\x04\x07-\x09//d;
This'll get rid of the range \x00 through \x04, leave characters \x05 and \x06, and delete the range \x07 through \x09...
Of course.. it's too early for me to think straight, so the ranges given are bogus, but you get the idea..
-SynZero
|
This can be made even simpler. Remember that if a character is listed in a tr/// searchlist more than once, only the first occurrence is meaningful. This means you can do something like:
$string =~ tr/\n\000-\037/\n/d;
(Note I'm using octal here). This specifies that the newline character is replaced by itself, while everything else in that range is deleted.
There are three advantages to this method:
1. It's (IMHO) cleaner and easier to read.
2. It's easier to modify, e.g. to exempt additional characters from being deleted.
3. It doesn't care what character is used for newline on the current platform. For example, I seem to remember that \n and \r have their meanings reversed on one platform (Macintosh?) because the platform standard is to use CR for line breaks.
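To see the first-occurrence rule at work, here is a small sketch with a made-up sample string. The second statement shows how a further character (tab, chosen only as an example) could be exempted in the same way.

```perl
use strict;
use warnings;

# Hypothetical sample with a bell, a NUL, a tab, an escape and newlines.
my $string = "line one\x07\x00\nline\ttwo\x1b\n";

# "\n" is listed first and mapped to itself; its second appearance inside
# the range \000-\037 is ignored, so /d deletes every other control
# character but leaves the newline alone.
(my $clean = $string) =~ tr/\n\000-\037/\n/d;

# Exempting tab as well only needs one more character on each side.
(my $keep_tab = $string) =~ tr/\n\t\000-\037/\n\t/d;

print $clean;
print $keep_tab;
```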
|
True, I thought about tr/// too...
but "a range of chars except one" could also be read as "a range of chars PLUS one" (I like to turn things around). tr/// also offers a complement modifier. So when I need to delete chars, it seems simpler to write the ones I actually want to keep, which prevents me from forgetting some:
$_ = "123,Let's keep letters!\nAnd new lines...";
tr/A-Za-z \n//cd;
print;
will print:
Lets keep letters
And new lines
Creating that range could be easier...
It's just my two cents.
Freddo
|
I was thinking along the same lines, but the fact that Marcello is using variables rather than constants to define the range implies that his search ranges may vary. To use tr here with differing ranges, he must use eval. As we read in perlop,
Because the transliteration table is built at compile time, neither the SEARCHLIST nor the REPLACEMENTLIST are subjected to double quote interpolation. That means that if you want to use variables, you must use an eval():
eval "tr/$oldlist/$newlist/";
die $@ if $@;
eval "tr/$oldlist/$newlist/, 1" or die $@;
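For the deletion case discussed in this thread, the eval approach might be wrapped up as below. This is a sketch under assumptions: the helper name delete_chars is invented here, and the list is pasted straight into Perl code, so it must come from a trusted source and must not contain the "/" delimiter.

```perl
use strict;
use warnings;

# Hypothetical helper: delete every character matched by $list from $text.
# $list is spliced into source code, so never pass untrusted input.
sub delete_chars {
    my ($text, $list) = @_;
    # The string is compiled at run time, which is what lets tr///
    # see the contents of $list; the trailing 1 signals success.
    eval "\$text =~ tr/$list//d; 1" or die $@;
    return $text;
}

# Single quotes keep the escapes intact so that tr/// interprets them.
my $range = '\x00-\x09\x0b-\x1f';
print delete_chars("a\x00b\tc\nd", $range), "\n";
```

Since the eval recompiles on every call, this is slower than a literal tr///; if the same list is used many times, building a closure once with eval "sub { ... }" may be worth considering.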
HTH,
Josh
Re: Range of chars except one
by petdance (Parson) on Jan 20, 2002 at 00:28 UTC
All of the suggestions that have been posted are good ones, and you'll note that they're nothing like what you were trying. Perl is its own idiosyncratic self.
Perl has an entirely different feel from what you're used to now (I'm guessing VB or C). Let it wash over you and enjoy it.
xoxo,
Andy
--
<megaphone>
Throw down the gun and tiara and come out of the float!
</megaphone>
Re: Range of chars except one
by I0 (Priest) on Jan 20, 2002 at 00:26 UTC
$text =~ s/[^\x20-\xff\x0a]//g;
Re: Range of chars except one
by Anonymous Monk on Jan 21, 2002 at 03:12 UTC
I'm surprised no one has suggested a more complex range. Try something like this:
$startCh = chr(0);
$midstopCh = chr(9);
$midstartCh = chr(11);
$endCh = chr(31);
$text =~ s/[$startCh-$midstopCh,$midstartCh-$endCh]//g;
which should work just fine, although I haven't tested it for speed. You can just keep specifying more ranges for those chars you wish to exclude, although it becomes rather clunky looking if you have a large number of ranges.
CAVEMAN
|
Beware of ',' (comma) in this regexp. It should not be there. Otherwise all commas will be stripped from $text.
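A small sketch of the difference, using the same variable names as the post above and a made-up sample string:

```perl
use strict;
use warnings;

my ($startCh, $midstopCh, $midstartCh, $endCh) =
    map { chr($_) } (0, 9, 11, 31);

# Hypothetical sample containing commas, a tab, a NUL and a newline.
my $text = "one,\ttwo\x00,\nthree";

# With the comma inside the class, commas are deleted as well.
(my $with_comma = $text) =~ s/[$startCh-$midstopCh,$midstartCh-$endCh]//g;

# Without it, only the two control-character ranges are deleted.
(my $without_comma = $text) =~ s/[${startCh}-${midstopCh}${midstartCh}-${endCh}]//g;

print "$with_comma\n";
print "$without_comma\n";
```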
Update:
I'm surprised no one has suggested a more complex range
Actually, Juerd's reply does suggest a solution that uses a complex range.
--
Ilya Martynov
(http://martynov.org/)