PerlMonks  

Re: cleaning up control characters

by blackmateria (Chaplain)
on Oct 27, 2001 at 00:49 UTC (id://121693)


in reply to cleaning up control characters

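Looks like you've almost got it right. I think the problem is this regexp:

    $line =~ s/^\D{0,2}|\s{0,2}//;

\D matches any non-digit, not just control characters, so this will also strip up to two ordinary letters. And because the alternation isn't grouped, the pattern means "^\D{0,2}" or "\s{0,2}". The first branch can always match at the start of the line (even if it matches zero characters), so the \s{0,2} branch never gets used.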
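To see the effect, here is a small test script that runs the original substitution over two sample lines. The sub name original_cleanup and the sample lines are made up for illustration; they are not from the poster's log:

```perl
use strict;
use warnings;

# The substitution from the question, wrapped in a (hypothetical) sub so it
# can be tried on sample input.
sub original_cleanup {
    my ($line) = @_;
    $line =~ s/^\D{0,2}|\s{0,2}//;
    return $line;
}

# Made-up samples: one clean log line, one with two leading control characters.
for my $sample ("Error: disk full", "\x01\x02 Error: disk full") {
    print "[", original_cleanup($sample), "]\n";
}
# prints [ror: disk full] then [ Error: disk full]
```

The clean line loses its first two letters. On the junk line the space after the control characters survives because \D{0,2} stops after two characters.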

I'm not sure what you're trying to do with that regexp, especially with the \s trimming (are some of these junk characters spaces?). Assuming you actually want to purge control characters (i.e. ASCII 0-31, plus DEL at 127) and spaces, use the POSIX [:cntrl:] character class, like this (see perlre for more information):

    $line =~ s/^([[:cntrl:]]|\s){2,}//;

This should delete all control characters and spaces from the beginning of any line that starts with two or more of them. (Unfortunately it will also strip the indentation from lines that start with two or more plain spaces and no control characters; without seeing the data I don't know if this matters to you.) But why not just forget the {2,} and eliminate any leading control characters and whitespace, however few there are?

    $line =~ s/^([[:cntrl:]]|\s)+//;
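A quick way to compare the {2,} and + versions is to run both over a few sample lines. This is a sketch: the sub names strip_two_or_more, strip_any and visible, and the sample lines, are hypothetical and made up for illustration:

```perl
use strict;
use warnings;

# Strip the leading run of control characters/whitespace only when the run
# is at least two characters long.
sub strip_two_or_more {
    my ($line) = @_;
    $line =~ s/^([[:cntrl:]]|\s){2,}//;
    return $line;
}

# Strip any leading run of control characters/whitespace, even a single one.
sub strip_any {
    my ($line) = @_;
    $line =~ s/^([[:cntrl:]]|\s)+//;
    return $line;
}

# Show control characters as \xNN so the printed output stays readable.
sub visible {
    my ($text) = @_;
    $text =~ s/([[:cntrl:]])/sprintf('\\x%02X', ord $1)/ge;
    return $text;
}

# Made-up sample lines (not taken from the poster's log).
for my $sample ("\x07Error: one junk byte", "\x01\x02Error: two junk bytes", "    indented line") {
    printf "%-26s -> {2,}: %-22s  +: %s\n",
        visible($sample), visible(strip_two_or_more($sample)), visible(strip_any($sample));
}
```

The single leading \x07 survives the {2,} version but not the + version, and the indented line loses its indentation under both. For a whole log file the same substitution works as a one-liner, e.g. perl -lpe 's/^([[:cntrl:]]|\s)+//' error.log > clean.log (the file names are placeholders). The -l switch chomps each line before the substitution and adds the newline back on output; without it, a line consisting only of junk would vanish entirely, newline included, since \n counts as both [:cntrl:] and \s.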

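If you want to keep leading spaces unless they're mixed in with control characters:

    $line =~ s/^(?:[[:cntrl:]]|\s)+//
        if $line =~ /^((?:[[:cntrl:]]|\s)+)/ && $1 =~ /[[:cntrl:]]/;

Testing $1 in the condition like this is legit: the match in the if runs before the substitution, so $1 still holds what it captured. The catch is where the parentheses go. With a quantified group like ([[:cntrl:]]|\s)+, $1 holds only the last character the group matched, so a line starting with a control character followed by spaces would slip through. Wrapping the whole run in the capture, as above, avoids that. Unfortunately I don't know what your data looks like, so I can't really test these too well. Hope this helps though.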
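The $1 pitfall is easy to demonstrate. The sample line is made up, and strip_if_cntrl is a hypothetical helper name wrapping the conditional strip with the capture around the whole leading run:

```perl
use strict;
use warnings;

# Made-up sample: one control character followed by two spaces.
my $line = "\x01  Error: junk then spaces";

# Capture group inside the quantifier: $1 keeps only the LAST character matched.
if ($line =~ /^([[:cntrl:]]|\s)+/) {
    printf "quantified group: length of \$1 is %d\n", length $1;    # 1 (the final space)
}

# Capture group around the whole run: $1 holds every leading junk character.
if ($line =~ /^((?:[[:cntrl:]]|\s)+)/) {
    printf "whole-run group:  length of \$1 is %d\n", length $1;    # 3
}

# Hypothetical helper: strip the leading run only if it contains a control character.
sub strip_if_cntrl {
    my ($text) = @_;
    $text =~ s/^(?:[[:cntrl:]]|\s)+//
        if $text =~ /^((?:[[:cntrl:]]|\s)+)/ && $1 =~ /[[:cntrl:]]/;
    return $text;
}

print strip_if_cntrl($line), "\n";                 # Error: junk then spaces
print strip_if_cntrl("    indented line"), "\n";   # indentation kept
```

Note that a leading tab counts as a control character here, since \t is in [:cntrl:], so tab-indented lines would still be stripped.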

Re: Re: cleaning up control characters
by skinnymofo (Scribe) on Oct 27, 2001 at 01:06 UTC
    Hey blackmateria, thanks! Your suggestion for using the POSIX control class did the trick. FYI, the data is an application error log, so I'm not worried about indents or leading spaces.
