You can read the entire file into a scalar variable like this:
{
open(FILE, '<', $filename) or die "Can't open $filename: $!\n";
local $/ = undef;
$lines = <FILE>;
close(FILE);
}
Then you can just use your normal regular expression, but you'll probably want to use at least one of the following modifiers (from the perlre manpage):
m
Treat string as multiple lines. That is, change ``^'' and ``$'' from matching at only the very start or end of the string to the start or end of any line anywhere within the string.
s
Treat string as single line. That is, change ``.'' to match any character whatsoever, even a newline, which it normally would not match. The /s and /m modifiers both override the $* setting. That is, no matter what $* contains, /s without /m will force ``^'' to match only at the beginning of the string and ``$'' to match only at the end (or just before a newline at the end) of the string. Together, as /ms,
they let the ``.'' match any character whatsoever, while yet allowing ``^'' and ``$'' to match, respectively, just after and just before newlines within the string.
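To make the difference between the two modifiers concrete, here is a minimal sketch. The sample string is made up and simply stands in for the contents of the slurped file.

```perl
use strict;
use warnings;

# Hypothetical sample text standing in for the slurped file contents.
my $lines = "first line\nsecond line\nthird line\n";

# Without /m, ^ matches only at the very start of the string.
my @plain = $lines =~ /^(\w+)/g;

# With /m, ^ also matches just after each embedded newline.
my @multi = $lines =~ /^(\w+)/mg;

# Without /s, "." stops at a newline, so this cannot span two lines.
my $no_s = $lines =~ /first.*second/ ? 1 : 0;

# With /s, "." matches newlines too, so the match can cross lines.
my $with_s = $lines =~ /first.*second/s ? 1 : 0;

print "plain: @plain\n";                          # plain: first
print "multi: @multi\n";                          # multi: first second third
print "no /s: $no_s, with /s: $with_s\n";         # no /s: 0, with /s: 1
```

Which of the two you need depends on the pattern: /m matters when you anchor with ^ or $, and /s matters when a "." has to cross a line break.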
After you've opened and read the file (or web page) into an array, join all lines with join().
open(FILE, '<', $filename) or die "Can't open $filename: $!\n";
@lines = <FILE>;
close(FILE);
$content = join('', @lines);
After this, $content holds the whole file as a single string (newlines included), and it is easy to run a regexp over it with your existing functions.
You might not want to have your WHOLE file in one variable. Depending on the size of the file, it could eat a LOT of your memory. In my own experience, it is usually enough to set $/ = "\n\n" so that the record separator is two newlines, not one. (Use double quotes: in single quotes, '\n\n' is a literal backslash-n sequence rather than two newlines.)
I was parsing a bounce file when I was doing this, which was about 300 MB in size, daily.
That's a LONG 300 MB line.
$/ = "\n\n"; took care of it. I ended up with smaller big lines, and was able to do what I wanted to do without consuming a lot of RAM.
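As a rough sketch of that approach: the record layout below is invented for illustration (the real bounce file format is not shown here), and an in-memory filehandle is used only so the example runs on its own.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for a large bounce file;
# records are separated by a blank line.
my $text = "From: a\@example.com\nStatus: bounced\n\n"
         . "From: b\@example.com\nStatus: delivered\n\n"
         . "From: c\@example.com\nStatus: bounced\n";

# Reading from an in-memory string keeps the sketch self-contained;
# a real script would open the file by name instead.
open(my $fh, '<', \$text) or die "Can't open in-memory file: $!\n";

my @bounced;
{
    local $/ = "\n\n";    # double quotes, so \n means a real newline
    while (my $record = <$fh>) {
        # Only one record is held in memory at a time.
        if ($record =~ /^From: (\S+)\n.*^Status: bounced/ms) {
            push @bounced, $1;
        }
    }
}
close($fh);

print "bounced: @bounced\n";
```

If the blank runs between records can be longer than one empty line, setting $/ to the empty string (paragraph mode) treats any run of blank lines as a single separator.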
The key is to get the whole file into one scalar (the first 'while' loop). Then the 'g' modifier (the condition in the second 'while' loop) will keep the place of the last match found and continue from there until there are no more matches.
open( FH, "filename" ) || die "couldn't open\n";
while ( <FH> ) {
$data .= $_;
}
while ( $data =~ m/PATTERN/g ) {
# executed code
# executed code...etc.
}
-kel
If the only trouble you are having is that it isn't writing to a file, the reason is that you are not printing to a filehandle. Look at the open() docs (perldoc -f open) and perlopentut to learn the different ways to open a file and write to it.
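For instance, a minimal sketch of printing to a filehandle might look like this. The sample text is made up, and a temporary file stands in for whatever output file you actually want.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical data to be written, for illustration only.
my $lines = "line one\nline two\n";

# A temporary file stands in for the real output file name.
my ($tmp_fh, $outfile) = tempfile(UNLINK => 1);
close($tmp_fh);

# '>' opens for writing and truncates; use '>>' to append instead.
open(my $out, '>', $outfile) or die "Can't write to $outfile: $!\n";
print $out $lines;    # note: no comma after the filehandle
close($out) or die "Can't close $outfile: $!\n";

# Read it back to confirm the data reached the file.
open(my $in, '<', $outfile) or die "Can't open $outfile: $!\n";
my $written = do { local $/; <$in> };
close($in);
```

A plain print with no filehandle goes to STDOUT, which is the usual reason nothing ends up in the file.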
Cheers,
KM
Remember that if the unwanted stuff appears more than once (for example, at the start of several lines) you'll need a /g to match globally.
$lines =~ s/^unwantedstuff//gsm;
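A small sketch of the difference, using a made-up three-line string and the same placeholder pattern:

```perl
use strict;
use warnings;

# Hypothetical sample text; "unwantedstuff" is the placeholder pattern.
my $original = "unwantedstuff one\nunwantedstuff two\nthree\n";

# Without /g only the first match is removed.
(my $once = $original) =~ s/^unwantedstuff//m;

# With /g every line that starts with the pattern is cleaned.
(my $all = $original) =~ s/^unwantedstuff//gm;

print $once;    # " one", "unwantedstuff two", "three"
print $all;     # " one", " two", "three"
```

The /s modifier is harmless here but makes no difference, since the pattern contains no ".".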