PerlMonks  

Reading in a text file with "multi line lines"

by tsk1979 (Scribe)
on Nov 01, 2007 at 09:08 UTC

tsk1979 has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have a large text file that I want to read line by line, ignoring all lines that are blank or contain only whitespace. The file has "multi-line lines", i.e.
this is a good \
nice day but not a good nice \
night
This should be read in as one line whenever "\" is the last character on a line. Part one of the problem is relatively simple; I simply set
$_ = '[^\\]\n'
But the problem is that with this I will read in
"this is a good \nice day but not a good nice \night"
I want it to be read as "this is a good nice day but not a good nice night". Any tips?

Replies are listed 'Best First'.
Re: Reading in a text file with "multi line lines"
by GrandFather (Saint) on Nov 01, 2007 at 09:27 UTC

    Perhaps you could show us the code that you have at present and a very small sample of data that demonstrates the problem. The code below may be a good starting point (you'll have to fill in bits though):

    use strict;
    use warnings;

    # Single-quoted heredoc so the trailing backslashes stay literal
    my $data = <<'DATA';
    this is a good \
    nice day but not a good nice \
    night
    DATA

    open my $IN, '<', \$data;
    ...
    while (<$IN>) {
        ...
    }
    close $IN;

    with the intent being that you fill in the ... bits to match your current code with the code above providing a nice framework for the sample.


    Perl is environmentally friendly - it saves trees
Re: Reading in a text file with "multi line lines"
by jmcnamara (Monsignor) on Nov 01, 2007 at 12:42 UTC

    Here is one way to do it by setting $/ = '' to read the lines in paragraph mode (see perlvar) and then substituting out the eol characters:
    #!/usr/bin/perl -w
    use strict;

    $/ = '';

    while (<DATA>) {
        next unless /\S/;
        chomp;
        s{\\\n}{}g;
        print "Line: ", $_, "\n";
    }

    __DATA__
    this is a good \
    nice day but not a good nice \
    night

    This is a test line

    This is a another test \
    Line

    This is \ a final test line
    Which prints:
    Line: this is a good nice day but not a good nice night
    Line: This is a test line
    Line: This is a another test Line
    Line: This is \ a final test line

    --
    John.
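A minimal sketch of the same paragraph-mode idea wrapped in a sub, so that the change to $/ is scoped with local instead of being global. The name read_paragraph_records and the in-memory sample data are hypothetical, not from the thread. One caveat worth stating: in paragraph mode, two adjacent non-blank lines without a trailing backslash come back as a single record with an embedded newline, so this only fits files where logical lines are separated by blank lines.

```perl
use strict;
use warnings;

# Hypothetical helper: read blank-line-separated records and
# splice out backslash-newline continuations inside each record.
sub read_paragraph_records {
    my ($fh) = @_;
    local $/ = '';                      # paragraph mode, restored when the sub returns
    my @records;
    while (my $record = <$fh>) {
        chomp $record;                  # in paragraph mode this strips all trailing newlines
        $record =~ s{\\\n}{}g;          # join continued lines
        next unless $record =~ /\S/;    # skip records with no visible content
        push @records, $record;
    }
    return @records;
}

# Sample data (an assumption for illustration only)
my $sample = <<'END';
this is a good \
nice day but not a good nice \
night

This is a test line
END

open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my @records = read_paragraph_records($fh);
close $fh;

print "Line: $_\n" for @records;
```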

Re: Reading in a text file with "multi line lines"
by Skeeve (Parson) on Nov 01, 2007 at 09:26 UTC
    while (<>) {                 # read line
        while (s/\\$/ /) {       # while the current line ends with \
            # replace it by blank and append a new line.
            chomp;               # thanks to ww
            $_ .= <>;
        }
    }

    Update: the chomp should do it. Thanks to ww


    s$$([},&%#}/&/]+}%&{})*;#$&&s&&$^X.($'^"%]=\&(|?*{%
    +.+=%;.#_}\&"^"-+%*).}%:##%}={~=~:.")&e&&s""`$''`"e
      Close, but the output doesn't join the lines, as OP asked. I think a big part of the problem is rooted in the thought-process behind your first comment, "current line ends with \," which, strictly speaking, is not the case. The lines in question actually end with a "\" and a newline.

      This helps (but does NOT solve; caveats below) if added after your second while :

      while (s/\n/ /s) { # remove the newlines
          my $out = $_;
          print $out;
      }

      Tested, more or less, using GrandFather's "framework" (below). And just BTW or nitpicking, your (Skeeve's) comment "append a new line" would perhaps be clearer or less ambiguous if written as "append the next line from DATA".

      The problem (yep, this is the caveat) is that with multiple sets of data (which I infer from the OP's 'ignoring all the lines which are blank or contain just "space".') this still doesn't separate the sets, as does jmcnamara's, below, qv (and ++).
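For what it's worth, here is a rough sketch that reads line by line (no paragraph mode), joins lines ending in a backslash, and skips blank or whitespace-only lines, so each logical line stays separate. The sub name read_logical_lines and the sample data are made up for illustration. It assumes the backslash is the very last character before the newline (no trailing spaces), and it removes the backslash without adding a space.

```perl
use strict;
use warnings;

# Hypothetical helper: return the logical lines from a filehandle.
sub read_logical_lines {
    my ($fh) = @_;
    my @lines;
    while (defined(my $line = <$fh>)) {
        chomp $line;
        # While the line ends with a backslash, drop it and append the next line.
        while ($line =~ s/\\$//) {
            my $next = <$fh>;
            last unless defined $next;  # backslash on the last line of the file
            chomp $next;
            $line .= $next;
        }
        next unless $line =~ /\S/;      # skip blank or whitespace-only lines
        push @lines, $line;
    }
    return @lines;
}

# Sample data (an assumption for illustration only)
my $data = <<'END';
this is a good \
nice day but not a good nice \
night

This is a test line
END

open my $in, '<', \$data or die "Cannot open in-memory file: $!";
my @lines = read_logical_lines($in);
close $in;

print "Line: $_\n" for @lines;
```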

Re: Reading in a text file with "multi line lines"
by halley (Prior) on Nov 01, 2007 at 19:23 UTC
    To begin with, single quotes don't allow most escape-codes, including newlines. So in your example, $_ = '[^\\]\n', you must use double-quotes:
    $_ = "[^\\]\n";
    With single quotes, your string will contain a literal backslash followed by the letter n instead of a newline.
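A small sketch to make the quoting difference visible; the variable names are only for illustration. Note that \\ collapses to a single backslash in both kinds of quotes, so the only difference here is the \n. Separately, if the assignment was really meant for $/, perlvar notes that $/ is treated as a plain string, not a regex, so a character class would not be interpreted as one.

```perl
use strict;
use warnings;

my $single = '[^\\]\n';   # \\ becomes one backslash; \n stays as backslash + n
my $double = "[^\\]\n";   # \\ becomes one backslash; \n becomes a newline

printf "single-quoted: %d characters, contains newline: %s\n",
    length($single), ($single =~ /\n/ ? 'yes' : 'no');
printf "double-quoted: %d characters, contains newline: %s\n",
    length($double), ($double =~ /\n/ ? 'yes' : 'no');
```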

    --
    [ e d @ h a l l e y . c c ]

Node Type: perlquestion [id://648453]
Approved by GrandFather