Just for completeness:

4) I don't know how these test cases were generated.

The Windows text file was generated using the Notepad application on Windows 7, on a Samba share mapped as drive H: from a Slackware Linux 14.1 server. The Linux file was generated using joe on the Linux server. The old Mac file was generated using Notepad++ on the Windows machine (I have no old Mac). Re^5: Regular expressions across multiple lines shows the hexdumps of the files.

There is no way to do that without being in Perl bin mode or writing a C program.

With my setup, there is a way to generate all three files on Linux without using binmode. This trick abuses the fact that there is absolutely no difference between text mode and binary mode on Unix:

    /tmp/demo2>cat three-os.pl
    #!/usr/bin/perl

    use strict;
    use warnings;

    open OUT,'>','unix.txt' or die "unix.txt: $!";
    print OUT "line 1\x0Aline 2\x0Aline 3\x0A";
    close OUT;

    open OUT,'>','oldmac.txt' or die "oldmac.txt: $!";
    print OUT "line 1\x0Dline 2\x0Dline 3\x0D";
    close OUT;

    open OUT,'>','windows.txt' or die "windows.txt: $!";
    print OUT "line 1\x0D\x0Aline 2\x0D\x0Aline 3\x0D\x0A";
    close OUT;

    exec "file *.txt" or die "exec failed: $!";
    /tmp/demo2>perl three-os.pl
    oldmac.txt:  ASCII text, with CR line terminators
    unix.txt:    ASCII text
    windows.txt: ASCII text, with CRLF line terminators
    /tmp/demo2>
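Where the file utility is not available, the same check can be approximated in Perl itself. The following is only a sketch, not part of the original test: count_terminators and slurp_raw are names made up for this example, and the classification is deliberately simpler than what file does.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count line terminators in a string of raw bytes.
# CRLF is tried first, so in a CR CR LF sequence the first CR is
# counted as a lone CR and the remaining CR LF as one CRLF.
sub count_terminators {
    my ($bytes) = @_;
    my %count = (CRLF => 0, CR => 0, LF => 0);
    while ($bytes =~ /(\x0D\x0A|\x0D|\x0A)/g) {
        my $name = $1 eq "\x0D\x0A" ? 'CRLF'
                 : $1 eq "\x0D"     ? 'CR'
                 :                    'LF';
        $count{$name}++;
    }
    return \%count;
}

# Hypothetical helper: read a whole file with the :raw layer,
# so no newline translation happens on any platform.
sub slurp_raw {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "$path: $!";
    local $/;    # slurp mode: read everything in one go
    my $bytes = <$fh>;
    close $fh;
    return $bytes;
}

# Report the terminator counts for every file named on the command line.
for my $path (@ARGV) {
    my $c = count_terminators(slurp_raw($path));
    printf "%s: CRLF=%d CR=%d LF=%d\n", $path, @{$c}{qw(CRLF CR LF)};
}
```

Saved under a name of your choice, it would be called with the three text files as arguments. A file that mixes terminators shows up as more than one non-zero counter.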

The three-os.pl script won't work on Windows. In C and Perl on Windows, \n is still equal to \x0A, but output then goes through text mode translation, which replaces every \x0A with CRLF. Running three-os.pl on Windows (again using the Samba share) complains about a missing "file" utility and gives this result:

    /tmp/demo2>file *.txt
    oldmac.txt:  ASCII text, with CR line terminators
    unix.txt:    ASCII text, with CRLF line terminators
    windows.txt: ASCII text, with CRLF, CR line terminators
    /tmp/demo2>od -tx1 -c windows.txt
    0000000  6c  69  6e  65  20  31  0d  0d  0a  6c  69  6e  65  20  32  0d
              l   i   n   e       1  \r  \r  \n   l   i   n   e       2  \r
    0000020  0d  0a  6c  69  6e  65  20  33  0d  0d  0a
             \r  \n   l   i   n   e       3  \r  \r  \n
    0000033
    /tmp/demo2>

The output from the file utility is a little bit misleading. The lines in windows.txt are actually terminated by CR CR LF, as the od output shows.
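The CR CR LF effect can be reproduced without a Windows machine by pushing the :crlf PerlIO layer explicitly, and avoided by writing through :raw. The following is a sketch under the assumption that it runs on a Unix-like system like the Linux server above. write_and_read_back and to_hex are made-up names, and the demo files go into a temporary directory.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: write $data through the given PerlIO layer,
# then read the file back with :raw to see the bytes actually stored.
sub write_and_read_back {
    my ($path, $layer, $data) = @_;
    open my $out, ">$layer", $path or die "$path: $!";
    print {$out} $data;
    close $out or die "$path: $!";

    open my $in, '<:raw', $path or die "$path: $!";
    local $/;    # slurp mode
    my $bytes = <$in>;
    close $in;
    return $bytes;
}

# Hypothetical helper: render a byte string as space-separated hex.
sub to_hex {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02x', ord } split //, $bytes;
}

my $dir  = tempdir(CLEANUP => 1);
my $text = "line 1\x0D\x0Aline 2\x0D\x0A";

# :raw stores the bytes exactly as given.
my $raw = write_and_read_back("$dir/raw.txt", ':raw', $text);

# :crlf rewrites every \x0A as \x0D\x0A on output, which mimics
# text mode and turns an explicit CR LF into CR CR LF.
my $translated = write_and_read_back("$dir/crlf.txt", ':crlf', $text);

print "raw:  ", to_hex($raw), "\n";
print "crlf: ", to_hex($translated), "\n";
```

Opening the output files in three-os.pl with '>:raw' instead of '>' should make the script produce the intended bytes on Windows as well, at the price of using binary mode after all.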

Alexander

--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

In reply to Re^5: Regular expressions across multiple lines by afoken
in thread Regular expressions across multiple lines by abcd
