Re^3: PERL Command line to batch add filename to start of file in UTF-16le

by graff (Chancellor)
on Dec 06, 2015 at 00:10 UTC


in reply to Re^2: PERL Command line to batch add filename to start of file in UTF-16le
in thread PERL Command line to batch add filename to start of file in UTF-16le

If you ran something like this:
perl -i -pe '(code that runs but does the wrong thing)' `find . -name '*.TXT'`
I hope you had a backup copy of those text files, so that you could start over with the original data. If you can't restore the files as they were before that command line was run, well... you've got a much harder problem now. For one thing, if the files were pure UTF-16 before you ran that command, then they probably contained a mix of UTF-8 (ASCII) and UTF-16 content after you ran it -- and other things may have gone wrong as well.

If you can restore the original files, and if that version of the data was all pure UTF-16LE, then something like the following would do what you want:

perl -i.bak -M'open IO => ":encoding(UTF-16LE)"' -pe 'BEGIN{undef $/} s/(?<=\x{feff})/Filename: $ARGV\n/' `find . -name '*.TXT'`
Note the following:
  • The "-i" option includes a file extension to be added to the name of the original file, so that it will not be overwritten by the new version. (That is, an original file X.TXT is renamed to "X.TXT.bak" before the new version of X.TXT is created.)
  • The "-M" option invokes the "open" pragma, to set all IO handles to UTF-16LE encoding (for all files on the command line, text is converted from UTF-16LE to perl-internal utf8 on input, and back to UTF-16LE on output).
  • The s/// operator is used with a look-behind assertion for the BOM, so that the BOM is preserved and new text is added immediately after it. (If a file contains no BOM, the pattern does not match and nothing is added to that file.)

But again, if you now have to work from corrupted versions of the files (because you don't have a restorable backup of the originals), then there's rather more work you have to do (and it'll need a fair bit more Perl code -- you're not likely to solve it with one-liners on the command line).

A couple of other points: (1) Since you are using the "find" command, I would expect that you also have access to "xargs", so that you could use a pipeline command (which tends to be preferable in many situations, e.g. when the list of files is too long to fit on a single command line), like this:

find . -name '*.TXT' | xargs perl -i.bak ...
(2) If this is something you do repeatedly (e.g. at regular intervals on new sets of text files), why not save the perl code as a script? (Typing the name of a script file would be less troublesome than re-typing or copy-pasting the perl code itself.)
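For what it's worth, here is a rough sketch of what such a script might look like. It is not a drop-in equivalent of the one-liner: the file name add_filename_header.pl and the sub name add_filename_header are made up for illustration, the files are opened explicitly instead of relying on -i and the "open" pragma, and a file with no BOM is skipped with a warning. It assumes, as above, that the files are pure UTF-16LE.

```perl
#!/usr/bin/perl
# add_filename_header.pl (hypothetical name): the one-liner above as a script.
# Assumes every input file is pure UTF-16LE.
use strict;
use warnings;

# Insert "Filename: <name>" immediately after the BOM of one file.
# Returns 1 if the file was rewritten, 0 if no BOM was found (file untouched).
sub add_filename_header {
    my ($path) = @_;

    # Slurp the whole file, decoding UTF-16LE into Perl characters.
    # ":raw" keeps line endings exactly as they are in the file.
    open my $in, '<:raw:encoding(UTF-16LE)', $path
        or die "Cannot read $path: $!";
    my $text = do { local $/; <$in> };
    close $in;

    # Same substitution as the one-liner: add text right after the BOM.
    return 0 unless $text =~ s/(?<=\x{feff})/Filename: $path\n/;

    # Keep the original as "$path.bak", mirroring what -i.bak does.
    rename $path, "$path.bak" or die "Cannot rename $path: $!";

    # Write the modified text back out as UTF-16LE.
    open my $out, '>:raw:encoding(UTF-16LE)', $path
        or die "Cannot write $path: $!";
    print {$out} $text;
    close $out or die "Cannot close $path: $!";
    return 1;
}

# Process the files named on the command line when run as a script.
unless (caller) {
    for my $path (@ARGV) {
        add_filename_header($path)
            or warn "No BOM found, skipped: $path\n";
    }
}
```

With that saved, the command line shrinks to something like "find . -name '*.TXT' | xargs perl add_filename_header.pl". Try it on copies of a few files first, and check the result with a tool that shows you the raw bytes.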
