http://qs321.pair.com?node_id=652562

wherethewild has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks, I'm a total newbie, not just to perl but to any form of scripting.

I'm trying to create a script that takes a text file I have and runs through it, changing bits here and there. For example, on a line that starts with HEADER I want a number appended, and I want an entire line inserted just before a line that starts with REMARK, among many other things.

So my question: is perl able to do this, or do I need to seek out another language/tool/something?

Like I said... total newbie! Any words of wisdom greatly appreciated!

cheers,
wherethewild

Re: Newbie: uses/limits of perl in editing files
by tirwhan (Abbot) on Nov 23, 2007 at 14:07 UTC

    Welcome to the monastery. From the task description I'd say perl is very well suited. I'll give you an example of a program that roughly does what you describe:

    #!/usr/bin/perl
    use warnings;
    use strict;

    my $filename = "whateveryourfileiscalled.txt";
    my $newfile  = "whateveryouwantthechangedfiletobecalled.txt";

    open (my $rfh, "<", $filename) or die "Can't open file $filename : $!";
    open (my $wfh, ">", $newfile)  or die "Can't open file $newfile : $!";

    while (my $line = <$rfh>) {
        if ($line =~ m/^HEADER/) {
            chomp $line;
            my $number = 42; # change to whatever number you want to use
            $line .= $number."\n";
        }
        if ($line =~ m/^REMARK/) {
            print {$wfh} "Extra line\n"; # Change to whatever extra line you want
        }
        print {$wfh} $line;
    }

    close $rfh or die "Can't close $filename : $!";
    close $wfh or die "Can't close $newfile : $!";

    Or you could do this in a perl oneliner (which will change the original file):
    perl -pi -e 'chomp;s/^(HEADER.*)$/${1}42/;s/^(REMARK.*)$/Extra line\n$1/;$_.="\n"' whateveryourfileiscalled.txt
    Caveat: Both of these are for systems where the line ending is "\n" (i.e. not Windows), adjust appropriately for other OSes. Update: the caveat is not actually correct, as pointed out by naikonta and wfsp, except possibly for the case outlined by Sixtease, also fixed in his update. Thanks to all of you.

    All dogma is stupid.
      Wow, I didn't know the print {$wfh} $line; construct. Does that disambiguate $wfh to be interpreted as a filehandle?

        Yes; because the "filehandle argument slot" (for lack of a better name) has to be a simple scalar value or a BLOCK. While it's superfluous in this particular case the block form is useful if you (for instance) have a hash of filehandles and want to use print { $handles->{$somekey} } "Yadda yadda yadda.\n"; directly rather than pulling it out into a tmp variable. The docs for print cover this.
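
        A minimal sketch of the hash-of-filehandles case mentioned above. The keys "header" and "remark" and the in-memory buffers are assumptions made only so the example runs without touching real files:

        ```perl
        use strict;
        use warnings;

        # In-memory buffers stand in for real output files (illustrative only).
        my %buffer  = (header => '', remark => '');
        my $handles = {};

        for my $somekey (sort keys %buffer) {
            open my $fh, '>', \$buffer{$somekey}
                or die "Can't open in-memory file for $somekey: $!";
            $handles->{$somekey} = $fh;
        }

        # The block form lets an arbitrary expression supply the filehandle.
        print { $handles->{header} } "HEADER line\n";
        print { $handles->{remark} } "REMARK line\n";

        for my $somekey (sort keys %buffer) {
            close $handles->{$somekey}
                or die "Can't close handle for $somekey: $!";
        }
        ```

        Without the braces, perl would not accept $handles->{header} in the filehandle slot, which is why the block is needed here.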

        Yep, I picked that up from TheDamian's Perl Best Practices (a must-read for every Perl programmer IMO). As Fletch rightly points out, it's not necessary in this case, but I just use it wherever I print to a filehandle (easier to do than figure out what's wrong if I ever forget it :-)


        All dogma is stupid.
      Caveat: Both of these are for systems where the line ending is "\n" (i.e. not Windows), adjust appropriately for other OSes.
      I see no caveat in your example regarding \n. This character is just Perl's internal representation of whatever constitutes a line ending, so it will be whatever the underlying OS (the one perl is run on) actually uses to terminate lines. See how newlines are addressed in perlport.

      Open source softwares? Share and enjoy. Make profit from them if you can. Yet, share and enjoy!

      That's it! Wow, let's see how I go at adjusting it all for everything else I have to do to it. The weekend shall be fun.

      One problem though, and I guess it has something to do with the \n point you made at the end. Now I have <cr> appearing at the end of every line. I'm sitting on a Linux workstation running RedHat if that is at all helpful (I DID say I know nothing about this!)

      cheers
      wherethewild

        Here's my guess:

        Your file has Windows newlines (cr/lf), which your editor/viewer can deal with and displays correctly. Then you add unix newlines (lf) on the lines you edit. Now there are mixed cr/lf and lf newlines, which confuses the editor, so it shows the cr characters.

        If I'm correct, then I recommend either preprocessing the file with the dos2unix tool or addressing this in the perl script itself.

        update: The modified while loop could look like this:

        while (my $line = <$rfh>) {
            chomp $line;
            if ($line =~ m/^HEADER/) {
                my $number = 42; # change to whatever number you want to use
                $line .= $number;
            }
            if ($line =~ m/^REMARK/) {
                print {$wfh} "Extra line\n"; # Change to whatever extra line you want
            }
            print {$wfh} $line, "\n";
        }
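
        One thing to watch in the loop above: with the default $/ on Linux, chomp removes only the trailing "\n", so a cr left over from a Windows-style line would stay in place and the number would be appended after it. Here is a sketch that strips either ending explicitly, assuming lf-only output is what you want. The sample input and the in-memory filehandles are made up for illustration:

        ```perl
        use strict;
        use warnings;

        # Sample input with Windows (cr/lf) line endings; illustrative data only.
        my $input  = "HEADER foo\r\nREMARK bar\r\n";
        my $output = '';

        open my $rfh, '<', \$input  or die "Can't open input: $!";
        open my $wfh, '>', \$output or die "Can't open output: $!";

        while (my $line = <$rfh>) {
            $line =~ s/\r?\n\z//;         # strip lf or cr/lf, whichever is present
            if ($line =~ m/^HEADER/) {
                $line .= 42;              # placeholder number
            }
            if ($line =~ m/^REMARK/) {
                print {$wfh} "Extra line\n";
            }
            print {$wfh} $line, "\n";     # always write a unix newline
        }

        close $rfh or die "Can't close input: $!";
        close $wfh or die "Can't close output: $!";
        ```

        With real files, replace the two in-memory opens with the open calls from the original program.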
Re: Newbie: uses/limits of perl in editing files
by Dominus (Parson) on Nov 23, 2007 at 15:06 UTC
    Tie::File is nice for stuff like that.

    It makes the file look like an array, with one line in each element. Then you modify the array. As you do, the changes appear in the file.
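
    A rough sketch of how the HEADER/REMARK edits from the question might look with Tie::File. The temporary file, its contents, and the number 42 are placeholders for illustration:

    ```perl
    use strict;
    use warnings;
    use Tie::File;
    use File::Temp qw(tempfile);

    # Create a small throwaway file to edit (illustrative data only).
    my ($tmp_fh, $filename) = tempfile(UNLINK => 1);
    print {$tmp_fh} "HEADER foo\n", "REMARK bar\n", "DATA baz\n";
    close $tmp_fh or die "Can't close $filename: $!";

    # Each array element is one line of the file, without its line ending.
    tie my @lines, 'Tie::File', $filename
        or die "Can't tie $filename: $!";

    # Walk backwards so inserting a line does not shift the indexes still to visit.
    for my $i (reverse 0 .. $#lines) {
        if ($lines[$i] =~ m/^HEADER/) {
            $lines[$i] .= 42;                      # placeholder number
        }
        if ($lines[$i] =~ m/^REMARK/) {
            splice @lines, $i, 0, 'Extra line';    # insert before the REMARK line
        }
    }

    untie @lines;   # flush changes and release the file
    ```

    Note that this edits the file in place, so keep a backup of the original while experimenting.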

Re: Newbie: uses/limits of perl in editing files
by Sixtease (Friar) on Nov 23, 2007 at 14:08 UTC

    You can do this with pretty much any scripting / programming language (if it has input/output capabilities and is Turing-complete). And Perl may be the most comfortable one for this.

    The code to do something like you said could look like

    perl -pe '/^REMARK/ and print "the line you want to add\n"' < input_file > output_file
Re: Newbie: uses/limits of perl in editing files
by cdarke (Prior) on Nov 23, 2007 at 14:42 UTC
    Extending your requirements a little, there is a neat feature that is useful when replacing tokens, like your HEADER and REMARK. You can execute code from within a substitute statement, for example:
    $line =~ s/(HEADER|REMARK)/mysub($1)/ge;
    That will call the user-written subroutine mysub every time HEADER or REMARK is found in the text. The argument passed is the text matched inside (). Whatever is returned by mysub will replace the token. It probably would not be worth it for the simple substitution you mentioned, but for more complex combinations it can be very powerful.
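
    As a small illustration of that idea, here is one way mysub might look. The lookup table and its replacement words are hypothetical, chosen only to show the mechanics:

    ```perl
    use strict;
    use warnings;

    # Hypothetical replacement table; the real mapping depends on your files.
    my %replacement_for = (HEADER => 'TITLE', REMARK => 'NOTE');

    # Receives the matched token and returns the text that replaces it.
    sub mysub {
        my ($token) = @_;
        return $replacement_for{$token};
    }

    my $line = "HEADER one REMARK two HEADER three\n";
    $line =~ s/(HEADER|REMARK)/mysub($1)/ge;   # /e runs the code, /g repeats it
    print $line;
    ```

    Because the replacement is ordinary Perl code, mysub could just as well count matches, consult another file, or compute the new text from the old.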
      I was looking at those s/// but I wasn't sure how I could get it to do some of the things I need, as the text which has to be substituted is different from file to file and it also appears elsewhere in the file, where it's not to be adjusted.

      Was that clear?
      This is a theoretical line in my file:
      BOBBY X66666 A 345 674 A 123 488

      The X66666 has to be changed to B22222. But the next file might have U33333 there instead of X66666, or worse still 666D3P, or even worse absolutely nothing at all. And I don't know what it might be unless I open up each of the text files and look what the previous program did to it (something I'm trying to avoid by learning this!). It SHOULD be that the spacing is constant across that line, but that's not guaranteed.

      Anyway, that's some of what I'm trying to do. Thanks everyone for the speed and friendliness in helping me out!

        BOBBY X66666 A 345 674 A 123 488

        Assuming the piece you want to replace is the 2nd token in the line then you can do something like:

        s/^(\S+)(\s+)(\S+)(.*)/${1}${2}B22222${4}/;

        This reads like:
        (NOT WHITESPACE)(WHITESPACE)(NOT WHITESPACE)(EVERYTHING)

        That collects the first 3 pieces into variables $1-$3, then the remainder of the line into $4, then reassembles the line with the pieces and the replacement.
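
        To see the substitution at work, here is a short sketch that applies it to a few variants of the sample line. The second and third lines are made up from the values mentioned in the question:

        ```perl
        use strict;
        use warnings;

        # Sample lines; only the first is taken verbatim from the question.
        my @lines = (
            "BOBBY X66666 A 345 674 A 123 488",
            "BOBBY U33333 A 345 674 A 123 488",
            "BOBBY 666D3P   A 345 674 A 123 488",
        );

        for my $line (@lines) {
            # Replace the 2nd token, keeping the original spacing around it.
            $line =~ s/^(\S+)(\s+)(\S+)(.*)/${1}${2}B22222${4}/;
            print "$line\n";
        }
        ```

        If the second field is missing altogether, the pattern will replace whatever token happens to come second instead, so that case needs separate handling.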

Re: Newbie: uses/limits of perl in editing files
by dwm042 (Priest) on Nov 23, 2007 at 14:57 UTC
    Perl is as close to a Swiss Army knife of a scripting language as exists. If you can't write the code yourself, you can, in most circumstances, find a solution written for you on CPAN.

    Having said that, at this stage, you probably need a beginning text on writing Perl. Something like Learning Perl would be appropriate.