PerlMonks  

Re: Simple Text Manipulation

by Albannach (Monsignor)
on Feb 05, 2011 at 15:21 UTC [id://886409]


in reply to Simple Text Manipulation

I can’t use a simple search and replace because the Article numbers change. Can Perl be used to do this?

Well, fortunately Perl doesn't restrict you to simple search and replace; instead you have arguably the most powerful regular expression engine in the known universe! Perl regexes (see perlre) eat changing numbers for breakfast whilst blindfolded with all eight limbs tied behind their backs. Some suggestions to add to Corion's sage advice:

  • figure out exactly how you identify the lines you want to change. Corion had to assume they were all starting with Article since you weren't specific about that
  • decide exactly how the lines may be constructed: are they always just letters and spaces, or may they have numerals, punctuation, carriage returns?

As a trivial solution I offer this simple modification of JavaFan's solution which you would apply line by line as you read through the file, but note it will not work if you have a different specification for the lines. I prefer to be as specific as I can in regular expressions, so know your data.

s/^(Article\s+[0-9]+\s+[\w\s]+$)/\\subsection*{$1}/;
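
To see what that substitution does to a single line, here is a minimal sketch. The sample lines and the helper name mark_heading are made up for illustration and are not taken from your file. It chomps each line first, because otherwise [\w\s]+ would also capture the trailing newline and the closing brace would land on the next line.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap one heading line in \subsection*{...};
# lines that do not match the pattern pass through unchanged.
sub mark_heading {
    my ($line) = @_;
    chomp $line;    # otherwise [\w\s]+ also captures the trailing newline
    $line =~ s/^(Article\s+[0-9]+\s+[\w\s]+$)/\\subsection*{$1}/;
    return $line;
}

# Made-up sample lines, assuming headings look like "Article <number> <title>".
print mark_heading("Article 12 General Provisions\n"), "\n";
print mark_heading("Some ordinary body text.\n"), "\n";
```

The first line comes back wrapped in \subsection*{...}; the second is printed unchanged because it does not start with Article (and contains a period, which [\w\s] does not allow).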

--
I'd like to be able to assign to an luser

Replies are listed 'Best First'.
Re^2: Simple Text Manipulation
by Anonymous Monk on Feb 05, 2011 at 18:40 UTC

    I must be doing something wrong, when I type:

    $ perl script.pl file1.txt

    I'm returned to the command prompt. I'm using Mac OS, if that makes any difference.

        I should have been more specific. I'm returned to the command line with no processing having been done to the file and no new file created.

Re^2: Simple Text Manipulation
by pseingalt (Initiate) on Feb 06, 2011 at 11:08 UTC

    Here's the script; I added a colon because the word "Article", which always begins the line, is followed immediately by a colon. Using the syntax "$ perl script.pl test.tex -w", I'm returned to the command line and the file was not processed.

    #!/usr/bin/perl

    s/^(Article\s+{0-9}+{:}+\s+{\w\s}+$)/\\subsection*{$1}/;

    (I've replaced brackets [] with braces {} since they don't seem to show up here.)

      I gave you three steps you need to implement for your program. Which of these steps is your program supposed to implement?

      As an aside, if you use <code>...</code> tags around your code (and data), as suggested when you compose a node, your code will render as code, without HTML or bracket interpolation.

      I think you need to read our responses more carefully. Corion gave you most of the answer, and I helped you with the substitution, noting in my response: "...which you would apply line by line as you read through the file". If you don't understand what Corion wrote, then ask a specific question, but please don't just ignore the advice. As you seem to need a little more fundamental guidance, I'll say this: Perl will let you do what you want in many possible ways, but with the basic approach you appear to have chosen, running code in a file (your script.pl) from the command line, that code will have to:

      • open the input file (tip: check out the "null filehandle" in perlop)
      • read lines one by one,
      • apply the substitution regex you want, and
      • write the result somewhere. Your output could go to the console, pipe to a file, get written to a file you opened in your code, or you could even investigate Perl's handy feature for editing a file in place via the -i command line option (read up on this in perlrun).
      All your code (and please do use code tags - see Writeup Formatting Tips just above the editing box when you're composing your message) does is apply a substitution to nothing and then end, so of course you get no result.
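
The four steps listed above can be sketched as one short script. Treat it as an illustration under assumptions rather than a finished program: the helper name convert_line is made up, and the pattern assumes headings look like "Article 7: Some Title", which may not match your data. The null filehandle <> opens each file named on the command line and hands back one line at a time; output goes to the console so the shell can redirect it into a new file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: apply the substitution to one line of input.
# Assumes headings of the form "Article <number>: <title>".
sub convert_line {
    my ($line) = @_;
    chomp $line;    # keep the newline out of the captured text
    $line =~ s/^(Article\s+[0-9]+:\s+[\w\s]+$)/\\subsection*{$1}/;
    return $line;
}

# This sketch only runs when file names were given; with an empty @ARGV
# the null filehandle would read from standard input instead.
if (@ARGV) {
    while (my $line = <>) {              # read lines one by one
        print convert_line($line), "\n"; # write each result to the console
    }
}
```

You would run it as "perl script.pl test.tex > new.tex" to leave the original file untouched and collect the result in new.tex (both file names are placeholders).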

      Quick tips on your regex modification: The colon has no special meaning so you need not put it in square brackets. Functionally there is no difference, but using unnecessary characters makes it harder for others to easily pick out what you mean to do. Also, unless you want to accept cases of more than one colon, don't put a + after the colon. Please read up on regular expressions; the + symbol is not used to assemble parts of a regex, it is a quantifier and means 'match one or more of the preceding element'.
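
A small sketch of the difference the + after the colon makes. The variable names and the two sample lines are invented for illustration; only the patterns themselves come from the discussion above.

```perl
use strict;
use warnings;

my $plain  = qr/^Article\s+[0-9]+:\s/;     # exactly one colon, no brackets needed
my $padded = qr/^Article\s+[0-9]+[:]+\s/;  # one or more colons, because of the +

# Made-up sample lines: one normal heading, one with stray extra colons.
for my $line ("Article 4: Fees", "Article 4::: Fees") {
    printf "%-20s plain: %-8s padded: %s\n", $line,
        ($line =~ $plain  ? "match" : "no match"),
        ($line =~ $padded ? "match" : "no match");
}
```

Both patterns accept the normal heading, but only the one with the + quantifier accepts the line with three colons, which is probably not what you want.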

      --
      I'd like to be able to assign to an luser
