PerlMonks  

substitution on

by Anonymous Monk
on Aug 24, 2009 at 16:27 UTC ( [id://790859]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

How can I convert every '<' to &lt; when it has no matching '>'? For example, given the text
Read them in one <place with Google Reader,<a href="www.google.com"> new<
how do I get this output?
Read them in one &lt;place with Google Reader,<a href="www.google.com"> new&lt;
while(<DATA>){
    s/</&lt;/g;
}
__DATA__
Read them in one <place with Google Reader,<a href="www.google.com"> new<

Code tags added by GrandFather

Re: substitution on
by ikegami (Patriarch) on Aug 24, 2009 at 17:01 UTC

    Your question is next to impossible to understand since we can't tell the difference between when you meant to say "<" and when you meant to say "&lt;". Please repost your question (using Preview until it's readable).

                            To get <    To get &lt;
    Outside of code tags    &lt;        &amp;lt;
    Inside of code tags     <           &lt;
Re: substitution on
by Taulmarill (Deacon) on Aug 24, 2009 at 17:09 UTC
    If you want to replace < with &lt; wherever it is not part of an HTML tag, a simple solution is to use a negative lookahead in your regex, like this:
    s/<(?![^<]*>)/&lt;/g;
    A better solution would be to use an HTML parsing module from CPAN.
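To see what the negative lookahead does, here is a minimal sketch that applies the substitution above to the sample line from the question. The variable names are only illustrative, and the sample is assumed to be a single line.

```perl
use strict;
use warnings;

# Sample line taken from the question.
my $text = 'Read them in one <place with Google Reader,<a href="www.google.com"> new<';

# Escape a '<' only when no '>' turns up before the next '<' (or the end of the string).
(my $escaped = $text) =~ s/<(?![^<]*>)/&lt;/g;

print "$escaped\n";
# Read them in one &lt;place with Google Reader,<a href="www.google.com"> new&lt;
```

The '<' that opens the a tag is left alone because a '>' follows it before any other '<' appears. Text such as 1 < 2 > 1 would still be treated as a tag, so this remains a heuristic.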
Re: substitution on
by jethro (Monsignor) on Aug 24, 2009 at 17:15 UTC

    < and > are special characters in HTML. Your text is nearly unreadable; please edit your node and use &lt; or code tags to print those characters.

    To answer your question, you might use something like this:

    s/<(?=[^>]*(<|$))/&lt;/g;

    This substitutes every < that is followed by another < or by the end of the string before any > appears.

    Naturally this won't work if a single < can occur inside a valid < ... > construct. It also won't work if a > might be on the line after its opening <. In that case you could slurp the whole file into one string.
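The multi-line caveat can be sketched as follows. The two-line input is made up for illustration (the a tag is closed on the second line), and an in-memory filehandle stands in for a real file. Undefining $/ is the usual way to slurp.

```perl
use strict;
use warnings;

# Hypothetical input: the '>' of the <a ...> tag sits on the next line.
my $input = qq{one <place,<a href="www.google.com"\n> new<\n};

# Line by line: the '<' of the <a> tag is escaped too, because its '>' is not on the same line.
open my $fh, '<', \$input or die "Cannot open in-memory file: $!";
my $by_line = '';
while (my $line = <$fh>) {
    $line =~ s/<(?=[^>]*(<|$))/&lt;/g;
    $by_line .= $line;
}
close $fh;

# Slurped: with $/ undefined the whole input is read as one string,
# so the lookahead can see the '>' on the following line.
open $fh, '<', \$input or die "Cannot open in-memory file: $!";
my $slurped = do { local $/; <$fh> };
close $fh;
$slurped =~ s/<(?=[^>]*(<|$))/&lt;/g;

print "line by line:\n$by_line";
print "slurped:\n$slurped";
```

Only the slurped version keeps the a tag intact. [^>] already matches newlines, so no extra regex modifier is needed for the lookahead to cross line boundaries.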

Re: substitution on
by Sewi (Friar) on Aug 24, 2009 at 17:57 UTC
    You could do something like
    while (s/((^|\>)[^\<]*?)\>/$1\&gt\;/g) { 1; }
    Explained:
    Look for the beginning of the string or any closing >. From there, grab everything that is not an opening < until you find the next closing >. Put back what was grabbed, but replace the unmatched > with &gt;.
    Do this in a while loop, because a single pass may not catch every unmatched >; the loop repeats until no new places are found.

    The only 100% working solution would be to use an XML/HTML parser. The sample above will mangle HTML code like <a name=">here">
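A small sketch of that behaviour, with the loop wrapped in a hypothetical helper (escape_stray_gt is not part of the original post) and the redundant backslashes dropped from the pattern. The two input strings are illustrative.

```perl
use strict;
use warnings;

# Hypothetical helper around the loop above; repeats until a pass changes nothing.
sub escape_stray_gt {
    my ($s) = @_;
    1 while $s =~ s/((^|>)[^<]*?)>/$1&gt;/g;
    return $s;
}

# Stray '>' in plain text are escaped, real tags are kept.
my $plain = escape_stray_gt('a > b <i>x</i> > c');
print "$plain\n";    # a &gt; b <i>x</i> &gt; c

# A '>' inside an attribute value confuses it: the real closing '>' gets escaped.
my $mangled = escape_stray_gt('<a name=">here">');
print "$mangled\n";  # <a name=">here"&gt;
```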

    edit: Sorry, I just saw that my solution works on the wrong side (it escapes > instead of <).

    edit2: Here is the correct solution:

    while (s/\<([^\>]*?(\<|$))/\&lt\;$1/g) { 1; }
    Explained:
    Every < that has no > before the earliest next < or the end of the string is replaced ([^\>] matches any character except >; *? matches as few characters as possible, including none).
    It has been tested using the string
    <<html><x<y</html><

    Notice:
  • The same caveat applies to this one: use an XML/HTML parser for best results.
  • Remember to use /s if you're processing a multiline string.
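As a quick check of the corrected loop, the sketch below runs it on the test string and compares it with a single-pass lookahead variant. Both helper names are hypothetical. The loop is needed because each match consumes the following <, so that character cannot start a match in the same pass.

```perl
use strict;
use warnings;

# Loop version: each match swallows the next '<', so repeat until nothing changes.
sub escape_stray_lt_loop {
    my ($s) = @_;
    1 while $s =~ s/<([^>]*?(<|$))/&lt;$1/g;
    return $s;
}

# Lookahead version: the next '<' is only peeked at, so one pass is enough.
sub escape_stray_lt_lookahead {
    my ($s) = @_;
    $s =~ s/<(?=[^>]*(?:<|$))/&lt;/g;
    return $s;
}

my $test      = '<<html><x<y</html><';
my $looped    = escape_stray_lt_loop($test);
my $lookahead = escape_stray_lt_lookahead($test);

print "$looped\n";     # &lt;<html>&lt;x&lt;y</html>&lt;
print "$lookahead\n";  # same result
```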
