How can I delete characters between < and > in Perl?

boom has asked for the wisdom of the Perl Monks concerning the following question:

I need to write a Perl script to read in a file, and delete anything inside < >, even if they're on different lines. That is, if the input is:

Hello, world. I <enjoy eating bagels. They are quite tasty. I prefer when I ate a bagel to when I >ate a sandwich. bananas.

I want the output to be:

Hello, world. I ate a sandwich. bananas.

I know how to do this with a regex if the text is on one line, but I don't know how to do it across multiple lines. Ultimately I need to be able to conditionally delete parts of a template so I can generate parametrized config files. I thought Perl would be a good language for this, but I am still getting the hang of it.


Replies are listed 'Best First'.
Re: How can I delete characters between < and > in Perl? by Anonymous Monk on Apr 18, 2009 at 13:54 UTC
`use File::Slurp; my $text = read_file('filename'); $text =~ s!<[^>]+>!!g;`
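For readers without File::Slurp installed, the same idea can be sketched with core Perl only. This is a minimal sketch, not the poster's code: the line breaks in the sample text are assumptions modeled on the question, and `strip_angle_brackets` is a hypothetical helper name. It uses `*` instead of `+` so that an empty `<>` is removed too. Because `[^>]` also matches newlines, no `/s` modifier is needed once the whole text is in one string.

```perl
use strict;
use warnings;

# Hypothetical helper: remove every <...> span, including spans that
# cross line boundaries. [^>] matches any character except '>', and
# that includes "\n".
sub strip_angle_brackets {
    my ($text) = @_;
    $text =~ s/<[^>]*>//g;
    return $text;
}

# Sample input modeled on the question; the line breaks are assumed.
my $input = <<'END';
Hello, world. I <enjoy eating
bagels. They are quite tasty.
I prefer when I ate a bagel to
when I >ate a sandwich.
bananas.
END

print strip_angle_brackets($input);
```

To apply this to a file, read the whole file into one string first, for example with `my $text = do { local $/; <$fh> };` on an already opened filehandle `$fh`.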
Re^2: How can I delete characters between < and > in Perl? by Anonymous Monk on Apr 18, 2009 at 14:04 UTC
Needs to be non-greedy: `$text =~ s!<[^>]+?>!!g;`
Re^3: How can I delete characters between < and > in Perl? by kyle (Abbot) on Apr 18, 2009 at 16:39 UTC
What's the difference? The character class (`[^>]`) is never going to accidentally slurp up closing hoinkies anyway.
Re^3: How can I delete characters between < and > in Perl? by Your Mother (Archbishop) on Apr 18, 2009 at 16:44 UTC
That is incorrect. Avoiding this is part of the reason for using negated character classes: `[^>]` cannot over-match.
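The point made in the two replies above can be checked with a small experiment. This is only a sketch: the sample string is an assumption chosen for illustration. The greedy and non-greedy forms of the negated character class give the same result, while a greedy `.+` (shown for contrast) runs from the first `<` to the last `>`.

```perl
use strict;
use warnings;

# Sample text (an assumption for illustration) with two separate spans,
# one of which crosses a line break.
my $text = "a <one> b <two\nthree> c";

(my $greedy = $text) =~ s/<[^>]+>//g;    # greedy quantifier on [^>]
(my $lazy   = $text) =~ s/<[^>]+?>//g;   # non-greedy quantifier on [^>]
(my $dot    = $text) =~ s/<.+>//gs;      # greedy dot: this one over-matches

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
print "dot:    $dot\n";
print "greedy and lazy agree\n" if $greedy eq $lazy;
```

Non-greediness only matters when the repeated pattern is able to match the closing delimiter, as `.` can.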
Re: How can I delete characters between < and > in Perl? by roboticus (Chancellor) on Apr 18, 2009 at 14:00 UTC
boom: You should review these two links: Perl Monks Approved HTML tags, and perlre (look at the Quantifiers section). You're wanting "non-greedy" matches. ...roboticus
Re: How can I delete characters between < and > in Perl? by ambrus (Abbot) on Apr 18, 2009 at 21:18 UTC
If you want to delete matches spanning multiple lines while reading line by line, delete the rest of the line with an `s` substitution whenever there's an unmatched `<` sign, and check the return value of that substitution to see whether it happened. If it did, set a flag and keep throwing lines away until you find one with a `>` sign, where you delete the part up to that sign and then continue applying the ordinary replacements. Note, however, that if you are attempting to strip tags from an HTML or XML file, you'd better use a proper module instead of regexen written by hand. A module will cope better with more unusual HTML constructs, and also with malformed but common HTML such as unescaped angle brackets. E.g. try something like `perl -we 'use 5.010; use XML::Twig; binmode STDOUT, ":encoding(iso8859-2)"; $twig = XML::Twig->new->parsefile_html($ARGV[0]); say $twig->root->text;' somefile.html`
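The line-by-line approach described above could look roughly like this. It is a sketch under stated assumptions, not ambrus's own code: `strip_spanning` is a hypothetical helper name, the sample lines are modeled on the question with assumed line breaks, and plain `<`/`>` delimiters without nesting or escaping are assumed.

```perl
use strict;
use warnings;

# Hypothetical helper: takes a list of lines, returns the filtered text.
sub strip_spanning {
    my @lines  = @_;
    my $inside = 0;     # true while we are inside an unclosed '<'
    my $out    = '';
    for my $line (@lines) {
        if ($inside) {
            # Drop everything up to and including the first '>'.
            if ($line =~ s/^[^>]*>//) { $inside = 0 }
            else                      { next }   # no '>' here: discard the line
        }
        # Ordinary replacement: complete <...> pairs on this line.
        $line =~ s/<[^>]*>//g;
        # Unmatched '<': delete the rest of the line (newline included)
        # and use the substitution's return value to set the flag.
        $inside = 1 if $line =~ s/<.*//s;
        $out .= $line;
    }
    return $out;
}

# Sample lines modeled on the question; the line breaks are assumed.
print strip_spanning(
    "Hello, world. I <enjoy eating\n",
    "bagels. They are quite tasty.\n",
    "I prefer when I ate a bagel to\n",
    "when I >ate a sandwich.\n",
    "bananas.\n",
);
```

Unlike slurping the whole file, this keeps only one line in memory at a time; in a real script the loop body would sit inside `while (my $line = <$fh>)`.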
Re^2: How can I delete characters between < and > in Perl? by Anonymous Monk on Apr 19, 2009 at 02:13 UTC
See HTML::StripScripts, and "strip HTML tags".

