Tricky regex problem

nysus has asked for the wisdom of the Perl Monks concerning the following question:

Got a regex:

$input_text =~ s/^###*\s+[^\n]+\s+(?=^##)//gsm;

I want it to strip out Markdown headers that don't have any content under them:

## This gets stripped

## This doesn't
Because it contains this line

## More headers
blah blah

But I also don't want it to strip this:

## This should not get stripped, but it does

### This should prevent it from getting stripped, but it doesn't
stuff

#### This should also not get stripped

##### But it does

Is there any way at all to pull this off with a regex?

$PM = "Perl Monk's";
$MCF = "Most Clueless ~~Friar~~ ~~Abbot~~ ~~Bishop~~ ~~Pontiff~~ ~~Deacon~~ ~~Curate~~ ~~Priest~~ Vicar";
$nysus = $PM . ' ' . $MCF;


Replies are listed 'Best First'.
Re: Tricky regex problem by Eily (Monsignor) on Jul 22, 2020 at 15:55 UTC
Not a single regex but one way to do what you want is to read your file in paragraph mode, with $/ and split the logic:

{
    local $/ = ""; # Edit: added use of local for good practice
    while (<DATA>) {
        s/^##.*//s unless /^\w/m;
        print;
    }
}
__DATA__
## Remove

## Keep
this

## Remove

## Also keep
that

Otherwise you could use a negative look ahead assertion (?!^\w)

Edit: seems like I really didn't read the requirement well enough ^^"
Re: Tricky regex problem by tybalt89 (Monsignor) on Jul 22, 2020 at 16:00 UTC
#!/usr/bin/perl

use strict; # https://perlmonks.org/?node_id=11119658
use warnings;

$_ = <<END;
## This gets stripped

## This doesn't
Because it contains this line

## More headers
blah blah

## this also should be stripped ?

## This should not get stripped, but it does

### This should prevent it from getting stripped, but it doesn't
stuff

#### This should also not get stripped

##### But it does
isn't some text needed here?
END

s/^(#+).*\n\n(?!^\1#)//gm;
print;

Outputs:

## This doesn't
Because it contains this line

## More headers
blah blah

## This should not get stripped, but it does

### This should prevent it from getting stripped, but it doesn't
stuff

#### This should also not get stripped

##### But it does
isn't some text needed here?

I think a larger test case may be needed...
Re: Tricky regex problem (updated) by AnomalousMonk (Archbishop) on Jul 22, 2020 at 15:34 UTC
If ~~that~~ (update: oops... I meant to reply to that node) works for you, try this as a simplification (untested):

$input_text =~ s{ ^ ([#]{2,5}) \s+ [^\n]+ \s+ (?= ^ \1 [^#]) } {}xmsg;

Beyond that, I have to say I don't understand your requirements. Can you express them more clearly?

Give a man a fish: <%-{-{-{-<
Re^2: Tricky regex problem (updated) by nysus (Parson) on Jul 22, 2020 at 15:47 UTC
A header should get stripped if it is followed only by whitespace and that whitespace is followed by a header with the same number of pound signs as the initial header, or fewer.
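For illustration, the rule stated above can be read as a single substitution. This is only a sketch under assumptions, not something posted in the thread: the sub name strip_empty_headers is hypothetical, headers are assumed to start in column one with at least one space or tab after the pound signs, and a header at the very end of the text is left alone because the rule does not mention that case. The trick is a pair of lookaheads: the next non-blank line must start with #, and it must not start with the first header's pound signs plus one more.

```perl
use strict;
use warnings;

# Hypothetical helper (name not from the thread): removes a header that is
# followed only by whitespace and then by a header of the same or a shallower level.
sub strip_empty_headers {
    my ($text) = @_;
    $text =~ s{
        ^ (\#+) [ \t]+ [^\n]+ \n   # header line; group 1 holds its run of hashes
        \s*                        # any blank lines after it
        (?= ^ \# )                 # the next non-blank line is a header
        (?! \1 \# )                # whose hash run is no longer than group 1
    }{}gmx;
    return $text;
}

my $sample = "## A\n\n## B\ntext\n\n## C\n\n### D\nstuff\n";
print strip_empty_headers($sample);
```

With this sample, "## A" should be removed because the next header is at the same level, while "## C" should stay because it is followed by a deeper "### D".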
Re: Tricky regex problem by LanX (Saint) on Jul 22, 2020 at 16:01 UTC
use strict;
use warnings;

local $/ = "\n##"; #record separator

while (<DATA>) {
    chomp;
    #print "\n<<<$_>>>\n"; # check input
    my ($head,$rest) = /^ (.*?) \n (.*) $/xs;
    # print record only if $rest contains alphanumerics
    print "##$_" if $rest =~ /\w/;
}

__DATA__
## This gets stripped

## This doesn't
Because it contains this line

## More headers
blah blah

## This should not get stripped, but it does

### This should prevent it from getting stripped, but it doesn't
stuff

#### This should also not get stripped

##### But it does

C:/Perl_524/bin\perl.exe -w d:/exp/pm_headers.pl
## This doesn't
Because it contains this line
## More headers
blah blah
### This should prevent it from getting stripped, but it doesn't
stuff

Compilation finished at Wed Jul 22 17:59:57

Cheers Rolf
(addicted to the Perl Programming Language :)
Re: Tricky regex problem by nysus (Parson) on Jul 22, 2020 at 15:11 UTC
I came up with this, which is faintly ridiculous:

$input_text =~ s/^##\s+[^\n]+\s+(?=^##[^#])//gsm;
$input_text =~ s/^###\s+[^\n]+\s+(?=^###[^#])//gsm;
$input_text =~ s/^####\s+[^\n]+\s+(?=^####[^#])//gsm;
$input_text =~ s/^#####\s+[^\n]+\s+(?=^#####[^#])//gsm;

I just googled "conditional regular expression" and it seems that is a thing and it might help me. Not sure yet.
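The four substitutions above differ only in the number of # characters, so they could be folded into a loop. A minimal sketch, assuming the same pattern is wanted for levels 2 through 5; the sub name strip_same_level is hypothetical and only there to make the snippet self-contained. It keeps the original behaviour, which compares headers of the same level only.

```perl
use strict;
use warnings;

# Hypothetical wrapper (name not from the thread): runs the same
# substitution once per header level from 2 to 5.
sub strip_same_level {
    my ($input_text) = @_;
    for my $n (2 .. 5) {
        # The quantifier is built from the loop variable, so level 3
        # becomes "exactly three pound signs", and so on.
        $input_text =~ s/^#{$n}\s+[^\n]+\s+(?=^#{$n}[^#])//gsm;
    }
    return $input_text;
}

print strip_same_level("## A\n\n## B\nx\n");
```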

