On the Web and in various books, anyone can find something like "common regex lists" containing dozens of real-life regular expressions simple enough to be understood by everybody.

Being quite simple, these regexes are generally used to solve the routine problems every Perl programmer meets.

But there are some tasks that require much more complex regular expressions.

I want to make a list of complicated (obfuscated, odd, etc.) regular expressions used to solve difficult real problems (and then I plan to make it available online somewhere outside this thread :) ). I would be very obliged if you posted examples of your most interesting regexes here, together with chunks of the data they were intended to match against.

My own favourite (it is combined from two regexes, one of which is recursive):

$brackets_pattern = qr{
    # recursive pattern to search brackets like [mmm[ hh[f]]ll]
    \[
    (?:
        (?>[^\[\]]+ )              # non-brackets
        |
        (??{$brackets_pattern})    # new pattern for inside brackets
    )*
    \]
}x;

my $pat = qr/(?-xism:(?-xism:[ab?x][DLSRX?]Glc(?:[pfa?]|-ol|)N\(1\-4\))(?x-ism:\[(?:(?>[^\[\]]+)|(??{$brackets_pattern}))*\])*(?x-ism:\[(?:(?>[^\[\]]+)|(??{$brackets_pattern}))*\])*\[(?x-ism:(?:(?>[^\[\]]+?)|(??{$brackets_pattern}))*)(?:t\)|(?<![\])]))(?x-ism:\[(?:(?>[^\[\]]+)|(??{$brackets_pattern}))*\])*(?-xism:[ab?x][DLSRX?]Glcp\(1\-6\))\](?x-ism:\[(?:(?>[^\[\]]+)|(??{$brackets_pattern}))*\])*(?-xism:[ab?x][DLSRX?]GalpN(?=\(|$)))/;
Of course, I didn't type the second regex myself; it is generated by my substructure search engine for the Bacterial Carbohydrate Structure Database in response to a typical request. That's why I used the word "created" instead of "wrote" in the title: some interesting regexes are never typed, but are used intensively :)
Sample data to match against:
-6)[xR3HOBut(1-3)]aDGlcpN(1-4)[aDGlcp(1-6),Ac(1-2)]aDGalpN(1-3)[Ac(1-2)]bDGalpN(1-2)aDGlcp(1-P-
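The recursive bracket pattern can be tried on its own against the sample string. The following is only a minimal sketch: it reuses the (??{ ... }) form from the post (declared with "our" so the deferred block can see the variable under strict), and next to it shows an alternative spelling with numbered-group recursion, (?1), which Perl has supported since 5.10. The variable names $recursive, @deferred and @numbered are made up for this illustration and are not part of the original search engine.

```perl
use strict;
use warnings;

# Package variable so the deferred (??{ ... }) block can see the pattern
# while that same pattern is still being defined.
our $brackets_pattern;
$brackets_pattern = qr{
    \[
    (?:
        (?> [^\[\]]+ )              # run of non-bracket characters
        |
        (??{ $brackets_pattern })   # a nested bracketed group
    )*
    \]
}x;

# Alternative spelling with numbered-group recursion (Perl 5.10 and later).
my $recursive = qr{
    (                               # group 1: one balanced bracket group
        \[
        (?: [^\[\]]++ | (?1) )*     # non-brackets, or recurse into group 1
        \]
    )
}x;

my $sample = '-6)[xR3HOBut(1-3)]aDGlcpN(1-4)[aDGlcp(1-6),Ac(1-2)]'
           . 'aDGalpN(1-3)[Ac(1-2)]bDGalpN(1-2)aDGlcp(1-P-';

# In list context with /g, each pattern returns every bracketed side chain.
my @deferred = $sample =~ /$brackets_pattern/g;
my @numbered = $sample =~ /$recursive/g;

# The nested example from the comment in the original pattern.
my ($nested) = 'x[mmm[ hh[f]]ll]y' =~ /$recursive/;

print "$_\n" for @deferred;
print "nested: $nested\n";
```

Both spellings should pick out the same bracketed groups; the (?1) form avoids re-entering Perl code on every recursion, which is the main reason to prefer it on newer perls.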

I hope to see your examples described the way I described mine :)

UPDATE:

A short list of what are IMHO the best ones I found in the replies (ordered by the time the comment was posted):


([^e]|e([^s]|s([^\.]|\.([^c]|c([^o]|o([^m]|m([^p]|p([^\.]|\.([^o]|o([^s]|s([^\.]|\.([^l]|l([^i]|i([^n]|n([^u]|u[^x])))))))))))))))
by Hue-Bond
A simple grep -E regex aimed at detecting cross-posts between certain newsgroups.
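Read character class by character class, the nested alternation spells out "something other than es.comp.os.linux", which is the usual workaround when a tool such as grep -E has no negative lookahead. The sketch below rebuilds a pattern of the same shape from that literal and compares it with a lookahead. The helper name not_literal_pattern and the ^ anchor are assumptions made for this illustration; how the regex was anchored in the original grep command is not shown in the post.

```perl
use strict;
use warnings;

# Hypothetical helper: build "does not continue with $literal" as a nested
# alternation of the same shape as the regex above, working from the last
# character of the literal back to the first.
sub not_literal_pattern {
    my ($literal) = @_;
    my @chars   = map { quotemeta($_) } split //, $literal;
    my $pattern = '[^' . pop(@chars) . ']';
    for my $c (reverse @chars) {
        $pattern = '([^' . $c . ']|' . $c . $pattern . ')';
    }
    return $pattern;
}

my $nested    = not_literal_pattern('es.comp.os.linux');
my $anchored  = qr/^$nested/;                  # ^ is an assumption of this sketch
my $lookahead = qr/^(?!es\.comp\.os\.linux)/;  # the compact Perl equivalent

for my $group (qw(es.comp.os.linux.redes es.comp.os.unix comp.os.linux es.comp)) {
    printf "%-24s nested=%d lookahead=%d\n", $group,
        ($group =~ $anchored  ? 1 : 0),
        ($group =~ $lookahead ? 1 : 0);
}
```

The two forms are not identical: the nested alternation has to consume one deviating character, so a proper prefix of the literal such as es.comp is rejected by it but accepted by the lookahead.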


#!/usr/bin/perl -l
"AB~ACFI~ADGJ~AE~BCDE~BFHJ~BI~EGHI~EJ~IJ" =~ /([^~])[^~]*([^~]).*~[^~]*([^~])[^~]*([^~])(?{local$z=$1 and local$y=$2 and local$x=$1 eq$3?$4:$1 eq$4?$3:($z=$2)&&($y=$1)&&$2 eq$3?$4:$2 eq$4?$3:0}).*~[^~]*((??{$y})[^~]*(??{$x})|(??{$x})[^~]*(??{$y}))(?{$x{join" - ",sort$x,$y,$z}++})(?!)/;
print for sort(keys %x), keys(%x) . " triangles found";
by !1
The regex (or, to be stricter, this mix of regex and Perl code ;)) at the heart of this short script finds all the triangles for this quest and puts them into the %x hash.
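The script relies on a code block followed by (?!), which always fails and so forces the engine to backtrack through every remaining alternative. A much smaller sketch of the same technique is given below; it reuses only the opening part of the pattern and two fields of the data, and the @pairs array is a name invented for this illustration.

```perl
use strict;
use warnings;

our @pairs;   # package variable, filled from inside the regex

'ACFI~BI' =~ /
    ([^~]) [^~]* ([^~])          # two characters from the same field
    (?{ push @pairs, "$1$2" })   # record the current choice
    (?!)                         # always fail, forcing the next alternative
/x;

print join(', ', sort @pairs), "\n";
print scalar(@pairs), " pairs found\n";
```

The match as a whole never succeeds; all the useful work happens as a side effect of the code block, which is why the original script collects its results in a hash rather than in capture groups.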
URL matching RegEx by abigail
Author's comment:
This does only a subset of the possible URLs:
I had to put it under <spoiler> because of its length :)
forking regular expression by Ovid
As this is a complete Perl script (the forking regex makes no sense standalone), I have put it under a spoiler too.
An abridged (due to the incredible size of the original) version of ikegami's generated regex for solving Sudoku puzzles:
The regexes are becoming stranger and stranger :) Whose will be next? ;)

In reply to The craziest RegExes you ever created by Ieronim