PerlMonks
Re^7: Parsing HTML/XML with Regular Expressions (regex)
by RonW (Parson)
on Oct 20, 2017 at 21:29 UTC ( [id://1201776]=note )
I ran your version of my code and got the same output you did. Since I had already discovered the embedded newlines in the elements list, I added tr/\n//d; at the top of the for loop:
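A minimal sketch of that change, not the actual loop from the thread. The @elements list below is made-up sample data standing in for the parsed element list; only the tr/\n//d; line is taken from the post.

```perl
use strict;
use warnings;

# Hypothetical sample data: element strings with embedded newlines.
my @elements = ("Satur\nday", "Sun\nday\n", "Monday");

for (@elements) {
    # $_ is aliased to the array element, so this deletes every
    # newline in place before any later matching is done.
    tr/\n//d;
    print "[$_]\n";
}
```

Because tr/// with the d modifier works on $_ by default and for aliases $_ to each element, no assignment back into the array is needed.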
After doing that, the id for Saturday picked up correctly. Also, out of curiosity, I removed the s/\W+//g; you added. The result was:
So, Saturday is cleaned up. I know why the id for Sunday is Foo, but I'm still not sure why the "bbbdddeeeggg" is picked up. I will have to step through the code to see what's happening. As for the &nbsp;, that's encoding dependent. I'm not sure why it would get excluded other than by explicitly filtering out non-ASCII characters. The y is the y in Sunday; it just requires entity decoding.
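To illustrate the entity decoding and non-ASCII filtering points, here is a rough core-Perl sketch. It assumes the y arrived as a numeric character reference such as &#121; and that the odd character is a non-breaking space; the post doesn't say which forms were actually in the input. decode_simple_entities and the %named table are hypothetical helpers written for this example. In real code, decode_entities from the CPAN module HTML::Entities is the usual choice.

```perl
use strict;
use warnings;

# Tiny table of named entities for this sketch only (an assumption;
# HTML defines many more).
my %named = (nbsp => "\xA0", amp => '&', lt => '<', gt => '>', quot => '"');

# Decode hex refs (&#x79;), decimal refs (&#121;) and the few named
# entities above in a single pass, so decoded text is never re-decoded.
# Unknown named entities are left untouched.
sub decode_simple_entities {
    my ($text) = @_;
    $text =~ s{&(?:#x([0-9a-fA-F]+)|#(\d+)|(\w+));}{
        defined $1 ? chr(hex $1)
      : defined $2 ? chr($2)
      : exists $named{$3} ? $named{$3}
      :                     "&$3;"
    }ge;
    return $text;
}

my $day = decode_simple_entities("Sunda&#121;&nbsp;");

# Explicitly filter out non-ASCII: delete everything outside 0x00-0x7F.
# This is the kind of step that would drop the non-breaking space.
(my $ascii_only = $day) =~ tr/\x00-\x7F//cd;

print "$ascii_only\n";
```

After decoding, the string ends in U+00A0, which is not ASCII, so the tr/\x00-\x7F//cd filter removes it and leaves the plain day name.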