Don't use a regex; use File::Spec, which is a core module (or, from CPAN, Path::Class or Path::Tiny, for example).
use File::Spec;
my (undef, undef, $filename) = File::Spec->splitpath($fullpath);
splitpath returns a list (volume, directories, file), so assign it in list context. If this script is running on a non-Windows system but you need to handle Windows filenames, write File::Spec::Win32 instead of File::Spec.
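A minimal sketch of the File::Spec::Win32 variant, in case it helps. The sample path is made up for illustration; only splitpath itself comes from the module.

```perl
use strict;
use warnings;
use File::Spec::Win32;    # core module; parses Windows-style paths on any OS

# Hypothetical sample path, for illustration only.
my $fullpath = 'C:\\Users\\demo\\My Documents\\report 2019.txt';

# splitpath returns a three-element list: volume, directories, file.
my ($volume, $dirs, $filename) = File::Spec::Win32->splitpath($fullpath);

print "volume: $volume\n";
print "dirs:   $dirs\n";
print "file:   $filename\n";
```

Calling the Win32 class by name means both \ and / are treated as separators regardless of the OS the script runs on.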
Update: After rereading, I realize you're also trying to extract the path from a string that looks like "TEXT: \"...\"". I feel like we might be missing some context, because the file format is unclear to me. Is this some standardized file format you're trying to extract a part of? If so, what format? Usually quoted strings also have some kind of escaping mechanism; is that the case here? All of these things will affect what the best solution is. If the format really is as simple as it seems, then I would combine Corion's solution to extract the string from the quotes with my suggestion above to extract the filename.
Thanks for the response. I have many other lines to capture with the regex; to keep things simple, I have shown only the part of the regex where I am facing the issue. So I need a fix in the regex itself, so that I can adapt the existing code.
TEXT:\s+".*?[\\/]([^\\/"]+)"
The filename can contain anything except a path separator (\ or /) or a double quote, and must be followed by a double quote.
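Here is a small sketch of how that regex behaves. The input lines are invented for illustration, since the real data was not shown.

```perl
use strict;
use warnings;

# Hypothetical input lines; the real data may look different.
my @lines = (
    'TEXT: "C:\\temp\\logs\\report file.txt"',
    'TEXT: "/var/log/app/error.log"',
    'TEXT: "nopath.txt"',    # no separator at all
);

# The regex from above: skip up to a separator, then capture everything
# that is neither a separator nor a double quote, up to the closing quote.
my $re = qr{TEXT:\s+".*?[\\/]([^\\/"]+)"};

my @names;
for my $line (@lines) {
    my $name = ($line =~ $re) ? $1 : undef;
    push @names, $name;
    print defined $name ? "$name\n" : "(no match)\n";
}
```

Note that a quoted string without any path separator does not match, because the pattern requires at least one \ or / before the filename.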
.*TEXT:\s+"([^"]+)"
If you have both kinds of lines, with and without double quotes, you will have to show more data.
Update: After reading haukex's answer, I now realize that I only understood half of your question. Use my answer to extract the full path and then use File::Spec to get the actual filename.
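The two steps could be combined roughly like this. It is only a sketch: the sample line is hypothetical, and it assumes Windows-style paths and no escaped quotes inside the quoted string.

```perl
use strict;
use warnings;
use File::Spec::Win32;    # assumption: the paths are Windows-style

# Hypothetical sample line, for illustration only.
my $line = 'INFO TEXT: "D:\\data\\in box\\scan 01.pdf"';

my $filename;
if ($line =~ /.*TEXT:\s+"([^"]+)"/) {
    my $fullpath = $1;    # step 1: everything between the double quotes
    # step 2: let File::Spec split off the last path component
    (undef, undef, $filename) = File::Spec::Win32->splitpath($fullpath);
}

print defined $filename ? "$filename\n" : "no TEXT entry found\n";
```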
Hi,
I don't need everything up to the second double quote; I need only the last part of the line after the /, which is the path name. And I need this done in the regex only, so that I can adapt the existing code.
At the risk of stating the obvious, if you want your character class (the stuff inside the square brackets) to match a space, you should include a space in your character class. If you want only a space (that is, not other space-like characters like tabs) something like this should work: .*TEXT:.*?([a-zA-Z0-9_\x7f-\xff.\w ]+)". This is your regex with a space inserted before the right square bracket. If you want all the space-like stuff, use \s instead of a literal space.
You did not ask about this, but I observe that your character class appears to contain unneeded information. I am not aware of any circumstance where \w does not include ranges a-z, A-Z, 0-9, and the underscore (_). Certainly it does under ASCII, the ISO encodings, CP1252, and Unicode. So you should find that .*TEXT:.*?([\x7f-\xff.\w ]+)" matches everything you want, and is easier to understand.
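A quick way to check the claim about \w on your own perl, as a sketch: it tests every character from the explicit ranges against \w and collects any that fail.

```perl
use strict;
use warnings;

# Every character covered by a-z, A-Z, 0-9 and the underscore.
my @chars = ('a' .. 'z', 'A' .. 'Z', '0' .. '9', '_');

# Collect any of them that \w fails to match (expected: none).
my @not_word = grep { !/^\w$/ } @chars;

printf "%d of %d characters are not matched by \\w\n",
    scalar(@not_word), scalar(@chars);

# A space is not a word character, which is why it has to be added
# to the character class explicitly.
my $space_is_word = (' ' =~ /\w/) ? 1 : 0;
print "space matched by \\w: $space_is_word\n";
```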