http://qs321.pair.com?node_id=891779


in reply to Substring consisting of all the characters until "character X"?

My guess is that there isn't such a subroutine ready-made in Perl because that kind of fancy substring extraction is usually handled by a regular expression. Regular expressions can handle the huge variety of ways a substring might end: a single character or end of string, a set of terminal characters (space or X or Z, whichever comes first), the first occurrence of a single character or a maximum number of total characters, a terminal string rather than a single terminal character, and many, many more. Some examples:

    # Sample string (an assumption: the output below does not show its
    # first two characters)
    my $str = 'ABXCDEFDGHIXTAAGRAAAAAA';

    # from chr 2 to right before first space, or to the end of $str
    # if no space is found
    #  - ^.{2} = skip past first two characters
    #  - \S = not whitespace, \s = whitespace
    #  - (\S*) captures zero or more non-whitespace characters
    #  - ($str =~ /^.{2}(\S*)/) is a list containing one string,
    #    i.e. ($1) where $1 = what was captured by (\S*)
    printf "substr(2, first ' ' or end): %s\n", ($str =~ /^.{2}(\S*)/);

    # from chr 2 to lesser of 5 characters or first space
    # \S = not whitespace, \s = whitespace
    printf "substr(2, first ' ' or 5 chars): %s\n"
      , ($str =~ /^.{2}(\S{0,5})/);

    # from chr 3 to first X or end of $str
    printf "substr(3, first 'X' or end): %s\n"
      , ($str =~ /^.{3}([^X]*)/);

    # from chr 3 to lesser of first X or 5 chars
    printf "substr(3, first 'X' or 5 chars): %s\n"
      , ($str =~ /^.{3}([^X]{0,5})/);

    # from chr 3 to first occurrence of two or more A's, or to the end if
    # no doubled A's are found
    # (?:...) groups without capturing, so the match returns only $1
    printf "substr(3,two or more A's or end): %s\n"
      , ($str =~ /^.{3}(.*?)(?:AA|$)/);

    # from chr 10 to lesser of 5 chars or first of run of 2 or more A's
    printf "substr(10,two or more A's or 5 chars): %s\n"
      , ($str =~ /^.{10}((?:[^A]|A(?!A)){0,5})/);

    # from chr 10 to lesser of 5 chars or first of run of 2 or more X's
    printf "substr(10,two or more X's or 5 chars): %s\n"
      , ($str =~ /^.{10}((?:[^X]|X(?!X)){0,5})/);

    # from chr 3 to first occurrence of two or more X's, or to the end if
    # no doubled X's are found
    printf "substr(3,two or more X's or end): %s\n"
      , ($str =~ /^.{3}(.*?)(?:XX|$)/);

    # outputs
    substr(2, first ' ' or end): XCDEFDGHIXTAAGRAAAAAA
    substr(2, first ' ' or 5 chars): XCDEF
    substr(3, first 'X' or end): CDEFDGHI
    substr(3, first 'X' or 5 chars): CDEFD
    substr(3,two or more A's or end): CDEFDGHIXT
    substr(10,two or more A's or 5 chars): IXT
    substr(10,two or more X's or 5 chars): IXTAA
    substr(3,two or more X's or end): CDEFDGHIXTAAGRAAAAAA
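Two of the endings mentioned earlier, a set of terminal characters and a terminal string, can be written in the same style. This is only a sketch: the sample string and the terminator 'TAAG' are assumptions picked for illustration, and the variable names are made up.

```perl
use strict;
use warnings;

# Sample string: an assumption chosen for illustration
my $str = 'ABXCDEFDGHIXTAAGRAAAAAA';

# from chr 3 to the first space, X or Z (whichever comes first),
# or to the end of $str if none of them is found
my ($to_set) = $str =~ /^.{3}([^ XZ]*)/;

# from chr 3 to a terminal string, or to the end if it never appears
#  - 'TAAG' is a hypothetical terminator
#  - \Q...\E quotes any regex metacharacters inside $end
my $end = 'TAAG';
my ($to_string) = $str =~ /^.{3}(.*?)(?:\Q$end\E|$)/;

printf "substr(3, first ' ', 'X' or 'Z' or end): %s\n", $to_set;
printf "substr(3, 'TAAG' or end): %s\n", $to_string;
```

The \Q...\E quoting matters once the terminator comes from a variable: without it, a terminator such as '.' or '+' would be read as regex syntax rather than literal text.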

I grant you the syntax of those regular expressions above is somewhat arcane and cryptic. They aren't as obvious to the untrained eye as substr_chr($str,3,'A'). However, they give you much more flexibility to roll your own string endings with just a few keystrokes.
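If you do want the substr_chr($str,3,'A') spelling, it is only a few lines on top of a regex. A minimal sketch, assuming the name and argument order from the call above; the behavior chosen for the edge cases (undef when the start is past the end of the string) is my own guess at what would be useful, not anything built into Perl:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the part of $str that starts at offset
# $start and runs up to (not including) the first occurrence of $stop,
# or to the end of $str if $stop never appears.
# Returns undef if $start is past the end of $str.
sub substr_chr {
    my ($str, $start, $stop) = @_;
    #  - ^.{$start} = skip past the first $start characters
    #  - (.*?)      = capture as little as possible
    #  - \Q$stop\E  = the terminator, taken literally
    #  - \z         = end of string; /s lets . match newlines too
    my ($found) = $str =~ /^.{$start}(.*?)(?:\Q$stop\E|\z)/s;
    return $found;
}

# The sample string is an assumption chosen for illustration
printf "substr_chr(3, 'A'): %s\n",
    substr_chr('ABXCDEFDGHIXTAAGRAAAAAA', 3, 'A');
```

Because $stop is quoted rather than restricted to one character, the same sub also accepts a terminal string such as 'AA'.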

Have you had a chance to study perlretut and perlre? If not, consider doing so. If you extract strings based on characters or other textual considerations on a regular basis, you will find regexes a very powerful tool in your toolkit.

Update: fixed typos in output labels