http://qs321.pair.com?node_id=1171486


in reply to Get a known substring from a string

As BrowserUk has pointed out, it is a little puzzling why you need to search for the ID if you already know it. However, if you are looking for an exact substring within a longer string then index might be a better approach than a regex. If you also want to remove the substring from the string then the four-argument form of substr is useful, as it returns the removed text.

johngg@shiraz:~ > perl -Mstrict -Mwarnings -E '
    my $find = q{thispart};
    say $find;
    my $str = q{ksguhdipghisosipghthispartudirlhgdr};
    say $str;
    my $posn = index $str, $find;
    die qq{Substring not found\n} if $posn == -1;
    my $idNo = substr $str, $posn, length $find, q{};
    say $idNo;
    say $str;'
thispart
ksguhdipghisosipghthispartudirlhgdr
thispart
ksguhdipghisosipghudirlhgdr

index returns -1 if the substring is not found.

johngg@shiraz:~ > perl -Mstrict -Mwarnings -E '
    my $find = q{thatpart};
    say $find;
    my $str = q{ksguhdipghisosipghthispartudirlhgdr};
    say $str;
    my $posn = index $str, $find;
    die qq{Substring not found\n} if $posn == -1;
    my $idNo = substr $str, $posn, length $find, q{};
    say $idNo;
    say $str;'
thatpart
ksguhdipghisosipghthispartudirlhgdr
Substring not found
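If the find-and-remove step is needed in more than one place, it could be wrapped in a sub. What follows is only a minimal sketch: the name extract_substring and the choice to pass the string by reference are made up for this illustration, not anything from the original question. It returns the removed text, or undef when index reports -1.

```perl
use strict;
use warnings;

# Hypothetical helper (name invented for this sketch): remove the first
# occurrence of $find from the string referenced by $str_ref and return
# the removed text. Returns undef (in scalar context) if not found.
sub extract_substring {
    my( $str_ref, $find ) = @_;

    my $posn = index ${ $str_ref }, $find;
    return if $posn == -1;

    # Four-argument substr replaces the matched span with the empty
    # string and returns the text that was there.
    return substr( ${ $str_ref }, $posn, length( $find ), q{} );
}

my $str  = q{ksguhdipghisosipghthispartudirlhgdr};
my $idNo = extract_substring( \ $str, q{thispart} );

print defined $idNo
    ? qq{Removed $idNo leaving $str\n}
    : qq{Substring not found\n};
```

Passing a reference lets the sub modify the caller's string in place, which mirrors what the one-liner does to $str directly.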

"I get a whole load of ID numbers come in from different sources, but for some reason, they aren't spaced apart"

If the IDs are all mashed together, beware of finding false positives. Given 4-digit IDs of 3819, 8076 and 7204 in the string 381980767204, a search for ID 6720 would falsely report it as present, because it straddles the boundary between 8076 and 7204. If you are lucky enough to have fixed-length IDs, consider breaking the string down using unpack to place the IDs into a hash. Then searching for any ID becomes simple.

johngg@shiraz:~ > perl -Mstrict -Mwarnings -MData::Dumper -E '
    my $idStr = q{381980767204};
    my %idLookup = map { $_ => 1 } unpack q{(a4)*}, $idStr;
    print Data::Dumper->Dumpxs( [ \ %idLookup ], [ qw{ *idLookup } ] );
    say qq{ID $_ }, exists $idLookup{ $_ } ? q{found} : q{not found}
        for qw{ 7204 6720 };'
%idLookup = (
              '8076' => 1,
              '7204' => 1,
              '3819' => 1
            );
ID 7204 found
ID 6720 not found
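To see the false positive itself, a plain index search can be set beside a search that only accepts hits starting on an ID boundary. This is a different technique from the unpack hash, shown only as a sketch. It assumes every ID is exactly 4 characters wide, and has_aligned_id is a made-up name for the illustration.

```perl
use strict;
use warnings;

# Assumption for this sketch: all IDs are exactly $width characters.
my $width = 4;
my $idStr = q{381980767204};

# Hypothetical helper: true only if $id occurs in $str at an offset
# that is a multiple of $width, i.e. on an ID boundary.
sub has_aligned_id {
    my( $str, $id, $width ) = @_;

    my $posn = index $str, $id;
    while ( $posn != -1 ) {
        return 1 if $posn % $width == 0;
        # Misaligned hit, so keep looking further along the string.
        $posn = index $str, $id, $posn + 1;
    }
    return 0;
}

for my $id ( qw{ 7204 6720 } ) {
    printf qq{ID %s: plain index %s, aligned search %s\n},
        $id,
        index( $idStr, $id ) == -1 ? q{not found} : q{found},
        has_aligned_id( $idStr, $id, $width ) ? q{found} : q{not found};
}
```

The plain index search reports 6720 as found at offset 7, while the aligned search rejects it. For repeated lookups the unpack hash is still likely the simpler choice, since the string is split only once.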

I hope this is helpful.

Cheers,

JohnGG