http://qs321.pair.com?node_id=10299

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

For example, given

$string = "abe[123955785adasd]jdjajd";

I want to extract the substring between the [ and ] characters, and store it in another variable.

Originally posted as a Categorized Question.

Re: How to extract a delimited substring?
by btrott (Parson) on May 05, 2000 at 06:35 UTC
    my $string = "abe[123955785ada]sdjdjajd";
    if ( $string =~ /\[(.*?)\]/ ) {
        my $inside = $1;
        print $inside, "\n";
    }

    The string inside the brackets is in $inside.

    The ( and ) around the .*? are "grouping parentheses"; they tell perl's regular expression engine to capture whatever it matches between them and store it in a special variable. The variable name will be a digit corresponding to the parenthesis group — $1 for the first group, $2 for the second group, etc. — counting '('s from the left.

    So, in this case, perl captures what it finds between the [ and ] and stores that string in $1.
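
    As a small sketch of how the numbering works with more than one group, the following uses a made-up input string and a hypothetical helper named extract_pair (neither is from the original question). The groups are numbered by the position of their opening parenthesis, so the first "(" fills $1 and the second fills $2.

```perl
use strict;
use warnings;

# extract_pair() is a hypothetical helper written for this example.
# It returns the contents of the first and second capture groups,
# or an empty list if the pattern does not match.
sub extract_pair {
    my ($string) = @_;
    if ( $string =~ /\[(.*?)\]\s+name\[(.*?)\]/ ) {
        return ( $1, $2 );    # $1 = first "(", $2 = second "("
    }
    return;
}

# Assumed sample input with two bracketed fields.
my ( $id, $name ) = extract_pair("id[42] name[alice]");
print "id=$id name=$name\n";    # prints "id=42 name=alice"
```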

    And what does it match? In this case, we've told it to match any character ("."), repeated 0 or more times ("*"), in a non-greedy manner ("?"). This last is best illustrated by example.

    Say your original string was

    "aslkj[2099asgjskjw]asljgn[awoeiwj]"

    There are two sets of [/]-delimited strings in there. Without the "?" in the regular expression, perl would match from the first [ to the last ], putting the following into $1:

    2099asgjskjw]asljgn[awoeiwj

    Most likely, this isn't what you want, so we put in the non-greedy modifier to make perl do what you want.
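
    To see the difference side by side, here is a short sketch that runs both forms of the pattern against the example string. The variable names $greedy and $non_greedy are only for illustration.

```perl
use strict;
use warnings;

my $string = "aslkj[2099asgjskjw]asljgn[awoeiwj]";

my ( $greedy, $non_greedy );

# Greedy: .* grabs as much as it can, so the match runs to the LAST ].
$greedy = $1 if $string =~ /\[(.*)\]/;

# Non-greedy: .*? grabs as little as it can, so the match stops at the FIRST ].
$non_greedy = $1 if $string =~ /\[(.*?)\]/;

print "greedy:     $greedy\n";        # 2099asgjskjw]asljgn[awoeiwj
print "non-greedy: $non_greedy\n";    # 2099asgjskjw
```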

    By the way, I said that "." matches any character; this isn't strictly true. By default it doesn't match a newline ("\n"); if you want it to match newlines as well, add the /s modifier at the end of the regular expression.
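
    A quick way to check the effect of /s is to try a string whose bracketed part contains a line break ("\n"). The input below is a made-up variation on the example string, and the variable names are only for illustration.

```perl
use strict;
use warnings;

# Assumed input: the text between [ and ] spans two lines.
my $string = "abe[123\n456]jdjajd";

# Without /s, "." cannot cross the "\n", so the pattern fails to match.
my $without_s = ( $string =~ /\[(.*?)\]/ ) ? $1 : undef;

# With /s, "." also matches "\n", so the whole bracketed text is captured.
my $with_s = ( $string =~ /\[(.*?)\]/s ) ? $1 : undef;

print defined($without_s) ? "without /s: matched\n" : "without /s: no match\n";
print "with /s: $with_s\n" if defined $with_s;
```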

    For more information, see perlre. It explains all of this and more.

Re: how do i take out a string within a string ?
by btrott (Parson) on May 05, 2000 at 07:41 UTC
    The ( and ) around the .*? are grouping parentheses. They tell Perl's regular expression engine to capture whatever it matches inside them and store it in a variable: the variable name will be a digit corresponding to the set of parentheses ($1 for the first group, $2 for the second group, etc.).

    So, in this case, Perl captures what it finds between the [ and ] and stores that string in $1.
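
    Since the question asks for the substring to end up in another variable, one possible shortcut is to assign the match in list context, where a successful match returns its captured groups. This is a sketch using the string from the question; $inside is just an illustrative name.

```perl
use strict;
use warnings;

my $string = "abe[123955785adasd]jdjajd";

# The parentheses around $inside put the match in list context, so the
# captured group is assigned directly. $inside stays undef if nothing matches.
my ($inside) = $string =~ /\[(.*?)\]/;

print "$inside\n" if defined $inside;    # prints "123955785adasd"
```

    The parentheses on the left-hand side matter: without them the match runs in scalar context and $inside receives only a true or false value.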

    And what does it match? In this case, we've told it to match any character ("."), repeated 0 or more times ("*"), in a non-greedy manner ("?"). This last is best illustrated by example. Say your original string was

    aslkj[2099asgjskjw]asljgn[awoeiwj]

    There are two sets of [ and ] strings in there. Without the "?" in the regular expression, Perl would match from the first [ to the last ], putting the following into $1:

    2099asgjskjw]asljgn[awoeiwj

    Most likely, this isn't what you want, so we put in the non-greedy modifier to make Perl do what you want. :)

    By the way, I said that "." matches any character; this isn't strictly true. By default it doesn't match a newline ("\n"); if you want it to match newlines, add the "/s" modifier at the end of the regular expression.

    For more information, take a look at perlre. It explains all of this and more.