The regular expression:

(?-imsx:(d{4})+.htm)

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  (                        group and capture to \1 (1 or more times
                           (matching the most amount possible)):
----------------------------------------------------------------------
    d{4}                     'd' (4 times)
----------------------------------------------------------------------
  )+                       end of \1 (NOTE: because you are using a
                           quantifier on this capture, only the LAST
                           repetition of the captured pattern will be
                           stored in \1)
----------------------------------------------------------------------
  .                        any character except \n
----------------------------------------------------------------------
  htm                      'htm'
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
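
The explanation above can be checked with a short script. This is a minimal sketch in Python's `re` module rather than the engine that produced the output; the helper name `last_capture` and the sample strings are made up for illustration. The second pattern, with `\d` and `\.`, is only an assumption about what may have been intended (four digits and a literal dot); the source shows the unescaped form.

```python
import re

# Pattern exactly as explained above: literal 'd' four times,
# then any character except newline, then 'htm'.
literal = re.compile(r"(d{4})+.htm")

# Assumed intent (not confirmed by the source): four digits and a literal dot.
escaped = re.compile(r"(\d{4})+\.htm")


def last_capture(pattern, text):
    """Return the contents of group 1 for the first match, or None."""
    m = pattern.search(text)
    return m.group(1) if m else None


print(last_capture(literal, "dddd.htm"))      # dddd
print(last_capture(literal, "ddddXhtm"))      # dddd ('.' matched 'X')
print(last_capture(literal, "2011.htm"))      # None (no literal 'd' run)
print(last_capture(escaped, "20112012.htm"))  # 2012 (only the last repetition is kept)
print(last_capture(escaped, "2011Xhtm"))      # None ('\.' needs a real dot)
```

The fourth call illustrates the NOTE in the table: the group repeats twice, but \1 holds only the final repetition.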