The regular expression:

(?-imsx:(d{4,5}).htm)

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    d{4,5}                   'd' (between 4 and 5 times (matching
                             the most amount possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  .                        any character except \n
----------------------------------------------------------------------
  htm                      'htm'
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
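
The explanation above shows two things that are easy to miss: `d{4,5}` is the literal letter 'd' repeated, not a digit class, and the unescaped `.` accepts any character before `htm`. The following Python sketch checks that behavior with the `re` module. The second pattern, `assumed_intent`, is only a guess at what may have been meant (4 to 5 digits followed by a literal dot); the source does not state the intent, and the helper `first_group` and the sample strings are made up for illustration.

```python
import re

# Pattern exactly as explained above: literal 'd' repeated 4-5 times,
# then any single character except a newline, then 'htm'.
as_written = re.compile(r"(d{4,5}).htm")

# ASSUMPTION (not stated in the source): the intent may have been
# 4-5 digits, a literal dot, then 'htm'.
assumed_intent = re.compile(r"(\d{4,5})\.htm")


def first_group(pattern, text):
    """Return capture group 1 of the first match, or None if nothing matches."""
    m = pattern.search(text)
    return m.group(1) if m else None


# Hypothetical sample inputs, chosen to exercise both patterns.
samples = ["12345.htm", "dddd.htm", "ddddd_htm", "1234xhtm"]

for s in samples:
    print(s, first_group(as_written, s), first_group(assumed_intent, s))
```

With these samples, the pattern as written captures only from the strings made of 'd' characters (including `ddddd_htm`, because `.` matches the underscore), while the assumed variant captures only from `12345.htm`.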