How interesting! That sounds like prescient advice to me, provided the HTML actually qualifies as XHTML.
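Whether an XML toolchain (and therefore XPath) is usable at all depends on that condition: the markup must be well-formed XML. A minimal sketch of the check, assuming Python and its standard-library xml.etree.ElementTree (the OP's language is not stated); the function name parse_if_xhtml and the sample markup are hypothetical. Real XHTML usually declares xmlns="http://www.w3.org/1999/xhtml", in which case ElementTree reports tag names with a "{http://www.w3.org/1999/xhtml}" prefix; the samples omit the namespace for brevity.

```python
import xml.etree.ElementTree as ET

def parse_if_xhtml(markup):
    """Return the root Element if markup is well-formed XML, else None."""
    try:
        return ET.fromstring(markup)
    except ET.ParseError:
        # Not well-formed: an XML parser (and XPath on top of it) can't be used.
        return None

xhtml = "<html><body><p>Hello<br/>world</p></body></html>"
html = "<html><body><p>Hello<br>world</p></body></html>"  # <br> is never closed

print(parse_if_xhtml(xhtml) is not None)  # True
print(parse_if_xhtml(html) is not None)   # False
```

Ordinary tag-soup HTML fails this test routinely, which is when a dedicated HTML parser is the better tool.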
Clearly, the OP should be using one of the several HTML parsers that are readily available, so that the particular strings that need further processing are handed over, as it were, "on a silver platter." Applying regular expressions to the raw HTML string would be a monumental waste of effort, and the results would not be nearly as good.
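For comparison, here is roughly what "on a silver platter" looks like with Python's standard-library html.parser.HTMLParser, again assuming Python; the LinkCollector class and the sample page are illustrative only. The parser delivers tag and attribute names already lowercased and attribute values already unquoted, so uppercase tags, single-quoted values and attributes split across lines, all awkward for a regex, need no special handling:

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, however the markup is laid out."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # tag arrives lowercased; attrs is a list of (name, value) pairs
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)

page = """<p>See <A HREF='/one'>one</A>,
<a class="x"
   href="/two">two</a> and <a name=anchor>no link</a>.</p>"""

collector = LinkCollector()
collector.feed(page)
collector.close()
print(collector.links)  # ['/one', '/two']
```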
But XPath is even better, if it works, because it is non-procedural: you describe what you want rather than how to walk the document to find it. If it works, you do not have to write code that is tied to the structure of the parent document, code that would stop working if and when that structure changes. Where it applies, XPath can be a spectacular time-and-effort saver.
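A small illustration of that structural independence, assuming Python's xml.etree.ElementTree, whose findall() accepts only a limited subset of XPath (full XPath 1.0 is available through third-party libraries such as lxml). The two page versions and the hrefs() helper are hypothetical. The same query keeps working after the links are moved into new containers:

```python
import xml.etree.ElementTree as ET

# Two hypothetical versions of one page: in v2 the links were moved into new elements.
v1 = "<html><body><a href='/a'>A</a><a href='/b'>B</a></body></html>"
v2 = ("<html><body><div class='nav'><a href='/a'>A</a></div>"
      "<p><a href='/b'>B</a></p></body></html>")

def hrefs(markup):
    root = ET.fromstring(markup)
    # ".//a[@href]" means: every <a> that has an href, at any depth below the root
    return [a.get("href") for a in root.findall(".//a[@href]")]

print(hrefs(v1))  # ['/a', '/b']
print(hrefs(v2))  # ['/a', '/b']
```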
What you really want to avoid, and what XPath is specifically engineered to help you avoid, is code that depends on the exact structure of the XML. Such logic is not only fragile but also unable to detect that it has started producing incomplete answers. (That said, XPath is not a panacea either.)
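The "silently incomplete" failure mode can be sketched the same way (Python's xml.etree.ElementTree again; the markup and helper names are hypothetical). The hard-coded walk assumes every link is a direct child of <body>; once one link is wrapped in a <div>, it returns fewer results and raises no error:

```python
import xml.etree.ElementTree as ET

v1 = "<html><body><a href='/a'>A</a><a href='/b'>B</a></body></html>"
v2 = "<html><body><div><a href='/a'>A</a></div><a href='/b'>B</a></body></html>"

def hrefs_hardcoded(markup):
    # Walks exactly html > body > a: only links that are direct children of <body>
    body = ET.fromstring(markup).find("body")
    return [child.get("href") for child in body if child.tag == "a"]

def hrefs_xpath(markup):
    # Descendant query: any <a> with an href, wherever it sits
    return [a.get("href") for a in ET.fromstring(markup).findall(".//a[@href]")]

print(hrefs_hardcoded(v1), hrefs_xpath(v1))  # ['/a', '/b'] ['/a', '/b']
print(hrefs_hardcoded(v2), hrefs_xpath(v2))  # ['/b'] ['/a', '/b']
```

The descendant query is not automatically right either: .//a[@href] would just as happily pick up links in a footer or sidebar you never meant to include, which is one reason XPath is no panacea.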