See
Bruce Gray - Refactoring and Readability: Crouching Regex, Hidden Structures for a nice intro into pure Perl/raku options.
I do not know if there is a CPAN module that interfaces these external tools, but you may be interested in hxpipe that W3C provides. For example,
hxpipe (1) - convert XML to a format easier to parse with Perl or AWK
Here's the full list, which includes tools that support selecting elements.
cexport (1) - create headerfile of exported declarations from a C file
hxaddid (1) - add ID's to selected elements
hxcite (1) - replace bibliographic references by hyperlinks
hxcite-mkbib (1) - expand references and create bibliography
hxcopy (1) - copy an HTML file while preserving relative links
hxcount (1) - count elements and attributes in HTML or XML files
hxextract (1) - extract selected elements
hxclean (1) - apply heuristics to correct an HTML file
hxprune (1) - remove marked elements from an HTML file
hxincl (1) - expand included HTML or XML files
hxindex (1) - create an alphabetically sorted index
hxmkbib (1) - create bibliography from a template
hxmultitoc (1) - create a table of contents for a set of HTML files
hxname2id - move some ID= or NAME= from A elements to their parents
hxnormalize (1) - pretty-print an HTML file
hxnum (1) - number section headings in an HTML file
hxpipe (1) - convert XML to a format easier to parse with Perl or AWK
hxprintlinks (1) - number links & add table of URLs at end of an HTML file
hxremove (1) - remove selected elements from an XML file
hxtabletrans (1) - transpose an HTML or XHTML table
hxtoc (1) - insert a table of contents in an HTML file
hxuncdata (1) - replace CDATA sections by character entities
hxunent (1) - replace HTML predefined character entities to UTF-8
hxunpipe (1) - convert output of pipe back to XML format
hxunxmlns (1) - replace "global names" by XML Namespace prefixes
hxwls (1) - list links in an HTML file
hxxmlns (1) - replace XML Namespace prefixes by "global names"
asc2xml, xml2asc (1) - convert between UTF8 and nnn; entities
hxref (1) - generate cross-references
hxselect (1) - extract elements that match a (CSS) selector
And FWIW, Sphinx also provides
HTML stripping. Not sure how you'd use it, but it can be done when ingesting data for indexing.