Everything ending in .html gets indexed; everything else doesn't.
And ... what?
That doesn't address my concern at all. If I have a private URL that ends in ".html", it'll still likely get indexed. Then someone guesses a URL similar to that, and boom, they're in.
A good solution would also have an additional regex or blacklist of things that should never be offered as a suggestion.
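To make that concrete, here is a rough sketch of what I mean. The function name `is_suggestable` and the deny patterns are made up for illustration; they are not from the original code, and you would swap in whatever counts as private on your own site.

```python
import re
from urllib.parse import urlparse

# Hypothetical deny list: paths that must never be offered as a suggestion.
DENY_PATTERNS = [
    # Any path segment named exactly "private" or "admin".
    re.compile(r"(^|/)(private|admin)(/|$)", re.IGNORECASE),
    # Any hidden file or directory (segment starting with a dot).
    re.compile(r"(^|/)\."),
]


def is_suggestable(url: str) -> bool:
    """Return True only for .html paths that match no deny pattern."""
    path = urlparse(url).path
    # Same rule as before: only .html pages are candidates at all.
    if not path.lower().endswith(".html"):
        return False
    # Extra check: drop anything on the deny list.
    return not any(pattern.search(path) for pattern in DENY_PATTERNS)
```

The deny check runs on top of the ".html" rule, so a private page that happens to end in ".html" is filtered out before it can be suggested. This only limits what gets suggested; the private URL itself still needs real access control.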
If I have a private URL that ends in ".html", it'll still likely get indexed.
It's not just likely; it will get indexed for sure. I don't think this is meant as a finished solution, but rather to show a general way to do such things.
I am afraid, however, that there will be more cutting and pasting than actual reading.