http://qs321.pair.com?node_id=690819


in reply to Re: Parsing with Regexes and Beyond
in thread Parsing with Regexes and Beyond

Small problem: 3 + 4 6 5 7 is considered valid by your parser. You need to check to make sure all tokens were absorbed or return an EOF token and check if that's the next token. The latter method would remove the need for "no warnings 'uninitialized';".

Very good catch, thank you. I'll update the tutorial accordingly.
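For anyone following along, here is a minimal sketch of the EOF-token variant. It is not the tutorial's code: the token names (Int, Op, EOF), the two-operator grammar, and the helpers lex and parse are all made up for illustration. The point is only that the lexer always ends the list with an EOF token and the parser insists on seeing it once the expression is complete.

```perl
use strict;
use warnings;

# Hypothetical lexer: returns a list of [type, text] pairs and always
# terminates it with an explicit EOF token.
sub lex {
    my ($input) = @_;
    my @tokens;
    for ($input) {
        while (1) {
            /\G\s+/gc;                                  # skip whitespace
            if    (/\G(\d+)/gc)  { push @tokens, ['Int', $1] }
            elsif (/\G([-+])/gc) { push @tokens, ['Op',  $1] }
            elsif (/\G\z/gc)     { push @tokens, ['EOF', '']; last }
            else { die "Cannot lex at position " . (pos() // 0) . "\n" }
        }
    }
    return \@tokens;
}

# Hypothetical parser for: Int (Op Int)* EOF
sub parse {
    my ($tokens) = @_;
    my $i = 0;
    my $expect = sub {
        my ($type) = @_;
        my $tok = $tokens->[$i];
        die "Expected $type but found $tok->[0] '$tok->[1]'\n"
            unless $tok->[0] eq $type;
        $i++;
        return $tok->[1];
    };
    my $value = $expect->('Int');
    while ($tokens->[$i][0] eq 'Op') {
        my $op  = $expect->('Op');
        my $rhs = $expect->('Int');
        $value = $op eq '+' ? $value + $rhs : $value - $rhs;
    }
    $expect->('EOF');    # anything left over is a syntax error
    return $value;
}
```

With this, parse(lex('3 + 4')) returns 7, while parse(lex('3 + 4 6 5 7')) dies at the stray 6. Because the last token is always EOF, the parser never looks at an undefined token, so the "no warnings" line is not needed.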

I think you could improve performance by generating a single regexp from @token_def (/\G(?>$ws)(?>$tok1(?{...})|$tok2(?{...})|...)/g).

Since (?{...}) is still marked as experimental even in Perl 5.10.0, and I (re)discovered some bugs in it, I won't use it, at least not in a tutorial at this level. Thanks for the suggestion anyway; it made me think about 5.10's named captures, which could be used instead.
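A rough sketch of what the named-capture approach might look like, assuming Perl 5.10 or later. The token names and patterns here are invented for the example and are not the tutorial's @token_def. All alternatives live in one regex, and since only one named group can match per token, %+ tells us which token type was found.

```perl
use strict;
use warnings;
use 5.010;

# One combined regex; each alternative is a named capture (hypothetical names).
my $token_re = qr{
    \G \s*
    (?:
        (?<Int> \d+    )
      | (?<Op>  [-+*/] )
      | (?<LP>  \(     )
      | (?<RP>  \)     )
    )
}x;

sub lex {
    my ($input) = @_;
    my @tokens;
    while ($input =~ /$token_re/gc) {
        # %+ only lists named groups that actually captured,
        # so there is exactly one key per successful match.
        my ($type) = keys %+;
        push @tokens, [$type, $+{$type}];
    }
    $input =~ /\G\s*/gc;    # allow trailing whitespace
    die "Cannot lex at position " . pos($input) . "\n"
        if pos($input) < length $input;
    return \@tokens;
}
```

Whether this is actually faster than looping over separate regexes would have to be benchmarked; I have not measured it.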

Also, it would be better if the lexer were an iterator instead of doing all the lexing up front. That would decrease the memory footprint.

Indeed. I decided against it because it slightly increases complexity (or at least makes it harder to understand), but I should at least mention it.
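To give an idea of the iterator version, here is a small sketch using a closure. The name make_lexer and the token set are assumptions for this example only. Each call returns the next token, so only one token is held in memory at a time, and once the input is exhausted every further call returns the EOF token.

```perl
use strict;
use warnings;

# Hypothetical iterator lexer: returns a code ref that yields one
# [type, text] token per call.
sub make_lexer {
    my ($input) = @_;
    my $done = 0;
    return sub {
        return ['EOF', ''] if $done;
        $input =~ /\G\s+/gc;                            # skip whitespace
        if ($input =~ /\G(\d+)/gc)  { return ['Int', $1] }
        if ($input =~ /\G([-+])/gc) { return ['Op',  $1] }
        if ($input =~ /\G\z/gc)     { $done = 1; return ['EOF', ''] }
        die "Cannot lex at position " . (pos($input) // 0) . "\n";
    };
}

# Usage: pull tokens one at a time until EOF.
my $next = make_lexer('3 + 4');
while ((my $tok = $next->())->[0] ne 'EOF') {
    print "$tok->[0]: $tok->[1]\n";
}
```

The match position is stored with the closed-over $input, so the /gc matches pick up where the previous call left off. The cost is that a parser needing lookahead has to buffer tokens itself, which is the extra complexity I wanted to avoid.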

I guess I'll find some time tomorrow to incorporate your suggestions.