Interesting problem.
A simple regexp-based solution that doesn't take lexemes into account will fail,
at least on pathological cases like
1800 IF V$="THEN FOO ELSE IF BAR" THEN K$="FOO"
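To make the failure concrete, here is a rough Python sketch (not a full BASIC lexer) comparing a naive keyword regexp with a scanner that treats a quoted string as a single lexeme. The token pattern and the names `naive_keywords` and `lexeme_keywords` are my own assumptions for illustration, and real dialects have more token types than this handles.

```python
import re

# Assumed, simplified token pattern: string literal, identifier (optional $),
# number, two-character comparison operators, or any other single character.
TOKEN = re.compile(r'"[^"]*"|[A-Za-z][A-Za-z0-9]*\$?|\d+(?:\.\d+)?|<>|<=|>=|\S')
KEYWORDS = ("IF", "THEN", "ELSE")


def naive_keywords(line):
    """Find keywords with a plain regexp, blind to string literals."""
    return re.findall(r"\b(IF|THEN|ELSE)\b", line)


def lexeme_keywords(line):
    """Find keywords token by token; a quoted string is one token, so
    keywords inside it are never seen."""
    return [m.group() for m in TOKEN.finditer(line) if m.group() in KEYWORDS]


line = '1800 IF V$="THEN FOO ELSE IF BAR" THEN K$="FOO"'
print(naive_keywords(line))   # sees five "keywords", three of them inside the string
print(lexeme_keywords(line))  # sees only the real IF and THEN
```

The naive version would split the line in the middle of the string literal; the token-based one only reports the two keywords that are actually part of the statement.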
I don't have the brain cells left tonight to work through a complete solution,
though I have a dim recollection of having done something like this many years back using "fixups". With that approach, a line such as
1800 IF V$="K" THEN A$="+K+" ELSE IF V$="R" THEN A$="?R?" ELSE IF V$="M" THEN A$="!M!":Z1=R1:Z2=R2
would translate first into
1800 IF V$="K" THEN GOTO {fixup:skip}
1800.1 GOTO {fixup:after-next-goto}
1800.2 A$="+K+"
1800.3 GOTO {fixup:end}
1800.4 IF V$="R" THEN GOTO {fixup:skip}
1800.5 GOTO {fixup:after-next-goto}
1800.6 A$="?R?"
1800.7 GOTO {fixup:end}
1800.8 IF V$="M" THEN GOTO {fixup:skip}
1800.9 GOTO {fixup:end}
1800.a A$="!M!"
1800.b Z1=R1
1800.c Z2=R2
1800.d REM
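As a rough illustration of that first pass, here is a Python sketch that emits the same shape of sequence. It assumes the line has already been split (by a proper lexer) into `(condition, statements)` clauses, one per IF, that there is no trailing plain ELSE, and that no clause body contains a GOTO of its own. The names `first_pass` and `label` are hypothetical, and the labelling just imitates the 1800, 1800.1 ... 1800.d numbering used above.

```python
def first_pass(clauses):
    """Emit the unnumbered first-pass sequence for an IF / ELSE IF chain.

    clauses is a list of (condition, [statements]) pairs, one per IF.
    """
    out = []
    for index, (condition, statements) in enumerate(clauses):
        is_last = index == len(clauses) - 1
        # Jump over the next line into the body when the condition holds.
        out.append(f"IF {condition} THEN GOTO {{fixup:skip}}")
        # Otherwise go to the next conditional, or to the end if none is left.
        out.append("GOTO {fixup:end}" if is_last else "GOTO {fixup:after-next-goto}")
        out.extend(statements)
        if not is_last:
            out.append("GOTO {fixup:end}")
    out.append("REM")  # landing pad for {fixup:end}
    return out


def label(base, offset):
    """Temporary label in the style used above: 1800, 1800.1, ..., 1800.a."""
    return str(base) if offset == 0 else f"{base}.{offset:x}"


clauses = [
    ('V$="K"', ['A$="+K+"']),
    ('V$="R"', ['A$="?R?"']),
    ('V$="M"', ['A$="!M!"', "Z1=R1", "Z2=R2"]),
]
sequence = first_pass(clauses)
for offset, text in enumerate(sequence):
    print(label(1800, offset), text)
```

Labels are assigned only when printing, so the sequence itself stays free of line numbers until the fixups are resolved.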
The second pass would perform the fixups.
- {fixup:skip} becomes the number of the 2nd following line in the sequence
- {fixup:end} becomes the final line number of the sequence
- {fixup:after-next-goto} becomes the line number after the next goto in the sequence. (This works because conditionals cannot be nested within a line.)
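The three rules above can be sketched in Python roughly as follows. This is only my reading of them: `resolve_fixups` is a made-up name, the input is assumed to be a list of `(label, text)` pairs for one sequence, and "the next goto" is taken to mean the next line that is an unconditional GOTO.

```python
import re

FIXUP = re.compile(r"\{fixup:([a-z-]+)\}")


def resolve_fixups(lines):
    """Replace each {fixup:...} placeholder with a concrete label.

    lines is a list of (label, text) pairs making up one sequence.
    """
    labels = [lbl for lbl, _ in lines]
    texts = [text for _, text in lines]

    def target(i, kind):
        if kind == "skip":
            return labels[i + 2]          # 2nd following line
        if kind == "end":
            return labels[-1]             # final line of the sequence
        if kind == "after-next-goto":
            for j in range(i + 1, len(texts) - 1):
                if texts[j].startswith("GOTO "):
                    return labels[j + 1]  # line after the next GOTO
        raise ValueError(f"cannot resolve fixup {kind!r} on line {labels[i]}")

    resolved = []
    for i, (lbl, text) in enumerate(lines):
        new_text = FIXUP.sub(lambda m: target(i, m.group(1)), text)
        resolved.append((lbl, new_text))
    return resolved


example = [
    ("1800", 'IF V$="K" THEN GOTO {fixup:skip}'),
    ("1800.1", "GOTO {fixup:after-next-goto}"),
    ("1800.2", 'A$="+K+"'),
    ("1800.3", "GOTO {fixup:end}"),
    ("1800.4", 'IF V$="M" THEN GOTO {fixup:skip}'),
    ("1800.5", "GOTO {fixup:end}"),
    ("1800.6", 'A$="!M!"'),
    ("1800.7", "REM"),
]
for lbl, text in resolve_fixups(example):
    print(lbl, text)
```

A final renumbering step would still be needed to turn labels like 1800.2 into legal line numbers; that part is not shown here.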
By using a {fixup:skip} fixup (rather than calculating the target line number as you're generating the sequence), you allow for the possibility of doing peephole optimizations. In the sequence above,
1800.8 IF V$="M" THEN GOTO {fixup:skip}
1800.9 GOTO {fixup:end}
1800.a
could be optimized to
1800.8 IF V$<>"M" THEN GOTO {fixup:end}
1800.9
This "renumbers" lines within the sequence, but since target line number calculation/assignment has been deferred, no GOTOs are broken.
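A sketch of that peephole step in Python, run on the unnumbered sequence before labels are assigned and fixups resolved. The names `negate` and `peephole` are hypothetical. Negation only flips a simple `X=value` comparison to `<>`; anything more complex is wrapped in `NOT (...)`, which assumes the target dialect accepts that form.

```python
import re

SKIP_IF = re.compile(r"^IF (.+) THEN GOTO \{fixup:skip\}$")
SIMPLE_EQ = re.compile(r'^([A-Z][A-Z0-9]*\$?)=("[^"]*"|[A-Z0-9.]+)$')


def negate(condition):
    """Negate a condition: flip a simple equality, else wrap it in NOT."""
    m = SIMPLE_EQ.match(condition)
    if m:
        return f"{m.group(1)}<>{m.group(2)}"
    return f"NOT ({condition})"


def peephole(texts):
    """Collapse 'IF c THEN GOTO skip' + 'GOTO end' into one inverted test."""
    out = []
    i = 0
    while i < len(texts):
        m = SKIP_IF.match(texts[i])
        if m and i + 1 < len(texts) and texts[i + 1] == "GOTO {fixup:end}":
            out.append(f"IF {negate(m.group(1))} THEN GOTO {{fixup:end}}")
            i += 2  # the unconditional GOTO is dropped
        else:
            out.append(texts[i])
            i += 1
    return out


tail = ['IF V$="M" THEN GOTO {fixup:skip}', "GOTO {fixup:end}", 'A$="!M!"', "REM"]
print(peephole(tail))
```

Because the remaining {fixup:skip} and {fixup:after-next-goto} placeholders are relative to their position in the sequence, dropping the line here does not disturb them.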