in reply to Using perl script to convert .html file to tab delimited file

Hello, ssaahh. Welcome to the Monastery.

The script dicthtml2tab.pl is quite simple. It takes the input file as an argument and writes its result to standard output. So, assuming you are on a Unix-like OS, or at least using a standard shell, the steps to use it are:

  1. Download the file
  2. Make sure you can run it: chmod a+rx dicthtml2tab.pl
  3. Run it: ./dicthtml2tab.pl myinputfile > myoutputfile
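
If you expect to do this more than once, steps 2 and 3 can be wrapped in a small shell function. This is only a rough sketch: the name run_converter and its three arguments are made up for illustration and are not part of dicthtml2tab.pl.

```shell
# Hypothetical helper (not part of dicthtml2tab.pl): make a downloaded
# script executable, then run it on one input file and save whatever it
# prints on standard output.
# Usage: run_converter SCRIPT INPUT OUTPUT
run_converter() {
    script=$1
    input=$2
    output=$3

    # Fail early with a clear message rather than leaving an empty output file.
    [ -f "$script" ] || { echo "no such script: $script" >&2; return 1; }
    [ -r "$input" ] || { echo "cannot read input: $input" >&2; return 1; }

    # Step 2: make sure the script can be run.
    chmod a+rx "$script" || return 1

    # A bare file name is not looked up in the current directory,
    # so prefix it with ./ unless a path was already given.
    case $script in
        */*) ;;
        *) script=./$script ;;
    esac

    # Step 3: run it, redirecting standard output to the output file.
    "$script" "$input" > "$output"
}
```

With that defined, the call would be: run_converter dicthtml2tab.pl myinputfile myoutputfile. If you would rather not change permissions at all, perl dicthtml2tab.pl myinputfile > myoutputfile runs the script without needing the chmod step.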

Note that dicthtml2tab.pl uses regular expressions to parse HTML, which is fragile at best. If this is a common enough task, it might be better for someone to write and publish a new script that uses a proper HTML parser instead. But that's for another day.


🦛