PerlMonks  

Re^7: Module for parsing tables from plain text document

by cavac (Vicar)
on Jan 13, 2023 at 14:39 UTC


in reply to Re^6: Module for parsing tables from plain text document
in thread Module for parsing tables from plain text document

I stand corrected.

As for those advanced heuristics, my first instinct would be to look into the "Open/Import" functionality of open source spreadsheet tools like LibreOffice. Those developers have spent the last few decades writing software that can make sense of user-provided, badly formatted data files.

As far as i'm concerned, those self-"learning" AI/heuristics/statistics tools might be somewhat interesting for occasional hobby use. But i wouldn't consider them for production use. If something goes wrong (e.g. "a bug happens"), it's easy enough to debug (and verify/certify) a handcrafted parser. If an AI goes wrong, all you can do is tweak the training data, retrain the model and pray to a $DEITY of your choice that

  1. this has fixed the current problem
  2. the change in your training data hasn't introduced new problems

Advanced statistics (including what we commonly refer to as AI) is an amazing tool by itself. But when it goes wrong, you basically have to find an error (or omission) in what boils down to a formula with possibly tens of millions of variables. I mean, winning a Nobel prize is nothing to sneer at, but i'm not sure how one would do it on a typical IT department budget ;-)

PerlMonks XP is useless? Not anymore: XPD - Do more with your PerlMonks XP
Re^8: Module for parsing tables from plain text document
by LanX (Sage) on Jan 13, 2023 at 15:14 UTC
    > my first instinct would be to look into the "Open/Import" functionality of all those open source Spreadsheet tools like LibreOffice.

    I already mentioned Excel's import wizard, but it's mainly meant for CSV and not free-form tables.

    > If an AI goes wrong, all you can do is tweak the training data, retrain the model and pray to a $DEITY of your choice that

    As I said, that's also not the way I would go.

    The AI should

    • show a Tk window with a preview of the interpretation(s) and options to improve them
    • create "wizarded" Perl code
    This code should be validating too, hence more fault-tolerant than most hand-crafted code.

    (like detecting unusual data, like text in a field which always used to be numerical)
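To make the idea of validating generated code a bit more concrete, here is a minimal sketch of what such "wizarded" Perl might look like for a fixed-width table. The column layout, the names `@COLUMNS` and `parse_table`, and the sample data are all made up for illustration; nobody in this thread has proposed this exact format.

```perl
use strict;
use warnings;

# Hypothetical layout a wizard might emit once the user accepted the preview:
# column name, start offset, width, and the type seen so far in that column.
my @COLUMNS = (
    { name => 'item',  start => 0,  width => 10, type => 'text'    },
    { name => 'qty',   start => 10, width => 5,  type => 'numeric' },
    { name => 'price', start => 15, width => 8,  type => 'numeric' },
);

# Parse a fixed-width table; return the rows plus a list of validation problems.
sub parse_table {
    my ($text) = @_;
    my (@rows, @problems);
    my $line_no     = 0;
    my $total_width = $COLUMNS[-1]{start} + $COLUMNS[-1]{width};

    for my $line (split /\n/, $text) {
        $line_no++;
        next if $line !~ /\S/;    # skip blank lines

        # Pad short lines so substr never reads past the end of the string.
        my $padded = sprintf '%-*s', $total_width, $line;

        my %row;
        for my $col (@COLUMNS) {
            my $value = substr $padded, $col->{start}, $col->{width};
            $value =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
            $row{ $col->{name} } = $value;

            # Flag text in a column that is expected to be numerical.
            if ($col->{type} eq 'numeric' && $value !~ /^-?\d+(?:\.\d+)?$/) {
                push @problems,
                    "line $line_no: column '$col->{name}' is not numeric: '$value'";
            }
        }
        push @rows, \%row;
    }
    return (\@rows, \@problems);
}

# Made-up sample data, built with sprintf so the columns line up exactly.
my $sample = join '', map { sprintf "%-10s%5s%8s\n", @$_ }
    [ 'widget', 3, '4.50' ], [ 'gadget', 12, '19.99' ], [ 'gizmo', 'n/a', '7.25' ];

my ($rows, $problems) = parse_table($sample);
print "parsed ", scalar @$rows, " rows\n";
print "PROBLEM: $_\n" for @$problems;
```

With this sample the script should report three parsed rows and one problem, for the `n/a` in the `qty` column on line 3. The parser collects problems instead of dying, so a caller can decide whether to abort or to hand the offending lines back to the wizard.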

    Ideally the Perl code could also include data to restart the Tk wizard to refine it in case of errors.

    I think that would easily address all your concerns.

    PS: Of course this could also be realized as a web service and skip the Tk part.

    Cheers Rolf
    (addicted to the 𐍀𐌴𐍂𐌻 Programming Language :)
    Wikisyntax for the Monastery
