PerlMonks  

Re: Module for parsing tables from plain text document

by Tux (Canon)
on Jan 07, 2023 at 10:01 UTC ( [id://11149402]=note )


in reply to Module for parsing tables from plain text document

If it is *really* fixed-width columns, I'd use unpack
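
For the truly fixed-width case, a minimal sketch of the unpack approach might look like the following. The widths (10, 6, then the rest of the line), the sample rows, and the helper name parse_fixed are made up for illustration; they are not taken from the OP's table.

```perl
use strict;
use warnings;

# Hypothetical helper: split each line into fields using an unpack template.
# 'A10' reads 10 characters and strips trailing spaces; 'A*' takes the rest.
sub parse_fixed {
    my ($template, @lines) = @_;
    return map { [ unpack $template, $_ ] } @lines;
}

# Made-up sample data, built with sprintf so the widths are exact.
my @lines = (
    sprintf('%-10s%-6s%s', 'Widget', 12, '3.50'),
    sprintf('%-10s%-6s%s', 'Gadget', 7,  '12.00'),
);

for my $row (parse_fixed('A10 A6 A*', @lines)) {
    print join('|', @$row), "\n";
}
```

The catch is that the template has to be written by hand for every table layout.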


Enjoy, Have FUN! H.Merijn

Replies are listed 'Best First'.
Re^2: Module for parsing tables from plain text document
by GrandFather (Saint) on Jan 07, 2023 at 10:31 UTC

    There are lots of ways to do it if I want to count characters for each of the tables I need to deal with. What I'd like is something that looks at the table and uses heuristics to figure out the column widths and names. The tables I'm dealing with in the first instance are all machine generated, so the columns are unlikely to change within a table, but they do change between tables.

    Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond
      I wrote something similar for PDF once, and also wrote Data::TableReader, but I never got around to making PDF into one of the decoders. For PDF, it made sense to look at the starting X positions of the text segments, and treat an X as a column start if roughly as many text fragments began there as the estimated number of lines. Text has less granularity, so if I were going to try writing it for text, I would iterate over the lines and keep a history of which character columns hold a vertical run of whitespace, and at EOF or the first blank line, see which runs of whitespace lasted from the first line to the last. Concatenate adjacent whitespace columns, and then report the spans in between as the data columns.
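
      The whitespace-run idea described above could be sketched roughly as follows. This is only an illustration under some assumptions: tabs are already expanded, only the space character counts as whitespace, and the caller passes one table's worth of lines (up to EOF or the first blank line). The names guess_columns and split_row and the sample table are hypothetical, and nothing here is part of Data::TableReader.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Return [start, length] pairs, one per guessed data column.
sub guess_columns {
    my @lines = @_;
    return unless @lines;
    my $width = max map { length $_ } @lines;

    # $blank[$i] stays true only if every line has a space at offset $i.
    my @blank = (1) x $width;
    for my $line (@lines) {
        my $padded = sprintf '%-*s', $width, $line;
        for my $i (0 .. $width - 1) {
            $blank[$i] = 0 if substr($padded, $i, 1) ne ' ';
        }
    }

    # Adjacent blank offsets form the separators; the spans between
    # them are reported as the data columns.
    my (@cols, $start);
    for my $i (0 .. $width) {
        my $is_data = $i < $width && !$blank[$i];
        if ($is_data && !defined $start) {
            $start = $i;
        }
        elsif (!$is_data && defined $start) {
            push @cols, [ $start, $i - $start ];
            undef $start;
        }
    }
    return @cols;
}

# Cut one line into trimmed cells using the guessed columns.
sub split_row {
    my ($line, @cols) = @_;
    my $end = max(0, map { $_->[0] + $_->[1] } @cols);
    my $padded = sprintf '%-*s', $end, $line;
    return map {
        my $cell = substr $padded, $_->[0], $_->[1];
        $cell =~ s/^\s+|\s+$//g;
        $cell;
    } @cols;
}

# Made-up sample table, built with sprintf so the spacing is exact.
my @table = map { sprintf '%-6s  %3s  %5s', @$_ }
    [qw(Name Qty Price)], [qw(Widget 12 3.50)], [qw(Gadget 7 12.00)];

my @cols = guess_columns(@table);
print join('|', split_row($_, @cols)), "\n" for @table;
```

      A caption that spills across a gap would glue two columns together in this sketch, so header lines may need to be left out when building the whitespace mask.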

      It would be really awesome if you wanted to contribute a Decoder for Data::TableReader :-)

        Could you please show an example of how to parse the OP's table?

        I find this example particularly challenging, since

        • it has nested columns
        • multiple subdivided head captions
        • in particular, the "Longitude" caption overlaps the "empty column" that bounds its data entries below.

        Cheers Rolf
        (addicted to the 𐍀𐌴𐍂𐌻 Programming Language :)
        Wikisyntax for the Monastery
