PerlMonks  

examine cell's content in HTML::TableExtract

by limner (Novice)
on Mar 26, 2018 at 18:23 UTC

limner has asked for the wisdom of the Perl Monks concerning the following question:

Good morning to all monks,
and thank you for the help you gave me in other posts (I'm new to this forum, and it's not easy to find old posts...).
I have some questions about HTML::TableExtract.
I'm trying to get some information from an HTML file.
I have been able to get the exact table that I want to capture, but I'm not able to get
at every single cell (I have to manipulate the text of every cell of the table). This is the code I wrote:

##########################################################
use strict;
use warnings;
use HTML::TableExtract;

my $headers = ['Col1_name', 'Col2_name', 'Col3_name'];
my $table_extract = HTML::TableExtract->new(headers => $headers);
$table_extract->parse_file('file.html');
my ($table) = $table_extract->tables;
my $i=0;

for my $rox ($table->rows) {
    print join(',' , @$rox), "\n";
}
##########################################################

Now, if I run this code, I get the content of the table, but
"all together", while what I want to do is examine the content of every cell,
remove whatever I don't want, and save the result.

For example:
I have a table of 3 columns and 3 rows (9 cells total):

1) This cell contains "Limner (Carriage Return) other text "
2) This cell contains "carrots (Carriage Return) onion (Carriage Return) other text "
3) This cell contains "dagger"
4) This cell contains "Autolicos"
5) This cell contains "one statue (Carriage Return) one diamond "
6) This cell contains "sword (Carriage Return) sword "
7) This cell contains "Leimir "
8) This cell contains "one onion (Carriage Return) 900 other text"
9) This cell contains "bow "

What I want is to examine every cell, one by one, remove all the text from the first
(Carriage Return) onward, and put the result in a variable so that I can save everything
in a file that should look like this:

Limner;Carrots;Dagger
Autolicos;one statue;sword
Leimir;One onion;bow


Any help?
thanks in advance
Limner

Replies are listed 'Best First'.
Re: examine cell's content in HTML::TableExtract
by roboticus (Chancellor) on Mar 26, 2018 at 23:34 UTC

    limner:

    At first, I was wondering why you were having trouble, as HTML::TableExtract pulls all the data into a structure for you. Then I tried whipping up a simple example, and found that I couldn't edit the data in-place, as I expected to be able to.

    To get around that, I simply made a temporary array to hold the data. From there I could manipulate the data and do whatever with it:

    # Copy all the data from the table into @tbl
    my @tbl;
    push @tbl, [ @$_ ] for $table->rows;

    # Now edit the data any way you like:
    for my $r (0 .. $#tbl) {
        # Make the first column ALL UPPERCASE
        $tbl[$r][0] =~ tr/a-z/A-Z/;
        # ... more editing, as desired ...
    }

    # Print it
    for my $rox (@tbl) {
        print join(";", @$rox), "\n";
    }

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.
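The copy-then-edit idea in the reply above can be tried without an HTML file. The following is only a minimal sketch: `@source_rows` is a hypothetical stand-in for the list of array references that `$table->rows` hands back, and the sample values are made up for illustration. It shows that `[ @$_ ]` builds a fresh array for each row, so edits to `@tbl` leave the source rows untouched.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $table->rows: a list of array references,
# one per table row (sample values, not read from a real HTML file).
my @source_rows = (
    [ 'limner',    'carrots',    'dagger' ],
    [ 'autolicos', 'one statue', 'sword'  ],
);

# Copy each row into a fresh anonymous array so edits stay in @tbl.
my @tbl;
push @tbl, [ @$_ ] for @source_rows;

# Edit the copy cell by cell: upper-case the first column only.
for my $r (0 .. $#tbl) {
    $tbl[$r][0] = uc $tbl[$r][0];
}

# Render the edited copy as semicolon-separated lines.
my @lines = map { join(';', @$_) } @tbl;
print "$_\n" for @lines;
```

Pushing `$_` itself instead of `[ @$_ ]` would store the same reference, so the copy and the source would share their cells.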

Re: examine cell's content in HTML::TableExtract
by marto (Cardinal) on Mar 26, 2018 at 19:15 UTC
      Good morning and thanks to all

      I have no constraints or obligation to use the HTML::TableExtract module; I can use any
      module I want to get my result, but what I need, in the end, is the
      ability to manipulate every cell of the table.

      I have to say that I am not familiar with de/referencing arrays (@$var).

      For example, if I try to print the result of my program, what I get is the
      whole table or a whole column printed, without the possibility of
      printing only one cell (and manipulating it).
      I will start studying Mojo::DOM. Why do you think it is the best solution?

      Limner

        "if I try to print the result of my program, what I get is the whole table or a whole column printed, without the possibility of printing only one cell (and manipulating it)"

        In the link I gave you, the code either prints or manipulates the value stored within the <td> tags before using it. Did you look at this?

        "I will start studying Mojo::DOM. Why do you think it is the best solution?"

        I didn't say it was the best solution, but it makes dealing with data like this trivial, even for complex selectors (the example previously given has details of a reasonably complex selector).
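Since the `@$var` notation came up in this sub-thread, here is a small, hedged illustration of how a single cell can be reached once the rows are held as an array of array references. The `@rows` data is hypothetical sample data, not output from any module; the point is only the reference syntax.

```perl
use strict;
use warnings;

# A table as an array of rows; each row is a reference to an array of cells.
# (Hypothetical sample data for illustration.)
my @rows = (
    [ 'Limner',    'carrots',    'dagger' ],
    [ 'Autolicos', 'one statue', 'sword'  ],
    [ 'Leimir',    'one onion',  'bow'    ],
);

my $row   = $rows[1];      # $row is a reference to the second row
my @cells = @$row;         # @$row dereferences it into a plain list (a copy)
my $cell  = $row->[2];     # arrow syntax picks one cell from the reference
my $same  = $rows[1][2];   # shorthand for $rows[1]->[2], the same cell

# Assigning through the two-index form changes that one cell in place.
$rows[2][1] = uc $rows[2][1];

print "single cell: $cell\n";
print "row 3 now:   @{ $rows[2] }\n";
```

In a loop such as `for my $rox (...)`, `$rox` plays the role of `$row` here, so `$rox->[0]` is the first cell of the current row.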

Re: examine cell's content in HTML::TableExtract
by Cristoforo (Curate) on Mar 26, 2018 at 19:24 UTC
    Perhaps marto has given the better way as I'm not familiar with Mojo::DOM. You might try to get rid of all the text after a carriage return.
    for my $rox ($table->rows) {
        print join(',' , map {s/\n.*//sr} @$rox), "\n";
    }
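The substitution above uses the `/r` flag, which needs Perl 5.14 or later, and it leaves trailing spaces such as the one in "Leimir " in place. As a sketch under stated assumptions, the cleanup could be wrapped in a small helper: `clean_cell` is a hypothetical name, and `@rows` mimics the nine cells from the question, with "(Carriage Return)" assumed to be a newline in the extracted text.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the first line of a cell and trim whitespace.
sub clean_cell {
    my ($text) = @_;
    $text = '' unless defined $text;   # guard against an undefined cell
    $text =~ s/\R.*//s;                # drop everything from the first line break on
    $text =~ s/^\s+|\s+$//g;           # trim leading and trailing whitespace
    return $text;
}

# Sample rows mirroring the cells described in the question
# (assumption: each "Carriage Return" arrives as "\n").
my @rows = (
    [ "Limner\nother text ", "carrots\nonion\nother text ", "dagger"        ],
    [ "Autolicos",           "one statue\none diamond ",    "sword\nsword " ],
    [ "Leimir ",             "one onion\n900 other text",   "bow "          ],
);

# Clean every cell, then join each row with semicolons.
my @lines = map { join(';', map { clean_cell($_) } @$_) } @rows;
print "$_\n" for @lines;
```

This sketch assumes the wanted text comes before the first line break; a cell whose text starts with a line break would come out empty. Letter case is left untouched, so any capitalization would need a separate step.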
