Good evening dear Monks!
I'd like to ask for your help with a Perl script.
After having looked at the HTML::TableExtract examples for two hours, my head is aching!
See the examples here: http://www.mojotoad.com/sisk/projects/HTML-TableExtract/tables.html
I tried lots of my own ideas, and now I come back to this place.
BTW: this is one of the best places for Perl questions. A great place to learn!
Over the last few days I have worked with HTML::TokeParser and HTML::TreeBuilder:: to identify XPath expressions.
I also read the documentation for HTML::TableExtract, and I have had some introduction to Perl DBI.
Right now I need to do a Perl job that gets some text stored in HTML tables. I guess that this is
a great job for HTML::TableExtract. It can save my backside, since I have to parse more than 6000 files.
HTML::TableExtract does what it says it does: it extracts specific tables from HTML source code. And it does that really well.
I want (need) to do this with one particular site; see the link below.
I need to get the following nine (or ten) lines:
Schuldaten.
Schulnummer:
Amtliche Bezeichnung:
Strasse:
Plz und Ort:
Telefon:
Fax:
E-Mail-Adresse:
[Schuldaten ändern] (is this UTF-8 encoded, or what?)
Schülergesamtzahl (is this UTF-8 encoded, or what?)
Question: can HTML::TableExtract be applied here too, to the result pages of more than 6400 schools? (See below.)
Love to hear from you
Perlbeginner1
BTW:
See this page: http://www.schulministerium.nrw.de/BP/SchuleSuchen?action=672.8924536341191
Note: tick all the checkboxes at the bottom of the page. You then see a result page with more
than 6400 schools. On the right of each result there is a link, "Weitere Informationen anzeigen" (show further information);
click it to get the detailed information for that school.
Here is the HTML from which I have to extract the text mentioned above.
Note: I only need the nine (or ten) lines listed above
out of the following lines (and out of 6400 further result pages ;-) ):
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<meta name="GENERATOR" content="Microsoft FrontPage 3.0">
<link rel="stylesheet" href="jspsrc/css/bp_style.css" type="text/css">
<title>Weitere Schulinformationen</title>
</head>
<body class="bodyclass">
<div style="text-align:center;"><center>
<!-- <fieldset><legend> Allgemeine Informationen zur Schule </legend>
-->
<br/>
<table border="1" cellspacing="0" bordercolordark="white" bordercolorlight="black" width="80%" class='bp_ergebnis_tab_info'>
<!-- <table border="0" cellspacing="0" bordercolordark="white" bordercolorlight="black" width="80%" class='bp_SchuleSuchenInfo'>
-->
<tr>
<td width="100%" colspan="2" class="ldstabTitel"><strong>Schuldaten</strong></td>
</tr>
<tr>
<td width="27%"><strong>Schulnummer</strong></td>
<td width="73%"> 120571
</td>
</tr>
<tr>
<td width="27%"><strong>Amtliche Bezeichnung</strong></td>
<td width="73%"> Paul-Gerhardt-Schule Ev. Grundschule
</td>
</tr>
<tr>
<td width="27%"><strong>Strasse</strong></td>
<td width="73%"> Sonnenstr. 11
</td>
</tr>
<tr>
<td width="27%"><strong>Plz und Ort</strong></td>
<td width="73%"> 59269 Beckum
</td>
</tr>
<tr>
<td width="27%"><strong>Telefon</strong></td>
<td width="73%"> 02521 950725
</td>
</tr>
<tr>
<td width="27%"><strong>Fax</strong></td>
<td width="73%">
</td>
</tr>
<tr>
<td width="27%"><strong>E-Mail-Adresse</strong></td>
<td width="73%"> <a href=mailto:120571@schule.nrw.de>120571@schule.nrw.de
</a>
</td>
</tr>
<tr>
<td width="27%"><strong>Internet</strong></td>
<td width="73%"> <a href=http://www.paul-gerhardt-schule-beckum.de>http://www.paul-gerhardt-schule-beckum.de
</td>
</tr>
<!--
<tr>
<td width="27%"> </td>
<td width="73%" align="right"><a href="schule_aeinfo.php?SNR=<? print $SCHULNR ?>" target="_blank">
[Schuldaten ändern] </a>
</tr>
</td> -->
<tr>
<td width="27%"> </td>
<td width="73%"> Schule in öffentlicher Trägerschaft</td>
</tr>
<tr>
<td width="100%" colspan=2><strong> </strong></td>
</tr>
<tr>
<td width="27%"><strong>Schülergesamtzahl</strong></td>
<td width="73%"> 228
</td>
<tr>
<td width="100%" colspan=2><strong> </strong></td>
</tr>
<tr>
<td width="27%"><strong>offene Ganztagsschule</strong></td>
<td width="73%"> Ja</td>
</tr>
<tr>
<td width="27%"><strong>Schule von acht bis eins</strong></td>
<td width="73%"> Ja</td>
</tr>
<!-- if (!fsp.isEmpty()){
ztext = " ";
int i = 0;
Iterator it = fsp.iterator();
while (it.hasNext()){
String[] zwert = new String[2];
zwert = (String[])it.next();
if (i==0){
if (zwert[1].equals("0")){
ztext = ztext+zwert[0];
}else{
ztext = ztext+zwert[0]+" mit "+zwert[1];
if (zwert[1].equals("1")){
ztext = ztext+" Schüler";
}else{
ztext = ztext+" Schülern";
}
}
i++;
}else{
if (zwert[1].equals("0")){
ztext = ztext+"<br> "+zwert[0];
}else{
ztext = ztext+"<br> "+zwert[0]+" mit "+zwert[1];
if (zwert[1].equals("1")){
ztext = ztext+" Schüler";
}else{
ztext = ztext+" Schülern";
}
}
}
}
-->
</table>
<!-- </fieldset> -->
<br>
</body>
</html>
Can this be done with HTML::TableExtract?
Dear Monks, I'd love to hear from you! ;-)
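To make the question concrete, here is a minimal sketch of the approach I have in mind. It is untested against the real site and rests on some assumptions: that the detail table is the only one with class 'bp_ergebnis_tab_info' (as in the sample above), that every row of interest has its label in the first cell and its value in the second, and that the installed HTML::TableExtract supports the attribs option. The helper name extract_school_data and the shortened sample HTML are made up for illustration.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical helper: returns a hash ref that maps each label found in the
# first column of the detail table to the trimmed text of the second column.
sub extract_school_data {
    my ($html) = @_;

    # Assumption: the detail table is identified by its class attribute,
    # as in the sample page.
    my $te = HTML::TableExtract->new(
        attribs => { class => 'bp_ergebnis_tab_info' },
    );
    $te->parse($html);
    $te->eof;

    my %data;
    for my $table ( $te->tables ) {
        for my $row ( $table->rows ) {
            # Cells hidden by a colspan, or empty cells, may be undefined.
            my ( $label, $value ) = map { defined $_ ? $_ : '' } @{$row}[ 0, 1 ];
            s/^\s+|\s+$//g for $label, $value;    # trim surrounding whitespace
            s/\s+/ /g      for $label, $value;    # collapse inner whitespace
            next if $label eq '';                 # skip rows without a label
            $data{$label} = $value;
        }
    }
    return \%data;
}

# Shortened, made-up sample in the shape of the page shown above.
my $sample = <<'HTML';
<table border="1" class='bp_ergebnis_tab_info'>
<tr><td colspan="2"><strong>Schuldaten</strong></td></tr>
<tr><td><strong>Schulnummer</strong></td><td> 120571
</td></tr>
<tr><td><strong>Plz und Ort</strong></td><td> 59269 Beckum
</td></tr>
<tr><td><strong>Fax</strong></td><td>
</td></tr>
</table>
HTML

my $data = extract_school_data($sample);
print "$_: $data->{$_}\n" for sort keys %$data;
```

For the 6000+ saved files, the idea would be to read each file into a string, call the helper once per file, and pick the wanted labels out of the returned hash. Rows without a label (such as the "Schule in öffentlicher Trägerschaft" row) are skipped by this sketch, and labels with umlauts depend on how the ISO-8859-1 input is decoded.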