Re: Rendering HTML / capturing pixels
by Corion (Patriarch) on Feb 27, 2003 at 08:54 UTC
|
Rendering HTML is far from "easy", especially with the "simple" things like tables and images. You might find some inspiration in the converters that convert HTML to PostScript and/or (La)TeX. For the actual rendering, you will also have to consider CSS and the like.
Under Win32, there are two relatively easy ways to capture the image of a webpage: either automate Internet Explorer to display the HTML and then take a screenshot, or automate Internet Explorer to print the page into a file and then postprocess that file.
Under Unix, the only way I see is printing to a file, and there is no way of automating a browser as nice as the one under Win32. You might be able to write some XS glue to automate one of the rendering engines (KHTML, Gecko), but that's not "easy" per se (IMO).
perl -MHTTP::Daemon -MHTTP::Response -MLWP::Simple -e ' ; # The
$d = new HTTP::Daemon and fork and getprint $d->url and exit;#spider
($c = $d->accept())->get_request(); $c->send_response( new #in the
HTTP::Response(200,$_,$_,qq(Just another Perl hacker\n))); ' # web
|
Some (all? most?) versions of *nix Netscape allow remote control. You start Netscape with the "-remote" option. You could probably generate PostScript, as Corion suggests, with the commands openURL() and saveAs(). I have not tried that particular combination. See this for more information.
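As a rough sketch of that untested combination: the function below sends openURL() and then saveAs() to an already-running Netscape. The names netscape_save_ps and run, and the DRY_RUN switch, are made up for illustration. It also assumes that saveAs() accepts PostScript as its output type and that the page has finished loading before the second command arrives, which a real script would probably have to wait for.

```shell
#!/bin/sh
# Hypothetical helper: run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: ask a running Netscape to load a URL ($1)
# and save the rendered page as PostScript to a file ($2).
netscape_save_ps() {
    url=$1
    out=$2
    run netscape -remote "openURL($url)"
    # Assumption: the page is fully loaded by the time this is sent.
    run netscape -remote "saveAs($out,PostScript)"
}
```

Calling netscape_save_ps http://www.perlmonks.org/ /tmp/page.ps with DRY_RUN=1 only prints the two commands, which is a cheap way to check the quoting before pointing it at a real browser.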
Another option would be to get the Mozilla source and modify it directly or see if something in the source allows what you want.
Finally, building on what PodMaster said, there is a tkHTML widget here, but I do not know if there is a perl binding, yet. I have not played with it at all.
HTH, --traveler
|
Thanks for the information and links. I will look into the Netscape angle.
SpaceAce
|
If it's just plain text formatting then 'links --dump' might be a way to go.
I guess it depends on what the motivation is. If it is supposed to be used as a CGI script, for example, there might not be an X session running for the graphical browser to use.
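For the plain-text case, a minimal sketch could look like the following. The function name dump_text is invented for this example, and the single-dash -dump spelling is the form I know links to accept; check your version. No X session is needed, so it should also work from a CGI script.

```shell
#!/bin/sh
# Hypothetical helper: render a URL or local HTML file ($1) as
# formatted plain text and write it to an output file ($2).
dump_text() {
    src=$1
    out=$2
    links -dump "$src" > "$out"
}
```

For example, dump_text http://www.perlmonks.org/ page.txt leaves the formatted text in page.txt.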
Steve
---
steve.org.uk
|
I am not overly concerned with the task being "easy". After all, the easy ones are usually the least interesting :)
I had already considered browser automation, but I would prefer to make the program as standalone as possible. If I have to depend on a browser to do it, I will probably try to work with a *nix version of Netscape as opposed to going for a Win32 solution.
SpaceAce
Re: Rendering HTML / capturing pixels
by PodMaster (Abbot) on Feb 27, 2003 at 09:05 UTC
|
Basics like tables and images? That's complex enough ;)
You can do it (for the most part) using Wx and/or Tk.
You'd be better off using OLE Automation if you can (if you're on win32).
WxBrowser - a wxPerl HTML Browser
Re: capture what's on the screen
http://search.cpan.org/author/NI-S/Tk-HTML-3.002/
Another idea that might work is to embed perl into mozilla (there was a recent node about it, something about XUL), and let mozilla render it, and then take a screenshot.
( probably won't work, at least not using XUL )
MJD says you
can't just make shit up and expect the computer to know what you mean, retardo!
I run a Win32 PPM
repository for perl 5.6x+5.8x. I take requests.
** The Third rule of perl club is a statement of fact: pod is sexy.
|
Thank you, I will definitely have a peek at that.
SpaceAce
|
By "basics" I just meant that I won't be dealing with Javascript, CSS or any other extensions/additions/dynamic situations. I realize tables and images are not really "simple" :)
Win32 is not out of the question, but I prefer Linux for any kind of development, and especially for Perl. Unfortunately, for the last several versions of wxWindows and wxPerl I have not been able to complete an installation. Even if I jostle things around and get everything installed and operational, any wxPerl program I write tends to crash with a segfault, even "Hello, world". Perhaps it is time to try again.
Thanks for the link and the ideas.
SpaceAce
Re: Rendering HTML / capturing pixels
by hiseldl (Priest) on Feb 27, 2003 at 16:47 UTC
|
You could convert your HTML to PostScript via HTMLDOC (GPL) and then use Ghostscript.pm (Perl API for Ghostscript) to convert it to a PPM. Then convert your PPM to a GIF, which can then be loaded into Image::Magick.
Here is a shell script showing how Ghostscript converts a PostScript file to a PPM on the command line. You could probably simulate these actions using Ghostscript.pm:
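#! /bin/sh
# pstogif
#
# Call it by putting the .ps file name as first argument
# but without the ".ps" extension.
# Ex: for "Intro_Tbl.ps" use "pstogif Intro_Tbl"
#
# Render the PostScript file to a raw PPM at 72 dpi; the here-document
# feeds Ghostscript the command that runs the input file.
gs -r72x72 -sDEVICE=ppmraw -sOutputFile="$1.ppm" << endinput
($1.ps) run
endinput
# Crop the blank border, then convert the PPM to a GIF.
pnmcrop < "$1.ppm" | ppmtogif > "$1.gif"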
This requires both Ghostscript and pbmplus to work.
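To chain the HTMLDOC step in front of the script above, something like the following might do. It is only a sketch: the function name html_to_gif and the "base name without extension" convention are assumptions made for the example. It also passes the .ps file to gs directly with -dNOPAUSE -dBATCH instead of using the here-document.

```shell
#!/bin/sh
# Hypothetical helper: convert "$1.html" to "$1.gif" by way of
# PostScript and PPM. $1 is the file name without its extension.
html_to_gif() {
    base=$1
    # HTML -> PostScript with HTMLDOC.
    htmldoc --webpage -t ps -f "$base.ps" "$base.html" &&
    # PostScript -> raw PPM at 72 dpi; exit when the file is done.
    gs -q -dNOPAUSE -dBATCH -r72x72 -sDEVICE=ppmraw \
       -sOutputFile="$base.ppm" "$base.ps" &&
    # Crop the blank border and convert the PPM to a GIF.
    pnmcrop < "$base.ppm" | ppmtogif > "$base.gif"
}
```

Called as html_to_gif page, it expects page.html in the current directory. HTMLDOC paginates its output, so a long page produces several PostScript pages; with a fixed output name only the last one survives, and putting %d in the -sOutputFile name makes Ghostscript write one PPM per page instead.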
HTH. :-)
-- hiseldl What time is it? It's Camel Time!
|
% kwebdesktop 800 600 perlmonks.png http://www.perlmonks.org/