
Re^2: wgetas - download many small files by HTTP, saving to filename of your choice

by ambrus (Abbot)
on Oct 22, 2011 at 22:06 UTC ( [id://933109]=note )


in reply to Re: wgetas - download many small files by HTTP, saving to filename of your choice
in thread wgetas - download many small files by HTTP, saving to filename of your choice

You could do that if you want to. I just thought I might possibly want to add extra fields to the file format later.

But yes, I'm aware of this trick. I store my collection of (public) bookmarks in a text file where each line has the URL as its last whitespace-separated field, and I generate HTML pages from it that are automatically uploaded to my homepage.

For example, a part of this source file looks like this. (The first word is the level, as the file is organized into a hierarchy; the URL null: is used for a heading that's not a link.)

    2 Könyvtárak (libraries, hu) null:
    4 Magyar Országos Közös Katalógus null:next
    6 katalógus (MOKKA) http://webpac.mokka.hu/WebPac/CorvinaWeb
    6 nyitólap http://mokka.hu/
    6 tagkönyvtárak http://mokka.hu/?q=mokkaegyesulet/adatok/nevjegyzekek/tagkonyvtarak
    4 Fővárosi Szabó Ervin Könyvtár null:next
    6 katalógus (FSzEK) http://saman.fszek.hu/WebPac/CorvinaWeb;?action=advancedsearchpage
    6 nyitólap http://www.fszek.hu/

This is then translated to some HTML that looks like this when rendered (I'm not pasting the HTML source here as it's a bit hard to read):

    * Könyvtárak (libraries, hu):
        * Magyar Országos Közös Katalógus katalógus (MOKKA) nyitólap tagkönyvtárak
        * Fővárosi Szabó Ervin Könyvtár katalógus (FSzEK) nyitólap

So anyway, here I'm using the fact that the URL is always exactly the last word. If the last word of a line doesn't look like a URL, the rendering script gives me a warning.
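
To illustrate the "last word is the URL" convention, here is a minimal sketch in Python. It is not the actual rendering script, which isn't shown in this post. The names parse_line and URL_RE are made up for the example, and the test for what "looks like a URL" (a scheme followed by a colon, so that null: and null:next also pass) is an assumption.

```python
import re
import sys

# Assumption: a "URL-looking" word is a scheme followed by a colon. This
# also admits the placeholder values null: and null:next from the sample.
URL_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:\S*$")


def parse_line(line):
    """Split one bookmark line into (level, title, url).

    The first whitespace-separated word is the level, the last word is the
    URL, and everything in between is the title. Blank lines give None.
    """
    words = line.split()
    if not words:
        return None
    if len(words) < 2 or not words[0].isdigit():
        raise ValueError(f"malformed line: {line!r}")
    level = int(words[0])
    url = words[-1]
    title = " ".join(words[1:-1])
    if not URL_RE.match(url):
        # Only warn, as described above; the entry is still returned.
        print(f"warning: last word does not look like a URL: {line!r}",
              file=sys.stderr)
    return level, title, url


sample = """\
2 Könyvtárak (libraries, hu) null:
4 Magyar Országos Közös Katalógus null:next
6 nyitólap http://mokka.hu/
"""

entries = [e for e in map(parse_line, sample.splitlines()) if e is not None]
for level, title, url in entries:
    print(level, title, url)
```

Because the title is simply everything between the first and last word, it can contain spaces and punctuation without any quoting, and extra fields could still be added at the front later.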
