PerlMonks  

Re: Subdir globs

by NeverMore (Acolyte)
on May 31, 2000 at 02:12 UTC ( [id://15531]=note )


in reply to Subdir globs

The thing is, each directory containing these files has an index file, but I don't know its name (I'm actually searching for *.tab, *.btab, *.crd, and *.pro files at http://www.nutz.org/olga/main). I've tried educated guesses such as index.pl, index.cgi, index.php, index.php3, index.shtml, etc. None of them have worked. Now, the index lists and links all of the subdirectories and files in the current directory. If I can find out the name of the index file, I can probably write a script that scans the HTML for links to a subdirectory and/or *.tab file and either switches to the index page of that directory or downloads the file.

-NM

Replies are listed 'Best First'.
RE: Re: Subdir globs
by lhoward (Vicar) on May 31, 2000 at 03:29 UTC
    Looking at that site, I see that it is organized in a three-level hierarchy. What you'll probably want to do is something along the lines of this:
    get the main page
    foreach subpage
      get subpage
      foreach sub-sub page
        get sub-sub page
        parse out and get any files you're interested in
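
The outline above could be sketched roughly as follows. This is only an illustration, written in Python rather than Perl, and it assumes the index pages are ordinary HTML whose subdirectories and files appear as `<a href="...">` links. The names `LinkCollector`, `fetch`, and `crawl` are made up for this example, and nothing here has been checked against the actual site. A depth of 2 corresponds to the three levels described: main page, subpages, sub-sub pages.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen

# File extensions named in the original question.
WANTED = (".tab", ".btab", ".crd", ".pro")


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Download a page and return its text (hypothetical default fetcher)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(url, depth, fetch_page=fetch, seen=None):
    """Return URLs of wanted files found by following links up to `depth` levels below `url`."""
    seen = set() if seen is None else seen
    if url in seen:
        return []
    seen.add(url)

    parser = LinkCollector()
    parser.feed(fetch_page(url))

    found = []
    for href in parser.links:
        # Resolve relative links against the current page and drop any #fragment.
        target = urldefrag(urljoin(url, href))[0]
        if target in seen:
            continue
        if urlparse(target).path.lower().endswith(WANTED):
            seen.add(target)
            found.append(target)
        elif depth > 0 and urlparse(target).netloc == urlparse(url).netloc:
            # Assumption: any other same-site link may be a subpage worth scanning.
            found.extend(crawl(target, depth - 1, fetch_page, seen))
    return found
```

Something like `crawl("http://www.nutz.org/olga/main/", 2)` would then return the list of matching file URLs, which could be downloaded in a second pass. The trailing slash matters, because `urljoin` resolves relative links against the directory part of the URL. Passing `fetch_page` as a parameter makes it possible to try the logic on canned HTML before pointing it at a live server.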
    
