PerlMonks  

Re: Re: Re: Cutting Out Previously Visited Web Pages in A Web Spider

by eric256 (Parson)
on Mar 11, 2004 at 03:51 UTC ( [id://335697] )


in reply to Re: Re: Cutting Out Previously Visited Web Pages in A Web Spider
in thread Cutting Out Previously Visited Web Pages in A Web Spider

If you are saving info on each page you find to a file, couldn't you just check whether the file already exists before writing to it?

I didn't really understand your code, but you could save each URL in a hash. Then just check whether the URL already exists in your hash before reading the page again. The hash would only get as big as the number of sites you spider.
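A minimal sketch of the hash idea (the variable and sub names here are illustrative, not taken from the original spider code):

    use strict;
    use warnings;

    my %visited;    # keys are URLs we have already fetched

    sub should_fetch {
        my ($url) = @_;
        return 0 if $visited{$url};    # seen before: skip it
        $visited{$url} = 1;            # mark as seen
        return 1;
    }

    # Only the first occurrence of each URL passes the check
    for my $url (qw(http://example.com/a http://example.com/b http://example.com/a)) {
        print "fetching $url\n" if should_fetch($url);
    }

Hash lookup is O(1), so the check stays cheap no matter how many pages you've crawled; memory grows only with the number of distinct URLs seen.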


___________
Eric Hodges

Replies are listed 'Best First'.
Re: Cutting Out Previously Visited Web Pages in A Web Spider
by mkurtis (Scribe) on Mar 11, 2004 at 03:59 UTC
    Well, I can't check whether the file exists, because the files are numbered rather than named after anything identifying. I'm sure a hash would work, but I don't know how. Is there any reason why the current setup doesn't work?

    Thanks
