
Since any given URL could be a CGI script or an HTML page that uses server-side includes, there is no way to guarantee that fetching even the same URL twice, within any given timeframe, will return identical content.

Any mechanism for determining whether different URLs return the same content will have to rely on fetching them and comparing the results. This might allow some optimisation in storage, by having the two URLs point to the same data, but determining it in advance is just not possible.
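To make the fetch-and-compare idea concrete, here is a rough sketch, not a definitive implementation. It assumes you have already fetched each URL by whatever means you like (LWP::Simple's get() would do) and hold the bodies in a hash. The name group_by_content and the sample URLs and bodies are made up for illustration.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Group URLs by the MD5 digest of the content already fetched for them.
# $fetched is a hashref of url => body. Both the sub name and the input
# layout are assumptions for this sketch, not anything from the thread.
sub group_by_content {
    my ($fetched) = @_;
    my %urls_for;
    for my $url (sort keys %$fetched) {
        my $body = $fetched->{$url};
        next unless defined $body;    # skip URLs whose fetch failed
        push @{ $urls_for{ md5_hex($body) } }, $url;
    }
    return \%urls_for;
}

# Made-up sample data standing in for real responses.
my %fetched = (
    'http://example.com/'     => "<html>same page</html>",
    'http://www.example.com/' => "<html>same page</html>",
    'http://example.org/'     => "<html>other page</html>",
);

# Print one line per distinct body, listing the URLs that returned it.
my $groups = group_by_content(\%fetched);
for my $digest (sort keys %$groups) {
    print join(', ', @{ $groups->{$digest} }), "\n";
}
```

URLs that land in the same group returned byte-identical content at the moment they were fetched, and nothing more than that. A page with a timestamp or a rotating banner will hash differently on every request, which is exactly the problem described above.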

Even storing the data offline is fraught with problems, in that there is no guarantee that the content of an entirely static page will not be updated 1 day/hour/minute/second/microsecond after you captured and stored it.

In reply to Re: Eliminating "duplicate" domains from a hash/array by Anonymous Monk
in thread Eliminating "duplicate" domains from a hash/array by hacker


