OT: Preserving Information

by arashi (Priest)
on Oct 07, 2002 at 21:58 UTC

A few years back I came across a site that was a fairly interesting read. Since then, I've sent many friends who were interested in the topic to the site. I recently went there and found that the site isn't really updated anymore, the author removed his email address, and the forum is almost a ghost town. I began to worry that this good resource might disappear, and I want to preserve it if I can.

Now, I eventually found the author's email address and sent him a message about this, but the whole situation got me thinking: what exactly do we do with dead websites if the information can be of use to people?

Here's a sample scenario:
A website falls into disrepair and the author can no longer be contacted, yet you have an interest in the content and feel it should be preserved. What should you do?

Should you just let it fall into obscurity?

Should you take it upon yourself to preserve the information?

At what point would it become stealing, even if you have good intentions?

Is it ethical to reproduce the information without permission if you can't get the permission?

Personally, if I felt that the information was worth saving, and I had the resources to do so, I would mirror the site and write a disclaimer to the effect that my attempts were only to preserve the information, and that it would be taken offline at the author's request.

What would you do?

Replies are listed 'Best First'.
Re: OT: Preserving Information
by mjeaton (Hermit) on Oct 07, 2002 at 22:05 UTC
      Unfortunately, while their archive is massively broad, it's not very deep. With most sites you're lucky if you can follow two internal links in a row.

      Makeshifts last the longest.

Re: OT: Preserving Information
by BUU (Prior) on Oct 07, 2002 at 22:54 UTC
    I personally would mirror it, post a disclaimer saying "This is all mirrored content", and/or attempt to notify anyone whose content it was.
Re: OT: Preserving Information
by Abigail-II (Bishop) on Oct 08, 2002 at 09:35 UTC
    Republishing it without having the rights to it (either through the license, or because the copyright holder gave you permission) is IMO unethical. It's probably illegal as well; it seems a clear violation of copyright and license laws. It's certainly not very smart to do - we all rely on those laws to make "open source" work. Why give the entertainment industry ammunition for their crusade?

    But making copies for your own purposes is fine. You are allowed to. But don't share without permission.


      All of Abigail-II's comments are right on the money. A good resource for this is Brad Templeton's 10 Big Myths about copyright explained.

      Personally, I'm surprised that caching sites such as Google have survived legally unscathed. I find it interesting that Yahoo avoids using this feature in their Google powered results. Perhaps there will be battles on this in the future.

      For my own work in hymnology, I spend a lot of time with old publications that were once under copyright but have now gone into the public domain. I am very careful to restrict myself to editions that were published long enough ago so that they have clearly fallen into the public domain everywhere in the world. It is possible to take a public domain work, alter it, and claim copyright for the edited version (this is how critical edition publishers of music such as Henle and Wiener Urtext make their living).

        published long enough ago so that they have clearly fallen into the public domain
        Tomorrow, the Supreme Court is hearing a case that will hopefully shorten the absurdly long time it currently takes for this to happen.


      What if there is no clear set publisher? Especially if the original data could be considered to be "Common Domain" in the first place, such as a thread in a bulletin board or discussion group, such as perlmonks. Look at, he is republishing it, but he is in no way claiming ownership or any rights pertaining to it; he is merely providing a different interface to the same data. I agree this argument starts to fall apart when you start to consider semi-commercial websites, ones that pay authors to write articles for them, in which case you would need to make more of an effort to contact the original author.
        If there's no clear publisher, you should assume there's a copyright holder. Why? Because that's the default. Anything that is created is copyrighted - unless it's clearly marked otherwise.

        Threads in a bulletin board are NOT in the public domain (I do not know what "common domain" is) by default. Someone wrote them, so someone does have the copyright. That it may be hard for you to track down the author doesn't mean the author doesn't have rights. Note also that whether or not something is copyrighted has nothing to do with commerciality. Nor does it mean that if you are "non-commercial" you suddenly have more rights. It might make a difference to the amount of damages you have to pay if you lose a copyright suit, though.

        As for republishing the same data with a different interface, there have been court cases against websites doing exactly that. I haven't heard of cases where the republisher won such cases, but I know of cases where the republisher lost.

        Note also that continuing to republish even after the original documents have been removed goes even a step further.

        Especially in cases where you cannot contact the original author, it is questionable whether you should keep republishing the documents after they have disappeared. You do not know the reason why the documents are no longer available. Perhaps the information is stale, or outdated. Or perhaps they contain errors. Or it would actually be illegal to still publish that data.

        I see many reasons not to republish, both legally and morally. I don't see many reasons to do so - you don't have an absolute right to the fruits of someone else's labour (just like no one has the rights to your labour (unless you have a contract)).


      How about for instance Google's cache and similar? Unethical? Illegal?

      However much this may seem like I am trying to talk back here, I am (maybe for once) not - I am sincerely curious. :) Just too tired to come up with a good way of asking.

      You have moved into a dark place.
      It is pitch black. You are likely to be eaten by a grue.
        I don't have a problem with a pure cache - one that will check whether information is stale before serving it. But caches that don't check the backend are a gray area. Personally, I don't have a problem with them, as long as they have a reasonable expiration period (that is, if the backend data is removed or modified, the cache should reflect that after a not-too-long period). But if a cache doesn't expire documents that have disappeared, or if the expiration period is unreasonably long, I think they are wrong. It might have legal problems as well.
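To make the distinction above concrete, here is a minimal sketch of a cache with a bounded expiration period. It is only an illustration of the idea, not how Google or any real cache works. The class name `ExpiringCache`, the `fetch` callback (assumed to return `None` when the origin no longer has the page), and the `max_age` parameter are all hypothetical names made up for this example.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ExpiringCache:
    """Hypothetical cache that re-checks the origin once a copy is older than max_age seconds."""

    def __init__(
        self,
        fetch: Callable[[str], Optional[str]],
        max_age: float,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch      # assumed to return the page body, or None if the page is gone
        self._max_age = max_age  # the "reasonable expiration period", in seconds
        self._clock = clock      # injectable so the behaviour can be tried without waiting
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, url: str) -> Optional[str]:
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and now - entry[0] < self._max_age:
            return entry[1]  # still fresh: serve the cached copy
        body = self._fetch(url)  # stale or never seen: ask the origin again
        if body is None:
            # The origin removed the document, so the cache drops its copy too.
            self._entries.pop(url, None)
            return None
        self._entries[url] = (now, body)
        return body
```

With this shape, a document that disappears from the backend stops being served at most `max_age` seconds later, which is the property the post asks of a well-behaved cache.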


      Illegal? Quite possibly, although I doubt any trouble would come of it 99.99% of the time (a cease-and-desist at worst).

      But I'd be inclined to consider it "ethical", depending on the circumstances of course. Stealing a page from yesterday's New York Times would be a bad thing. Mirroring an interesting page on the rise and fall of the Obscure Empire of 1300BC, which hasn't been updated in 3 years and whose domain is about to expire - that's completely different. Chances are the author simply forgot about it - or even died. The fact that the site was allowed to expire shows the author didn't care about the information being out there, or it would have been taken down. And the information contained on said website may be very useful for a few people. At that point I'd consider it positive karma to keep the page up for posterity. Of course, it becomes an exercise in statistics to figure out how ethical it may be - Probability-Author-Would-Object vs Usefulness sort of thing.

      Of course, if you don't feel it *has* to be on the web, perhaps it would be better to simply make a personal copy. You could send it to any friends you want to see it, or mirror it on an unlinked-to website. That should fall under 'fair use' any way you look at it.

      As far as legal liability goes though, I'd predict you're 99% safe unless you ignore a request to take it down.
        Making a personal copy is fair use. Quoting in the right context is fair use as well. Sending the copy to friends means the copy is no longer personal. You're redistributing. Then it's no longer a personal copy - and the fair use clause doesn't apply.

        Compare it with GNU software. You're allowed to modify it any way you see fit. You don't have to give away the source to the modifications. It's your personal copy. However, as soon as you distribute it - even to someone you label as "a friend" - the license kicks in. It's no longer a personal copy.

        As for your example about the Obscure Empire, I don't pretend to know better than the author whether something should be preserved or not. It's a slippery slope that I'm not willing to step onto. I'm worried enough already about what certain industries want to do with copyright - I don't think we should join hands and break down those laws from two sides.


Re: OT: Preserving Information
by Aristotle (Chancellor) on Oct 07, 2002 at 23:11 UTC
    What BUU said, basically. I have made local copies of a lot of articles, papers or other pages of information I want to make sure I won't lose. At their top I put a small link, as in

    Originally found at <http://foo/bar>

    Makeshifts last the longest.
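The "Originally found at" note described in the reply above is easy to automate when saving local copies. The following is a small sketch under stated assumptions: the function name `add_provenance` is hypothetical, and recording the date of the copy is an addition not mentioned in the post.

```python
from datetime import date


def add_provenance(text: str, source_url: str, saved_on: date) -> str:
    """Prepend an 'Originally found at' line to a locally saved document."""
    # The copy date is an assumption of this sketch; the post only mentions the link.
    header = f"Originally found at <{source_url}> (copied {saved_on.isoformat()})\n\n"
    return header + text


if __name__ == "__main__":
    # http://foo/bar is the placeholder URL used in the post, not a real address.
    print(add_provenance("Body of the saved article.", "http://foo/bar", date(2002, 10, 7)))
```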

Re: OT: Preserving Information
by mstone (Deacon) on Oct 09, 2002 at 17:13 UTC

    First off, the standard disclaimer: IANAL.

    Fair use also contains provisions for copying information that you can't get by any other means. It's a tricky area, because obviously you're not allowed to copy and sell out-of-print books. I'm pretty sure fair use covers copying an out-of-print book rather than buying it, though, since there's no practical way to acquire that book except by making a copy. By contrast, copying a book that's in print rather than buying it would be a clear violation of copyright.

    With regard to the original question, I'd say your best bet is to start by trying to contact the original creator, and ask permission to mirror/redistribute the work. If you do make contact, and do get permission, you're free and clear, and we don't have to talk about fair use at all.

    If you can't contact the original creator, your safest bet is to link to the existing site from your own. Use the original as a historical reference, and build your own body of new content around that. Make a personal-use copy that can serve as a mirror in case the existing site goes away, but only put that mirror online if the existing site does go away.

    If you do put up a mirror of no-longer-available content, put a big disclaimer at the top of the document tree citing the original creator and listing the URL where the information used to live. Then add a 'statement of good faith', saying you want to contact the original creator and get permission to mirror, but haven't been able to do so yet. Also announce that you will pull the information offline if the original creator contacts you and tells you to pull the plug. Then post your own contact information, so the original creator can reach you if he ever stumbles across your mirror.

    That approach should keep you safe. In cases where people can't get the information any other way, and you've tried to contact the original creator but failed, the courts will probably allow you to run the mirror in good faith (i.e., assuming you would have gotten permission to mirror) until the original creator says otherwise.


    If you really care about the information, don't just redistribute 'dead' documents. Observe the spirit of the Open Source/Free Software movements, and do something with the information. Use what's there as a foundation for something new. The more of your own sweat you put into a project, the more it becomes your own, both emotionally and legally. Build clean versions of the original pages, based on your own knowledge and research. Add new information. Patch holes in what's there. Cite additional sources for the same information. Get other people talking about the subject, and post those conversations with permission.

    Then make damn sure you've given others permission to build upon your work.

    The best way to preserve information is to keep it alive. Maintaining a dead site is like tending a mummy. With appropriate care, you can make them last a long time, but at the end of the day, they just sit there being dead at you. Tending living information is like raising a family. The bloodline will go on indefinitely, even though individual members pass away.

Re: OT: Preserving Information
by kryberg (Pilgrim) on Oct 10, 2002 at 16:49 UTC
    I recommend checking out the copyright website section on Public Domain. It states, in part, that Internet works subject to copyright laws "include news, stories, software, novels, screenplays, graphics, pictures, and even email."

    Also, check out the U.S. Copyright Office Copyright Basics (Circular 1). Among many other things, it states that "Copyright is secured automatically when the work is created...."

    While at the U.S. Copyright Office website, search for fair use. Fair use allows the limited use of copyrighted material, but is a rather gray area.
Fair Use
by kryberg (Pilgrim) on Oct 10, 2002 at 19:33 UTC
    Lots of people have mentioned Fair Use in this thread. Not that I feel anyone is incorrect in what they've said about it, but it is best if people go straight to the source of Fair Use information, rather than going by hearsay.

    Fair Use is defined in the Copyright Law of the United States of America and Related Laws Contained in Title 17 of the United States Code - Section 107 Limitations on exclusive rights: Fair use and interpreted on the U.S government's copyright web site Fair Use page.

    Go to the source.
Re: OT: Preserving Information
by Anonymous Monk on Oct 11, 2002 at 00:57 UTC
    Every effort needs to be made to contact the site's authors to see if it is possible to take it over. I have tried this on two separate occasions and both times I was asked for money. Both times I refused to pay. One site remains available but has not updated its material in months, while the other has simply ceased to exist.
    Reproducing information without permission is always dangerous. Original material has a copyright attached to it, which lasts for many years after the death of the author.
    More importantly, as we have seen with technology such as GIFs, the intentions of the originator do not determine the conduct of the heirs. In other words, someone could come along and declare the material is theirs and they want to be compensated for its use. This is a problem whether or not you derive income in any form from your website. It may be possible to approach the problem the same way as Google. They often maintain a local copy of pages they index. The pages, however, are only accessible through a search engine and are not otherwise presented as a structured product.
    The key to this is the site's robot exclusion policy. A trick of journalists and academics is creating new work based in part on the prior work of others. Copyright allows material to be quoted in this manner. You just need to make sure the new work is substantially your own. Your work could be a review or a critique of the original with substantial quotes included to help make your point.
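Since the post above points to the site's robot exclusion policy, here is a minimal sketch of checking one before copying a page, using Python's standard `urllib.robotparser`. The helper name `may_fetch`, the user agent `MirrorBot`, and the sample rules are hypothetical. Note that honouring robots.txt is only a courtesy toward the site's stated crawling wishes; it does not settle the copyright questions discussed in this thread.

```python
from urllib.robotparser import RobotFileParser


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse the rules from text, no network access
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Made-up policy for illustration only.
    rules = "User-agent: *\nDisallow: /private/\n"
    print(may_fetch(rules, "MirrorBot", "http://example.com/index.html"))
    print(may_fetch(rules, "MirrorBot", "http://example.com/private/notes.html"))
```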

Node Type: perlmeditation [id://203509]
Approved by Steve_p
Front-paged by tye