http://qs321.pair.com?node_id=54891

cburns has asked for the wisdom of the Perl Monks concerning the following question:

I want to place links in a web page to various documents (.ps, .pdf, .doc, .ppt, etc) without revealing the location of them on the server. I imagine this is possible with perl/cgi. Suggestions? Are there any modules that do this? Regards, cburns
  • Comment on Serving files without revealing their location

Replies are listed 'Best First'.
Re: Serving files without revealing their location
by jepri (Parson) on Jan 29, 2001 at 09:00 UTC
    beatnik has it. This is certainly possible and often done. You are looking for what is known as an 'anti-leech' script. It is most often used on sites that offer many *ahem* popular files for downloading. The script prevents bulk downloaders from getting to the files.

    It is usually implemented as a form, where you have to type something in (often a word from another page). The script then sends you the file using the correct headers. One example of the headers is here.

    ____________________
    Jeremy
    I didn't believe in evil until I dated it.

Re: Serving files without revealing their location
by AgentM (Curate) on Jan 29, 2001 at 02:38 UTC
    If you're using a real web server like Apache, then you should look into mod_rewrite or mod_alias (which are Apache modules and have essentially nothing to do with Perl). The latter, through its Redirect and Alias directives, is easier to use but less flexible. Either will let you map whatever URL you want onto a file and return it. You could send all requests through a CGI, but I see that as wasteful and as recreating what the web server was supposed to do in the first place.
    AgentM Systems nor Nasca Enterprises nor Bone::Easy nor Macperl is responsible for the comments made by AgentM. Remember, you can build any logical system with NOR.
      If you're using Apache, then setting up an Alias with a CGI script handling the requests will do the trick. I've used this often to create fake, dynamic paths that are parsed by Perl to determine what script to run or document to return.
Throw-away URLs perhaps?
by orkysoft (Friar) on Jan 29, 2001 at 09:14 UTC

    Give the client a URL with some 'random' element in it, and at the server-side, record this same 'random' element. Once the URL is accessed once, that access is recorded (say, by a CGI script that the URL calls), so future attempts at using that 'random' value won't work anymore.

    http://www.somewhere.com/getfile.pl?file=win2k+without+bugs&key=RND#

    getfile.pl must know that the number is a valid key, and once the key is used, it should be either marked invalid or just forgotten about.

    If you mark it as used, the program can tell the user the key is invalid, and let him contact tech support (connection could have been lost, resulting in an incomplete download), so they can analyze logs, and decide whether to re-enable the key.
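
    A minimal sketch of the one-time key bookkeeping described above. The names issue_key and redeem_key and the in-memory %keys hash are hypothetical; a real CGI starts a new process per request, so it would need persistent storage (a database or DBM file) instead. The key generation here is not cryptographically strong and is only meant to illustrate the flow.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha256_hex);

# Hypothetical in-memory store: key => { file => ..., used => 0|1 }.
# Assumption: a real deployment keeps this in a database instead.
my %keys;

# Issue a fresh key for a file and remember it as unused.
sub issue_key {
    my ($file) = @_;
    # NOTE: rand() is not a cryptographic source; assumption for brevity.
    my $key = sha256_hex(join '|', time, $$, rand(), $file);
    $keys{$key} = { file => $file, used => 0 };
    return $key;
}

# Redeem a key. Returns ('ok', $file) the first time, ('used') on any
# later attempt, and ('unknown') for a key that was never issued.
sub redeem_key {
    my ($key) = @_;
    return ('unknown') unless defined $key && exists $keys{$key};
    my $entry = $keys{$key};
    return ('used') if $entry->{used};
    $entry->{used} = 1;    # mark it rather than forget it, so support can re-enable it
    return ('ok', $entry->{file});
}

my $key = issue_key('goodie');
my ($status, $file) = redeem_key($key);
print "$status $file\n";    # prints "ok goodie"
($status) = redeem_key($key);
print "$status\n";          # prints "used"
```

    Keeping used keys around, instead of deleting them, is what lets the script tell "already used" apart from "never valid", which is the distinction the tech support scenario relies on.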

    Update: tilly prefers pathinfo parameters, because of some bug with Excel not liking ? in URLs. (I don't know why Excel would be involved here, though.)

    http://www.somewhere.com/getfile.pl/goodie/41579087

    Okay, for completeness, here's what tilly says:

    • tilly says It looks like a standard URL poisoning scheme. Be warned that if the download is for Excel, there is a bug if the URL contains a ? in it. So I prefer the pathinfo approach for downloads.
    • tilly says If you send a file named foo.csv to a Windows machine which has associated that with Excel (which most have), then it will prompt (at least on IE) for download vs run. The run is broken with IE 4.0 if their temporary internet file...
    • tilly says Where was I. Oh right, if the temporary file has a path with a space (eg "Temporary Internet Files" - the default on NT) it will launch Excel multiple times, and none will have the right file. No idea why.
    (This is why you should consider the pathinfo approach.)
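
    For illustration, a small sketch of how a script might pull the file name and key out of the path info instead of a query string. The function name parse_path_info and the exact URL shape (/name/digits) are assumptions based on the example URL above; in a CGI the input would come from $ENV{PATH_INFO}.

```perl
use strict;
use warnings;

# Split a path info string such as "/goodie/41579087" into (name, key).
# Returns an empty list when the input does not have exactly that shape,
# so anything containing "..", extra slashes, or a non-numeric key is rejected.
sub parse_path_info {
    my ($path_info) = @_;
    return unless defined $path_info;
    my ($name, $key) = $path_info =~ m{\A/([\w-]+)/(\d+)\z} or return;
    return ($name, $key);
}

# In a CGI this would be: parse_path_info($ENV{PATH_INFO})
my ($name, $key) = parse_path_info('/goodie/41579087');
print "$name $key\n";    # prints "goodie 41579087"
```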

Re: Serving files without revealing their location
by Trimbach (Curate) on Jan 29, 2001 at 09:18 UTC
    As has been stated by a couple of people already, you can easily set up a cgi that takes an id #, looks up the number, and then serves up the appropriate file. "Anti-leech" techniques aside, there is a feasible, practical way to serve files without revealing their location if you need to: it's called "passwords" and "encryption."

    I set up a secure order form for a client once. It was all very basic, collecting order information and credit card numbers via an SSL form. The client, however, needed an easy and secure way to get to the order info, so I set him up with a password-protected CGI, accessed through SSL, that took him to an admin tool that let him view or download any of the orders he wanted. The trick here is that, for extra security, none of the order files (it was all a flat-file affair) were stored in web-accessible directories. A web browser couldn't even point to the files if it wanted to, but my CGI could read them in and spit them out just fine, and because it was encrypted and password protected it was (reasonably) secure.

    It's easy to take a similar approach to any situation that requires restricted access to files. Want to let someone download something only one time? Give them a password and set your CGI to let that password in only once. Want to give someone access to a file that you can revoke at any time? Piece of cake. There ARE reasons for doing such things; the exact solution you pick will depend on what you're ultimately trying to accomplish.

    Gary Blackburn
    Trained Killer

Re: Serving files without revealing their location
by Elgon (Curate) on Jan 29, 2001 at 01:56 UTC
    I have absolutely no idea how to do this using Perl or any kind of CGI.pm trick, but I would do this using symlinks on the server filesystem:

    Use the docroot as a kind of holdall with index.html or index.cgi directing them to a page in a subdirectory. Then place symlinks in the docroot to all of the desired files.

    I am NOT suggesting you do this, though: as a solution it is highly crufty. Apart from making a mess, there are probably other good reasons not to do it of which I am unaware. It was just something that occurred to me.

    Elgon

    Update: One thing I forgot: Why? (See also salvadors' reply about mapping below.)

    Update #2: Okay, now I understand the kind of question we're dealing with here. What I would do is use a script which takes your account details and uses some kind of ID authentication function (password, credit card details, account details etc...) to take an encrypted file from a directory visible to the webserver, decrypt it and return it to the client. Then it doesn't matter if the URL is visible, 'cause no bugger can decrypt the file without the key. Of course they're welcome to try a brute force attack if they want...

    On the other hand, if they really want to hand off the data to someone else, they can just take the copy they have, copy it and email it to them. Never underestimate a resourceful idiot!

      This question has semi-interested me as well. I can give you my own version of why. You offer an ecommerce solution to your customer (referred to as 'the merchant'). The merchant is interested in selling intangible items such as information (websites, documentation) or files (apps and downloadable documents) to his customers. The problem is that he doesn't want customer A purchasing product X and then posting or spreading the link he used to get to product X, OR using that link as a means of guessing links to product Y or product Z.
      Now let's put a real twist in this whole thing :) Some merchants don't mind housing files on a server owned by you, while other merchants can't and/or won't leave their merchandise on your machines; they want your ecommerce solution to point to something off site.
      Now the demand I get all the time is "I don't want my customers to know the link" or something along those lines, so that the customers in essence get a one-time download. The closest I can get is severely obfuscating the address and making it near impossible for a non-technical customer to identify it, but I have yet to even imagine a possibility of truly "hiding" an address.
      I think I could write a decent portaling system if the files were housed on my side, but 90% of the requests are for an offsite file/location.

      I can sort of see the logic in why this is impossible. "If you WANT someone to have something, why are you trying to hide it at the same time?"


      How is that for a valid "why?" :)
      hehe


      2501
Re: Serving files without revealing their location
by salvadors (Pilgrim) on Jan 29, 2001 at 02:01 UTC

    I want to place links in a web page to various documents (.ps, .pdf, .doc, .ppt, etc) without revealing the location of them on the server.

    I'm not sure that your question really makes sense. If you're trying to hide their URL then you can't. By definition this information has to be made available so that your web browser can actually fetch them. Even if you could find some really neat trick to hide them from Netscape and IE etc, I could write my own "browser" (ObPerl: using LWP, of course!) that would be able to get at the file.

    If you mean that you want to hide the actual location of them on the server, ie not show that they're in /www/docs/ or whatever, then your webserver should take care of that for you. No-one can easily know what your base directory is without reading your config file, and you can always remap different URLs to point to different physical locations anyway.

    Tony

      Let me clarify: Say the file is "www/docs/file.pdf". Couldn't I write a CGI that would receive the file "id" as a parameter, obtain the filename from a db with the given id, then open a filehandle for that file, read its contents into a buffer, and print the buffer contents to the response? The user would only see "cgi-bin/get_file.cgi?id=xxx" in the URL.
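
      A rough sketch of that idea, under some loud assumptions: the %file_for hash stands in for the database lookup, and the names path_for_id and send_file are made up for illustration. A real script would also print HTTP headers before the body and take the id from the request parameters.

```perl
use strict;
use warnings;

# Hypothetical id => path table standing in for the database lookup.
my %file_for = (
    6 => '/www/docs/file.pdf',
);

# Map a request id to a path. Only ids present in the table are served,
# so the client never gets to supply a path of its own.
sub path_for_id {
    my ($id) = @_;
    return undef unless defined $id && $id =~ /\A\d+\z/;
    return $file_for{$id};
}

# Copy a file to an output handle in fixed-size chunks.
# Returns 1 on success, 0 if the file could not be opened.
sub send_file {
    my ($path, $out) = @_;
    open my $in, '<', $path or return 0;
    binmode $in;
    binmode $out;
    while (read($in, my $buf, 8192)) {
        print {$out} $buf;
    }
    close $in;
    return 1;
}
```

      In a CGI, the call would look something like send_file($path, \*STDOUT) after the headers have been printed. Reading in chunks rather than slurping the whole file keeps memory use flat for large documents.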
        This is certainly possible, but I have to repeat others' questions: why?

        The solution to your problem (which, itself, is still unknown to us) is kind of odd. All resources on a web server are accessed with a URL. All you're doing is substituting one URL (that maps to a PDF filename of the document) with another (that maps to an ID representing the same PDF data). The fact that this seems kind of nonsensical makes me wonder if you're trying to solve a problem that you don't really have, or if the solution you think you want is going in completely the wrong direction. The solution you outline here just adds another layer of complexity to the file delivery process, in the form of CGI, which will make things a bit less efficient. If you wanted to go this route, mapping an ID number to a database entry or physical document, I would consider using Apache's mod_rewrite or writing your own query handler to do the ID -> filename mapping. At least that way you won't be pumping all of this through a CGI script.

        So I'm curious: what are you trying to accomplish by doing this?

        Yes. You could do that. But then that address would *be* the location of the file. Why would http://www.tmtm.com/cgi-bin/get_file.cgi?id=6 actually be any better than letting it be http://www.tmtm.com/docs/file.pdf?

        If you don't want people to be able to fetch the file more than once, then that's a slightly different question. In that case you could wrap code around the get_file command to not allow the use of the URL more than once, but that's going to be more tricky than a simple database lookup to map the ID to the location.

        Perhaps you could explain *why* you're trying to do this, and we might come up with a better solution.

        Tony

Re: Serving files without revealing their location
by Beatnik (Parson) on Jan 29, 2001 at 02:20 UTC
    My guess is that you'll have to send an HTTP header with the appropriate Content-Disposition: attachment; filename="..." line, much like the Content-Disposition: form-data; name="..."; filename="..." line used in file uploading (described in RFC 1867).
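
    As a hedged illustration, the response headers for a download might be assembled like this. The function name download_headers is hypothetical, and the sketch assumes the MIME type and length come from trusted code; only the file name is sanitized before it goes into the header.

```perl
use strict;
use warnings;

# Build the header block for a file download. The suggested file name is
# reduced to word characters, dots and hyphens so that it cannot contain
# quotes or line breaks that would corrupt the header.
sub download_headers {
    my ($filename, $mime_type, $length) = @_;
    (my $safe = $filename) =~ s/[^\w.\-]/_/g;
    return join '',
        "Content-Type: $mime_type\r\n",
        qq{Content-Disposition: attachment; filename="$safe"\r\n},
        "Content-Length: $length\r\n",
        "\r\n";    # blank line ends the headers
}

print download_headers('file.pdf', 'application/pdf', 1234);
```

    The name in the header is only a suggestion to the browser for the "save as" dialog; it does not have to match the path of the file on the server.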

    Greetz
    Beatnik
    ... Quidquid perl dictum sit, altum viditur.
      There are several things that are not clear to me:

      Is this data static (i.e. in memory) or is it a stream (read in a byte and then write a byte out)?

      My first idea is to calculate the standard deviation and then throw out any numbers that are outside some multiple of the standard deviation from the mean.

      -- gam3
      A picture is worth a thousand words, but takes 200K.
Re: Serving files without revealing their location
by EvanK (Chaplain) on Jan 29, 2001 at 23:33 UTC
    Simple. The script should find the MIME type of the file in question, open it (the file), store the contents in a variable, then print the MIME header followed by the file contents. So the browser is never actually getting to the real file; the script is just printing its contents to the browser. That should work, but I'm not 100% sure. What does everyone else think?
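
    A minimal sketch of those steps. The extension table %mime_for and the function names are assumptions for illustration; a real script might use a fuller MIME database, and would print the result to STDOUT rather than return it.

```perl
use strict;
use warnings;

# Small, hand-written extension => MIME type table (an assumption; it
# only covers the document types mentioned in the original question).
my %mime_for = (
    ps  => 'application/postscript',
    pdf => 'application/pdf',
    doc => 'application/msword',
    ppt => 'application/vnd.ms-powerpoint',
);

# Guess the MIME type from the file extension, falling back to a
# generic binary type when the extension is missing or unknown.
sub mime_type_for {
    my ($filename) = @_;
    my $ext = $filename =~ /\.([^.\/]+)\z/ ? lc $1 : '';
    return exists $mime_for{$ext} ? $mime_for{$ext} : 'application/octet-stream';
}

# Read the whole file into a variable and return header plus contents.
# Returns undef if the file cannot be opened.
sub response_for {
    my ($path) = @_;
    open my $fh, '<', $path or return undef;
    binmode $fh;
    my $contents = do { local $/; <$fh> };    # slurp mode
    close $fh;
    $contents = '' unless defined $contents;
    return 'Content-Type: ' . mime_type_for($path) . "\r\n\r\n" . $contents;
}
```

    In a CGI the script would binmode STDOUT and print the value of response_for($path). Slurping is fine for small documents; for large ones, reading and printing in chunks avoids holding the whole file in memory.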

    ______________________________________________
    When I get a little money, I buy books. If I have any left over, I buy food and clothes.
    -Erasmus

      Oh, and I forgot to add: if you don't want the user to be able to spread the link, then have the script take, say:
      http://www.site.com/cgi/bin/get.cgi?file=7&id=someencrypted_string
      where someencrypted_string is, say, the user's IP encrypted with DES or something. Or if you want to get really technical, say so they only have a certain amount of time to download the file themselves, you could encrypt the date into it as well as the IP.  >:-]
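
      One way to sketch this, with a swapped-in technique stated plainly: instead of DES encryption, the example below signs the file id, client IP and expiry time with an HMAC (Digest::SHA), which gives the same "bound to this user and this time window" effect without needing to decrypt anything. The names make_token, token_is_valid and $SECRET are hypothetical, and the plain string comparison is not constant-time.

```perl
use strict;
use warnings;
use Digest::SHA qw(hmac_sha256_hex);

# Hypothetical server-side secret; assumption: kept out of the web tree.
my $SECRET = 'change-me';

# Build a token of the form "<expiry>-<signature>" for one file and one IP.
sub make_token {
    my ($file_id, $ip, $expires) = @_;
    my $sig = hmac_sha256_hex("$file_id|$ip|$expires", $SECRET);
    return "$expires-$sig";
}

# Check that the token is well formed, not expired, and was signed for
# this file id and this client IP. Returns 1 or 0.
sub token_is_valid {
    my ($token, $file_id, $ip, $now) = @_;
    return 0 unless defined $token;
    my ($expires, $sig) = $token =~ /\A(\d+)-([0-9a-f]{64})\z/ or return 0;
    return 0 if $now > $expires;
    return $sig eq hmac_sha256_hex("$file_id|$ip|$expires", $SECRET) ? 1 : 0;
}
```

      In a CGI, the IP would come from $ENV{REMOTE_ADDR} and $now from time(). Note that binding to an IP can lock out users behind proxies whose address changes between requests.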

      ______________________________________________
      When I get a little money, I buy books. If I have any left over, I buy food and clothes.
      -Erasmus

Re: Serving files without revealing their location
by elusion (Curate) on Jan 29, 2001 at 02:13 UTC
    I know it works like you're talking about, I'm just not positive how to do it. This is just a guess, but I think it'll work...

    You can probably write a Perl script that opens the file named by a CGI param and then prints the file to the browser. Then if you instruct your users to click "Save Link As" and save it under a new filename, it should work for them. But again, this is just my guess.

    - p u n k k i d
    "Reality is merely an illusion, albeit a very persistent one." -Albert Einstein
