Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl: the Markov chain saw
 
PerlMonks  

Re: Thwarting Screen Scrapers

by tjh (Curate)
on Jul 18, 2002 at 14:41 UTC ( [id://182827]=note: print w/replies, xml ) Need Help??


in reply to Thwarting Screen Scrapers

From the subject line I expected a conversation on hijacking content, possibly RSS or other news feed issues, copyright arguments, maybe even allusions to the U.S. entertainment industry trying, with vehement avarice, to technologically block any re-recording of anything (lol), and other things... :|

Instead, I can't tell if you are a merchant that is somehow being disintermediated by your own reseller or what - even though you're still making the sale. I'm confused. If you're still making the sale and collecting the payment, I don't get it. Has someone pre-empted your front end? Why would they do that? If you're being targeted and your site hijacked that's different.

If you have soft content, news or other written content, that someone is scraping and calling their own either by redisplaying on their own site, this is a different matter - a legal one without good Perl-specific solutions.

Did you state your problem exactly - or is this a drill?

Update: just read your follow up.

The tech tactics are being listed by others (dynamic session id's per page call, dynamic field names, etc.) In an ideal world all session mgmt and user authentication would be application level with high granularity - down to each page or function call from the client, every time a request arrives. I know of no current solution, Perl or otherwise, that solves this completely. Would love to see one though.

On the other front from your example, I have had this exact experience 2 times. All the technology solutions in the world won't stop someone who relentlessly intends this fraud. You have to detect them, copy the fraudulent material, get witnesses - do whatever your lawyer tells you to do about the copyright violation (and hope it's domestic). In one of my experiences, a simple email solved it. The other got a little warmer...Server Error (Error ID 4104269a2048911)

An error has occurred. The site administrators have been notified of the problem and will likely soon fix it. We thank you, for you're patients.

Replies are listed 'Best First'.
Re: Thwarting Screen Scrapers
by Abigail-II (Bishop) on Jul 18, 2002 at 15:31 UTC
    In an ideal world all session mgmt and user authentication would be application level with high granularity - down to each page or function call from the client, every time a request arrives. I know of no current solution, Perl or otherwise, that solves this completely. Would love to see one though.
    Yes, this level of authentication can certainly be done. I'm currently involved in a large project, where this is being done, and we even go further. Unfortunally, I can't tell you more.

    It's not simple, and it takes large investments. The question isn't "can this level of authentication be done", the question is "how much are you willing to pay?" (pay in a broad sense - mostly costs to hire people).

    Abigail

Re: Re: Thwarting Screen Scrapers
by kschwab (Vicar) on Jul 18, 2002 at 14:54 UTC
    You're right, I haven't included all the details. I was trying to keep this generic enough to apply in more than one situation.

    Basically, I'm selling something direct via a website. I have no resellers. A set of people I don't know at all have created their own websites, but they are nothing more than a shell around my website. They make money by adding a "service charge" and billing it to the customer. ( Without adding any apparent value )

    They take all the http and https requests from the Customer, via their own forms, and then take the data and make simulated browser requests to my site to make the purchase. Other areas, such as feedback, etc, are directed to my site as well.

    They obviously feel they are doing something wrong, since they hide behind unprotected web proxy servers and use other "stealth" techniques to make stopping them difficult.

    If it were just one party, a legal approach would work. Unfortunately, this situation happens over and over again, with a different set of front-enders, sometimes with an offshore website.

      I see (I think). They're processing their own forms (order and payment) themselves, then, in turn, mapping the same sequence on your site. Does this mean that every time an order is made and paid on their site that they cause the same on yours? Are you getting the original customer name, addy, etc., or would you know?

      Real-time detection is possibly the first goal. Unless there is something unique you can detect in the incoming 'ghost' client that you can block with, maybe you can work to detect duplicate payments, shipping addresses etc on the tail of the transaction - which assumes that your new 'partners' are ordering from you then re-shipping to their customer.

      If they are taking the customer data from their own forms and re-submitting it to you, including payment (CC#?) info to you - with a markup - how are they collecting their markup? If they are collecting their full payment using the customer's payment data, THEN resending that same payment data to you, effectively double-billing the buyer, this is a much different type of problem and you should be contacting law enforcement.

      From the looks of your other responses in this thread - methinks you need to do both - tech and legal. If you have a product that is inspiring so much theft/fraud, you need to protect it immediately - but not so protected that it can't be sold at all... :)

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://182827]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others drinking their drinks and smoking their pipes about the Monastery: (6)
As of 2024-03-29 15:14 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found