PerlMonks  

Web scraping linkedin.com

by alexgrimmy (Initiate)
on May 08, 2015 at 01:58 UTC

alexgrimmy has asked for the wisdom of the Perl Monks concerning the following question:

I didn't find it easy to export my contact information from LinkedIn, so I started to look at Mojo::UserAgent for ways to "scrape" my contact info off LinkedIn. To put it mildly, I'm failing miserably. I can get the transactor to pull the default page, and I tried to post my log-in information, but I have no idea why what I see returned is different from the "view source" in a browser. Any advice is greatly appreciated.

Replies are listed 'Best First'.
Re: Web scraping linkedin.com
by Your Mother (Archbishop) on May 08, 2015 at 02:19 UTC

    (LinkedIn) User Agreement

    8.2. Don'ts. You agree that you will not:

    • …Use manual or automated software, devices, scripts robots, other means or processes to access, “scrape,” “crawl” or “spider” the Services or any related data or information;…

    I am guessing your actual problem lies with JavaScript handling of their sessions, in which case you would need a JS-aware agent like WWW::Mechanize::Firefox or WWW::Selenium. It could also be as simple as changing the name of the UserAgent so it’s not a flagged bot/agent name. Still, you’re not supposed to do this here, and I personally wouldn’t help you, because it can cast Perl and its fans in a bad light, as poor Netizens.

Re: Web scraping linkedin.com
by Albannach (Monsignor) on May 08, 2015 at 05:03 UTC
Re: Web scraping linkedin.com
by Gangabass (Vicar) on May 08, 2015 at 03:47 UTC

Node Type: perlquestion [id://1126047]
Approved by Albannach