http://qs321.pair.com?node_id=647297

koknat has asked for the wisdom of the Perl Monks concerning the following question:

PerlMonks,

I'm writing a program that uses WWW::Mechanize to traverse the LinkedIn social networking site.

Background:
LinkedIn lets you connect to people you already know: co-workers, former co-workers, college classmates, etc. (friends). I'm connected directly to 86 friends.
Each of my friends is connected to their friends. By clicking on a friend's profile, I can see who their friends are. LinkedIn tells me that I have 2400 of these 2nd-degree connections (friends-of-friends).

I'd like to put this information into a Perl data structure. Once I've done that, it's easy to process the data:
* Are there any people whom many of my friends know, but I don't?
* Does anyone from my waterski club know anyone at work?
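One possible shape for that data structure is a hash of hashes keyed by friend name, where each inner hash is the set of people listed on that friend's profile. The sketch below is only an illustration: the names, the sample data, and the helper `mutual_counts` are all made up, and it answers the first question above (people known by many of my friends but not by me).

```perl
use strict;
use warnings;

# Hypothetical sample data: each direct friend maps to the set of people
# listed on that friend's profile. All names here are invented.
my %friends_of = (
    'Alice' => { 'Dave' => 1, 'Erin'  => 1, 'Bob'   => 1 },
    'Bob'   => { 'Dave' => 1, 'Alice' => 1, 'Frank' => 1 },
    'Carol' => { 'Dave' => 1, 'Erin'  => 1 },
);

# Count how many of my friends know each person I am not connected to.
# (mutual_counts is a hypothetical helper, not part of any module.)
sub mutual_counts {
    my ($graph, $me) = @_;
    my %count;
    for my $friend (keys %$graph) {
        for my $person (keys %{ $graph->{$friend} }) {
            next if $person eq $me;              # skip myself
            next if exists $graph->{$person};    # skip my direct friends
            $count{$person}++;
        }
    }
    return \%count;
}

my $counts = mutual_counts(\%friends_of, 'Chris');

# Most widely known people first; ties are broken alphabetically.
for my $person (sort { $counts->{$b} <=> $counts->{$a} || $a cmp $b } keys %$counts) {
    print "$person is known by $counts->{$person} of my friends\n";
}
```

The second question (waterski club versus work) could be handled the same way by adding a group label to each friend and comparing the inner hashes of the two groups.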

I've used WWW::Mechanize before. I can pull in a page from LinkedIn and parse the HTML. The problem is that the page of my connections does not list my friends in HTML. It looks like it's running a plugin.
Here's the page. You need to be signed in to LinkedIn to see it: http://www.linkedin.com/connections?trk=network_yourcnx
Does anyone have an idea of how to solve this problem?

Thanks,

- Chris Koknat

Replies are listed 'Best First'.
Re: WWW::Mechanize to traverse LinkedIn social networking site
by Cody Pendant (Prior) on Oct 26, 2007 at 02:45 UTC
      That helps a lot. Thanks Cody.
        koknat, Can you please share the code? Thanks, Tal.
Re: WWW::Mechanize to traverse LinkedIn social networking site
by andyford (Curate) on Oct 26, 2007 at 01:46 UTC

    Read the Terms of Service carefully. Extrapolating from what I know about other similar sites, you might find that, from a well-meaning person's standpoint, the site's rules are a bit stricter than you might imagine.

    non-Perl: Andy Ford

Re: WWW::Mechanize to traverse LinkedIn social networking site
by aquarium (Curate) on Oct 26, 2007 at 01:52 UTC
    I'm not sure exactly what you're doing in the code, but I guess you're simply retrieving the initial page and trying to do likewise for the list of friends. More than likely you'll need to send a POST request for these links, with the appropriate parameters. The parameters can be sent separately or in the URL of a POST request. In any case, to be sure, record a few click-through sessions with something like Ethereal running, to see exactly what is sent and received. Looking at the browser alone doesn't tell the full story.
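As a rough illustration of the POST idea, the sketch below builds a form-encoded POST request with HTTP::Request::Common and prints it without sending anything. The URL and both parameter names are placeholders, not LinkedIn's real ones; the real values would have to come from a recorded session.

```perl
use strict;
use warnings;
use HTTP::Request::Common qw(POST);

# ASSUMPTION: placeholder endpoint and made-up parameter names.
# Replace them with whatever the captured traffic actually shows.
my $url  = 'http://www.example.com/connections';
my @form = (
    view => 'friends',   # hypothetical parameter
    page => 1,           # hypothetical parameter
);

# Build the request object; an array ref keeps the parameters in order.
my $req = POST $url, \@form;

# Inspect what would go over the wire, for comparison with the capture.
print $req->as_string;

# With a logged-in WWW::Mechanize object $mech, the same request could
# then be sent with:  my $response = $mech->request($req);
```

Comparing the printed request with what the sniffer recorded is a quick way to spot a missing parameter or header before involving the live site.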
    By the way, how are you going to handle the condition of "your friend's friend has you as their friend" and similar conditions? If you're not careful, you can create endless loops there, or get wrong answers to your questions. In fact, a relational database (structure) may not handle this very well at all, so a hierarchical DB (structure) could be better.
    good luck
    the hardest line to type correctly is: stty erase ^H
      Thanks for the tip about Ethereal.

      A clarification: I'm not trying to crawl the entire site. I'm just trying to jump to each of my 86 friends and record the names of their friends. This prevents the endless loop problem from happening.
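A minimal sketch of that one-level traversal, assuming the fetch-and-parse step is hidden behind a callback. Here `fetch_friend_names` is a hypothetical stub backed by invented data; in the real program it would use WWW::Mechanize to load a friend's profile and extract the names. Because the loop only visits the fixed list of direct friends and never follows the names it collects, it cannot run forever, even when a friend's list points back at me.

```perl
use strict;
use warnings;

# ASSUMPTION: invented data standing in for the real site.
my %fake_site = (
    'Alice' => [ 'Bob',   'Dave', 'Chris' ],
    'Bob'   => [ 'Alice', 'Erin', 'Chris' ],
);

# Hypothetical stub for the real fetch-and-parse step.
sub fetch_friend_names {
    my ($name) = @_;
    return @{ $fake_site{$name} || [] };
}

# Visit each direct friend exactly once and record who they list.
# The collected names are stored, never visited, so there is no recursion.
sub collect_second_degree {
    my ($fetch, @direct_friends) = @_;
    my %friends_of;
    for my $friend (@direct_friends) {
        next if exists $friends_of{$friend};    # ignore duplicate entries
        $friends_of{$friend} = { map { $_ => 1 } $fetch->($friend) };
    }
    return \%friends_of;
}

my $graph = collect_second_degree(\&fetch_friend_names, 'Alice', 'Bob');
for my $friend (sort keys %$graph) {
    print "$friend: ", join(', ', sort keys %{ $graph->{$friend} }), "\n";
}
```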

      Also, LinkedIn only lets you see your friend's friends (2 degrees of separation). You can't even see your friend's friend's friends.