Beefy Boxes and Bandwidth Generously Provided by pair Networks
Come for the quick hacks, stay for the epiphanies.
 
PerlMonks  

Graphing HUGE Social Networks

by spx2 (Deacon)
on Mar 20, 2008 at 07:00 UTC ( [id://675170]=perlquestion: print w/replies, xml ) Need Help??

spx2 has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

I was looking to getting data from a social network represented as a graph,
I finally found GraphViz module which was excellent,
You can see the graph representation I've been able to render so far here : Screenshots (you can click on it to magnify
but be sure no to crash your browser,the image has 6.4MB).
However,some problems regarding the running speeds frighten me (because of problems with
scalability.
I have just rendered a graph with 10000 nodes and 10000 edges(many isolated nodes)
and it took some 2-3 minutes .
But if I use the overlap=>'false' feature which GraphViz comes with it(I also use
radial layout)
Also the biggest problem is that when I run this with a test it will be
about 1000000 nodes with 100000000 edges.
So I am thinking if this would be feasible and also I can't find any good benchmarks on this
GraphViz module,does someone know any benchmarks of it anywhere ?
An implementation using ajax would also be possible I think...but I don't know javascript
very well.


Regards,
Stefan

Replies are listed 'Best First'.
Re: Graphing HUGE Social Networks
by moritz (Cardinal) on Mar 20, 2008 at 07:41 UTC
    I faced the same problem when I tried to visualize my GPG keyring.

    My workaround was to create the graphics only very seldom, and use mapit to display the results. Not very convient, if you ask me.

    I guess the problem isn't graphviz itself (which is quite fast, actually) but the problem domain itself.

    You could try to preprocess your data, and instead of displaying every single node, you could create clusters and display them as one node, with the size being proportional to the number of nodes iin the cluster.

    You won't lose much in terms of graphics, because the human eye can't possibly process 106 nodes.

    But the cluster analysis will take quite some time, I fear.

Re: Graphing HUGE Social Networks
by quester (Vicar) on Mar 20, 2008 at 08:11 UTC

    I'm not sure see what use the humungous single graph at full resolution would be. Hardly anyone on Earth would even be able to open it for decades.

    You might consider starting by generating a very large number of reasonably small subgraphs as individual graphics. If you want to give the impression of a giant map that can be zoomed, you might start like this: take each the image of each subgraph and make a smaller, reduced resolution image of it. Group a number of subgraphs together and tile them, using a convenient tile size, on the order of 100 pixels square or so; then make a reduced resolution image out of that. Lather, rinse, and repeat until you finally get one set of tiles that are collectively small enough to display in a single window, say around 800x600 pixels or so.

    Yes, it is likely to be little more than a blob at that scale.

    Now, on the display side, that's a bit like how map programs such as Google Earth or Yahoo Maps work as you zoom in starting from a view of the entire planet, and they had a very similar problem to solve. You may or may not need or want the fancy Ajax scrolling. For many uses, the old way of clicking on an image map to zoom in at the clicked location is good enough. Either way, I think the underlying idea of tiles that are all the same pixel size but have different scales as you zoom in and out may be the simplest way of handling a graph that is really much too big to handle all at once.

      "Yes, it is likely to be little more than a blob at that scale"

      You could, at the far-out zoom levels, just make it a blob. Several visualizations I've seen of social networks and music turn subgraphs that would be too confusing to display into a colored blob, the size of which is proportional to the size of the subgraph. I'm assuming how they function is they just blob-ify everything more than X distance from the node you are looking at.

      Just food for thought, I think your idea of a sort of Google Earth type of display is great.

Re: Graphing HUGE Social Networks
by mattr (Curate) on Mar 20, 2008 at 17:14 UTC
    Certainly the human eye can process a 1 megapixel photo, so it's the way you represent it. First why are you doing it? (besides that it's fun).

    You might be interested in googling for: visualizing large graphs

    There are two free tools, H3Viewer which is in C++ and Walrus which is in Java3D. These both use fisheye style views.

    Here's a few links to get you started. GINY (scroll toward bottom), Munzer paper, LGL (gallery has a 10^6 edge image), Large RDF graphs.

    So these different approaches tend to use interactive zoomable charts, and show details that can be made sense of when seen even from far away.

    FWIW I remember once I was involved with a company doing Y2K remediation and I saw the visualizations IBM had cooked up. They may not have been as complex as yours but resembled your data, it was to describe safety of code in different programs. Each program was represented by a circle of dots and each circle was drawn like a pie chart in a way, with colorings of segments indicating their safety. It was just a bunch of these pie charts in a big table but from far away you could tell which programs were most complex or dangerous (more red). So maybe you need to think about what is most important to represent.

    That, and also to realize that 100M edges is extremely complex. Probably if you really want to grasp what is going on you should try some different ways, some which shrink groups down to a small number of pixels and other ways that allow you to zoom in more and forget people outside the group.

    For example look at the picture on the large-scale rdf graph visualization page. The Software Options section half-way down explains that GraphViz is aimed at making nice pictures of reasonably sized graphs. Since yours isn't reasonable you might want to consider using another tool or at least not drawing straight graphs but doing something else. It says Walrus is good for around 100K nodes and Tulip for about 1M nodes/edges (they have a social network graph sample).

    Sorry I couldn't be of more help.

Re: Graphing HUGE Social Networks
by redhotpenguin (Deacon) on Mar 21, 2008 at 04:33 UTC
Re: Graphing HUGE Social Networks
by ady (Deacon) on Mar 21, 2008 at 13:21 UTC
    I've worked with fairly large graphs in connection with Y2K analysis.

    I used GraphViz for a start, but increasingly switched to yED for faster generation, more flexible and interactive layout, graph partitioning and analysis, -- and more.

    There's no Perl interface to yED, but it's simple (in Perl :) to generate the raw graph data files in GraphML, GML or a 'free form' XML & XSLT format of your choice.

    Best regards
    allan dystrup

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://675170]
Approved by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others pondering the Monastery: (4)
As of 2024-04-18 22:46 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found