PerlMonks  

Re: Google's MapReduce

by SpanishInquisition (Pilgrim) on Oct 27, 2004 at 13:18 UTC (id://402990)


in reply to Google's MapReduce

I don't know which programming language MapReduce was written in, but could you write the basic functions in Perl? It's Turing complete.
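The basic functions are indeed small. Below is a minimal single-process sketch of the map/shuffle/reduce pattern, written in Python rather than Perl and using the usual word-count example. The names `map_reduce`, `word_count_mapper` and `word_count_reducer` are hypothetical, not Google's API. The sketch leaves out distribution, fault tolerance and I/O, which are the hard part of a real system.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run mapper over every record, group by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        # Map: each record may emit any number of (key, value) pairs.
        for key, value in mapper(record):
            # Shuffle: collect every value emitted under the same key.
            groups[key].append(value)
    # Reduce: collapse each key's list of values into a single result.
    return {key: reducer(key, values) for key, values in groups.items()}


def word_count_mapper(line):
    # Hypothetical example mapper: emit (word, 1) for each word in a line.
    for word in line.split():
        yield word.lower(), 1


def word_count_reducer(word, counts):
    # Hypothetical example reducer: total the 1s emitted for one word.
    return sum(counts)


docs = ["the quick brown fox", "The lazy dog", "the end"]
counts = map_reduce(docs, word_count_mapper, word_count_reducer)
print(counts["the"])  # prints 3
```

In a distributed version, the map calls and the per-key reduce calls would run on different machines, and the `groups` dictionary would become network and disk traffic.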

Re^2: Google's MapReduce
by hardburn (Abbot) on Oct 27, 2004 at 13:34 UTC

    Turing completeness (TC) describes what a language can compute, in terms of functions; anything involving I/O is pretty much outside Turing's view. A distributed application like this will do a good deal of I/O, so just being TC isn't good enough.

    Just how much I/O you'll be doing depends on your application. Some problems could get sufficient bandwidth by having an intern load data off a floppy. Others will need high-speed fiber optic connections to keep up. And some problems will be just plain slower when distributed than when run on a single machine.

    In any case, you could certainly do this with Perl. Would it be useful? If the application's bottleneck is I/O, then Perl would probably be a viable choice. However, good candidates for distributed systems are usually not I/O-bound but CPU-bound, like "take this DES-encrypted message, try decrypting it with keys x through x + 2**y, and let me know if any of them break the message". For something like that, you want a good number-crunching language like C or FORTRAN.
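    A brute-force job like that parallelizes by slicing the key range into independent chunks, one per worker. Here is a minimal sketch of that partitioning, treating "x through x + 2**y" as the 2**y keys starting at x. It assumes a hypothetical `try_key` predicate in place of real DES decryption, which Python's standard library does not provide, and uses local threads from `concurrent.futures` as stand-ins for cluster machines.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def partition(start, count, chunks):
    """Split the half-open range [start, start + count) into contiguous slices."""
    base, extra = divmod(count, chunks)
    ranges, lo = [], start
    for i in range(chunks):
        # The first `extra` slices get one extra key so sizes differ by at most 1.
        size = base + (1 if i < extra else 0)
        if size:
            ranges.append((lo, lo + size))
        lo += size
    return ranges


def try_key(key, target):
    # HYPOTHETICAL stand-in for "decrypt with this key and check the plaintext".
    return key == target


def search_chunk(bounds, target):
    # One worker's task: test every key in its slice, return the first hit.
    lo, hi = bounds
    for key in range(lo, hi):
        if try_key(key, target):
            return key
    return None


def brute_force(x, y, workers, target):
    chunks = partition(x, 2 ** y, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in chunk order; the first non-None one wins.
        for found in pool.map(partial(search_chunk, target=target), chunks):
            if found is not None:
                return found
    return None


print(brute_force(100, 8, 4, 250))
```

    The only communication is the chunk bounds going out and a single key (or nothing) coming back, which is why this kind of job is CPU-bound rather than I/O-bound. The threads are only there to keep the sketch self-contained. Because of CPython's global interpreter lock, they won't speed up the inner loop, and that inner loop is exactly the part you would want in C. The sketch also doesn't cancel the remaining chunks once a key is found.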

    "There is no shame in being self-taught, only in not trying to learn in the first place." -- Atrus, Myst: The Book of D'ni.

      If the application's bottleneck is I/O, then Perl would probably be a viable choice.

      That depends on what kind of I/O we are talking about. There are various kinds of I/O, of which I will mention three:

      1. Network I/O. The least interesting category. If your application is bound by network I/O, there isn't much you can do except upgrade your network.
      2. Disk I/O. An interesting category, which brings us into the realm of SANs, Fibre Channel and multiple controllers. You're right that Perl might be a good fit for those applications.
      3. CPU-memory I/O. Perl would absolutely suck for those kinds of applications, as Perl is very memory hungry and gives the programmer very little control over what is stored where. It uses a gazillion pointers, scattering data all over memory, which results in a low cache hit ratio.
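      The pointer-chasing point can be illustrated by analogy in Python, which, like Perl, stores each value as a separately allocated object reached through a pointer. This minimal sketch assumes CPython. A list holds one pointer per element, each referring to its own heap-allocated int object, while `array.array('q')` packs the raw 64-bit values back to back in one buffer, the layout a cache-friendly number cruncher wants. Exact byte counts are CPython implementation details, so only the rough ratio matters.

```python
import sys
from array import array

N = 100_000

# A list stores one pointer per element; each pointer refers to a separately
# allocated int object somewhere on the heap (small ints are shared, so this
# total is a slight overestimate).
boxed = list(range(N))
boxed_bytes = sys.getsizeof(boxed) + sum(sys.getsizeof(v) for v in boxed)

# array('q') stores the raw signed 64-bit values contiguously in one buffer.
packed = array("q", range(N))
packed_bytes = sys.getsizeof(packed)

print(f"list of ints: ~{boxed_bytes} bytes")
print(f"array('q'):   ~{packed_bytes} bytes")
print(f"ratio:        ~{boxed_bytes / packed_bytes:.1f}x")
```

      Beyond the raw size, walking the list means following a pointer to a different heap location for every element. Walking the array reads consecutive memory, which is what keeps the cache hit ratio high.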
