in reply to Re^2: POE::Component::RSSAggregator breaks LWP::Simple::get in thread POE::Component::RSSAggregator breaks LWP::Simple::get
Once you invite it into your code, forget everything else you know about writing Perl. ....
I call bogus.
File and directory access is generally insignificant. If you're writing high-performance programs that simply can't wait for the filesystem, POE and IO-AIO get along nicely.
As we've seen elsewhere in the thread, POE-Component-Generic can wrap an asynchronous interface around all those blocking modules you know and love, assuming there isn't a POE::Component to your liking and you're not sufficiently motivated to write one.
Oh, right, and some POE components aren't maintained as well as they might be. Welcome to CPAN.
Because once you go the 'build our own cooperative scheduler' route, everything--and that means everything--in your program has to be broken up into small bite-sized chunks, and retain state across those chunks, so that the non-preemptive scheduler doesn't get locked out.
Semi-accurate but exaggerated.
It really depends on the program in question. It's generally accepted that well-written code is already decomposed into bite-sized pieces: small functions and methods that do one thing well, and larger ones that are composed of glue logic between calls to the smaller ones. In this case, you may find that a lot of your code will work as-is.
POE provides ways to maintain state between cooperative callbacks, but the POE-agnostic parts of a program don't even need that.
A program that requires major restructuring to work in a cooperative environment may already have bigger problems.
It's like using Lego. So long as everything you want to build has straight sides and is either a multiple of some fixed-size unit, or there is a (usually expensive) off-the-shelf part to suit, it's great. But if you want curved sides; or circular holes; or 120° angles; or anything else that is vaguely custom--you're stuffed.
Completely bogus.
Lego are proprietary hardware. Most users can't fabricate their own bricks. POE is a suite of free, open-source software libraries, with internal and external extension APIs. If you can't find just the right component, you're encouraged to write it and share it with the rest of the world.
Re^4: POE::Component::RSSAggregator breaks LWP::Simple::get
by BrowserUk (Patriarch) on Jan 20, 2007 at 15:49 UTC
File and directory access is generally insignificant.
Sorry, but searching or parsing a large file, or scanning a directory tree are certainly *not* insignificant.
And your suggestion that I might need to dump binmode, close, closedir, dbmclose, dbmopen, die, eof, fileno, flock, format, getc, print, printf, read, readdir, rewinddir, seek, seekdir, select, syscall, sysread, sysseek, syswrite, tell, telldir, truncate, warn, write, -X, chdir, chmod, chown, chroot, fcntl, glob, ioctl, link, lstat, mkdir, open, opendir, readlink, rename, rmdir, stat, symlink, sysopen, umask, unlink, utime (and more), and instead learn to use the aio_* variants from IO::AIO, along with all the entirely new and very different coding methods it requires:
    use IO::AIO;

    aio_open "/etc/passwd", O_RDONLY, 0, sub {
        my $fh = shift
            or die "/etc/passwd: $!";
        ...
    };

    aio_unlink "/tmp/file", sub { };

    aio_read $fh, 30000, 1024, $buffer, 0, sub {
        $_[0] > 0 or die "read error: $!";
    };

    # version 2+ has request and group objects
    use IO::AIO 2;

    aioreq_pri 4;  # give next request a very high priority
    my $req = aio_unlink "/tmp/file", sub { };
    $req->cancel;  # cancel request if still in queue

    my $grp = aio_group sub { print "all stats done\n" };
    add $grp aio_stat "..." for ...;
no matter how good that module may be, confirms my
"forget everything else you know about writing Perl" statement exactly.
So *not* bogus.
POE-Component-Generic can wrap an asynchronous interface around all those blocking modules you know and love,
Can I use Devel::SmallProf on POE code and get sensible, usable numbers? How about Devel::Trace? Or Devel::Size? Or Algorithm::FastPermute on a large set with its callback? Or Data::Rmap with its callback? Or Inline::C? Or Win32::API::Prototype? Or the analysis functions in Math::Pari, or the dataset manipulations or graphics functions in PDL? And many, many others.
All of these modules have calls that can run for a substantial amount of time (minutes or hours), but need to run within the same process as the data they are operating on. Shipping large volumes of data to another process, and then reading the results back again is not efficient for those few for which this could be done. It just doesn't work at all for most of them.
So, I say again. This "forget everything else you know about writing Perl" is not bogus. You have to learn an entirely different way of working. I'm not saying that POE isn't brilliant for working that way. I am saying that it requires a programmer to learn an entirely different way of working that is much harder to learn, much harder to code and much harder to debug than the standard linear flow--do this, then do that, then do something else--that a single-tasking program uses. And that is how *every programmer* first learns to program.
Instead, it substitutes a--do a bit of this (and remember where we got to); and do a bit of that (and remember where we got to); and do a bit more of this (and remember...); and oh, do a bit of something else (and remember...); and do a bit more of that (and remember...); and ...--paradigm.
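That "do a bit (and remember where we got to)" style can be sketched in a few lines of plain Perl; no POE is involved, and the chunk size, names, and data here are invented purely for illustration:

```perl
use strict;
use warnings;

# State we must "remember" between chunks:
my @data  = (1 .. 10_000);
my $pos   = 0;
my $total = 0;

# One bite-sized chunk of work; returns true while work remains.
sub do_a_bit {
    my $end = $pos + 1_000;               # invented chunk size
    $end = @data if $end > @data;
    $total += $data[$_] for $pos .. $end - 1;
    $pos = $end;                          # remember where we got to
    return $pos < @data;
}

# A cooperative scheduler would interleave other events between these
# calls; here we just loop until the work is done.
1 while do_a_bit();
print "$total\n";
```

The point of contention is that `$pos` and `$total` have to be hoisted out of what would otherwise be a single `for` loop, precisely so the work can be suspended and resumed.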
Semi-accurate but exaggerated.
It really depends on the program in question. It's generally accepted that well-written code is already decomposed into bite-sized pieces: small functions and methods that do one thing well, and larger ones that are composed of glue logic between calls to the smaller ones. In this case, you may find that a lot of your code will work as-is.
Sorry again, but you cannot even run a simple sort on a moderately large dataset within a cooperative environment, because it cannot be interrupted. How about running a moderately complex regex on a large string? You cannot interrupt that either. The larger the data, the bigger the problem; and the larger the data, the greater the inefficiency of your proposed solution--transferring the large dataset to a separate process and then shipping the results back again.
POE provides ways to maintain state between cooperative callbacks, but the POE-agnostic parts of a program don't even need that.
You mean I can't just use lexical variables anymore? Isn't that a retrograde step?
A program that requires major restructuring to work in a cooperative environment may already have bigger problems.
That sounds a lot like 'if your double buggy doesn't fit through my turnstile, it's your double buggy that's at fault'!
Lego are proprietary hardware. Most users can't fabricate their own bricks.
Most (Perl) users can't (or don't want to, shouldn't have to, and for most programming problems don't have to) fabricate their own (POE) bricks. Much less have to re-fabricate other people's existing, working, tested, freely available open-source software bricks. Again, *not* bogus.
And with threads, they don't have to.
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
BrowserUk: Sorry, but searching or parsing a
large file, or scanning a directory tree are certainly *not*
insignificant.
Large files and directory trees are not the general cases I meant
by "generally insignificant".
BrowserUk: your suggestion that I might need to
dump {lots of functions} and instead learn to use the aio_* variants
from IO::AIO, along with all the entirely new and very different
coding methods it requires: {some code} no matter how good that module
may be, confirms my "forget everything else you know about writing
Perl" statement exactly.
You seem to think I advocate IO::AIO for the general case. I
don't. I clearly stated it was for high-performance programs that
can't wait for the filesystem.
As I said elsewhere, your application can block for as long as you
care to tolerate it. That can even be "indefinitely".
Threads introduce their own brand of complexity, and require their
own "new" and "different" coding methods.
BrowserUk: Can I use {specific
modules}?
I've used several of those modules successfully with POE. The ones
I've never used can probably be made asynchronous in a separate thread
or process. The exact method should depend on specific practical
considerations, not the liberal immolation of straw men.
BrowserUk: Shipping large volumes of data to
another process, and then reading the results back again is not
efficient {....}
Threads don't really help here. Perl is forced to serialize and
ship data between isolated interpreters, each in its own thread. This
is pretty much the same pattern as serialized IPC.
Even worse, Perl's data copying is triggered by tie()-like magic.
If your threads work on the same shared data, they're probably
propagating all their changes to N-1 other threads.
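The serialize-and-ship pattern both sides are describing can be sketched with nothing but core modules; fork, pipe, and Storable stand in here for whatever worker mechanism is actually used, and the payload is invented for illustration:

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

pipe my $to_child,   my $to_child_w   or die "pipe: $!";
pipe my $from_child, my $from_child_w or die "pipe: $!";
binmode $_ for $to_child, $to_child_w, $from_child, $from_child_w;

my $pid = fork // die "fork: $!";
if ($pid == 0) {                          # child: the worker process
    close $to_child_w; close $from_child;
    my $data = thaw(do { local $/; <$to_child> });   # unserialize the request
    print $from_child_w freeze([ sort { $a <=> $b } @$data ]);
    exit 0;
}

close $to_child; close $from_child_w;
print $to_child_w freeze([ 3, 1, 2 ]);    # serialize and ship the data out...
close $to_child_w;
my $result = thaw(do { local $/; <$from_child> });   # ...and read the results back
waitpid $pid, 0;
print "@$result\n";
```

Every byte of the dataset crosses a pipe twice here, which is the overhead being argued about; shared-data ithreads trade that pipe for per-thread copying and tie()-mediated propagation.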
Multiprocess designs are more future-resistant than multithreaded
ones. When the application must scale beyond a single machine, the
multithreaded design is scrapped while the multiprocess design is
extended and redeployed.
BrowserUk: I am saying that it requires a
programmer to learn an entirely different way of working that is much
harder to learn, much harder to code and much harder to debug than the
standard linear flow--do this, then do that, then do something
else--that a single-tasking program uses.
The same can be said about threads: Useful multithreading in Perl
requires programmers to learn an entirely different, much harder way
of working. Multithreaded Perl programs are a lot harder to write
than their singlethreaded counterparts, and timing and data scoping
issues make them way harder to debug.
BrowserUk: you cannot even run a simple sort on
a moderately large dataset within a cooperative environment, because
it cannot be interrupted. {...} {large-data argument removed because
threads aren't much better; see above}
That's the problem with vague and poorly specified problems. Any
"proposed solution" is going to be wrong once the details come to
light. And if the details are as malleable as yours, the only viable
solution will be the one you prefer.
For example, if we're talking about a highly-scalable data
processing application, I might suggest using a database server rather
than a disk file. This puts the data somewhere central, and a CPU
farm can access it in parallel. If the database becomes your
bottleneck, you can replicate it across multiple machines to divide
the load.
The choice between callbacks or threads isn't pertinent at this
level of design.
rcaputo: POE provides ways to maintain state
between cooperative callbacks, but the POE-agnostic parts of a program
don't even need that.
BrowserUk: You mean I can't just use lexical variables
anymore? Isn't that a retrograde step?
Check out Lexical-Persistence if you want to store
persistent POE state in lexical variables. The eg directory
includes a simple POE example. If you need a higher level of
abstraction, you're invited to write one.
Also keep an eye on POE-Stage. It's the project that spun
off Lexical::Persistence, and it makes heavy use of persistent lexical
variables.
rcaputo: A program that requires major
restructuring to work in a cooperative environment may already have
bigger problems.
BrowserUk: That sounds a lot like 'if your double buggy
doesn't fit through my turnstile, it's your double buggy that's at
fault'!
My turnstile is not the doubly buggy thing here. But seriously, to
reiterate and clarify my point: Well-designed code tends to be easier to make
cooperative than monolithic messes. And if you have a monolithic
mess, you have other issues.
BrowserUk: Most (Perl) users can't (or don't
want to and shouldn't have to; and for most programming problems,
don't have to), fabricate their own (POE) bricks. Much less have to
re-fabricate other people's existing, working, tested, freely available
open-source software bricks. Again, *not* bogus. And with threads,
they don't have to.
Are you claiming that most Perl programmers are incompetent or
lazy? Woo-woo! Well, laziness is a virtue, but this smells
like the dreaded false laziness.
Oh, and threads work with POE, so everything works with POE by the
transitive property of "threads make everything okay". :)
By the way, check out SEDA. Matt
Welsh has shown that an amalgam of threads and asynchronous I/O
performs better than either by itself. SEDA stands for "Staged Event
Driven Architecture". It's the inspiration for POE::Stage's name.
Thanks for reading.
At the time of my posting, your post had one upvote. Mine. I always upvote rational argument.
It's just a shame that your rationality broke down when you wrote:
Are you claiming that most Perl programmers are incompetent or lazy? Woo-woo! Well, laziness is a virtue, but this smells like the dreaded false laziness.
My assertion has nothing to do with incompetence nor laziness.
For those projects that need the kind of massive parallelism and/or distribution that your description alludes to:
For example, if we're talking about a highly-scalable data processing application, I might suggest using a database server rather than a disk file. This puts the data somewhere central, and a CPU farm can access it in parallel. If the database becomes your bottleneck, you can replicate it across multiple machines to divide the load.
And for which Perl's inherent strengths make it the tool of choice, *I* will be the first to recommend POE--assuming you don't beat me to it. I have my doubts as to the number of projects for which Perl and massive parallelism/distribution is applicable, but there are certainly examples--e.g. genomic research. For this type of project, POE is the only game in town and I am in awe of you and your fellow contributors for its development.
However, where I have a problem is when POE is advocated for solving the types of parallelism and asynchronous processing needed by the vast majority of projects that do not fall into the above category. A programmer needs to be able to sort an array of data whilst maintaining a responsive C/GUI. The data is already in memory--it was probably generated in memory--and it will never need to exist beyond the life of the program. Spawning a thread to do this is trivial, and an intuitive extension of existing programming practices.
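That scenario can be sketched with core ithreads (this assumes a threads-enabled perl; the data, sizes, and worker shape are invented for illustration):

```perl
use strict;
use warnings;
use threads;

my @data = map { int rand 1_000 } 1 .. 10_000;

# Hand the sort to a background thread; @data is copied into the new
# interpreter, and the main flow below stays free to service the UI.
my $worker = threads->create(sub {
    return [ sort { $a <=> $b } @data ];
});

# ... keep the C/GUI responsive here: poll events, repaint, etc. ...

my $sorted = $worker->join;   # collect the result when it's ready
print scalar(@$sorted), " items sorted\n";
```

Note that the copy-on-create behaviour is also the cost rcaputo points to above: each thread gets its own clone of the data.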
The idea that I have to create a database table, ship all my data into it, just so that I can read it back again sorted is a nonsense.
And that's the big point here. Like threads & processes, threads & POE are not competitors. Each has its set of problems for which it is a natural fit. And for each, there is a set of problems that it can be bent and twisted to solve, but for which it is entirely the wrong tool. The problems, and my ire, arise when the wrong tool for the particular problem is advocated on the basis of 'religious zeal' and the NIH syndrome.
I maintain that for the large majority of small scale, in-process asynchronous processing tasks (in Perl), threads are a natural fit and the simple solution.