Beefy Boxes and Bandwidth Generously Provided by pair Networks
more useful options
 
PerlMonks  

Re: In search of an efficient query abstractor

by samtregar (Abbot)
on Dec 07, 2008 at 18:53 UTC ( [id://728760]=note: print w/replies, xml ) Need Help??


in reply to In search of an efficient query abstractor

I think your idea of doing this with a state machine in C is a decent one. Have you tried it? How did it work? Are you planning to write it by hand or generate it from a grammar using something like yacc?

Another thought - are you sure this has to be really fast? I've done a lot of log analysis work and it's very rare to need a lot of speed. Often it's fine if the job runs overnight as long as it produces the correct answers.

Alternately, why not throw some hardware at the problem? Can you divide the log file up and run the analysis in parallel on many machines? You'll need to add a way to combine the results from each machine, but for log analysis that's often a simple step.

-sam

Replies are listed 'Best First'.
Re^2: In search of an efficient query abstractor
by xaprb (Scribe) on Dec 07, 2008 at 20:51 UTC

    Thanks. There are two things scaring me away from doing it in C:

    • I try to keep these tools self-contained and platform neutral so you can wget them and just run them with no further ado. I have never built a Perl program that makes calls to a compiled C module, but I imagine it'd be hard to make it wget-able and build it into a single file. Although maybe it could be a C library embedded into __DATA__ that running it could write out to a temporary location or something similarly evil ;)
    • I'm not exactly an expert in C programming. I mean I think I can translate my hand-drawn state machine, which isn't too complicated, into a C state machine pretty simply. But I am pretty sure something like yacc would be better and I'm scared of that.

    So, basically I'm scared. I want to stay in my comfortable safe Perl zone if I can.

    It definitely has to be fast and I can't divide it up, alas. I work with Percona; we are consultants for hire. We log onto people's critical production database servers and figure out why they are slow. One of the top-five tools we use is a log analysis tool. We need something that can crunch the file without requiring installation or whatever. We can't even copy the file elsewhere in most cases, we pretty much just have to operate in read-only mode.

    This code was originally written to be correct, not fast -- back in the good old days when I had the luxuries you mention (I was working on my own DB servers.) It handles a lot of special cases that the log-parsers-ad-nauseum out there don't. (Which is why I'm reinventing the wheel. No one has done this well yet.) Now I have to make it both correct, and fast.

      but I imagine it'd be hard to make it wget-able and build it into a single file.

      While I think solving this in pure-perl is probably do-able, if you do want to drop to C, you could checkout Inline::C.

      This allows you do to exactly what you want, embed C in your perl file and have it All Just Work.

      (It does require a working C build environment where you run the script, and it will also write the .o/.so files generated by the compiler to a cache so that it only compiles as needed. This might create deployment issues.)

      There's no reason to be afraid of C. It's one of the easiest languages to learn (you already learned one of the hardest, Perl!) and there's tons of great code out there to study. There's also no need to worry overmuch about portability - C is probably the most widely supported programming language ever created. If a platform runs Perl I guarantee it can run a C compiler! And you don't even have to install a compiler to get your C code running, you can cross-compile for practically any target you can imagine.

      And just to echo what you've already been told - Inline::C is perfect for this project. It will make gluing your C code into your Perl program very easy. If you already knew XS I'd say not to bother since your call signature should be very simple here, but why bother if you don't have to?

      Finally, when you're done, why not put it on CPAN? I'm sure other people could use a good SQL statement sig generator. I'd be interested in adding it as an optional filter for DBI::Profile and DBI::ProfileDumper, for one.

      -sam

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://728760]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others studying the Monastery: (6)
As of 2024-03-28 14:32 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found