TheYoungMonk has asked for the wisdom of the Perl Monks concerning the following question:
I was curious to know how complex regexes are handled so fast in this beautiful language. Well... this is what I could find:
"To speed things up, during compilation stage, perl compiles the regexp into a compact sequence of opcodes that can often fit inside a processor cache. When the code is executed, these opcodes can then run at full throttle and search very quickly."
Any more in-depth information available on regex handling by the language?
Pardon me if I'm asking something irrelevant... But isn't Perl supposed to be doing interpretation and not compilation? Where does this compilation stage come from? I know how a C program gets executed: preprocessor, header files, C compiler and stuff like that... How exactly does a Perl program get processed?
Please share your knowledge on this and enlighten this seeker!!
Re: How is a perl program processed ?
by robartes (Priest) on Apr 01, 2003 at 06:38 UTC
Perl is somewhat of a hybrid between interpreted and compiled languages. There is indeed an interpreter that interprets the scripts, but it does not execute the interpreted code directly. Rather, it builds bytecode that in its turn gets executed. The generation of the bytecode can be seen as a compilation step.
If you're interested, you can have a look at the bytecode of a script using the B::Bytecode module. For example:
$ perl -MO=Bytecode -e '$camel="flea-ridden";print $camel;' >bytecode
The file bytecode contains the bytecode.
In Perl6, this will be taken to the next step, and there will be a bytecode interpreter that's separate from the Perl interpreter. This is Parrot.
Update: As diotalevi rightly (if rather succinctly ;) ) points out, there is no bytecode. What I mistakenly and confusingly call bytecode is in fact a serialisation of the internal optree the Perl interpreter generates. So, substitute 'optree' for 'bytecode' in the above, and you get an equally confusing, yet more correct explanation.
CU Robartes-
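For anyone who wants to look at the optree itself rather than a serialisation of it, here is a minimal sketch using the core B::Concise module. The sub $add is only a hypothetical example, and the exact listing differs between Perl versions, so no output is shown.

```perl
use strict;
use warnings;
use B::Concise ();

# Hypothetical example sub; any code reference would do.
my $add = sub { my ($x, $y) = @_; return $x + $y };

# compile() returns a walker for the sub's optree;
# walk_output() redirects the walker's report into a string.
my $walker = B::Concise::compile('-terse', $add);
B::Concise::walk_output(\my $optree);
$walker->();

# Each line of the report is one op in the tree (look for "add").
print $optree;
```

From the shell, perl -MO=Concise -e '...' prints the same kind of listing for a one-liner.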
Note that a Perl regex is compiled into a tree of regnodes while the rest of the Perl code is compiled into a tree of opcodes.
So there are two different kinds of non-"byte-code"s here. The original inspiration of this thread was talking about regnodes while you've linked to a discussion about Perl opcodes.
The regnodes and opcodes are quite similar, so the impressions people have gotten from this minor misdirection are likely rather accurate. (:
I've heard that Perl is relatively slow at dispatching opcodes and relatively fast at dispatching regnodes such that formulating problems as a regex can be quite a bit faster than formulating them as Perl code. Not that this is usually an easy proposition.
- tye
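A small sketch of what "formulating a problem as a regex" versus "as Perl code" can look like. The vowel-counting task and both function names are made up for illustration. The code only shows that the two formulations agree; it does not measure anything, so the speed claim above stays hearsay until you time it yourself (the core Benchmark module can do that).

```perl
use strict;
use warnings;

# Formulation 1: let the regex engine do the scanning.
sub count_with_regex {
    my ($text) = @_;
    # In list context //g returns every match; the =()= idiom counts them.
    my $count = () = $text =~ /[aeiou]/g;
    return $count;
}

# Formulation 2: the same scan written out as ordinary Perl ops.
sub count_with_loop {
    my ($text) = @_;
    my %is_vowel = map { $_ => 1 } qw(a e i o u);
    my $count = 0;
    for my $char (split //, $text) {
        $count++ if $is_vowel{$char};
    }
    return $count;
}

my $text = 'flea-ridden camel';
print count_with_regex($text), "\n";   # 6
print count_with_loop($text), "\n";    # 6
```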
That was really enlightening!!
Thanks for all the info on Perl being interpreted or compiled, but this last link proved to be the "missing link" in our discussions... I was actually searching for places to see how we can so freely use the term "bytecode" as in languages like Java... this link provided the answer.
Got a good insight... and I hope the discussion will be useful for other seekers!
Re: How is a perl program processed ?
by Zaxo (Archbishop) on Apr 01, 2003 at 06:40 UTC
Perl is compiled, but it doesn't give that impression because it does not save an executable image. Instead, compilation is done on startup (like an interpreter), then the compiled program is automatically run.
The compiler produces a parse tree for the perl statements it encounters, then the runtime walks through the parse tree executing the primitives it finds there. Details may be found in the Camel Book, or in the perl documentation.
Update: perlguts and perlcompile have the most clues to how perl works. Note that perl can drop into compiling from running and vice versa with eval, BEGIN blocks, and so on.
After Compline, Zaxo
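The switching between compiling and running can be made visible with a short sketch. The @order array and its labels are just illustration: a BEGIN block runs while the file is still being compiled, and a string eval drops back into the compiler in the middle of the run.

```perl
use strict;
use warnings;

our @order;

push @order, 'run 1';
BEGIN { push @order, 'BEGIN' }      # executed during compilation
push @order, 'run 2';

# A string eval is compiled (and its BEGIN block run) only when
# execution reaches this statement.
eval q{
    BEGIN { push @order, 'eval BEGIN' }
    push @order, 'eval run';
    1;
} or die $@;

print join(' -> ', @order), "\n";
# BEGIN -> run 1 -> run 2 -> eval BEGIN -> eval run
```

The BEGIN entry comes first even though it sits on the second line of code: the whole file is compiled before 'run 1' is ever executed.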
Re: How is a perl program processed ?
by gmpassos (Priest) on Apr 01, 2003 at 07:02 UTC
Well, what's compilation for you? Any good interpreted language parses the code, creates bytecode, optimizes it, and then runs it. You can see that in Perl, Python, Java (yes, a .class file is just bytecode), etc...
When the bytecode is created the code is optimized: constant expressions are calculated (1024*8 becomes 8192), quoted strings are parsed, etc... And this is very important for the code to run nice and easy.
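The 1024*8 folding can be checked with the core B::Deparse module, which turns the compiled form of a sub back into source text. This is only a sketch; the surrounding pragma lines in the deparsed text vary between Perl versions.

```perl
use strict;
use warnings;
use B::Deparse;

# The multiplication is written out in the source...
my $sub = sub { return 1024 * 8 };

# ...but the compiled sub, turned back into text, holds only the result.
my $text = B::Deparse->new->coderef2text($sub);
print $text, "\n";
```

The printed body contains return 8192; and no trace of the multiplication.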
About the regex. Well, why parse and build it again for each match? A constant regex is parsed and compiled once, at the compilation stage, and a regex that interpolates variables is compiled at run time and recompiled only when the interpolated value changes. Note that regex matching is not a simple thing to do, and anything that gains speed is welcome!
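When the pattern is only known at run time, qr// lets you compile it once yourself and reuse the compiled object. A minimal sketch, where $word and @lines are made-up sample data:

```perl
use strict;
use warnings;

my $word = 'camel';                  # hypothetical run-time input

# qr// compiles the pattern here, once; \Q...\E quotes any
# metacharacters that might be in $word.
my $re = qr/\b\Q$word\E\b/i;

my @lines = ('A camel walks.', 'No llamas here.', 'CAMEL again.');

# The precompiled regex is reused for every line.
my @hits = grep { $_ =~ $re } @lines;
print scalar(@hits), " matching lines\n";   # 2 matching lines
```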
I think the best way to tell really compiled programs from interpreted ones is whether binary code is created or not. A real binary contains machine code: CPU instructions that the OS loads and the CPU executes directly.
The definition of compilation can't tell you whether a program is interpreted:
Compile: To put together, to construct, to build... To put together in a new form...
Just to recall some assembler: this C code, when compiled, is converted to assembler first:
## C:
a = b + c + d ;
## Assembler:
ADD a , b , c
ADD a , a , d
The ADD is a CPU instruction (based on MIPS). But in the end everything is interpreted, if you think that CPU instructions are a language. ;-P
About Perl supposed to be doing interpretation. Well, Perl doesn't follow any theory/philosophy/concept just for the sake of the theory/philosophy/concept. This is why, for me, the Perl interpreter is the fastest one, with the best memory management, that exists.
The famous phrase:
"Perl is designed to make the easy jobs easy and the hard jobs possible."
-Larry Wall
Graciliano M. P.
"The creativity is the expression of the liberty".
A teacher of mine said that an interpreter has a signature of:
(P × D) ⇀ D
Instead a compiler has a signature of:
P ⇀ D ⇀ D
Where P is the domain of programs, and D is the domain of data.
So perl has a signature of the first kind: you give it the source and the data at the same time. cc, on the other hand, has a signature of the second kind: you give it only the source, and it produces another program, which is a function from data to data.
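The two signatures can be sketched in Perl itself with a toy language. Everything here is invented for illustration: a "program" is a list of [op, operand] steps applied to a number, interpret() takes program and data together, and compile() takes only the program and hands back a function from data to data.

```perl
use strict;
use warnings;

# The primitives of the toy language.
my %apply = (
    add => sub { $_[0] + $_[1] },
    mul => sub { $_[0] * $_[1] },
);

# Interpreter: (P x D) -> D. Program and data arrive together.
sub interpret {
    my ($program, $data) = @_;
    for my $step (@$program) {
        my ($op, $arg) = @$step;
        my $fn = $apply{$op} or die "unknown op '$op'";
        $data = $fn->($data, $arg);
    }
    return $data;
}

# Compiler: P -> (D -> D). Only the program is needed; the result
# is a closure that can be run on data later, as often as you like.
sub compile {
    my ($program) = @_;
    my @steps = map {
        my ($op, $arg) = @$_;
        my $fn = $apply{$op} or die "unknown op '$op'";
        sub { $fn->($_[0], $arg) };
    } @$program;
    return sub {
        my ($data) = @_;
        $data = $_->($data) for @steps;
        return $data;
    };
}

my $program  = [ [ add => 3 ], [ mul => 2 ] ];
my $compiled = compile($program);

print interpret($program, 5), "\n";   # 16
print $compiled->(5), "\n";           # 16
```

Note that compile() rejects an unknown op before any data exists, while interpret() only notices when it reaches that step.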
So much for theory
In practice, traditionally an interpreter has worked in a "myopic" way, looking at a single line (or even less) of code at a time; a compiler instead has always looked "at the big picture", swallowing the entire source at once, looking for problems and inconsistencies.
So, in this way, Perl is a compiler.
And on a sidenote, C is compiled to machine language, not assembly. Assembly is compiled to machine language by an assembler...
--
dakkar - Mobilis in mobile
There's no sharp border between a compiler and an interpreter.
As you said, we often think of a compiler that takes the entire
source code, and produces something else (typically, before
running it), while an interpreter just looks at a small chunk
of the source at a time, and executes that before carrying on.
But compilers typically produce output that's being "interpreted".
Machine code is basically "interpreted". And a language that
has the ability to use "eval", can defer most of its "compilation"
to runtime, and have it compiled one chunk at a time.
I usually call a program that takes data in one format and
produces equivalent data in another format a compiler. That
means that 'gcc' is a compiler. But also 'dvips'. And perhaps
parts of 'perl' as well, although the other format is only
used internally.
Abigail