How is a perl program processed ?

by TheYoungMonk (Sexton)
on Apr 01, 2003 at 06:26 UTC ( #247154=perlquestion )

TheYoungMonk has asked for the wisdom of the Perl Monks concerning the following question:

I was curious to know how complex regexes are handled so fast in this beautiful language. Well... this is what I could find:

"To speed things up, during compilation stage, perl compiles the regexp into a compact sequence of opcodes that can often fit inside a processor cache. When the code is executed, these opcodes can then run at full throttle and search very quickly."

Any more in-depth information available on regex handling by the language?

Pardon me if I'm asking something irrelevant... But isn't perl supposed to be doing interpretation and not compilation? Where does this compilation stage come from? I know how a C program gets executed: preprocessor, header files, C compiler and stuff like that... How exactly does a perl program get processed?

Please share your knowledge on this and enlighten this seeker!!

Replies are listed 'Best First'.
Re: How is a perl program processed ?
by robartes (Priest) on Apr 01, 2003 at 06:38 UTC
    Perl is something of a hybrid between interpreted and compiled languages. There is indeed an interpreter that interprets the scripts, but it does not execute the interpreted code directly. Rather, it builds bytecode that in its turn gets executed. The generation of the bytecode can be seen as a compilation step.

    If you're interested, you can have a look at the bytecode of a script using the B::Bytecode module. For example:

    $ perl -MO=Bytecode -e '$camel="flea-ridden";print $camel;' >bytecode
    The file bytecode contains the bytecode.
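
For a human-readable view of the same structure, the core B::Concise module can print the op tree of a sub. This is a minimal sketch and not part of the original reply: the helper name `optree_text` and the sample sub are made up for illustration. Note that B::Bytecode stopped shipping with perl as of 5.10, so the command above may not work on a modern install.

```perl
use strict;
use warnings;
use B::Concise ();

# Hypothetical helper: render the op tree of a code ref as text.
sub optree_text {
    my ($code) = @_;
    my $buf = '';
    B::Concise::walk_output(\$buf);    # send output into $buf instead of STDOUT
    my $walker = B::Concise::compile('-basic', $code);
    $walker->();                       # walk the tree and render it
    return $buf;
}

my $text = optree_text(sub { $_[0] + $_[1] });
print $text;    # one line per op; look for "add" and "leavesub"
```

Each line of the output is one node of the tree that the runtime later walks, which is probably the closest thing perl has to the "bytecode" discussed here.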

    In Perl 6, this will be taken a step further, and there will be a bytecode interpreter that's separate from the Perl interpreter. This is Parrot.

    Update: As diotalevi rightly (if rather succinctly ;) ) points out, there is no bytecode. What I mistakenly and confusingly call bytecode is in fact a serialisation of the internal optree the Perl interpreter generates. So, substitute 'optree' for 'bytecode' in the above, and you get an equally confusing, yet more correct explanation.


        Note that a Perl regex is compiled into a tree of regnodes while the rest of the Perl code is compiled into a tree of opcodes.

        So there are two different kinds of non-"byte-code"s here. The original inspiration of this thread was talking about regnodes while you've linked to a discussion about Perl opcodes.

        The regnodes and opcodes are quite similar, so the impressions people have gotten from this minor misdirection are likely rather accurate. (:

        I've heard that Perl is relatively slow at dispatching opcodes and relatively fast at dispatching regnodes such that formulating problems as a regex can be quite a bit faster than formulating them as Perl code. Not that this is usually an easy proposition.

                        - tye
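
To look at the regnodes themselves, the `re` pragma has a debug mode. The following is only a sketch with a made-up pattern and input; on perl 5.10 or later the pragma is lexically scoped, so the dump (written to STDERR) should cover just the regex inside the block.

```perl
use strict;
use warnings;

my $matched;
{
    # Inside this block perl reports how the regex was compiled
    # and how the match was attempted. The report goes to STDERR.
    use re 'debug';
    $matched = 'flea-ridden camel' =~ /(\w+)-ridden/ ? $1 : undef;
}
print "matched: $matched\n";
```

The STDERR output lists the compiled program node by node, which makes the difference between regnodes and ordinary Perl opcodes easier to see.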

        That was really enlightening !!

        Thanks for all the info on perl being interpreted or compiled, but this last link proved to be the "missing link" in our discussions... I was actually trying to find out how freely we can use the term "bytecode" the way languages like Java do... this link provided the answer.

        Got a good insight...and hope the discussion will be useful for other seekers!

Re: How is a perl program processed ?
by Zaxo (Archbishop) on Apr 01, 2003 at 06:40 UTC

    Perl is compiled, but it doesn't give that impression because it does not save an executable image. Instead, compilation is done on startup (like an interpreter), then the compiled program is automatically run.

    The compiler produces a parse tree for the perl statements it encounters, then the runtime walks through the parse tree executing the primitives it finds there. Details may be found in the Camel Book, or in the perl documentation.

    Update: perlguts and perlcompile have the most clues to how perl works. Note that perl can drop into compiling from running and vice versa with eval, BEGIN blocks, and so on.
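
The switching between compiling and running can be made visible with a small script. This is an illustrative sketch, not something from perlguts or perlcompile: the `@events` array and the messages are invented for the example.

```perl
use strict;
use warnings;

our @events;

# Runs at run time, after the whole file has been compiled.
push @events, 'run: first statement';

# Runs as soon as the compiler has parsed it, so before the push above.
BEGIN { push @events, 'compile: BEGIN block' }

# A string eval drops back into the compiler while the program is running.
my $sum = eval q{ BEGIN { push @events, 'compile: inside eval' } 2 + 3 };
push @events, "run: eval returned $sum";

print "$_\n" for @events;
```

The BEGIN block is listed first even though it appears second in the file, and the BEGIN inside the string eval only fires once execution reaches the eval.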

    After Compline,

      This might be a dumb question, but where is the answer to this question in the man pages or other documentation? I did a quick look and couldn't find anything.

      Update: though I did find this page from YAPC 2000: "A Perl Novice ... Wonders why no one can give him a straight answer about whether Perl is compiled or interpreted."

      “Every bit of code is either naturally related to the problem at hand, or else it's an accidental side effect of the fact that you happened to solve the problem using a digital computer.”
      M-J D
      Thanx !

      Is there any place on the web where we can find documents about the origin of perl and the details you have given?

      The documentation doesn't give these details!

      I would like to add another question: does perl do JIT compilation? Or does it compile the entire program at startup and load all of the code into memory?


        The optree is created once and only once unless things like B::Generate are active in which case manual alterations are possible (and of course eval and require also compile new code). Perl doesn't do JIT - people smarter than I can speak about Parrot's JIT.

Re: How is a perl program processed ?
by gmpassos (Priest) on Apr 01, 2003 at 07:02 UTC
    Well, what's compilation for you? Any good interpreted language parses the code, creates a bytecode (or, in Perl's case, an optree), optimizes it, and then runs the code. You can see that in Perl, Python, Java (yes, a .class file is just bytecode), etc...

    When the bytecode is created the code is optimized: constant expressions are calculated (1024*8 becomes 8192), quoted strings are parsed, etc... This is very important for running the code quickly and easily.
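
The constant folding can be checked with the core B::Deparse module, which turns compiled code back into source. A minimal sketch, assuming nothing beyond a stock perl; the sample sub is made up for illustration:

```perl
use strict;
use warnings;
use B::Deparse;

my $deparser = B::Deparse->new;

# Ask perl to print back the code it actually compiled for this sub.
my $source = $deparser->coderef2text(sub { return 1024 * 8 });

print $source, "\n";    # the body shows the folded constant, not the multiplication
```

The multiplication is done once by the compiler, so the running program never sees `1024 * 8`.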

    About the regex: well, why parse and build each regex every time it is used? At the bytecode stage all the regexes are parsed and compiled, and similar regexes will use the same compiled regex. Note that regex matching is not a simple thing to do, and anything that gains speed is welcome!
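
The same "compile once, use many times" idea is available to the programmer through `qr//`. The sketch below uses invented variable names and sample strings; it only shows the reuse and makes no claim about how perl caches patterns internally.

```perl
use strict;
use warnings;

# A constant pattern, compiled once and stored in $word_pair.
my $word_pair = qr/(\w+)-(\w+)/;

my @inputs = ('flea-ridden', 'no hyphen here', 'well-known');
my @hits;
for my $text (@inputs) {
    # Each iteration reuses the already compiled pattern.
    push @hits, "$1+$2" if $text =~ $word_pair;
}
print "@hits\n";

# A qr// object can also be interpolated into a larger pattern.
my $anchored = qr/^$word_pair$/;
print "anchored match\n" if 'flea-ridden' =~ $anchored;
```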

    I think the best way to tell real compiled programs from interpreted ones is whether binary code is created or not. A real binary contains machine code, which the OS loads and the CPU executes directly.

    The definition of compilation can't tell you whether a program is interpreted:
    Compile: To put together, to construct, to build... To put together in a new form...

    Just to recall some assembler: this code in C, when it is compiled, is converted to assembler first:

    ## C:
    a = b + c + d;

    ## Assembler:
    ADD a, b, c
    ADD a, a, d

    The ADD is a CPU instruction (based on MIPS). But in the end everything is interpreted, if you think that CPU instructions are a language. ;-P

    About Perl supposedly doing interpretation: well, Perl doesn't follow any theory/philosophy/concept just for the sake of the theory/philosophy/concept. This is why, in my view, the Perl interpreter is among the fastest around and has some of the best memory management.

    The famous phrase:
    "Perl is designed to make the easy jobs easy and the hard jobs possible."
    -Larry Wall

    Graciliano M. P.
    "The creativity is the expression of the liberty".

      A teacher of mine said that an interpreter has a signature of:

      (P × D) ⇀ D

      Instead a compiler has a signature of:

      P ⇀ D ⇀ D

      Where P is the domain of programs, and D is the domain of data.

      So perl has a signature of the first kind: you give it the source and the data at the same time. cc, on the other hand, has a signature of the second kind: you give it only the source, and it produces another program, which is a function from data to data.
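
The two signatures can be sketched in Perl itself. Everything here is a toy made up for illustration: a "program" is a list of `[op, operand]` pairs, and the names `interpret`, `compile` and `%OPS` are hypothetical, not anything perl provides.

```perl
use strict;
use warnings;

# The primitive operations our toy programs may use.
my %OPS = (
    add => sub { $_[0] + $_[1] },
    mul => sub { $_[0] * $_[1] },
);

# Interpreter, (P x D) -> D: takes program and data together.
sub interpret {
    my ($program, $data) = @_;
    for my $step (@$program) {
        my ($op, $arg) = @$step;
        $data = $OPS{$op}->($data, $arg);
    }
    return $data;
}

# Compiler, P -> (D -> D): takes only the program, returns a new function.
sub compile {
    my ($program) = @_;
    my @steps = map {
        my ($op, $arg) = @$_;
        my $f = $OPS{$op};            # op lookup happens once, here
        sub { $f->($_[0], $arg) };
    } @$program;
    return sub {
        my ($data) = @_;
        $data = $_->($data) for @steps;
        return $data;
    };
}

my $program  = [ [ add => 2 ], [ mul => 10 ] ];
my $compiled = compile($program);

print interpret($program, 5), "\n";    # 70
print $compiled->(5), "\n";            # 70
```

Both calls compute the same result; the difference is only in when the program is looked at. `compile` can be called before any data exists, which is the point of the second signature.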

      So much for theory

      In practice, an interpreter has traditionally worked in a "myopic" way, looking at a single line (or even less) of code at a time; a compiler instead has always looked "at the big picture", swallowing the entire source at once, looking for problems and inconsistencies.

      So, in this way, Perl is a compiler.

      And on a side note, C ends up as machine language either way: many compilers (gcc among them) do emit assembly as an intermediate step, and an assembler then turns that assembly into machine language.

              dakkar - Mobilis in mobile
        There's no sharp border between a compiler and an interpreter. As you said, we often think of a compiler as something that takes the entire source code and produces something else (typically, before running it), while an interpreter just looks at a small chunk of the source at a time, and executes that before carrying on.

        But compilers typically produce output that's being "interpreted". Machine code is basically "interpreted". And a language that has the ability to use "eval", can defer most of its "compilation" to runtime, and have it compiled one chunk at a time.

        I usually call a program that takes data in one format and produces equivalent data in another format a compiler. That means that 'gcc' is a compiler. But also 'dvips'. And perhaps parts of 'perl' as well, although the other format is only used internally.


Node Type: perlquestion [id://247154]
Approved by robartes
Front-paged by dash2