Re^2: Parsing Emacs Lisp sexpr?

by perlancar (Friar)
on Apr 09, 2020 at 03:00 UTC (#11115256)


in reply to Re: Parsing Emacs Lisp sexpr?
in thread Parsing Emacs Lisp sexpr?

Nice work! I wonder why you opt to parse this format specifically instead of the generic lisp format though.

As for the speed, it's actually roughly on par with Data::SExpression, which uses Parse::Yapp. I commented out the dumping and then:

% time perl 11115197.pl archive-contents

real    0m7.449s
user    0m7.036s
sys     0m0.413s

% time perl -MFile::Slurper=read_text -MData::SExpression -E'$ds=Data::SExpression->new; ($sexp, $text) = $ds->read(read_text "archive-contents.2");'
real    0m5.411s
user    0m5.386s
sys     0m0.025s

archive-contents.2 is just the original file with [ ] replaced by ( ), and the problematic @ atom replaced by "@".
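
The preprocessing described above can be sketched in Perl. This is a minimal illustration, not the command actually used: it assumes the brackets being replaced are the vector delimiters [ ], and the sub name simplify_sexp is made up for the example. String literals are copied through unchanged, so brackets inside strings survive.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): turn vectors into lists and
# quote a bare @ atom, leaving string literals alone.
sub simplify_sexp {
    my ($text) = @_;
    $text =~ s{
        ( " (?: [^"\\] | \\. )* " )                 # group 1: string literal, kept as-is
      | ( \[ )                                      # group 2: vector open
      | ( \] )                                      # group 3: vector close
      | (?<![^\s()\[\]]) ( \@ ) (?![^\s()\[\]])     # group 4: a lone at-sign atom
    }{
        defined $1 ? $1 : defined $2 ? '(' : defined $3 ? ')' : '"@"'
    }gexs;
    return $text;
}

print simplify_sexp('(a [b c] @ "x[y]" foo@bar)'), "\n";
```

An at-sign that is part of a longer atom (foo@bar) is left alone, because the lookarounds require a delimiter or the edge of the input on both sides.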

Perl regex or Regexp::Grammars will probably be several times faster.

Replies are listed 'Best First'.
Re^3: Parsing Emacs Lisp sexpr?
by choroba (Archbishop) on Apr 10, 2020 at 22:14 UTC
    > I wonder why you opt to parse this format specifically instead of the generic lisp format though.

    As I said, I started from the wrong end. I'm kind of busy working from home and staying there with a wife and three children, so I didn't have time to fix it immediately. Here's a much simpler and faster version, which parses MELPA's archive-contents in less than 5 seconds on my machine:

    #! /usr/bin/perl
    use warnings;
    use strict;

    use Marpa::R2;

    my $dsl = << '__DSL__';

    :default ::= action => ::first
    lexeme default = latm => 1

    List     ::= ('(') Elements (')')
    Elements ::= Element+                action => [values]
    Element  ::= List | Vector | Atom | String | Pair
    Vector   ::= ('[') Elements (']')
    Atom     ::= identifier
    String   ::= ('"') Quoteds ('"')
    Quoteds  ::= Quoteds Quoted          action => concat
              |  Quoted
    Quoted   ::= backslash || qq || plain
    Pair     ::= Element (dot) Element   action => pair

    :discard   ~ whitespace
    whitespace ~ [\s]+
    dot        ~ '.'
    backslash  ~ '\\'
    qq         ~ '\"'
    identifier ~ [-\w@:+]+
    plain      ~ [^\\"]+

    __DSL__

    sub concat { $_[1] . $_[2] }
    sub pair { +{ $_[1] => $_[2] } }

    my $grammar = 'Marpa::R2::Scanless::G'->new({source => \$dsl});
    my $lisp = do { local $/; <> };
    my $value_ref = $grammar->parse(\$lisp, {semantics_package => 'main'});

    use Data::Dumper; print Dumper $value_ref;

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
      Thanks for this, choroba. It finishes in about 2 seconds on my computer, pretty impressive. I'll see what I can use to improve my SExpression::Decode::Marpa.
Re^3: Parsing Emacs Lisp sexpr?
by perlancar (Friar) on Apr 10, 2020 at 02:31 UTC

    And here's my stab at creating a Marpa-based parser, based on JSON::Decode::Marpa: https://github.com/perlancar/perl-SExpression-Decode-Marpa/. It's unfinished (in particular, its number and string rules have not been adjusted yet), but it can already parse the original archive-contents file, a bit faster than Data::SExpression:

    % time perl -Ilib -MSExpression::Decode::Marpa=from_sexp -MFile::Slurper=read_text -E'from_sexp(read_text "archive-contents")'
    
    real    0m4.023s
    user    0m3.818s
    sys     0m0.204s
    
Re^3: Parsing Emacs Lisp sexpr?
by perlancar (Friar) on Apr 09, 2020 at 10:44 UTC
    Anyhow, I tried hacking a regex-based parser here. It's "working", with some problems: 1) a segmentation fault on larger data, indicating a leak somewhere; 2) a parsing failure when the NUMBER rule matches only a prefix of a token that should have been an ATOM, e.g. the sexp (1a) fails, while (1) and (a) succeed.
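
    One way to approach the (1a) case is to let NUMBER match only when a delimiter follows, so that a token like 1a falls through to ATOM. The sketch below is not the parser linked above; the names parse_sexp, _parse_value and $DELIM are made up for illustration, strings and vectors are left out, and nesting is handled by ordinary Perl recursion over \G matches rather than by a recursive pattern. Whether that helps with the segmentation fault is untested.

```perl
use strict;
use warnings;

# Hypothetical sketch: a token ends at whitespace, a paren, or end of input.
my $DELIM = qr/(?=[\s()]|\z)/;

sub parse_sexp {
    my ($src) = @_;
    pos($src) = 0;
    my $value = _parse_value(\$src);
    $src =~ /\G\s+/gc;                       # skip trailing whitespace, if any
    die "trailing input at offset " . pos($src) . "\n"
        if pos($src) < length $src;
    return $value;
}

sub _parse_value {
    my ($ref) = @_;
    $$ref =~ /\G\s+/gc;                      # skip leading whitespace, if any
    if ($$ref =~ /\G\(/gc) {
        my @items;
        while (1) {
            $$ref =~ /\G\s+/gc;
            last if $$ref =~ /\G\)/gc;
            die "unbalanced parenthesis\n" if pos($$ref) >= length $$ref;
            push @items, _parse_value($ref);
        }
        return \@items;
    }
    # NUMBER needs a delimiter right after it, so "1a" is not read as 1 plus a.
    return 0 + $1 if $$ref =~ /\G([-+]?\d+(?:\.\d+)?)$DELIM/gc;
    # ATOM: any run of characters that are not whitespace, parens, or quotes.
    return $1 if $$ref =~ /\G([^\s()"]+)/gc;
    die "unexpected input at offset " . pos($$ref) . "\n";
}

my $tree = parse_sexp('(1a)');
print "$tree->[0]\n";
```

    The /c flag keeps pos() in place when an alternative fails to match, which is what allows NUMBER and ATOM to be tried in turn at the same offset.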
