http://qs321.pair.com?node_id=522460

bronto has asked for the wisdom of the Perl Monks concerning the following question:

Hello everybody

A colleague of mine asked me if I knew how to automatically indent an LDAP filter to make it more readable. E.g.: he wantend something like this:

(&(&(&(& (mailnickname=*) (| (&(objectCategory=person)(objectClass=use +r)(!(homeMDB=*))(!(msExchHomeServerName=*)))(&(objectCategory=person) +(objectClass=user)(|(homeMDB=*)(msExchHomeServerName=*)))(&(objectCat +egory=person)(objectClass=contact))(objectCategory=group)(objectCateg +ory=publicFolder)(objectCategory=msExchDynamicDistributionList) )))(o +bjectCategory=contact)(proxyAddresses=smtp:*example.com)))

to become something more similar to this

(& (& (& (& (mailnickname=*) (| (& (objectCategory=person) (objectClass=user) (! (homeMDB=*) ) (! (msExchHomeServerName=*) ) ) (& (objectCategory=person) (objectClass=user) (| (homeMDB=*) (msExchHomeServerName=*) ) ) (& (objectCategory=person) (objectClass=contact) ) (objectCategory=group) (objectCategory=publicFolder) (objectCategory=msExchDynamicDistributionList) ) ) ) (objectCategory=contact) (proxyAddresses=smtp:*example.com) ) )

I didn't want to spend a long time over this, so after a few attempts with the trial-and-error technique and with the help of Devel::ptkdb I came out with this quick and dirty script based on Text::Balanced

#!/usr/bin/perl use strict ; use warnings ; use Text::Balanced qw(extract_multiple) ; die "Uso: $0 filtro\n" unless @ARGV ; my ($begop,$begin,$end) = (qr/\([&|!]\s*/, qr/\(\s*/, qr/\)\s*/) ; my $filter = shift @ARGV ; my @blocks = extract_multiple($filter,[$begop]) ; my $step = 0 ; foreach my $block (@blocks) { if ($block =~ $begop) { # Inizia un operatore print_chunk($step++,$block,1) ; } else { # E` un blocco di match, probabilmente sbilanciato my @matches = extract_multiple($block,[$begin,$end]) ; # Questi sono match while (@matches >= 3) { my @chunks = splice(@matches,0,3) ; # Fai check sui "chunk" e agisci di conseguenza: if ($chunks[1] =~ /=/) { # E` un match: print_chunk($step,join("",@chunks),1) ; } else { # Sfiga while (my $chunk = shift @chunks) { if ($chunk =~ $end) { print_chunk(--$step,$chunk,1) ; } else { # Ricarica gli elementi in @matches e riparti unshift @matches,$chunk,@chunks ; last ; } } } } # Queste sono parentesi che si chiudono drop_parenses(@matches) ; } } sub print_chunk { my ($step,$string,$newline) = @_ ; print " "x$step ; print $string ; print "\n" if $newline ; } sub drop_parenses { while (my $parens = shift @_) { print_chunk(--$step,$parens,1) ; } }

I am pretty sure that there are far better ways to do that, and I am interested on how you'd do it. Anyone?

Ciao!
--bronto


In theory, there is no difference between theory and practice. In practice, there is.

Replies are listed 'Best First'.
Re: Making an LDAP filter more readable
by BrowserUk (Patriarch) on Jan 11, 2006 at 15:39 UTC

    This seems to get closed to the required output:

    Updated: Corrected fencepost-ish error.

    Update2: Added a second pass to condense it a bit.

    #! perl -slw use strict; ( my $input = do{ local $/; <DATA> } ) =~ tr[\n][]d; my $tab = 0; $input =~ s[([()])]{ $tab-- if $1 eq ')'; my $modified = "\n" . ( " " x $tab ) . $1; $tab++ if $1 eq '('; $modified; }ge; $input =~ s[\n\s+\)][)]g; print $input; __DATA__ (&(&(&(& (mailnickname=*) (| (&(objectCategory=person)(objectClass=use +r)(!(homeM DB=*))(!(msExchHomeServerName=*)))(&(objectCategory=person)(objectClas +s=user)(| (homeMDB=*)(msExchHomeServerName=*)))(&(objectCategory=person)(objectC +lass=cont act))(objectCategory=group)(objectCategory=publicFolder)(objectCategor +y=msExchD ynamicDistributionList) )))(objectCategory=contact)(proxyAddresses=smt +p:*exampl e.com)))

    Yields:

    P:\test>junk1 (& (& (& (& (mailnickname=*) (| (& (objectCategory=person) (objectClass=user) (! (homeMDB=*)) (! (msExchHomeServerName=*))) (& (objectCategory=person) (objectClass=user) (| (homeMDB=*) (msExchHomeServerName=*))) (& (objectCategory=person) (objectClass=contact)) (objectCategory=group) (objectCategory=publicFolder) (objectCategory=msExchDynamicDistributionList) ))) (objectCategory=contact) (proxyAddresses=smtp:*example.com)) )

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Very interesting! I have just some difficulties in reading that s/// operator. Could you please explain?

      Thanks for posting!

      Ciao!
      --bronto


      In theory, there is no difference between theory and practice. In practice, there is.

        Sure.

        my $tab = 0; ## Number of 'tabs' to insert starts at zero $input =~ s[ ( ## Capture to $1 [()] ## All open or close parens ) ]{ ## /e-xecute turns the second half into a code block. ## Decrement teh tab count if this is a close paren $tab-- if $1 eq ')'; ## Insert a newline + $tab tabs before the paren (open or close) my $modified = "\n" . ( " " x $tab ) . $1; ## Increment the tab count if this is an open paren $tab++ if $1 eq '('; ## And 'return' the modified text for substitution $modified; }xge; ## Late addition: ## Second pass strips out any 'lone' close parens ## to compact the results a little. $input =~ s[\n\s+\)][)]g;

        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Making an LDAP filter more readable
by dakkar (Hermit) on Jan 11, 2006 at 15:43 UTC

    What about using a FSA based parser?

    #!/usr/bin/perl use strict; use warnings; use Text::Diff; { my $INDENT_STEP=4; sub indenter { my ($expr)=@_; my $indent=0; my $result=''; pos($expr)=undef; while(1) { if ($expr =~ m{\G \s* ( \( [&|!] )}smxcg) { # combinatore: print, newline, inc indent $result.=(' 'x$indent)."$1\n"; $indent+=$INDENT_STEP; } elsif ($expr =~ m{\G \s* ( \( [^)=]+ = [^)]+ \) )}smxcg) { # test: print, newline $result.=(' 'x$indent)."$1\n"; } elsif ($expr =~ m{\G \s* ( \) )}smxcg) { # fine combinatore: dec intert, print, newline $indent-=$INDENT_STEP; $result.=(' 'x$indent)."$1\n"; } else { last; } } return $result; } } my $expr=q{(&(&(&(& (mailnickname=*) (| (&(objectCategory=person)(obje +ctClass=user)(!(homeMDB=*))(!(msExchHomeServerName=*)))(&(objectCateg +ory=person)(objectClass=user)(|(homeMDB=*)(msExchHomeServerName=*)))( +&(objectCategory=person)(objectClass=contact))(objectCategory=group)( +objectCategory=publicFolder)(objectCategory=msExchDynamicDistribution +List) )))(objectCategory=contact)(proxyAddresses=smtp:*example.com))) +}; my $expected=<<'END_EXPECTED'; (& (& (& (& (mailnickname=*) (| (& (objectCategory=person) (objectClass=user) (! (homeMDB=*) ) (! (msExchHomeServerName=*) ) ) (& (objectCategory=person) (objectClass=user) (| (homeMDB=*) (msExchHomeServerName=*) ) ) (& (objectCategory=person) (objectClass=contact) ) (objectCategory=group) (objectCategory=publicFolder) (objectCategory=msExchDynamicDistributionList) ) ) ) (objectCategory=contact) (proxyAddresses=smtp:*example.com) ) ) END_EXPECTED my $ret=indenter($expr); print "ok\n" if $ret eq $expected; print diff \$ret,\$expected, { STYLE => 'Unified', };
    -- 
            dakkar - Mobilis in mobile
    

    Most of my code is tested...

    Perl is strongly typed, it just has very few types (Dan)