PerlMonks  

talexb and I briefly discussed in the chatterbox that this might be possible using Vim syntax files. If you wrote a Perl parser for the syntax files, you could extract a lot of information about the PHP scripts you are interested in.

The syntax files contain lines that look like the following:
" Env Variables
syn keyword phpEnvVar GATEWAY_INTERFACE SERVER_NAME SERVER_SOFTWARE SERVER_PROTOCOL REQUEST_METHOD QUERY_STRING DOCUMENT_ROOT HTTP_ACCEPT HTTP_ACCEPT_CHARSET HTTP_ENCODING HTTP_ACCEPT_LANGUAGE HTTP_CONNECTION HTTP_HOST HTTP_REFERER HTTP_USER_AGENT REMOTE_ADDR REMOTE_PORT SCRIPT_FILENAME SERVER_ADMIN SERVER_PORT SERVER_SIGNATURE PATH_TRANSLATED SCRIPT_NAME REQUEST_URI contained

" Internal Variables
syn keyword phpIntVar GLOBALS HTTP_GET_VARS HTTP_POST_VARS HTTP_COOKIE_VARS HTTP_POST_FILES HTTP_ENV_VARS HTTP_SERVER_VARS PHP_ERRMSG PHP_SELF HTTP_RAW_POST_DATA HTTP_STATE_VARS _GET _POST _COOKIE _SERVER _ENV contained

" Function names
syn keyword phpFunctions apache_lookup_uri apache_note ascii2ebcdic ebcdic2ascii getallheaders virtual apache_child_terminate apache_setenv contained
syn keyword phpFunctions array array_change_key_case array_chunk array_count_values array_diff array_filter array_flip array_fill array_intersect array_key_exists array_keys array_map array_merge array_merge_recursive array_multisort array_pad array_pop array_push array_rand array_reverse array_reduce array_shift array_slice array_splice array_sum array_unique array_unshift array_values array_walk arsort asort compact count current each end extract in_array array_search key krsort ksort list natsort natcasesort next pos prev range reset rsort shuffle sizeof sort uasort uksort usort contained

" Comment
syn region phpComment start="/\*" end="\*/" contained contains=phpTodo extend
syn match phpComment "#.\{-}\(?>\|$\)\@=" contained contains=phpTodo
syn match phpComment "//.\{-}\(?>\|$\)\@=" contained contains=phpTodo
The above are just a few snippets from the php.vim file that came with Vim 6.1. Given this format, it doesn't seem like it would be too difficult to write a script that parses the syntax file and then uses what it learned to pull interesting information out of the actual PHP scripts.
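As a rough illustration of how little code the syntax-file half needs, here is a minimal Perl sketch. It assumes one directive per line, as in the snippets above (Vim's backslash continuation lines are not handled), and only recognizes the `syn keyword`, `syn match` and `syn region` forms. `parse_syntax` is a hypothetical name; it returns a hash of keyword arrays and a hash of raw match/region definitions, both keyed by group name, so repeated `syn keyword phpFunctions` lines accumulate into a single list.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse Vim syntax directives into two hashes of arrays keyed by group name:
#   keywords: words from "syn keyword GROUP word word ..." lines
#   patterns: "syn match" / "syn region" definitions, kept verbatim
# Assumes one directive per line; Vim "\" continuation lines are not handled.
sub parse_syntax {
    my ($text) = @_;
    my (%keywords, %patterns);
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*(?:"|$)/;    # Vim comment or blank line
        if ($line =~ /^\s*syn(?:tax)?\s+keyword\s+(\w+)\s+(.*)$/) {
            my ($group, $rest) = ($1, $2);
            # drop options such as "contained" or "nextgroup=..."
            my @words = grep { $_ ne 'contained' && index($_, '=') < 0 }
                        split ' ', $rest;
            push @{ $keywords{$group} }, @words;
        }
        elsif ($line =~ /^\s*syn(?:tax)?\s+(match|region)\s+(\w+)\s+(.*)$/) {
            # these are Vim regexes; Perl cannot use them without translation
            push @{ $patterns{$2} }, "$1 $3";
        }
    }
    return (\%keywords, \%patterns);
}

# Read the file named on the command line, or fall back to a small sample
my $syntax = @ARGV
    ? do { local $/; open my $fh, '<', $ARGV[0] or die "$ARGV[0]: $!"; <$fh> }
    : <<'END_SYNTAX';
" Env Variables
syn keyword phpEnvVar GATEWAY_INTERFACE SERVER_NAME REQUEST_METHOD contained
" Internal Variables
syn keyword phpIntVar GLOBALS PHP_SELF _GET _POST contained
" Function names
syn keyword phpFunctions array_keys array_map in_array sort contained
" Comment
syn region phpComment start="/\*" end="\*/" contained contains=phpTodo extend
syn match phpComment "#.\{-}\(?>\|$\)\@=" contained contains=phpTodo
END_SYNTAX

my ($kw, $pat) = parse_syntax($syntax);
for my $group (sort keys %$kw) {
    printf "%-14s %s\n", $group, join ' ', @{ $kw->{$group} };
}
print "$_: ", scalar @{ $pat->{$_} }, " pattern(s)\n" for sort keys %$pat;
```

Pass it the path to your own php.vim to dump every keyword group it defines; with no argument it parses the small built-in sample.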

Using the above snippets, perhaps we could create four arrays: phpEnvVar, phpIntVar, phpFunctions, and phpComment. Then just use split or the like to put each variable name and function name from the syntax file onto its respective array; the phpComment entries are patterns (Vim regular expressions) rather than word lists, so they would need to be stored and translated as regexes instead. Once you have all the information you care about parsed out of the syntax file, you could use any number of means to extract useful information from the PHP scripts. As talexb mentioned, Parse::RecDescent seems like a good candidate for this. However, the adventurous may even be able to get it to work with a combination of Tie::File and Quantum::Superpositions.
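Here is a sketch of the extraction step under a few loudly stated assumptions: the keyword lists are hard-coded, hypothetical stand-ins for what a php.vim parser would return; comments are blanked out using hand translations of the phpComment patterns above into Perl (Vim's `\{-}` is a lazy `*`, and `\(...\)\@=` is a lookahead); and the remaining identifiers are looked up in an inverted word-to-group hash. `scan_php` is a hypothetical name. It is deliberately naive: comment markers inside PHP strings (a URL containing `//`, say) will confuse it, and a variable named `$count` is counted as if it were the count() function. For real parsing, Parse::RecDescent is the sturdier route.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical keyword lists, standing in for what a php.vim parser would return
my %keywords = (
    phpEnvVar    => [qw(REMOTE_ADDR HTTP_USER_AGENT QUERY_STRING)],
    phpIntVar    => [qw(GLOBALS PHP_SELF _GET _POST _SERVER)],
    phpFunctions => [qw(array_keys count in_array sort)],
);

# Hand translations of the phpComment patterns into Perl regexes.
# Vim's \{-} is a lazy *, and \(...\)\@= is a lookahead (?=...).
my @comment_res = (
    qr{/\*.*?\*/}s,         # syn region start="/\*" end="\*/"
    qr{#.*?(?=\?>|$)}m,     # "#.\{-}\(?>\|$\)\@="
    qr{//.*?(?=\?>|$)}m,    # "//.\{-}\(?>\|$\)\@="
);

# Invert the lists into word => group for quick lookups
my %group_of;
for my $group (keys %keywords) {
    $group_of{$_} = $group for @{ $keywords{$group} };
}

# Count known identifiers per group; returns { group => { word => count } }.
# Naive: comment markers inside PHP strings are not treated as strings.
sub scan_php {
    my ($src) = @_;
    my %seen;
    $src =~ s/$_/ /g for @comment_res;    # blank out comments first
    while ($src =~ /(\w+)/g) {
        my $group = $group_of{$1} or next;
        $seen{$group}{$1}++;
    }
    return \%seen;
}

my $php = <<'END_PHP';
<?php
// log who asked for what
$ip = $_SERVER['REMOTE_ADDR'];   # client address
/* count the GET params
   with count() */
$n = count($_GET);
if (in_array('debug', array_keys($_GET))) { echo $n; }
?>
END_PHP

my $seen = scan_php($php);
for my $group (sort keys %$seen) {
    my $words = $seen->{$group};
    print "$group: ",
        join(', ', map { "$_ x$words->{$_}" } sort keys %$words), "\n";
}
```

On the built-in sample this prints `phpEnvVar: REMOTE_ADDR x1`, then `phpFunctions: array_keys x1, count x1, in_array x1`, then `phpIntVar: _GET x2, _SERVER x1`; the count() mentioned inside the block comment is not counted because comments are stripped before scanning.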

Good luck!
-Eric

--
Lucy: "What happens if you practice the piano for 20 years and then end up not being rich and famous?"
Schroeder: "The joy is in the playing."

In reply to Re: Parsing a PHP web application by andreychek
in thread Parsing a PHP web application by talexb
