http://qs321.pair.com?node_id=191613


in reply to Parsing a PHP web application

talexb and I briefly discussed in the chatterbox that this might be possible using Vim syntax files. If you wrote a Perl parser for the syntax files, you could obtain a lot of information about the PHP scripts you are interested in, since the syntax files already list the variable and function names PHP knows about.

The syntax files contain lines that look like the following:
" Env Variables syn keyword phpEnvVar GATEWAY_INTERFACE SERVER_NAME SERVER_S +OFTWARE SERVER_PROTOCOL REQUEST_METHOD QUERY_STRING DOCUMENT_ROOT HTT +P_ACCEPT HTTP_ACCEPT_CHARSET HTTP_ENCODING HTTP_ACCEPT_LANGUAGE HTTP_ +CONNECTION HTTP_HOST HTTP_REFERER HTTP_USER_AGENT REMOTE_ADDR REMOTE_ +PORT SCRIPT_FILENAME SERVER_ADMIN SERVER_PORT SERVER_SIGNATURE PATH_T +RANSLATED SCRIPT_NAME REQUEST_URI contained " Internal Variables syn keyword phpIntVar GLOBALS HTTP_GET_VARS HTTP_POST_VARS HTTP_CO +OKIE_VARS HTTP_POST_FILES HTTP_ENV_VARS HTTP_SERVER_VARS PHP_ERRMSG PHP_SELF HTT +P_RAW_POST_DATA HTTP_STATE_VARS _GET _POST _COOKIE _SERVER _ENV con +tained " Function names syn keyword phpFunctions apache_lookup_uri apache_note ascii2eb +cdic ebcdic2ascii getallheaders virtual apache_child_terminate apache +_setenv contained syn keyword phpFunctions array array_change_key_case array_chun +k array_count_values array_diff array_filter array_flip array_fill ar +ray_intersect array_key_exists array_keys array_map array_merge array +_merge_recursive array_multisort array_pad array_pop array_push array +_rand array_reverse array_reduce array_shift array_slice array_splice + array_sum array_unique array_unshift array_values array_walk arsort +asort compact count current each end extract in_array array_search ke +y krsort ksort list natsort natcasesort next pos prev range reset rso +rt shuffle sizeof sort uasort uksort usort contained " Comment syn region phpComment start="/\*" end="\*/" contained cont +ains=phpTodo extend syn match phpComment "#.\{-}\(?>\|$\)\@=" contained cont +ains=phpTodo syn match phpComment "//.\{-}\(?>\|$\)\@=" contained cont +ains=phpTodo
The above are just a few snippets from the php.vim file that came with Vim 6.1. Given these, it doesn't seem like it would be too difficult to write a script that parses this syntax file and then uses the result to pull interesting information out of the actual PHP scripts.

Using the above snippets, perhaps we could create four arrays: phpEnvVar, phpIntVar, phpFunctions, and phpComment. Then just use split or the like to push each variable name, function name, and comment pattern from the syntax file onto its respective array. Once you have parsed everything you care about out of the syntax file, you could use any number of means to extract useful info from the PHP scripts. As talexb mentioned, Parse::RecDescent seems like a good candidate for this. However, the adventurous may even be able to get it to work with a combination of Tie::File and Quantum::Superpositions.
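
To make the idea concrete, here is a minimal, untested-in-anger sketch of the "split into arrays" step, followed by one naive way of using the result. The names parse_keywords, count_known_calls, and %VIM_OPTION are my own inventions for illustration, the list of Vim option words to skip is an assumption that will likely need extending, and only "syn keyword" lines are handled (the "syn region" and "syn match" lines use Vim patterns, which are not Perl regexes). The scan for calls is a plain regex, so it will also match names inside strings and comments; a real grammar, such as one built with Parse::RecDescent, would do better.

```perl
use strict;
use warnings;

# Vim option words that can trail a keyword list. This set is an
# assumption; extend it for other syntax files.
my %VIM_OPTION = map { $_ => 1 }
    qw(contained transparent skipwhite skipnl skipempty);

# Read "syn keyword GROUP word word ..." lines from a filehandle and
# return a hash of arrays keyed by group name (phpFunctions, ...).
sub parse_keywords {
    my ($fh) = @_;
    my %group;
    while ( my $line = <$fh> ) {
        next unless $line =~ /^\s*syn(?:tax)?\s+keyword\s+(\w+)\s+(.*\S)/;
        my ( $name, $rest ) = ( $1, $2 );
        # Keep plain words; drop option words and key=value options.
        push @{ $group{$name} },
            grep { !$VIM_OPTION{$_} && !/=/ } split ' ', $rest;
    }
    return \%group;
}

# Count calls in a chunk of PHP source to any function in the list.
# PHP function names are case-insensitive, so compare in lower case.
sub count_known_calls {
    my ( $php, $functions ) = @_;
    my %known = map { lc($_) => 1 } @$functions;
    my %seen;
    while ( $php =~ /\b([A-Za-z_]\w*)\s*\(/g ) {
        my $name = lc $1;
        $seen{$name}++ if $known{$name};
    }
    return \%seen;
}

# Tiny stand-ins for php.vim and a PHP script, so the sketch runs as is.
my $syntax = <<'VIM';
" Function names
syn keyword phpFunctions array_keys array_map count contained
syn keyword phpFunctions sort usort contained
syn keyword phpIntVar GLOBALS PHP_SELF contained
VIM

my $php = <<'PHP';
<?php
$keys = array_keys($row);
sort($keys);
echo count($keys), my_helper(count($row));
?>
PHP

open my $fh, '<', \$syntax or die "cannot open in-memory file: $!";
my $group = parse_keywords($fh);
close $fh;

my $calls = count_known_calls( $php, $group->{phpFunctions} );
printf "%-12s %d\n", $_, $calls->{$_} for sort keys %$calls;
```

On the sample data this reports array_keys, count, and sort, and skips my_helper because it is not in the syntax file. To run it against the real thing, open php.vim and your PHP scripts instead of the two here-documents.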

Good luck!
-Eric

--
Lucy: "What happens if you practice the piano for 20 years and then end up not being rich and famous?"
Schroeder: "The joy is in the playing."