Re: MUMPS Array Subscripts Parsing Via RegEx

My current employer makes extensive use of a piece of software I expected never to encounter in my life, MUMPS! It turns out that MUMPS (much older than Perl) is a database and language, much like SQL or Oracle, but it's hierarchical and not relational. I'm told. Whatever that means.

I feel your pain, because MUMPS has become part of my current job, much more than I ever wanted it to be part of my job.

MUMPS is a database in that all "globals" are stored on disk rather than just in memory. The globals are stored as trees (hierarchical), not as tables (relational). A global, like any MUMPS variable, can store a single value (like a Perl scalar), or it can store key-value pairs (much like a Perl hash, but with implicitly sorted keys), or it can store both at the same time. The values of the key-value pairs can again be single values or key-value pairs, deeply nested. But there is nothing like SQL to query these trees; you have to write MUMPS code. (Caché (see below) does offer an SQL interface to the trees, but it looks very strange.)
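For readers who think better in code, here is a rough Python sketch of that data model. It is only an illustration, not how any MUMPS implementation stores globals: the `Node` class and its method names are made up, nothing is written to disk, and subscripts are restricted to strings (real MUMPS collation also handles numeric subscripts, which this sketch ignores).

```python
from bisect import insort


class Node:
    """Hypothetical model of one tree node: an optional value plus sorted children."""

    def __init__(self):
        self.value = None    # the node's own value, if it has one
        self.children = {}   # subscript -> child Node
        self._keys = []      # subscripts, kept in sorted order

    def set(self, subscripts, value):
        """Store a value at the given subscript path, creating nodes as needed."""
        node = self
        for key in subscripts:
            if key not in node.children:
                node.children[key] = Node()
                insort(node._keys, key)  # keeps the keys implicitly sorted
            node = node.children[key]
        node.value = value

    def get(self, subscripts):
        """Return the value at the subscript path, or None if it does not exist."""
        node = self
        for key in subscripts:
            node = node.children.get(key)
            if node is None:
                return None
        return node.value

    def keys(self):
        """Return this node's subscripts in sorted order."""
        return list(self._keys)


# A node can hold a value and key-value pairs at the same time.
root = Node()
root.set([], "top-level value")
root.set(["b"], "value of b")
root.set(["b", "x"], "nested under b")
root.set(["a"], "value of a")
```

Here `root.keys()` comes back sorted regardless of insertion order, and `root.get(["b"])` and `root.get(["b", "x"])` both return something, which is the "value and children at once" property that a plain Perl hash does not give you.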

Oh, by the way: Did you know that MUMPS started as an operating system running on bare metal of ancient computers? All current implementations still provide the grey-haired coder with this illusion.

I need to report on certain transactions found in Journals, in particular I need to extract the Global Variables from the record types in which they are created, modified, or cleared.

There are several very different implementations of MUMPS, even though the language was standardised by ANSI or some other authority. I know only the Micronetics implementation (MSM) from personal experience, the Caché implementation from a big distance, and the Perl implementation from a short "just forget it" experience.

Parsing MUMPS code is easy for the common case, but there are some edges that make your life really hard. The indirection operator (@) is my favorite here, directly followed by the string-eval command XECUTE. As soon as you find one of them, you are essentially lost with a simple parser. You need to know the current values of the variables referenced in the code to continue. So, you can't simply parse MUMPS, you have to interpret it. It's the "only perl can parse Perl" of the punch card age. With the minor difference that each and every MUMPS implementation has its own set of incompatible extensions.

From the Micronetics implementation, I know that there are several tools for handling MUMPS code. The INDEX program is able to generate a cross-reference for a single MUMPS program or a bunch of MUMPS programs, with variables, syntax warnings, and so on. See ftp://ftp.intersys.com/pub/msm/docs/msm44/utility.pdf for details. Of course, it's just a MUMPS parser, not a MUMPS interpreter, and it seems to be ported from a really ancient version. It can be confused by "modern" code that uses device mnemonics, but it's the best available tool for the job (simply because it's the only one).

Generally, don't try to work with MUMPS code outside a MUMPS system. You will fail at writing a MUMPS interpreter. Try to solve your problem inside MUMPS, or export your data from MUMPS to text files and handle those exports in a modern language. It's quite easy to write even XML or JSON from MUMPS to files, but parsing those formats correctly from MUMPS is nearly impossible.

Alexander

--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

Re^2: MUMPS Array Subscripts Parsing Via RegEx by Clovis_Sangrail (Beadle) on May 16, 2012 at 15:10 UTC
As yet I'm not called on to parse any Mumps code. All I get from Mumps are binary journals, data much like transaction journals from other database systems. The GT.M implementation of Mumps comes with a 'mupip' program that (among other things) gives me a character-based dump of the journals, basically a human-readable text file of newline-delimited records with '\' as the field separator. I wonder if other current implementations of Mumps include mupip (or journals at all, for that matter). I can use 'split' to make a list of each record, and I'm fortunate that the Global Variable is in the final field, because I've even found embedded '\' characters in some Global Variable subscripts, but I can ignore them via the 3rd (limit) parameter to split.
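The limit trick can be sketched like this, in Python rather than Perl. Note the off-by-one between the two languages: Perl's third argument to `split` is the maximum number of fields, while Python's `maxsplit` is the maximum number of splits, so a limit of N in Perl corresponds to `maxsplit = N - 1`. The three-field record below is an invented layout for illustration only; real journal extract records have their own field count, and `split_journal_record` is a hypothetical helper name.

```python
def split_journal_record(line, field_count):
    """Split a '\\'-delimited record into at most field_count fields.

    Any backslashes beyond the first field_count - 1 stay inside the
    final field, which is where the global reference is assumed to live.
    """
    return line.rstrip("\n").split("\\", field_count - 1)


# Invented sample: two leading fields, then a global reference whose
# subscript itself contains a backslash.
sample = r'05\1234\^ACCT("a\b",1)'
fields = split_journal_record(sample, 3)
global_ref = fields[-1]
```

With a limit of 3, `global_ref` keeps the embedded backslash intact instead of being chopped into an extra field.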
Re^3: MUMPS Array Subscripts Parsing Via RegEx by afoken (Chancellor) on May 17, 2012 at 10:54 UTC
Partly related: MSM has a `%GS` (global save) command that writes globals to text files, in a pretty simple format unsurprisingly called MSM format. The first line contains date and time and some constants; the second line is the comment entered while running `%GS`; the following lines alternate between the global name including all subscripts and the value. To announce the end of a global, both lines are ""; to announce the end of the file, both lines are "*". Simple, readable, parseable with nearly no effort. Unless one of the globals happens to contain control characters like CR or LF. Then even MSM can't read back the files it wrote just seconds ago. It's a shame. The companion program `%GR` (global restore) reads the globals back into the system.

And I remember from browsing the sources that there is a second file format named "ANSI format", but unfortunately, I don't remember the details, and I don't have access to the MSM systems at work from home.

My idea is to search for tools that are written to exchange data with other MUMPS systems. One of the design goals of ANSI MUMPS was to be able to exchange programs and data across the various implementations, so there should be tools. And because MUMPS is so old, my bet is that most exchange formats are simple ASCII files with a line-oriented format and simple delimiters, because that's what all MUMPS systems (and those grey-haired MUMPS coders) are able to handle.

And by the way: Don't expect much error checking or even error handling in old tools. All MUMPS code I've seen (not only our own legacy system, but also the code delivered by Micronetics) is very optimistic regarding the well-formedness and validity of its input. It seems that no MUMPS coder ever mistrusted foreign data or user input. Unexpected input usually leads to crashes or damaged or lost data, get used to it.

Alexander

--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
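To give an idea of how little code a reader for the MSM format described above needs, here is a Python sketch. It is written from that description only, not from the MSM documentation, so treat the details as assumptions: the end-of-global marker is taken to be two empty lines, the end-of-file marker two lines containing a single `*`, and the sample header line is invented. `parse_msm_export` is a hypothetical name. Just like MSM itself, it breaks if a value contains CR or LF.

```python
def parse_msm_export(lines):
    """Parse an MSM-format export into (header, comment, list of globals).

    Each global is a list of (reference, value) pairs in file order.
    """
    lines = [ln.rstrip("\r\n") for ln in lines]
    header, comment = lines[0], lines[1]
    body = lines[2:]

    globals_found = []
    current = []
    # Walk the body two lines at a time: reference line, then value line.
    for i in range(0, len(body) - 1, 2):
        name, value = body[i], body[i + 1]
        if name == "*" and value == "*":    # assumed end-of-file marker
            break
        if name == "" and value == "":      # assumed end-of-global marker
            globals_found.append(current)
            current = []
            continue
        current.append((name, value))
    if current:                             # tolerate a missing final marker
        globals_found.append(current)
    return header, comment, globals_found


# Invented sample export with two globals.
sample_lines = [
    "10:54 17-MAY-12 (hypothetical header)",
    "test export",
    "^A(1)", "one",
    "^A(2)", "two",
    "", "",
    "^B", "x",
    "", "",
    "*", "*",
]
header, comment, exported = parse_msm_export(sample_lines)
```

The references come back as raw strings such as `^A(1)`; splitting them into name and subscripts is a separate problem, and the subject of this thread.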


	PerlMonks