http://qs321.pair.com?node_id=814059

bobf has asked for the wisdom of the Perl Monks concerning the following question:

Prelude: While this question is quite specific, I am also interested in how possible solutions could be generalized to other types of devices.

I have an HP Officejet Pro L7580 printer. One of the functions in the HP Solution Center software (which came with the printer) is the ability to get printer usage statistics. This is done by selecting "myPrintMileage" in the Solution Center GUI, which opens a web page that displays the printer usage stats in a graphical format.

The problem is that the graphics (bar charts, pie charts) are not very precise and none of the data is available for download. I would like to find an automated way of obtaining the information (using Perl, of course) so I can track it over time.

Possible approaches include:

I have done quite a bit of searching to avoid reinventing the wheel. Unfortunately, I have not found any indication that the printer has a public API or that an HP client can be used as a proxy to interface with the printer. The few HP tools that I found (other than printer drivers) are targeted to the management of multiple printers and do not appear to meet my needs.

Since my initial attempt at finding a fully-automated approach failed, I looked behind door #2. Using Firefox's Live HTTP Headers extension I discovered that the myPrintMileage program makes a POST call to the HP website. The printer statistics are passed as an XML string (URL-encoded, of course). The data in the XML is much more precise and includes more information than what is displayed on the web page shown to the user.
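A minimal sketch of unpacking such a POST body using only core Perl, assuming the body is ordinary application/x-www-form-urlencoded data. The field name `xml` and the element names in the sample are hypothetical placeholders, not what myPrintMileage actually sends. For the XML itself a real parser from CPAN (XML::LibXML or XML::Twig, for example) would be a better choice than hand-rolled code; this only covers the decoding step.

```perl
use strict;
use warnings;

# Decode one form-urlencoded component: '+' means space,
# and %XX is a hex-encoded byte. The '+' pass must come first
# so that an encoded plus sign (%2B) survives as a literal '+'.
sub url_decode {
    my ($text) = @_;
    $text =~ tr/+/ /;
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

# Split a POST body into a hash of decoded field names and values.
sub parse_form_body {
    my ($body) = @_;
    my %field;
    for my $pair (split /&/, $body) {
        my ($name, $value) = split /=/, $pair, 2;
        $value = '' unless defined $value;
        $field{ url_decode($name) } = url_decode($value);
    }
    return %field;
}

# Hypothetical captured body: the field and element names are made up.
my $body = 'xml=%3Cusage%3E%3CpagesPrinted%3E1234%3C%2FpagesPrinted%3E%3C%2Fusage%3E';
my %field = parse_form_body($body);
print $field{xml}, "\n";
```

The decoded string can then be handed to whichever XML module is installed, and the values of interest appended to a log file on each run.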

Therefore, I beseech my fellow monks to meditate on the following:

  1. Is there a way to discover how the myPrintMileage program interfaces with the printer so I can make the call directly?
  2. Are there existing utilities that would allow me to get the printer stats directly from the printer? Am I reinventing the wheel?
  3. Is it feasible to attempt to intercept the POST call? (I believe this approach would require some manual intervention.)
  4. What other approaches should I consider?

I could write an XML parser for the POST content or a screen-scraper, but before I settle for a solution that is not fully automated I would like to get input from those wiser than I.

Many thanks.

Update: I suspect that SNMP and/or a MIB might apply, but I do not fully grok how.

Re: Getting statistics from an attached device (printer)
by Corion (Patriarch) on Dec 23, 2009 at 12:26 UTC

    (Repost due to DB hiccups, twice)

    Scraping the GUI using WWW::Mechanize is likely the easiest path, especially as you've already looked at what gets sent using Live HTTP Headers.

    Alternatively, I'd capture all of the network traffic using Wireshark and see which SNMP packets get sent, and whether any of them belong to the HP namespace.
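Once a capture has produced a list of OIDs, sorting them by namespace is a simple prefix check. The sketch below assumes the OIDs have been copied out of Wireshark as dotted strings. The two prefixes are the standard registrations for the Printer-MIB subtree (RFC 3805) and the HP enterprise subtree; the sample OIDs are illustrative only and not taken from a real capture of this printer.

```perl
use strict;
use warnings;

# Well-known OID prefixes and what they identify.
my %subtree = (
    '1.3.6.1.2.1.43' => 'Printer-MIB (RFC 3805)',
    '1.3.6.1.4.1.11' => 'HP enterprise subtree',
);

# Return the label of the subtree an OID falls under, or 'other'.
sub classify_oid {
    my ($oid) = @_;
    $oid =~ s/^\.//;    # tolerate the leading dot some tools print
    for my $prefix (sort keys %subtree) {
        # Match the prefix itself, or the prefix followed by a dot, so
        # that 1.3.6.1.4.1.110 is not mistaken for 1.3.6.1.4.1.11.
        return $subtree{$prefix}
            if $oid eq $prefix || index($oid, "$prefix.") == 0;
    }
    return 'other';
}

# Illustrative OIDs, not taken from a real capture.
for my $oid ('1.3.6.1.2.1.43.10.2.1.4.1.1', '.1.3.6.1.4.1.11.2.3.9.1', '1.3.6.1.2.1.1.1.0') {
    printf "%-30s %s\n", $oid, classify_oid($oid);
}
```

Anything that lands in the standard Printer-MIB subtree should be portable to other printers, which matters if the solution is to be generalized to other devices.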

      Thanks for the input, Corion. I am not opposed to writing a scraper, but since there is a multi-step manual process to generate the page(s) I'd prefer to investigate other options first. In addition, not all of the information that I want is presented on the pages so it would not be accessible.

      I'll definitely put Wireshark in my back pocket. I have no prior experience using a packet sniffer so there would likely be a learning curve.

      I've done some more reading on SNMP and I installed a few related modules from CPAN. I am starting to think that might be the most efficient route. The trick now is finding out what variables to request from the printer.

      SNMP::NPAdmin might be helpful in pursuing an SNMP approach.

      I've used Wireshark to avoid understanding overly convoluted web pages. It turned out to be far easier than I expected. It will help to filter on the IP address of the printer either in capturing transactions or in analyzing them. Once you know what requests are being made, you may find wget to be useful to test requests that the GUI application won't generate.

        SNMP::NPAdmin is one of several modules that I am investigating (and the most promising, by the way). Unfortunately the PPM installation failed, and when I tried the manual approach (nmake) it broke on the space in "Program Files"*. I will pursue the installation issue when I return after the holidays. In the meantime, I think I will install Wireshark on my laptop and give it a test drive.

        If anyone has any ideas about how to determine which MIB is used by this printer, I'd really appreciate it. My efforts have been unsuccessful thus far and I will need the variable names (or OIDs) to make calls to NPAdmin.

        *This is a huge pet peeve of mine, given Windows' finicky shell quoting and the difficulty third-party software has in handling it correctly. Why "they" decided to put a space in the name of the root directory used for programs is beyond me.</rant>
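One way to find out which MIBs a printer answers to is to ask it directly. The sketch below uses Net::SNMP from CPAN and rests on several assumptions: the printer is reachable over the network with SNMP enabled, the read community is `public`, and the device implements the standard Printer-MIB. None of that is confirmed for the L7580. The OIDs themselves are standard: sysDescr and sysObjectID come from SNMPv2-MIB, and prtMarkerLifeCount is the Printer-MIB lifetime counter column. The helper name `probe_printer` is made up for this example.

```perl
use strict;
use warnings;

# Standard OIDs; whether this printer answers all of them is an assumption.
my %OID = (
    sysDescr           => '1.3.6.1.2.1.1.1.0',
    sysObjectID        => '1.3.6.1.2.1.1.2.0',
    prtMarkerLifeCount => '1.3.6.1.2.1.43.10.2.1.4',
);

# Hypothetical helper: returns a hashref of OID => value for whatever
# the device answered.
sub probe_printer {
    my ($host, $community) = @_;
    require Net::SNMP;    # CPAN module, loaded only when a probe is run

    my ($session, $error) = Net::SNMP->session(
        -hostname  => $host,
        -community => $community,
        -version   => 'snmpv1',
    );
    die "SNMP session failed: $error\n" unless defined $session;

    my %seen;

    # The system group identifies the device and its vendor subtree.
    my $sys = $session->get_request(
        -varbindlist => [ @OID{qw(sysDescr sysObjectID)} ],
    );
    if (defined $sys) { %seen = %$sys }
    else              { warn 'system group: ', $session->error, "\n" }

    # Walk the counter column; it has one row per marker.
    my $counts = $session->get_table(-baseoid => $OID{prtMarkerLifeCount});
    if (defined $counts) { %seen = (%seen, %$counts) }
    else                 { warn 'Printer-MIB: ', $session->error, "\n" }

    $session->close;
    return \%seen;
}

# Usage: perl probe.pl <printer-address> [community]
if (@ARGV) {
    my $community = defined $ARGV[1] ? $ARGV[1] : 'public';
    my $result    = probe_printer($ARGV[0], $community);
    print "$_ = $result->{$_}\n" for sort keys %$result;
}
```

A sysObjectID value under 1.3.6.1.4.1.11 would point to HP's enterprise subtree, and a walk of that subtree with `get_table` would show which vendor-specific variables exist even without the MIB file in hand.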