Beefy Boxes and Bandwidth Generously Provided by pair Networks
Your skill will accomplish
what the force of many cannot
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??

Fellow Monks,

I have just submitted to CPAN a very alpha release of a module which collects data from various online providers of Covid19-related statistics (e.g. number of confirmed cases etc.). For example, data provided by Johns Hopkins University (as an arcgis "dashboard") or the data provided by the UK government for data relating to the UK local authorities.

All the providers I used (so far, John Hopkins University and the UK government) offer an API which provides JSON data. The scraper can be easily configured (that is subclassed) to set the url entry point to the API and how data should be converted to a Perl object. So, it is relatively easy to create more data fetchers which can all store to the same db.

Fetched data is stored in an SQLite database (support for MySQL exists but remains untested and probably broken - but easily fixed) and there is a high-level interface (thank you DBIx::Class) for saving and retrieving this data. This makes it easy to save data points only if they are more "up-to-date" than what currently exists in database, for the same location and time point (using heuristics). Or, it allows to retrieve all data for a single location over time, or for a single time point/range over all or some locations.

The CPAN module is Statistics::Covid. It is also hosted on github at https://github.com/hadjiprocopis/statistics-covid which additionally provides the data I have so far collected since a couple of weeks ago.

If anyone has any comments or suggestions please leave me a message.

If anyone wishes to contribute, e.g. data analysis or plots generation, under this or any other namespace, please let me know so that I link to that work. I am also starting to write my own analysis which will be under the namespace: Statistics::Covid::Analysis.

Here is some code from the synopsis as a quick start:

use Statistics::Covid; use Statistics::Covid::Datum; $covid = Statistics::Covid->new({ 'config-file' => 't/example-config.json', 'providers' => ['UK::BBC', 'UK::GOVUK', 'World::JHU'], 'save-to-file' => 1, 'save-to-db' => 1, 'debug' => 2, }) or die "Statistics::Covid->new() failed"; # fetch all the data available (posibly json), process it, # create Datum objects, store it in DB and return an array # of the Datum objects just fetched (and not what is already in D +B). my $newobjs = $covid->fetch_and_store(); print $_->toString() for (@$newobjs); print "Confirmed cases for ".$_->name() ." on ".$_->date() ." are: ".$_->confirmed() ."\n" for (@$newobjs); my $someObjs = $covid->select_datums_from_db({ 'conditions' => { belongsto=>'UK', name=>'Hackney' } }); print "Confirmed cases for ".$_->name() ." on ".$_->date() ." are: ".$_->confirmed() ."\n" for (@$someObjs); # or for a single place (this sub sorts results wrt publication ti +me) my $timelineObjs = $covid->select_datums_from_db_for_location('Hac +kney'); for my $anobj (@$timelineObjs){ print $anobj->toString()."\n"; } print "datum rows in DB: ".$covid->db_count_datums()."\n"

Edit: thank yous to marto for advice on githubbing this module and to erix for pointing some errors in this page (John -> Johns)

BW, bliako


In reply to Statistics::Covid : module for fetching and storing covid19-related data for analysis by bliako

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others browsing the Monastery: (3)
As of 2024-04-25 12:01 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found