Beefy Boxes and Bandwidth Generously Provided by pair Networks
The stupid question is the question not asked
 
PerlMonks  

Statistics::Covid : module for fetching and storing covid19-related data for analysis

by bliako (Prior)
on Mar 26, 2020 at 19:42 UTC ( #11114688=CUFP: print w/replies, xml ) Need Help??

Fellow Monks,

I have just submitted to CPAN a very alpha release of a module which collects data from various online providers of Covid19-related statistics (e.g. number of confirmed cases etc.). For example, data provided by Johns Hopkins University (as an arcgis "dashboard") or the data provided by the UK government for data relating to the UK local authorities.

All the providers I used (so far, John Hopkins University and the UK government) offer an API which provides JSON data. The scraper can be easily configured (that is subclassed) to set the url entry point to the API and how data should be converted to a Perl object. So, it is relatively easy to create more data fetchers which can all store to the same db.

Fetched data is stored in an SQLite database (support for MySQL exists but remains untested and probably broken - but easily fixed) and there is a high-level interface (thank you DBIx::Class) for saving and retrieving this data. This makes it easy to save data points only if they are more "up-to-date" than what currently exists in database, for the same location and time point (using heuristics). Or, it allows to retrieve all data for a single location over time, or for a single time point/range over all or some locations.

The CPAN module is Statistics::Covid. It is also hosted on github at https://github.com/hadjiprocopis/statistics-covid which additionally provides the data I have so far collected since a couple of weeks ago.

If anyone has any comments or suggestions please leave me a message.

If anyone wishes to contribute, e.g. data analysis or plots generation, under this or any other namespace, please let me know so that I link to that work. I am also starting to write my own analysis which will be under the namespace: Statistics::Covid::Analysis.

Here is some code from the synopsis as a quick start:

use Statistics::Covid; use Statistics::Covid::Datum; $covid = Statistics::Covid->new({ 'config-file' => 't/example-config.json', 'providers' => ['UK::BBC', 'UK::GOVUK', 'World::JHU'], 'save-to-file' => 1, 'save-to-db' => 1, 'debug' => 2, }) or die "Statistics::Covid->new() failed"; # fetch all the data available (posibly json), process it, # create Datum objects, store it in DB and return an array # of the Datum objects just fetched (and not what is already in D +B). my $newobjs = $covid->fetch_and_store(); print $_->toString() for (@$newobjs); print "Confirmed cases for ".$_->name() ." on ".$_->date() ." are: ".$_->confirmed() ."\n" for (@$newobjs); my $someObjs = $covid->select_datums_from_db({ 'conditions' => { belongsto=>'UK', name=>'Hackney' } }); print "Confirmed cases for ".$_->name() ." on ".$_->date() ." are: ".$_->confirmed() ."\n" for (@$someObjs); # or for a single place (this sub sorts results wrt publication ti +me) my $timelineObjs = $covid->select_datums_from_db_for_location('Hac +kney'); for my $anobj (@$timelineObjs){ print $anobj->toString()."\n"; } print "datum rows in DB: ".$covid->db_count_datums()."\n"

Edit: thank yous to marto for advice on githubbing this module and to erix for pointing some errors in this page (John -> Johns)

BW, bliako

  • Comment on Statistics::Covid : module for fetching and storing covid19-related data for analysis
  • Download Code

Replies are listed 'Best First'.
Re: Statistics::Covid : module for fetching and storing covid19-related data for analysis
by Fletch (Chancellor) on Mar 27, 2020 at 19:25 UTC

    There's a GitHub repo from Johns Hopkins with their data which might be suitable for cloning for an offline data source.

    The cake is a lie.
    The cake is a lie.
    The cake is a lie.

      Fletch that's great, I will add this to the providers. Thanks

Re: Statistics::Covid : module for fetching and storing covid19-related data for analysis
by bliako (Prior) on Mar 27, 2020 at 13:00 UTC

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: CUFP [id://11114688]
Front-paged by haukex
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others surveying the Monastery: (9)
As of 2020-08-12 15:05 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    Which rocket would you take to Mars?










    Results (66 votes). Check out past polls.

    Notices?