http://qs321.pair.com?node_id=653274

chanakya has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

I need suggestions to implement the following

1) I have a csv file(afs.csv) which has the data as follows
Value Date,Effective Date,Cons Code,Constituent Name,Country Code,Exch +ange Code,ISO code,Markers,Closing Subsector Code,New Subsector Code, +Closing Price,Price Adjustment Factor,Adjusted Price,Previous Shares +In Issue,New Shares In Issue,Previous Investability Weight,New Invest +ibility Weight,Amendment Code,Amendment No tes 06/08/2007,09/08/2007,C30301,Heiwa,JA,FJT,JPY,AWD AWDXUS AWEURPS WORLD +S AWPACS AWXE AWXEBS AWXSAFS AW XUKS AWXUSAS,,,1770.000000,,,115743000,145048960,,,IS, 06/08/2007,06/08/2007,C36949,OneSteel,AU,AAS,AUD,AWD AWDXUS AWEURPS WO +RLDS AWPACS AWPACXJA AWXE AWXEB S AWXJAS AWXSAFS AWXUKS AWXUSAS WIAUS,,,6.750000,,,573416481,867409520 +,,,IS,
the above data is imported to a hashref ($afs_data), which is as follows:
{ 'C30301|IS' => { 'Closing_Sub_Code' => '', 'New_Sub_Code' => '', 'Exchange_Code' => 'FJT', 'Amendment_Code' => 'IS', 'New_Shares_Issue' => '145048960', 'Effective_Date' => '09/08/2007', 'Cons_Code' => 'C30301', 'Prev_Shares_Issue' => '115743000', 'Country_Code' => 'JA', 'IndexMarker' => 'AWD AWDXUS AWEURPS AWOR +LDS AWPACS AWXE AWXEBS AWXSAFS AWXUKS AWXUSAS', 'New_Invest_Wt' => '', 'Value_Date' => '06/08/2007', 'Amendment_Notes' => '', }, 'C36949|IS' => { 'Closing_Sub_Code' => '', 'New_Sub_Code' => '', 'Exchange_Code' => 'AAS', 'Amendment_Code' => 'IS', 'New_Shares_Issue' => '867409520', 'Effective_Date' => '06/08/2007', 'Cons_Code' => 'C36949', 'Prev_Shares_Issue' => '573416481', 'Country_Code' => 'AU', 'IndexMarker' => 'AWD AWDXUS AWEURPS AWOR +LDS AWPACS AWPACXJA AWXE AWXEBS AWXJAS AWXSAFS AWXUKS AWXUSAS WIAUS', 'New_Invest_Wt' => '', 'Value_Date' => '06/08/2007', 'Amendment_Notes' => '', }, }
2) I have a config file from which the data is imported to a AOH (@IndxData) The datastructure is as below
{ 'SectorCurrency' => 'USD', 'Currency' => 'USD', 'IdxCode' => 'WORLDS', 'Name' => 'AFS World Index', 'CountryCode' => '', 'ExchangeTag' => 'FAWIU' }; { 'SectorCurrency' => 'EUR', 'Currency' => 'EUR', 'IdxCode' => 'WORLDS', 'Name' => 'AFS World Index', 'CountryCode' => '', 'ExchangeTag' => 'FAWI_E' }; { 'SectorCurrency' => 'USD', 'IdxCode' => 'WORLDS', 'SectorCode' => '0530', 'Sector_IdxMarker' => 'WORLDS0530', 'Name' => 'AFS World Index', 'ExchangeTag' => 'FAWI0530', 'CountryCode' => '' }; { 'SectorCurrency' => 'USD', 'IdxCode' => 'WORLDS', 'SectorCode' => '1770', 'Sector_IdxMarker' => 'WORLDS1770', 'Name' => 'AFS World Index', 'ExchangeTag' => 'FAWI1770', 'CountryCode' => '' }; { 'SectorCurrency' => 'USD', 'IdxCode' => 'WORLDS', 'IndustryCode' => '1000', 'Industry_IdxMarker' => 'WORLDS1000', 'Name' => 'AFS World Index', 'ExchangeTag' => 'FAWI1000', 'CountryCode' => '' }; { 'SectorCurrency' => undef, 'Currency' => 'USD', 'IdxCode' => 'AWDEMEA', 'Name' => 'AFS Developed Index', 'CountryCode' => '', 'ExchangeTag' => 'FADEMU' }; { 'SectorCurrency' => undef, 'Currency' => 'USD', 'IdxCode' => 'AWE', 'Name' => 'AFS Secondary Index', 'CountryCode' => '', 'ExchangeTag' => 'FASEU' };
3) I have a country_mapping file from which the data is imported into a hashref $country_data;
Region Code| Country Code WORLDS, ARG|AU|BELG|BRAZ|CAN|CHL|CHN|COL|CZE|DEN|EGY|FIN|FRA|GER|GRC|H +K|HUN|IDA|INDO|JA| AWPACS, ARG|AU|BRAZ|CAN|CHL|CHN|COL|EGY|HK|IDA|INDO|ISR|JA|KOR|MAL|MAR +|MEX|NZ|PAK|PER|PHIL|SAF|SI|THAI|TWN|USA|
$country_data:: { 'WORLDS' => { 'Country_code' => 'WORLDS', 'Region_code' => 'ARG|AU|BELG|BRAZ|CAN|CHL|CH +N|COL|CZE|DEN|EGY|FIN|FRA|GER|GRC|HK|HUN|IDA|INDO|JA|' }, 'WIEURPS' => { 'Country_code' => 'AWPACS', 'Region_code' => 'ARG|AU|BRAZ|CAN|CHL|CHN|C +OL|EGY|HK|IDA|INDO|ISR|JA|KOR|MAL|MAR|MEX|NZ|PAK|PER|PHIL|SAF|SI|THAI +|TWN|USA|' },
Now here are the rules:

1) i) for each record of afs.csv, the "Markers" (all the markers in the for a given record should be checked) need to be picked up and need to be look up @IndxData at "IdxCode"
ii) once the value of "IdxCode" is matched against the marker (AWD/AWDXUS etc) , pick the corresponding "ExchangeTag" from @IndxCode
iii) go to country_mapping file and search the "Country Code" from afs.csv file in the line where we have the "ExchangeTag".
Once the country is found return the corresponding ExchangeTags from @IndxData for that record of afs.csv file.
iv) if the "Closing Subsector Code" is defined, then look at "SectorCode" in @IndxData and pull out the "ExchangeTag"

Let me give an example of the first record in afs.csv file: Example: "Markers" = AWD AWDXUS AWEURPS WORLDS AWPACS AWXE AWXEBS AWXSAFS AW XUKS AWXUSAS for each and every marker in the string(AWD/AWDXUS/AWEURPS etc) the script should look into @IndxData.

for "WORLDS" (one of the marker in the markers list), it should get the ExchangeTags "FAWIU", "FAWI_E" and FAWI1770 based on "Closing Subsector Code".
Now, script should look for "country code" JA in the country_mapping file in the line where we have "Region Code" as "WORLDS".

As per the rule the country code "JA" exists in "Region Code" WORLDS. Finally script should return the ExchangeTags "FAWIU", "FAWI_E", FAWI1770 for the first record in afs.csv file.

Please give me suggestions or sample code that can do the above steps in a faster way.

Thanks in advance

Replies are listed 'Best First'.
Re: Help match relevant data from three data structures
by dragonchild (Archbishop) on Nov 27, 2007 at 16:17 UTC
    I'd be delighted to do the job for you. You can contact me here for my rate. Of course, I'm not cheap. Alternately, you could post this to jobs.perl.org or one of the rent-a-coder sites.

    Of course, you could just take the specification that you've described (which is quite good) and actually write it yourself. You'll be using a lot of hashes.

    That said, what do you want "faster" for? This should just run once, right? Who cares if it runs in 2 minutes or overnight so long as it gets done ... ?


    My criteria for good software:
    1. Does it work?
    2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
Re: Help match relevant data from three data structures
by gamache (Friar) on Nov 27, 2007 at 16:23 UTC
    In my opinion, your best bet is to use Perl in conjunction with a relational database, via the DBI interface. Perl data structures don't represent this data ideally; there are a lot of relations here which can be expressed more clearly using a RDBMS and SQL, than in Perl using nested hashes/arrays. If you don't have a database system handy, you can grab and install DBD::SQLite, which is a small, zero-configuration SQL database system.

    If you really want to do this in pure Perl, here are some tips:

    Rather than having space- or pipe-separated string lists in your data structures, use array refs instead, as in:

    $country_data->{WORLDS}{Region_code} = [ split /\|/, 'ARG|AU|BELG|BRAZ +|CAN|...' ];

    Write a lot of intermediate functions like get_exchangetags_from_countrycode() and is_country_in_region() and write your higher-level routine in terms of these, rather than making one monolithic routine which tweaks all your data structures directly. This is called 'abstraction', and it not only frees you to think at a higher level without worrying about lower-level details, it is a godsend should you ever have to change the layout of your data structures.