http://qs321.pair.com?node_id=662171


in reply to Re: Building a database from XML data feed
in thread Building a database from XML data feed

Yes, I'm throwing out the old work. It requires building a new table for every company every year, is undocumented, ... I'll stop there.

The spec is understandable. The open question is whether to build a facility for manually merging specific partial feeds, or just rebuild the whole thing daily from a full feed. I would also want a rollback, and maybe a way to lock fields so they are not updated.
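To make the partial-merge option concrete, here is a minimal sketch in Python (used purely for illustration, not because the project would be written in it). It assumes each company is held as a plain dictionary of string fields keyed by its EntityReference; the function name `merge_feed` and the `locked` mapping are hypothetical, not part of the feed spec.

```python
import copy

def merge_feed(current, partial, locked):
    """Merge a partial feed into the current data.

    current, partial: {company_id: {field_name: value}}
    locked: {company_id: set of field names that must not be overwritten}
    Returns (merged, snapshot); keep the snapshot to roll back later.
    """
    snapshot = copy.deepcopy(current)   # untouched copy for rollback
    merged = copy.deepcopy(current)
    for company_id, fields in partial.items():
        record = merged.setdefault(company_id, {})
        protected = locked.get(company_id, set())
        for name, value in fields.items():
            if name in protected:
                continue                # locked field: keep the existing value
            record[name] = value
    return merged, snapshot
```

Rolling back is then just a matter of putting the snapshot back in place. A daily full rebuild would not need the merge loop at all, but it would still need the locked-field check if manual corrections are to survive.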

I've also been pondering a model built around loading the full XML feed right into memory at server startup, which might make it more robust and configurable. And yes, they say the schema will change, but not how. I figure the most important parts won't change, but I would like to make it configurable by the admin so I do not have to support it forever. Certainly an update will be issued when an executive of a company is hired or retires; new types of officers could also be added, etc.

So yes, I can see a way to model the largest features of the XML structure in DBIx, but I am intrigued by the possibility of not reducing that structure much at all. Somewhere, though, I'll have to do some degree of linking feed data to manually entered data, or importing both into the same database. It can all just be string data. Maybe JSON and YAML could be useful.

The data looks like this. The feed probably has thousands of companies; here's just one. I think storing 500 companies is more what we need for now, though.

<?xml version="1.0" encoding="ISO-8859-1"?>
<Feed ExtractDate="08/08/2006" ExtractTime="11:30:41">
  <ENTITY EntityReference="0000127509" LegalName="21st Century Holding Co." Status="A">
    <COMPANY>
      <Identity>
        <OfficialName>21st Century Holding Co.</OfficialName>
        <ShortName>21st Century Holding Co.</ShortName>
        <Status>Active</Status>
        <CountryCode>USA</CountryCode>
        <Region>South Atlantic</Region>
        <CompNumber>00096995</CompNumber>
        <CIK>0001069996</CIK>
        <MergentIndustryCode>8.2</MergentIndustryCode>
        <CommonTicker>TCHC</CommonTicker>
        <CommonExchange>NMS</CommonExchange>
        <CommonCusip>90136Q100</CommonCusip>
        <Street1>4161 N.W. 5th Street</Street1>
        <City>Plantation</City>
        <State>FL</State>
        <Country>USA</Country>
        <Zipcode>33317</Zipcode>
        <PhoneNumber>954 581 9993</PhoneNumber>
        <Email>fedinfo@fedusa.com</Email>
        <WebSite>www.fedfirst.com</WebSite>
        <FYE>12/31/2005</FYE>
      </Identity>
      <BusinessActivities>
        <SIC Primary="6331" Secondary="6719"/>
        <NAIC Primary="524126" Secondary="551112"/>
        <TextSection Title="Business Summary" Date="06/01/2006">
          <![CDATA[
<p>21st Century Holding is an insurance holding company, which, through its subsidiaries, controls the insurance underwriting, distribution and claims process. Co. underwrites personal automobile insurance and homeowners and mobile home property and casualty insurance in the State of Florida through its subsidiary, Federated National Insurance Company. Co. has underwriting authority for third-party insurance companies which it represents through a managing general agent. Co. also offers financing to its own and third-party insureds through its subsidiary, Federated Premium Finance, Inc., and pays advances through Fed First Corp.</p>
          ]]>
        </TextSection>
      </BusinessActivities>
      <Executives>
        <Section Title="Officers">
          <Executive FirstName="Edward" MiddleName="J." LastName="Lawson" Title="Chmn., Pres."/>
          <Executive FirstName="Richard" MiddleName="A." LastName="Widdicombe" Title="C.E.O."/>
          <Executive FirstName="Michele" MiddleName="V." LastName="Lawson" Title="V.P., Agency Oper., Treas."/>
          <Executive FirstName="James" MiddleName="G." LastName="Jennings" Suffix="III" Title="C.F.O."/>
          <Executive FirstName="Keith" MiddleName="M." LastName="Linder" Title="C.O.O."/>
          <Executive FirstName="James" MiddleName="A." LastName="Epstein" Title="Sec."/>
        </Section>
        <Section Title="Directors">
          <Executive FirstName="Edward" MiddleName="J." LastName="Lawson" Title="Chmn."/>
          <Executive FirstName="Carl" MiddleName="" LastName="Dorf"/>
          <Executive FirstName="Bruce" MiddleName="" LastName="Simberg"/>
          <Executive FirstName="Charles" MiddleName="B." LastName="Hart" Suffix="Jr."/>
          <Executive FirstName="Richard" MiddleName="W." LastName="Wilcox" Suffix="Jr."/>
          <Executive FirstName="Peter" MiddleName="" LastName="Prygelski"/>
        </Section>
      </Executives>
      <FinData_Generated>
        <Report>
          <ReportDate>03/31/2006</ReportDate>
          <ReportType>Q1</ReportType>
          <Auditor>U</Auditor>
          <Currency>USA</Currency>
          <Consolidated>True</Consolidated>
          <fi Mapcode="-402" Amount="23001737"/>
          <fi Mapcode="-384" Amount="0.83"/>
          <fi Mapcode="-379" Amount="53213270"/>
          <fi Mapcode="-365" Amount="8599042"/>
          <fi Mapcode="-364" Amount="40167125"/>
          <fi Mapcode="-356" Amount="227079885"/>
          <fi Mapcode="-344" Amount="93988871"/>
          <fi Mapcode="-337" Amount="28367811"/>
          <fi Mapcode="-333" Amount="6013312"/>
          <fi Mapcode="-310" Amount="25114709"/>
          <fi Mapcode="-249" Amount="36.8577792400461"/>
        </Report>
        ... 20 more reports here ...
          <ReportDate>03/31/2002</ReportDate>
          <ReportType>Q1</ReportType>
          <Auditor>U</Auditor>
          <Currency>USA</Currency>
          <Consolidated>True</Consolidated>
          <fi Mapcode="-402" Amount="6086503"/>
          <fi Mapcode="-384" Amount="0.22"/>
          <fi Mapcode="-379" Amount="14592615"/>
          <fi Mapcode="-365" Amount="6165671"/>
          <fi Mapcode="-364" Amount="5822488"/>
          <fi Mapcode="-356" Amount="59264371"/>
          <fi Mapcode="-344" Amount="17710206"/>
          <fi Mapcode="-337" Amount="549056"/>
          <fi Mapcode="-333" Amount="991370"/>
          <fi Mapcode="-310" Amount="9507000"/>
          <fi Mapcode="-249" Amount="16.2431471547281"/>
        </Report>
      </FinData_Generated>
      <Miscellaneous>
        <Employee Description="AppoximateFullTime" Count="135" AsOf="12/31/2005"/>
        <Shareholders Count="3000" AsOf="03/29/2006"/>
        <ShareHolderRelations Name="Becky Campillo" PhoneNumber="954-581-9993 x1257"/>
        <Incorporation Country="USA" State="FL" Month="3" Year="1991"/>
        <Provider ServiceType="Auditor" Name="McKean, Paul, Chrycy, Fletcher &amp; Co."/>
        <Provider ServiceType="Counsel" Name="Broad &amp; Cassel"/>
      </Miscellaneous>
      <StockSummary>
        <StockIssue Type="Common" Description="common">
          <StockOutstanding Amount="6048842.00" Units="SHR" Date="12/31/2004"/>
          <Par Amount="0.01" Units="USA"/>
          <Authorized Amount="37500000.00" Units="SHR" Unlimited="No"/>
          <Treasury Amount="696849.00" Units="SHR"/>
          <StockIdentity Ticker="TCHC" Exchange="Nasdaq National Market"/>
          <TextSection Title="Stock Splits" Date="06/01/2006">
            <![CDATA[
<p><font color="black">$0.01 par shares split in the form of a 50% stock dividend on Sept. 7, 2004.</font></p>
            ]]>
          </TextSection>
          <TextSection Title="Ownership" Date="06/01/2006">
            <![CDATA[
<p><font color="black">As of April 15, 2005, Edward J. Lawson and all directors and executive officers as a group held 25.1% and 33.1%, respectively of Co.'s outstanding common stock.</font></p>
            ]]>
          </TextSection>
          <TextSection Title="Voting Rights" Date="06/01/2006">
            <![CDATA[
<p><font color="black">Entitled to one vote per share.</font></p>
            ]]>
          </TextSection>
          <TextSection Title="Dividends Paid" Date="06/01/2006">
            <![CDATA[
<table border="1">
 <tr>
  <td> <p><font color="teal"><twocolumn>2001</twocolumn></font></p> </td>
  <td> <p><font color="teal"><twocolumn>0.08</twocolumn></font></p> </td>
  <td> <p><font color="teal"><twocolumn>2002</twocolumn></font></p> </td>
  <td> <p><font color="teal"><twocolumn>0.11</twocolumn></font></p> </td>
  <td> <p><font color="teal"><twocolumn>2003</twocolumn></font></p> </td>
  <td> <p><font color="teal"><twocolumn>0.32</twocolumn></font></p> </td>
 </tr>
</table><p/>
<p><font color="red"><footnote>&#65407;</footnote></font><font color="black">Adjusted for 3-for-2 split:</font></p>
<table border="1">
 <tr>
  <td> <p><font color="teal"><twocolumn>2004</twocolumn></font></p> </td>
  <td> <p><font color="teal"><twocolumn>0.32</twocolumn></font></p> </td>
  <td> <p><font color="teal"><twocolumn>[1]2005</twocolumn></font></p> </td>
  <td> <p><font color="teal"><twocolumn>0.32</twocolumn></font></p> </td>
  <td>&#65407;</td>
  <td>&#65407;</td>
 </tr>
</table><p/>
<p><font color="red"><footnote>[1]To Dec. 1</footnote></font></p>
            ]]>
          </TextSection>
          <TextSection Title="Options" Date="06/01/2006">
            <![CDATA[
<p><font color="black">Dec. 31, 2004, authorized for issuance, 3,688,500 shares; options outstanding, 1,119,575 shares. </font></p>
            ]]>
          </TextSection>
          <TextSection Title="Transfer Agent &amp; Registrar" Date="06/01/2006">
            <![CDATA[
<p><font color="black">Global Securities Transfer, Inc., Denver, CO</font></p>
            ]]>
          </TextSection>
          <TextSection Title="Price Range" Date="06/01/2006">
            <![CDATA[
<table border="1">
 <tr>
  <td>&#65407;</td>
  <td> <p><font color="green"><pricerange>2004</pricerange></font></p> </td>
  <td> <p><font color="green"><pricerange>2003</pricerange></font></p> </td>
  <td> <p><font color="green"><pricerange>2002</pricerange></font></p> </td>
  <td> <p><font color="green"><pricerange>2001</pricerange></font></p> </td>
  <td> <p><font color="green"><pricerange>2000</pricerange></font></p> </td>
  <td> <p><font color="green"><pricerange>1999</pricerange></font></p> </td>
  <td> <p><font color="green"><pricerange>1998</pricerange></font></p> </td>
 </tr>
 <tr>
  <td> <p><font color="green"><pricerange>High</pricerange></font></p> </td>
  <td> <p><font color="green"><pricerange>24.50</pricerange></font></p> </td>
  <td> <p><font color="green"><pricerange>23.59</pricerange></font></p> </td>
  <td> <p><font color="green"><pricerange>13.75</pricerange></font></p> </td>
  <td> <p><font color="green"><pricerange>3.88</pricerange></font></p> </td>
  <td> <p><font color="green"><pricerange>7 15/16</pricerange></font></p> </td>
  <td> <p><font color="green"><pricerange>7 3/4</pricerange></font></p> </td>
  <td> <p><font color="green"><pricerange>8 1/4</pricerange></font></p> </td>
 </tr>
 <tr>
  <td> <p><font color="green"><pricerange>Low</pricerange></font></p> </td>
  <td> <p><font color="green"><pricerange>9.17</pricerange></font></p> </td>
  <td> <p><font color="green"><pricerange>9</pricerange></font></p> </td>
  <td> <p><font color="green"><pricerange>3</pricerange></font></p> </td>
  <td> <p><font color="green"><pricerange>0.98</pricerange></font></p> </td>
  <td> <p><font color="green"><pricerange>2 7/16</pricerange></font></p> </td>
  <td> <p><font color="green"><pricerange>2 7/8</pricerange></font></p> </td>
  <td> <p><font color="green"><pricerange>5 3/4</pricerange></font></p> </td>
 </tr>
</table><p/>
            ]]>
          </TextSection>
          <TextSection Title="Offered" Date="06/01/2006">
            <![CDATA[
<p><font color="black">(1,250,000 shares) at $7.50 per share (proceeds to Co., $6.90 per share) on Nov. 10, 1998 through Gilford Securities Incorporated; and associates. Offering contained over-allotment options to cover 187,500 shares. Proceeds used for contribution to Federated National's capital to increase its underwriting capacity, repayment of a portion of the outstanding balance under Co.'s revolving line of credit agreement, financing of acquisitions and working capital and general corporate purposes.</font></p>
            ]]>
          </TextSection>
        </StockIssue>
      </StockSummary>
    </COMPANY>
  </ENTITY>
  ... more entities here ...
</Feed>
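For the load-it-all-into-memory idea, a rough sketch in Python (again only for illustration) might index each ENTITY by its EntityReference and keep every value as a string. The element and attribute names come from the sample above; `load_feed`, the shape of the resulting dictionary, and the cut-down `SAMPLE` string are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Cut-down excerpt of the feed shown above, used only to exercise the sketch.
SAMPLE = """<Feed ExtractDate="08/08/2006" ExtractTime="11:30:41">
  <ENTITY EntityReference="0000127509" LegalName="21st Century Holding Co." Status="A">
    <COMPANY>
      <Identity>
        <OfficialName>21st Century Holding Co.</OfficialName>
        <CommonTicker>TCHC</CommonTicker>
      </Identity>
      <Executives>
        <Section Title="Officers">
          <Executive FirstName="Edward" MiddleName="J." LastName="Lawson" Title="Chmn., Pres."/>
          <Executive FirstName="Richard" MiddleName="A." LastName="Widdicombe" Title="C.E.O."/>
        </Section>
      </Executives>
    </COMPANY>
  </ENTITY>
</Feed>"""

def load_feed(xml_text):
    """Return {EntityReference: {"identity": {...}, "executives": [...]}}."""
    root = ET.fromstring(xml_text)
    companies = {}
    for entity in root.iter("ENTITY"):
        identity = entity.find("COMPANY/Identity")
        # Copy whatever child elements exist, so new Identity fields need no code change.
        fields = {}
        if identity is not None:
            fields = {child.tag: (child.text or "") for child in identity}
        # Keep each executive's attributes as-is and record which section it came from.
        executives = [
            dict(ex.attrib, Section=section.get("Title", ""))
            for section in entity.iterfind("COMPANY/Executives/Section")
            for ex in section.iterfind("Executive")
        ]
        companies[entity.get("EntityReference")] = {
            "identity": fields,
            "executives": executives,
        }
    return companies

FEED = load_feed(SAMPLE)
```

Because the loader copies tags and attributes generically instead of naming each one, a new officer type or an extra Identity field would simply show up in the dictionary. For the real file, `ET.parse(path)` would read the bytes directly and honour the declared ISO-8859-1 encoding.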

Re^3: Building a database from XML data feed
by mattr (Curate) on Jan 14, 2008 at 05:03 UTC
    Replying to myself here. I just found out that I'll have to allow companies to be added manually, not just from the feed, so it seems I will have to use a database. There will also probably be 1000 companies, maybe growing to 10,000 over some years. Thanks for your help.
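One way to keep feed companies and hand-entered companies in the same table is to record where each row came from. The following is a minimal sketch using Python's built-in sqlite3 module, for illustration only; the table layout, the `source` column, and the function names are all assumptions, not anything from the feed spec.

```python
import sqlite3

def open_db(path=":memory:"):
    """Create (if needed) a single company table shared by feed and manual rows."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS company (
               id         INTEGER PRIMARY KEY,
               entity_ref TEXT UNIQUE,          -- NULL for manually added rows
               name       TEXT NOT NULL,
               source     TEXT NOT NULL CHECK (source IN ('feed', 'manual'))
           )"""
    )
    return conn

def upsert_from_feed(conn, entity_ref, name):
    """Update the row for this EntityReference, or insert it if it is new."""
    cur = conn.execute(
        "UPDATE company SET name = ? WHERE entity_ref = ?", (name, entity_ref)
    )
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO company (entity_ref, name, source) VALUES (?, ?, 'feed')",
            (entity_ref, name),
        )
    conn.commit()

def add_manual(conn, name):
    """Insert a hand-entered company; a daily feed reload never touches it."""
    cur = conn.execute(
        "INSERT INTO company (name, source) VALUES (?, 'manual')", (name,)
    )
    conn.commit()
    return cur.lastrowid
```

A daily rebuild could then delete or refresh only rows where source is 'feed' and leave the manual ones alone. At 1000 to 10,000 companies the row counts are small enough that almost any database would cope.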