PerlMonks
perlfaq6: How can I print out a word frequency or line frequency summary?

You've really got two programming problems. One is the business logic: how to do word counts (or whatever stats you want to generate given a body of text). The other is how to scrape Wikipedia.

If it were just some third-tier website you were scraping, I would expect that you would have to deal with a separation of concerns: you would find a module that helps with the business logic, and another that helps with the scraping (plus something to help with the parsing). But this is Wikipedia, and it's possible that something already exists that can scrape Wikipedia more effectively "out of the box." There may even be something that can handle your language statistics.

You have to search. Type "Wikipedia" into the search box. Try it now: Wikipedia. There you find all sorts of CPAN solutions that mention Wikipedia. You browse through them. You find one that seems to suit your needs. And then you incorporate it into your project. If you're lucky, you find something where you just write a wrapper around it and all the functionality you need is provided. More likely, you find something that gets you part-way there, and the rest is what we call programming.

When I did a quick search I was sort of impressed with Text::Corpus::Summaries::Wikipedia. But this might be a case where you're better off using WWW::Scraper::Wikipedia::ISO3166 or WWW::Wikipedia (more general solutions), and then coming up with your own business logic, or letting another CPAN solution take over where the Wikipedia modules leave off.

Dave

In reply to Re: Create a dictionary from wikipedia by davido
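For the business-logic half, here is a minimal sketch of a word frequency count, in the spirit of the perlfaq6 entry rather than a copy of it. The sub name `word_frequencies`, the word pattern, and the sample text are all assumptions made for illustration; in a real program the text would come from whichever Wikipedia module you settle on.

```perl
use strict;
use warnings;

# Count how often each word occurs in a block of text.
# Assumption: a "word" starts with a word character and may contain
# embedded apostrophes and hyphens. Words are folded to lower case.
sub word_frequencies {
    my ($text) = @_;
    my %seen;
    while ( $text =~ /(\w[\w'-]*)/g ) {
        $seen{ lc $1 }++;
    }
    return \%seen;
}

# Hypothetical sample text standing in for a scraped article.
my $sample = "The cat sat on the mat. The mat didn't mind.";
my $freq   = word_frequencies($sample);

# Most frequent words first; ties are broken alphabetically.
for my $word ( sort { $freq->{$b} <=> $freq->{$a} || $a cmp $b } keys %$freq ) {
    printf "%5d %s\n", $freq->{$word}, $word;
}
```

Keeping the counting in its own sub means the scraping side can change (a different CPAN module, a local dump) without touching the statistics.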