http://qs321.pair.com?node_id=297389


in reply to What are the monks doing with Perl and Linguistics?

Collins Dictionaries were doing a lot of corpus linguistics using Perl when I left, back in 2002. They look after the Collins/Birmingham University Bank of English, which is a great big huge corpus. There are also a variety of monitor corpora, which are used to gauge changes in usage over time.

Corpus data collection got a whole lot easier with the web ... ☺ -- Sitescooper is particularly handy for large-scale text collection (with permission, of course).

--
bowling trophy thieves, die!

Re: Re: What are the monks doing with Perl and Linguistics?
by Anonymous Monk on Nov 11, 2003 at 23:22 UTC
    I'm currently researching cross-lingual digital libraries and I use Perl, although I am fairly new to the language. I have just finished writing a light stemmer, some n-gram code, and some n-gram comparison code, and basically I'm at that 'generating stats' stage. I'm looking for similarities between documents, and differences too, and then I'll look at language, context, and so on. The idea is to make documents searchable in many different languages. I did a master's where I used Java, and made a system that could retrieve a similar English document in French and German. It kinda worked ;) I'm always interested in hearing what others are up to in that area, maybe we can swap some tools and share some ideas!! Ceejay
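
For readers wondering what "ngram comparison code" might look like, here is a minimal sketch. It is not the poster's code: it is written in Python rather than Perl, and the function names (`char_ngrams`, `dice_similarity`), the choice of character trigrams, and the Dice coefficient as the similarity measure are all assumptions made for illustration.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in text.

    Hypothetical helper: lowercases and collapses whitespace first,
    which is one of many possible normalisation choices.
    """
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def dice_similarity(a, b):
    """Dice coefficient between two n-gram multisets: 2*|A & B| / (|A| + |B|).

    Assumed measure; cosine or Jaccard similarity would work equally well here.
    """
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 0.0  # two empty profiles: define similarity as 0
    overlap = sum((a & b).values())  # multiset intersection keeps the minimum counts
    return 2.0 * overlap / total


if __name__ == "__main__":
    doc_a = char_ngrams("The library holds many digital documents")
    doc_b = char_ngrams("Digital documents are held in the library")
    print(round(dice_similarity(doc_a, doc_b), 3))
```

Character n-grams are a common choice for cross-lingual work because they need no language-specific tokenizer, although whether they suit a particular language pair is something the stats would have to show.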