PerlMonks  

Re: Estimating Vocabulary

by belg4mit (Prior)
on Mar 27, 2002 at 03:36 UTC ( [id://154575]=note )


in reply to Estimating Vocabulary

Well, I suppose that depends on your definition of "word". Are am, are, is, and was each separate words? Also, IIRC, the English language is purported to have a lexicon on the order of 320,000 words*. The average American vocabulary has been in steady decline since the early twentieth century, at which point I believe it was on the order of several thousand words*. A few things to consider:
  • dictionaries may contain archaic forms
  • does your dictionary contain proper nouns? do you care?
  • the content of the language is not evenly distributed across the lexicon, e.g. a single word (sans modifiers) for "love" and a plethora for shades of blue.

  * I shall attempt to find evidence to support this. An enlightening thread, but then again it is Usenet... Apparently this is a pretty hotly contested topic.

    --
    perl -pe "s/\b;([st])/'\1/mg"

    Re: Re: Estimating Vocabulary
    by YuckFoo (Abbot) on Mar 27, 2002 at 04:06 UTC
      Good points all, belg4mit.

      * If the sample is large enough, it will contain roughly the same percentage of archaic words as the dictionary does, so that should work itself out.

      * I had already removed proper nouns (that is, any word containing an uppercase letter). I should have noted that, but again I'm not sure it matters with a large enough sample.

      * I'm not sure how words should really be counted; I'm still looking for a reference myself. For my purpose, I am counting run, runs, ran, and running as distinct words.

      I'm just looking for a ballpark number. It seems like a good ballpark to me that if the boy consistently knows 20-25% of the words in the sample, he should know 20-25% of the words in $DICT.
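
      The extrapolation above can be sketched in a few lines. This is only a rough illustration in Python, not the script actually used in this thread: the function name `estimate_vocabulary`, the `is_known` callback (a stand-in for asking the boy about each word), and the default sample size are all assumptions made for the example.

```python
import random


def estimate_vocabulary(dictionary_words, is_known, sample_size=200, seed=None):
    """Estimate how many dictionary words a person knows from a random sample.

    dictionary_words: iterable of words (hypothetical stand-in for $DICT).
    is_known: callable returning True if the person knows the word (assumed).
    Returns (fraction_known_in_sample, estimated_known_word_count).
    """
    # Drop any entry containing an uppercase letter, mirroring the
    # proper-noun filter described in the post.
    candidates = [w for w in dictionary_words if w == w.lower()]
    if not candidates:
        raise ValueError("no candidate words left after filtering")

    # Draw a simple random sample without replacement.
    rng = random.Random(seed)
    sample = rng.sample(candidates, min(sample_size, len(candidates)))

    # Fraction of sampled words that are known.
    known = sum(1 for w in sample if is_known(w))
    fraction = known / len(sample)

    # Scale the sample fraction up to the whole filtered dictionary.
    return fraction, round(fraction * len(candidates))
```

      With a fraction of 0.20 to 0.25 and a filtered dictionary of N words, the estimate is simply 0.20 N to 0.25 N. How tight that ballpark is depends on the sample size, and it inherits whatever definition of "word" the dictionary uses.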

      If anyone has pointers to real vocabulary development numbers and counting methods, I'd like to get 'em.

      YuckFoo
