PerlMonks  

OT (for now): Mis-spelling research

by BrowserUk (Patriarch)
on Oct 15, 2003 at 10:43 UTC ( [id://299371]=perlquestion )

BrowserUk has asked for the wisdom of the Perl Monks concerning the following question:

This is a half-notion, with a view to maybe some code (in perl of course) at some future point.

Is anyone aware of any research papers, discussion etc on

  1. What words are most frequently mis-spelt (English (UK preferably)).
  2. Research that considers why particular words are mis-spelt.
  3. Research that considers who mis-spells which words...educational background, socio-economic background, IQ etc.

Obviously, available web-references would be ideal, but requestable sources or books would also be of interest.

TIA.


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
Hooray!

Replies are listed 'Best First'.
Re: OT (for now): Mis-spelling research
by derby (Abbot) on Oct 15, 2003 at 12:00 UTC
    You have to love the net. A few seconds on google reveals - www.spellingsociety.org. I'm not sure if it has what you need ... I'm just floored that there is such a thing.

    -derby

      Yes. I found that a while ago. They reference a lot of studies and research, but unfortunately only quote small snippets without context in support of their particular take on the world. They also tend to concentrate on studies of children who are learning to spell, rather than a broad range of age groups or other categorisations, which I am more interested in.

      I would dearly like to read many of the original studies that they quote from, but I've had difficulty trying to track them down on the web. I'll probably end up going to the British Library to obtain copies if my interest continues to that level, but hoped that some of the linguist monks might know of better on-line sources than I have found so far.


      Examine what is said, not who speaks.
      "Efficiency is intelligent laziness." -David Dunham
      "Think for yourself!" - Abigail
      Hooray!

        This is a very comprehensive paper on spelling correction research, available in the ACM Digital Library. You need to
        log in with an account to access the full .pdf file.

        Techniques for automatically correcting words in text

        I have the full pdf file in case you want it.
        Cheers:)
Re: OT (for now): Mis-spelling research
by jdtoronto (Prior) on Oct 15, 2003 at 14:20 UTC
    Another thought,

    Like you, I find that the more passionately I feel about a subject and the more infromation I have to convey, the more errors I make. Most of my errors are consistent, from becomes form for example.

    However, I do notice other phenomena.

    • People who learned to 'touch type' in the classical methods used to teach typists make these errors far less frequently. I suspect because they can watch the words forming on the screen as they type.
    • Those with a classical, broad and rigorous education, typical for example, of some English public schools and Australian private schools, make the mistakes less frequently. This is not universal, it depends significantly upon the attitude of the school and the teaching staff.
    • Those who learned English as a second language, after initial difficulties with grammar and syntax, were amongst the best spellers.
    • Socio-economic background seems not to be an indicator. The main indicators for poor spelling and grammar, in my experience, are:
      • narrow education base
      • unfamiliarity with the written language (they don't read a lot).
    • IQ is an interesting one, some of my brightest students were the most atrocious spellers. Generally however these were the ones destined not to ultimately succeed. There was a distinct correlation to attitude to language and learning that translated into success after graduation. The poor spellers with high IQ generally had a poor attitude to detail which was reflected not only in spelling and grammar but also in lab work and submitted assignments. Those of similar IQ who exhibited better spelling were more likely to produce better quality lab work, more complete and better argued assignments and generally succeeded more readily in the workplace or in research.
    • As a teenager I was recruited into Mensa by some friends. I didn't stick around long. The members are of very high IQ, but many of them had an arrogance about how they handled their ability. Some had a very cavalier attitude to spelling and grammar, I used to call it the "I am so brilliant that people will clamour to sort out my mess just to receive my wisdom" attitude.
    My conclusion after some years of lecturing at University in the UK and Australia? Educational background and personal attitude have more to do with spelling than socio-economic factors or IQ. Followed closely of course by the "my fingers can't go as fast as my brain" phenomenon that you admit to and to which I am also prone.

    Great topic! jdtoronto

      I think you're right about the touch-typing skills. I've often wished I had taken lessons, rather than teaching myself my 3 1/2 fingers and 1 1/2 thumbs method.

      Interesting. Some of that correlates closely with my gut feel based on people I know or worked with etc., but some of it flatly contradicts it. Then again, my "sample" is quite small, my record one of (notoriously poor) memory, and correlation a finger-in-the-air affair.

      Without wishing to impugn the veracity of your data:), how much of your conclusions are subjective and how much would you say was "pretty accurate"? Dumb question I know, but "ask the horse" and you usually get a pretty good answer:)

      This all got started because of an on-line conversation in which 3 people, whose opinions I value, all reached disparate conclusions, which is unusual. None of us are linguists, teachers or members of any other profession that would give any weight at all to our conclusions, but we do generally, more or less, reach agreement after discussion. On this subject, we each had different experiences, and drew different conclusions.

      What I'm really looking for is some way to measure the effects of the phenomena (pressure, passion, mind-load), as well as any research data that might be available.

      If anyone has any thoughts as to how to write a program (in perl of course) to do this, I'm all ears.
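
      One possible starting point for the measurement side, as a minimal sketch rather than a tested methodology: show the typist a prompt, time them, and score what was typed against the prompt using edit distance. The names edit_distance and typing_score are made up for illustration, the five-characters-per-word convention for wpm is an assumption, and the part that applies pressure (a countdown, a distracting second task) is left out entirely.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Classic Levenshtein edit distance: the minimum number of single-character
# insertions, deletions and substitutions needed to turn $s into $t.
sub edit_distance {
    my ($s, $t) = @_;
    my @prev = (0 .. length($t));
    for my $i (1 .. length($s)) {
        my @cur = ($i);
        for my $j (1 .. length($t)) {
            my $cost = substr($s, $i - 1, 1) eq substr($t, $j - 1, 1) ? 0 : 1;
            push @cur, min($prev[$j] + 1, $cur[$j - 1] + 1, $prev[$j - 1] + $cost);
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# Score one typing attempt. Assumes a non-empty prompt and $seconds > 0.
# wpm uses the (assumed) convention of five characters per word.
sub typing_score {
    my ($prompt, $typed, $seconds) = @_;
    my $errors = edit_distance($prompt, $typed);
    return {
        errors         => $errors,
        errors_per_100 => 100 * $errors / length($prompt),
        wpm            => (length($typed) / 5) / ($seconds / 60),
    };
}
```

      Note that plain Levenshtein distance counts a transposition such as "teh" for "the" as two errors, so a study that cares about transpositions specifically would want a different measure.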


      Examine what is said, not who speaks.
      "Efficiency is intelligent laziness." -David Dunham
      "Think for yourself!" - Abigail
      Hooray!

        BrowserUK,

        I suppose what I am giving you is the distillation of over 30 years of my own schooling, work, teaching and thinking about the subject.

        Firstly, I am pedantic about spelling and try to be the same way about grammar. My parents didn't ensure I got a good education, but I was a voracious reader. After finishing high-school I earned every last cent of my university fees and educated myself. Got a great job and kept on learning. Over 20 years I completed eleven degrees in communications, electronics, mathematics, physics, bio-medical engineering and history. For 20 years I essentially worked in research, but when I started lecturing at universities I got a rude shock. My entire career was about doing the research and communicating the ideas to others. But as a lecturer I found this was not so important. In 1992 I resigned my third part time teaching post (one in UK, one in US and then one in Australia) after having been taken to the faculty board three times with complaints of unfair treatment when I expected dissertations to be intelligible and was challenged when I marked them down for poor English.

        Each of us, coming from our own place, will perceive things differently. I don't doubt that your colleagues have different experiences; they start from a different place and got where they are by different routes, I would expect. To be honest, my samples may not be too large either. I don't think any of my classes had more than 15, maybe 20 students, my fields of specialisation being quite esoteric. My work experience was always in small groups (researchers tend to be that way!), but within LARGE research institutes or institutions, where the lunch room talk amongst my colleagues and me was often about the problems they saw with rookie graduates. I was usually in the fundamental/bleeding edge team that had no more than 6-10 people at any given time.

        I had best stop, you got me onto one of my hobby horses!

        jdtoronto

      Was s/information/infromation/ an intentional joke considering that that was exactly the typo that you went on to discuss? :-)

      Two follow-up points. First for repetitive tasks (specifically assembly line work), it has been found that unintelligent people consistently do better than intelligent ones. Intelligent people are likely to get bored, distracted, experiment a little, etc, all of which worsens their performance.

      The other is that people who were trained by the classical methods used to teach typists (I was briefly) were judged by a measure where typos were very expensive. For instance you might have a wpm test where they test how many wpm you can sustain for 5 minutes or until your third typo, whichever comes first. This will teach accuracy.

      Given how much easier it is to erase mistakes on a computer than a manual typewriter, it is no wonder that few people who learned to type on computers become as careful.

Re: OT (for now): Mis-spelling research
by liz (Monsignor) on Oct 15, 2003 at 10:47 UTC
    I vaguely recall TREC having done some research in that area.

    Liz

Re: OT (for now): Mis-spelling research
by skx (Parson) on Oct 15, 2003 at 12:48 UTC

     Whilst I'm not aware of any real research I guess this is closely related to some work that I've been doing.

     After reading about Bayesian filters for a while, and being impressed that they work so well, I started thinking of other problems that could be solved statistically.

     One thing that I often do is misspell particular words, which don't get caught because I'll use the wrong word - like using "they" instead of "this". (Amazing how often I do that).

     Another class of errors is the holding of the shift key for too long. This resulted in the previous sentence starting "ANother...", and results in frequent uses of "THe", "LIvejournal", etc.

     The first problem I've not solved, but the second can be detected and corrected if you look at frequency analysis of letter pairs.

     I've written code that sums up the chances of a given letter being followed by another given letter. So for example the chance of "q" being followed by "u" is 95%. The chance of "T" being followed, legitimately, by "H" is 7%.

     With a big enough sample I can flag errors with 98% accuracy - without using a dictionary.

     Maybe this is a cool use for perl?
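
     The letter-pair idea could be sketched roughly as follows. This is not the code described above, just a minimal illustration under assumptions: the sub names (train_pairs, pair_prob, suspicious_words) are invented, the training text is assumed to be clean, and the threshold of 0.05 is an arbitrary choice for a tiny sample.

```perl
use strict;
use warnings;

# Build case-sensitive letter-pair counts from text assumed to be clean.
sub train_pairs {
    my ($text) = @_;
    my (%pair, %total);
    for my $word ($text =~ /([A-Za-z]+)/g) {
        for my $i (0 .. length($word) - 2) {
            my $lead   = substr($word, $i, 1);
            my $follow = substr($word, $i + 1, 1);
            $pair{$lead}{$follow}++;
            $total{$lead}++;
        }
    }
    return { pair => \%pair, total => \%total };
}

# Estimated chance that $lead is followed by $follow; 0 if $lead was never seen.
sub pair_prob {
    my ($model, $lead, $follow) = @_;
    my $seen = $model->{total}{$lead} or return 0;
    return ($model->{pair}{$lead}{$follow} // 0) / $seen;
}

# Return the words that contain at least one pair rarer than $threshold.
sub suspicious_words {
    my ($model, $text, $threshold) = @_;
    my @flagged;
    WORD: for my $word ($text =~ /([A-Za-z]+)/g) {
        for my $i (0 .. length($word) - 2) {
            my $p = pair_prob($model, substr($word, $i, 1), substr($word, $i + 1, 1));
            if ($p < $threshold) {
                push @flagged, $word;
                next WORD;
            }
        }
    }
    return @flagged;
}

my $model = train_pairs("The cat sat on the mat. Then the other thing happened.");
print join(', ', suspicious_words($model, "THe cat sat on the mat", 0.05)), "\n";
# prints: THe
```

     With a sample this small, any pair that never appeared in training gets flagged, so a realistic run would need a much larger body of text.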

    Steve
    ---
    steve.org.uk

      I'm not actually looking at how to correct misspellings at this point, though I agree that the use of Bayesian filters for this makes a lot of sense and is a very cool use of perl.

      My interest stems from my noticing that the frequency and range of the words I misspell increases dramatically when I am trying to write or type something I feel passionate about and/or have a large volume of information to convey.

      It seems to me that my brain gets ahead of my fingers' ability to type the stuff, and I find that I will sometimes contract two similar words, which may be several words apart in the sentence I am trying to type, into a single word and completely omit the intervening words. Another common occurrence in my own typing is when I make a typo in a word that I normally spell correctly, notice and go back to correct it, and get a mental block about the spelling.

      An example of this occurred whilst typing the word "passionate" above. I omitted one of the S's, and interchanged the I & O: "pasoinate". Whilst it was perfectly obvious to me that it was misspelt, for a few seconds I simply could not see how to correct it. It required me to stop thinking about what I was going to type and concentrate specifically upon that word before I could see the correction.

      I've been trying to think of a way of measuring this phenomenon -- if that's what it is. I keep thinking about some sort of program to apply pressure to the subject typing (me:) and also some way to force me to type one piece of information whilst thinking about another. I thought I might find some research and possibly some test methods out there somewhere, but they have escaped me so far.


      Examine what is said, not who speaks.
      "Efficiency is intelligent laziness." -David Dunham
      "Think for yourself!" - Abigail
      Hooray!

Re: OT (for now): Mis-spelling research
by halley (Prior) on Oct 15, 2003 at 13:02 UTC

    There is a list of a few hundred English misspellings in Microsoft Word's Auto-Correct List file. It's a DBCS file but I haven't bothered to figure out the actual encoding. I just stripped any words that included non-ASCII, then started adding my own observations to the file.

    With that as a starting point, I wrote a little HTML proxy called Typoxy. When I use it, the proxy corrects spelling and minor grammatical errors as I browse the web. Very nice way to keep my sanity as I browse certain blogs that are littered with examples of the failings of American schooling. I suppose it could just count the errors instead, and thus get a somewhat slanted statistical view of the world of mistakes.
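
    The "count the errors instead" variant might look something like this. It is a sketch only, not Typoxy itself: the four entries in %FIX are hypothetical stand-ins for a real correction list, and the sub names are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample entries; a real list would be loaded from a file.
my %FIX = (
    teh        => 'the',
    definately => 'definitely',
    recieve    => 'receive',
    occured    => 'occurred',
);

# Return the replacement for one word (or the word itself if it is not in
# the list), recording each hit in the tally hash passed by reference.
sub fix_word {
    my ($word, $tally) = @_;
    my $fix = $FIX{ lc $word };
    return $word unless defined $fix;
    $tally->{ lc $word }++;
    # Preserve a leading capital, e.g. "Teh" becomes "The".
    return $word =~ /^[A-Z]/ ? ucfirst($fix) : $fix;
}

# Return the corrected text plus a hash ref of misspelling => count.
sub correct_and_count {
    my ($text) = @_;
    my %tally;
    $text =~ s{\b([A-Za-z]+)\b}{ fix_word($1, \%tally) }ge;
    return ($text, \%tally);
}

my ($fixed, $counts) = correct_and_count("Teh cat did definately recieve teh fish.");
print "$fixed\n";
print "$_: $counts->{$_}\n" for sort keys %$counts;
```

    Accumulating the tally across many pages would give the slanted statistical view mentioned above, limited to whatever misspellings the list already knows about.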

    I think any such research should be amended to the Moby Lexicon Project.

    --
    [ e d @ h a l l e y . c c ]

Re: OT (for now): Mis-spelling research
by inman (Curate) on Oct 15, 2003 at 12:54 UTC
    A brute force way to find the most often misspelled word that is in everyday use would be to ask the guys at google to trawl their logs to come up with a 'zeitgeist' fact. The statistics would be based on the number of times the 'did you mean to search for ...' link was followed and the words that were suggested.

    In the meantime I found this after typing misspelled into Google...

    inman

Re: OT (for now): Mis-spelling research
by William G. Davis (Friar) on Oct 15, 2003 at 11:22 UTC

    Mis-spelling research

    Did you mean misspelling research? Sorry, I couldn't help myself :)

Re: OT (for now): Mis-spelling research
by rkg (Hermit) on Oct 15, 2003 at 14:49 UTC
    There are misspellings, and there are typos. If interested in the latter as well, there's an interesting old Perl Journal article on typos which made it into the Games, Diversions & Perl Culture: Best of the Perl Journal book. The article discusses qwerty vs. dvorak, clearly wrong vs. plausible typos, and typos in non-Roman alphabet languages. Good article.
Re: OT (for now): Mis-spelling research
by Ella (Acolyte) on Oct 15, 2003 at 17:32 UTC
    You may be able to find some information that helps to answer your question (or at least to make it more interesting) here. You shall probably have the most fruitful line of enquiry if you do a search under "Writing Systems" or under "Text/Corpus Ling". If that fails, you might try posting to the List.

    Hope that helps, I'll be interested to see where you're going with this.

    'share and enjoy'

      Perfect. Thank you greatly.


      Examine what is said, not who speaks.
      "Efficiency is intelligent laziness." -David Dunham
      "Think for yourself!" - Abigail
      Hooray!

Re: OT (for now): Mis-spelling research
by Not_a_Number (Prior) on Oct 15, 2003 at 19:45 UTC

    As rkg pointed out briefly above, I think that you should maybe consider more closely the difference between misspellings and typos. I have been working for a very long time in publishing and translation (Perl is Just Another Hobby). To take an example, there is a difference between people who will submit a manuscript containing a spelling mistake such as ‘definately’ and those who might let through a typo such as ‘deifnitely’. There are, of course, borderline cases. But I think that in most cases the difference is clear.

    Another example: using pen and ink (admittedly an increasingly less frequent phenomenon), I will never write ‘teh’ (for ‘the’), but this happens frequently when I’m typing on a keyboard. Same thing for the classic ‘from/form’ confusion.

    I am not up to date with current research, but back when I was, most efforts were focused on spelling (and how to ‘improve’ it) rather than on typos.

    At the risk of straying even further OT, it would be interesting to see research on how spellcheckers have affected spelling habits. Back in the old days, faced with a word that I was unsure about (let’s say ‘vicissitudes’), I would get out a dictionary. Now, I type some approximation into M$ Word, and if it comes out with a squiggly red underline underneath, right-click on it in the hope that the spellchecker will suggest the right spelling...

    dave

      In regards to typos such as "teh/the" and "form/from" mentioned above I have noticed a pattern to my mistakes. I tend to make these mistakes with letters that are typed with different hands. For example my left hand will hit "t" and then "e" before my right hand hits "h". I am particularly horrible at hitting the letter "a" too soon. I have also noticed that I rarely hit letters early with my right hand.

      Strange that my left hand gets ahead of my right hand so often as I am a righty.
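
      A pattern like this could be checked mechanically. The sketch below assumes the standard touch-typing hand assignment on a QWERTY layout and looks only at typos that are a single swap of two adjacent letters; the sub names are invented for illustration.

```perl
use strict;
use warnings;

# Assumed hand assignment for standard touch typing on a QWERTY layout.
my %HAND;
$HAND{$_} = 'L' for split //, 'qwertasdfgzxcvb';
$HAND{$_} = 'R' for split //, 'yuiophjklnm';

# If $typed differs from $intended by exactly one swap of adjacent letters,
# return the two letters in the order they were typed; otherwise return nothing.
sub adjacent_swap {
    my ($intended, $typed) = @_;
    return if length($intended) != length($typed);
    my @diff = grep { substr($intended, $_, 1) ne substr($typed, $_, 1) }
               0 .. length($intended) - 1;
    return unless @diff == 2 && $diff[1] == $diff[0] + 1;
    my ($i, $j) = @diff;
    return unless substr($intended, $i, 1) eq substr($typed, $j, 1)
               && substr($intended, $j, 1) eq substr($typed, $i, 1);
    return (substr($typed, $i, 1), substr($typed, $j, 1));
}

# Classify a typo as 'cross-hand' or 'same-hand'; undef if it is not a
# simple adjacent swap of two letters found in the hand table.
sub classify_swap {
    my ($intended, $typed) = @_;
    my ($first, $second) = adjacent_swap(lc($intended), lc($typed)) or return;
    my ($h1, $h2) = @HAND{ $first, $second };
    return unless defined($h1) && defined($h2);
    return $h1 eq $h2 ? 'same-hand' : 'cross-hand';
}

print classify_swap('the', 'teh'), "\n";    # prints: cross-hand
print classify_swap('from', 'form'), "\n";  # prints: cross-hand
```

      Run over a log of intended/typed pairs, the ratio of cross-hand to same-hand swaps would show whether the impression holds up.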


      mr greywolf
      ....and then there are the spellcheck typos, which produce perfectly spelled but completely incorrect and out of context words in texts that have been 'proofed' (drives me mad.) I'd also be interested to know how much the default "English - US" setting on most spellcheckers affects non-US English speakers' spelling, especially when autocorrect is on (I despise autocorrect) - use of "color" instead of "colour", "gray" instead of "grey" etc etc.

      gosh, this is rather OT, isn't it?

      'share and enjoy'
Re: OT (for now): Mis-spelling research
by johndageek (Hermit) on Oct 15, 2003 at 17:52 UTC
    Wonderful question! ++

    Your point number 3 is very important, as in what context is the misspelling occurring? Also, defining what a misspelling is is important. e.g. For people through to many things four there friends too catch. vs. fuor pepole trhew too many things fro thier friends to catch. Another catch may be misspelled words such as "Suzeez Donut Shoppe"

    If you are looking for a way to sample spellings that you may have some control over, write a spider program to grab web pages, then spell check those pages. Presuming of course the pages you scan are written or proofed by the types of people you wish to test. Count misspellings.
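
    The counting step might be sketched like this, leaving the spider out. It assumes the page text has already been fetched and stripped of markup, and that a word list is available as a hash of lower-case words; count_unknown is an invented name.

```perl
use strict;
use warnings;

# Count the words in $text that are missing from the word list, given as a
# hash ref of lower-case known words. Returns a hash ref of word => count.
sub count_unknown {
    my ($text, $known) = @_;
    my %unknown;
    for my $word ($text =~ /([A-Za-z']+)/g) {
        my $w = lc $word;
        $w =~ s/^'+|'+$//g;    # trim quote marks around a word
        next if $w eq '' || $known->{$w};
        $unknown{$w}++;
    }
    return \%unknown;
}

# A tiny hypothetical word list, just enough for the two example sentences.
my %known = map { ($_ => 1) }
    qw(for people threw through too to many things four there their friends catch);

my $bad = count_unknown("fuor pepole trhew too many things fro thier friends to catch.", \%known);
print join(', ', sort keys %$bad), "\n";
# prints: fro, fuor, pepole, thier, trhew
```

    The first example sentence above ("For people through to many things four there friends too catch.") yields no hits at all with this approach, which is exactly the definitional problem: a dictionary check only catches non-words.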

    Great fun dageek

Re: OT (for now): Mis-spelling research
by Anonymous Monk on Oct 15, 2003 at 20:23 UTC
    Here is something interesting that I found. It may not be of any interest, but it does concern the human brain and the way that words are spelled:

    "Aoccdrnig to rscheearch at Cmabrigde Uinervtisy, it deosn't mttaer in waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht the frist and lsat ltteer be at the rghit pclae. The rset can be a total mses and you can sitll raed it wouthit a porbelm. Tihs is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe."
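
    The transformation in the quoted passage is easy to reproduce. A minimal sketch, assuming a word is any run of ASCII letters and using invented sub names: keep the first and last letter of each word and shuffle whatever lies between.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Shuffle the interior letters of one word, keeping first and last in place.
# Words shorter than four letters have nothing to shuffle.
sub scramble_word {
    my ($word) = @_;
    return $word if length($word) < 4;
    my @inner = split //, substr($word, 1, -1);
    return substr($word, 0, 1) . join('', shuffle(@inner)) . substr($word, -1);
}

# Apply scramble_word to every run of letters; punctuation and spacing stay put.
sub scramble_text {
    my ($text) = @_;
    $text =~ s/([A-Za-z]+)/scramble_word($1)/ge;
    return $text;
}

print scramble_text("According to research, reading scrambled words is possible."), "\n";
```

    The output differs on every run because the shuffle is random, and a shuffled word can occasionally come out unchanged.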

    ~fluhmann

      Wadda yer no!


      Examine what is said, not who speaks.
      "Efficiency is intelligent laziness." -David Dunham
      "Think for yourself!" - Abigail
      Hooray!

        I wonder if that's where the guy that sent it to me pulled it from. Hhmmm....
