Re: Text Analysis Tools to compare Slinker and Stinker?

Re: Re: Text Analysis Tools to compare Slinker and Stinker? by Cody Pendant (Prior) on Jan 22, 2003 at 04:42 UTC
It's not that I don't appreciate the effort, but I'm going to have to ask people to stop trying to help me with the social and administrative aspects of my problem, really. I won't explain the rules of the community involved; that would be silly. But if we were convinced that the two people were the same, action would be taken. That's all you need to know.

If a text-analysis tool showed that the two had very similar writing styles, at a level where the odds of coincidence were 1000 to one, then that would be considered proof.

But, having used the Fathom module (see above), I've got nothing conclusive, I'm afraid. It's a very useful tool, but it hasn't proven or disproven anything. It turns out there are fewer differences between two randomly chosen posters than between Slinker and Stinker.

Another angle of attack on this problem, which I hadn't thought of before, is mis-spellings -- Slinker has spelt "happening" as "happenning" twice, but Stinker gets it right every time...

-- “Every bit of code is either naturally related to the problem at hand, or else it's an accidental side effect of the fact that you happened to solve the problem using a digital computer.” M-J D
Re: Re: Re: Text Analysis Tools to compare Slinker and Stinker? by BigLug (Chaplain) on Jan 22, 2003 at 05:15 UTC
This is a great idea for a problem like yours. Combining readability, tuples, Fathom etc. with misspellings (or is it mispellings or missspellings or ...), I wonder how successful a module for comparing two texts could be. I might take a look at that sometime in the next few weeks.

I really think that misspellings might be a great key to comparing two texts. Judging from the above information, I'd have to guess that Stinker != Slinker. It would be unusually difficult to fix your spelling just to get back into a web community. (IMHO)
Re: Re: Re: Re: Text Analysis Tools to compare Slinker and Stinker? by John M. Dlugosz (Monsignor) on Jan 22, 2003 at 07:13 UTC
This came up a few years ago in another forum I've been a part of, as a general question about recognising anonymous text rather than over a particular incident in the forum, after something like that happened in the news.

I found I was able to write in a manner that neither people nor software could correctly match with my reference material. About 20% of the people who tried had the same result. Others were matched, and were often surprised by what had tripped them up when we posted our guesses.

Some people used the very tools under discussion to pre-check their work before posting the anonymous sample. Naturally, the computer's guess came up as a non-match for them. I furthermore used writing constructs that are among my pet peeves, and a simpler vocabulary (as measured by a reading-level tool), and tripped up the human guessers as well. I think keeping the reading "level" down helped against the automatic scans too, since simpler text has more in common with all text.

BTW, almost everyone who tried was a successful (published, that is) writer.

-- John
Re^4: Text Analysis Tools to compare Slinker and Stinker? by mojotoad (Monsignor) on Jan 22, 2003 at 06:57 UTC
'Misspellings' are precisely where Bayesian filtering, once trained, will help tremendously (though, as others have pointed out, never conclusively). As an example from the anti-spam efforts: once Bayesian filtering was enabled, people were amazed that the single token with the highest probability of indicating spam was 'FF0000', the hex value for bright red. Unexpected, but damning. Consistently misspelt words could show up in the same way.

Mattt
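A rough sketch of the idea, assuming a simple naive-Bayes-style token model rather than any particular spam filter's implementation. The function names (`tokens`, `train`, `log_odds`) and the add-one smoothing are illustrative choices, not taken from an existing module. Train one model on posts known to be from the suspended poster and another on a pool of other posters, then score a new post:

```perl
use strict;
use warnings;

# Split text into lowercase word tokens (letters, digits, apostrophes).
sub tokens {
    my ($text) = @_;
    return map { lc $_ } $text =~ /[A-Za-z0-9']+/g;
}

# Count tokens over a list of training posts.
sub train {
    my (@posts) = @_;
    my %count;
    my $total = 0;
    for my $post (@posts) {
        for my $tok (tokens($post)) {
            $count{$tok}++;
            $total++;
        }
    }
    return { count => \%count, total => $total };
}

# Log-odds that $text came from model $x rather than model $y.
# Add-one smoothing (an assumption of this sketch) keeps unseen
# tokens from producing log(0).
sub log_odds {
    my ($text, $x, $y) = @_;
    my %vocab = (%{ $x->{count} }, %{ $y->{count} });
    my $v = keys(%vocab) + 1;    # +1 bucket for unseen tokens
    my $score = 0;
    for my $tok (tokens($text)) {
        my $px = (($x->{count}{$tok} // 0) + 1) / ($x->{total} + $v);
        my $py = (($y->{count}{$tok} // 0) + 1) / ($y->{total} + $v);
        $score += log($px / $py);
    }
    return $score;
}
```

A positive score means the text looks more like the first model, a negative score more like the second. Tokens such as a consistent misspelling contribute heavily because they are common for one author and rare for everyone else. As noted above, the result is evidence, never proof.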
Re: Re^4: Text Analysis Tools to compare Slinker and Stinker? by castaway (Parson) on Jan 22, 2003 at 08:23 UTC
Re: Re: Re: Re: Text Analysis Tools to compare Slinker and Stinker? by Cody Pendant (Prior) on Jan 22, 2003 at 06:25 UTC
The only part of the process that I'm not confident about is the control. Say I compare Slinker and Stinker, and they have almost exactly the same average sentence length, FOG readability index and so on. How do I know I wouldn't get the same result comparing Slinker with Ernest Hemingway or Toni Morrison or Irvine Welsh? You'd need to be able to say with confidence that if author A scores 97% similarity with author B, then you couldn't get the same result with author X.

The mis-spellings ought to be quite easy to implement though: Author A makes the following mistakes every time. Author B makes the following mistakes every time. That would convince me...
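A minimal sketch of that mis-spelling comparison, under some loud assumptions: the `%FIX` table of known mistakes is hypothetical and far too small for real use (a real run would need a long list or a spell checker), and the function names are made up for illustration. For each tracked word it records whether an author always gets it wrong, always gets it right, or is inconsistent, and then compares two authors only on words both of them actually used.

```perl
use strict;
use warnings;

# Hypothetical lookup table: misspelling => correct form.
my %FIX = (
    happenning => 'happening',
    definately => 'definitely',
    seperate   => 'separate',
);
my %IS_TARGET = map { $_ => 1 } values %FIX;

# For each tracked word, count how often an author wrote it right or wrong.
sub spelling_profile {
    my ($text) = @_;
    my %profile;
    for my $word (map { lc $_ } $text =~ /[A-Za-z]+/g) {
        if (exists $FIX{$word}) {
            $profile{ $FIX{$word} }{wrong}++;
        }
        elsif ($IS_TARGET{$word}) {
            $profile{$word}{right}++;
        }
    }
    return \%profile;
}

# One author's habit for a word: 'always_wrong', 'always_right' or 'mixed'.
# Returns undef if the author never used the word in either form.
sub habit {
    my ($profile, $word) = @_;
    my $entry = $profile->{$word} or return undef;
    my $wrong = $entry->{wrong} // 0;
    my $right = $entry->{right} // 0;
    return $wrong && $right ? 'mixed'
         : $wrong           ? 'always_wrong'
         :                    'always_right';
}

# Compare two authors on the words that both of them used.
sub compare_habits {
    my ($p1, $p2) = @_;
    my (@agree, @differ);
    for my $word (sort keys %$p1) {
        my $h1 = habit($p1, $word);
        my $h2 = habit($p2, $word);
        next unless defined $h2;    # no evidence from the second author
        if ($h1 eq $h2) { push @agree, $word } else { push @differ, $word }
    }
    return { agree => \@agree, differ => \@differ };
}
```

Words that land in `differ` with one author `always_wrong` and the other `always_right` are the interesting ones. With only a couple of occurrences per word, though, this is weak evidence either way.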
Re: Re: Re: Text Analysis Tools to compare Slinker and Stinker? by pg (Canon) on Jan 22, 2003 at 05:18 UTC
I am not fighting against you (I am 100% sincere), but did you realize that you are not actually trying to find a "good" tool? You are trying to find a tool that "conclusively" confirms your guess, and that convinces your community members and yourself to "believe" something you have already decided. No tool that goes against your guess, however good, would count as a good tool in this situation.

I am just telling the truth, although it might be difficult to ... ;-)
Re: Re: Re: Re: Text Analysis Tools to compare Slinker and Stinker? by Cody Pendant (Prior) on Jan 22, 2003 at 05:34 UTC
OK, as you won't give up, pg, here are the rules in question:

- Bad behaviour gets you a first warning
- If you don't improve after a second warning, you get a two-month suspension
- If you attempt to rejoin the community under another name while suspended, no matter how well you behave, you get banned

I really think these are fair rules. And they're stated upfront.

But no matter what rules we choose, the facts are these: We suspect someone of lying about who they are. When you suspect someone is lying, asking them "hey, are you lying?" is not a logical way to find out. Linguistic analysis is. And there are great Perl modules for it.

You should be happy with the outcome anyway, pg, because as far as I'm concerned, with the help of Perl, I'm now satisfied that these two people aren't the same. It's like one of those annoying lawyer shows where they prove the guy innocent.
Re: Re: Re: Text Analysis Tools to compare Slinker and Stinker? by cadfael (Friar) on Jan 23, 2003 at 03:05 UTC
> But, having used the Fathom module, see above, I've got nothing conclusive, I'm afraid. It's a very useful tool but hasn't proven or disproven anything. There are fewer differences between two randomly-chosen posters than between Slinker and Stinker, it turns out. Another angle of attack on this problem, which I hadn't thought of before, is mis-spellings -- Slinker has spelt "happening" as "happenning" twice, but Stinker gets it right every time...

Leaving alone the issue of whether it is really worth spending a lot of time on this mystery, testing services have dealt with some aspects of your problem, especially in personality tests, where they ask you the same question in many slightly different ways and perform some kind of analysis to determine whether you are trying to spoof the test by appearing to be someone you are not.

Your mention of a spelling discrepancy brought to mind a scene from The Princess Bride where Westley was to add poison to one of the drinks, and his adversary was to choose, after Westley had shifted (or not) the position of the glasses. The bad guy goes through a series of questions and answers trying to figure out Westley's thoughts -- "You placed the poisoned glass closer to me so I'd choose it. But I'm too smart for that, so it must be the one closest to you... But you knew I'd anticipate that move, so it must be the one closest to me after all." And so on for a few minutes of pretty funny dialogue. (I'm sure I got the details turned around, but you get the gist.)

Is this guy deliberately misspelling a word or two just to throw you off? Does it really matter? It still boils down to a guess, doesn't it? Even after centuries of linguistic analysis, and lately with some fairly sophisticated computer analysis, scholars are still arguing whether Marlowe wrote the works attributed to Shakespeare, or whether Shakespeare was, indeed, Shakespeare.

----- "Computeri non cogitant, ergo non sunt"
Re: Re: Re: Re: Text Analysis Tools to compare Slinker and Stinker? by Cody Pendant (Prior) on Jan 23, 2003 at 03:25 UTC
> a scene from The Princess Bride

Man in black: (turning his back, and adding the poison to one of the goblets) Alright, where is the poison? The battle of wits has begun. It ends when you decide and we both drink - and find out who is right, and who is dead.

Vizzini: But it's so simple. All I have to do is divine it from what I know of you. Are you the sort of man who would put the poison into his own goblet or his enemy's? Now, a clever man would put the poison into his own goblet because he would know that only a great fool would reach for what he was given. I am not a great fool so I can clearly not choose the wine in front of you... But you must have known I was not a great fool; you would have counted on it, so I can clearly not choose the wine in front of me.

Man in black: You've made your decision then?

Vizzini: (happily) Not remotely! Because Iocaine comes from Australia. As everyone knows, Australia is entirely peopled with criminals. And criminals are used to having people not trust them, as you are not trusted by me. So, I can clearly not choose the wine in front of you.

Man in black: Truly, you have a dizzying intellect.

Vizzini: Wait 'till I get going!! ...where was I?

Man in black: Australia.

Vizzini: Yes! Australia! And you must have suspected I would have known the powder's origin, so I can clearly not choose the wine in front of me.

Man in black: You're just stalling now.

Vizzini: You'd like to think that, wouldn't you! You've beaten my giant, which means you're exceptionally strong... so you could have put the poison in your own goblet trusting on your strength to save you, so I can clearly not choose the wine in front of you. But, you've also bested my Spaniard, which means you must have studied... and in studying you must have learned that man is mortal so you would have put the poison as far from yourself as possible, so I can clearly not choose the wine in front of me!

Man in black: You're trying to trick me into giving away something. It won't work.

Vizzini: It has worked! You've given everything away! I know where the poison is!

Man in black: Then make your choice.

Vizzini: I will, and I choose... (pointing behind the man in black) What in the world can that be?

Man in black: (turning around, while Vizzini switches goblets) What?! Where?! I don't see anything.

Vizzini: Oh, well, I... I could have sworn I saw something. No matter. (Vizzini laughs)

Man in black: What's so funny?

Vizzini: I... I'll tell you in a minute. First, let's drink, me from my glass and you from yours. (They both drink)

Is that the one you meant?

I still maintain that this was an interesting exercise. I did one other thing, which was brute-force but also interesting. I grabbed every 2-char string from the posters, put them in a hash with the number of occurrences, sorted the results by number, and compared the most popular 1,000 2-char strings from the suspect posters with the most popular 2-char strings from "real" posters.

Again, the results were inconclusive. Slinker and Stinker shared 75% of the most-popular strings, but another poster shared 68%, so it wasn't very dramatic evidence either way.
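The 2-char comparison described above could look roughly like the following. This is a reconstruction from the description, not the code that was actually run: the function names are made up, overlapping 2-char strings are assumed, case is left untouched, and ties in the ranking are broken alphabetically just to make the output stable.

```perl
use strict;
use warnings;

# Count every overlapping 2-character string in a text.
sub bigram_counts {
    my ($text) = @_;
    my %count;
    $count{ substr($text, $_, 2) }++ for 0 .. length($text) - 2;
    return \%count;
}

# The $n most frequent 2-char strings, most popular first.
sub top_bigrams {
    my ($count, $n) = @_;
    my @ranked = sort { $count->{$b} <=> $count->{$a} || $a cmp $b }
                 keys %$count;
    $#ranked = $n - 1 if @ranked > $n;    # truncate to the top $n
    return @ranked;
}

# Percentage of the first list's entries that also appear in the second.
sub percent_shared {
    my ($list1, $list2) = @_;
    return 0 unless @$list1;
    my %in2 = map { $_ => 1 } @$list2;
    my $shared = grep { $in2{$_} } @$list1;
    return 100 * $shared / @$list1;
}
```

To use it, build the top 1,000 list for each poster with `top_bigrams(bigram_counts($all_posts), 1000)` and call `percent_shared` on each pair. The control matters as much as the suspects: a 75% overlap only means something if pairs of unrelated posters score well below it, and a 68% score for another poster suggests they do not.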

