Re: OT - Hemingway Editor (was: Re^4: How to count the vocabulary of an author?)

In fact, the idea is craftily clever. Their stemmer and parser can only stem and parse simple sentences, so if it can't process the sentence with a sufficiently high certainty, they flag it as too complex :-)

I don't know what technology they use in the editor. Also, I quit academia almost ten years ago, so things might have moved a bit since I worked on similar stuff.

But generally, English is one of the easier languages to process. Its morphology is simple (almost no declension, simple conjugation) and the training data for statistical methods are huge.

map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]


Replies are listed 'Best First'.
Re^2: OT - Hemingway Editor (was: Re^4: How to count the vocabulary of an author?) by LanX (Saint) on Jun 14, 2021 at 15:20 UTC
> But generally, English is one of the easier languages to process.

For a stemmer! Sure! But the lack of grammar makes context and interpretation harder...

Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery
Re^3: OT - Hemingway Editor (was: Re^4: How to count the vocabulary of an author?) by choroba (Cardinal) on Jun 14, 2021 at 15:35 UTC
Ever tried saying it to an average English speaker?

The lack of grammar keeps related phrases closer to each other, which helps parsing a lot. For free word order languages, grammar seems to help, but due to homonymy (or homography) you usually don't have a solid foundation to base the grammar on.

The most advanced systems nowadays are based on Machine Learning, so there's no grammar involved at all, you just need large training data.

map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
Re^4: OT - Hemingway Editor (was: Re^4: How to count the vocabulary of an author?) by erix (Prior) on Jun 14, 2021 at 16:30 UTC
I've noticed that, when a Google Translate translation (EN->NL is most conspicuous to me) is ridiculously wrong, it's often fixed a few months later. I think (or at least hope) that the reason is that more data has been processed, i.e., more 'training data', or at the very least, better numbers/statistical decisions.
Re^5: OT - Hemingway Editor (was: Re^4: How to count the vocabulary of an author?) by LanX (Saint) on Jun 14, 2021 at 16:49 UTC
Re^4: OT - Hemingway Editor (was: Re^4: How to count the vocabulary of an author?) by LanX (Saint) on Jun 14, 2021 at 16:26 UTC
> Ever tried saying it to an average English speaker?

Sure, that's how I normally greet my friend John from Buffalo! ;-) °

> The lack of grammar keeps related phrases closer to each other which helps parsing a lot.

It ain't necessarily so, try deciphering the headlines of newspapers like the Guardian ...

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery

°) actually he is from Hamburg NY, but that's too confusing for the locals here ...
Re^2: OT - Hemingway Editor (was: Re^4: How to count the vocabulary of an author?) by Bod (Parson) on Jun 14, 2021 at 23:39 UTC
> I don't know what technology they use in the editor

JavaScript apparently... There is an explanation here

