The MIT-IBM Watson AI Lab and Harvard University's natural language processing group have come up with GLTR ("Glitter"), a tool for spotting automatically generated text.
It turns the algorithms used to generate text against themselves, using them to spot text made the same way. Specifically, GLTR uses the GPT-2 language model, which can generate text almost indistinguishable from a human-written equivalent. Indeed, we tried it ourselves in a recent article.
Because the generation algorithm is understood, GLTR does a pretty good job of judging whether each word was written by a human or a computer. For every word, it asks the language model how likely that word was given the words before it, and scores the word by its rank among the model's predictions. Words highlighted in green or yellow are ones the model found highly predictable, and text dominated by them is likely to have been autogenerated.
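The idea can be sketched in a few lines. GLTR itself ranks each word against GPT-2's full prediction over the preceding context; the toy bigram table below is purely a stand-in so the sketch stays self-contained, and the bucket thresholds are illustrative, not GLTR's actual cutoffs (GLTR's green bucket is roughly the model's top 10 predictions, yellow the top 100).

```python
# Sketch of GLTR's core idea: for each word, ask a language model how
# highly it ranks that word given the preceding context. Machine text
# tends to use top-ranked (predictable) words. GLTR uses GPT-2; this
# hypothetical bigram table stands in for the model here.
BIGRAM_RANKS = {
    "<s>": ["the", "a", "it"],       # candidates, most likely first
    "the": ["cat", "dog", "model"],
    "cat": ["sat", "ran", "slept"],
    "sat": ["down", "quietly", "there"],
}

def rank_bucket(rank):
    """Map a word's rank to GLTR-style colour buckets."""
    if rank < 1:
        return "green"   # most predictable (GLTR: top 10)
    if rank < 3:
        return "yellow"  # still likely (GLTR: top 100)
    return "red"         # unlikely word: a hint of human authorship

def colour_words(words):
    colours = []
    prev = "<s>"
    for w in words:
        candidates = BIGRAM_RANKS.get(prev, [])
        # Rank of the word under the model; unseen words rank last.
        rank = candidates.index(w) if w in candidates else 999
        colours.append((w, rank_bucket(rank)))
        prev = w
    return colours

# A highly predictable sentence lights up green, as machine text would.
print(colour_words(["the", "cat", "sat", "down"]))
```

A real implementation would replace the bigram table with per-token probabilities from GPT-2 and tally the colour distribution over the whole passage rather than eyeballing individual words.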
GLTR offers a live demonstration, and to their credit, the team at the Lab clearly state that the tool is in its early stages. Rather than paste in an existing example, which any decent detector would spot as autogenerated, we used a website that generates new text from its own version of GPT-2. That website had a user manual, some of which we fed into GPT-2 to produce a new paragraph, which we then ran through GLTR.
The result, as the header image for this article shows, is that GLTR did a good job of identifying a fake article.
Of course, automated text generation will become commonplace in the years to come. Media literacy will then need to include something approaching a Turing test: we will need to ask whether an article was written by a machine – and, indeed, whether the same machine invented the reporter's name, scraped a public-domain image for the reporter's headshot, or generated the accompanying images outright. If we don't, society will find it hard to work its way out of its current malaise over what "fake news" really means.