Tag Archives: machine editing

Nice and accurate

2 Apr

“Completely bizarre Daily Mail article,” writes Neil Gaiman on Twitter, “possibly written by something not human, like an elk.”* And you can see what he means, although a competent elk would probably have made a better job of the first par/second par transition than this:

The article is, indeed, so odd that some people on Gaiman’s timeline wondered if it was written by something like a bot (there’s only a ‘Daily Mail Reporter’ byline). “Also joining the 48-year-old was co-stars Michael Sheen and David Tennant.” (“Was”?) “Aisha fit right in with the Austin scene in a Rock & Roll T-shirt, blazer, jeans and dancer-like shoes.” (“Dancer-like”?) “The writer opted for a casual look in a black T-shirt and jeans, but attempted to dress up his ensemble with a blazer and dress shoes.” (“Attempted to”? Ouch.)

Then there’s the fact that the article describes the event as a premiere, when it was nothing of the kind: just a panel discussion. Then there’s the fact that it says the book was “co-written by Neil and the late Sir Terry in 1990” (Sir Terry who?) to “poke fun of” the Bible. Then there’s what Gaiman says is his favourite sentence in the piece: “Good Omens is based on a fictional book of the same name.” (Except that the article gives the title as “Good Omens: the Nice and Accurate”, which is longer than the title of the TV series and shorter than the full title of the book, “Good Omens: the Nice and Accurate Prophecies of Agnes Nutter, Witch”).

And yet, when you look at the numerous photographs that accompany the article – in the best Mail tradition, eight of them in an article that barely musters 275 words – you do sense the presence of human intervention. Understandably in the circumstances, somebody has cut and pasted sentences wholesale from the text into the captions and prefaced them with the kind of slightly desperate, added-value kickers (“Plot thickens”, “Men of the hour”) familiar to any sub-editor who has ever had to pull together a picture story based on no information.

At least, I assume no bot yet devised is capable of noticing the non-rhyming alliteration of “Dapper Draper”, knowing where the boundary is between “casual” and “smart”, or creating that air of teeth-gritted conscientiousness as yet another photo of the same two people gets inserted into the end of the piece, requiring yet another caption. But who knows? If the alarming AI text generator GPT2 can imitate a columnist, it can presumably learn to think like a sub-editor. (Although if it has, why didn’t it call them the “O-men of the hour”? I mean, come on!)

*As noticed and passed on by Ten Minutes Past Deadline’s ever-alert Memphis office

Film editors

14 Apr

Another submission to IMDb, another response it would be fair to characterise as “robust”:

Screen Shot 2015-03-31 at 22.51.51

We first encountered the robot editors of the Internet Movie Database last year, while attempting to get an episode summary past its stern battery of automatic parsers. Recently, though, another artificial writing assistant, Grammarly, has come to prominence following a high-profile marketing campaign in which the company attempted to grammar-check EL James’s Fifty Shades Of Grey when the film of the book premiered. It ran James’s text and that of several famous historical authors through the system, and presented its findings in a lively press release.

Grammarly is a hugely ambitious undertaking: an algorithm that attempts to read and parse text like a human editor, and check spellings and punctuation in context. Unfortunately, the marketing push didn’t go as well as might have been hoped. As was widely observed, notably by Jonathon Owen at Arrant Pedantry, the worked examples contained several infelicities or mistakes, including some questionable overpunctuating and a suggestion that The Tempest be corrected to “We are such stuff on which dreams are made on”. As Arrant Pedantry concluded, many of the things Grammarly found in its press release weren’t errors and, where it intervened, “the suggested fixes always worsen the writing”.

The IMDb parser is a much less ambitious undertaking. It doesn’t work on free-form text or purport to “read”: instead, it controls inputs tightly by using step-by-step data entry. And yet somehow, it feels so much more like being edited.

Above all, it’s the tone. As ever, the rejection notice at the top is brisk but not wholly discouraging, like a copy editor intercepting a reporter with a question. Then there’s the fact-checking and the resultant queries, and the automatic corrections for house style (surname, first name), done without a song and dance. Then there’s the institutional memory and the ever-so-slight weariness that goes with it: there are 3,304 attributes like this already – are you sure you want to create a new one? Then there’s the encouragement not to touch the type: “If you don’t understand how the ordering should be formatted, please leave it blank.” And, as we saw last time, it enforces word counts ruthlessly and threatens to reassign material elsewhere if it’s not cut to fit.

Maybe Grammarly wouldn’t have stumbled over the cap ‘W’ in “Written” and asked for guidance; IMDb doesn’t do line-by-line context and only really spellchecks the proper nouns already in its database. But this is big-picture, organisational editing for accuracy and factual consistency. Rather than entering the murky, and often highly debatable, world of comma use and the passive voice, it just aims to get things cross-referred and reliable.

In fact, quite a lot of real editors’ work is like the kind of “database editing” IMDb does. Is that how we normally spell it? Haven’t I read that paragraph somewhere else? Someone else has written a piece about this: what does that say? The parser may not be able to write a headline, but it can certainly keep control of a multi-contributor encyclopaedia.

Assertive, detail-oriented, unbending about style, weary but polite – and, as we see from the “override” tickboxes, stoical about the possibility of being ignored: doesn’t that sound just like an editor? These robots are getting more lifelike by the day.

The robots are coming

8 May

I wish I had the nerve to talk like this to the newsdesk:

Screen Shot 2014-05-07 at 10.11.15

If you’ve ever tried submitting anything to the Internet Movie Database, you may recognise this tone. IMDb is a wiki – that is, an aggregation of user contributions – but it has achieved the status of a semi-official reference tool at the Tribune, much more so than Wikipedia ever will. And I think that may be because of its fearsome army of robot editors, which intercept and scan everything you submit, and more often than not sling it back like Jason Robards growling “You haven’t got it” to Redford and Hoffman.

No diffident pencilled queries in the margin for IMDb: for example, if you have a couple of pieces of casting information you want to add to a TV show, you’d better have chapter and verse to hand.

Screen Shot 2014-05-06 at 20.56.06

So you say this person was in the show? Here is a list of actors with similar names: it’s easy to get confused. If you’re uncertain, click here and we’ll sort it out for you. Or perhaps you’d just like to give up the whole idea? Choose an option, please. (And by the way, you formatted the request wrongly. It has already been corrected: this is merely a notification.)

That’s the spirit. And if you submit anything as ambitious as a three-line episode summary, you get pulled apart like a rookie screenwriter at a pitch meeting:

Screen Shot 2014-05-07 at 10.09.24

There are misspellings. You have written too much: if you insist on overfiling, we will simply move your piece to a different slot inside the site (delicious). And, my favourite bit of all:

“The following fixes have been applied automatically: ‘…’ has been replaced with ‘.’ in accordance with IMDb rules.”

No judicious exceptions, no stretching a point. Ellipses are just banned, rather like the way all semicolons were excised for some years on the Tribune’s sport section. It’s a rule. And I suspect that “surveilling”, even if spelt correctly, will turn out to be “not in the dictionary”. I’ll just change it now. They won’t like it.

For the first time in my life, I feel like a writer.