Researchers have a plan to give the GPT-3 AI some "common sense"

Scientists are giving the best language AI the power of vision, so that its output becomes even harder to distinguish from human writing.

Hope Corrigan

9 Nov 2020

As anyone who’s awkwardly explained what the word ‘virgin’ means to a young child who turns out to be enquiring about the sticker on some olive oil can agree, context is very important.

Language is so intricate and twisted, and English is arguably one of the worst offenders. It's nonsensical from incorporating a myriad of sources over the course of human history, engulfing jokes and sayings until their origins are cryptic and the words have changed from the thin water of the womb to the thickest of blood.

It’s constantly evolving in the way that only a lit AF living language can, regardless of how much sense those words may actually make.

AI can be tricky enough without dealing with human intricacies. Recently, a football-watching AI focussed on a bald man's head for most of a match, while a driverless car drove straight into a wall.

This is why, despite being incredibly impressive, the human-language-generating GPT-3 AI can get simple questions wrong.

GPT-3, or Generative Pre-trained Transformer 3, is the third-generation language model developed by OpenAI. It uses deep learning on human language to produce text that is as hard as possible to tell apart from that of a real human.

It does a pretty good job but, as MIT Technology Review explains, it can be tripped up because it lacks context, or the seemingly inaptly named human phenomenon of common sense.

Little things like asking it the colour of a sheep can result in an equal chance of ‘black’ or ‘white’ being the answer. Our language has so many nods to black sheep that it’s no surprise the AI treats both as likely answers, without realising that those references specifically imply the rarity of the variation.
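To see how that can happen, here is a minimal Python sketch. It is only a toy illustration, not how GPT-3 works internally: the six-sentence corpus is invented for this example, and the helper name `colour_counts` is hypothetical. It simply counts which colour words turn up in sentences that mention sheep.

```python
from collections import Counter

# Hypothetical toy corpus, invented for illustration (not real training data).
TOY_CORPUS = [
    "he was the black sheep of the family",
    "every family has a black sheep",
    "the white sheep grazed in the field",
    "she felt like the black sheep at work",
    "a white sheep stood by the fence",
    "the sheep in the paddock were white",
]

COLOURS = {"black", "white"}


def colour_counts(sentences):
    """Count colour words appearing in sentences that mention 'sheep'."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        if "sheep" in words:
            counts.update(w for w in words if w in COLOURS)
    return counts


counts = colour_counts(TOY_CORPUS)
total = sum(counts.values())
for colour in sorted(counts):
    # Share of colour mentions, which a text-only learner might treat as likelihood.
    print(f"{colour}: {counts[colour] / total:.2f}")
```

In this made-up corpus the idiom shows up as often as the literal animal, so a learner that only sees the text has no reason to prefer ‘white’ over ‘black’.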

So researchers at the University of North Carolina have decided to give the damn thing sight.

It’s not as easy as jamming an existing visual-learning AI and a text-based one together, because again there needs to be more context. Most visual learning data sets, like Microsoft Common Objects in Context (MS COCO), pair each image with only a few words to distinguish objects, so the researchers had to come up with something better.

While still using MS COCO, the researchers added a technique they call "vokenization", which scans for visual patterns that give images more context. Rather than simply seeing an image of a sheep, the AI can get 3D information from the image that tells it more about the sheep in context.

This visual information would likely help, showing the AI that black sheep are far less common, since there are many more images of white sheep. Furthermore, the AI could see that sheep are often in fields, not jumping over fences or being counted by individuals desperate to sleep.

But of course, this won’t be enough. The AI can’t tell how bad sheep can smell, in a real and true nose-wrinkling way. It seems there’ll always be something to add to the ever-growing knowledge of unsupervised AI learning.

