"OpenAI's Sora Creates Confusing Gymnastics Video: Here's What Happened"
A video from OpenAI's new Sora AI video generator went viral on social media on Wednesday. It shows a gymnast who suddenly sprouts extra arms and legs and briefly loses her head during what looks like an Olympic floor routine.
These strange mistakes in the video, which we call "jabberwockies," give us insight into how AI video generators work and how they might improve in the future.
Before we explore this further, consider what the video shows. A gymnast performs a floor routine, and as she flips and twirls, new limbs appear and rapidly change shape. About 9 seconds in, her head disappears and then reappears on her body.
Venture capitalist Deedy Das, who shared the video on X, said, "As cool as the new Sora is, gymnastics is still a big test for AI video." The video drew many humorous reactions, including one comment on Bluesky that said, "Hi, gymnastics expert here! This isn’t funny; gymnasts only do this when they’re in serious trouble."
We contacted Das, and he confirmed that he created the video using Sora. He said his prompt was long and divided into four parts, with detailed instructions such as "The gymnast starts from the back right corner with her right foot pointed behind in B-plus stance."
Das said, "For the past six months, I've seen that text-to-video models struggle with complex movements like gymnastics." He wanted to try Sora because he thought it had improved character consistency. He noted that the result was better than earlier attempts, in which gymnasts would suddenly teleport or change outfits mid-flip, but it still looked quite strange. "We hoped AI video would automatically learn physics, but that hasn’t happened yet!"
So, what went wrong?
To understand the video’s failures, we need to look at how Sora creates a gymnastics routine. During Sora's training phase, OpenAI fed the model example videos of gymnastics routines along with many other types of videos, each paired with text descriptions. This helped the model learn to connect images with the words that describe them.
This training happens only once, before the model is released. When you later give Sora a written prompt, it draws on the statistical connections between words and images that it learned in training to create a video, predicting what comes next from the preceding frame. Sora also has a mechanism for keeping things consistent over time. According to OpenAI's Sora System Card, "giving the model foresight of many frames at a time" is meant to ensure that a subject stays the same even if it goes out of view for a moment.
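As a rough illustration of why seeing several frames at once can help with consistency, the toy Python sketch below contrasts a predictor that sees only the last frame with one that sees a short window of frames. It is not Sora's architecture. The function names, the text labels standing in for frames, and the window size are all assumptions made for illustration.

```python
# Toy illustration only: frames are text labels, and None means the
# subject is out of view in that frame. Nothing here reflects Sora's code.

def predict_from_last_frame(frames):
    """Guess the next frame's subject using only the most recent frame."""
    return frames[-1]

def predict_from_window(frames, window=4):
    """Guess the next frame's subject from the last `window` frames,
    falling back to the most recent frame in which the subject was visible."""
    for label in reversed(frames[-window:]):
        if label is not None:
            return label
    return None

# The gymnast is visible for two frames, then leaves view for one frame.
frames = ["gymnast in red leotard", "gymnast in red leotard", None]

last_only = predict_from_last_frame(frames)
windowed = predict_from_window(frames)
```

In this toy, the last-frame predictor loses the gymnast as soon as she leaves view, while the windowed predictor recovers her from an earlier frame.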
But it seems this problem isn’t fully solved yet. The fast-moving limbs in the video make it hard for Sora to predict the next frame accurately. The result is a confusing mix of gymnastics footage in which the same gymnast performs running flips and spins, but Sora cannot put those movements in the right order. It relies on average movements drawn from its limited training data, which likely didn’t include detailed information about limb movements.
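To see why leaning on averages can produce impossible bodies, consider the simplified Python sketch below. It is not how Sora represents motion. The one-joint "arm," the helper names, and the two example poses are hypothetical. It only shows that the average of two valid poses need not be a valid pose.

```python
import math

def hand_position(angle_deg, arm_length=1.0):
    """Hand location for an arm of fixed length pivoting at the shoulder (0, 0)."""
    a = math.radians(angle_deg)
    return (arm_length * math.cos(a), arm_length * math.sin(a))

def average_points(points):
    """Coordinate-wise mean of a list of 2D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def distance_from_shoulder(point):
    """Implied arm length: how far the hand sits from the shoulder."""
    return math.hypot(point[0], point[1])

# Two plausible poses: arm pointing right, and arm pointing left.
pose_a = hand_position(0)
pose_b = hand_position(180)

# Averaging them puts the hand at the shoulder, so the arm has
# effectively shrunk to nothing, a pose found in neither example.
blended = average_points([pose_a, pose_b])
blended_length = distance_from_shoulder(blended)
```

Each input pose keeps the arm at its full length, yet the blended pose collapses it to roughly zero, a small-scale version of limbs that morph between frames.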