This week, OpenAI shared a series of videos it says were produced with a new tool called Sora. Like Dall-e, OpenAI’s image-generation software, Sora can respond to prompts written in plain language.

Introducing Sora, our text-to-video model.

Sora can create videos of up to 60 seconds featuring highly detailed scenes, complex camera motion, and multiple characters with vibrant emotions. https://t.co/7j2JN27M3W

Prompt: “Beautiful, snowy… pic.twitter.com/ruTEWn87vf

Here’s a trailer for a movie that doesn’t exist:

Prompt: “A movie trailer featuring the adventures of the 30 year old space man wearing a red wool knitted motorcycle helmet, blue sky, salt desert, cinematic style, shot on 35mm film, vivid colors.” pic.twitter.com/0JzpwPUGPB

Most of the major AI firms have been working on text-to-video and video-editing tools for years, and a few — mostly smaller start-ups — have released software that people can use right now. The latest version of Runway, for example, probably represented the state of the art until yesterday; it can produce short clips from text prompts:

Generate any world you can imagine with @runwayml.
Made completely with Text-to-Video. pic.twitter.com/zxkIQ7hDko

Sora — which, again, is not yet available to the public, so keep in mind we’re looking at media released by OpenAI — seems like a pretty major jump forward, in terms of realism, prompt interpretation, video length, and versatility.

Prompt: “A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. she wears a black leather jacket, a long red dress, and black boots, and carries a black purse. she wears sunglasses and red lipstick. she walks confidently and casually.… pic.twitter.com/cjIdgYFaWq

As with a lot of AI-generated media, it isn’t hard to find surreal flaws:

ok, this one is the most interesting to me. can somebody explain why the chair is not acting like a plastic chair would? Specifically, how is Sora "modelling" this, either incorrectly or correctly? pic.twitter.com/UY15ldb0zX

But what’s notable about these videos isn’t just how they look at first glance — given the current state of AI image generation, it wasn’t hard to imagine that AI videos would soon follow — but how they move, and how objects within them seem to interact with the world around them. AI image generators have tended to be pretty good at rendering visual approximations of their subjects, but they lack context to such an extent that their outputs can sometimes become absurd: physically impossible architecture; hands with way too many fingers; bicycles that don’t make mechanical sense. Video generators extend these shortcomings into motion, producing impressionistic videos that look as though they were animated by someone without basic spatial awareness or an intuitive sense of physics. So far, they’ve been pretty adept at rendering environments and scenes but pretty poor at representing motion, object permanence, and mechanical systems. OpenAI claims it’s figured out a way to begin to deal with this:

Sora is able to generate complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background. The model understands not only what the user has asked for in the prompt, but also how those things exist in the physical world.

OpenAI provides a bunch of examples of what this means in its research post on Sora. Some of the videos suggest a rudimentary sense of space and physics. In some cases, clothing moves and wrinkles with the person wearing it; in others, objects and figures cast shadows. Many, like the floating-chair video above, or a GoPro-style clip of a mountain biker going down a trail — a situation in which many bodies and objects are interacting in very specific and strange ways — are far less convincing, and all remain uncanny in motion (slow motion seems to be an important trick for conveying realism in these demos, which makes sense).

OpenAI also demonstrated a few types of video editing — which has some fairly obvious commercial applications right out of the gate:

OpenAI just dropped their Sora research paper.

As expected, the video-to-video results are flipping spectacular 🪄

A few other gems: pic.twitter.com/MiRe2IYkcI

This is, again, a selective demo. We’re told a bit about how Sora works. We don’t know much about how (or on what) it was trained, or where that training data came from. We don’t know what it might look like as a final product, or how much it might cost to use. Nor do we have a great sense of what people will want from such tools: More than a year after AI-generated static imagery became available to anyone who wants it, its role in the world remains largely undetermined (advertisers love it, of course, but so do revenge pornographers). OpenAI says it’s “red-teaming” the tech right now, suggesting that, like Dall-e’s image-generation tools, Sora’s will have a number of boundaries to prevent misuse — whatever that might mean — and to minimize bad PR. Some folks are worried about what this means for, say, the film industry, while others imagine that passable text-to-video content is potentially disastrous in the context of, say, an election in an already degraded information environment.

AI influencer and professor Ethan Mollick makes a good point, though, about how quickly these things become unremarkable:

I have had a bunch of requests to discuss Sora, looks amazing, but it has not been released, so I can't give any deep impressions.

However, I do remember when DALL-E3 was being demoed by OpenAI it was mind-blowing, but by the time they released it there were a dozen competitors. https://t.co/NYgjxeQLBP

In terms of model output, OpenAI remains stubbornly ahead of other companies working on this stuff. But while these videos are a controlled demo of a single company’s product, it’s reasonable to assume that within a pretty short time, other companies — and open-source projects, or governments — will be able to achieve similar results, with their own priorities in mind.
