The Flaw That Could Ruin Generative AI

Alex Reisner

The Atlantic

0
11.01.2024

A technical problem known as “memorization” is at the heart of recent lawsuits that pose a significant threat to generative-AI companies.

Earlier this week, the Telegraph reported a curious admission from OpenAI, the creator of ChatGPT. In a filing submitted to the U.K. Parliament, the company said that “leading AI models” could not exist without unfettered access to copyrighted books and articles, confirming that the generative-AI industry, worth tens of billions of dollars, depends on creative work owned by other people.

We already know, for example, that pirated-book libraries have been used to train the generative-AI products of companies such as Meta and Bloomberg. But AI companies have long claimed that generative AI “reads” or “learns from” these books and articles, as a human would, rather than copying them. Therefore, this approach supposedly constitutes “fair use,” with no compensation owed to authors or publishers. Since courts have not ruled on this question, the tech industry has made a colossal gamble developing products in this way. And the odds may be turning against them.

Read: These 183,000 books are fueling the biggest fight in publishing and tech

Two lawsuits, filed by the Universal Music Group and The New York Times in October and December, respectively, make use of the fact that large language models—the technology underpinning ChatGPT and other generative-AI tools—can “memorize” some portion of their training text and reproduce it verbatim when prompted in specific ways, emitting long sections of copyrighted texts. This damages the fair-use argument.

If the AI companies need to compensate the millions of authors whose work they’re using, that could “kill or significantly hamper” the entire technology, according to a filing with the U.S. Copyright Office from the major venture-capital firm Andreessen Horowitz, which has a number of significant investments in generative AI. Current models might have to be scrapped and new ones trained on open or properly licensed sources. The cost could be significant, and the new models might be less fluent.

Yet, although it would set generative AI back in the short term, a responsible rebuild could also improve the technology’s standing in the eyes of many whose work has been used without permission, and who hear the promise of AI that “benefits all of humanity” as mere self-serving cant. A moment of reckoning approaches for one of the most disruptive technologies in history.

Even before these filings, generative AI was mired in legal battles. Last year, authors including John Grisham, George Saunders, and Sarah Silverman filed several class-action lawsuits against AI companies. Training AI using their books, they claim, is a form of illegal copying. The tech companies have long argued that training is fair use, similar to printing quotations from........

© The Atlantic

visit website

Categories

Sources

Popular

The Flaw That Could Ruin Generative AI

Alex Reisner

© The Atlantic