This past week, Axel Springer, the German media conglomerate that owns Politico and Business Insider, signed a “multiyear licensing deal” with OpenAI worth tens of millions of euros. According to the company, the deal “will enrich users’ experience with ChatGPT by adding recent and authoritative content on a wide variety of topics,” in the form of “summaries of selected global news content.” Its stories will also be used to train OpenAI’s models.

It’s a superficially familiar arrangement. For more than a decade, social media firms and more conventional media companies have tested out dozens of permutations of the tech-media partnership, with mixed results. This isn’t the first deal between OpenAI and a media organization, either: the Associated Press partnered with the company earlier this year. The Axel Springer deal, however, is the most comprehensive of its kind and a template, probably, for more to come.

It’s easy to fold such deals into the prevailing narrative of AI dominance, as venerable publishers line up to partner with tech firms once again, despite what happened last time around, and the time before that, and the time before that — to imagine that the Axel Springers of the world have no choice but to throw in with companies poised to remake the world, with or without their cooperation. Which may be how they see it. At least they’re getting paid.

Really, though, as they become more common, partnerships like this should complicate the story of inevitability for companies like OpenAI, the world-beating firm that, before it takes over the economy, and before it must be stopped from taking over the entire world, must first, for some reason, pay for a big subscription to Politico Pro.

Over the past two years, the story of generative AI was told directly by powerful tools and dazzling demonstrations, exemplified by OpenAI’s ChatGPT and DALL-E: fluent chatbots; image and video and audio generators; and programming assistants. These tools tended to minimize or make invisible the data on which they were trained, implying creativity rather than mimicry. Similarly, given access to the internet, tools like ChatGPT perform interpretive functions on the public web; while they don’t exactly claim outside data as their own, they get a lot of value out of summarizing it for the user.

Now, as competing products from different firms seem to be converging on a similar set of basic capabilities — in other words, as small startups, open source projects, and tech giants alike start to close the basic performance gap with OpenAI, and simultaneously start to figure out what their users, customers, or potential customers actually find valuable — the subject of training data is back at the center of the conversation around AI. Alex Graveley, creator of GitHub Copilot, the popular programming assistant based on OpenAI’s technology, sums up the shift:

2023 AI in a nutshell: It's all about the data.

Good performance is on the other side of seeing training data that looks *nearly exactly* like what you’re trying to do.

Generate, verify, retrain on what works - this is the blueprint. When you plateau, optimize generation. ♻️

This is a broadly helpful way to think about large language models (LLMs), in that it demystifies them while making it easier to think about the ways they might still be useful, or at least valuable in literal terms. It also lends some credence to the idea, asserted in a wide range of lawsuits filed against AI firms over the past year by authors, artists, musicians, and media companies, that tools like ChatGPT are engines for something like copying (if not legal infringement).

In other words, Axel Springer’s deal — like OpenAI’s deal with the AP, or its canceled deal with Twitter — enables OpenAI to train its models to create content in the range of contemporary styles contained within Axel Springer’s portfolio: straight news in German and English; magazine features; breaking news, investigative stories, and blog posts about pretty much anything; newsletters and plenty of insidery news about American politics. The next GPT model will be a little bit better at approximating such content. If OpenAI or its commercial clients have ambitions to automate the news — to, for example, feed an AI tool fresh reporting to contextualize, or wire stories to summarize — this deal probably makes that prospect a little more plausible. If executives at a company like Axel Springer imagine that it might one day want to use OpenAI software to reduce its labor costs — err, to increase productivity — then this is probably a step in that direction. (Don’t confuse this sort of thing with some kind of interesting interaction between “the news” and “technology” as such — it’s a deal signed by executives at two large companies, for executive reasons.)

Maybe this is worth a lot to OpenAI; the value of reducing the number of media organizations that might be inclined to sue you is also probably greater than zero. I suspect the real value of such a deal, though, is in ensuring that its future products don’t end up existing in a vacuum of their own creation.

Some of OpenAI’s biggest competitors, in addition to having access to massive amounts of training data produced by their own users, have access to real-time, or at least recently updated, information and media about the world and their customers: Google has much of the web via its search engine, not to mention Gmail, Docs, and YouTube; Meta has Instagram and Facebook; for what it’s worth, Elon Musk’s xAI has X.

OpenAI’s products, like their competitors’, have access to the web, which means all sorts of relevant and up-to-date content, including news. But as AI firms become more powerful, businesses with websites — including news publishers — are becoming more deliberate about how they make their content available. In October, the BBC moved to prevent OpenAI from crawling its content, joining the New York Times, CNN, and Reuters.

AI companies licensing data is a good thing. But we should call out the hypocrisy of licensing data from big media companies while claiming that training on huge amounts of other copyrighted work, often from smaller companies and individuals, is fair use.


Another looming challenge is that widely available AI tools are themselves accelerating the decline of the open web on which they depend. Platforms where human users post quality public content for fun or profit — art communities, programming databases, blog networks, and forums — are dealing with a glut of low-quality AI-generated garbage produced by hustlers, scammers, and attention arbitrageurs. (The most notable example of such an effect is Google, which is simultaneously struggling to filter AI garbage out of its search results and testing out a feature that replaces top results with its own AI-generated summaries, threatening to destroy the economy built around Search, one on which much of the news media is heavily dependent.) ChatGPT is a much better product if it can browse the web for you without hitting a paywall every five seconds; at the same time, it’s rapidly becoming part of the story of the web’s ongoing ecosystem collapse, and inspiring publishers to further limit access to their stories, not just to voracious AI firms, but to everyone.

OpenAI’s deals with publishers are a hedge against a scenario in which scraping becomes harder and more legally perilous, training material more expensive, and real-time data more scarce — a scenario in which paying ChatGPT users might ask about the news, and ChatGPT might not have access to a credible recent source to link, summarize, or otherwise relay. (Also a scenario in which, by the way, OpenAI’s products are a major venue through which people keep up with the news.) It’s in the same speculative universe — and imagines a similar (and familiar) division between types of labor and content — as this newsreel produced by a company called Channel 1, in which the roles of the news anchor are automated with realistic avatars that talk about clips of real, newsworthy footage filmed by … someone, somewhere:

See the highest quality AI footage in the world.

🤯 - Our generated anchors deliver stories that are informative, heartfelt and entertaining.

Watch the showcase episode of our upcoming news network now. pic.twitter.com/61TaG6Kix3

The Axel Springer deal, in other words, amounts to a fairly specific work of prediction about the challenges that OpenAI thinks it might face in the next couple of years, on its way to presumptive dominance, as well as the opportunities it sees for itself in news. But the deal also raises another question: If the web is to be harvested by companies that give back nothing but spam, and a company like Axel Springer is destined to be reduced to a wire service for an automated news aggregator — if, not unlike the social platform “partners” before it, OpenAI hopes to seize and automate the lucrative parts of news distribution while leaving the expensive, difficult, and risky aspects of media production to its partners — shouldn’t big media companies be asking for a little more?
