AI shouldn't pilfer news content, media backers tell Senate

Benjamin Weiss

WASHINGTON (CN) — A group of witnesses representing local and national media organizations urged lawmakers Wednesday to step in and prevent artificial intelligence companies from using copyrighted news content without proper credit or compensation.

Members of Congress have for months tried to get their arms around emerging AI technology, consulting experts and stakeholders in an effort to hammer out a plan for regulating the fast-growing industry.

The legislative charge, championed in the Senate by its Judiciary Committee, has shone a spotlight on some of the potential pitfalls of artificial intelligence left unchecked, such as the negative effects of generative AI — tools that are trained with content available on the internet — on intellectual property rights.

A group of media advocates echoed those concerns Wednesday, warning lawmakers during a hearing in the Judiciary Committee’s privacy and technology subpanel that generative AI poses a threat to the journalism industry, particularly at the local level.

AI companies scrape news content from the internet to train their generative language models, said Danielle Coffey, president and CEO of the newspaper trade association the News Media Alliance.

According to research commissioned by her organization, AI models can produce “summaries, excerpts and even full verbatim copies of articles written and fact-checked by human journalists” if prompted by a user, Coffey said.

News content reproduced by AI is often used without authorization and is sometimes scraped from behind an outlet's paywall, Coffey added.

“These outputs compete in the same market, with the same audience, and serve the same purpose as the original articles that feed the algorithms in the first place,” she said.

Curtis LeGeyt, president and CEO of the National Association of Broadcasters, told the Judiciary Committee that broadcast journalists are equally affected by this trend.

“The use of broadcasters’ news content in AI models without authorization diminishes our audience’s trust and our reinvestment in local news,” LeGeyt said. “Broadcasters have already seen numerous examples where content created by our journalists has been ingested and regurgitated by AI bots with little or no attribution.”

AI companies have argued that training their models on copyrighted news material and other content is protected under the “fair use” provision of U.S. intellectual property law, which allows the reproduction of copyrighted work for purposes such as research, criticism or commentary.

OpenAI, which owns the generative AI tool known as ChatGPT, made such a contention in a November filing with the U.S. Copyright Office, writing that training AI models falls “squarely in line with established precedents recognizing that the use of copyrighted materials by technology innovators in transformative ways is entirely consistent with the law.”

The interpretation of the fair use statute is a focal point of a lawsuit filed against OpenAI in December by The New York Times, which claims ChatGPT users can obtain word-for-word reproductions of the Times’ reporting, sidestepping the paper’s subscription fee and running afoul of intellectual property rights. In a statement published Monday, OpenAI doubled down on its claim that scraping news content for its generative models falls under fair use and noted that it gives organizations the opportunity to opt out.

Roger Lynch, CEO of media company Condé Nast, told the Senate panel Wednesday that the opt-out feature does little to alleviate copyright issues.

“They’ve already trained their models,” Lynch said. “The only thing the opt-outs will do is to prevent a new competitor from training new models to compete with them.”

Coffey argued the use of news material by AI developers goes “far beyond the guardrails set by the courts.”

Congress needs to clarify copyright law to ensure that the use of publisher content for training AI models does not fall under fair use, she continued, forcing companies to negotiate licensing deals with news organizations.

At least one expert, though, said that he was skeptical about whether lawmakers should intervene on behalf of the media.

Jeff Jarvis, a professor at CUNY’s Graduate School of Journalism, said Congress should “promise first to protect the rights of speech and assembly that have been made possible by the internet.”

Jarvis positioned the proliferation of AI models as an opportunity for users to more actively engage with news content. Such opportunities, he said, are put at risk “if we fence off the open internet.”

Pushing back on arguments against fair use, Jarvis pointed out that journalists use the copyright provision every day in their work. Limiting fair use, or forcing AI companies to license news content, could set precedents “that may affect journalists,” he said, “but will also affect small open-source efforts to compete with Big Tech companies.”

“Please base decisions that affect internet rights on rational proof of harms, not media’s moral panic,” Jarvis said.

Coffey, meanwhile, said news organizations are not opposed to using AI, arguing instead that the press should not be “disregarded at the expense of these new and exciting technologies.”

“We want to help developers realize their potential in a responsible way,” she said. “A constructive solution will benefit all interested parties and society at large and avoid protracted uncertainty in the courts.”

