2024-03-13

The Washington Independent used to be a proper online news outlet that employed full-time journalists before it folded more than a decade ago.

Now, it’s returned from the dead, with regular news coverage and articles by professional-sounding journalists who have introductory notes and contact numbers against their names.

However, as pointed out by Pulitzer Prize-winning journalist Spencer Ackerman – who worked for the Washington Independent during its first run – the reincarnated news platform only regurgitates recently published, article-like content that does not appear to have been produced by human beings.

In other words, whoever runs the once highly respected news website relies on generative artificial intelligence, or GenAI, to produce content that would be copyrightable had it been created by a human being.

Some of the “new” articles that the website is churning out look a lot like what actual journalists like Ackerman produced in the past.

“Now I have to worry that I have inadvertently trained my robotic replacement,” writes the Pulitzer Prize-winning journalist.

Ackerman isn’t some lone paranoid writer worried about the impending takeover of GenAI, a technology that can create text – as well as images, videos and music – in response to simple prompts.

He’s one of the journalists, nonfiction writers and newspaper publishers actively campaigning against the “unauthorised” use of GenAI, which, they say, poses an existential threat to their livelihoods.

Within the news industry, GenAI manifests itself in the form of chatbots, which are computer programmes that simulate real-life conversations with human beings.

These chatbots are trained on hundreds of millions of pre-existing articles on the internet and have become so efficient – in theory, at least – that they can compete against proper news outlets as a source of information.

The problem has gained such immediacy that The New York Times recently sued the companies that have created popular GenAI platforms like ChatGPT for misusing its copyrighted written works.

It claims that these companies used its articles to train their automated chatbots, which are now competing with the newspaper as a source of reliable information for the general public.

The issue at the root of the tussle between GenAI firms and content producers like The New York Times is whether using copyrighted material to train chatbots falls within or outside the scope of fair use.

“The question is whether AI systems can copy copyrighted material without compensation and for their own profit. We do not think the US copyright law allows it,” Justin A. Nelson, partner at law firm Susman Godfrey LLP, tells TRT World.

“Permission to use creative works is at the heart of any fair licensing system,” says Nelson, whose law firm has filed a class action suit on behalf of several nonfiction authors against OpenAI and Microsoft for allegedly training chatbots on their copyrighted materials without permission.

For their part, firms like Microsoft, which have invested heavily in GenAI products such as ChatGPT and Copilot, assert that the doctrine of fair use permits taking advantage of copyrighted materials to train AI models.

In comments submitted before the director of the US Copyright Office of the Library of Congress late last year, the software company held that making an “intermediate copy” of a work to train an AI model is “completely different” from copying an expressive work to communicate the copyright holder’s original expression.

“Just as humans are permitted to learn from the ideas, concepts, and patterns in copyrighted materials, copyright law has for decades recognised that fair use principles allow intermediate copying and use of copyrighted materials for the purpose of learning and creating new, transformative works,” it says.

University of Texas School of Law Professor Oren Bracha agrees.

He says the argument that using copyrighted works for training chatbots is itself copyright infringement – irrespective of any similarity with the generated output – is wrong as a matter of both law and information policy.

As a matter of law, copies of copyrighted works made only for extracting information without exposing anyone to the expression of the works do not infringe, he tells TRT World.

“As a matter of information policy, if mere use in training were infringing, the burden on AI development would be very heavy. Copyright is everywhere, AI training must use many works, and licensing would be prohibitively expensive,” he adds.

However, The New York Times insists that many GenAI companies have lifted a “massive amount of content” from its website to create tools that mimic, closely paraphrase, and copy its work.

The newspaper says a “stunning 1.2 percent” of the dataset used to train OpenAI’s GPT-2 model was based on its content. The newspaper was also the fourth-largest source for Google’s C4 dataset, which powers GenAI products like Google’s Bard and its Search Generative Experience.