Copyright Law and Generative AI: What a Mess
If composer Burt Bacharach were alive and working for the generative artificial intelligence industry, he might have a song called “Lawsuits Are Falling on My Head.” Or more likely, his estate would sue generative AI companies, accusing them of copyright violation for using his original material to help train their algorithms.
In the last month, several authors, artists and programmers have sued OpenAI, Meta, Microsoft and other companies over their use of generative AI and large language models such as ChatGPT.
For instance, New York Times bestselling authors Sarah Silverman, Christopher Golden and Richard Kadrey filed a class action complaint against OpenAI and Meta claiming copyright violations.
Their attorneys, the Joseph Saveri Law Firm and independent lawyer Matthew Butterick, are also representing novelists Paul Tremblay and Mona Awad in another suit, as well as a group of illustrators and artists suing Stability AI, Midjourney and DeviantArt for copyright infringement of visual works.
They’re also involved in a suit on behalf of programmers who claim copyright violations occurred when Microsoft and OpenAI scanned code they had publicly posted on the popular software development site GitHub. Meta declined to comment, as did Stability AI. Others did not respond to requests for comment.
Their argument goes to the heart of how companies train and refine their algorithms.
Companies feed vast amounts of source material to software, whether it’s a large language model or an image system such as DALL-E or any of the other variations coming to market in a commercial race. Programs break down the examples into parts, study how they fit together, and then build sophisticated statistical models to determine, in a given context, which parts are likely to work together. What is stored are individual pieces, snippets or techniques, with pointers to the parts most likely to follow.
Users provide text prompts for the type of output that they seek. The software then follows chains of connections to create answers.
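Real systems rely on neural networks with billions of parameters, but the basic mechanic described above (record which parts tend to follow which, then walk those statistics to produce output) can be sketched in a few lines. The Python below is a toy word-level bigram model, purely illustrative and not any vendor’s actual architecture:

```python
import random
from collections import defaultdict, Counter

def train(corpus: str) -> dict:
    """Break text into parts (words) and count which part follows which."""
    follows = defaultdict(Counter)
    tokens = corpus.split()
    for current, nxt in zip(tokens, tokens[1:]):
        follows[current][nxt] += 1
    return follows

def generate(follows: dict, prompt: str, length: int = 10) -> str:
    """Follow the chains of counted connections to extend a prompt."""
    out = prompt.split()
    for _ in range(length):
        options = follows.get(out[-1])
        if not options:
            break  # no recorded follower for this word
        words, counts = zip(*options.items())
        out.append(random.choices(words, weights=counts)[0])
    return " ".join(out)

# Train on a tiny "corpus"; real systems ingest vast amounts of material.
model = train("the rain in spain stays mainly in the plain")
print(generate(model, "the"))
```

Note that even this toy stores only counts of word pairs, not the corpus itself; whether that distinction shields the far more elaborate models at issue is exactly what the lawsuits contest.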
There are three obvious parts: the input, the algorithms and the output. On the input and algorithmic sides, many creators and corporate producers correctly say that the training process makes use of copyrighted material, unless everything the companies feed in is in the public domain.
“A lot of these [copyright] issues have existed for some time,” says Mauricio Uribe, a partner and chair of the software and IT practice at law firm Knobbe Martens in its Seattle office, who adds that “it’s coming to the forefront of communications because of the availability [of generative AI].”
Material in
Shubha Ghosh, director of the Syracuse Intellectual Property Law Institute and a professor at the Syracuse University College of Law in New York, thinks that argument won’t go far.
“The difficulty might stem from showing voluntary copying as opposed to machine copying, which arguably is not actionable,” he says. “For example, copying into memory is not copyright infringement.”
At the same time, Uribe thinks that the software could potentially be a derivative work, one of the many areas protected by copyright, especially if there is enough content to reproduce a significant portion of an original copyrighted work.
Even if the model is considered a derivative work, that doesn’t preclude fair use.
“Intermediate copying, that is to say copying as a step to producing a new work, has been found to be fair use, for example, in the creation of a new platform or an emulator for video games,” Ghosh says.
“Artists say that it’s like a collage tool. We’re not storing the original works,” says Chris Callison-Burch, an associate professor of computer and information science at the University of Pennsylvania who researches large language models. “They get set aside.”
And material out
The output side can be another story because original materials can in theory be reconstructed.
“Retrieving substantially similar chunks is very rare,” Callison-Burch says, adding that works make “tiny, vanishingly small” contributions to the results, such as individual atoms in a large and complex molecule. But it is possible.
Proving that this has happened is difficult if a would-be plaintiff doesn’t have access to the training data that would show a work’s inclusion. But there are ways to try to trick the software into giving something away.
Callison-Burch offers one approach: take a sample passage from a book featuring a randomly chosen character, blank out the name, and see whether the software can fill in the blank. Nor can one assume that all such systems work identically, which complicates the identification of a potential harm.
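As a rough illustration of the kind of probe Callison-Burch describes, here is what a cloze-style test might look like in Python, assuming the official openai client library; the model name and the passage are placeholders, not his specific setup:

```python
from openai import OpenAI  # assumes the official openai Python package

client = OpenAI()  # expects OPENAI_API_KEY in the environment

# Placeholder passage: a real probe would use verbatim text from the book
# in question, with a randomly chosen character's name blanked out.
passage = (
    "____ pulled the curtain aside and stared down at the empty street, "
    "wondering whether the letter would ever arrive."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=[{
        "role": "user",
        "content": (
            "The passage below comes from a published novel with one "
            "character's name blanked out. Reply with the name only.\n\n"
            + passage
        ),
    }],
)

# If the model names an obscure character correctly, the book may have been
# in its training data; a wrong guess proves little either way.
print(response.choices[0].message.content)
```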
Rights and risks
“To go back in time, there was at one point a question of whether a photograph, something created with a machine, was something you could create copyright in,” says Samuel Lewis, co-chair of Cozen O’Connor’s copyright practice in its Miami office and also the chair of the technology committee.
“The question fundamentally becomes what of that is protectable, what of that can be registered?” Lewis says.
When it comes to broader questions of rights in the output, the recent U.S. Supreme Court decision in Andy Warhol Foundation for the Visual Arts Inc. v. Goldsmith has the potential to shake up the tentative answers people have reached.
“By my read of that Supreme Court decision, the majority opinion was saying fair use cannot usurp the normal derivative work in the normal sense,” Uribe says.
In other words, fair use doesn’t necessarily trump the rights and interests that copyright owners have in using one work to serve as the foundation for others—and to make money from that.
That creates at least the potential of liability.
“If at the end of the day, it’s something that looks like it’s been directly copied or looks really close, then it’s a real problem, especially if used commercially,” says Haverly MacArthur, a partner in the intellectual property practice at Adams and Reese in its Nashville, Tennessee, office.
MacArthur points to proposed legislation in the European Union called the Artificial Intelligence Act. One proposed rule demands that AI systems disclose copyrighted material that’s generated from a prompt, which could get complicated given how the software generates output.
But there are two other issues to consider. One is that if software output cannot hold a copyright interest, a company might find that swaths of its marketing, instructional and other materials are no longer proprietary and have no copyright protection, leaving them open to use by competitors.
The other is the question of who holds liability if it exists. Some vendors of generative AI, such as OpenAI, which didn’t reply to multiple requests for an interview, require full indemnification in their terms of use. That isn’t limited to infringement, Lewis says.
“You could ask the AI to write up an article about a given person, and it may come back with something that is defamatory. If OpenAI is sued, it’s going to look at the customer who asked for the article to be prepared and expect them to indemnify and defend them.”
“If you look at the terms, they’re not looking at us as customers; they’re looking at us as the product,” Lewis adds.
Erik Sherman is a Massachusetts writer whose work has appeared in Fortune, the Technology Review and the Wall Street Journal.
See also:
“Law firms moving quickly on AI weigh benefits with risks and unknowns”