As we’ve discussed here in the past, the latest generation of Artificial Intelligence (AI) systems employs massive libraries of published material to “train” their bots and teach them to communicate quickly in a natural-sounding fashion. But where are the developers finding all of that material? Well… pretty much everywhere. They draw on research papers, journal articles, newspapers, and even books. But that could turn into a sticky and potentially costly problem for AI developers, particularly when it comes to books and novels. On Friday, computer chip maker Nvidia found that out the hard way when three authors filed a class action lawsuit against it in a California federal court. The authors claim that their novels were fed into the NeMo AI system without their permission and in violation of their copyrights. The rest of the industry will no doubt be watching this case closely because it could make this entire approach to AI development impractical. (Reuters)
Nvidia, whose chips power artificial intelligence, has been sued by three authors who said it used their copyrighted books without permission to train its NeMo AI platform.
Brian Keene, Abdi Nazemian, and Stewart O’Nan said their works were part of a dataset of about 196,640 books that helped train NeMo to simulate ordinary written language, before being taken down in October “due to reported copyright infringement.”
In a proposed class action filed on Friday night in San Francisco federal court, the authors said the takedown reflects Nvidia’s having “admitted” it trained NeMo on the dataset, and thereby infringed their copyrights.
The tech industry has previously argued that AI systems are incapable of committing copyright infringement because they are not thinking entities capable of theft. They are simply running code the way they were programmed. Also, they are not legal entities and cannot “profit” from the text they create.
Many legal analysts disagree. Findlaw states that unless an AI company has been granted permission to use published material, the author or copyright holder may be able to file for copyright infringement in federal court. The New York Times has already sued OpenAI, the maker of ChatGPT, on the same grounds. Axios recently argued that copyright law will be AI’s 2024 battlefield.
In order for these plaintiffs to prevail, however, they will have to demonstrate that the AI developers and their bots are going past the boundaries of the Fair Use Doctrine as defined by the U.S. Copyright Office. The doctrine states that it is permissible to use limited portions of a work, including quotes, for purposes such as commentary, criticism, news reporting, and scholarly reports. That’s the reason that I can get away with using a few paragraphs from a Reuters article as I did above.
We’ve seen few instances where AI chatbots have lifted lengthy portions of someone else’s work and directly pasted them into the responses they generate. The systems tend to pick and choose pieces from multiple sources, seemingly almost at random, and then weave them together in their own “voice.” If the use isn’t excessive, that might sneak through under the “limited portions” clause of the doctrine. But when a chatbot is answering questions, it is ostensibly providing information, not offering commentary or criticism. And unless someone (human) is publishing the output, it’s really not “news reporting.” Even then, it seems that the person publishing it would be the target of the lawsuit, not the AI developer. I suppose the responses might qualify as “scholarly reports,” but a court would need to make that call. People ask AI systems a lot of strange and bizarre things.
Also unaddressed here is the question of attribution. When you intentionally use someone else’s work, depending on the application, you should typically at least note the original source, and in some cases provide footnotes crediting it. ChatGPT and these other applications almost never do any of that in my experience. The bot frequently just gives you the answer as if it thought it up by itself. If you ask it to cite sources, it generally will, but how many people typically do that?
If even one of these lawsuits is successful and survives the inevitable appeals that would follow, the entire AI industry could be on the brink of collapse. This lawsuit against Nvidia was filed as a class action case because the three authors are not simply looking for compensation for themselves. They are suing on behalf of every copyright holder whose work has been used in this fashion. They could bankrupt the entire industry in a very short amount of time. Further, these massive training libraries appear to be the only way that AI developers can make their Large Language Models perform as they do currently. If that tool is removed, they could be back to the drawing board. And for at least some people fretting over the future of mankind under our coming robot overlords, perhaps that wouldn’t be the worst thing in the world.
Read the full article here