OpenAI Accused of Data Theft in California

June 29, 2023

The AI community has been shaken by a class action lawsuit, filed in the Northern District of California on June 28, that alleges OpenAI, the maker of ChatGPT, has breached copyright laws by training its AI on private content without consent.

AI technology allows users to input a few words or phrases and generate output with a level of sophistication that is startlingly similar to human language. ChatGPT is currently the most successful attempt to code human language simulation, and companies such as Microsoft and Adobe are tapping into its potential to refine their products.

However, ChatGPT scrapes the web to teach itself, examining content written by humans, and attempting to define logical rules that will allow it to regurgitate the text in a fresh format.

The California lawsuit alleges: violation of the Electronic Communications Privacy Act; violation of the Computer Fraud and Abuse Act; violation of the California Invasion of Privacy Act; violation of the California Unfair Competition Law, Business and Professions Code; violation of Illinois’s Biometric Information Privacy Act; violation of Illinois’s Consumer Fraud and Deceptive Business Practices Act; negligence; invasion of privacy; intrusion upon seclusion; larceny/receipt of stolen property; conversion; unjust enrichment; failure to warn; and violation of New York General Business Law.

At the heart of the lawsuit is the question of whether OpenAI is entitled to make a profit from other people’s work product — a question that was entirely moot before OpenAI transitioned into a for-profit company.

Google has faced similar claims that its search model is dependent on republishing other people’s copyrighted content. Part of Google’s defence is that a robots.txt file can request that a site is not indexed. No such flag currently exists for AI training bots.
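To make the robots.txt mechanism concrete, here is a minimal sketch in Python using the standard library's `urllib.robotparser`. The robots.txt content, the `example.com` URLs, and the crawler name `ExampleAIBot` are hypothetical, chosen only for illustration. The sketch shows that a rule naming one crawler does nothing to stop a differently named bot, and that the file is only a request: a crawler has to choose to check it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only; not taken from any real site.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "ExampleAIBot" is a made-up crawler name (an assumption for this sketch).
    for agent in ("Googlebot", "ExampleAIBot"):
        for url in ("https://example.com/article",
                    "https://example.com/private/notes"):
            print(agent, url, is_allowed(ROBOTS_TXT, agent, url))
```

In this sketch the site owner has blocked Googlebot entirely, yet the hypothetical training bot may still fetch the article because no rule names it. Nothing in the file can force a crawler to run this check at all.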

Copyright and machine learning is a grey area because the technology is far outpacing legislation. Experts have long argued that the use of web scraping to train AI is a theoretical violation of copyright. However, it seems impractical to enforce any kind of compensation for the authors of blogs, social media posts, and private messages whose copyright is allegedly violated.

An additional level of legal complication arises if ChatGPT (or any other AI service) is used to create commercial material. Does the alleged breach of copyright rest solely with OpenAI, or does it extend to anyone using the service?

Anyone who thinks that courts will not find against big tech need only look at the battles over privacy, and the transformative legislation that has made its way onto statute books as a result.

Regardless of the outcome of this legal action, it is unlikely to be the last attempt to place legal restrictions on the industry.

Ben Moss

Ben Moss has designed and coded work for award-winning startups, and global names including IBM, UBS, and the FBI. When he’s not in front of a screen he’s probably out trail-running.