Former OpenAI Researcher Slams Firm Over Use of Copyrighted Data
Artificial intelligence (AI) is rapidly transforming the digital landscape, and OpenAI, which seeks to build systems that go beyond human-performed tasks, is at the center of that shift. The company has drawn criticism over its use of data, particularly copyrighted material, to train its technologies. Suchir Balaji, a former OpenAI researcher, has become a leading voice on the issue, publicly criticizing the way the company uses internet data.
Balaji left the organization in August 2024, citing ethical and legal concerns about the technologies he helped develop. According to reports, about a year after OpenAI released ChatGPT in 2022, he began to question how such a model, which relies on copyrighted data, could damage the internet ecosystem.
In an interview with The New York Times, Balaji stated, "If you believe what I believe, you have to just leave the company." He elaborated on his concerns in an essay posted on his personal website, which analyzes how copyrighted material in AI training datasets often finds its way into the outputs generated by models like ChatGPT. The essay concludes that what ChatGPT produces does not qualify as fair use, because it relies heavily on copyrighted material for which the authors' permission was never sought.
Balaji has also argued that this use of data threatens the livelihood of creators, whose work is used without compensation or even acknowledgment. In response to these criticisms, OpenAI has defended its practices, saying that it builds its AI models using publicly available data in ways it believes fall within the principles of fair use. The company maintains that this approach is fair to creators, necessary for innovation, and vital to U.S. competitiveness in the global market.
As the debate over AI's use of copyrighted data continues, Balaji's exit and criticisms underscore long-standing tensions within the tech industry around ethics, copyright, and creative work in the age of AI.