NYTimes vs. OpenAI: Copyright Challenges in Generative AI

NYTimes sues OpenAI for copyright infringement.

Generative AI, like other types of machine learning, requires training materials (large text corpora, labeled data, etc.) to learn. Using these kinds of models is becoming necessary to compete. However, you need to make sure that the model you use was trained on source material that was licensed or otherwise lawfully available for that purpose, or your outputs could be subject to copyright infringement suits. Works of authorship are protected by copyright law in nearly every country.

If content is available to train on, that does not necessarily mean the content is available for all purposes. We see a similar issue in the data privacy context, where a data subject consents to the use of their data for one purpose, and that data is then shared with processors who use it for purposes the subject never consented to. I expect to see interesting exhaustion arguments because the NYTimes appears to have been paid for some of the content. Was that payment enough to cover the value of its copyright?
