In November 2025, OpenAI published a statement on its official website describing the New York Times' discovery request in the copyright lawsuit as "crossing the line". According to OpenAI, the Times is seeking about 20 million ChatGPT user conversations to determine whether users relied on the model to bypass the Times' paywall and copy its reporting. OpenAI emphasized that these chats contain highly sensitive content such as passwords, payment information, health issues, and emotional distress, and that handing them over in bulk to a third-party legal team conflicts with the platform's commitment to user privacy. The company said it will do its utmost to block the request in court.
The dispute stems from a copyright lawsuit filed by the New York Times in late 2023. Its core allegation is that OpenAI and Microsoft used Times content to train models without authorization, producing some outputs that closely resemble the original articles. As the lawsuit progressed, the focus gradually shifted from whether the training data was lawfully used to how, and to what extent, evidence can be obtained. Court rulings have indicated that limited access to some conversation logs can be considered for discovery under strict confidentiality orders and de-identification measures, and the New York Times says it will not use the data to identify specific users. OpenAI counters that even with names and accounts removed, the content itself may be enough to reveal a person's identity and private details, so it has asked the court to exercise more restraint in balancing copyright claims against user data security.
OpenAI was previously subject to a broader evidence preservation order in this case, which required it to suspend its routine deletion of relevant chat records and store them centrally. Through appeals and negotiations, the company later narrowed that obligation to a legal hold on data from a specific period, and it promised not to use the retained data for training or product improvement. How the court ultimately defines the scope of disclosure for chat records will affect not only the outcome of this case but also serve as a reference point for how platform-based AI services weigh log retention, privacy protection, and litigation discovery against one another.
FAQs
Q: Why is the New York Times asking OpenAI for 20 million chat logs?
A: The New York Times wants to find evidence in these ChatGPT conversations that users have used the model to reproduce or reconstruct the Times' paid content. Such evidence would support its claim that "the model reproduces copyrighted works in large numbers." This is a discovery strategy in copyright litigation.
Q: What risks does OpenAI see in this discovery request?
A: OpenAI believes that even if account information and names are removed, the chat content itself contains details about illness, work, family, finances, and similar matters, which is enough to identify the people involved indirectly. Transferring this data in bulk to the opposing legal team would pose serious privacy risks, so it calls the request an "intrusion" on user privacy.
Q: What is the court's current position on the chat records?
A: On the one hand, the court issued an evidence preservation order requiring OpenAI to suspend the deletion of relevant logs. On the other hand, in subsequent rulings it allowed only limited discovery within the framework of a protective order and did not simply grant the New York Times all of the data it requested.
Q: Will the ChatGPT conversations of ordinary users be saved for a long time?
A: OpenAI's public position is that, under normal circumstances, once a user deletes a conversation, the content is removed from its systems within a certain period and is no longer used for training. During the New York Times lawsuit, however, data from a certain period became subject to a court order and must be kept in a legal hold system until the proceedings end. Enterprise customers and users covered by zero data retention agreements are generally not affected by this dispute.
Q: What are the potential implications of this case for the AI industry as a whole?
A: The outcome of the case bears not only on whether using news content as training data can qualify as fair use, but also on how courts will view the evidentiary value of platform chat records in future litigation. When designing log retention policies, deletion mechanisms, and processes for disclosing data to outside parties, AI companies will have to anticipate similar demands, which will push the industry to rebalance "data minimization" against "legal compliance."