How Does ChatGPT Training Violate Modern Privacy Laws?

Desiree Sainthrope is a distinguished legal authority whose work sits at the intersection of international trade agreements, global compliance, and the complex legalities of emerging technologies. With a career dedicated to navigating the intricacies of intellectual property and digital policy, she has become a leading voice in evaluating how generative AI challenges established legal frameworks. Her recent analysis of regulatory investigations into large-scale language models provides a roadmap for understanding the friction between rapid innovation and the fundamental right to privacy.

Many generative AI systems are trained on massive datasets scraped from the internet without explicit user consent. How do these practices conflict with current private-sector privacy laws, and what specific risks of discrimination or harm do they create for individuals?

The core conflict lies in the fact that current laws, such as Canada’s Personal Information Protection and Electronic Documents Act (PIPEDA), require organizations to obtain meaningful consent and limit data collection to what is necessary for a specific purpose. When a company scrapes the internet to feed a model, it bypasses the individual’s awareness, often capturing sensitive personal details that were never intended for AI training. This creates a volatile environment in which people are exposed to potential breaches and to systemic discrimination based on the information ingested about them. We are seeing a dangerous trend where data is treated as a raw commodity, ignoring the fact that behind every data point is a person who may suffer real-world consequences if that information is misused or misinterpreted by an algorithm.

Large language models often struggle to fulfill data subject access requests or to delete specific personal information once it is ingested. What operational changes should companies implement to support compliant data retention and deletion, and how can they better handle inaccuracies within their models?

Companies must move away from the “collect everything” mentality and implement rigorous data deletion and retention policies that are actually functional within a neural network’s architecture. To handle inaccuracies, developers need to establish clear protocols for correcting false information generated by the model, which remains one of the most difficult technical hurdles today. Operationally, this means building transparency into the very foundation of the system so that when a user requests an update or a deletion, the organization has a verifiable pathway to execute it. As recent regulatory findings make clear, failing to have these safeguards in place is no longer just a technical oversight; it is a direct violation of privacy rights that requires immediate updates to training pipelines.
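To make that “verifiable pathway” concrete, here is a minimal sketch of a deletion-request registry that ties each request to the training shards containing a person’s records, so the next training run can demonstrably exclude them. Every name and structure in it, from DeletionRegistry down to the shard identifiers, is a hypothetical illustration rather than any vendor’s actual implementation:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class DeletionRequest:
    """A data subject's request to remove their records from training data."""
    subject_id: str
    received_at: datetime
    affected_shards: list[str] = field(default_factory=list)
    completed: bool = False

class DeletionRegistry:
    """Tracks deletion requests against a subject-to-shard index so each
    request has an auditable path from intake to the next training run."""

    def __init__(self, shard_index: dict[str, list[str]]):
        # shard_index maps a subject_id to the training shards that
        # contain records derived from that person's data.
        self.shard_index = shard_index
        self.requests: list[DeletionRequest] = []

    def file_request(self, subject_id: str) -> DeletionRequest:
        request = DeletionRequest(
            subject_id=subject_id,
            received_at=datetime.now(timezone.utc),
            affected_shards=self.shard_index.get(subject_id, []),
        )
        self.requests.append(request)
        return request

    def exclusion_list(self) -> set[str]:
        """Shards that must be rebuilt (with the subject's records dropped)
        before the next training run can begin."""
        return {
            shard
            for request in self.requests
            if not request.completed
            for shard in request.affected_shards
        }

# Hypothetical usage: one subject's data appears in two shards.
registry = DeletionRegistry({"subject-123": ["shard-007", "shard-042"]})
registry.file_request("subject-123")
print(sorted(registry.exclusion_list()))  # ['shard-007', 'shard-042']
```

The value of such a registry is the audit trail: a regulator can request the exclusion list and check it against the shards actually fed to the next training run.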

Oversight of AI often requires coordination between federal and provincial regulators. How does this multi-jurisdictional approach affect the enforcement process, and why might authorities choose recommendations and monitoring over immediate financial penalties when a company demonstrates a willingness to cooperate?

A multi-jurisdictional approach, such as the 2023 joint investigation involving federal and provincial commissioners from Quebec, British Columbia, and Alberta, ensures that different facets of the law are addressed, though the findings may vary based on specific regional statutes. Regulators often prefer a collaborative route—issuing recommendations and monitoring progress—because it fosters a more immediate shift in corporate behavior than a protracted legal battle over fines. When a company like OpenAI demonstrates a commitment to transparency and implements privacy-protective measures during an investigation, it allows the authorities to secure a promise of ongoing compliance. This cooperative model can lead to more effective long-term results, as it turns the investigation into a catalyst for the developer to overhaul its data practices in real time.

Current privacy statutes were often written before the rise of generative AI. What specific gaps in governance are being exposed by these new technologies, and what steps should lawmakers take to modernize regulations so they provide a more robust guardrail for innovation?

The primary gap is the lack of explicit rules governing how massive datasets used for model building are sourced and processed, which has allowed some companies to treat privacy as an afterthought. Lawmakers are now being urged to modernize legislation to ensure that digital risks are addressed before a product is even launched, rather than reacting to harms after the fact. We need laws that clearly state that powerful technology must work for the benefit of the public, which means embedding guardrails that prevent the over-collection of data. Modernization should focus on creating a framework for “trustworthy innovation,” where the push for technological speed is balanced by a mandatory respect for fundamental rights and proactive governance.

Developers are being pushed to limit the collection of sensitive data while increasing transparency. What are the practical steps for building privacy-protective training pipelines, and how can organizations prove they are prioritizing fundamental rights over rapid technological deployment?

Practical steps include filtering out sensitive personal information before it enters the training set and being entirely transparent with users about how their data contributes to the model’s evolution. Organizations can prove their commitment by undergoing regular audits and providing regulators with frequent updates on their deletion policies and accuracy standards. It is about shifting the culture from “launch first, fix later” to a model where privacy is a top priority throughout the development lifecycle. By limiting the amount of consumer data used and improving user awareness of the implications of these tools, companies can build a foundation of trust that actually supports, rather than hinders, their long-term growth.
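As one concrete illustration of that first step, a pre-ingestion filter might scan each document for common identifier patterns and redact any matches, dropping documents too dense with personal information to salvage. The patterns and threshold below are hypothetical placeholders; a production pipeline would lean on vetted PII-detection tooling rather than a handful of regular expressions:

```python
import re

# Hypothetical patterns for common identifiers; real pipelines would use
# a maintained PII-detection library with far broader coverage.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "phone": re.compile(r"\b(?:\+?\d{1,3}[ .-]?)?\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}\b"),
    "ssn_like": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact_pii(text: str) -> tuple[str, int]:
    """Replace matched identifiers with typed placeholders.
    Returns the redacted text and the number of redactions made."""
    hits = 0
    for label, pattern in PII_PATTERNS.items():
        text, n = pattern.subn(f"[{label.upper()}_REDACTED]", text)
        hits += n
    return text, hits

def filter_corpus(documents: list[str], max_hits: int = 5) -> list[str]:
    """Redact identifiers and drop documents so dense with PII that
    redaction alone is unlikely to make them safe to train on."""
    kept = []
    for doc in documents:
        redacted, hits = redact_pii(doc)
        if hits <= max_hits:
            kept.append(redacted)
    return kept

# Hypothetical usage:
docs = ["Contact me at jane@example.com or 555-123-4567."]
print(filter_corpus(docs))
# ['Contact me at [EMAIL_REDACTED] or [PHONE_REDACTED].']
```

Logging redaction counts per batch is also a cheap way to generate the kind of frequent, verifiable updates for regulators and auditors described above.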

What is your forecast for AI privacy regulation?

I anticipate a significant shift toward “accountability by design,” where the burden of proof will fall squarely on the developers to demonstrate that their AI models are not infringing on human rights. We will likely see more unified global standards as jurisdictions learn from one another’s investigations, moving away from a fragmented landscape toward a more harmonized set of rules. For the reader, this means that while AI will become more integrated into daily life, the era of unchecked data scraping is coming to an end. We are entering a period where privacy will be the primary metric for a technology’s success, ensuring that innovation does not come at the cost of our collective or individual security.
