Synthetic Data and GDPR Compliance: How Artificial Intelligence Might Resolve the Privacy-Utility Tradeoff 

Michael Cairo[*]


Data is in many ways the lifeblood of the digital economy. High-quality data oftentimes requires significant detail which may be at odds with the privacy concerns of the human subjects from whom data is extracted. The tension between the usefulness of a dataset and the data subject’s privacy has been referred to as the “privacy-utility tradeoff.” A novel application of artificial intelligence has potentially made it possible to resolve this tradeoff through the creation of “synthetic data,” anonymized data generated from authentic raw data through generative adversarial networks (GANs). Unlike pseudonymized data, synthetic data retain properties that are statistically equivalent to the underlying data gathered from data subjects. As the cost of compliance with privacy laws across the world increases, synthetic data may prove to be a viable solution to the tension between protecting individual privacy rights and the demand in the big data market.

This Note argues that large BigTech companies should incorporate synthetic data into their business models to protect users’ private, personal data while retaining large profits derived from their ad-driven business models. Part I provides an overview of GDPR, the patchwork of U.S. privacy laws, and recent caselaw that illustrates EU regulators’ strict approach to enforcement compared to their U.S. counterparts. Part II discusses how the Privacy-Utility Tradeoff and BigTech’s current business model render compliance with data privacy regulations difficult. Part III explains how synthetic data can be used to resolve the Privacy-Utility Tradeoff.


“The classic saying is: ‘if you’re not paying for the product, then you are the product,’ . . . [but t]hat’s a little too simplistic. It’s the gradual, slight, imperceptible change in your own behavior and perception that is the product.”[1]

As the role of technology continues to expand in the daily lives of most people around the globe, data companies are having difficulty complying with a shifting regulatory landscape as new laws governing data privacy emerge.[2]

Europe’s behemoth, the General Data Protection Regulation (GDPR or the Regulation),[3] is the most robust, comprehensive, and jurisdictionally far-reaching data privacy regulatory framework to date. U.S. privacy law lags behind, with no equivalent omnibus federal privacy law and only a few state laws working to fill the federal gap caused by a sector-specific patchwork of privacy laws.[4] As such, within the global regulatory landscape, GDPR stands as a significant concern for BigTech giants like Google and Facebook, whose entire business models depend upon freely collecting and processing their users’ personal data to sell advertisements,[5] a common industry practice that persisted largely unencumbered for nearly two decades—until GDPR’s enactment.

Data privacy laws, and GDPR in particular by way of European regulators’ unrelenting enforcement regime, are forcing BigTech companies to adapt.[6] Four years into its implementation, GDPR now applies to every company that interacts with the personal data of any resident of the European Union (EU).[7] The Regulation grants every EU resident a private right of action against any entity that processes that resident’s personal information in violation of the rights included in the Regulation, such as the rights to compulsory data breach notices and avenues of recourse, and it imposes penalties when data is obtained, collected, or used without data subjects’ informed consent.[8] Violations have cost data companies a whopping $2,779,699,894 from GDPR’s implementation in May of 2018 through February of 2023.[9]

Compliance with GDPR and other new data privacy laws that protect personally identifiable information has been extremely costly and labor-intensive for data companies, as the current approach to achieving compliance focuses on deidentification (manually removing users’ personally identifiable characteristics from large datasets) of billions of users’ personal data that have been collected and maintained by these companies for nearly twenty years.[10] Aside from the direct cost of deidentifying user data, this approach comes with an additional utility cost due to the “Privacy-Utility Tradeoff”: the inverse relationship between the lack of personally identifiable characteristics within collected data and the utility of the data.[11] Data that is rich with personal information enables its custodian to use it for a wide range of purposes, like personalized advertisements tailored toward user preferences. Thus, while deidentification allows for compliance with data privacy laws, it diminishes the value of collected data to BigTech companies.
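The deidentification approach described above can be sketched in a few lines of code. This is a minimal, hypothetical illustration (the field names and record are invented, not any company’s actual schema); it also makes the utility cost visible, since the identifier fields that enable personalized advertising are exactly the ones removed.

```python
# Hypothetical direct-identifier fields; real pipelines use far larger lists.
DIRECT_IDENTIFIERS = {"name", "email", "ssn", "ip_address"}

def deidentify(record: dict) -> dict:
    """Return a copy of the record with direct identifiers removed."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

user = {"name": "Jane Doe", "email": "jane@example.com",
        "age": 34, "zip": "10027", "purchases": 12}
print(deidentify(user))  # direct identifiers gone; quasi-identifiers (age, zip) remain
```

Note that quasi-identifiers such as age and zip code survive the process; as discussed later in this Note, such residual fields are precisely what makes reidentification of "anonymized" datasets possible.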

The advent of artificial intelligence (AI) and its role in efficiently processing massive datasets has further complicated the ongoing privacy discussion with respect to healthcare, policing, and surveillance.[12] However, AI and its role in efficiently processing massive datasets may offer an alternative to deidentification and a solution to the Privacy-Utility Tradeoff in the form of synthetic data. As the name suggests,[13] synthetic data is essentially “fake,” AI-generated data that mimics authentic user-generated data without using personal data which could be used to identify a user.[14] Crucially, GDPR and similar U.S. privacy laws do not apply to such data that cannot be used to identify an individual user because synthetic data generally falls within the definition of “anonymous” data under GDPR Recital 26,[15] rendering the GDPR inapplicable.[16]
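To make the concept concrete, the sketch below is a deliberately simplified stand-in for the GAN-based synthesis described above: it fits only the mean and covariance of a hypothetical numeric dataset and samples entirely new rows from that fit. Production synthetic-data tools train full generative models, but the goal illustrated is the same one this Note relies on: records that are statistically similar to the originals in the aggregate while belonging to no real user.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "real" data: 5,000 users with (age, income) drawn from a
# correlated distribution. In practice this would be collected user data.
real = rng.multivariate_normal([40.0, 55000.0],
                               [[90.0, 1500.0], [1500.0, 4e7]], size=5000)

# Fit aggregate statistics, then sample brand-new "synthetic" rows from the fit.
mu, cov = real.mean(axis=0), np.cov(real, rowvar=False)
synthetic = rng.multivariate_normal(mu, cov, size=5000)

# Aggregate statistics track the real data closely...
assert np.allclose(synthetic.mean(axis=0), mu, rtol=0.05)
# ...but no synthetic row is an exact copy of any real user's row.
assert not any((synthetic[:1] == real).all(axis=1))
```

The key property, and the reason such data may fall outside “personal data” definitions, is that utility-relevant statistics survive while the link to any identifiable individual does not.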

This Note focuses on the differences between EU and U.S. privacy law as applied to synthetic data. Additionally, this Note focuses on Google and Facebook in particular because they are widely recognized BigTech companies which have drawn much of the recent ire from regulators and the public over their handling of users’ personal data. However, the implications discussed herein pertain to data companies of all sizes. Part I provides an overview of GDPR and the patchwork of U.S. privacy laws and recent caselaw that illustrates EU regulators’ heavy-handed approach to enforcement compared to their U.S. counterparts. Part II discusses how the Privacy-Utility Tradeoff and BigTech’s current business model render compliance with data privacy regulations difficult. Part III explains how synthetic data can be used to resolve the Privacy-Utility Tradeoff and proposes a new business model designed for compliance with GDPR.


The EU and U.S. have taken considerably different approaches to data privacy regulation. In the information age, those differences are starting to reignite the debate surrounding data privacy in the U.S.[17] One such difference is how data privacy is conceptualized in the EU versus the U.S. In Europe, for example, privacy is a fundamental right,[18] while in the U.S., it is a bit more complicated.[19] Another key difference is that the EU takes a centralized, uniform approach to data privacy regulation through GDPR, while the United States has instead opted for a sector-specific patchwork of federal legislation, whereby narrow regulations are promulgated by specific federal agencies and individual states are permitted to impose supplemental privacy laws.[20] The U.S. Congress is nowhere near passing comprehensive privacy legislation at the federal level.[21]

To understand the data privacy compliance issue discussed in this Note, this Part will serve as an overview of the relevant provisions of GDPR, federal privacy laws in the United States, California’s CCPA, and other emerging state privacy laws that might provide insight about what a federal privacy framework may look like.[22] This Part will address their jurisdictional reach, key definitions, the rights of data subjects, basic requirements for compliance, the manner in which violations are adjudicated, and the magnitude of the penalties for violations. Next, this Part will highlight the major differences in EU and U.S. privacy law and discuss how the fall of the EU-U.S. Privacy Shield[23] has given rise to substantial uncertainty for global compliance.


Until 2018, BigTech giants operated largely unrestricted in their data collection methods and in their use of collected data.[24] But this all changed on May 25, 2018, when GDPR went into effect.[25] Today, GDPR is largely regarded by data privacy experts as the groundbreaking gold standard of data privacy regulation.[26] In essence, EU Member States regard privacy as a fundamental right,[27] with GDPR’s primary aim being the protection of EU residents’ personal data.[28] There are five key provisions of GDPR which are relevant to this Note, each of which is discussed, in turn, in the following “Basic Overview” section.

1. Basic Overview

The first relevant GDPR provision relates to its jurisdictional scope. GDPR grants EU regulators expansive extraterritorial jurisdiction over “controllers” and “processors” both within the EU and beyond, so long as the actions taken by these entities involve the personal data of an EU resident.[29] “Controllers” and “processors” are among those subject to liability under the Regulation. “Controller” means “the natural or legal person, public authority, agency or other body which, alone or jointly with others, determines the purposes and means of the processing of personal data,” and “processor” means “a natural or legal person, public authority, agency or other body which processes personal data on behalf of the controller.”[30] The second relevant set of provisions, Articles 3 and 4(1)–(2), also addresses the jurisdictional scope of GDPR.[31] Specifically, GDPR applies to any enterprise or individual who is engaged in the “processing of personal data . . . regardless of whether the processing takes place in the [European] Union or not,” where “processing” means “any operation or set of operations which is performed on personal data or on sets of personal data, whether or not by automated means, such as collection, recording, organisation [sic], structuring, storage, adaptation or alteration, retrieval, consultation, use, disclosure by transmission.”[32] Further, the Regulation accounts for extraterritorial processing of personal data.
Such “cross-border processing” as defined in Article 4(23)(b) encompasses the processing of personal data “which substantially affects or is likely to substantially affect data subjects in more than one Member State.”[33] In other words, GDPR applies to any company or individual who collects or processes the personal data of any EU resident, regardless of where the entity doing the collecting, or the data subject, is physically located—so long as such activities are merely likely to substantially affect residents of more than one EU Member State.[34]

The third set of relevant provisions defines the types of data and individuals about which GDPR is concerned. The Regulation defines “personal data” as “any information relating to an identified or identifiable natural person (‘data subject’),”[35] and defines “identifiable natural person” as “one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person.”[36] These definitions are rather broad but nonetheless more instructive than those set forth by the U.S. patchwork of privacy laws discussed in Part I.B.2.a. infra.

The fourth set of relevant provisions concerns the rights of individuals afforded GDPR data privacy protections. All data subjects are afforded eight basic rights and are entitled to certain disclosures regarding the use of their data. The eight user rights are: (1) the right to information;[37] (2) the right of access;[38] (3) the right to rectification;[39] (4) the right to erasure;[40] (5) the right to restriction of processing;[41] (6) the right to data portability;[42] (7) the right to object;[43] and (8) the right to avoid automated decision-making.[44] Additionally, Article 34 mandates that controllers disclose data breaches “without undue delay,” and requires controllers to maintain adequate technical and organizational security measures to prevent, and mitigate the severity of, data breaches.[45]

The fifth set of relevant provisions dictates when a controller is permitted to process a data subject’s personal data. Article 6, the most frequently violated provision,[46] limits the processing of personal data to six legal bases: (1) the controller has obtained the data subject’s consent; (2) the data processing is necessary for the performance of a contract to which the data subject is a party; (3) the data processing is necessary for compliance with a legal obligation to which the controller is subject; (4) the data processing protects the “vital interests of the data subject;” (5) the data processing is necessary for the performance of a task carried out in the public interest; or (6) the data processing is necessary for carrying out the controller’s legitimate interests, “except where such interests are overridden by the interests or fundamental rights and freedoms of the data subject.”[47] If none of these six conditions are met, GDPR prohibits the processing of any data subject’s personal data by any controller. Violators may be subject to painfully high financial penalties.

2. Penalties for Violations

Financial penalties are not just reserved for controllers who violate Article 6, however. Article 77 of GDPR provides every data subject with a private right of action against any processor who touches their personal data.[48] Moreover, Article 83 grants EU regulators enforcement authority against processors for non-compliance, establishes a two-tiered penalty hierarchy, and provides guidelines for imposing appropriate penalties.[49] Depending on the violation and the “nature, gravity and duration of the infringement,” fines can amount to 2% of a violator’s global annual turnover or €10 million, whichever is higher; or 4% of global annual turnover or €20 million, whichever is higher.[50] For companies like Meta (previously known as Facebook) or Google, which raked in $117 billion and $257 billion in annual revenue in 2021, respectively, fines can reach $5.1 billion and $10.2 billion per infringement.[51]
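Article 83’s two-tier maximums reduce to simple arithmetic, sketched below. This is illustrative only: actual fines are discretionary under the Article 83(2) factors, and the revenue figures used here are hypothetical round numbers, not any company’s reported results.

```python
def max_fine_eur(annual_turnover_eur: float, upper_tier: bool) -> float:
    """Statutory ceiling under GDPR Art. 83: lower tier is the greater of
    2% of global annual turnover or EUR 10M; upper tier is the greater of
    4% of global annual turnover or EUR 20M."""
    if upper_tier:
        return max(0.04 * annual_turnover_eur, 20_000_000)
    return max(0.02 * annual_turnover_eur, 10_000_000)

# A smaller firm is bounded by the flat floor (4% of 100M = 4M < 20M)...
assert max_fine_eur(100_000_000, upper_tier=True) == 20_000_000
# ...while a large firm is bounded by the percentage (4% of a hypothetical 250B).
assert max_fine_eur(250_000_000_000, upper_tier=True) == 10_000_000_000
```

The “whichever is higher” structure is what makes the ceiling scale with firm size: the flat amounts bite only for companies whose turnover falls below roughly €500 million.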

GDPR’s use of private rights of action as a means of enforcement illustrates a notable difference between EU and U.S. approaches to data privacy regulation. Unlike in the EU, U.S. federal law generally does not permit a private right of action in data privacy cases, with a few exceptions.[52] As discussed further below, this is just one of several key differences which exist between EU and U.S. methods of data privacy regulation.

B. U.S.: Privacy Patchwork

In addition to the lack of private rights of action for U.S. citizens, there are two clear differences between the EU and U.S. approaches to data privacy. First, the EU explicitly recognizes privacy as a fundamental human right, whereas the U.S. Constitution does not recognize any explicit right to privacy.[53] Rather, the Supreme Court has interpreted the overlap of multiple enumerated rights within the Bill of Rights as creating an implied right to privacy.[54] Second, as previously mentioned in this Note, the U.S. lacks a comprehensive data privacy framework at the federal level which even remotely resembles GDPR. Instead, a patchwork of several federal and state laws focuses narrowly on data privacy in specific industries.

1. U.S. Common Law on the “Right” to Privacy in the Digital Age

Supreme Court decisions from the last several decades illustrate the United States’ evolving attitude toward privacy as a constitutional right. Judicial attitudes toward modern privacy stem largely from Bill of Rights jurisprudence with a particular focus on the Fourth Amendment.

In Katz v. United States,[55] the Court established that the standard for determining whether the Fourth Amendment precludes a government search or seizure is whether a criminal defendant had a “reasonable expectation of privacy” in his person, or in the area or item being searched. Two years before Katz, the Supreme Court had held that privacy is a fundamental right in Griswold v. Connecticut,[56] explaining that a constitutional right to privacy can be found when reading several guarantees within the Bill of Rights together:

The foregoing cases suggest that specific guarantees in the Bill of Rights have penumbras, formed by emanations from those guarantees that help give them life and substance. Various guarantees create zones of privacy. The right of association contained in the penumbra of the First Amendment is one, as we have seen. The Third Amendment, in its prohibition against the quartering of soldiers ‘in any house’ in time of peace without the consent of the owner, is another facet of that privacy. The Fourth Amendment explicitly affirms the ‘right of the people to be secure in their persons, houses, papers, and effects, against unreasonable searches and seizures.’ The Fifth Amendment, in its Self-Incrimination Clause, enables the citizen to create a zone of privacy which government may not force him to surrender to his detriment. The Ninth Amendment provides: ‘The enumeration in the Constitution, of certain rights, shall not be construed to deny or disparage others retained by the people.’[57]

The Court’s recognition of these penumbras and their associated implied rights was not without controversy, but it nonetheless provided a basis for the Court to recognize that Americans’ right to privacy is as worthy of protection as other enumerated constitutional rights. For example, in United States v. Jones, the U.S. Supreme Court, in considering a Fourth Amendment challenge to law enforcement’s warrantless GPS tracking of a criminal defendant’s vehicle, gave credence to the “mosaic theory” as a means of establishing a limited right to privacy.[58] Justice Scalia, writing for the five-justice majority in Jones, did not adopt the lower court’s mosaic theory, but five other justices wrote or joined opinions that echoed the lower court’s[59] reasoning. They reasoned that while an individualized search (or, for our purposes here, a singular instance of data collection) may not necessarily intrude on an individual’s right to privacy, an aggregated collection of seemingly innocuous individual pieces of collected data taken together can, like a mosaic,[60] create a vivid and intrusively detailed picture of how a person lives their life in such a way that effectively strips them of privacy.[61] Thus, while the Jones Court did not strike down this particular acquisition of an individual’s data, it opened the door to the idea that such permissionless access, in the aggregate, may amount to an unconstitutional privacy violation.

The Court took the Jones ruling a step further in the subsequent case of Riley v. California.[62] There, the Court wrestled with the privacy-utility tradeoff in the criminal context, acknowledging that while personal data stored in a defendant’s cell phone can be useful to law enforcement, data “cannot itself be used as a weapon to harm an arresting officer or to effectuate the arrestee’s escape.”[63] The Court held that warrantless searches of the data stored on a criminal suspect’s cell phone violate the Fourth Amendment,[64] and that the exigency exception[65] to the Fourth Amendment’s warrant requirement does not always apply. Notably, despite this decision’s seeming resemblance to Katz, the Court reached this decision under a theory of property law rather than along the privacy-focused lines of Katz,[66] evincing the Court’s reluctance to address the issue of “privacy” absent federal legislation.

Another victory for data privacy rights would emerge through the Court’s eventual challenging of the third-party doctrine. Historically, data privacy in the United States has been limited by the broad application of the third-party doctrine, which holds that “a person has no legitimate expectation of privacy in information he voluntarily turns over to third parties,”[67] “even if the information is revealed on the assumption that it will be used only for a limited purpose.”[68] In her concurrence in Jones, Justice Sotomayor seemed to disfavor the third-party doctrine, stating that “it may be necessary to reconsider the premise that an individual has no reasonable expectation of privacy in information voluntarily disclosed to third parties.”[69]

In 2018, the Court attacked the doctrine head-on in Carpenter v. United States, holding “[i]n light of the deeply revealing nature of [smartphone location information], its depth, breadth, and comprehensive reach, and the inescapable and automatic nature of its collection, the fact that such information is gathered by a third party does not make it any less deserving of Fourth Amendment protection.”[70] In his Carpenter opinion, Chief Justice Roberts quoted Justice Brandeis’ dissent in Olmstead v. United States,[71] writing that “the Court is obligated—as ‘[s]ubtler and more far-reaching means of invading privacy have become available to the Government’—to ensure that the ‘progress of science’ does not erode Fourth Amendment protections.”[72] As prophetic as Justice Brandeis’ concerns in Olmstead may seem when read nearly one hundred years later, one explanation for the U.S.’s lack of a federal data privacy framework could be that the American tradition of fostering economic growth through free enterprise tips the scale to favor utility over privacy, as discussed in Part II infra.

2. Federal Privacy Laws

Fourth Amendment jurisprudence contemplates the idea of privacy as a right,[73] and has grappled with the Privacy-Utility Tradeoff most clearly in the law enforcement context, but adequate privacy legislation suitable for the digital age has not followed.

The Federal Trade Commission (FTC) is the primary agency responsible for enforcing federal consumer protection laws, including data privacy laws. The FTC has the authority to take action against companies that engage in deceptive or unfair practices related to data privacy, guided primarily by the Commission’s own guidelines.[74] The Office for Civil Rights under the Department of Health and Human Services enforces data privacy and security matters pertaining to healthcare and patient data under its authority granted by the Health Insurance Portability and Accountability Act of 1996 (HIPAA).[75] The Federal Communications Commission (FCC) regulates data privacy related to broadband providers, and the Department of Education (DOE) enforces data privacy regulations under the Family Educational Rights and Privacy Act (FERPA), which applies to educational institutions.[76] Other federal statutes grant individuals a private right of action to sue for limited damages, though courts in some jurisdictions remain unclear on whether privacy protections tied to one industry-specific category of data extend to others.[77]

Professor Steven M. Bellovin and his coauthors summarized the central problem with this federal patchwork approach nicely: Protected sectors range from health (HIPAA) to finance (FCRA), and often hinge the statutory shield on the definition of “personally identifiable information” (PII). Put simply, if a fact (i.e., a datum in the database) contains PII, then it is protected and cannot be shared; if the fact does not contain PII, then it is not protected and may be shared freely. The problem comes from delineating PII from non-PII.[78]

To complicate matters further, and central to this Note’s thesis, Professor Paul Ohm challenged the idea that one can reliably separate personally identifiable information from the surrounding benign information,[79] as discussed infra in Part II. Thus, GDPR’s centralized authority, with universal definitions, rights, and obligations, appears to have several advantages over the U.S.’s patchwork structure, from which the U.S. could seek to learn to improve its own data privacy regulation efforts. Nonetheless, CCPA is the closest thing American law has to GDPR and may be a step toward regulatory clarity. Common threads between CCPA and other emerging state privacy laws are beginning to appear and may provide insight into what a single, centralized federal privacy statute will look like.

a. Consumer Data Privacy: FTC Act

The Federal Trade Commission is the chief data regulator in the U.S. by way of its charge to protect consumers.[80] The FTC’s current role in U.S. consumer data privacy law essentially boils down to using its enforcement authority (granted in § 5 of the FTC Act) to take action against companies who fail to obtain, or who deceptively obtain, users’ consent for how their data is used.[81] The FTC also enforces industry-specific federal data privacy statutes, including the Gramm-Leach-Bliley Act (GLB Act), the Children’s Online Privacy Protection Act (COPPA), the CAN-SPAM Act, and the Telemarketing and Consumer Fraud and Abuse Prevention Act (“Do Not Call Rule”).[82] Neither the FTC Act nor the others named above contain a private right of action, so the Commission itself, rather than private litigants, is tasked with investigating suspected violations and deciding whether to bring a lawsuit.[83]

In the U.S., each sector-specific privacy law has its own definition of personally identifiable information. Commentators have rightfully pointed out that such an approach essentially renders the term impossible to understand.[84] Generally, definitions of “personal information” or “personally identifiable information”—the U.S. corollaries to GDPR’s “personal data”—under federal privacy laws operate on the assumption that “personal information” worthy of statutory protection is simply information that can be used to identify a person.[85] For example, under the Video Privacy Protection Act (VPPA), personally identifiable information is defined as “information which identifies a person.”[86] As Professors Schwartz and Solove point out, this definition’s utility is in its openness and flexibility to respond to changing circumstances, but the definition’s flaw is that it “simply states that PII is PII.”[87] In contrast, the GLB Act, a financial privacy statute, covers “nonpublic personal information,” and defines it as “personally identifiable financial information (i) provided by a consumer to a financial institution, (ii) resulting from a transaction or service performed for the consumer, or (iii) otherwise obtained by the financial institution.”[88] These are very different definitions, which only add to the confusion that is privacy law in the U.S.

When reviewing the definition of “personal information,” it is also important to understand what is and what is not protected under these federal statutes. In other words, it is important to know when PII becomes non-PII. Most federal statutes reflect the assumption that when certain information that can be directly linked to a person (e.g., full name, social security number, bank account number, IP address, etc.) is removed from a dataset, the dataset is considered “deidentified” or “anonymized” and the entity in possession is no longer subject to the same privacy and security requirements.[89] The GLB Act, for example, specifically excludes publicly available information and any consumer list attained without using personally identifiable financial information.[90] The FTC’s final rule under the GLB Act excludes deidentified data from the statute, classifying it as “[i]nformation that does not identify a consumer, such as aggregate information or blind data that does not contain personal identifiers such as account numbers, names, or addresses.”[91] Conversely, HIPAA is much more specific and enumerates eighteen particular identifiers that constitute “protected health information” (“PHI”),[92] the healthcare corollary to GDPR’s “personal data.” The eighteen identifiers are: names; postal address information other than town or city, state, and zip code; all elements of dates (other than year) directly related to an individual; telephone numbers; fax numbers; email addresses; social security numbers; medical record numbers; health insurance numbers; account numbers; certificate/license numbers; vehicle identifiers and serial numbers, including license plate numbers; device identifiers and serial numbers; URLs; IP addresses; biometric identifiers, including finger and voice prints; full face photographic images and any comparable images; and any other unique identifying number, characteristic, or code.[93] The statute also provides permissible methods of deidentification that remove data from HIPAA’s scope.

b. Health Data: HIPAA

HIPAA is one of the only data privacy laws in place in the U.S. that resembles GDPR at the federal level due to its robust and comprehensive nature. HIPAA applies only to “covered entities,” which include healthcare providers, insurers and clearinghouses, and to “business associates” that receive data from covered entities.[94] HIPAA’s purpose is to protect the privacy and security of patients’ sensitive healthcare information, denoted as “protected health information” or “PHI.”[95] PHI is defined as: any individually identifiable health information that is transmitted or maintained in any form or medium; is held by a covered entity or its business associate; identifies the individual or offers a reasonable basis for identification; is created or received by a covered entity or an employer; and relates to a past, present or future physical or mental condition, provision of health care, or payment for healthcare to that individual.[96]

To address the obvious privacy-utility tradeoff inherent in dealing with private personal health information, Congress passed the Health Information Technology for Economic and Clinical Health Act of 2009 (HITECH),[97] which updated HIPAA to impose strict compliance standards and hefty monetary penalties for unauthorized disclosures of PHI.[98] Detailed, intimate health data is an extremely valuable resource to a pharmaceutical company, for example, and HITECH’s regulatory teeth are an attempt to quell abuse. Like GDPR, the Privacy Rule[99] and the Security Rule[100] under HITECH and HIPAA set forth minimum necessary privacy and security standards regarding PHI, to which covered entities and business associates must adhere to remain in compliance. Notably, HIPAA is the most robust codification of the four principles set forth in the FTC’s Fair Information Privacy Practices: notice, choice, access, and security.[101] The Office for Civil Rights within the Department of Health and Human Services is the primary enforcer of the Privacy Rule, and it can levy civil monetary penalties of up to $1.75 million per calendar year per type of violation.[102] For example, Anthem, Inc., a business associate providing administrative services to a health insurer, paid $16 million—the highest settlement for a HIPAA violation to date—when a series of cyberattacks targeting Anthem exposed the electronic PHI (ePHI) of over 78 million individuals.[103]

While HIPAA is a good start to protecting privacy in a universally sensitive area of an individual’s life, its dependency on deidentification as the ultimate line of defense is no longer sufficient due to risks of reidentification.[104]

3. California Consumer Privacy Act (CCPA)

California’s CCPA is the first American consumer data privacy statute that resembles GDPR, due to CCPA’s comprehensive and centralized nature.[105] CCPA grants California residents specific rights, including the right to notice, the requirement of user consent, the right to erasure, the right to opt out from the sale of personal information, and the right to be free from discrimination if a consumer chooses to opt out of a company’s data collection practices.[106] However, unlike GDPR, CCPA does not contain a right to correction, and the penalties imposed by CCPA are much lower than those imposed by GDPR.[107] Broadly, the statute applies to for-profit businesses that collect the personal information of any California resident and that: earn $25 million or more in global annual revenue, collect personal information from 50,000 or more consumers, or derive 50% of their revenue from selling data.[108] Like GDPR, CCPA applies to all businesses that collect California residents’ data regardless of where in the world the business is located.[109]

CCPA defines “personal information” as “information that identifies, relates to, describes, is reasonably capable of being associated with, or could reasonably be linked, directly or indirectly, with a particular consumer or household.”[110] Personal information also includes direct identifiers; commercial information; biometric information; internet activity; geolocation data; audio, electronic, visual, thermal, olfactory or similar information; employment information, education information; and psychographic information.[111]

CCPA enforcement falls primarily under the authority of the California Attorney General, who investigates and takes action against companies that violate any portion of the statute.[112] Individuals also have a private right of action, but only for data breaches in which the individual’s unencrypted personal information is disclosed “as a result of a business’s failure to implement and maintain reasonable security procedures and practices appropriate to the nature of the information.”[113] Statutory damages range from $100 to $750 per consumer, per “incident,” or breach.[114]

Many commentators regard CCPA as a step in the right direction for U.S. privacy laws, as it simplifies the definition of “personal information” such that it is both broad enough to keep up with new data collection practices, but specific enough to be administrable.[115] CCPA also places the burden on tech companies to protect user information, rather than permitting them to escape liability through disclaimers and limitations of liability in their terms of service and privacy policies.[116]

C. Schrems II & Fall of the EU/U.S. Privacy Shield

While EU law has long offered robust data privacy rights and enforcement capabilities (even prior to GDPR’s enactment), similarly robust protections have remained noticeably absent from U.S. law. This gap between U.S. and EU privacy laws has repeatedly complicated bilateral trade as it pertains to data transfers.[117] Before GDPR went into effect in 2018, EU residents had no way to directly control what happened to their data after it was transferred to different countries. Accordingly, to ensure that EU residents enjoyed the “adequate levels” of data protection afforded to them under Articles 25 and 26 of the EU Data Protection Directive (95/46/EC), corrective measures were taken in 2000 and again in 2016: the Safe Harbor Agreement[118] and the EU-U.S. Privacy Shield, respectively.[119] Ultimately, however, neither measure would prove to be sufficient, as both were deemed invalid under EU law[120] thanks to the efforts of a bold, young Austrian lawyer and data privacy activist named Max Schrems.[121] Thus, U.S. tech companies are now back in regulatory limbo with EU data privacy law.

The Safe Harbor Agreement aimed to ensure that U.S. companies provided EU residents’ personal data the same levels of protection it would otherwise receive in the EU.[122] Under the Safe Harbor Agreement, American companies subject to the jurisdiction of the FTC and the Department of Transportation (DOT) were eligible to self-certify their inclusion in the Safe Harbor program.[123] Relying on the prohibition against deceptive or unfair trade practices under § 5 of the Federal Trade Commission Act, the FTC would take enforcement actions against any such companies that failed to comply with applicable EU privacy laws.[124] Although the European Commission deemed the substance of the Safe Harbor Agreement sufficient to comply with Articles 25 and 26,[125] the decision was not without public dissent.

In the EU case Schrems v. Data Protection Comm’r (Schrems I), Max Schrems sued Facebook for transferring his personal data from Ireland to the United States.[126] He argued that the Safe Harbor Agreement in its entirety was incompatible with EU privacy law. Specifically, Schrems argued that Edward Snowden’s revelation of the U.S.’s domestic surveillance program (PRISM)[127] evidenced the U.S. government’s unfettered access to EU data subjects’ personal data without requiring a court order and without providing any means of redress.[128] The Court of Justice of the European Union (CJEU), hearing the case on appeal in 2015, agreed with Schrems and held that the Safe Harbor Agreement was invalid because U.S. law fails to ensure an adequate level of protection under Directive 95/46, stating: “[o]nce the personal data has been transferred to the United States, it is capable of being accessed by the National Security Agency (NSA) and other federal agencies, such as the Federal Bureau of Investigation (FBI), in the course of the indiscriminate surveillance and interception carried out by them on a large scale.”[129]

In response to the invalidation of the Safe Harbor Agreement after Schrems I, the U.S. Department of Commerce and the European Commission set forth the EU-U.S. Privacy Shield Framework[130] in 2016—the same year GDPR was passed—to provide an alternative legal means to transfer data from the EU to the U.S. While BigTech titans were quick to praise the new Shield,[131] critics like Schrems himself were quick to point out that the Shield would not resolve the privacy concerns, namely the NSA’s surveillance program, that led to the fall of the Safe Harbor program.[132] The Shield was substantively the same as the Safe Harbor Agreement, save for a few additions. Namely, the Shield incorporated the latest requirements and rights in the newly passed GDPR, included a U.S. Ombudsperson responsible for providing guidance on redress to EU residents, and was accompanied by a handful of letters from U.S. intelligence officials assuring EU residents that they could sue the NSA if they became subject to unlawful surveillance[133]—that is, if they could establish standing in court.[134]

However, the Shield would meet the same fate as the Safe Harbor Agreement in 2020, when Max Schrems struck again. In Data Protection Commissioner v. Facebook Ireland Ltd. (Schrems II), the CJEU invalidated the EU-U.S. Privacy Shield, finding that the Privacy Shield was incompatible with GDPR and the Charter of Fundamental Rights of the European Union.[135] GDPR’s Articles 46(1) and 46(2)(c) require that EU data subjects whose personal data is transferred to another country be “afforded a level of protection essentially equivalent to that guaranteed within the European Union,” with such protection including “appropriate safeguards, enforceable rights and effective legal remedies.”[136] The CJEU determined that the U.S. surveillance program did not have “appropriate safeguards,” pointing to section 702 of the Foreign Intelligence Surveillance Act[137] and Executive Order 12,333.[138] The CJEU additionally found that U.S. law does not offer “enforceable rights and effective legal remedies” for EU residents whose data is transferred to the United States, pointing to Presidential Policy Directive 28.[139]

In the aftermath of these decisions, companies dealing with EU residents’ personal data can still perform cross-border data transfers through standard contractual clauses (SCCs) in their terms of service.[140] However, they are now responsible for ensuring that the receiving country has laws in place that meet the requirements of GDPR, which the U.S. does not.[141] According to Stockholm-based business and tech law firm Sharp Cookie Advisors: “The recipient is obliged to inform the exporter of any impediments to its compliance to the SCC’s [sic]. If the existence of local surveillance laws . . . would impede the alignment with the GDPR, then the exporter (read your customers) must stop the transfer and end the contract. If the data exporter fails its obligations under the SCC, the lead supervisory authority must intervene and may prohibit the transfer.”[142]

This puts all transnational companies dealing with personal information in quite a bind, as they inherently operate globally by way of being internet-based companies. Companies must now follow convoluted processes for compliant EU-U.S. data transfers.[143]


The vast differences in how data privacy is implemented in the EU and the U.S. have made compliance difficult for data companies operating globally.[144] Additionally, the tradeoff between protecting user privacy and the usefulness of the retained data lies at the core of privacy laws like GDPR, HIPAA and CCPA. BigTech is particularly resistant to compliance with GDPR, as many data companies are accustomed to the U.S.’s lack of a robust omnibus privacy framework (though CCPA is beginning to change that). Enforcement decisions in the U.S. and the EU often highlight the fact that the compliance difficulties these tech companies face are caused by the overarching business model adopted by BigTech. This business model is simply not conducive to user privacy—as it is instead entirely predicated on maximizing data utility.[145] As per the privacy-utility tradeoff, efforts to improve data privacy are often at odds with the goal of prioritizing data utility.[146]

A. Privacy-Utility Tradeoff

Conceptually, the term “privacy-utility tradeoff” refers to the incompatibility between the usefulness of data collected from individual users and the privacy those users enjoy.[147] The tradeoff can be succinctly summarized as follows: “perfect privacy can be achieved by publishing nothing at all—but this has no utility; perfect utility can be obtained by publishing the data exactly as received from the respondents, but this offers no privacy.”[148] To illustrate, consider the following figures.

Figure 1: Privacy-Utility Tradeoff for Small Datasets

Figure 1[149] illustrates the inverse relationship between privacy and utility. Maximally private data has no utility, and maximally useful data is not private. The “ideal situation” at the dotted intersection, where privacy and utility are maximized, is illusory. An increase in either utility or privacy necessitates a decrease in the other. Figure 2[150] illustrates how the tradeoff is more easily managed in smaller datasets with fewer variables to be deidentified.

Figure 2: Privacy-Utility Tradeoff for Large Datasets

When the privacy-utility tradeoff is applied to larger datasets with hundreds or thousands of attributes, however, Figure 2 demonstrates how quickly privacy can destroy utility and vice versa. The “big data trade-off shift” differential can be explained by considering the primary shortcoming of deidentification: the possibility of reidentification, or “linkage,” of the users underlying deidentified data.[151] Further, Figure 2 demonstrates that the current state of the tradeoff for large datasets is such that even if the dataset is rendered effectively useless by way of deidentification, the deidentified data is still not fully anonymous, i.e., it can be linked back to the original data user, thus defeating privacy as well.[152]
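The tradeoff can be made concrete with a small sketch. The following toy example (all ZIP codes are invented for illustration, and "digits kept" is a deliberately crude stand-in for real anonymization techniques) shows how generalizing a single quasi-identifier buys privacy at the direct expense of analytic resolution:

```python
# Toy illustration of the privacy-utility tradeoff: generalizing a
# quasi-identifier (truncating ZIP codes) shrinks the number of distinct
# groups an observer can single out (more privacy), but it equally destroys
# the analyst's ability to see geographic variation (less utility).
zips = ["60614", "60615", "10001", "10002", "94105"]  # hypothetical data

def generalize(zip_code, keep_digits):
    """Keep the first `keep_digits` digits and mask the rest."""
    return zip_code[:keep_digits] + "*" * (5 - keep_digits)

for keep in (5, 3, 1, 0):
    groups = sorted({generalize(z, keep) for z in zips})
    print(f"digits kept={keep}: {len(groups)} distinguishable groups {groups}")
# Fewer digits kept -> fewer distinguishable groups: privacy rises as
# utility falls, and at keep=0 the column is perfectly private and useless.
```

In a dataset with only one quasi-identifier the curve is gentle; with hundreds of attributes (as in Figure 2), every column faces the same forced choice, which is why privacy collapses utility so much faster at scale.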

Drawing inspiration from HIPAA, data companies currently rely upon deidentification and pseudonymization (the process of replacing personally identifiable information with artificial identifiers to protect individuals’ privacy while still allowing the data to be used for specific purposes) as their primary methods of compliance with data privacy laws.[153] However, as demonstrated by Figure 2, these methods are not only ineffective at achieving user privacy, but are also ineffective for avoiding GDPR liability, because deidentified data can be linked back to the original data subject and therefore falls within the Regulation under Recital 26.[154] In addition to failing to achieve significant improvements in privacy, deidentification and pseudonymization reduce data utility, which runs counter to the quintessential BigTech business model.

B. The BigTech Business Model

The business model underlying the meteoric rise of BigTech companies like Alphabet (Google’s parent company) and Meta (formerly known as Facebook) can be summarized in two words: ad revenue. According to both companies’ 10-K filings with the U.S. Securities and Exchange Commission (SEC), “substantially all” of Facebook’s $70.7 billion in annual revenue, and Alphabet’s nearly $161.9 billion in annual revenue, is earned from advertising.[155] Because advertising requires the attention of potential customers, it follows logically that BigTech’s business model revolves around the amount of attention its users give to their devices. The MD&A section of Alphabet’s 10-K provides a rather off-putting affirmation of this idea: “[o]ur users are accessing the Internet via diverse devices and modalities, such as smartphones, wearables and smart home devices, and want to feel connected no matter where they are or what they are doing.”[156] The “How we make money” section of Alphabet’s 10-K states that “[t]he goal of our advertising products is to deliver relevant ads at just the right time and to give people useful commercial information, regardless of the device they’re using.”[157] Knowing how “relevant” an advertisement is and what time is “just the right time” requires these companies to collect an immense amount of personal information to accurately characterize their users and predict their behavior in anticipation of the ads they are likely to be most responsive to.

BigTech’s public relations teams make these operations sound rather innocuous. However, as more information about their operations continues to be revealed, it is perhaps more precise to say that Google’s and Facebook’s business models depend upon “surveillance capitalism”—a term coined by Harvard professor Shoshana Zuboff to describe the practice by which BigTech monitors, monetizes, and subtly influences its users’ behavior.[158]

To increase the attractiveness of their advertising capabilities to marketers, Google and Facebook both collect an eerily vast amount of personal data about each of their users—all of it subject to GDPR liability. Both platforms are equipped with biometric recognition features which allow them to identify a user’s face and voice.[159] Additionally, these companies can utilize users’ search queries and webpage engagement data to identify each user’s consumer preferences and religious and political beliefs based on interactions with related webpages.[160] By using scheduling features such as Google Calendar or Facebook Events, these companies can track what users will be doing in the future.[161] These platforms’ location services additionally allow them to track where users go, how they get there, how long they spend there, and how often they spend time at specific locations to predict where their users live and work and which locations they otherwise frequent.[162] Such tracking still occurs even if a user turns the location functions off on their device, signs out of their Google or Facebook accounts, or even completely deletes their account.[163] In a series of interviews with prominent Silicon Valley BigTech pioneers responsible for designing much of the modern digital world, Netflix’s “The Social Dilemma” asserts that social media and BigTech are not just using ads in response to users’ digital activity, but are instead using them to influence their behavior on and off the screen.[164]

Illustrative here is the story of Cambridge Analytica. In 2016, the U.K.-based political consulting firm, hired by a U.S. presidential contender’s campaign, was able to target users at an extremely granular level using a technique called “psychographic targeting.”[165] By leveraging user data scraped from Facebook’s platform, the company was able to send advertisements particularly designed to override humans’ innate cognitive defenses by appealing to their most visceral emotions to incite fear or excitement that would theoretically motivate someone to vote for their client in the upcoming election.[166] Although this scandal ultimately ended in a $5 billion fine levied against Facebook by the FTC, Facebook CEO Mark Zuckerberg is not yet out of the woods when it comes to data privacy compliance—in fact, he is not even close.[167] When analyzing these practices and revelations in the context of data subjects’ rights under GDPR as discussed in Part I.A, one can easily imagine why compliance is so difficult for BigTech. The methods by which these companies strive for compliance (deidentification and pseudonymization) create a disconnect between data and its underlying user, while the lifeblood of the BigTech business model is precisely that data-to-user connection. In fact, Facebook cites “decreases in user engagement, including time spent on our products” and “failure to accept our terms of service, as part of changes that we implemented in connection with . . . GDPR” as risks to its financial performance.[168] Alphabet’s 10-K contains similar language indicating that decreased usage, GDPR, and CCPA are all major threats to its revenue model as well.[169] Both companies are investing substantial time and capital into compliance while both U.S. and EU regulators are wasting no time assessing fines for violations.[170]

C. Current Compliance Methods & the Re-Identification Problem

Current measures for privacy compliance generally involve some variation of data “anonymization”[171] techniques which essentially come down to stripping data of personal identifiers.[172] “Anonymization” is an umbrella term used to describe a variety of techniques (e.g., deidentification, pseudonymization) aimed at removing personal identifiers from sets of personal data.[173] Each of these techniques may themselves have multiple meanings or involve distinct processes across different laws or jurisdictions.

For example, deidentification under HIPAA refers to stripping a dataset of the eighteen enumerated identifiers.[174] HIPAA’s Privacy Rule provides a “safe harbor” for such deidentified data.[175] The statute provides two permissible methods for deidentifying data: (1) removal of the eighteen enumerated identifiers or (2) expert certification.[176] The identifiers include names, telephone numbers, addresses, biometric identifiers, medical record numbers, etc.[177] Expert certification requires that a “person with appropriate knowledge of and experience with generally accepted statistical and scientific principles and methods for rendering information not individually identifiable” applies such knowledge to “determine[] that the risk is very small that the information could be used, alone or in combination with other reasonably available information, by an anticipated recipient to identify an individual who is a subject of the information.”[178]

This system is far from perfect, however. As Professor Paul Ohm pointed out in 2010, “anonymization” as used in privacy scholarship is a bit of a misnomer.[179] “Deidentification” is a more precise term, because “anonymization,” as colloquially used, refers to simply stripping raw data of its unique identifiers rather than rendering the data into a state where its origin is truly “anonymous.”[180] For a practical example of deidentification, picture a dataset measuring consumer preferences for Coke or Pepsi which contains a list of respondents’ names, sex, dates of birth, ZIP codes, addresses, and email addresses. A deidentified version of the same dataset would likely preserve the indicated beverage preference; ZIP code, to determine geographic variance; and dates of birth, to assess variance among age demographics. It would omit obviously identifiable information like the respondents’ names, addresses, and email addresses.
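The Coke/Pepsi example can be sketched in a few lines. All field names and values below are hypothetical, and the simple "drop the listed identifiers" rule is a stand-in for the more elaborate processes (e.g., HIPAA's eighteen-identifier list) that real controllers apply:

```python
# Minimal sketch of deidentification: strip direct identifiers, keep
# quasi-identifiers (ZIP, date of birth, sex) and the survey response.
DIRECT_IDENTIFIERS = {"name", "address", "email"}

def deidentify(record):
    """Return a copy of the record with direct identifiers removed."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

row = {
    "name": "Jane Doe",              # dropped: direct identifier
    "address": "123 Main St",        # dropped: direct identifier
    "email": "jane@example.com",     # dropped: direct identifier
    "zip": "60614",                  # kept: geographic variance
    "dob": "1990-07-15",             # kept: age demographics
    "sex": "F",                      # kept: demographic variance
    "pref": "Pepsi",                 # kept: the measured preference
}
print(deidentify(row))
```

Note that the three retained quasi-identifiers (ZIP, date of birth, sex) are exactly the fields Sweeney's linkage research showed can reidentify most Americans, which is why this style of deidentification is insufficient on its own.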

Over ten years ago, however, Ohm and other scholars lifted the veil on how deidentified data can still be linked back to the individual underlying data subjects by comparing the deidentified dataset with additional information about those subjects (a phenomenon known as “reidentification”), thus defeating the purpose of deidentification. Ohm’s article[181] illustrated how deidentification is wholly insufficient as a privacy protection measure by highlighting, among other evidence, a study by computer science professor Latanya Sweeney, in which she was able to identify 87.1 percent of Americans using only their ZIP code, date of birth, and sex, each of which is a supposedly non-identifying data point likely to be found within “deidentified” datasets.[182] Although two subsequent studies were unable to replicate that 87.1 percent finding, they did successfully reidentify 63 and 61 percent of data subjects from 1990 and 2000 U.S. census data, respectively.[183] Pseudonymization is effectively analogous to deidentification and carries the same risks of reidentification.[184] The distinction is that instead of merely redacting or omitting identifiers, pseudonymization renames them with a string of characters, and the controller of the dataset preserves a legend that can be used to link each data subject with the unique identifier to which their data has been assigned.[185] In a pseudonymized dataset, “John Smith” becomes “user027462.”
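The mechanics of pseudonymization and of a Sweeney-style linkage attack can be sketched together. Every name, record, and ID format below is invented for illustration; the "voter roll" stands in for whatever auxiliary dataset an attacker happens to hold:

```python
# Sketch of pseudonymization and a linkage ("reidentification") attack.
# All records are hypothetical.

# Original dataset: a direct identifier plus quasi-identifiers.
records = [
    {"name": "John Smith", "zip": "60614", "dob": "1985-03-02", "sex": "M", "pref": "Coke"},
    {"name": "Jane Doe",   "zip": "60614", "dob": "1990-07-15", "sex": "F", "pref": "Pepsi"},
    {"name": "Ann Jones",  "zip": "10001", "dob": "1985-03-02", "sex": "F", "pref": "Coke"},
]

# Pseudonymization: replace the name with an opaque ID, but keep a legend
# so the controller can still link records back to people.
legend = {}
pseudonymized = []
for i, rec in enumerate(records):
    pid = f"user{i:06d}"
    legend[pid] = rec["name"]  # the controller retains this mapping
    pseudonymized.append({"id": pid, **{k: v for k, v in rec.items() if k != "name"}})

# Linkage attack: an outside dataset (e.g., a public voter roll) pairs names
# with the same quasi-identifiers. Joining on (zip, dob, sex) reidentifies
# a data subject without ever touching the controller's legend.
voter_roll = [{"name": "John Smith", "zip": "60614", "dob": "1985-03-02", "sex": "M"}]
reidentified = {
    rec["id"]: voter["name"]
    for rec in pseudonymized
    for voter in voter_roll
    if (rec["zip"], rec["dob"], rec["sex"]) == (voter["zip"], voter["dob"], voter["sex"])
}
print(reidentified)  # the pseudonym is undone by the join alone
```

The attack never needs the legend: the quasi-identifiers left in the dataset for utility's sake are themselves the key, which is why Recital 26 treats such data as still relating to an identifiable person.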

The risk of reidentification has even been recognized in GDPR, which provides: “personal data which have undergone pseudonymization, which could be attributed to a natural person by the use of additional information, should be considered as information on an identifiable natural person.”[186] In other words, even if Google strips a dataset of its personally identifiable characteristics, it is still treated as personal data under Article 4(1) and Recital 26 of the Regulation if an observer can pair that data with any additional information that will allow him to discover its corresponding data subject.

In short, deidentification and pseudonymization both strip the underlying data of any way to attribute the data to an individual user when taken alone. However, deidentified and pseudonymized data can all be traced back to an individual with relative ease once an observer gains access to the database, either by cross-referencing dates of birth with sex and ZIP code in a deidentified dataset as Professor Sweeney did, or by accessing a pseudonymized dataset’s corresponding legend.[187] With the explosion in the volume of personal data collected in the past several years, reidentification can reasonably be expected to become easier as techniques simultaneously improve.

D. Recent Enforcement Actions

Following GDPR’s implementation, total fines for violations have reached a staggering $2,779,699,894 as of February 2023.[188] According to a 2020 report[189] by U.K. software company Exonar, “39% (appx. $244 million)[190] of GDPR-related fines were due to insufficient security, [and] 25% of fines (appx. $159 million) were related to unsecured or over-retained data.”[191] The GDPR Enforcement Tracker tells a similar story.[192] Data from the Tracker indicates that the two most frequent categories of GDPR violations are: (1) “[i]nsufficient technical and organizational measures to ensure information security,” typically related to data breaches where user data is exposed, as in the British Airways case[193] (though this category also includes violations whereby a user discovers that a company is storing their personal data with insufficient cybersecurity measures);[194] and (2) “[i]nsufficient legal basis for data processing,” which can include processing data without sufficient consent under Article 6.[195]

In the United Kingdom, the British Information Commissioner’s Office (ICO) found that British Airways violated Articles 5(1)(f)[196] and 32[197] of the Regulation when a 2018 cyberattack exposed the data of nearly 430,000 customers due to what the ICO found to be inadequate security measures.[198] The ICO initially imposed a $200 million fine but reduced it to $25 million on appeal in light of the financial hardship the airline suffered from the devastating economic impact of the COVID-19 pandemic in 2020.[199]

In a seminal example of an enforcement action taken in response to an Article 6 violation, the French data regulator, la Commission Nationale de l’Informatique et des Libertés (CNIL), imposed a $57 million fine on Google—and rejected its appeal to reduce it.[200] CNIL found that Google had not provided “sufficiently clear” information to its consumers regarding how it processes their personal data for the purpose of providing targeted ads, and that the company failed to obtain the consumers’ informed consent.[201] The ruling on failure to obtain informed consent rested on two separate grounds. The agency first noted that consumers had to click through five to six pages before they could meaningfully access the settings on how their data was collected and how it would be used.[202] It then ruled that the consent collected was neither “specific” nor “unambiguous” because a user has to agree to Google’s terms of service and privacy policy before accessing the platform, thereby giving consent in full before having a chance to modify the collection options offered.[203] Users’ consent was thus not lawfully obtained, nullifying “consent” as a lawful basis for processing under Article 6(1)(a).[204] In the U.S., we consider these kinds of clickwrap agreements to be routine. These enforcement actions thus illustrate the contrast between standard industry practices and the practices GDPR mandates.

In a July 2021 SEC filing, Amazon disclosed to the public that it had been issued a fine of approximately $797 million by the Luxembourg National Commission for Data Protection (CNPD) for failing to comply with GDPR regarding the processing of personal data.[205] A French privacy organization called La Quadrature du Net complained to the CNPD in 2018, alleging that Amazon’s targeted advertising strategies involved undisclosed data collection tactics for which it failed to obtain user consent, as required under GDPR.[206] Little is known about the details of this enforcement action, however, as the CNPD has stated that due to secrecy laws in Luxembourg, it cannot comment on individual cases or complaints.[207] It is uncertain whether the CNPD will publish its findings, as they are usually anonymized unless special powers are invoked.[208] According to one U.K. law firm, details from the complaint suggest that the case focused on whether Amazon had a sufficient lawful basis for processing personal data, and on Amazon’s argument that it could process personal data based on a contract with data subjects.[209]

Privacy enforcement has ramped up in the United States as well, though not as aggressively as in the EU. As the 2010 FTC Report highlights, “[s]ince 2001, the FTC has used its authority under a number of statutes—including the FCRA, the GLB Act, and Section 5 of the FTC Act—to bring 29 cases against businesses that allegedly failed to protect consumers’ personal information.”[210] However, contrary to the congratulatory tone employed in the 2010 FTC Report, twenty-nine cases in nine years pales in comparison to GDPR’s aggressive enforcement, with 611 fines issued between the Regulation taking effect in 2018 and April 2021.[211] In the FTC’s Privacy & Data Security Update: 2019, the Commission states that it has brought “more than 130 spam and spyware cases and 80 general privacy lawsuits,” and based on figures from the two preceding years, these numbers appear to be cumulative.[212] To be fair, GDPR’s enforcement covers a broad array of data privacy and security actions that may not fall within the scope of the FTC’s § 5 authority, which may explain the FTC’s apparent dearth of enforcement actions when compared to GDPR. Additionally, while Amazon’s $797 million fine is the largest GDPR fine issued to date, the FTC has issued massive fines of its own.[213] Recently, the Commission fined Facebook $5 billion—the largest fine ever issued for a consumer data privacy violation.[214] According to the settlement, the FTC found that Facebook misrepresented users’ ability to control the privacy of their information and deceptively shared information about users and their friends with third-party applications.[215]

The use of synthetic data may have reduced the severity of, or even fully prevented, the data privacy violations at issue in some of the enforcement actions discussed above. If the data that had been breached in the British Airways case, for example, was synthetic data, the number of customers whose identifiable personal data was exposed could have been reduced. In the Google and Facebook cases above, both companies would still have had to obtain meaningful consent to process and collect user data (which necessarily precedes the generation and use of synthetic data), so the use of synthetic data would not likely have changed the outcome. However, regardless of the consent issue, any company is susceptible to a data breach, and storing synthetic data rather than personal data would be consistent with GDPR’s principles of data minimization and related privacy principles, and would thus likely reduce the harm and consequent fines resulting from a potential data breach.

As discussed in Part I supra, Schrems II upped the ante even more for U.S.-based companies that had enjoyed the protections of the EU-U.S. Privacy Shield insulating them from liability for processing EU residents’ data.[216] The current state of affairs following the Shield’s collapse, with public scrutiny and enforcement actions ramping up, increased interest in privacy, and emergent privacy laws, gives data privacy compliance heightened urgency, and reliable means of limiting data privacy infractions are becoming increasingly valuable to data companies. As such, due to its potential to proactively prevent or mitigate such infractions, synthetic data could prove to be a very useful compliance tool at this particular juncture.



The multi-layered compliance challenges faced by data companies as a result of shifting global privacy laws, increasingly aggressive enforcement, and conflicts with prevailing business models have created a need, now more than ever, for innovations in the BigTech space. Synthetic data may represent this much-needed innovation. Synthetic data stands out as a particularly useful compliance tool because it can help companies reliably adhere to data privacy regulations, within both the U.S. and EU, while resolving the privacy-utility tradeoff that plagues traditional compliance methods.[217] Because synthetic datasets are entirely fabricated, they are truly anonymous in the sense that the underlying data subjects cannot possibly be identified, thus rendering GDPR, CCPA, HIPAA, and other major privacy laws inapplicable. This Part first describes the process of developing synthetic data, then discusses its efficacy under EU and U.S. privacy law.

A. Synthetic Data Primer

AI-created synthetic data might prove to be a potent solution to these compliance issues, one that more effectively balances the needs of tech companies with the privacy rights of consumers.[218] Synthetic data is, in essence, “fake” data that is statistically equivalent to the authentic personal data from which it is generated.[219] The generation process takes an original dataset composed of personal data and creates an entirely new “synthetic” dataset through an irreversible, one-way process that makes it impossible for hackers or malicious insiders to recreate the original personal data or to identify its source.[220] Unlike pseudonymized data, synthetic data cannot be used to identify the original users.[221]

Synthetic data is created using two methods of AI: variational autoencoders (VAEs) and generative adversarial networks (GANs).[222] As Unite AI’s Dan Nelson explained:

VAEs are unsupervised machine learning models that make use of encoders and decoders. The encoder portion of a VAE is responsible for compressing the data down into a simpler, compact version of the original dataset, which the decoder then analyzes and uses to generate a representation of the base data. A VAE is trained with the goal of having an optimal relationship between the input data and output, one where both input data and output data are extremely similar.

When it comes to GAN models, they are called “adversarial” networks due to the fact that GANs are actually two networks that compete with each other. The generator is responsible for generating synthetic data, while the second network (the discriminator) operates by comparing the generated data with a real dataset and tries to determine which data is fake. When the discriminator catches fake data, the generator is notified of this and it makes changes to try and get a new batch of data by the discriminator. In turn, the discriminator becomes better and better at detecting fakes. The two networks are trained against each other, with fakes becoming more lifelike all the time.[223]

“Generators” are thus able to generate increasingly lifelike fake datasets as time goes on.[224]

Stanford researchers have used an apt analogy in explaining GAN-generated synthetic data: counterfeit money.[225] The generator component studies the details of the dollar bill and creates what it believes to be an indistinguishable copy; the discriminator scrutinizes the copy's details and sends it back to the generator when it finds a distinction. The process repeats until the discriminator cannot separate the authentic from the counterfeit. Not only can synthetic data optimize compliance, but it can also foster innovation by enabling simulations.[226] In practice, synthetic data has been key to recent advancements in self-driving cars and the development of vaccines for SARS-CoV-2, and Facebook now uses it to train AI algorithms to identify language that resembles bullying, augmenting its content moderation practices.[227] Thus, there are numerous potential benefits available to companies that make use of synthetic data technology, even beyond those relating to the issues discussed in this Note.
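The counterfeiting loop described above can be sketched in miniature. The toy example below (an illustration only, not any vendor's actual pipeline; the data column, hyperparameters, and parameter names are all invented) pits a one-dimensional affine "generator" against a logistic "discriminator" over a single numeric field, using hand-computed gradients so the adversarial back-and-forth is visible in plain code:

```python
import math
import random

random.seed(0)

def sigmoid(t: float) -> float:
    t = max(-30.0, min(30.0, t))  # clamp to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-t))

# "Authentic" data: one numeric field (say, a normalized spending score).
def real_batch(n: int) -> list[float]:
    return [random.gauss(4.0, 1.25) for _ in range(n)]

a, b = 1.0, 0.0        # generator: g(z) = a*z + b, z ~ N(0, 1)
w, c = 0.0, 0.0        # discriminator: D(x) = sigmoid(w*x + c)
lr, n = 0.03, 64

for _ in range(2500):
    # Discriminator step: push D(real) toward 1 and D(fake) toward 0.
    xr = real_batch(n)
    xf = [a * random.gauss(0.0, 1.0) + b for _ in range(n)]
    gw = sum((sigmoid(w * x + c) - 1.0) * x for x in xr) / n \
       + sum(sigmoid(w * x + c) * x for x in xf) / n
    gc = sum(sigmoid(w * x + c) - 1.0 for x in xr) / n \
       + sum(sigmoid(w * x + c) for x in xf) / n
    w -= lr * gw
    c -= lr * gc
    # Generator step: adjust (a, b) so fakes slip past the discriminator.
    zs = [random.gauss(0.0, 1.0) for _ in range(n)]
    ga = sum((sigmoid(w * (a * z + b) + c) - 1.0) * w * z for z in zs) / n
    gb = sum((sigmoid(w * (a * z + b) + c) - 1.0) * w for z in zs) / n
    a -= lr * ga
    b -= lr * gb

# The trained generator emits brand-new "synthetic" values whose mean
# tracks the authentic data, with no record-level link to any subject.
synthetic = [a * random.gauss(0.0, 1.0) + b for _ in range(5000)]
synthetic_mean = sum(synthetic) / len(synthetic)
```

Real synthetic-data generators train deep networks over many correlated fields, but the dynamic is the same: the generator starts far from the data (here, centered at 0 versus the true center of 4) and drifts toward statistical indistinguishability as the discriminator's feedback sharpens.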

B. Privacy Law Exceptions for Synthetic Data

Synthetic data is generally exempt from the provisions of many data privacy regulations. For example, GDPR's Recital 26 provides:

The principles of data protection should apply to any information concerning an identified or identifiable natural person. Personal data which have undergone pseudonymisation, which could be attributed to a natural person by the use of additional information should be considered to be information on an identifiable natural person. . . . The principles of data protection should therefore not apply to anonymous information, namely information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable.[228]

Notably, Recital 26 distinguishes between pseudonymized data and anonymized data, as discussed in Part II.B. To reiterate, the term "anonymized" in its conventional use (referring to pseudonymized or deidentified data) is a misnomer because such data may be reidentified, and it is thus not truly "anonymous" under Recital 26. Synthetic data, however, does fall under Recital 26's definition of "anonymous" because it can never be linked back to the underlying data subject. Thus, as per Recital 26, "[t]he principles of data protection should therefore not apply" to synthetic data, which is not regulated under GDPR.


Similarly, synthetic data does not fall under the several definitions of "personal information," or the equivalent term, in the various U.S. data privacy laws, even HIPAA's definition of PHI, as discussed in Part I, supra.[229] U.S. privacy laws generally follow HIPAA's permissible methods of creating deidentified data, meaning data from which specific identifiers have been removed.[230]
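The identifier-removal approach the U.S. statutes contemplate can be sketched as a simple record transformation. The field names below are hypothetical and the rule is deliberately simplified (HIPAA's safe harbor actually enumerates eighteen identifier categories); the sketch exists only to contrast deidentification with synthesis:

```python
# A simplified sketch of HIPAA-style "safe harbor" deidentification:
# strip direct identifiers and generalize dates. Field names are
# hypothetical, chosen for illustration only.
DIRECT_IDENTIFIERS = {"name", "ssn", "email", "phone", "street_address"}

def deidentify(record: dict) -> dict:
    """Drop enumerated identifiers; truncate birth dates to the year."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "birth_date" in cleaned:            # "1980-04-17" -> "1980"
        cleaned["birth_date"] = str(cleaned["birth_date"])[:4]
    return cleaned

patient = {
    "name": "Jane Roe",
    "ssn": "000-00-0000",
    "birth_date": "1980-04-17",
    "zip": "32601",
    "diagnosis": "J45.901",
}
deidentified = deidentify(patient)
```

Note what survives: the diagnosis (the data's utility), but also quasi-identifiers such as birth year and ZIP code. That residue is precisely why deidentified data can sometimes be reidentified, whereas a synthetic record carries no one-to-one link back to any real patient at all.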

C. Synthetic Data as a Compliance Solution

Because synthetic data escapes the various definitions of "personal data" discussed in this Note, its future as a data privacy compliance tool is quite promising, and investors have taken notice.[231] Its value proposition is simple: a data controller can collect a small, representative dataset and replicate its utility, with stunning accuracy, at global scale, all while reducing its exposure to data privacy regulations at a fraction of the cost of traditional compliance methods.

1. Benefits of Synthetic Data

The principal benefits of synthetic data are statistical equivalence, ease in achieving regulatory compliance, and cost-effectiveness (given synthetic data's high value and its relatively low cost of generation). Synthetic data startups are raising significant amounts of capital as investors have begun to realize synthetic data's potential in financial technology, healthcare, government, telecom, pharmaceuticals, e-commerce, transportation and logistics, manufacturing, and, as this Note suggests, consumer-based data platforms.[232]
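The first of these benefits, aggregate utility without record-level linkage, can be illustrated with a deliberately naive "synthesizer" that learns only summary parameters of a single numeric column and samples fresh values from them (all figures below are invented for illustration):

```python
import random
import statistics

random.seed(2)

# Authentic records: one numeric field (e.g., monthly spend in dollars).
real = [random.gauss(50.0, 12.0) for _ in range(20_000)]

# A deliberately naive synthesizer: learn only aggregate parameters,
# then sample entirely new values. No real record is ever copied.
mu, sd = statistics.fmean(real), statistics.stdev(real)
synthetic = [random.gauss(mu, sd) for _ in range(20_000)]

# Aggregate statistics match closely...
mean_gap = abs(statistics.fmean(synthetic) - statistics.fmean(real))
sd_gap = abs(statistics.stdev(synthetic) - statistics.stdev(real))
# ...while no individual value links back to a real record.
shared_values = set(real) & set(synthetic)
```

Production tools model full joint distributions across many correlated fields rather than a single mean and variance, but the decoupling is the same: analysts and advertisers see the statistics, while no row in the output corresponds to any real data subject.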

Synthetic data is already being utilized as a HIPAA-compliant, and extremely efficient, replacement for PHI by COVID-19 vaccine researchers.[233] For example, the National Institutes of Health (NIH) partnered with Syntegra, a synthetic data company, to generate a comprehensive synthetic database from over 2.6 million COVID patients' health information.[234] Because patients' healthcare providers validated the underlying data, the synthetic data was accurate. NIH exclusively utilized this synthetic data in its research efforts, without retaining any real patient data. And because only the synthetic data was fed into NIH's database, no patient's privacy was ever at risk of being violated through NIH's research process. In fact, in January of 2021, the Department of Health and Human Services, recognizing the great potential of synthetic data in healthcare, opened the Synthetic Health Data Challenge, offering $100,000 in prize money for competitive solutions.[235] The goal of the program is to "[e]ngag[e] the broader community of researchers and developers to validate the realism and demonstrate the potential uses of the generated synthetic health records through a challenge," with a focus on synthetic opioid, pediatric, and complex care patient records.[236]

Google's fate in France in 2019 would likely have been different if the company had used synthetic data.[237] Had it used synthetic data that was truly anonymized and fell outside GDPR's scope, it could have avoided liability while still providing advertisers with consumer trends statistically equivalent to the data that ended up costing it $57 million. And given Google's dominance of the search engine market,[238] advertisers would have had little choice but to keep their advertising expenditures with the company.

Moreover, data companies may realize cost savings from synthetic data by reducing the time and labor required to manually deidentify personal data, avoiding the expense of manually labeling datasets purchased from a collector for another purpose, or leveraging synthetic data's predictive capacity to replace the tests or surveys needed to collect the data in the first place.[239] Data companies could also save shareholders millions, if not billions, per year in foregone enforcement fines under GDPR or similar data privacy laws as other jurisdictions catch on. Most of the companies generating synthetic data are private, and therefore so are their financials and price points. Amazon Web Services, however, offers access to its own synthetic data generator for $995 per year.[240] That cost represents a significant decrease from the costs typically associated with current deidentification processes.

2. Drawbacks

While promising, synthetic data is not entirely foolproof. Much like any algorithm or dataset, the outcomes are only as good as the underlying data. Synthetic data (just like traditional, identifiable data) that is not effectively controlled for racial bias can exacerbate discriminatory outcomes.[241] Some researchers claim that they can eliminate bias in GAN-generated data by employing weak supervision and weighting input variables susceptible to bias,[242] though some commentators remain skeptical of this claim.[243]

Additionally, there is a minimal, but still present, possibility of "leakage" of PII if synthetic data is not paired with additional privacy-preserving features like differential privacy.[244] At a high level, differential privacy is a technique whereby a statistician introduces enough noise into a dataset to induce a sufficient level of deniability, so that an entry of "yes" or "no" in a dataset becomes "maybe."[245] This reduces the data's utility by design to make it less useful for hackers, but it also reduces utility for a lawful custodian.[246]
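The "yes-or-no becomes maybe" intuition is captured by randomized response, a classical differentially private mechanism. In the sketch below (the sensitive attribute and its 30% true rate are invented for illustration), each respondent answers honestly only half the time, yet the population rate remains recoverable in aggregate:

```python
import random

random.seed(1)

# Hypothetical sensitive attribute for 100,000 respondents; the true
# population rate (30%) is an invented figure for illustration.
truth = [random.random() < 0.30 for _ in range(100_000)]

def randomized_response(answer: bool) -> bool:
    """Answer honestly half the time; otherwise report a fair coin flip."""
    if random.random() < 0.5:
        return answer
    return random.random() < 0.5

reported = [randomized_response(t) for t in truth]

# Each individual "yes" is deniable ("maybe I just flipped heads"), yet
# the aggregate is recoverable: P(report yes) = 0.5*p + 0.25.
observed = sum(reported) / len(reported)
estimated_rate = 2.0 * (observed - 0.25)
```

This mechanism satisfies differential privacy (with privacy parameter ε = ln 3 for these coin weights), and it also makes the utility cost concrete: the custodian recovers the true rate only up to the added noise, which is the tradeoff the text describes.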

In the consumer data context, integrating synthetic data into the BigTech advertising model will likely reduce the precision with which these companies can target advertisements. But because Google currently dominates the search engine market (so much so that it is facing antitrust action in the U.S.), with an 88% market share in the U.S. market for general search engines and 70% in the search advertising market,[247] it is in such a powerful position that advertisers will have a difficult time finding a better place to take their advertising expenditures. Even if synthetic data is not as useful as genuine user data, GDPR imposes significant legal and regulatory risk evenly on competitors.[248] Thus BigTech's dominance in the attention market is highly unlikely to change solely due to the use of synthetic data.

Further, as discussed in Part I, supra, psychographic targeting may present significant risks, as illustrated by the Cambridge Analytica affair. Synthetic data's hampering of such targeted advertising efforts may thus be viewed as a positive by some.[249] Additionally, there is a valid concern regarding validation of the underlying dataset from which synthetic data is generated. Without adequate validation methods, like the NIH's use of PHI validated by hospitals,[250] synthetic data could be used nefariously to mislead people who rely on it, particularly if the data or validation methods are not available for independent scrutiny.


The statistical and functional equivalence between synthetic and authentic data alleviates the tension between BigTech’s enormous appetite for personal data and the privacy requirements of current data privacy laws. Synthetic data, by definition, is anonymized data under GDPR’s Recital 26 and similarly falls outside of the scope of the U.S.’s several definitions of “personal information,” as there is no way for an outside observer to identify the original data subjects underlying the synthetic dataset. Yet, despite this lack of identifiability, synthetic data preserves the statistical outcomes, and thus the utility, of the underlying authentic data. As one commentator aptly noted, “[i]f an organization can identify all of its personal data, take it out of the data security and compliance equation completely—rending it useless to hackers, insider threats, and regulation scope—it can eliminate a huge amount of risk, and drastically reduce the cost of compliance.”[251] Thus, an innovation such as synthetic data could help privacy-conscious data subjects and anxious BigTech CEOs alike sleep better at night knowing that the big data engines are still humming while user privacy is being protected.

By incorporating truly anonymous, privacy-compliant synthetic data into their business models, companies like Google and Facebook could continue to operate in their current, highly successful fashion while resolving the Privacy-Utility Tradeoff: protecting their users' privacy while continuing to profit from mass data collection. Facebook and Google could provide advertisers with synthetic datasets reflecting unique consumer consumption trends which, while not as specific and granular as current datasets, are effective enough to track changes in market trends and reach potential consumers. Consumers who are unbothered by the amount of personal data currently collected by BigTech companies could opt into data collection from their use of the platforms or their devices, lowering the volume of authentic data controllers need to gather. In exchange, advertisers could offer discounts to users who opt in. This would give BigTech enough seed data for a synthetic dataset to replicate accurately and would give advertisers a means to continue reaching their target audiences.

Even if this practice were less effective for targeted advertising, causing advertisers and political campaigns to gripe at the decline in return on marketing expenditures, data controllers and their shareholders could avoid hefty blows to their bottom lines from violations of GDPR and similar forthcoming privacy regulations, and could mitigate reputational damage as users begin to prioritize privacy. Ultimately, any decline in advertising effectiveness is vastly outweighed by the substantial public policy interest in protecting individuals' rights to privacy and providing users a way to escape the invasive and Orwellian digital world we have found ourselves in.[252]


* Associate attorney at Horizons Law & Consulting Group specializing in corporate and securities work for startups in the blockchain and digital asset industry. The author wishes to thank the University of Florida Levin College of Law for receipt of the Governor's Scholarship and Dean Amy Stein of the University of Florida Levin College of Law for her work on this Note and for an outstanding education in the intersection of artificial intelligence and the law.

1. THE SOCIAL DILEMMA (Exposure Labs 2020).

2. See Matthew Humerick, Taking AI Personally: How the E.U. Must Learn to Balance the Interests of Personal Data Privacy & Artificial Intelligence, 34 SANTA CLARA HIGH TECH. L.J. 393, 418 (2018); Thomson Reuters, Top Five Concerns with GDPR Compliance, (last visited Sept. 25, 2020); Elizabeth L. Feld, United States Data Privacy Law: The Domino Effect After the GDPR, 24 N.C. BANKING INST. 481, 486 (Mar. 2020).

3. Council Regulation 2016/679 of the European Parliament and of the Council of 27 April 2016 on the Protection of Natural Persons with Regard to the Processing of Personal Data and on the Free Movement of Such Data, and Repealing Council Directive 95/46/EC (General Data Protection Regulation), 2016 O.J. (L 119) 1 (EU) [hereinafter GDPR].

4. See, e.g., California Consumer Privacy Act, CAL. CIV. CODE § 1798.100–199.95 (West 2018); see also Biometric Information Privacy Act, 740 ILL. COMP. STAT. 14/15 (2008).

5. See Facebook, Inc., Annual Report (Form 10-K at 7) (Dec. 31, 2019) (“We generate substantially all of our revenue from selling advertising placements to marketers. Our ads enable marketers to reach people based on a variety of factors including age, gender, location, interests, and behaviors.”); see also Alphabet, Inc., Annual Report (Form 10-K at 9) (Dec. 31, 2019) (“We generated over 83% of total revenues from the display of ads online in 2019.”).

6. Bob Violino, Data privacy rules are sweeping across the globe, and getting stricter, CNBC (Dec. 22, 2022, 11:21 AM), sweeping-across-the-globe-and-getting-stricter.html [].

7. GDPR, supra note 3, at art. 3.

8. Id. at art. 77.

9. GDPR Enforcement Tracker, CMS LEGAL, [] (last visited Feb. 17, 2023). Fines are reflected in Euros and were converted to U.S. Dollars by a EUR/USD exchange rate of 1.0696 as of February 17, 2023.

10. See David M. Parker et al., Privacy and Informed Consent for Research in the Age of Big Data, 123 PENN ST. L. REV. 703, 711 (2019). See also Jeffrey Dobin, The CCPA, Facebook’s Potential $60 Billion Fine & How AI Improves Compliance, MOSTLY AI (Feb. 18, 2020), compliance/ [].

11. Steven M. Bellovin et al., Privacy and Synthetic Datasets, 22 STAN. TECH. L. REV. 1, 4 (2019).

12. See generally Kashmir Hill, The Secretive Company That Might End Privacy as We Know It, N.Y. TIMES (Jan. 18, 2020), view-privacy-facial-recognition.html []; Parker et al., supra note 10.

13. Webster’s Dictionary defines “synthetic” as “devised, arranged, or fabricated for special situations to imitate or replace usual realities.” Synthetic, MERRIAM-WEBSTER DICTIONARY, [] (last visited Nov. 21, 2020).

14. Javier Tordable, Synthetic Data Creates Real Results, FORBES (Aug. 26, 2020, 1:10 PM), [].

15. GDPR, supra note 3, at Recital 26.

16. See infra Part III.A.

17. See, e.g., Washington Post Editorial Board, Congress Has Another Chance at Privacy Legislation. It Can’t Afford to Fail Again, WASH. POST (May 9, 2021), it-cant-afford-to-fail-again/2021/05/08/9409fa28-af5c-11eb-ab4c-986555a1c511_story.html []; Lauren Feiner, Congress Has Failed to Pass Big Tech Legislation in 4 Years Leading Up to the Next Election, CNBC (Oct. 31, 2020), .html []. See also Lauren Feiner, FTC Commissioners Agree They

18. See Charter of Fundamental Rights of the European Union, 2012 O.J. (C 326) 2 at arts. 7–8, 10–11, [].

19. A hypothetical conversation between an EU and a U.S. citizen highlights how much more complex privacy law is in the U.S. compared to the EU even well before GDPR’s passage. See Daniel Solove, The Chaos of US Privacy Law, LINKEDIN (Oct. 24, 2012), http:// [].

20. See infra Part I.B.

21. See, e.g., Maria Curi, Outlook for Big Tech Dims as Omnibus Excludes Key Measures, BL (Dec. 20, 2022, 9:51 AM), outlook-for-big-tech-bills-dims-as-omnibus-excludes-key-measures [ D9MN]; Alex LaCasse et al., A look back at privacy and data protection in 2022, IAPP (Dec. 20, 2022), []; Gopal Ratnam, Lawmakers will face familiar technology issues next Congress, Roll Call (Dec. 13, 2022, 7:00AM), will-face-familiar-technology-issues-next-congress/ [].

22. See California Privacy Rights Act 2020 Cal. Legis. Serv. Prop. 24 (West) (amending CCPA, effective Jan. 1, 2023).

23. The EU-U.S. Privacy Shield previously shielded U.S. technology companies from liability for violations of EU law when collecting European residents’ data until it was invalidated by European regulators. Ruth Boardman & Ariane Mole, Schrems II: Privacy Shield Invalid, SCCS Survive. What Happens Now?, BIRD & BIRD (July 15, 2020), [].

24. Stephen Zafarino, The GDPR and the Effect on US Ad Tech, CIO (June 28, 2018, 9:40AM), [].

25. GDPR, supra note 3; see also Kimberly A. Houser & W. Gregory Voss, GDPR: The End of Google and Facebook or A New Paradigm in Data Privacy?, 25 RICH. J.L. & TECH. 3, 58 (2018).

26. Giovanni Buttarelli, The EU GDPR as a Clarion Call For a New Global Digital Gold Standard, EUROPEAN DATA PROTECTION SUPERVISOR (Apr. 1, 2016), publications/press-news/blog/eu-gdpr-clarion-call-new-global-digital-gold-standard_en [].

27. GDPR, supra note 3, at Recital 1 (“The protection of natural persons in relation to the processing of personal data is a fundamental right.”).

28. Id. at Recital 4 (“The processing of personal data should be designed to serve mankind. The right to the protection of personal data is not an absolute right; it must be considered in relation to its function in society and be balanced against other fundamental rights, in accordance with the principle of proportionality.”).

29. Id. at art. 3. See also Ben Wolford, Does the GDPR apply to companies outside of the EU?, GDPR.EU, [] (last visited Feb. 18, 2023).

30. Id. at arts. 4(7)–(8).

31. Id. at arts. 3, 4(1)–(2).

32. Id. at arts. 3, 4(1).

33. Id. at art. 4(23)(b) (emphasis added).

34. Id.

35. Id. at art. 4(1).

36. GDPR, supra note 3, at art. 4(1).

37. Id. at arts. 13–14.

38. Id. at art. 15.

39. Id. at art. 16.

2. Penalties for Violations

Financial penalties are not reserved solely for controllers who violate Article 6, however. Article 77 of GDPR provides every data subject with a private right of action against any processor who touches their personal data.[48] Moreover, Article 83 grants EU regulators enforcement authority against processors for non-compliance, establishes a two-tiered penalty hierarchy, and provides guidelines for imposing appropriate penalties.[49] Depending on the violation and the "nature, gravity and duration of the infringement," fines can amount to 2% of a violator's global annual turnover or €10 million, whichever is higher; or 4% of global annual turnover or €20 million, whichever is higher, for the most serious infringements.[50]

40. Id. at art. 17.

41. Id. at art. 18.

42. GDPR, supra note 3, at art. 20.

43. Id. at art 21.

44. Id. at art. 22.

45. Id. at art. 34.

46. As of February 18, 2023, at least 500 fines for “insufficient legal basis for data processing” under Article 6 have been issued, totaling $489,616,113.21. GDPR Enforcement Tracker, supra note 9.

47. GDPR, supra note 3, at arts. 6(1)(a)–(f).

48. Id. at art. 77.

49. Id. at art. 86.

50. Id. at arts. 77, 83(2).

51. Facebook, Inc, supra note 5, at 80; Alphabet, Inc., supra note 5, at 50.

52. CHRIS D. LINEBAUGH, CONG. RSCH. SERV., LSB10303, ENFORCING FEDERAL PRIVACY LAW—CONSTITUTIONAL LIMITATIONS ON PRIVATE RIGHTS OF ACTION (2019); see also Davidson Lentz, The Top 9 Federal Data Privacy Laws, TN CYBERSECURITY LAW (Nov. 14, 2019), (noting that the FTC Act, COPPA, GLBA, HIPAA, and FERPA do not contain private rights of action, but FCRA, CFAA, and ECPA do.).

53. Shannon Togawa Mercer, The Limitations of European Data Protection as a Model for Global Privacy Regulation, 114 AJIL UNBOUND 20, 22 (2020).

54. Griswold v. Connecticut, 381 U.S. 479, 484 (1965) (“Various guarantees create zones of privacy.”).

55. Katz v. United States, 389 U.S. 347, 360 (1967) (Harlan, J., concurring).

56. 381 U.S. at 485 (“The present case, then, concerns a relationship lying within the zone of privacy created by several fundamental constitutional guarantees.”).

57. Id. at 484 (internal citation omitted).

58. United States v. Jones, 565 U.S. 400 (2012).

59. United States v. Maynard, 615 F.3d 544, 562 (D.C. Cir. 2010), aff’d in part sub nom. United States v. Jones, 565 U.S. 400 (2012) (“A person who knows all of another’s travels can deduce whether he is a weekly church goer, a heavy drinker, a regular at the gym, an unfaithful husband, an outpatient receiving medical treatment, an associate of particular individuals or political groups—and not just one such fact about a person, but all such facts.”).

60. Rob Silvers, Marilyn (illustration), in Robert S. Silvers, Photomosaics: Putting Pictures in Their Place, MASS. INST. OF TECH. 84 (1996), 2/38491951-MIT.pdf [].

61. Orin S. Kerr, The Mosaic Theory of the Fourth Amendment, 111 MICH. L. REV. 311, 312–15 (Dec. 2012).

62. Riley v. California, 573 U.S. 373 (2014).

63. Id. at 387.

64. Katz v. United States, 389 U.S. 347, 360–61 (Harlan, J. concurring) (stating that “a person has a constitutionally protected reasonable expectation of privacy . . . [and] that electronic as well as physical intrusion into a place that is in this sense private may constitute a violation of the Fourth Amendment[.]”).

65. Riley, 573 U.S. at 390 (“Moreover, in situations in which an arrest might trigger a remote-wipe attempt or an officer discovers an unlocked phone, it is not clear that the ability to conduct a warrantless search would make much of a difference.”).

66. Katz, 389 U.S. at 360–61.

67. Smith v. Maryland, 442 U.S. 735, 743–44 (1979).

68. United States v. Miller, 425 U.S. 435, 443 (1976).

69. United States v. Jones, 565 U.S. 400, 417 (2012) (Sotomayor, J., concurring).

70. Carpenter v. United States, 138 S. Ct. 2206, 2223 (2018).

71. Olmstead v. United States, 277 U.S. 438, 473–74 (1928) (Brandeis, J., dissenting).

72. Carpenter, 138 S. Ct. at 2223 (quoting Olmstead, 277 U.S. at 473–74 (Brandeis, J., dissenting)).

73. See Samuel D. Warren & Louis D. Brandeis, The Right to Privacy, 4 HARV. L. REV. 193, 213 (1890); see also Jed Rubenfeld, The Right of Privacy, 102 HARV. L. REV. 737, 744–47 (1989).


75. See 42 U.S.C. § 1320.

76. Family Educational Rights and Privacy Act, 20 U.S.C. § 1232g.

77. See Omer Tene, Privacy Law’s Midlife Crisis: A Critical Assessment of the Second Wave of Global Privacy Laws, 74 OHIO ST. L.J. 1217, 1225 (2013).

78. Bellovin et al., supra note 11.

79. Paul Ohm, Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization, 57 UCLA L. REV. 1701, 1701 (2020).

80. See 15 U.S.C. § 41.

81. See 15 U.S.C. § 45 (prohibiting the use of deceptive or unfair trade practices, which has been interpreted broadly to cover data privacy and antitrust).

82. See 15 U.S.C. §§ 6801–6809 (consumer financial data); 15 U.S.C. §§ 6501–6506 (children’s online privacy); 15 U.S.C. §§ 7701–7713 (unsolicited electronic messages); 15 U.S.C. §§ 6101–6108 (telemarketing calls).

83. The FTC’s latest data privacy report from 2010 provides a good history of the FTC’s role in data privacy governance. The 2010 Report proposed a legal framework for Congress to impose that features privacy by design, simplified consumer choice, and transparency as core components, though no such law has been passed to date. See FED. TRADE COMM’N, 2010 ANNUAL REPORT 38, 39–79 (2010). The proposed framework in the 2010 FTC Report is based on the FTC’s “Fair Information Practice Principles” (FIPPs), published in 2000. The four key components are: notice, choice, access and security. These principles were modeled after contemporaneously emerging European privacy legislation and appear to be similar to a more primitive form of GDPR’s core principles, however FIPPs are merely a nonbinding set of guidelines, as Congress failed to codify them into law. See FED. TRADE COMM’N, PRIVACY ONLINE: FAIR INFORMATION PRACTICES IN THE ELECTRONIC MARKETPLACE i (2000) [hereinafter FIPPs Report], default/files/documents/reports/privacy-online-fair-information-practices-electronic-market place-federal-trade-commission-report/privacy2000.pdf [].

84. See Bellovin et al., supra note 11, at 4; see also Paul M. Schwartz & Daniel J. Solove, The PII Problem: Privacy and a New Concept of Personally Identifiable Information, 86 N.Y.U. L. REV. 1814, 1829–35 (2011); Ohm, supra note 79, at 1701; Tene, supra note 77, at 1217.

85. Schwartz & Solove, supra note 84, at 1819.

86. Video Privacy Protection Act of 1988, 18 U.S.C. § 2710(a)(3) (defining personally identifiable information as “information which identifies a person.”).

87. Schwartz & Solove, supra note 84, at 1829.

88. Gramm-Leach-Bliley Act of 1999, 15 U.S.C. § 6809(4)(A).

89. See Schwartz & Solove, supra note 84, at 1828. See also infra Part II.A (discussing why de-identification and anonymization are misnomers and how current technology has rendered de- identification as an insufficient method of privacy protection).

90. 15 U.S.C. § 6809.

91. 16 C.F.R. § 313.3(o)(2)(ii)(B) (2001); see also Benjamin Charkow, The Control Over the De-Identification of Data, 21 CARDOZO ARTS & ENT. L.J. 195, 198 (2003).

92. 45 C.F.R. § 164.514(e) (2013).

93. Id.

94. Id. § 160.103; see also Centers for Medicare & Medicaid Services, Are You a Covered Entity?, A-ACA/AreYouaCoveredEntity.html [] (May 26, 2022, 10:37 AM).

95. 45 C.F.R. § 160.103 (2000).

96. Id.

97. Health Information Technology for Economic and Clinical Health Act (“HITECH Act”), Pub. L. No. 111-5, 123 Stat. 226 (2009) (codified at 42 U.S.C. §§ 300jj, 17901).

98. 45 C.F.R. § 160.404(b)(2)(i)(A), (B) (2009).

99. 45 C.F.R. §§ 160.101–.552, 164.102–.106, 164.500–.534 (2013).

100. 45 C.F.R. §§ 160.101–.552, 164.102–.106, 164.302–318 (2013).

101. See FIPPs Report, supra note 83, at 13.

102. 45 C.F.R. § 160.404(b)(2)(i)(A)–(B) (2009). For more information about the enforcement process, see Enforcement Process, DEP’T OF HEALTH & HUM. SERV. (Sept. 17, 2021), ex.html [].

103. Anthem Resolution Agreement, DEP’T OF HEALTH & HUM. SERV. (Oct. 15, 2018),

104. Ohm, supra note 79, at 1740.

105. California Consumer Privacy Act, CAL. CIV. CODE § 1798.100–199.95 (West 2018).

106. Id. § 1798.110.

107. Id. § 1798.155(a) (“Any business, service provider, contractor, or other person that violates this title shall be liable for an administrative fine of not more than two thousand five hundred dollars ($2,500) for each violation or seven thousand five hundred dollars ($7,500) for each intentional violation or violations . . . .”).

108. Id. § 1798.140(c).

109. Id. § 1798.80(a).

110. Id. § 1798.140(o)(1).

111. Id.

112. Id. § 1798.155.

113. Id. § 1798.150(a)(1).

114. Id. § 1798.150(a)(1)(A).

115. See, e.g., Lauren Davis, The Impact of the California Consumer Privacy Act on Financial Institutions, Across the Nation, 24 N.C. BANKING INST. 499, 499–501 (2020).

116. California Consumer Privacy Act, Cal. Civ. Code § 1798.100 (West 2018).

117. See Ruth Boardman & Ariane Mole, Schrems II: Privacy Shield Invalid, SCCS Survive. What Happens Now?, BIRD & BIRD (July 15, 2020), 2020/global/schrems-ii-judgment-privacy-shield-invalid-sccs-survive-but-what-happens-now []; Davide Szép, America’s Tech Giants: It’s Back to the Drawing Board on European Data, 92 N.Y. ST. BAR ASS’N. J. 45 (Nov. 2020).

118. Commission Decision 2000/520/EC, 2000 O.J. (L 215) [hereinafter Decision 2000/520/EC].

119. European Commission Press Release IP/16/433, Restoring Trust in Transatlantic Data Flows Through Strong Safeguards: European Commission Presents EU-U.S. Privacy Shield (Feb. 29, 2016), [ WTX5].

120. Data Protection Commissioner v. Facebook Ireland Ltd. Court of Justice of the European Union Invalidates the EU-U.S. Privacy Shield., 134 HARV. L. REV. 1567, 1569 n.28 (2021).

121. Anne Beade, Max Schrems, Reluctant Austrian David to Internet Goliaths, TECH XPLORE (Apr. 21, 2021), david.html [].

122. See Emily Linn, A Look into the Data Privacy Crystal Ball: A Survey Of Possible Outcomes for the EU-U.S. Privacy Shield Agreement, 50 VAND. J. TRANSNAT’L L. 1311, 1322– 23 (2017); see also Sherri J. Deckelboim, Consumer Privacy on an International Scale: Conflicting Viewpoints Underlying the EU-U.S. Privacy Shield Framework and How the Framework Will Impact Privacy Advocates, National Security and Businesses, 48 GEO. J. INT’L L. 263, 279–81 (2016).

123. Id.

124. Linn, supra note 122, at 1322–23; 15 U.S.C. §§ 41–58.

125. Commission Decision 25/08/2000, 2000 O.J. (L 215) 7.

126. Case C-362/14, Schrems v. Data Prot. Comm’r, ECLI:EU:C:2015:650, ¶¶ 1–2 (Oct. 6, 2015) [hereinafter Schrems I].


128. Schrems I, ECLI:EU:C:2015:650 at ¶¶ 26–30.

129. Id. at ¶ 31. For a cogent synopsis of the Schrems I holding, see Linn, supra note 122, at 1320–25.

130. See European Commission, Directorate-General For Justice And Consumers, Guide To The Eu-U.S. Privacy Shield (2016) [] (archived Sept. 21, 2017); Privacy Shield Framework: Overview, INT’L TRADE ADMIN., article?id=OVERVIEW [] (last visited Apr. 30, 2021).

131. See James Titcomb, Facebook Signs Up to Privacy Shield Data Treaty, THE TELEGRAPH (Oct. 16, 2016, 8:15 PM), up-to-privacy-shield-data-treaty/ []; John Frank, EU-U.S. Privacy

132. “[T]he replacement [for the Safe Harbor program] that the Commission has proposed right now called ‘privacy shield’ is basically safe harbor once again.” European Parliament, ‘Privacy Shield: Safe Harbour with teeny tiny changes’- Max Schrems, YOUTUBE (Mar. 18, 2016), []; see also Amar Toor, EU-US Privacy Shield Agreement Goes Into Effect: Tech Companies Welcome New Data Transfer Agreement, But Activists Say it Doesn’t Do Enough to Protect Privacy, THE VERGE (July 12, 2016), sfer-privacy [].

133. Allison Callahan-Slaughter, Comment, Lipstick on a Pig: The Future of Transnational Data Flow Between the EU and the United States, 25 TUL. J. INT’L & COMP. L. 239, 253–54 (2016); see also Linn, supra note 122, at 1333; see also Letter from Robert S. Litt, Gen. Couns. of the Off. of the Dir. of Nat’l Intel., to Justin S. Antonipillai, Couns., U.S. Dep’t of Com. & Ted Dean, Deputy Assistant Sec’y, Int’l Trade Admin. (Feb. 22, 2016), https://www.privacy [].

134. Establishing Article III standing in challenges to U.S. government surveillance is a notoriously difficult hurdle to meet. See Margaret B. Kwoka, The Procedural Exceptionalism of National Security Secrecy, 97 B.U. L. REV. 103, 121–24 (2017); Christopher Slobogin, Standing and Covert Surveillance, 42 PEPP. L. REV. 517, 532–33 (2015) (arguing that unconstitutional surveillance programs might be allowed to continue without affirmative action from the legislature or the executive branch to discontinue them); see also Steven Graziano, An Unconstitutional Work of Art: Discussing Where the Federal Government’s Discrete Intrusions Into One’s Privacy Become an Unconstitutional Search Through Mosaic Theory, 17 MINN. J.L. SCI. & TECH. 977, 992 (2016).

135. Case C-311/18, Data Prot. Comm’r v. Facebook Ireland Ltd., ECLI:EU:C:2020:559, 163–64 (July 16, 2020).

136. Data Protection Commissioner v. Facebook Ireland Ltd.: Court of Justice of the European Union Invalidates the EU-U.S. Privacy Shield, 134 HARV. L. REV. 1567, 1569 n.28 (2021) (citations omitted) (“The CJEU noted that ‘the assessment of the level of protection afforded in the context of such a transfer must,’ inter alia, consider the relevant laws of a third country ‘as regards any access by the public authorities of that third country to the personal data transferred.’ The court found that the GDPR’s protections apply to the commercial transfer of data to a third country, regardless of the likelihood that that data will ‘be processed by the authorities of [that] third country . . . for the purposes of public security, defence and State security.’”).

137. 50 U.S.C. § 1881a.

138. Exec. Order No. 12,333, 3 C.F.R. § 200 (1982), reprinted as amended in 50 U.S.C. § 3001.

139. Press Release, Off. of the Press Sec’y, Presidential Policy Directive — Signals Intelligence Activities (Jan. 17, 2014), presidential-policy-directive-signals-intelligence-activities [].

140. Leah Shepherd, EU Adopts New Standard Contractual Clauses for Data Transfers, SHRM (July 28, 2021), standard-contractual-clauses-data-transfers.aspx [].

141. Id.

142. Sharp Cookie Advisors, Schrems II a Summary – All You Need to Know, GDPR SUMMARY (Nov. 23, 2020), [ LNRS].

143. Caitlin Fennessy, The EU-US Data Privacy Framework: A new era for data transfers?, IAPP (Oct. 7, 2022), data-transfers/ [].

144. Id.

145. See, e.g., Martin J. Conyon, Big technology and data privacy, 46 Cambridge J. Econs. 1369, 1369 (2023) (“The collection of individually identifiable data is at the heart of the Facebook business model. The large social networking companies use personal data as a resource, store and bundle that data, and sell it to third parties.”).

146. See Samuel G. Goldberg et al., Regulating Privacy Online: An Economic Evaluation of The GDPR 17–19 (Law & Econs. Ctr. Geo. Mason U. Scalia L. Sch., Rsch. Paper Series No. 22-025, 2022) (finding “a reduction of approximately 12% in both EU user website pageviews and website e-commerce revenue . . . after GDPR’s enforcement deadline”).

147. Amir Tabakovic, Only a Little Bit Re-Identifiable?! Good Luck With That…, MOSTLY AI (July 31, 2020), [ /TP82-DLQZ]; see also Ohm, supra note 79, at 1752.

148. Shuchi Chawla et al., Toward Privacy in Public Databases, in 2 THEORY OF CRYPTOGRAPHY CONFERENCE 363, 364 (Joe Kilian ed., 2005).

149. Tabakovic, supra note 147.

150. Id.

151. See Ohm, supra note 79, at 1724. Professor Ohm’s Article expertly details the numerous ways in which reidentification and linkage between individual people and their data within “anonymous” data sets can occur.

152. Tabakovic, supra note 147.

153. 45 C.F.R. §§ 164.502(d), 164.514(a)–(b) (2000).

154. GDPR, supra note 3, at Recital 26.

155. Facebook 10-K, supra note 5, at 62; Alphabet 10-K, supra note 5, at 9.

156. Alphabet 10-K, supra note 5, at 27 (emphasis added). “MD&A” is an abbreviation for “Management’s Discussion and Analysis” in SEC filings. This section’s purpose is to provide the public with a more detailed explanation of events and conditions underlying the financial data disclosed.

157. Id. at 6.


159. See Ina Fried, What Facebook Knows About You, AXIOS (Jan. 2, 2019), ddac8cd47c34.html []; see also Rob Mardisalu, What Does Google Know About You: A Complete Guide, THEBESTVPN (July 9, 2018), does-google-know-about-you/#infographic []. The infographic and the accompanying article provide an illustrative overview of the information Google collects on its users and how the company uses it.

160. Mardisalu, supra note 159.

161. Id.

162. Id.

163. See Ryan Nakashima, AP Exclusive: Google Tracks Your Movements, Like it or Not, ASSOCIATED PRESS (Aug. 13, 2018), a07c1af0ecb []; see also Alfred Ng, Facebook Still Tracks You After You Deactivate Account: Deactivation does nothing for your privacy, CNET (Apr. 9, 2019), [].

164. See THE SOCIAL DILEMMA (Exposure Labs 2020); see also Jonathan Haidt & Tobias Rose-Stockwell, The Dark Psychology of Social Networks: Why it Feels Like Everything is Going Haywire, THE ATLANTIC (Dec. 2019), social-media-democracy/600763/ []; Tristan Harris, Our Brains are No Match for Our Technology, N.Y. TIMES (Dec. 5, 2019), 2019/12/05/opinion/digital-technology-brain.html [].

165. Sue Halpern, Cambridge Analytica and the Perils of Psychographics, NEW YORKER (Mar. 30, 2018), perils-of-psychographics [].

166. Id.

167. Mike Isaac & Cecilia Kang, Facebook Expects to Be Fined Up to $5 Billion by F.T.C. Over Privacy Issues, N.Y. TIMES (Apr. 24, 2019), technology/facebook-ftc-fine-privacy.html []; Natasha Lomas, First Major GDPR decisions looming on Twitter and Facebook, TECHCRUNCH (May 22, 2020), []; see also Emily Price, The EU Could Hit Facebook with Billions in Fines Over Privacy Violations, DIGITAL TRENDS (Aug. 12, 2019), [].

168. Facebook 10-K, supra note 5, at 11.

169. Alphabet 10-K, supra note 5.

170. Isaac & Kang, supra note 167.

171. Ohm, supra note 79, at 1706–09.

172. Bellovin et al., supra note 11.

173. Ohm, supra note 79, at 1706–15.

174. 45 C.F.R. § 164.514(e) (2013).

175. Id. § 164.514(e).

176. Id. § 164.514.

177. Id. § 164.502(d)(2).

178. Id. § 164.514.

179. Ohm, supra note 79, at 1716.

180. Id.

181. Ohm, supra note 79, at 1707–08.

182. Id. at 1719–20 (citing Latanya Sweeney, Uniqueness of Simple Demographics in the U.S. Population (Lab’y for Int’l Data Priv., Working Paper No. 3, 2000)).

183. Philippe Golle, Revisiting the Uniqueness of Simple Demographics in the US Population, ASS’N FOR COMPUTING MACH. 77, 78 (2006).

184. Sophie Stalla-Bourdillon & Alison Knight, Anonymous Data v. Personal Data – False Debate: An EU Perspective on Anonymization, Pseudonymization and Personal Data, 34 WIS. INT’L L.J. 284, 286–87 (2016).

185. See id.

186. GDPR, supra note 3, at Recital 26.

187. Bellovin et al., supra note 11, at 9–18; see also Ohm, supra note 79, at 1744–48.

188. GDPR Enforcement Tracker, supra note 9.

189. Sead Fadilpašić, Majority of GDPR Penalties Issued as a Result of These Two Problems, IT PRO PORTAL (Oct. 16, 2020), problems/.

190. Fines reflected here were current as of October 16, 2020, but some large fines were later reduced: the £183 million (approximately $219 million USD) fine levied upon British Airways by the U.K.’s Information Commissioner’s Office was reduced to £20 million (approximately $24 million USD) on appeal due to the financial hardship experienced by the airline during the COVID-19 pandemic. Carly Page, U.K. Privacy Watchdog Hits British Airways With Record-Breaking £20 Million GDPR Fine, FORBES (Oct. 16, 2020, 5:39 AM), /16/ico-hits-british-airways-with-record-breaking-fine-for-2018-data-breach/?sh=60b15bd9481a [].

191. Fadilpašić, supra note 189.

192. GDPR Enforcement Tracker, supra note 9.

193. Press Release, UK Information Commissioner’s Office, Intention to Fine British Airways £183.39m Under GDPR for Data Breach (July 8, 2019); see also Statement, UK Information Commissioner’s Office, Intention to Fine Marriott International, Inc More Than £99 Million Under GDPR for Data Breach (July 9, 2019).

194. GDPR, supra note 3, at art. 32(1).

195. See supra Part I.A.

196. “Personal data shall be . . . processed in a manner that ensures appropriate security of the personal data, including protection against unauthorized [sic] or unlawful processing and against accidental loss, destruction or damage, using appropriate technical or organizational [sic] measures (‘integrity and confidentiality’).” GDPR, supra note 3, at art. 5(1)(f).

197. GDPR, supra note 3, at art. 32.

198. See Press Release, UK Information Commissioner’s Office, supra note 193.

199. Page, supra note 190.

200. Press Release, Commission Nationale de l’Informatique et des Libertés (CNIL), The CNIL’s Restricted Committee Imposes a Financial Penalty of 50 Million Euros Against Google LLC (Jan. 21, 2019) [hereinafter CNIL Decision].

201. Id.

202. Id.

203. Id.

204. Lomas, supra note 167.

205. Inc., Quarterly Report (Form 10-Q) (June 30, 2021), 20210630.htm#i5986f88ea1e04d5c91ff09fed8d716f0_103.

206. Amazon Fined 746 Million Euros Following our Collective Legal Action, La Quadrature du Net (July 30, 2021), euros-following-our-collective-legal-action/ [].

207. Press Release, Decision Regarding Amazon Europe Core S.À.R.L., Luxembourg National Commission for Data Protection (Aug. 6, 2021), actualites/international/2021/08/decision-amazon-2.html [].

208. Jonathan Armstrong & Katherine Eyres, Client Alert: Amazon fined €746 million by Luxemburg Data Protection Regulator for GDPR infringements, Cordery Legal Compliance (Oct. 18, 2021), [ 55GY].

209. Id.

210. Lomas, supra note 167, at n.17 (highlighting a list of exemplary cases).

211. GDPR Enforcement Tracker, supra note 9.

212. See FED. TRADE COMM’N, PRIVACY & DATA SECURITY UPDATE: 2019, at 2 (2020), privacy-data-security-report-508.pdf [] (The FTC brought “130 spam and spyware cases and 75 general privacy lawsuits,” in 2018, and “130 spam and spyware cases and 50 general privacy lawsuits” in 2017.); see also FED. TRADE COMM’N, PRIVACY & DATA SECURITY UPDATE: 2018, at 3 (2019), privacy-data-security-update-2018/2018-privacy-data-security-report-508.pdf [ 24NR-DTHG]; FED. TRADE COMM’N, PRIVACY & DATA SECURITY UPDATE: 2017, at 2 (2018), -commissions-enforcement-policy-initiatives-consumer/privacy_and_data_security_update_2017 .pdf [].

213. Press Release, Federal Trade Commission, FTC Imposes $5 Billion Penalty and Sweeping New Privacy Restrictions on Facebook (July 24, 2019), events/press-releases/2019/07/ftc-imposes-5-billion-penalty-sweeping-new-privacy-restrictions

214. Id.

215. Complaint at 1–5, United States v. Facebook, Inc., Case No. 19-cv-2184 (D.D.C. 2019), [ QN-CD5D].

216. Schrems II Landmark Ruling: A Detailed Analysis, NORTON ROSE FULBRIGHT (July 2020), landmark-ruling-a-detailed-analysis []; Schrems II Confirms Validity of EU Standard Contractual Clauses, Invalidates EU–U.S. Privacy Shield, JONES DAY (July 2020), [].

217. Randy Koch, Opinion: TikTok’s Data-Privacy Problem Has an Easy Solution: ‘Synthetic Data’, MARKETWATCH (Oct. 2, 2020, 7:57 AM), tiktoks-data-privacy-problem-has-an-easy-solution-synthetic-data-2020-10-02?siteid=yhoof2 [].

218. Tordable, supra note 14; Stefanie Koperniak, Artificial data give the same results as real data — without compromising privacy, MIT NEWS (Mar. 3, 2017), 2017/artificial-data-give-same-results-as-real-data-0303 [].

219. Grace Brodie, Five Compelling Use Cases for Synthetic Data, HAZY (June 1, 2020), [ Q5].

220. Daniel Nelson, What is Synthetic Data?, UNITE AI (Sept. 14, 2020), [].

221. Id.

222. Id.

223. Id.

224. Id.

225. Bellovin et al., supra note 11, at 5.

226. Nelson, supra note 220.

227. See Yashar Behzadi, Why Synthetic Data Could Be the Ultimate AI Disruptor, TDWI (June 28, 2019), ruptor.aspx []; see also Brandi Vincent, NIH Partners with Israeli Startup to Generate Synthetic COVID-19 Data, NEXTGOV (June 18, 2020), -covid-19-data/166255/ []; The Rise of Synthetic Data to Help Developers Create and Train Algorithms Quickly and Affordably, INSIDEBIGDATA (May 8, 2018), algorithms-quickly-affordably/ []; Tordable, supra note 14.

228. GDPR, supra note 3, at Recital 26 (emphasis added).

229. 45 C.F.R. § 160.103 (2014).

230. See supra Part I.B.2.b.

231. See, e.g., Emil Protalinski, Zapata raises $38 million for quantum machine learning, VENTUREBEAT (Nov. 19, 2020), for-quantum-machine-learning/ [].

232. Sri Muppidi, Growing Applications of Synthetic Data, SIERRA VENTURES (Sept. 22, 2020, 6:00 AM), [].

233. Julia Evangelou Strait, Synthetic data mimics real patient data, accurately models COVID-19 pandemic, Wash. U. Sch. Med. St. Louis (Apr. 27, 2021), https://medicine. [].

234. Brandi Vincent, Synthetic Data Engine to Support NIH’s COVID-19 Research-Driving Effort, NEXTGOV (Jan. 14, 2021), data-engine-support-nihs-covid-19-research-driving-effort/171421/ [ F8SS].

235. Department of Health & Human Services – Office of National Coordinator for Health Information Technology, Synthetic Health Data Challenge, CHALLENGE.GOV (last visited Apr. 30, 2021),

236. Synthetic Health Data Generation to Accelerate Patient-Centered Outcomes Research, DEPT. HEALTH & HUM. SERVS., synthetic-health-data-generation-accelerate-patient-centered-outcomes [ DZCR] (July 21, 2021, 5:00PM).

237. See CNIL Decision, supra note 200.

238. Katherine Kemp, The US is taking on Google in a huge antitrust case—it could change the face of online search, TECH XPLORE (Oct. 21, 2020), google-huge-antitrust-caseit-online.html [].

239. Tordable, supra note 14 (“Acquiring and storing all of this data [for autonomous vehicle development] from live tests of real cars on real roads would have been too expensive and cumbersome.”).

240. See Synthetic Data Generator, AMAZON WEB SERVS., 20210116190558/ Data-Generator/B07XPJ8Z7M [] (last visited Nov. 29, 2020).

241. Todd Feathers, Fake Data Could Help Solve Machine Learning’s Bias Problem—if We Let It, SLATE (Sept. 17, 2020, 9:00 AM), artificial-intelligence-bias.html [].

242. Id. (citing Choi et al., Fair Generative Modeling Via Weak Supervision, ASS’N FOR COMPUT. MACH. (July 13, 2020), []).

243. Sage Lazzaro, AI Experts Refute Cvedia’s Claim its Synthetic Data Eliminates Bias, VENTUREBEAT (July 26, 2021, 2:20 PM), claim-its-synthetic-data-eliminates-bias/ [].

244. See Bellovin et al., supra note 11, at 18–21.

245. Id.

246. Id.

247. Tiago Bianchi, Worldwide desktop market share of leading search engines from January 2015 to December 2022, Statista (Jan. 6, 2023), worldwide-market-share-of-search-engines/ [].

248. See supra Part I.A.2.

249. Rebecca Walker Reczek et al., Targeted Ads Don’t Just Make You More Likely to Buy — They Can Change How You Think About Yourself, HARV. BUS. REV. (Apr. 4, 2016), how-you-think-about-yourself [].

250. Vincent, supra note 234.

251. Randy Koch, GDPR, CCPA and Beyond: How Synthetic Data Can Reduce the Scope of Stringent Regulations, HELP NET SEC. (Apr. 14, 2020), 2020/04/14/synthetic-data/ [].

252. See supra Part II.A.2.
