By the 1960s, the US government was using powerful mainframe computers to store and process an enormous amount of data on nearly every American. Corporations also used the machines to analyze sensitive information including consumer purchasing habits. There were no laws dictating what kind of data they could collect. Worries over supercharged surveillance soon emerged, especially after the publication of Vance Packard’s 1964 book, The Naked Society, which argued that technological change was causing the unprecedented erosion of privacy.
The next year, President Lyndon Johnson’s administration proposed merging hundreds of federal databases into one centralized National Data Bank. Congress, concerned about possible surveillance, pushed back and organized a Special Subcommittee on the Invasion of Privacy. Lawmakers worried the data bank, which would “pool statistics on millions of Americans,” could “possibly violate their secret lives,” The New York Times reported at the time. The project was never realized. Instead, Congress passed a series of laws governing the use of personal data, including the Fair Credit Reporting Act in 1970 and the Privacy Act in 1974. The regulations mandated transparency but did nothing to prevent the government and corporations from collecting information in the first place, argues technology historian Margaret O’Mara.
Toward the end of the 1960s, some scholars, including MIT political scientist Ithiel de Sola Pool, predicted that new computer technologies would continue to facilitate even more invasive personal data collection. The reality they envisioned began to take shape in the mid-1990s, when many Americans started using the internet. By the time most everyone was online, though, one of the first privacy battles over digital data brokers had already been fought: In 1990, Lotus Corporation and the credit bureau Equifax teamed up to create Lotus MarketPlace: Households, a CD-ROM marketing product that was advertised to contain names, income ranges, addresses, and other information about more than 120 million Americans. The product quickly caused an uproar among privacy advocates on digital forums like Usenet; over 30,000 people contacted Lotus to opt out of the database. It was ultimately canceled before it was even released. But the scandal didn’t stop other companies from creating massive data sets of consumer information in the future.
Several years later, ads began permeating the web. In the beginning, online advertising remained largely anonymous. While you might have seen ads for skiing if you looked up winter sports, websites couldn’t connect you to your real identity. (HotWired.com, the online version of WIRED, was the first website to run a banner ad, in 1994, as part of a campaign for AT&T.) Then, in 1999, digital ad giant DoubleClick ignited a privacy scandal when it tried to de-anonymize its ads by merging with the enormous data broker Abacus Direct.
Privacy groups argued that DoubleClick could have used personal information collected by the data broker to target ads based on people’s real names. They petitioned the Federal Trade Commission, arguing that the practice would amount to unlawful tracking. As a result, DoubleClick sold the firm at a loss in 2006, and the Network Advertising Initiative was created. The trade group developed standards for online advertising, including a requirement that companies notify users when their personal data is being collected.
The Future of Personal Data Collection
Personal information is currently collected primarily through screens, when people use computers and smartphones. The coming years will bring the widespread adoption of new data-guzzling devices, like smart speakers, sensor-embedded clothing, and wearable health monitors. Even those who refrain from using these devices will likely have their data gathered by things like facial recognition-enabled surveillance cameras installed on street corners. In many ways, this future has already begun: Taylor Swift fans have had their face data collected, and Amazon Echos are listening in on millions of homes.
We haven’t decided, though, how to navigate this new data-filled reality. Should colleges be permitted to digitally track their teenage applicants? Do we really want health insurance companies monitoring our Instagram posts? Governments, artists, academics, and citizens will think about these questions and plenty more.
And as scientists push the boundaries of what’s possible with artificial intelligence, we will also need to learn to make sense of personal data that isn’t even real, at least in that it didn’t come from humans. For example, algorithms are already generating “fake” data for other algorithms to train on. So-called deepfake technology allows propagandists and hoaxers to leverage social media photos to make videos depicting events that never happened. AI can now create millions of synthetic faces that don’t belong to anyone, altering the meaning of stolen identity. This fraudulent data could further distort social media and other parts of the internet. Imagine trying to discern whether a Tinder match or the person you followed on Instagram actually exists.