Lexicon: Beware the Data Brokers

Shadowy companies, digital and analog, have found ways to collect user data for profit for many years. Facebook has made the practice easier than ever while eluding the untoward name.


When a word or phrase suddenly and (sometimes) unexpectedly hits our collective consciousness, you cannot stop hearing or reading it. Lexicon is The Ringer’s running guide to collecting and defining these terms, and sometimes tracing their origins. It’s a never-ending pursuit, but one we’re happy to entertain.

Data brokers are known by many names: information brokers, information resellers, data vendors. They often seem shadowy because they are outside the immediate view of consumers—existing in the background of our digital lives, watching, learning, but rarely interacting with us. Many times they have either official or boring names to discourage any interest. They function, essentially, as highly sophisticated internet lurkers. Have you scrolled through Instagram or Facebook or Twitter, harvesting updates on what your connections are doing, where they’re going, what political articles they’ve posted? You never interact with them; they don’t know how intensely you’re analyzing their activity. You store that information away in your brain for later reference. The difference is that the human brain is fallible and forgetful—and that you likely won’t lease that information to third parties to make money. (If you do, I hope we are not Facebook friends.)

The conversation about data went mainstream when news broke that Facebook user information was collected by a third-party quiz app and used illicitly by Cambridge Analytica. Facebook CEO Mark Zuckerberg was quick to point out that what occurred was a “breach of trust,” but not a hack. As we continue to learn the breadth of the breach, “data brokers” are repeatedly mentioned, but what are they? The term has been used to connote malicious actors working within the Facebook ecosystem to hoard personal information and use it to their benefit, but that’s a narrow view of a more complicated dynamic. As a 2014 study put it, “While many scholars have commented on the rise of consumer finance, few have sought to critically explore a largely invisible segment of this industry: the multi-billion dollar consumer data broker industry.”

According to the FTC, data brokers are “companies that collect information, including personal information about consumers, from a wide variety of sources for the purpose of reselling such information to their customers for various purposes, including verifying an individual’s identity, differentiating records, marketing products, and preventing financial fraud.” Companies familiar and not fall into this category: Acxiom collects marketing-focused data, Nielsen is a common name in TV ratings but not often thought of this way, and CoreLogic gathers and licenses real estate data. The Gartner definition explains that the collected information is typically gathered via API.

The practice predates the internet by a wide margin. Advertisers (and other entities) have long collected information about consumers. This sort of information collection has its roots as far back as the advent of the Gutenberg press, according to Marilyn M. Levine, the founder of the Association for Independent Information Professionals and a former professor of library and information science at the University of Wisconsin. “Information brokering as we now think of it as a business opportunity for the individual information professional was begun by the French in 1935,” Levine writes. This is when the idea of exchanging or licensing data for profit started, and it was done via phone calls, where an organization called the Societe Francaise de Radiophonie would offer information for a fee. People could call them up as well and pay for requested information. In the 1950s, the organization added an air of exclusivity and charged a yearly subscription fee.

Americans soon adopted the idea, Levine writes. “Among them were Matthew Lesko, who turned a home-based newsletter on how to get free information from federal government agencies into a $750,000 a year business, now called Washington Researchers.” Levine also mentions Roger Summit as an information-brokering pioneer. Summit is often called “the father of online search,” thanks to his early work on Dialog, an “information retrieval system” that was a precursor to databases like LexisNexis and EBSCOhost.

In the 1950s, the era of “the hidden persuader,” “motivational research” was used to determine how certain colors, logos, and packaging could influence buyers and subconsciously encourage them to buy a product. And then in the ’70s, advertisers turned to “fact-based marketing,” using results from research as selling points, versus the creatively bent ads of the ’60s.

In the late ’70s, the world was introduced to what sounds an awful lot like the precursor to data brokers. Kelly Warnken self-published The Directory of Fee-Based Information Services 1977. It was like a Yellow Pages for personal information. Four years later, a publisher supported another of Warnken’s books: The Information Brokers: How to Start and Operate Your Own Fee-Based Service. The advent of the personal computer set the stage for information (or data) brokering as we know it today. Essentially, the internet threw gasoline on a long-burning fire.

Kate Knibbs recently explored what data brokers collect—what “your data” really means. What they do with it covers a wide range: They can target you for political campaigns or advertising campaigns; they can help people who are looking for you find you—whether that is a long lost relative or someone with malicious intentions. They can provide information to a potential employer and determine whether you get hired for a job. They can aid telemarketers and email spammers in filling your voicemail and inbox with junk.

In the last few weeks, Zuckerberg has made a concerted effort to point out that Facebook doesn’t sell data to advertisers, an attempt to distance his company from the brokers. Meanwhile, Cambridge Analytica has been rightly painted as one of these notorious outfits. It sold data collected via Facebook’s API to clients, including Donald Trump’s 2016 presidential campaign. But Cambridge Analytica is not the only broker. Despite Zuckerberg’s denials, Facebook is another.

The social network is a large database of consumer information that collects Likes, conversations, and political preferences, among much more. It allows the Facebook login button to live across the web to gather yet more information. Though the company does not collect revenue from the direct sale of the information it has catalogued, it has made that data available to outside partners—though the company plans to change this practice in the wake of the scandal. No, Facebook doesn’t give advertisers your name or personal details, but it does tell them how to target ads to you. The social network makes money off of licensing (essentially renting out) that information.

During Zuckerberg’s meeting with Congress, he used the example of a ski shop to illustrate how the company works with advertisers. “What we allow is for advertisers to tell us who they want to reach and then we do the placement. So, if an advertiser comes to us and says, ‘Alright, I’m a ski shop and I want to sell skis to women,’ then we might have some sense because [of] people shared skiing-related content or said they were interested in that. They shared whether they’re a woman. And then we can show the ads to the right people without that data ever changing hands and going to the advertiser.” The company makes money by assisting advertisers in targeting you based on data you’ve shared.

Further tying Facebook to the information-brokering business model is its reliance on users opting out. It puts the onus on users to untangle themselves from Facebook’s service. You have to opt out of facial recognition and ad targeting because they are active by default. Opt-out culture is a textbook data-broker move; it assumes people are comfortable being tracked. If they are not, the ensuing process is labyrinthine. (Here’s an extensive database to help start the process.) The steps involved with finding data brokers’ information and then attempting to opt out are reminiscent of what it’s like to wade through Facebook’s settings.

It is strange to learn that a company you’ve never heard of knows a lot about you. Cision is a “holistic PR software” company, and I found out years ago that it had a profile on me before I knew it existed. JPMorgan Chase employees were likely shocked to learn how much Palantir knows. Facebook gets to elude the data-broker designation because it’s familiar for different reasons, and, paradoxically, it has been relatively transparent about collecting information.

Facebook has chosen to position itself as an adversary to data brokers. The social network has labeled Cambridge Analytica and researcher Aleksandr Kogan the enemy, the bad guys who took advantage of its platform. Facebook necessarily distances itself from them, but in reality, not only did the company create a system wherein those two entities could legally harvest user data, but its own basic purpose is quite similar. Its new mission is to rebrand without changing the business model, acting as a data broker for data brokers.

Molly McHugh
