What Happens to Big Data If the Small Data Is Wrong?

Big data proponents may have a bigger problem on their hands than the privacy concerns of consumers.

Last week, the data broker Acxiom launched a website, AboutTheData.com, to give individuals a peek into what is known about their households, education levels, income, purchasing behavior, ethnicity, political affiliations, etc. The exercise was done with the idea of transparency, to make people feel more at ease with the information being collected. Acxiom claims to hold information on 190 million consumers in the U.S. that it sells to retailers and other marketers.

What I found out about myself on the site didn’t make me any more or less concerned about the data being collected. What did concern me was the level of inaccuracy in my personal report. The number of children in my household was off by three. My listed occupation (craftsman/blue collar) gave me a giggle, especially considering that when my wife and I married, she was the one who brought a toolbox to the union. My political affiliation was wrong, as were a couple of other minor points.

But if so many points could be wrong about me, someone who shares information in the hope that it will one day actually be of some benefit to me, then what happens if data points are also wrong for millions of others? I asked others to go on the Acxiom site, and errors, although not egregious, were common.

Separately, Melanie Hicken, a reporter for CNNMoney, aged 26 and single, found that she was married and the mother of two teenagers, which she described as "just about biologically impossible."

Even as Acxiom shared some information, critics suggested the company was holding back on data that individuals might find more troubling. Others suggested the company was making profiles available only so that consumers could help it clean up its database.

Discussion Questions

How accurate is the information that companies are compiling about consumers as part of their big data efforts? What is your reaction to Acxiom’s launch of AboutTheData.com?

20 Comments
Zel Bianco
10 years ago

As one article stated, “Acxiom is not used to dealing directly with consumers, so you have to cut them slack on their first attempt.” Well, maybe so, but we are not talking about the local weather here, where being right maybe half the time is acceptable. Business uses this data to make decisions about you and what action to take. So much of the time, this data is wrong and there need to be ways to correct it, and the new web site from Acxiom is not enough. They need to go much further.

Maybe they can help the politicians here in NYC to be more accurate. Last time I checked, the name Bianco is of Italian origin, so I would like to know why campaigns are calling my house and leaving messages in Spanish. I would not be surprised if I start getting offers for hair care products and tools. The first I have no use for as I am bald, and the second I have no idea how to use.

Ian Percy
10 years ago

“It’s the software, stupid.”

Nothing made by humankind since the beginning of time has poorer quality than software. Read any of the ‘Agreements’ you click on without reading and you’ll be shocked. It’s like a car dealer saying “We make no guarantee this new car will start in the morning.” Some estimates suggest over 15 faults per 1,000 lines of code, though a kinder estimate that we use is 1 per 1,000. The renowned software guru Capers Jones says that in a typical application there are 750 faults, one third (250) of which are capable of crashing the program or generating erroneous results. And when you have programs interacting with each other, the level of faults and errors goes up exponentially.

We like to blame the “garbage going in” as the cause of the “garbage coming out,” but the truth is that the problem more often lies with what the data actually “goes into”: how it is being sliced and diced.

Al McClain
10 years ago

I guess the first thing we all do or did is check the data on ourselves. In my case, it is mostly accurate, but there is so little there that it is hard to see how marketers could use it to accurately target me. And they definitely were off base on my interests. The really troublesome thing is that you can get someone’s info just by having their name, the last four digits of their SS#, and their birth date. It would be nice if our personal data, such as it is, were much more secure than that, but it isn’t.

Herb Sorensen, Ph.D.
10 years ago

I was glad about some of Acxiom’s glaring inaccuracies, but the exercise was interesting and probably useful from a marketing point of view anyway.

My present mission in life is two-fold, knowing that there are on the order of a quadrillion seconds “spent” by shoppers in stores, globally, every year:

1. To catalog all these seconds by country, region, class of trade, retailer, store, aisle, category and down to individual items.

2. To understand every one of those seconds – through sampling.

This IS a big data/little data mission, and ever since we began the tracking of shoppers on a second by second basis in 2001 using RFID, video, and lots more methodologies, we have largely avoided privacy issues because each shopper we track is anonymous to us. That is, other than their shopping behavior, we HAVE no identity information.

This has led to articulation of two levels of truth we seek to study:

1. The behavior of the shopping CROWD in aggregate. Probably the most important things I have learned over the years relate to crowd behavior, because, at the end of the day, the success of a self-service retailer is largely driven by its management of the CROWD. And there is LOTS of room for improvement at this level.

2. But of course crowd behavior is simply the sum of INDIVIDUALS’ behavior. But I always have the crowd behavior for perspective, so when looking at individual behavior, and aggregating it, I am simply assembling the pieces of that “understanding every single second” puzzle!

One of the most important things I have learned is the relative homogeneity of crowds, even around the world. When it comes to the BIG issues, any thousand people will behave much like any other thousand people. The key here is “much like,” but not in every detail. Vive la différence!

John Boccuzzi, Jr.
10 years ago

My favorite quote about Big Data comes from Scott Taylor, a Big Data/Master Data guru: “Big Data needs little data and little data is master data. If the master data is not accurate you’re just messing around.”

I agree with the article that Acxiom is attempting to crowdsource the accuracy of its data. Not a bad idea. I went to the site to check the accuracy of my data, and Acxiom came back and said it could not find me. Maybe that will limit the amount of junk mail I receive, or did I just sign up?

David Biernbaum
10 years ago

Big Data has flaws and glitches, and it’s not nearly as reliable as many CPG companies and retailers believe, mostly because it relies too much on a process that is inherently biased, even if unintentionally. That said, big data works most effectively for very big brands that have sufficient history to work with.

Mark Price
10 years ago

While consumer data may be inaccurate on an individual basis, in general, the data is more right than wrong. As a result, marketers are still better off using that information than just operating off of gut feel or simply “spraying and praying.” Test and control results are conclusive—use of the right customer attributes results in higher open, click and response rates.

AboutTheData.com provides consumers with transparency about key facts in their records and gives them a chance to improve the accuracy, which will result in more targeted communications and offers to them, all to the good.

Peter J. Charness
10 years ago

Well, no surprise that the data was “less than perfect,” although some of the fields that were clearly “software inferred” were way off. But I cleaned it up as best I could, presuming that somehow that would benefit me in the long term. (If it really does, I’d probably be willing to help out a bit more.) Smart move by Acxiom: no-cost data cleansing, one household at a time.

Mark Heckman
10 years ago

Syndicated data has always been about averages and norms. When it comes to managing data at the household level, there are always going to be accuracy issues given the dynamic nature of households and their many moving parts.

I would say the right question would be, “Is the data accurate enough for marketers to use in a targeted fashion and get enhanced response rates while delivering more relevant offers to the customer?” Thankfully, we have measurements in place to determine if the data is reasonably accurate by the type of response we get when we communicate to the shoppers in the context of the household information on file.

However, those companies that are building reasonable customer interfaces so that shoppers can amend, append and edit their own household information, will likely always have better data than a “closed system” where shopper data is collected once and rarely updated.

Brian Numainville
10 years ago

I too checked my data and found that it was about 80% right, more off on my personal interests than anywhere else. But that could be a big issue when it is used for marketing to me based on those interests! Still, smart move by Acxiom to get its data cleansed and updated for free!

Ralph Jacobson
10 years ago

At least two challenges appear in this discussion: the software capabilities, or lack thereof, and the sources of the data being captured. Retailers, CPG brands, etc., are still dealing with the issue of trusted data as they wrestle with big data analytics. If the source data cannot be trusted, the output must be discarded. Further, if the software used for the analysis is suspect, then we have bigger fish to fry with the entire effort.

Carol Spieckerman
10 years ago

AboutTheData.com has received quite a bit of attention in social media, most of it, not surprisingly, calling out inaccuracies. This certainly doesn’t help Acxiom’s credibility, and it may in fact taint the field for others who play in the space. How many people are going to tweet out how amazingly accurate the data pool is, even when a high level of accuracy is achieved? Anything less than perfect is tweet-worthy-in-a-bad-way.

Ben Ball
10 years ago

This must be a very popular discussion. As of noon EDT, you seem to have crashed the Acxiom server!

Cathy Hotka
10 years ago

Despite being the only person in the United States with my name, Acxiom “couldn’t verify” my identity.

It’s fashionable to talk about how big data will help improve the shopping experience, but it’s rare to see any evidence of it. I’d love to hear from my fellow panelists about ways in which data like this has improved their store experience.

James Tenser
10 years ago

Checking my own profile last week on AboutTheData.com, I found significant errors and omissions, which I did not correct. I won’t, either, until Acxiom and Equifax and Claritas and Harte-Hanks and Dun & Bradstreet begin to compensate me for providing that service.

So far, at least, these entities can’t punish us for their own mistakes in the same routinely callous manner as the credit bureaus. That is, unless you consider ridiculously off-target offers to be a form of punishment.

So the revelation here is that so-called Big Data built from individual consumer records is highly suspect at the granular level. At a somewhat higher, more aggregate level it is somewhat predictive, that is, provably better than nothing.

To Acxiom’s credit, it has with this release struck a mighty blow for the principle of transparency. All citizens should have a right to know what others know about them. That includes commercial entities and government, in my opinion.

Ron Larson
10 years ago

Acxiom had basic facts wrong about me. It can be worse than errors on your credit reports.

Big Data analyzers still need to check for data problems. Big data cleaning requires big soap.

Gordon Arnold
10 years ago

It is interesting how closely two of our discussions support one another. In this discussion we see a company collecting data on people to sell for marketing purposes. In another, we attempt to address customers who provide misinformation, or lies, as the article puts it. What concerns me is the attempt to place blame on either the salesperson or the client. There is no demonstration of management addressing the need for truth in information as it pertains to testing for soundness and validity. There are ways to filter out the erroneous records that contaminate the data and, in many cases, render it useless.

The discussion we have here clearly puts into question Acxiom’s ability to collect and provide useful information to its own customers. Lists of information are sold by record size, and most providers give the customer a percentage reliability factor along with the time since the last update and validity test.

Experienced sales and marketing individuals all have a go-to company selling lists that give them an edge in their prospecting needs. If nothing else this discussion gives us a glimpse at just how hard and time consuming prospecting is.

W. Frank Dell II, CMC
10 years ago

Since day one, the saying has been “garbage in, garbage out.” Even today, most retailers have problems just getting clean, accurate scan data. Both big data and many retailers have another problem: consumer information changes. Anyone checking their credit file will likely find errors. Consumers move, change jobs, get married, have children, see those children move out, and get older, but the data is collected only once.

The challenge is how to clean and update data without troubling the consumer. Big data efforts are about filling the database, in many cases from different data sources. These sources make limited cleaning and updating efforts, so the problem is compounded. Big data and retailer data need more frequent updating and reasonableness testing; otherwise they become just more of the garbage information found all over the internet.
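As a rough illustration of what a reasonableness test might look like, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the record fields, the 15-year minimum age gap between parent and child, and the sample ages are hypothetical and do not describe Acxiom's or any retailer's actual schema or rules. It simply flags records like the one in the article, where a 26-year-old was listed as the mother of two teenagers.

```python
from dataclasses import dataclass
from typing import List

# Assumed threshold: smallest plausible age gap between a parent and a child.
MIN_PARENT_AGE_GAP = 15


@dataclass
class HouseholdRecord:
    # Hypothetical fields; no real data broker schema is implied.
    name: str
    age: int
    marital_status: str
    children_ages: List[int]


def reasonableness_flags(record: HouseholdRecord) -> List[str]:
    """Return human-readable warnings for values that look implausible."""
    flags = []
    if not 0 <= record.age <= 120:
        flags.append("age out of range")
    for child_age in record.children_ages:
        if child_age < 0:
            flags.append("negative child age")
        elif record.age - child_age < MIN_PARENT_AGE_GAP:
            flags.append(
                f"child aged {child_age} is implausible for an adult aged {record.age}"
            )
    return flags


# Hypothetical record: the ages of the children are made up for the example.
record = HouseholdRecord("sample", 26, "married", [14, 16])
for flag in reasonableness_flags(record):
    print(flag)
```

A check like this cannot say which field is wrong, only that the combination is unlikely, so flagged records would still need review or a fresh data source.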

Liz Crawford
10 years ago

I agree that this website smells a bit fishy. If thousands of people clean the data…it’s a free glimpse into their lives. That is worth money. Not to mention the buzz and eyeballs. Hmmm…I would be very wary about logging onto a site just to reveal info about myself. But then, I’m in the business.

Joel Rubinson
10 years ago

The information about me was WRONG in many ways. In particular, it attributed some ludicrous interests to me (I’m a golf enthusiast but I’ve never played golf!).
