Editorial: A fight to verify what’s true amid explosion of Big Data — and how a UMass prof aims to help

Last modified: Tuesday, December 29, 2015

How big is Big Data? Companies that track it keep having to add zeros to the total out there, buckets of zeros. One expert predicts that annual production of digitized data will increase 4,300 percent by 2020.

Already, there are far more bits of information than people can handle. The executive director of University College London's Big Data Institute calculates that only 0.05 percent of existing data is analyzed.

What used to be called the “information explosion” has gone supernova.

But those digital bits that do get a look can have a profound impact on people’s lives — raising concerns that lie at the heart of an interesting project led by Alexandra Meliou, an assistant professor of computer science at the University of Massachusetts Amherst.

As Meliou notes, digital data used to be better controlled — and guarded. Today, the proliferation of data sources available through the Internet raises questions about the quality, and reliability, of that information. Using a $550,000 grant from the National Science Foundation, Meliou is working with tech industry leaders like Google to better understand how “bad data” comes to be, and how those who mine data can spot trouble linked to mistakes by people or machines.

People get hurt when data errors go undetected. They are prescribed the wrong medicines, Meliou notes, or get turned down for loans or health care for reasons as simple as sloppy data entry.

Things go badly wrong when that bad data sits in government computers. When Governing magazine this year polled 75 analysts in 46 states, it found that 69 percent of them often encounter problems with data “integrity.”

In addition to creating problems for citizens, government data mistakes are costly. A 2014 audit in California identified 200,000 hours of questionable sick leave and vacation time worth $6 million. Meliou cites estimates that bad data costs the U.S. $600 billion a year.

Her five-year project will zero in on how data is accumulated and shared, to gain insight into how such information is weakened by poor curation or by being taken out of context.

For its part, Google runs the risk of misinforming its global customers if its algorithms sweep in bad data.

A Google research scientist, in an email interview with the Gazette, said the goal of Meliou’s project is to better understand what makes good data good — and how errors come about and can be avoided. “The web allows people to freely share data, but meanwhile makes it challenging to separate the wheat from the chaff,” wrote Xin Luna Dong of Google.

People take as gospel the data summaries that appear at the tops of their Google searches. But what if fiction, or outright fraud, lurks there? Knowing the difference is essential to what Google calls its Knowledge Graph.

Meliou admits that given the avalanche of data, it’s a challenge to figure out how to detect errors and avoid misleading conclusions that are the fruit of bad data.

The best Meliou can do, she reasons, is arm software developers with tools that improve their odds of screening out misinformation.
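What such screening tools might look like in practice can be sketched with a simple rule-based check. This is only an illustration of the general idea, not Meliou's actual software; the field names and rules below are invented for the example.

```python
# A minimal sketch of rule-based error screening for tabular records.
# The rules and field names here are illustrative assumptions, not any
# real tool from the project described above.

def screen_records(records, rules):
    """Split records into (clean, flagged); flagged entries carry the
    names of the rules they failed."""
    clean, flagged = [], []
    for rec in records:
        failures = [name for name, check in rules.items() if not check(rec)]
        if failures:
            flagged.append((rec, failures))
        else:
            clean.append(rec)
    return clean, flagged

# Simple sanity checks of the kind that catch sloppy data entry.
rules = {
    "age_plausible": lambda r: 0 <= r.get("age", -1) <= 120,
    "income_nonnegative": lambda r: r.get("income", -1) >= 0,
}

records = [
    {"age": 34, "income": 52000},
    {"age": 340, "income": 52000},  # likely a typo for 34
    {"age": 29, "income": -1000},   # negative income: an entry error
]

clean, flagged = screen_records(records, rules)
```

Checks this crude obviously cannot verify truth, but they improve the odds of catching the kind of keystroke errors the editorial describes before they reach a loan decision or a prescription.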

If the mistakes they catch include bad data that might guide a self-driving car, it could be life-saving.


Daily Hampshire Gazette Office

115 Conz Street
Northampton, MA 01061


Copyright © 2019 by H.S. Gere & Sons, Inc.