We've all heard the saying - “garbage in, garbage out.” Put bad data into a system, and it will spit out bad results. This age-old adage has become widely accepted in the technology world; for big data, however, it simply isn't true.
Most data is not inherently inaccurate, vague, or worthless - sensor-based data is generally valid. The problem lies in the sheer volume of data, with many industrial sites generating petabytes a day. Couple that with the fact that the data often arrives in incompatible formats, measuring different trends, and you've got what looks like “garbage.” The result is less informed decision making: McKinsey & Co. estimates that only 1% of the data from the roughly 30,000 sensors on offshore oil rigs gets used for decision making, largely because of the difficulty of accessing it.
One solution to this data problem would be to collect less of it; however, you then run the risk of missing an important detail. For instance, in 2015, researchers at Lawrence Livermore National Laboratory (LLNL) were experiencing rapid and unexpected variations in electrical load for Sequoia, one of the world's most powerful supercomputers. These swings were causing substantial management problems for local utilities, as the load would drop from 9 megawatts to a few hundred kilowatts for seemingly no reason. It wasn't until LLNL cross-checked various data streams that they discovered the swings coincided with scheduled maintenance at a massive chilling plant. Had they chosen to simply collect less data, they might never have realized that the issue was actually caused by their coworkers in the facilities department.
Luckily, the future of data is promising. With new technologies emerging to help companies get a better handle on their massive amounts of data, employees can spend less time weeding through “garbage” and more time making critical operations decisions.
Read more about how automation in software development and IT management is going to change the face of big data.