One of the fundamental requirements for big data users is big data control. Failure to properly store, audit and maintain the data chain of custody undermines our individual and collective privacy. That failure may also be at odds with federal law and policy.
Without data control, there is no data compliance. Fortunately, some big data analytics models are inherently far more respectful of our privacy than others.
Big Data, Big Privacy Challenges
The fundamental challenge of big data is that predictive analytics tools are most effective when they capture and integrate as many types of data as possible, such as voice, video, geolocation, biometric, and structured and unstructured text. The corresponding challenge for big privacy is that this commingling and integration of data increases the likelihood that individuals’ personally identifiable information (PII) will be exposed and shared with unauthorized parties.
Threats to privacy increase exponentially when governments and commercial users lose chain-of-custody control of their data or become reliant on closed, proprietary systems that hold data hostage in vendor networks. That’s why it’s important to choose big data analytics architectural models that are open, do not require customers to surrender their data to vendors, allow governments and commercial clients to decide who can see aggregated data and predictive findings, and calibrate the level of PII anonymization to their needs.
Spotting Flimsy Frameworks
Models that are most respectful of data control requirements do not require governments and commercial sector companies to turn over control of their data to vendors. Under less successful data control models, third-party data scientists use proprietary algorithms to conduct their own analyses of data before returning it to the original owner for positive control.
Some closed, proprietary data analytics models also charge by volume of data analyzed. In a world where the amount of available data to aggregate, correlate and predict is increasing exponentially, charging by volume is great for vendor profits, but not so good for clients captured in this expense model.
Four Pillars of Effective Data Control
Good data control frameworks begin with several core precepts. Precepts that strive for maximum compliance with the intent and spirit of our privacy and civil rights protections are more sustainable in the long term.
Effective frameworks are built on the following non-negotiables:
1. Open Architecture
No one vendor has all the answers, and the very best capabilities reside across the entire big data analytics enterprise. Innovation is too fluid and too fast to lock into one company’s closed intellectual property. Opening base architectures to new ideas, capabilities and innovation is vital to building the vibrant tools that can exist within a strong data control framework.
2. Total Ownership of Big Data
No model that requires turning over appropriately collected data to a third-party vendor can be strong on data control. Even if the vendor model is sound on data control, it is impossible for independent auditors to assess those controls if they cannot fully see the data chain-of-custody process and evaluate the veracity of the vendor’s secret algorithm.
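To make the audit point concrete, here is a rough sketch (in Python, with hypothetical event fields) of one way a data chain of custody can be made independently verifiable: each custody record commits to the hash of the record before it, so an auditor can replay the log and detect any alteration or deletion. This illustrates the principle only; it is not a description of any particular vendor’s controls.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import List


@dataclass
class CustodyEvent:
    """One entry in a tamper-evident chain-of-custody log (hypothetical fields)."""
    actor: str        # who touched the data (e.g., "analyst-42")
    action: str       # what was done (e.g., "ingest", "anonymize", "export")
    dataset_id: str   # which dataset was affected
    prev_hash: str    # hash of the previous entry, linking the chain
    entry_hash: str = ""

    def compute_hash(self) -> str:
        payload = json.dumps(
            {"actor": self.actor, "action": self.action,
             "dataset_id": self.dataset_id, "prev_hash": self.prev_hash},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode()).hexdigest()


class CustodyLog:
    """Append-only log in which each entry commits to the one before it."""

    def __init__(self) -> None:
        self.entries: List[CustodyEvent] = []

    def append(self, actor: str, action: str, dataset_id: str) -> CustodyEvent:
        prev = self.entries[-1].entry_hash if self.entries else "GENESIS"
        event = CustodyEvent(actor, action, dataset_id, prev_hash=prev)
        event.entry_hash = event.compute_hash()
        self.entries.append(event)
        return event

    def verify(self) -> bool:
        """An independent auditor replays the chain and detects any tampering."""
        prev = "GENESIS"
        for event in self.entries:
            if event.prev_hash != prev or event.entry_hash != event.compute_hash():
                return False
            prev = event.entry_hash
        return True
```

The point of the sketch is that the audit does not depend on trusting a vendor’s secret algorithm: anyone holding the log can recompute the hashes and confirm the custody record is intact.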
3. Customizable Anonymization and Minimization
Different users have different requirements for the protection of PII. Those responsible for detecting insider threats and correlations for national security purposes have a special responsibility to protect data because they are required to access the most private data about persons of concern. People like me who hold security clearances waive certain privacy protections, so the anonymization and minimization requirements for our data differ from those the traveling public expects.
Data control systems must be customizable. With closed, proprietary frameworks, one size certainly does not fit all.
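As a rough illustration of what customizable anonymization and minimization can look like, the sketch below (in Python) lets each customer choose, field by field, whether PII is redacted, pseudonymized or retained, and drops anything the policy does not cover. The field names, policy names and treatment levels are assumptions made for the example, not a standard or any product’s interface.

```python
import hashlib
from typing import Callable, Dict


def redact(value: str) -> str:
    return "[REDACTED]"


def pseudonymize(value: str) -> str:
    # Stable but non-reversible token standing in for the original value.
    return hashlib.sha256(value.encode()).hexdigest()[:12]


def retain(value: str) -> str:
    return value


# Hypothetical per-customer policies: a commercial customer and a cleared
# insider-threat team choose different treatments for the same PII fields.
POLICIES: Dict[str, Dict[str, Callable[[str], str]]] = {
    "commercial": {
        "name": redact,
        "passport_no": redact,
        "geolocation": pseudonymize,
    },
    "cleared_insider_threat": {
        "name": pseudonymize,
        "passport_no": pseudonymize,
        "geolocation": retain,
    },
}


def minimize(record: Dict[str, str], policy_name: str) -> Dict[str, str]:
    """Apply the customer-chosen treatment per field; drop unlisted fields (minimization)."""
    policy = POLICIES[policy_name]
    return {name: policy[name](value) for name, value in record.items() if name in policy}


record = {
    "name": "J. Doe",
    "passport_no": "X1234567",
    "geolocation": "38.89,-77.03",
    "shoe_size": "10",  # not covered by either policy, so it is dropped
}
print(minimize(record, "commercial"))
print(minimize(record, "cleared_insider_threat"))
```

The design choice worth noting is that the policy, not the analytics engine, decides how much PII survives, which is what makes the calibration customer-controlled rather than vendor-controlled.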
4. Sharing Determinations Made by the Data Owner
Strong data control models let the owner determine who can access the data and with whom it should be shared, both in its raw and correlated forms. Models that mandate sharing all data with third-party providers are, by definition, weak data control frameworks.
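The sketch below shows the owner-decides principle in miniature: the data owner maintains an explicit, default-deny grant table covering both raw and correlated data, and nothing is shared without an affirmative decision. The party names and data tiers are hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class SharingPolicy:
    """Owner-maintained access decisions over the owner's own data."""
    owner: str
    # Which parties may see which tier of data ("raw" or "correlated").
    grants: Dict[str, Set[str]] = field(default_factory=dict)

    def grant(self, party: str, tier: str) -> None:
        self.grants.setdefault(party, set()).add(tier)

    def revoke(self, party: str, tier: str) -> None:
        self.grants.get(party, set()).discard(tier)

    def may_access(self, party: str, tier: str) -> bool:
        # Default deny: nothing is shared unless the owner has said so explicitly.
        return tier in self.grants.get(party, set())


# The agency, not the analytics vendor, decides who sees what.
policy = SharingPolicy(owner="transportation-agency")
policy.grant("analytics-vendor", "correlated")   # findings only, never raw PII
policy.grant("oversight-auditor", "raw")

assert policy.may_access("analytics-vendor", "correlated")
assert not policy.may_access("analytics-vendor", "raw")
```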
Fantasy World or Bright Security Future?
In a strong data control world, vendors provide exquisite data analytics tools that are auditable, customizable, owned in totality by the customer and agile enough to incorporate innovation from across the technology spectrum. Weak data control models that drive customers to transfer control of their data to proprietary, third-party vendors will struggle, since data owners must always have positive control of their sensitive information.
The strong data control world is not a fantasy. It does, in fact, exist. Adherence to this model is a win-win for government agencies and consumers seeking to leverage strong privacy protections and premier data analytics without ceding control of their data.
Senior Executive for National Security, IBM