The volume of data that organizations generate and use today is both awesome and daunting to manage. What is the best way to manage this mountain of dispersed and disparate data? A possible answer lies in the concept of ‘data fabric’ as a means to unify data. In Gartner’s definition, this is an integrated layer of data and connecting processes that “utilizes continuous analytics over existing, discoverable and inferenced metadata assets to support the design, deployment and utilization of integrated and reusable data across all environments.”
So what does that mean in everyday speech?
Think of the data fabric as a bedsheet spanning your data sources. Every source is connected, or woven, into the bedsheet through some form (metadata, API services, security information, etc.). As a result, each source can be linked to other storage and computing functions. Data silos come down by feeding data into one large connector.
Seems straightforward. In a hybrid environment, the concept sounds appealing, too. The details, of course, are where it gets tricky to implement.
How Does It Work?
To begin, we need to understand that data fabric architectures do not happen overnight. Putting one in place is a journey that starts with knowing your existing data (or ‘data mess’). Next, you need a plan to automate, harmonize and manage all sources with common connectors and components, removing the need for custom coding.
Remember when mobile phones had proprietary connectors? Nowadays, almost all of them use USB-C. Loosely, removing custom coding is a similar concept. Proprietary connectors, components and code have their uses, but if the purpose is to connect your data sources, commonality is your friend. You need a data framework that enables a single, consistent approach to data management, allowing seamless access and processing. The framework is built on the following principles:
- Knowledge, insights and semantics. Knowing what data is out there, with high visibility, and how to access it.
- Unified governance and compliance. A common playbook and set of rules so all users can play on the same field.
- Intelligent integration. Defined and managed on-ramps, lanes and off-ramps. Could you safely drive on the interstate if jumping on and off was not managed? Data fabric can lead to improved workload management.
- Orchestration and life cycle. Take advantage of new tools, such as machine learning, to limit the number of accidents on that interstate. The unified data source view means the system can limit pile-ups.
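To make the first and third principles a little more concrete, here is a minimal sketch in Python of a metadata catalog with one common connector interface. Everything in it is an assumption made for illustration: the `SourceEntry` and `Catalog` names, the tag vocabulary and the in-memory sample rows are hypothetical and do not come from any particular data fabric product.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class SourceEntry:
    """Catalog record for one data source (all field names are illustrative)."""
    name: str
    owner: str
    tags: List[str]
    # The "common connector": any callable that yields rows as dicts.
    reader: Callable[[], Iterable[dict]]


@dataclass
class Catalog:
    """A toy metadata catalog: the single place where sources are registered."""
    entries: Dict[str, SourceEntry] = field(default_factory=dict)

    def register(self, entry: SourceEntry) -> None:
        self.entries[entry.name] = entry

    def find_by_tag(self, tag: str) -> List[str]:
        # Discovery works off metadata, not off knowing where the data lives.
        return sorted(n for n, e in self.entries.items() if tag in e.tags)

    def read(self, name: str) -> List[dict]:
        # Every source is read the same way, whatever sits behind the reader.
        return list(self.entries[name].reader())


catalog = Catalog()
catalog.register(SourceEntry("crm", "sales", ["customer", "pii"],
                             lambda: [{"id": 1, "email": "a@example.com"}]))
catalog.register(SourceEntry("billing", "finance", ["customer", "invoice"],
                             lambda: [{"id": 1, "amount": 120.0}]))

customer_sources = catalog.find_by_tag("customer")
```

The point of the sketch is the shape, not the scale: sources are found through their metadata and read through one interface, so adding a new source means registering it instead of writing new custom integration code.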
Data Fabric as a Means of Protection
If a skeptical CISO is reading and wondering “won’t this just expose my data from a single source?” they would be right to ask. Supply chain attacks have shown the problems single sources can cause. Would the same concern not apply here?
Quite possibly, but the answer lies in build, configuration and maintenance. Correctly used, data fabric can make your business more efficient and add data protection. The key is ensuring the right defensive and privacy guardrails are built in, including but not limited to data masking and encryption.
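As a rough illustration of what a masking guardrail can look like, the Python sketch below replaces sensitive fields with a salted hash before a record leaves its source. The field list, the `masked:` prefix and the salt value are hypothetical choices made for this example. A salted hash is only one simple form of masking, and it is not a substitute for encryption or proper key management.

```python
import hashlib
from typing import Dict, Set

# Hypothetical policy: which fields count as sensitive.
SENSITIVE_FIELDS: Set[str] = {"email", "ssn"}


def mask_value(value: str, salt: str) -> str:
    """Replace a sensitive value with a short, salted SHA-256 pseudonym."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return "masked:" + digest[:12]


def mask_record(record: Dict[str, object], salt: str) -> Dict[str, object]:
    """Return a copy of the record with sensitive fields masked."""
    return {
        key: mask_value(str(val), salt) if key in SENSITIVE_FIELDS else val
        for key, val in record.items()
    }


row = {"id": 7, "email": "pat@example.com", "region": "EU"}
masked = mask_record(row, salt="demo-salt")
```

Because the same input and salt always produce the same pseudonym, masked records can still be joined across sources without the raw value ever travelling over the fabric.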
But like all centralized systems, there are some drawbacks.
Where Data Fabric Can Backfire
Centralization will always come with its own problems. If you mismanage data fabric architecture, you could face cascading failure. Architectures and security measures that rely on obfuscation, lack of coordination and disparity (intentional or not) may not be efficient, but they do offer a level of resilience. Think of it as a type of unintended segmentation and backup measure.
For example, data fabric could limit, or remove, historical records of data transactions. Depending on the business type, using data fabric architecture could be a very risky decision. If your business relies on processing transactions, not having historical record backups could put you in a bad position if destructive malware or ransomware hits, severely limiting your disaster recovery.
Is Data Mesh Right for You?
As mentioned before, there is a huge benefit in having common connectors. However, these come with a price. Building and managing the data pipelines that permit common connections adds complexity to the system. With that comes fragility. It also increases the likelihood of latency.
By contrast, let us look at a related, but different concept: data mesh. Whereas the data fabric relies heavily on artificial intelligence and automation, driven by rich metadata, data mesh relies more on the structure and culture of the organization to bring together data product uses.
Let’s say you’re a CISO or a CIO, or perhaps even a risk or technology officer, who wants to implement data mesh. You would push for a change program that defines data needs upfront, where your data product owners shift to align data with those needs. Data fabric is centralized and requires control to operate, whereas data mesh is federated and requires alignment to operate.
Building Data Fabric Into Your Environment
So, what do you do once you’ve chosen the data fabric approach? Begin with small steps, starting with your DevOps team. Rolling out data fabric requires a good deal of planning, meaning software and IT teams working together is crucial. It is also smart to include your security and business teams. Keep in mind, if the entire enterprise will rely on this ‘bedsheet’ to connect their data, you’ll need input from all of the stakeholders.
Also, migrating to a data fabric implementation is a great time to adopt some security-by-design thinking, which can do wonders for your business and technical resilience, and to think longer-term about data destruction. How well you catalog and tag your data is a key marker of how successful your project will be, so do not shy away from investing serious effort into your metadata requirements. In the end, your AI/ML work will be relying on it.
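One simple way to keep an eye on that metadata effort is to measure how many data assets carry every tag you have decided is mandatory. The Python sketch below does that. The required fields (`owner`, `classification`, `retention_days`) and the sample assets are assumptions for illustration only, and your own list will differ.

```python
from typing import Dict, List

# Hypothetical minimum metadata every asset should carry.
REQUIRED_FIELDS = ("owner", "classification", "retention_days")


def missing_metadata(asset: Dict[str, object]) -> List[str]:
    """List the required metadata fields that are absent or empty for one asset."""
    return [f for f in REQUIRED_FIELDS if asset.get(f) in (None, "")]


def coverage(assets: List[Dict[str, object]]) -> float:
    """Fraction of assets that carry every required metadata field."""
    if not assets:
        return 0.0
    complete = sum(1 for a in assets if not missing_metadata(a))
    return complete / len(assets)


assets = [
    {"name": "orders", "owner": "sales",
     "classification": "internal", "retention_days": 365},
    {"name": "web_logs", "owner": "",
     "classification": "public", "retention_days": 30},
]
gaps = {a["name"]: missing_metadata(a) for a in assets}
score = coverage(assets)
```

Here `gaps` shows which assets still need attention and `score` gives a single number to track over the course of the rollout. Including a retention field is one way to make the data destruction question part of the catalog from day one.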
Gartner calls out data fabric and data mesh as strategic tech trends to keep an eye out for in 2022. Before you decide which may be right for you and which can improve your defensive posture, remember that your risk tolerance and business operation needs will drive which architecture solution is best for you.