What are data federation, federated data models & federated queries?
In theory, data federation helps you eliminate the hassle of managing different types of siloed data in different formats stored in different places. But as is often the case in data management, theory only takes you so far.
What is data federation?
Data federation is a technology that virtually unifies data from different sources and makes it accessible under a uniform data model. The underlying data stores in a federated data store continue to operate autonomously.
But data consumers can run federated queries as though the data were combined. This can be a significant advantage for organizations with many heterogeneous database types, since users don’t need to know the query language of each database to run a query.
The difference between data federation and…
Data integration
Data integration combines various data types and formats from an organization’s data sources into a data lake or data warehouse to provide a unified fact base for analytics.
In the sense that both data integration and data federation unify data, they are similar ideas. But there’s a key difference: data unification is temporary with data federation. Since unification is never permanent, data federation is more useful for prototyping and other temporary, ad-hoc needs.
Data integration, by contrast, usually (but not always) involves migrating data from one place to another. The permanence of data integration enables engineers to share, reuse, and collaborate better on their solutions. It also enables important techniques like change data capture and disaster recovery.
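To make the difference in permanence concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: two in-memory SQLite connections stand in for a source system and a warehouse, and the table names `orders` and `orders_copy` and the helper functions are hypothetical, not part of any particular product.

```python
import sqlite3

# Assumption: in-memory SQLite connections stand in for real systems.
source = sqlite3.connect(":memory:")      # an operational database
warehouse = sqlite3.connect(":memory:")   # the integrated destination

source.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
source.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 20.0), (2, 35.0)])
warehouse.execute("CREATE TABLE orders_copy (id INTEGER PRIMARY KEY, total REAL)")


def integrate():
    """Data integration: copy rows into the warehouse, where they persist."""
    rows = source.execute("SELECT id, total FROM orders").fetchall()
    warehouse.executemany("INSERT OR REPLACE INTO orders_copy VALUES (?, ?)", rows)
    warehouse.commit()


def federated_total():
    """Data federation: read the source at query time and store nothing."""
    return source.execute("SELECT SUM(total) FROM orders").fetchone()[0]


def warehouse_total():
    """Read the persisted copy, independent of the source system."""
    return warehouse.execute("SELECT SUM(total) FROM orders_copy").fetchone()[0]


integrate()
source.execute("INSERT INTO orders VALUES (3, 10.0)")  # a new order arrives later

live = federated_total()     # computed from the source as it is right now
stored = warehouse_total()   # computed from the copy made by integrate()
print(live, stored)
```

The federated result exists only for the duration of the query, while the warehouse copy is a durable artifact that others can reuse and that has to be kept in sync with its source, which is the job that techniques like change data capture perform.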
Data virtualization
Because the concepts are so similar, we’ve written a more extensive guide on the difference between data virtualization and data federation.
In a nutshell, data virtualization is a broad set of capabilities that includes data federation. This means that all federated data is also virtualized data. But not all virtualized data is federated, because data virtualization includes capabilities beyond the scope of data federation.
For instance, data virtualization tech can transform data without requiring advanced technical knowledge. But virtualizing data in this way is not the same as federating it. In practice, however, the difference between data virtualization and federation is more academic than practical.
How data federation tools work
Data federation tools work by providing a unified access point through which you can query data. In short, they create a federated database. An example of data federation can help us understand the technical details of how these tools work.
Imagine that you run a pet store that tracks all in-store transactions in an Azure database. Later, you launch a website and decide to store online transactions in an Oracle database. Now you have two different databases that you’ll need to query to analyze performance. A simple query might ask how much dog food you sold per week, online and in-store, last month.
Data federation makes the Oracle and Azure databases accessible under a common, federated data model so you can accomplish your goal with a single query. Even though the databases may have slight differences in schema, you can analyze the data as though their schemas were the same.
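The pet store example can be sketched in a few lines of Python. This is a simplified illustration, not how any specific federation product is implemented: in-memory SQLite connections stand in for the Azure and Oracle databases, and the table names, column names, sample rows, and the `weekly_sales` helper are all assumptions made up for the sketch.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the Azure (in-store) database.
in_store = sqlite3.connect(":memory:")
in_store.execute("CREATE TABLE pos_sales (sold_on TEXT, product TEXT, amount REAL)")
in_store.executemany(
    "INSERT INTO pos_sales VALUES (?, ?, ?)",
    [
        ("2024-05-06", "dog food", 40.0),
        ("2024-05-07", "dog food", 25.0),
        ("2024-05-14", "dog food", 30.0),
        ("2024-05-14", "cat toys", 12.0),
    ],
)

# Assumption: a second connection stands in for the Oracle (online) database.
# Its schema differs slightly: other column names, and totals stored in cents.
online = sqlite3.connect(":memory:")
online.execute("CREATE TABLE web_orders (order_date TEXT, item TEXT, total_cents INTEGER)")
online.executemany(
    "INSERT INTO web_orders VALUES (?, ?, ?)",
    [
        ("2024-05-08", "dog food", 5000),
        ("2024-05-15", "dog food", 2000),
        ("2024-05-16", "fish flakes", 700),
    ],
)

# Each source maps its own schema onto the shared model: (week, revenue).
SOURCES = {
    "in-store": (
        in_store,
        "SELECT strftime('%W', sold_on) AS week, SUM(amount) AS revenue "
        "FROM pos_sales WHERE product = ? AND sold_on BETWEEN ? AND ? "
        "GROUP BY week",
    ),
    "online": (
        online,
        "SELECT strftime('%W', order_date) AS week, "
        "SUM(total_cents) / 100.0 AS revenue "
        "FROM web_orders WHERE item = ? AND order_date BETWEEN ? AND ? "
        "GROUP BY week",
    ),
}


def weekly_sales(product, start, end):
    """Push the query to every source and merge the rows into one result."""
    rows = []
    for channel, (conn, sql) in SOURCES.items():
        for week, revenue in conn.execute(sql, (product, start, end)):
            rows.append((week, channel, revenue))
    return sorted(rows)


result = weekly_sales("dog food", "2024-05-01", "2024-05-31")
for week, channel, revenue in result:
    print(week, channel, revenue)
```

The caller asks one question in terms of the shared model and never sees that the two stores use different column names and units. Both databases keep operating on their own, and nothing is copied out of them beyond the rows returned for this query.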
Below, you can see a simple visual of an example federated data query pushed to each database.
The drawbacks and benefits of data federation
Data federation is an excellent tool for short-term use cases that don’t need to scale. It enables business users to forgo more resource-intensive, technical data integration for small-scope, one-off requests.
But data federation is not meant to serve as the foundation of an enterprise-level data strategy.
You can liken data federation technology to an egg slicer. It’s perfect for slicing eggs, but for anything else, you’ll need a more versatile tool, like a set of kitchen knives.
StreamSets and data federation
StreamSets is not a data federation tool. Yet it solves many of the same problems that data federation software does, without the drawbacks. Like data federation tools, StreamSets can help users create temporary tables from disparate data sources.
But unlike data federation, StreamSets helps you ingest, integrate, and manage structured, semi-structured, and unstructured data from relational and non-relational databases. Users who rely on federated queries to avoid the technical requirements of data integration tools can use StreamSets’s no-code interface to build robust data pipelines.
That way, without being required to write code, business users can leverage the reusability, recoverability, shareability, and performance that integrated data provides and federated data can’t.