The difference between Slowly Changing Dimensions and Change Data Capture
The differences between slowly changing dimensions (SDC) and change data capture (CDC) are subtle. Learn the technical differences here.
Difference Between Slowly Changing Dimensions and Change Data Capture

While some might observe that the difference between Slowly Changing Dimensions (SCD) And Change Data Capture (CDC) is subtle, there is, in fact, a technical difference between the two processes.

Both processes detect changes in a source database and deliver the changed data to a target database. The difference between the two is almost entirely about what happens in the target database to the data.

What are Slowly Changing Dimensions (SCD)?

There are actually six types of SCD with the most common being Type 1, Type 2 and Type 3. SCD types 4, 5, and 6 are inefficient and overly complicated for maintaining a history of all changes or overwriting old data, which are the two essential purposes of Slowly Changing Dimensions.

In Type 1, any new data that is ingested overwrites existing data. In Type 2, new data are inserted as new records and the data that would have been overwritten are flagged as inactive or closed with effective time and expiration time assigned to the change to maintain a history. In Type 3, one column is designated for storing previous data (i.e. the data that would’ve been overwritten in Type 1).

In short, Type 1 stores no historical data, Type 2 stores all historical data, and Type 3 stores limited historical data.

Type 1
Data is overwritten.
Type 2
History is maintained, and new data is inserted as new rows.
Type 3
Stores two versions per record: a previous and a current value.
Support for SCD is key for a modern data integration tool to be considered truly modern. StreamSets supports both type 1 and type 2 Slowly Changing Dimensions. Check out a few SCD patterns to see examples of how they can help you manage customer records.

What is change data capture?

CDC is a method of detecting and extracting new or updated records in a source and loading just this new information into your destination. Very often, the alternative to CDC is a full load from one table to another, resulting in a very costly and time-consuming operation. By sipping into your target database just the delta or changed data, you get a much more streamlined process.

There are actually three different ways of performing CDC: log-based, query-based, and trigger based. Differences that are explored elsewhere in our blogs in detail. Essentially, however, log-based CDC updates a log for every INSERT, UPDATE or DELETE and reads that information when it is time to insert into the target database, while trigger CDC kicks off a trigger for every operation with the same result. Log-based CDC is considered to be more efficient than a trigger CDC method. Query-based CDC involves using queries to find differences between datasets and can be untenable with larger datasets as it can require much more resources to perform this comparison.

Updates a log after every operation
Kicks off a trigger after every operation
Compares two versions of the data with queries

CDC looks the most like Type 1 Slowly Changing Dimensions, overwriting new data as it appears. It is most useful when you’re not worried about maintaining a history of all the changes to your database. Like most other modern data integration systems, StreamSets supports log-based CDC.

Choosing Slowly Changing Dimensions vs. Change Data Capture

The choice between SCD and CDC is generally decided based on what the tool you’re using supports. But that’s not the way it should be.

Choosing SCD over CDC or vice versa should be based on business processes, not technical limitations. SCD is ideal for organizations that must maintain a record of all changes to the data flowing through their systems. And CDC is ideal if your business process requires only that the changed data arrive at your target.

Take control of your data management methods

Regardless of the method chosen, StreamSets can help you get your data from your target to your destination in an efficient way. With support for both SCD and CDC the question of which to choose becomes one of process instead of technical limitation. Does your organization need to maintain a record of all changes to the data flowing through your systems? If that’s the case, some version of SCD might be the better choice. If your business process requires only that the changed data arrive in your target post haste, CDC might work better for you and your business. Let us help you navigate your constantly changing data flows with ease.

Accelerate decision-making with analytics-ready data

Related Articles

5 Examples of Data Fabric Architecture in Action
App & Data Integration
5 examples of data fabric architecture in action
Data can span multiple locations, and the central management layer can be a data fabric. See examples of data fabric architecture in action.
Read Blog
Data Integration Architecture
App & Data Integration
Data integration architecture
A data integration architecture aims to solve the heterogeneity feature from various data sources, locations, and interfaces. See how it helps!
Read Blog
A Deep Dive Into Data Pipeline Architecture
App & Data Integration
A deep dive into data pipeline architecture
Data pipeline architecture refers to the design of systems and schema that help collect, transform, and make data available. Take a deep dive here.
Read Blog

Find out what Software AG’s solutions can do for your business

Thanks for Subscribing 🎉