What is data governance? Principles, concepts, frameworks and best practices

Data governance includes the policies and procedures that dictate how data is created, processed and distributed. And with the massive amount of data being generated by new technologies like IoT devices, AI and AR/VR, data governance is no small job.

So far, research indicates that organizations have plenty of room for improvement. In an IDC report covering North America, Europe and China, respondents estimated that:

  • Their companies collected only 56% of the data available through their operations.
  • 43% of the data their company collected went unleveraged.

But let’s return to the challenges of data governance after first reviewing the fundamentals.

What is data governance?

Data governance is the set of overarching policies, processes, standards and metrics used to ensure data is created, processed, distributed and used efficiently and effectively according to an organization’s goals.

Gartner puts it another way: Data governance is the specification of decision rights and an accountability framework to ensure the appropriate behavior in the valuation, creation, consumption and control of data and analytics.

Why data governance matters

You’ve probably heard that “data is the new oil.” But on further inspection, this statement is a bit misleading.

Data is valuable. But until it’s transported, processed and distributed, data is ultimately just raw material. Most of the data’s value emerges when it’s processed and distributed to the right people, at the right time, in the right form, in a transparent and legal process.

“Unrefined” data can be incredibly costly because it can cause poor decision-making, legal problems and unnecessary work.

Data governance helps organizations extract more value from their data at a lower cost. For example, one common and fundamental goal of data governance is to establish uniformity among different datasets. By establishing uniformity, businesses can 1) avoid making decisions based on unreliable data and 2) cut down the time it takes to make good, data-driven decisions.
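To make "uniformity among different datasets" concrete, here is a minimal sketch in Python. It assumes two hypothetical source systems (a CRM and a billing system) that describe the same customer with different field names and date formats. Every field name and sample value below is invented for illustration and does not come from any particular product.

```python
from datetime import date, datetime

# Hypothetical rows: the same customer as recorded by two different systems.
crm_rows = [{"CustomerID": "42", "Email": " Ada@Example.com ", "SignupDate": "2023-01-15"}]
billing_rows = [{"cust_id": 42, "email_address": "ada@example.com", "signup": "15/01/2023"}]


def normalize_crm(row: dict) -> dict:
    """Map a CRM row onto the shared schema (names, types, and formats)."""
    return {
        "customer_id": int(row["CustomerID"]),
        "email": row["Email"].strip().lower(),
        "signup_date": date.fromisoformat(row["SignupDate"]),
    }


def normalize_billing(row: dict) -> dict:
    """Map a billing row onto the same shared schema."""
    return {
        "customer_id": int(row["cust_id"]),
        "email": row["email_address"].strip().lower(),
        "signup_date": datetime.strptime(row["signup"], "%d/%m/%Y").date(),
    }


# After normalization, both sources can be compared and combined directly.
unified = [normalize_crm(r) for r in crm_rows] + [normalize_billing(r) for r in billing_rows]
```

Once both sources share one schema, the two records here become identical, so a duplicate check or a join no longer depends on knowing each system's quirks.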

In short, data governance matters because it helps organizations create more value from their data.


How data governance works

CIO Senior Writer Thor Olavsrud explains that data governance is “a function that supports an organization’s overarching data management strategy.”

He cites the Data Management Association’s (DAMA) wheel model of data management, with data governance as the hub. The spokes emanating from that hub are:

  1. Data architecture
  2. Data modeling and design
  3. Data storage and operations
  4. Data security
  5. Data integration and interoperability
  6. Documents and content
  7. Reference and master data
  8. Data warehousing and business intelligence
  9. Metadata
  10. Data quality

Though many look at these “spokes” as individual initiatives, they are likely to fail if developed in silos. Data governance works best when approached as a whole and tied closely to your data integration strategy.

The challenges with data governance

Data governance is as much a human problem as it is a technical problem.

One of the primary drivers of the need for data governance is the proliferation of so many different individuals, teams and departments using so many different tools in so many different ways. This compounds many of the technical data engineering problems.

Many different types of data are stored in different places. That data has to be integrated so that businesses can use it for their current needs. But those “current needs” rarely stay current; data governance must leave room for new data use cases as they are discovered.

Overcoming challenges with data governance best practices

It’s up to data governance teams to standardize the way tools are used to create, process, store, analyze and share data. This is a big task, so here are a few best practices to make your efforts more manageable and effective:

  1. Choose tools that leverage existing metadata. Many common data integration and data wrangling tools use an open metadata sharing model. This lets you draw on metadata you already have to see the bigger picture of your data governance. With this open metadata, the bidirectional flow between data producers and consumers can be fully represented in your data governance strategy.
  2. Start small and focused. As we discussed earlier, data governance is as much a human problem as a technical challenge, so you can’t avoid addressing the human side of the equation. Change management expert Mary Gregory cites the adage “You wouldn’t eat an elephant in one bite” and explains that “Change needs to be broken down into smaller steps.” Speaking further on data quality initiatives, Gregory says it’s critical to “Create and champion a vision that is really compelling. Ensure this is communicated powerfully, talked about with visible actions taking place that demonstrate movement towards fulfilling the vision.”
  3. Follow BARC’s implementation steps. The Business Application Research Center recommends the following steps for implementing a data governance program:
    • Define goals and understand benefits
    • Analyze current state and delta analysis
    • Derive a roadmap
    • Convince stakeholders and budget project
    • Develop and plan the data governance program
    • Implement the data governance program
    • Monitor and control
  4. Create a data governance framework. Your data governance framework is a product of how you choose to organize people, technology, and processes to ensure seven core properties in a data set:
    1. Data quality: Data is in correct and consistent form
    2. Data availability: Relevant employees can access the data
    3. Data usability: Data is readily available and usable with your tools
    4. Data integrity: Data is not tampered with as it moves throughout the organization
    5. Data security: Data is classified and protected based on its sensitivity
    6. Data auditability: Data can be audited to detect problems
    7. Data observability: System-wide visibility into data movement
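
Several of these properties can be expressed as automated checks. The sketch below, in Python, illustrates three of them (quality, security and auditability) under stated assumptions: the required fields, sensitivity labels and email rule are hypothetical policy choices made up for this example, not part of any standard or product.

```python
import re

# Hypothetical policy: field names, labels, and rules are illustrative only.
REQUIRED_FIELDS = {"customer_id", "email", "country"}
SENSITIVITY = {"customer_id": "internal", "email": "confidential", "country": "public"}
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def check_quality(record: dict) -> list[str]:
    """Data quality: list the problems found in one record."""
    problems = [f"missing field: {name}" for name in sorted(REQUIRED_FIELDS - set(record))]
    email = record.get("email")
    if email is not None and not EMAIL_PATTERN.match(email):
        problems.append(f"malformed email: {email!r}")
    return problems


def classify(record: dict) -> dict:
    """Data security: label each field; unknown fields default to 'restricted'."""
    return {name: SENSITIVITY.get(name, "restricted") for name in record}


def audit(records: list[dict]) -> list[dict]:
    """Data auditability: one reviewable entry per record."""
    return [
        {"index": i, "problems": check_quality(r), "labels": classify(r)}
        for i, r in enumerate(records)
    ]


records = [
    {"customer_id": 1, "email": "ada@example.com", "country": "UK"},
    {"customer_id": 2, "email": "not-an-email", "notes": "call back"},
]
report = audit(records)
```

Treating unknown fields as restricted by default is one possible design choice: a new, unclassified field stays protected until someone decides how sensitive it is.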

The first step in creating a data governance framework is to organize your people.

To see how you might structure roles and responsibilities for data governance, McKinsey has published a useful best-practice data governance organizational model.

A note on data governance and data management

If data governance is architecture, then data management is construction.

Despite this distinction, many people get data management and data governance confused. But it’s useful to understand this difference because when they combine effectively, the result is more valuable data at a lower cost.

Choosing the tools that empower data governance

Data quality is a fundamental impetus of data governance. Without quality, data cannot be available, usable or secure. And one of the most difficult places to ensure data quality is at the point of ingestion. The more solutions you draw metadata from, the more complete your data governance picture becomes.

With StreamSets, you can avoid the data corrosion and data loss that so often occur when data is ingested. Among other things, StreamSets gives you the ability to inspect data in motion and automatically detect and respond to schema changes. StreamSets smart data pipelines produce vital metadata that governance solutions can use to monitor the use of data from inception to analytics.
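
To illustrate the general idea of detecting a schema change at ingestion, here is a minimal, tool-agnostic sketch in Python. It is not the StreamSets API and does not show how StreamSets implements this. The function `detect_drift`, the expected schema and the sample record are all hypothetical.

```python
def detect_drift(expected: dict[str, type], record: dict) -> dict[str, list[str]]:
    """Compare one incoming record against the expected field names and types."""
    added = sorted(set(record) - set(expected))      # fields the schema does not know
    removed = sorted(set(expected) - set(record))    # fields the record is missing
    retyped = sorted(
        name
        for name in set(expected) & set(record)
        if not isinstance(record[name], expected[name])
    )
    return {"added": added, "removed": removed, "retyped": retyped}


# Hypothetical schema and incoming record.
expected_schema = {"order_id": int, "amount": float, "currency": str}
incoming = {"order_id": "A-1001", "amount": 19.99, "region": "EMEA"}

drift = detect_drift(expected_schema, incoming)
# drift == {"added": ["region"], "removed": ["currency"], "retyped": ["order_id"]}
```

A pipeline could use a report like this to decide whether to pass the record through, route it for review, or update the expected schema.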


Frequently Asked Questions

What are the principles of good data governance?

The principles of good data governance call for:

  1. Proper knowledge of the data available
  2. Consistent policies aligned with business goals
  3. Privacy by design
  4. Properly managed metadata
  5. Data quality management

What is data stewardship vs data governance?

Data stewardship is a component of data governance. Data governance leaders set policies and create processes, and data stewards carry out procedures according to their interpretation of those written policies. In other words, data stewardship is the execution arm of the data governance initiative.
