StreamSets

Prebuilt transformation processors

Key features
For both data practitioners and business users
UI-driven actions increase speed and consistency
Comprehensive suite of processors including SCDs and Pivots
Quicker time to insights with drag-and-drop capabilities
Each stage (processor) below is listed with its function, the benefit of using it and best practices, how it differentiates from hand-written SQL or competitors, and, where noted, its productivity impact (H/M/L).

AGGREGATE


Benefit of Using / Best Practice:

Allows users to calculate averages, sums, counts, or other statistical measures to provide quick insights.

Performs aggregate calculations within a data set
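
For reference, a minimal sketch of the equivalent hand-written Snowflake SQL, assuming a hypothetical orders table with region and amount columns:

    -- Aggregate order amounts by region: sum, average, and row count
    SELECT region,
           SUM(amount) AS total_amount,
           AVG(amount) AS avg_amount,
           COUNT(*)    AS order_count
    FROM orders
    GROUP BY region;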

APPLY FUNCTIONS


Benefit of Using / Best Practice:

Allows users to apply a function based on a dynamically (run-time) selected set of columns; as new columns matching the criteria are added to the source tables, the function will be applied to them. This feature offers flexibility and ease of use, empowering users to streamline data processing without being limited to regular expressions.

Differentiates Us? (compared to SQL or competitor)
StreamSets can apply functions to a set of columns chosen by a regular expression. For example, you could apply a user-defined function to columns with names ending in "_DATE". New columns would automatically get the function applied.

When it comes to dates, you'll never again get your interval wrong. How many times have you sat in code wondering whether 'month' takes a capital 'M', or whether that actually means minute? StreamSets reduces the need to search Stack Overflow or dig through documentation.

Productivity Impact (H/M/L):
H

Applies a function to selected columns, enabling users to define multiple columns at once or conveniently select columns within the tool.
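
For contrast, a sketch of the hand-coded Snowflake SQL this replaces, assuming hypothetical *_DATE string columns in an orders table; every new date column means editing the query, which is exactly the maintenance the processor removes:

    -- Apply TO_DATE to each *_DATE column, one by one
    SELECT order_id,
           TO_DATE(created_date,  'YYYY-MM-DD') AS created_date,
           TO_DATE(shipped_date,  'YYYY-MM-DD') AS shipped_date,
           TO_DATE(returned_date, 'YYYY-MM-DD') AS returned_date
           -- a new *_DATE column in the source requires another line here
    FROM orders;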

COLUMN ORDER


Benefit of Using / Best Practice:

Allows users to rearrange the order of columns in a dataset to improve data organization and readability.

Reorders columns to a specified order

COLUMN REMOVER


Benefit of Using / Best Practice:

Allows users to remove columns from a dataset to improve data organization and readability.

Differentiates Us? (compared to SQL or competitor)
In SQL, selecting columns to keep is easy, but selecting columns to remove forces you to spell out exactly the columns you need. That makes pipelines brittle: when a new column is added, manual intervention is required to bring it into the dataset. Pattern-based inclusion/exclusion is not practical in traditional SQL (it can be done, but it requires a lot of additional code and quickly becomes a maintenance nightmare!)

Productivity Impact (H/M/L):
L

Removes columns from a data set
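
A sketch of the brittle hand-written alternative described above, assuming a hypothetical customers table; to drop one column you must re-list everything you keep, so new source columns never flow through until the query is edited:

    -- Goal: drop only the ssn column
    SELECT customer_id,
           first_name,
           last_name,
           email
           -- any new source column is silently left behind
    FROM customers;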

COLUMN RENAME


Benefit of Using / Best Practice:

Allows users to easily rename columns.

Differentiates Us? (compared to SQL or competitor)
To rename a column in SQL, you have to list each individual column in your script and apply the change manually. You cannot do this based on a regular expression, and traditional approaches are brittle and slow to adapt to changing columns in the source tables (since you have to bring them in by hand).

Productivity Impact (H/M/L):
M

Renames columns in a data set
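
For comparison, a minimal sketch of the hand-written Snowflake SQL, assuming hypothetical column names; every rename must be enumerated explicitly:

    -- Rename columns one at a time with aliases
    SELECT cust_nm AS customer_name,
           ord_dt  AS order_date,
           amt     AS amount
    FROM orders;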

COLUMN TYPE CONVERTER


Benefit of Using / Best Practice:

Enables users to seamlessly convert data from one type to another, such as converting float to integer, enhancing data quality and improving downstream processing.

Differentiates Us? (compared to SQL or competitor)
StreamSets can convert a set of columns chosen by a regular expression. For example, you could convert string columns with names ending in "_ID" to integers. New columns matching the pattern would automatically get the conversion applied.

Converts data types for selected columns; a regular expression can be used to specify multiple columns at once.
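
For comparison, a sketch of the equivalent hand-written Snowflake SQL, assuming a hypothetical raw_orders table with string ID columns; each conversion has to be written out per column:

    -- Convert string IDs to integers column by column
    SELECT TRY_CAST(customer_id AS INTEGER) AS customer_id,
           TRY_CAST(product_id  AS INTEGER) AS product_id,
           order_total
    FROM raw_orders;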

CUBE


Benefit of Using / Best Practice:

Simplifies complex data aggregations, providing a powerful analytical capability for multidimensional analysis.

Performs a GROUP BY CUBE and aggregates data in specified columns
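
For reference, a minimal sketch of the equivalent Snowflake SQL, assuming a hypothetical sales table with region, product, and amount columns:

    -- Totals for every combination of region and product, plus subtotals and a grand total
    SELECT region,
           product,
           SUM(amount) AS total_amount
    FROM sales
    GROUP BY CUBE (region, product);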

DUPLICATES


Benefit of Using / Best Practice:

Enables users to efficiently remove duplicate records from datasets, ensuring data quality and reducing the storage footprint by eliminating redundant information.

Removes duplicate rows from a data set
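
For comparison, a sketch of the hand-written Snowflake SQL, assuming a hypothetical customers table with an updated_at column; exact duplicates can be handled with SELECT DISTINCT, while keeping one row per business key takes a window function:

    -- Keep only the most recent row per customer_id
    SELECT *
    FROM customers
    QUALIFY ROW_NUMBER() OVER (
        PARTITION BY customer_id
        ORDER BY updated_at DESC
    ) = 1;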

FILTER


Benefit of Using / Best Practice:

Allows users to selectively extract and manipulate data based on specific conditions or criteria.

Passes only rows that match all defined conditions

JOIN


Benefit of Using / Best Practice:

Simplifies the process of merging datasets based on common attributes, which enables non-technical users to efficiently integrate and analyze data. Saves time while reducing risk of error.

Differentiates Us? (compared to SQL or competitor)
The concept of joining is common to SQL and Transformer for Snowflake. The benefit lies in how quickly you can understand the join. In traditional SQL, bad naming practices creep in (how many derived tables are called A or B?), and understanding the impact of a join requires reading the WHERE clause alongside it. StreamSets simplifies the view, showing clearly which records go into the join and what the join means, which helps with debugging and audit compliance in regulated markets.

Joins data from two tables
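
For reference, a minimal sketch of the equivalent hand-written Snowflake SQL, assuming hypothetical orders and customers tables; note how readability depends entirely on the author choosing meaningful aliases:

    -- Join orders to customers on the shared key
    SELECT o.order_id,
           o.amount,
           c.customer_name
    FROM orders AS o
    INNER JOIN customers AS c
        ON o.customer_id = c.customer_id;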

JSON Parser


Benefit of Using / Best Practice:

Parsing JSON in SQL often requires writing complex queries. Using a dedicated JSON Parser function eliminates the need for manual JSON manipulation while reducing the complexity of SQL queries.

Differentiates Us? (compared to SQL or competitor)
With StreamSets, you don't need to know the structure of the JSON data in order to parse it.

Parses JSON data and extracts specific elements
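
For comparison, a sketch of the hand-written Snowflake SQL this replaces, assuming a hypothetical events table with a VARIANT column named payload; you need to know the JSON structure in advance to write the paths:

    -- Pull named fields out of the JSON and explode the items array
    SELECT payload:customer.name::STRING AS customer_name,
           item.value:sku::STRING        AS sku,
           item.value:qty::NUMBER        AS quantity
    FROM events,
         LATERAL FLATTEN(input => payload:items) item;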

NULL PROCESSOR


Benefit of Using / Best Practice:

Without this processor, users often have to make manual, iterative updates (it takes time, and it is low-value work that becomes highly visible when it is not done). It is particularly useful in conjunction with the Join processor, because joining on nulls is problematic both technically and for the business.

Null values happen for many reasons; this processor helps analysts automatically clean up the data (eliminate the null values) so it is ready for use in a pipeline.
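
For comparison, a sketch of the manual Snowflake SQL clean-up this automates, assuming a hypothetical orders table; each column's null replacement is coded by hand:

    -- Replace nulls before joining so keys and measures behave predictably
    SELECT order_id,
           COALESCE(customer_id, -1)    AS customer_id,
           COALESCE(discount, 0)        AS discount,
           COALESCE(channel, 'UNKNOWN') AS channel
    FROM orders;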

PIVOT


Benefit of Using / Best Practice:

Reshapes the data to provide a summarized and structured view of the dataset, making it easier to analyze and present information. Significantly reduces time required by manual data manipulation.

Differentiates Us? (compared to SQL or competitor)
Have you tried pivoting in SQL? One way to do it is to put all the results in a derived table, then write a set of sums and a group by. And if a new item to sum arrives, it simply does not appear in the dataset.

With Transformer for Snowflake, you select the field you want as a column, what you want to group by, and which math function to apply, and the result reacts to change.

Rotates a column by turning unique values into new columns and aggregates the results
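
For contrast, a sketch of the conditional-aggregation approach described above, assuming a hypothetical sales table; every pivoted value is hard-coded, so a new region never appears until the query is edited:

    -- Pivot regions into columns by hand
    SELECT product,
           SUM(IFF(region = 'EMEA', amount, 0)) AS emea_amount,
           SUM(IFF(region = 'APAC', amount, 0)) AS apac_amount,
           SUM(IFF(region = 'AMER', amount, 0)) AS amer_amount
    FROM sales
    GROUP BY product;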

ROLLUP


Benefit of Using / Best Practice:

Simplifies the process of producing subtotal results for various levels of grouping. It enables users to quickly obtain summaries and subtotals for multiple dimensions within their data, making it easier to identify patterns and trends across different levels of granularity.

Performs a GROUP BY ROLLUP and aggregates data in specified columns
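
For reference, a minimal sketch of the equivalent Snowflake SQL, assuming a hypothetical sales table with order_year, order_quarter, and amount columns:

    -- Subtotals by year, by year and quarter, plus a grand total row
    SELECT order_year,
           order_quarter,
           SUM(amount) AS total_amount
    FROM sales
    GROUP BY ROLLUP (order_year, order_quarter);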

SAMPLE


Benefit of Using / Best Practice:

Allows users to generate a representative sample of incoming data for analysis or testing purposes.

Generates a sample of incoming data
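
For reference, a minimal sketch of the equivalent Snowflake SQL, assuming a hypothetical orders table; sampling can be expressed as a percentage of rows or a fixed row count:

    -- Roughly 10 percent of rows
    SELECT * FROM orders SAMPLE (10);

    -- A fixed number of rows
    SELECT * FROM orders SAMPLE (1000 ROWS);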

SLOWLY CHANGING DIMENSION


Benefit of Using / Best Practice:

Simplifies the management of changing data over time. This allows users to track and analyze historical changes in data attributes without requiring extensive coding. Reduces time required to update changing data by manual coding.

Differentiates Us? (compared to SQL or competitor)
With StreamSets, history tables are made easy; by hand, the logic is really complex to write.

Generates updates for a slowly changing dimension based on master and change data
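
To illustrate why this logic is complex by hand, a deliberately simplified Type 2 sketch in Snowflake SQL, assuming hypothetical dim_customer and stg_customer tables that track a single address attribute; a real implementation also handles surrogate keys, deletes, and late-arriving data:

    -- Step 1: close out current versions whose tracked attribute changed
    UPDATE dim_customer
    SET    end_date   = CURRENT_DATE,
           is_current = FALSE
    FROM   stg_customer s
    WHERE  dim_customer.customer_id = s.customer_id
      AND  dim_customer.is_current  = TRUE
      AND  dim_customer.address    <> s.address;

    -- Step 2: insert new and changed records as the current versions
    INSERT INTO dim_customer (customer_id, address, start_date, end_date, is_current)
    SELECT s.customer_id, s.address, CURRENT_DATE, NULL, TRUE
    FROM   stg_customer s
    LEFT JOIN dim_customer d
           ON d.customer_id = s.customer_id
          AND d.is_current  = TRUE
    WHERE  d.customer_id IS NULL;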

SNOWFLAKE SQL EVALUATOR


Benefit of Using / Best Practice:

Enables users to leverage the full power and flexibility of Snowflake's SQL language for data manipulation. The SQL Evaluator function simplifies the process of incorporating complex SQL logic into the transformation pipeline without the need for separate SQL scripts or manual coding.

Applies Snowflake SQL expressions to a data set

SORT


Benefit of Using / Best Practice:

Enables efficient downstream processes such as joins or aggregations that require ordered data. Using SQL would require complex queries, which can take up a lot of time.

Sorts data based on specified columns

STREAM SELECTOR


Benefit of Using / Best Practice:

Isolating particular records to process is very messy in traditional SQL, especially when you need to meet the requirements of auditing and collaborative working. StreamSets reduces the time it takes to inspect and evaluate what a dataflow is doing.

Passes rows downstream based on conditions
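
For contrast, a sketch of what hand-coded routing tends to look like in Snowflake SQL, assuming a hypothetical orders table and an amount-based split; each branch becomes a separate object to write, name, and maintain:

    -- One filtered object per branch of the flow
    CREATE OR REPLACE VIEW orders_high_value AS
        SELECT * FROM orders WHERE amount >= 10000;

    CREATE OR REPLACE VIEW orders_standard AS
        SELECT * FROM orders WHERE amount < 10000 OR amount IS NULL;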

UNION


Benefit of Using / Best Practice:

Allows users to combine multiple datasets. As a best practice, ensure the datasets have compatible column types and order. Saves time while reducing the risk of error.

Differentiates Us? (compared to SQL or competitor)
Transformer for Snowflake streamlines data integration by automatically aligning columns, regardless of their inbound order. It eliminates time-consuming tasks like schema matching and null-value backfilling, and it offers control over column selection and duplicates, improving data management efficiency. You gain valuable time for more critical tasks.

Merges data from multiple inputs into a single output
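
For contrast, a sketch of the manual alignment this removes, assuming hypothetical orders_online and orders_retail tables; column order must match exactly and missing columns have to be backfilled with nulls by hand:

    -- Hand-aligned union of two sources
    SELECT order_id, customer_id, amount, NULL AS promo_code
    FROM   orders_online
    UNION ALL
    SELECT order_id, customer_id, amount, promo_code
    FROM   orders_retail;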

UNPIVOT


Benefit of Using / Best Practice:

Makes it easier for users to work with multi-dimensional datasets. Using SQL to accomplish this requires complex queries and manual transformations, which can take up a significant amount of time.

Rotates a table by transforming columns into rows
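
For reference, a minimal sketch of the equivalent Snowflake SQL, assuming a hypothetical monthly_sales table with one column per month; each column to rotate must be listed explicitly:

    -- Turn monthly columns into (month, amount) rows
    SELECT product, month_name, amount
    FROM monthly_sales
        UNPIVOT (amount FOR month_name IN (jan, feb, mar));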

WINDOW FUNCTIONS


Benefit of Using / Best Practice:

Window functions allow for advanced calculations and analysis, such as ranking, running totals, and moving averages, by operating on a specific subset of data based on defined window specifications.

Applies window functions
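
For reference, a minimal sketch of the equivalent hand-written Snowflake SQL, assuming a hypothetical orders table; the window specification (partition, ordering, frame) is what the processor lets you define visually:

    -- Sequence orders per customer and compute a running total
    SELECT customer_id,
           order_id,
           amount,
           ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date) AS order_seq,
           SUM(amount) OVER (PARTITION BY customer_id ORDER BY order_date
                             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
    FROM orders;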