The StreamSets and webMethods platforms have now been acquired by IBM.

Streaming & Batch Ingest Data Collector

Build reusable data ingestion pipelines in one interface, from any source to any destination.

Data ingestion pipelines, simplified

Spend more time building smart data pipelines, enabling self-service, and innovating, and less time on break-fix work. StreamSets Data Collector Engine is an easy-to-use data pipeline engine for streaming, CDC, and batch ingestion from any source to any destination.

  • Build pipelines for streaming, batch and change data capture (CDC) in minutes
  • Eliminate 90% of break-fix and maintenance time
  • Port data pipelines to new data platforms without rewrites
100+ connectors get your pipelines up and running fast without special skills.
Operationalize your data collection

Single experience for all design patterns

Build schema-agnostic smart data pipelines for streaming, batch, and change data capture (CDC) in minutes, using pre-built sources and destinations in a single visual tool. StreamSets Data Collector Engine makes it easy to run data pipelines from sources such as Kafka, Oracle, Salesforce, JDBC, and Hive to destinations such as Snowflake, Databricks, S3, ADLS, and Kafka. Data Collector Engine runs on-premises or in any cloud, wherever your data lives.

Ingest data across multiple platforms

Design pipelines once and run them on multiple platforms without rework. Data Collector pipelines are platform-agnostic by design, so you can reuse them across data platforms in hybrid and multi-cloud environments. With a few configuration settings, any data professional can start ingesting data from any source into multiple platforms, giving your organization the flexibility to adapt quickly to new business needs.

Smart data pipelines built for change

The worst-case scenario is not an upstream change that breaks your pipeline; it is one that silently flows unreliable, incorrect, or unusable data into your analytics platform. Data Collector's intent-driven pipelines are built to handle data drift, reducing the risk of bad data reaching downstream systems and of outages. When data drift happens, Data Collector pipelines alert you so you can either remediate the issue or embrace the emergent design.
The StreamSets Data Integration Platform
Build smart data pipelines in minutes and deploy them across hybrid and multi-cloud platforms from a single login.
Data engineers gain efficiencies with StreamSets
★★★★★ | 8.01.23

"The best feature of StreamSets is its intuitive visual interface, allowing us to effortlessly design, monitor, and manage data pipelines without the need for complex coding. This has significantly reduced our development time and made the process highly accessible to both technical and non-technical team members."

– Mili M., Senior System Analyst, Mid-Market (51-1000 emp.)

★★★★★ | 8.03.23

"StreamSets has lot of out of box features to use for data pipelines and connect AWS Kinesis, DB or Kafka and send to HDFS & Hive."

– Sanath V., Enterprise (> 1000 emp.)
Frequently asked questions
  • What is StreamSets Data Collector?
    StreamSets Data Collector is a data pipeline engine for building reliable, smart data pipelines for streaming, batch, and change data capture (CDC) from a wide variety of sources and destinations.
  • What is the difference between StreamSets Data Collector and Transformer?
    StreamSets Data Collector handles data ingestion for cloud data pipelines in streaming, CDC, or batch mode, whereas StreamSets Transformer performs ETL, ELT, and data transformations such as joins, aggregations, and unions directly on Apache Spark and Snowflake. Both are part of the StreamSets platform.
  • How does StreamSets Data Collector track processed data?
    Data Collector tracks processed data through the Orchestration Record, which contains details about the task that was performed, such as the IDs of the jobs or pipelines that were started and the status of those jobs or pipelines.
  • Can the StreamSets Data Collector engine be deployed in the cloud?
    Yes. StreamSets Data Collector can be deployed to Amazon EC2, Azure Virtual Machine, or Google Compute Engine. Review the documentation for more information.
You may also like:
Research Report
The Business Value of Data Engineering
Explore the pivotal role of data engineering in driving business value and innovation. Dive into our research on trends, challenges, and strategies for 2024.
White Paper
The Data Integration Advantage: Building a Foundation for Scalable AI
Discover how modern data integration is key to scaling AI initiatives. Learn strategies for overcoming AI challenges and driving enterprise success.
Five Principles for Agile Data & Operational Analytics
Master the five data principles essential for powering effective operational analytics. Transform your data strategy for agility and insight.
Are you ready to unlock your data?
Resilient data pipelines help you integrate your data, without giving up control, to power your cloud analytics and digital innovation.