Data lakes: Why use them for IoT analytics?
Data lakes are repositories for data of any kind and at any scale. An Internet of Things (IoT) solution can generate thousands of data points from thousands of endpoints every day, which adds up to huge amounts of historical data. In fact, IDC predicts that devices connected to the Internet of Things will generate 79.4 zettabytes of data in 2025.
An IoT data lake is a way for you to store your IoT data over time. Later, you can access your IoT data for historical analytics. Offloading to an IoT data lake enables you to:
Sample use cases
Use historical analytics on data in a data lake to:
With historical analytics, you can answer questions like:
Use historical analytics alongside real-time streaming analytics to fine-tune your processes. For example, you can improve shipment processes by learning from past shipments, or discover trends that identify devices needing proactive maintenance before they fail in operation.
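As a rough illustration of discovering trends for proactive maintenance, the sketch below fits a least-squares slope to each device's historical readings and flags devices whose readings rise faster than a chosen threshold. The device names, the readings, the threshold and both function names are hypothetical; this is one simple way to spot a trend, not a method described by the source.

```python
def slope(values):
    """Least-squares slope of values against their index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def devices_needing_maintenance(history, threshold):
    """Return the sorted IDs of devices whose readings trend upward faster than threshold."""
    return sorted(dev for dev, vals in history.items() if slope(vals) > threshold)


# Hypothetical historical vibration readings per device, oldest first.
history = {
    "motor-a": [1.0, 1.1, 1.0, 1.1, 1.0],  # flat: no action needed
    "motor-b": [1.0, 1.5, 2.0, 2.5, 3.0],  # steadily rising
}

flagged = devices_needing_maintenance(history, threshold=0.2)
print(flagged)
```

In practice the readings would come from the data lake rather than a literal dictionary, and the threshold would be tuned per device type.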
Key considerations
When choosing a data lake for IoT analytics, ask:
Make your IoT data your advantage with Software AG’s Cumulocity IoT DataHub. With DataHub, you can bridge the gap between streaming and historical analytics in a way that simplifies processes for IT administrators and enables the business to gain new insights about operations and performance.
Simplified management of long-term data storage
DataHub periodically takes data from the operational data store, either on-premises or at the edge, transforms it into a compact format that is highly efficient for analytical queries, and places it in an analytical store in the data lake. DataHub can support a multitude of devices; on each offload, it moves alarm, event, measurement and inventory data for every device into the data lake.
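The offload step can be pictured with a minimal sketch. The source does not specify DataHub's schema or file format, so everything here is an assumption for illustration: the record fields (`type`, `device_id`, `time`, `value`), the `offload_batch` function, and the use of plain column lists partitioned by record type and UTC day as a stand-in for a compact analytical format.

```python
from collections import defaultdict
from datetime import datetime, timezone


def offload_batch(records):
    """Group row-oriented records by (type, UTC day) and pivot each group into columns."""
    partitions = defaultdict(lambda: defaultdict(list))
    for rec in records:
        day = datetime.fromtimestamp(rec["time"], tz=timezone.utc).strftime("%Y-%m-%d")
        columns = partitions[(rec["type"], day)]
        for key in ("device_id", "time", "value"):
            columns[key].append(rec[key])
    # Convert the nested defaultdicts into plain dicts for the caller.
    return {key: dict(cols) for key, cols in partitions.items()}


# Hypothetical rows as they might sit in an operational store (time is epoch seconds).
batch = [
    {"type": "measurement", "device_id": "pump-1", "time": 1700000000, "value": 21.5},
    {"type": "measurement", "device_id": "pump-2", "time": 1700000060, "value": 22.0},
    {"type": "alarm", "device_id": "pump-1", "time": 1700090000, "value": 1},
]

lake = offload_batch(batch)
for (kind, day), columns in sorted(lake.items()):
    print(kind, day, columns["device_id"])
```

Storing each field as its own column lets an analytical query read only the columns it needs, which is the general reason column-oriented layouts suit this kind of workload.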
Lower cost for IoT data storage
The analytical store can be hosted on your choice of Amazon® S3 or Microsoft® Azure® Data Lake Storage. Cloud-based storage dramatically lowers the cost of creating and managing a data lake. DataHub also supports file system data storage and Hadoop® Distributed File System (HDFS).
Scalable SQL querying of long-term IoT data
Like Cumulocity IoT, DataHub is designed to support an IoT solution consisting of any number of devices and can scale to manage the data each produces. For analyzing this onslaught of device data, DataHub offers SQL, the lingua franca of data processing for decades. Unleash the power of SQL, and you will quickly convert raw IoT device data into meaningful information.
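To give a feel for the kind of SQL this enables, here is a small sketch that aggregates raw readings into a daily average per device. It runs against an in-memory SQLite database purely as a stand-in so the example is self-contained; the `measurements` table, its columns and the sample rows are hypothetical and do not reflect DataHub's actual schema or SQL dialect.

```python
import sqlite3

# In-memory SQLite stands in for the analytical store in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (device_id TEXT, day TEXT, temperature REAL)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?, ?)",
    [
        ("pump-1", "2024-01-01", 20.0),
        ("pump-1", "2024-01-01", 22.0),
        ("pump-1", "2024-01-02", 30.0),
        ("pump-2", "2024-01-01", 18.0),
    ],
)

# Turn raw readings into one summary row per device and day.
query = """
    SELECT device_id, day, AVG(temperature) AS avg_temp, COUNT(*) AS readings
    FROM measurements
    GROUP BY device_id, day
    ORDER BY device_id, day
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
conn.close()
```

The same GROUP BY pattern extends to other summaries, such as minimum and maximum values per device or alarm counts per week.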
Standard interfaces to BI & data science tools
DataHub acts as an integration layer that enables high-performance SQL queries on historical IoT data. The results can be used with a wide range of business intelligence and analytics applications, for machine learning training, or with custom applications that use standards such as Arrow Flight, JDBC®, ODBC, REST and SQL.
What customers say about Cumulocity IoT DataHub
“Through Cumulocity IoT DataHub, we have been able to provide our customers with analytics that weren’t previously possible, bringing a wide range of new analytic capabilities to our machine operators to improve the efficiency of their operations,” said Michael Schultheis of Dürr Somac.