Data lakes: Why use them for IoT analytics?

Data lakes are repositories for data of any kind and at any scale. An Internet of Things solution can generate thousands of data points from thousands of endpoints every day, resulting in huge amounts of historical data. In fact, IDC predicts that devices connected to the Internet of Things will generate 79.4 zettabytes of data in 2025.

An IoT data lake is a way for you to store your IoT data over time. Later, you can access your IoT data for historical analytics. Offloading to an IoT data lake enables you to:

  • Build and retain experiences with data
  • Store data cost-effectively
  • Use data without impacting the performance of your IoT solution

Sample use cases
Use historical analytics on data in a data lake to:

  • View and analyze historic IoT data in aggregate form to identify long-term trends
  • Train machine learning (ML) models
  • Plan production line changes

With historical analytics, you can answer questions like:

  • When did this last happen?
  • Where and how many times have we seen this happen?
  • What’s the average value of the measurement of this device across all factories over a specific time?
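
As a rough illustration, the three questions above can each be phrased as a short SQL query. The sketch below uses Python's built-in sqlite3 module as a stand-in for an analytical store. The measurements table, its columns, the sample rows and the 90-degree threshold are all hypothetical and chosen only for this example.

```python
import sqlite3

# Hypothetical schema: one row per measurement offloaded to the data lake.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE measurements ("
    "factory TEXT, device_id TEXT, ts TEXT, temperature REAL)"
)
rows = [
    ("berlin", "pump-1", "2024-01-05T10:00:00", 70.0),
    ("berlin", "pump-1", "2024-02-10T10:00:00", 95.0),
    ("madrid", "pump-2", "2024-02-11T09:30:00", 80.0),
    ("madrid", "pump-2", "2024-03-01T14:00:00", 97.0),
    ("madrid", "pump-3", "2024-03-02T08:15:00", 75.0),
]
conn.executemany("INSERT INTO measurements VALUES (?, ?, ?, ?)", rows)

# Assumed definition of "this happened": a reading above the threshold.
THRESHOLD = 90.0

# When did this last happen?
last_seen = conn.execute(
    "SELECT MAX(ts) FROM measurements WHERE temperature > ?", (THRESHOLD,)
).fetchone()[0]

# Where and how many times have we seen this happen?
per_factory = conn.execute(
    "SELECT factory, COUNT(*) FROM measurements "
    "WHERE temperature > ? GROUP BY factory ORDER BY factory",
    (THRESHOLD,),
).fetchall()

# What is the average value across all factories over a specific time?
# ISO 8601 timestamps sort correctly as plain text.
avg_feb = conn.execute(
    "SELECT AVG(temperature) FROM measurements WHERE ts >= ? AND ts < ?",
    ("2024-02-01", "2024-03-01"),
).fetchone()[0]

print(last_seen, per_factory, avg_feb)
```

The same query shapes should carry over to any SQL-capable analytical store, although table names, column names and timestamp handling will differ.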

Use historical analytics alongside real-time streaming analytics to fine-tune your processes. For example, improve shipment processes by learning from past shipments, or discover trends that identify devices needing proactive maintenance before they fail in operation.
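
One simple way to picture trend-based maintenance is to fit a straight line to each device's historical readings and flag devices whose readings are climbing. The sketch below is only an assumption about how such a check might look. The device names, vibration values and SLOPE_LIMIT are hypothetical, and a production system would likely use a more robust model.

```python
def slope(values):
    """Least-squares slope of values against their index (units per reading).

    Needs at least two readings, otherwise the denominator is zero.
    """
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


# Hypothetical vibration history per device, oldest reading first.
history = {
    "pump-1": [2.0, 2.1, 1.9, 2.0, 2.0],  # stable
    "pump-2": [2.0, 2.5, 3.0, 3.5, 4.0],  # steadily rising
}

# Assumed limit: flag devices whose readings rise faster than this.
SLOPE_LIMIT = 0.2

flagged = [dev for dev, vals in sorted(history.items()) if slope(vals) > SLOPE_LIMIT]
print(flagged)
```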

Key considerations
When choosing a data lake for IoT analytics, ask:

  • Can you store data on-premises and at the edge?
  • Will you have only one option for data lake storage, or will you have choices?
  • How long can you retain operational endpoint data? Some IoT platforms retain data for only a finite time, in some cases as short as two weeks.
  • How expensive will storage be?
  • Is the data lake built for huge data sets?
  • Is the data lake optimized for data analysis or training machine learning (ML) models?
  • Can you use a variety of business intelligence, machine learning and SQL-based tools?
  • Will data architects have the flexibility and control they need while data consumers have self-service access to their IoT data?

Benefits of a data lake for IoT analytics

Make your IoT data your advantage with Software AG’s Cumulocity IoT DataHub. With DataHub, you can bridge the gap between streaming and historical analytics in a way that simplifies processes for IT administrators and enables the business to gain new insights about operations and performance.

Simplified management of long-term data storage
DataHub periodically takes data from the operational data store, either on-premises or at the edge. It transforms that data into a compact format that is highly efficient for analytical queries, then places it in an analytical store in the data lake. DataHub can support a multitude of devices. For each offload, it moves alarm, event, measurement and inventory data for every device into the data lake.
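
To make the offload idea concrete, here is a minimal sketch of a periodic offload step. It is not DataHub's implementation. The record layout, the offload function, the watermark used to skip already-offloaded records and the gzip-compressed, column-oriented JSON files are all assumptions standing in for a real compact analytical format.

```python
import gzip
import json
import tempfile
from collections import defaultdict
from pathlib import Path


def offload(records, lake_dir, since):
    """Copy records newer than `since` into per-type columnar files.

    Returns the new watermark: the latest timestamp offloaded, or `since`
    if there was nothing new.
    """
    by_type = defaultdict(list)
    for rec in records:
        if rec["time"] > since:
            by_type[rec["type"]].append(rec)

    watermark = since
    for rec_type, recs in by_type.items():
        # Row-oriented to column-oriented: one list of values per field.
        fields = sorted({key for rec in recs for key in rec})
        columns = {f: [rec.get(f) for rec in recs] for f in fields}
        batch_end = max(rec["time"] for rec in recs)
        target = Path(lake_dir) / rec_type
        target.mkdir(parents=True, exist_ok=True)
        out = target / f"offload-{batch_end.replace(':', '')}.json.gz"
        with gzip.open(out, "wt", encoding="utf-8") as fh:
            json.dump(columns, fh)
        watermark = max(watermark, batch_end)
    return watermark


# Hypothetical operational store holding two kinds of records.
operational_store = [
    {"type": "measurement", "device": "pump-1", "time": "2024-03-01T10:00:00", "value": 71.5},
    {"type": "measurement", "device": "pump-2", "time": "2024-03-01T10:05:00", "value": 69.0},
    {"type": "alarm", "device": "pump-1", "time": "2024-03-01T10:06:00", "severity": "MAJOR"},
]

lake = tempfile.mkdtemp()
watermark = offload(operational_store, lake, since="2024-03-01T00:00:00")
# A second run with the returned watermark finds nothing new to move.
second_run = offload(operational_store, lake, since=watermark)
written = sorted(p.relative_to(lake).as_posix() for p in Path(lake).rglob("*.json.gz"))
print(written)
```

Grouping values by column rather than by row is what tends to make analytical scans and compression efficient, which is why the sketch pivots each batch before writing it.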

Lower cost for IoT data storage
The analytical store can be hosted on your choice of Amazon® S3 or Microsoft® Azure® Data Lake Storage. Cloud-based storage dramatically lowers the cost of creating and managing a data lake. DataHub also supports file system data storage and Hadoop® Distributed File System (HDFS).

Scalable SQL querying of long-term IoT data
Like Cumulocity IoT, DataHub is designed to support an IoT solution consisting of any number of devices and can scale to manage the data each produces. For analyzing this onslaught of device data, DataHub offers SQL, the lingua franca of data processing for decades. Unleash the power of SQL, and you will quickly convert raw IoT device data into meaningful information.

Standard interfaces to BI & data science tools
DataHub acts as an integration layer, enabling high-performance SQL queries on historical IoT data. The results can be used with a wide range of business intelligence or analytics applications, for machine learning training, or with custom applications that use standards such as Arrow Flight, JDBC®, ODBC, REST and SQL.

What customers say about Cumulocity IoT DataHub
“Through Cumulocity IoT DataHub, we have been able to provide our customers with analytics that weren’t previously possible, bringing a wide range of new analytic capabilities to our machine operators to improve the efficiency of their operations,” said Michael Schultheis of Dürr Somac.

Software AG and all Software AG products are either trademarks or registered trademarks of Software AG. Other product and company names mentioned herein may be the trademarks of their respective owners.