Tame Big Data with your ESB

Every vendor likes to claim that its product is the best on the market, and when it comes to ESBs (Enterprise Service Buses), it is no different: every ESB vendor claims that its product is the fastest. As a consumer, your response to a statement like that should be “How?”. With application infrastructure within the enterprise changing so dramatically over the last decade, it is only natural to expect that an ESB solution used in such an environment is also capable of accommodating such dynamic change.

Infrastructure has always been a significant problem, but companies seem to be handling it quite well with cloud-based models and virtualized servers. That shift brings its own set of challenges from an integration standpoint, but that is not the topic of this post.

There is an even bigger challenge at stake today: data. Many companies are seeing their data volumes grow by 120% year-over-year. With such explosive growth, they are battling the classic challenges of latency, data handling costs, real-time access, analytics and something red-hot today: BIG DATA!

An ESB, as we know, is fundamental to a large and growing infrastructure, and it is also critical to addressing most of the data challenges above. However, not every ESB on the market has the capability to solve such issues. As a consumer, do NOT be misled by terms such as “high-performance” or “highly scalable”; every vendor loves to use those terms lavishly. To clearly distinguish the products that can actually deliver from the ones that merely claim to, you need to understand what it is that makes an ESB scalable enough to solve such big data problems.

The root of all the key data challenges is the sheer volume of data. If the ESB needs to ferry data back and forth between various applications, it needs the capacity to handle such bulk loads of data in memory. Yes, there are design patterns that stream bulk data so as not to overload the ESB. However, those patterns do not apply in use cases where you need the bulk of the data available in its totality. I will discuss such key scenarios below.

But before we get to the scenarios, let us understand one thing clearly. To process such large volumes of data (> 10 GB) in real time, ESBs need a lot of memory to work with, but they do not have the capability to manage all that memory by themselves; that is not the function of an ESB. However, a few top-notch ESBs can work hand-in-hand with an in-memory database (IMDB) or an in-memory data management platform to handle such complex scenarios easily, and some even come bundled with an IMDB. An IMDB differs from a traditional disk-oriented relational database in that all of the data resides in the main memory of the machine where the IMDB is installed. Because its data structures and algorithms are optimized for memory access rather than disk I/O, data retrieval and storage have no noticeable latency. The main advantage of an in-memory database is that you are no longer delayed by costly database reads and writes, and you are not tied down by disk latency.
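To make the speed difference concrete, here is a minimal read-through (cache-aside) sketch in Java. A `HashMap` stands in for the IMDB and a counter stands in for expensive trips to a disk-bound system of record; all names are hypothetical, chosen for illustration rather than taken from any particular ESB or IMDB product.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal cache-aside sketch: a HashMap stands in for the IMDB, and
// slowLegacyLookup() stands in for a disk-bound database read.
public class CacheAsideSketch {
    private final Map<String, String> imdb = new HashMap<>();
    public int legacyReads = 0; // counts round trips to the "legacy" store

    // Simulated expensive read from the system of record
    private String slowLegacyLookup(String key) {
        legacyReads++;
        return "record-for-" + key;
    }

    // Read-through: serve from memory when possible, otherwise fetch
    // from the legacy system and populate the in-memory store.
    public String fetch(String key) {
        return imdb.computeIfAbsent(key, this::slowLegacyLookup);
    }

    public static void main(String[] args) {
        CacheAsideSketch esb = new CacheAsideSketch();
        esb.fetch("customer-42"); // first read hits the legacy system
        esb.fetch("customer-42"); // second read is served from memory
        System.out.println("legacy reads: " + esb.legacyReads); // prints "legacy reads: 1"
    }
}
```

Every repeat read is answered from main memory, which is the behavior an ESB inherits when it delegates storage to an IMDB.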

Here are a few scenarios that illustrate how an ESB can scale up and scale out with an in-memory database:

  • Mainframe or AS/400 data access is too expensive and too slow – Many companies have adopted a legacy system as their system of record, which means every single data update or read has to go to the legacy system. This is not only time-consuming and expensive but also hurts IT agility when business needs change rapidly. Bringing in an ESB can solve some of the point-to-point connectivity issues and ease the onboarding and off-boarding of other systems in the environment. However, if you are still going to the legacy system for every single query or update, the ESB alone does not help. Now bring in an IMDB and turn on the ESB’s capability to work with it. You can load a significant portion of the most commonly used legacy data into the IMDB through the ESB and save the round trips to the legacy system. The IMDB takes care of updating the legacy system asynchronously with incremental updates, and the ESB can now serve any application with legacy data that is readily available in memory. This is a true “high-performance” promise.
  • In-memory analytics of diverse data sources – This is similar to the use case above, except that the data here is mostly read-only and may come from multiple data sources. The most important criterion is that the related data from these diverse sources must be available for real-time analysis. Again, imagine truly huge volumes of data (hundreds of GBs). This scenario is very common in the finance industry (credit card fraud detection) and in healthcare (patient telemetry data). The question of which analytics tool to use is not relevant here, since the key challenge is to bring all these data points together in real time with little or no latency. The ESB can naturally connect to all these data sources and pull in the data effectively; to deposit all that information in real time for the analytics tool to read, it leverages the IMDB.
  • Large file processing – When IT has to process a large file (> 10 GB), it is usually done as a nightly job so as not to impact the performance of any system or database during regular hours. However, there are scenarios where such a large file must be processed during regular business hours, and a further requirement may be that a really large chunk (if not all) of that data must be available in memory for parsing, validation and cross-referencing. The ESB may have native capabilities to parse that file format easily, but to enable it to read such a large chunk of the file in one shot, an IMDB is necessary. This scenario is quite common in the retail space (initial product master load), high-tech manufacturing (test data, supplier files, etc.) and the financial industry (trades).
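The first scenario hinges on the IMDB updating the legacy system asynchronously. A write-behind pattern can sketch that idea: writes land in memory immediately and are queued for later propagation to the system of record. This is a simplified illustration under assumed names (`imdb`, `legacy`, `flushToLegacy`), not the API of any real product; real IMDBs run the flush on a background thread, but it is synchronous here so the example stays deterministic.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Write-behind sketch: updates land in the in-memory store immediately
// and are queued for later propagation to the legacy system of record.
public class WriteBehindSketch {
    private final Map<String, String> imdb = new HashMap<>();   // fast in-memory copy
    private final Map<String, String> legacy = new HashMap<>(); // stand-in for the mainframe
    private final Queue<String> pending = new ArrayDeque<>();   // keys awaiting propagation

    // Callers see an immediate in-memory write; no legacy round trip.
    public void put(String key, String value) {
        imdb.put(key, value);
        pending.add(key);
    }

    public String get(String key) {
        return imdb.get(key); // reads never touch the legacy system
    }

    // In practice a background thread calls this on a schedule.
    public int flushToLegacy() {
        int flushed = 0;
        for (String key; (key = pending.poll()) != null; flushed++) {
            legacy.put(key, imdb.get(key));
        }
        return flushed;
    }

    public String legacyValue(String key) { return legacy.get(key); }

    public static void main(String[] args) {
        WriteBehindSketch store = new WriteBehindSketch();
        store.put("order-1", "shipped");
        System.out.println(store.get("order-1"));         // prints "shipped" (served from memory)
        store.flushToLegacy();
        System.out.println(store.legacyValue("order-1")); // prints "shipped" (now in the legacy store)
    }
}
```

The point of the pattern is that applications pay only the in-memory write cost, while the slow legacy system catches up on its own schedule.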

So, to sum it up: when you ask an ESB vendor how their product scales up and the answer is “throw more memory at it,” question that immediately. As you now know, an ESB by itself cannot manage all that memory; it is bound by the constraints of whatever JVM or CLR it runs in. Even if your architect thinks they can tweak the JVM parameters to use a larger heap, remind them that garbage collection (GC) only takes longer, and slows the ESB down even more, as the heap grows. You need more sophisticated technology, like an IMDB, for very large main-memory access, storage and clean-up. IMDBs typically manage memory themselves, often outside the JVM heap, and so do not rely on the JVM’s native GC; as a result, they are very fast.
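One concrete mechanism behind that claim is off-heap memory. The JDK itself exposes it through direct `ByteBuffer`s: data written there lives outside the garbage-collected heap, so the GC never scans or moves it. The sketch below only demonstrates the mechanism; real in-memory data platforms layer far more sophisticated allocators and indexes on top of it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// One reason IMDBs escape JVM GC pressure: they keep bulk data off the
// garbage-collected heap. allocateDirect() is the standard JDK way to
// obtain such memory.
public class OffHeapSketch {
    public static String roundTrip(String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        // Direct buffers live outside the Java heap, so the GC never
        // scans or relocates their contents, however large they grow.
        ByteBuffer offHeap = ByteBuffer.allocateDirect(bytes.length);
        offHeap.put(bytes);
        offHeap.flip(); // switch the buffer from writing to reading
        byte[] back = new byte[offHeap.remaining()];
        offHeap.get(back);
        return new String(back, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("10GB-of-trades")); // prints "10GB-of-trades"
    }
}
```

Because the payload bytes sit in native memory, growing the dataset does not lengthen GC pauses the way growing the Java heap would.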

Go ahead and power up your ESB. Build an application integration strategy. Give your ESB a shot in the arm by pairing it with an IMDB if you run into any of the scenarios above, or another where you see a need for such technology. As you get into more big data scenarios, the approach discussed here (as well as in this other post by a colleague) may serve as a solution pattern for part of the problem, but it may not be the entire solution; you may encounter other challenges that you do not typically see in the day-to-day use cases of your industry vertical. That is a topic for a different day! Write to me about some of the big data use cases you have encountered.


About Dinesh Chandrasekhar


Dinesh Chandrasekhar has more than 16 years of experience in application architecture, integration and implementation across multiple industry verticals. He has a special interest in on-premise / cloud integration, iPaaS solutions, high-speed messaging and solving complex integration problems. He is currently a Sr. Manager of Global Product Marketing at Software AG, responsible for the Application Integration product line.
