benchmarking

Understanding ESB Performance & Benchmarking

ESB performance is a hot (and disputed) topic. In this post I don't want to talk about specific vendors or specific benchmarks; I'm simply trying to help people understand some general aspects of benchmarking ESBs and what to look out for in the results.

The general ESB model is that you have some service consumer, an ESB in the middle and a service provider (target service) that the ESB is calling. To benchmark this, you usually have a load driver client, an ESB, and a dummy service.

+-------------+      +---------+      +---------------+
| Load Driver |------|   ESB   |------| Dummy Service |
+-------------+      +---------+      +---------------+

Firstly, we want the Load Driver (LD), the ESB and the Dummy Service (DS) to be on different hardware. Why? Because we want to understand the ESB performance, not the performance of the DS or LD.

The second thing to be aware of is that performance results are completely dependent on the hardware, memory, network, etc. used, so never compare results obtained on different hardware.

Now there are three things we could look at:
A) Same LD, same DS, different vendors' ESBs doing the same thing (e.g. content-based routing)
B) Same LD, same DS, different ESB configs for the same ESB, doing different things (e.g. static routing vs content-based routing)
C) Going via ESB compared to going Direct (e.g. LD--->DS without ESB)

Each of these provides useful data but each also needs to be understood.

Metrics
Before looking at the scenarios, let's look at how to measure performance. The two metrics that are always a starting point in any ESB benchmark are throughput (requests/second) and latency (how long each request takes). With latency we can consider the overall latency - the time taken for a completed request as observed at the LD - and the ESB latency, which is the time the message spends in the ESB. The ESB latency can be hard to work out. A well-designed ESB will already be sending bytes to the DS before it has finished reading the bytes the LD has sent it. This is called pipelining. Some ESBs attempt to measure the ESB latency internally using clever calculations. Alternatively, scenario C (comparing via-ESB vs direct) can give an idea of the ESB latency.
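To make these concrete, here is a minimal sketch (illustrative Java, not taken from any particular tool - the class and method names are made up) of how a load driver's recorded per-request timings turn into throughput and latency figures, plus the scenario C estimate of ESB latency as the difference between the via-ESB and direct means:

import java.util.Arrays;

// Illustrative only: turn per-request timings recorded by the LD into
// throughput and latency figures. Names are invented for this sketch.
public class BenchmarkMetrics {

    // latenciesMillis: per-request overall latency as observed at the LD.
    public static void report(long[] latenciesMillis, long testDurationMillis) {
        long[] sorted = latenciesMillis.clone();
        Arrays.sort(sorted);

        double throughput = latenciesMillis.length * 1000.0 / testDurationMillis; // requests/second
        double mean = Arrays.stream(sorted).average().orElse(0);
        long p50 = sorted[sorted.length / 2];
        long p99 = sorted[(int) (sorted.length * 0.99)];

        System.out.printf("throughput: %.1f req/s%n", throughput);
        System.out.printf("latency: mean=%.1f ms, p50=%d ms, p99=%d ms%n", mean, p50, p99);
    }

    // Scenario C estimate: run the same realistic test direct (LD -> DS) and
    // via the ESB (LD -> ESB -> DS), and take the difference of the means.
    public static double estimateEsbLatencyMillis(double meanViaEsb, double meanDirect) {
        return meanViaEsb - meanDirect;
    }
}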

Before we can interpret these metrics, though, we need to understand the load driver.

There are two different models of load driving (a minimal sketch of both follows this list):
1) Do a realistic load test based on your requirements. For example, if you know you want to support up to 50 concurrent clients, each making a call every 5 seconds on average, simulate exactly that.
2) Saturation! Have a large number of clients, each making its next call as soon as the previous one finishes.
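As a rough illustration of the two models (not a recommendation of any particular tool), a single-JVM load driver built on the JDK's java.net.http.HttpClient might look like the sketch below; the URL, client counts and durations are placeholders:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative load driver showing the two models. Endpoint and numbers
// are placeholders for whatever your scenario requires.
public class LoadDriver {

    static final HttpClient HTTP = HttpClient.newHttpClient();
    static final HttpRequest REQUEST =
            HttpRequest.newBuilder(URI.create("http://esb-host:8280/echo")).GET().build();

    static final AtomicLong requests = new AtomicLong();
    static final AtomicLong totalLatencyMillis = new AtomicLong();

    static void call() {
        long start = System.nanoTime();
        try {
            HTTP.send(REQUEST, HttpResponse.BodyHandlers.ofString());
        } catch (Exception e) {
            return; // this sketch only counts successful requests
        }
        requests.incrementAndGet();
        totalLatencyMillis.addAndGet((System.nanoTime() - start) / 1_000_000);
    }

    // Model 1: realistic load - 'clients' threads, each making one call
    // roughly every 'intervalMillis' (e.g. 50 clients, 5000 ms).
    static void realisticLoad(int clients, long intervalMillis, long durationMillis)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(clients);
        long end = System.currentTimeMillis() + durationMillis;
        for (int i = 0; i < clients; i++) {
            pool.submit(() -> {
                while (System.currentTimeMillis() < end) {
                    call();
                    try { Thread.sleep(intervalMillis); } catch (InterruptedException ie) { return; }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(durationMillis + intervalMillis, TimeUnit.MILLISECONDS);
    }

    // Model 2: saturation - each client fires its next request as soon as
    // the previous one finishes, keeping the system under test fully busy.
    static void saturate(int clients, long durationMillis) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(clients);
        long end = System.currentTimeMillis() + durationMillis;
        for (int i = 0; i < clients; i++) {
            pool.submit(() -> { while (System.currentTimeMillis() < end) call(); });
        }
        pool.shutdown();
        pool.awaitTermination(durationMillis + 60_000, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        saturate(100, 60_000); // e.g. 100 clients hammering the ESB for a minute
        System.out.printf("throughput: %.1f req/s, mean latency: %d ms%n",
                requests.get() / 60.0,
                requests.get() == 0 ? 0 : totalLatencyMillis.get() / requests.get());
    }
}

In practice a serious benchmark would use a dedicated (and often clustered) load-testing tool rather than a hand-rolled driver like this, precisely so that the LD itself never becomes the bottleneck.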

The first model is aimed at testing what the ESB does before it is fully CPU-loaded. In other words, if you are looking at the effect of adding an ESB, or comparing one ESB to another under realistic load, then #1 is the right approach. With this approach, throughput may not be a useful measure, because all the different setups will produce similar results: if I only put in 300 requests a second on a modern system, I'm likely to see 300 requests a second come out. Nothing exciting. But the latency is revealing here: if one ESB responds in less time than another against the same DS, that's a very good sign, because the average time per request is very telling.

On the other hand, the saturation test is where throughput becomes interesting. Before you look at the throughput, though, check three things:
1) Is the LD CPU running close to 100%?
2) Is the DS CPU running close to 100%?
3) Is the network bandwidth running close to 100%?

If any of these are true, you aren't doing a good test of ESB throughput, because for throughput measurements you want the ESB to be the bottleneck. If something else is the bottleneck, the ESB is not showing its maximum throughput and you aren't giving it a fair chance. For this reason, most benchmarks use a very lightweight LD (or a clustered LD), and similarly a DS that is super-fast rather than realistic. Sometimes the DS is coded to do some real work, or to sleep its thread while executing, to provide a more realistic load test; in that case you probably want to look at latency more than throughput. A minimal dummy service along these lines is sketched below.
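For illustration, a dummy service can be as trivial as the sketch below, built on the JDK's com.sun.net.httpserver package; the port, path and WORK_MILLIS value are arbitrary, and the optional sleep is one simple way to simulate a backend doing real work:

import com.sun.net.httpserver.HttpServer;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.util.concurrent.Executors;

// Minimal Dummy Service: echoes whatever it receives as fast as possible.
// Set WORK_MILLIS > 0 to mimic a backend that does some real work.
public class DummyService {

    static final long WORK_MILLIS = 0;

    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(9000), 0);
        // Plenty of worker threads so the DS itself is never the bottleneck.
        server.setExecutor(Executors.newFixedThreadPool(200));
        server.createContext("/echo", exchange -> {
            byte[] body;
            try (InputStream in = exchange.getRequestBody()) {
                body = in.readAllBytes();
            }
            if (WORK_MILLIS > 0) {
                try { Thread.sleep(WORK_MILLIS); } catch (InterruptedException ignored) { }
            }
            if (body.length > 0) {
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            } else {
                exchange.sendResponseHeaders(200, -1); // no response body
                exchange.close();
            }
        });
        server.start();
    }
}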

Finally, for throughput testing you are looking for a particular behaviour as you increase the load.

(Graph: throughput vs load)

The shape of this graph shows the ideal scenario. As the LD puts more work through the ESB, throughput rises linearly. At some point the CPU of the ESB hits its maximum, and the throughput then levels off. What we don't want to see is the line drooping at the far right: that would mean the ESB is crumpling under the extra load and failing to manage it effectively. This is like the office worker whose output increases as you give them more work, until eventually they spend all their time re-organizing their to-do lists and less work overall gets done.

Under the saturation test you really want to see the CPU of the ESB close to 100% utilised. Why? Because that is a sign it is doing as much work as possible. Why might it not reach 100%? Three reasons: I/O, poor use of multiple cores, and thread locks - either the network card, disk or other I/O is holding it up, the code is not using the available cores efficiently, or there are thread-contention issues.

Finally, it's worth noting that you should expect latency to increase a lot under the saturation test. A classic result looks like this: I do static routing for different message sizes with an LD of 100 clients. For message sizes up to 100k I might see a constant 2ms overhead for using the ESB. Then, as the message size grows from 100k to 200k, the overhead starts growing in proportion to the message size.


Is this such a bad thing? No, in fact it is exactly what you would expect. Below the 100k message size, the ESB is under-loaded; the flat overhead up to that point is a great sign that the ESB is pipelining properly. Once the CPU becomes loaded, each request takes longer because it is made to wait its turn while the ESB deals with the increased load.

A big hint here: when you look at this graph, the most interesting latency numbers occur before the CPU is fully loaded. The latency after the CPU is fully loaded is not that interesting, because it is simply a function of the number of queued requests.

Now that we understand the metrics, let's look at the actual scenarios.

A. Different Vendors, Same Workload
For the first comparison (different vendors), the first thing to be careful of is that the scenario is implemented in the best way possible in each ESB. There are usually several ways of implementing the same scenario. For example, the same ESB may offer two (or more!) different HTTP transports: blocking vs non-blocking, servlet vs library, and so on. There may be an optimum approach, and it's worth reading the docs and talking to the vendor to understand the performance tradeoffs of each.

Another thing to be careful of in this scenario is the tuning parameters. Each ESB has various tuning aspects that may affect the performance depending on the available hardware. For example, setting the number of threads and memory based on the number of cores and physical memory may make a big difference.

Once you have your results, assuming everything we've already looked at is tickety-boo, then both latency and throughput are interesting and valid comparisons here. 

B. Different Workloads, Same Vendor
What this measures is what it costs you to do different activities with the same ESB. For example, static routing is likely to be faster than content-based routing, which in turn is faster than transformation. The data from this tells you the cost of doing different functions with the ESB. For example, you might add a security authentication/authorization check: you should see a constant bump in latency for the security check, irrespective of message size. But for a complex transformation you would expect higher latency for larger messages, because they take more time to transform.

C. Direct vs ESB
This is an interesting one. Usually it is done for a simple static routing/passthrough scenario; in other words, we are testing the ESB doing the minimum possible. Why bother? There are two different reasons. Firstly, ESB vendors usually do this for their own benefit as a baseline test: once you understand the passthrough performance, you can then see the cost of doing more work (e.g. logging a header, validating security, transforming the message). Secondly, it gives you a direct measure of the overhead the ESB adds to each request.

Remember the two testing methodologies (realistic load vs saturation)? You will see very different results from each for this, and the data may seem surprising. For the realistic test, remember we want to look at latency. This is a good comparison for the ESB: how much extra time is spent going through the ESB per request under normal conditions? For example, if the average request direct to the backend takes 18ms and the average request via the ESB takes 19ms, we have an average ESB latency of 1ms. This is a good result - the client is not going to notice much difference - a little over 5% extra.

The saturation test here is a good way to compare different ESBs. For example, suppose I can get 5000 reqs/sec direct. If via ESB_A the number is 3000 reqs/sec and via ESB_B it is 2000 reqs/sec, I can say that ESB_A provides better throughput than ESB_B.

What is not a good metric here is comparing throughput in saturation mode for direct vs via the ESB.


Why not? The reason is a little complex to explain. Remember how we coded the DS to be as fast as possible so as not to be a bottleneck? So what is the DS actually doing? It's really just reading bytes and sending bytes as fast as it can. Assuming the DS code is written efficiently using something really fast (e.g. just a servlet), what this tests is how fast the hardware (CPU plus network card) can read and write through user space in the operating system. On modern server hardware you might get a very high number of transactions per second - maybe 5000 req/s, with each message in and out being 1k in size.

So we have 1k in and 1k out = 2k of I/O per request.
2k of I/O x 5000 reqs/sec x 8 bits/byte gives a total network bandwidth of 80 Mbit/s (excluding Ethernet headers and overhead).

Now let's look at the ESB. Imagine it can handle 100% of the direct load, so there is no slowdown in throughput through the ESB. For each request it has to read the message in from the LD and send it out to the DS. Even if it does this in pipelining mode, there is still a CPU cost and an I/O cost. So the ESB latency may only be 1ms, but the CPU and I/O cost is much higher. And for each response it also has to read the message in from the DS and write it out to the LD. So if the DS is handling 80 Mbit/s, the ESB must be handling 160 Mbit/s.
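The arithmetic is trivial but worth writing out; the numbers below are simply the ones from the example above, treating 1k as 1000 bytes:

// Back-of-the-envelope check of the figures quoted in the text.
public class BandwidthMath {
    public static void main(String[] args) {
        int messageBytes = 1000;   // 1k request and 1k response
        int reqsPerSec = 5000;

        // in + out at the DS, in megabits per second
        double directMbps = 2.0 * messageBytes * reqsPerSec * 8 / 1_000_000;
        // the ESB carries both the LD side and the DS side of every request
        double viaEsbMbps = 2 * directMbps;

        System.out.printf("direct: %.0f Mbit/s, via ESB: %.0f Mbit/s%n", directMbps, viaEsbMbps);
    }
}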


Now, if the LD is good enough, it will have loaded the DS to the max: CPU or I/O capacity, or both, will be maxed out. Suppose the ESB is running on the same hardware platform as the DS. If the DS machine can do 80 Mbit/s flat out, there is no way that the same hardware running as an ESB can do 160 Mbit/s! In fact, if the ESB and DS code are both as efficient as possible, the throughput via the ESB will always be 50% of the throughput direct to the DS. There is one way the ESB could do better: it could be better coded than the DS - for example, if the ESB did transfers in kernel space instead of user space it might make a difference. The real answer here is to look at the latency: what is the overhead of adding the ESB to each request? If the ESB latency is small, then we can solve the throughput problem by clustering the ESB - put two ESBs in and we are back to full throughput.

The real point of this discussion is that this is not a useful comparison. In reality, backend target services are usually pretty slow. If the same dual-core server is actually doing some real work - e.g. database lookups, calculations, business logic - then it's much more likely to be doing 500 requests a second or fewer.

The following chart shows real data to demonstrate this. The X-axis shows increasing complexity of work at the backend (DS). As the effort taken by the backend becomes more realistic, the loss in throughput from having an ESB in the way shrinks. So with a blindingly fast backend, the ESB struggles to provide just 55% of the direct throughput. But as the backend becomes more realistic, the numbers get much better: at 2000 requests a second there is barely a difference (around a 10% reduction in throughput).


In real life, what we actually see is that often you have many fewer ESBs than backend servers. For example, if we took the scenario of a backend server that can handle 500 reqs/sec, then we might end up with a cluster of two ESBs handling a cluster of 8 backends. 

Conclusion
I hope this blog has given a good overview of ESB performance and benchmarking - in particular, when it is a good idea to look at latency and when to look at throughput.





benchmarking

Benchmarking - ASQ™ TV

This ASQ TV episode covers the basics of benchmarking, reviews the recommended six phases of a benchmarking process, and explains one vital ingredient in benchmarking: metrics.




benchmarking

NSC releases report on MSD prevention benchmarking survey

Itasca, IL — Improving methods of tracking musculoskeletal disorders, continuously monitoring and assessing physical risk factors, and sharing best practices can help workplace MSD prevention programs have real impact.




benchmarking

Benchmarking predictive methods for small-angle X-ray scattering from atomic coordinates of proteins using maximum likelihood consensus data

Stimulated by informal conversations at the XVII International Small Angle Scattering (SAS) conference (Traverse City, 2017), an international team of experts undertook a round-robin exercise to produce a large dataset from proteins under standard solution conditions. These data were used to generate consensus SAS profiles for xylose isomerase, urate oxidase, xylanase, lysozyme and ribonuclease A. Here, we apply a new protocol using maximum likelihood with a larger number of the contributed datasets to generate improved consensus profiles. We investigate the fits of these profiles to predicted profiles from atomic coordinates that incorporate different models to account for the contribution to the scattering of water molecules of hydration surrounding proteins in solution. Programs using an implicit, shell-type hydration layer generally optimize fits to experimental data with the aid of two parameters that adjust the volume of the bulk solvent excluded by the protein and the contrast of the hydration layer. For these models, we found the error-weighted residual differences between the model and the experiment generally reflected the subsidiary maxima and minima in the consensus profiles that are determined by the size of the protein plus the hydration layer. By comparison, all-atom solute and solvent molecular dynamics (MD) simulations are without the benefit of adjustable parameters and, nonetheless, they yielded at least equally good fits with residual differences that are less reflective of the structure in the consensus profile. Further, where MD simulations accounted for the precise solvent composition of the experiment, specifically the inclusion of ions, the modelled radius of gyration values were significantly closer to the experiment. The power of adjustable parameters to mask real differences between a model and the structure present in solution is demonstrated by the results for the conformationally dynamic ribonuclease A and calculations with pseudo-experimental data. This study shows that, while methods invoking an implicit hydration layer have the unequivocal advantage of speed, care is needed to understand the influence of the adjustable parameters. All-atom solute and solvent MD simulations are slower but are less susceptible to false positives, and can account for thermal fluctuations in atomic positions, and more accurately represent the water molecules of hydration that contribute to the scattering profile.




benchmarking

Roodmus: a toolkit for benchmarking heterogeneous electron cryo-microscopy reconstructions

Conformational heterogeneity of biological macromolecules is a challenge in single-particle averaging (SPA). Current standard practice is to employ classification and filtering methods that may allow a discrete number of conformational states to be reconstructed. However, the conformation space accessible to these molecules is continuous and, therefore, explored incompletely by a small number of discrete classes. Recently developed heterogeneous reconstruction algorithms (HRAs) to analyse continuous heterogeneity rely on machine-learning methods that employ low-dimensional latent space representations. The non-linear nature of many of these methods poses a challenge to their validation and interpretation and to identifying functionally relevant conformational trajectories. These methods would benefit from in-depth benchmarking using high-quality synthetic data and concomitant ground truth information. We present a framework for the simulation and subsequent analysis with respect to the ground truth of cryo-EM micrographs containing particles whose conformational heterogeneity is sourced from molecular dynamics simulations. These synthetic data can be processed as if they were experimental data, allowing aspects of standard SPA workflows as well as heterogeneous reconstruction methods to be compared with known ground truth using available utilities. The simulation and analysis of several such datasets are demonstrated and an initial investigation into HRAs is presented.




benchmarking

Benchmarking benchmarks

At the tops of many mountains and along numerous roads across the USA are small brass disks called benchmarks.  These survey points are critical for mapping the landscape, determining boundaries, and documenting changes, and there are hundreds of them in Yellowstone National Park!




benchmarking

Thermo-Hydro-Mechanical-Chemical Processes in Fractured Porous Media: Modelling and Benchmarking - Benchmarking Initiatives

Location: Electronic Resource




benchmarking

IBM Develops New Quantum Benchmarking Tool — Benchpress

Benchmarking is an important topic in quantum computing. There’s consensus it’s needed but opinions vary widely on how to go about it. Last week, IBM introduced a new tool — […]






benchmarking

Hotel Performance Insights: Benchmarking and Trends from the Mews Data Snap 2024

Data isn’t meaningful unless you do something with it. Let’s say your average RevPAR for June is $150. Alone, that doesn’t mean very much. But what if you compare that number to the average RevPAR, globally and within your region? If you measure whether you’re improving year-on-year, month-on-month? If you look beyond RevPAR to accompanying key metrics?




benchmarking

Measuring Productivity: Lessons from Tailored Surveys and Productivity Benchmarking [electronic journal].

National Bureau of Economic Research




benchmarking

Report: An international benchmarking analysis of public Programmes for High-growth firms

High-growth firms (HGFs) – firms able to grow fast over a short period of time – contribute to most new jobs in advanced economies.




benchmarking

Anders and LEA Launch National Benchmarking Survey for Manufacturers

To fill a void of relevant benchmarking data, Anders and our accounting association, the Leading Edge Alliance (LEA), have launched the third annual National Manufacturing Outlook Survey, and we are requesting participation. Created specifically for privately-held manufacturers, this is the…





benchmarking

AIBench: Scenario-distilling AI Benchmarking. (arXiv:2005.03459v1 [cs.PF])

Real-world application scenarios like modern Internet services consist of a diversity of AI and non-AI modules with very long and complex execution paths. Using component or micro AI benchmarks alone can lead to error-prone conclusions. This paper proposes a scenario-distilling AI benchmarking methodology. Instead of using real-world applications, we propose the permutations of essential AI and non-AI tasks as a scenario-distilling benchmark. We consider scenario-distilling benchmarks, component and micro benchmarks as three indispensable parts of a benchmark suite. Together with seventeen industry partners, we identify nine important real-world application scenarios. We design and implement a highly extensible, configurable, and flexible benchmark framework. On the basis of the framework, we propose the guideline for building scenario-distilling benchmarks, and present two Internet service AI ones. The preliminary evaluation shows the advantage of scenario-distilling AI benchmarking against using component or micro AI benchmarks alone. The specifications, source code, testbed, and results are publicly available from the project web site.




benchmarking

Ultra Low Power Benchmarking: Is Apples-to-Apples Feasible?

I noticed some very interesting news last week, widely reported in the technical press, and you can find the source press release here. In a nutshell, the Embedded Microprocessor Benchmark Consortium (EEMBC) has formed a group to look at benchmarks for ultra low power microcontrollers. Initially chaired by Horst Diewald, chief architect of MSP430™ microcontrollers at Texas Instruments, the group's line-up is an impressive "who's who" of the microcontroller space, including Analog Devices, ARM, Atmel, Cypress, Energy Micro, Freescale, Fujitsu, Microchip, Renesas, Silicon Labs, STMicro, and TI.

As the press release explains, unlike usual processor benchmark suites which focus on performance, the ULP benchmark will focus on measuring the energy consumed by microcontrollers running various computational workloads over an extended time period. The benchmarking methodology will allow the microcontrollers to enter into their idle or sleep modes during the majority of time when they are not executing code, thereby simulating a real-world environment where products must support battery life measured in months, years, and even decades.

Processor performance benchmarks seem to be as widely criticized as EPA fuel consumption figures for cars - and the criticism is somewhat related. There is a suspicion that manufacturers can tune the performance for better test results, rather than better real-world performance. On the face of it, the task to produce meaningful ultra low power benchmarks seems even more fraught with difficulties. For a start, there is a vast range of possible energy profiles - different ways that computing is spread over time - and a plethora of low power design techniques available to optimize the system for the set of profiles that particular embedded system is likely to experience. Furthermore, you could argue that, compared with performance in a computer system, energy consumption in an ultra low power embedded system has less to do with the controller itself and more to do with other parts of the system like the memories and mixed-signal real-world interfaces.

EEMBC cites that common methods to gauge energy efficiency are lacking in growth applications such as portable medical devices, security systems, building automation, smart metering, and also applications using energy harvesting devices. At Cadence, we are seeing huge growth in these areas which, along with intelligence being introduced into all kinds of previously "dumb" appliances, is becoming known as the "Internet of Things." Despite the difficulties, with which the parties involved are all deeply familiar, I applaud this initiative. While it may be difficult to get to apples-to-apples comparisons for energy consumption in these applications, most of the time today we don't even know where the grocery store is. If the EEMBC effort at least gets us to the produce department, we're going to be better off.

Pete Hardee 

 




benchmarking

How Clean is the U.S. Steel Industry? An International Benchmarking of Energy and CO2 Intensities

In this report, the authors conduct a benchmarking analysis for energy and CO2 emissions intensity of the steel industry among the largest steel-producing countries.





benchmarking

Public spending efficiency in the OECD: benchmarking health care, education and general administration

This paper uses data envelopment analysis (DEA) to assess the efficiency of welfare spending in a sample of OECD countries around 2012, focussing on health care, secondary education and general public services.




benchmarking

Benchmarking the inversion barriers in σ3λ3-phosphorus compounds: a computational study

New J. Chem., 2020, Accepted Manuscript
DOI: 10.1039/D0NJ01237H, Paper
Arturo Espinosa Ferao, Antonio García Alcaraz
The study of inversion barriers for ninety-four P(III)-containing compounds has been carried out using DFT calculations. Most of these compounds display a typical vertex (“umbrella”) transition state (TS) structure, whereas...




benchmarking

[ASAP] Benchmarking Correlated Methods for Frequency-Dependent Polarizabilities: Aromatic Molecules with the CC3, CCSD, CC2, SOPPA, SOPPA(CC2), and SOPPA(CCSD) Methods

Journal of Chemical Theory and Computation
DOI: 10.1021/acs.jctc.9b01300




benchmarking

Benchmarking, measuring, and optimizing: First BenchCouncil International Symposium, Bench 2018, Seattle, WA, USA, December 10-13, 2018, Revised Selected Papers / Chen Zheng, Jianfeng Zhan (eds.)

Online Resource




benchmarking

Benchmarking and comparative measurement for effective performance management by transportation agencies / Joe Crossett, Anna Batista, Hyun-A Park, Hugh Louch, and Kim Voros

Barker Library - TE7.N275 no.902