
Sunday, 25 May 2014

Analytical thinking in performance analysis

Many performance engineering job postings list analytical skills as a requirement. For people who love analysis, it is an interesting job.

Much of this analysis is done using mental or mathematical models of the scenario. Without analysis, test results are just numbers.

A major limitation of performance testing is that you cannot test every possible scenario and use case. Testing and modeling complement each other: one cannot expect a major re-engineering decision to be taken on performance flaws pointed out by analytical reasoning alone; they have to be backed by actual measurements.

Good modeling requires a solid understanding of how things work. When you keep looking at data from various perspectives and keep asking questions, interesting insights emerge.

Below are some of the scenarios where modeling and analysis can complement testing and measurements.

Modeling and pre-analysis help in designing the right tests

For example, when your understanding of the architecture leads you to expect synchronization issues, you will design the test so that saturation while CPU capacity is still available is clearly established.

Modeling and analysis can even provide bounds on expected results

For example, in a case where all concurrent transactions serialize on a shared data structure, we know that the time spent in the synchronized block is serial. So if 50 microseconds are spent inside the synchronized block, there can be at most 20,000 transactions per second, beyond which the system will saturate. This limit is independent of the number of threads executing the transactions in the server and the number of available cores. If synchronization is the primary bottleneck, halving the time spent in the synchronized block doubles the throughput.
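This bound can be sketched in a few lines; the 50-microsecond figure is the hypothetical number from the example above:

```python
# Upper bound on throughput when every transaction passes through one
# serialized critical section: only one thread can be inside it at a
# time, so the total serial time cannot exceed one second per second.
def max_throughput(serial_time_us: float) -> float:
    """Transactions/second bound implied by a serialized section."""
    return 1_000_000 / serial_time_us

print(max_throughput(50))   # 20000.0 transactions/second
print(max_throughput(25))   # halving the serial time doubles the bound
```

Note that the thread count never appears in the formula, which is exactly why adding threads or cores cannot push throughput past this limit.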

Modeling and analysis help in identifying the bottleneck

The simple model above gives an upper limit on throughput which can be validated by testing. If the solution saturates before that, we know the primary bottleneck is somewhere else. Quite often people justify observed results on hunches like "Oh! the solution saturated at 50% CPU, it must be a synchronization issue" and accept the results. Without the right quantitative approach this can hide the actual bottleneck. In the above example, if saturation occurs much before 20,000 transactions per second, asking the poor developer to optimize the synchronization will not solve the problem, as the bottleneck is somewhere else.
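A quantitative version of that hunch-check might look like this; the bound and tolerance are illustrative:

```python
# Compare the measured saturation point against the modeled
# synchronization bound (20,000 TPS from the 50-microsecond example).
SYNC_BOUND_TPS = 20_000

def sync_is_plausible_bottleneck(measured_tps: float,
                                 tolerance: float = 0.9) -> bool:
    """Near the modeled bound: synchronization is a plausible primary
    bottleneck. Well below it: the real bottleneck is elsewhere."""
    return measured_tps >= tolerance * SYNC_BOUND_TPS

print(sync_is_plausible_bottleneck(19_500))  # True
print(sync_is_plausible_bottleneck(8_000))   # False: don't blame the lock
```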

Modeling and analysis can help in answering what-if scenarios

For example, you have tested on a 10 G network and someone asks: what are your projections if the customer runs on a 1 G network? What are the capacity projections for the solution in that scenario?
Do you think we need higher per-core capacity or a larger number of cores for deployment of the solution? If we double the available RAM, what kind of improvement can we expect?
If we double the number of available cores, what is the expected capacity of the system?
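The 10 G versus 1 G question can be answered with a back-of-envelope bound; the per-transaction payload size below is a hypothetical figure for illustration:

```python
# Network-limited throughput bound: if the link is the only constraint,
# throughput cannot exceed link bandwidth divided by bits per transaction.
def network_bound_tps(link_gbps: float, bytes_per_txn: int) -> float:
    return link_gbps * 1e9 / (bytes_per_txn * 8)

BYTES_PER_TXN = 100_000   # assume ~100 KB moved per transaction
print(network_bound_tps(10, BYTES_PER_TXN))  # 12500.0 TPS on 10 G
print(network_bound_tps(1, BYTES_PER_TXN))   # 1250.0 TPS on 1 G
```

If the solution saturates below the 1 G bound for some other reason, the slower network may not even matter; the model tells you which limit binds first.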

Modeling and analysis help in evaluating hypotheses

For example, if someone thinks a particular performance problem is due to OS scheduling, this can be evaluated analytically. We had a scenario in which 600 threads were scheduled on an 8-core server. When there were performance issues, the developer felt the problem must be threads contending to get scheduled. When we created a model of how the CPU was being used, it was obvious the issue was not scheduling. By accepting a false hypothesis, the real cause of an issue remains hidden.
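One way such a model can play out, sketched with a hypothetical CPU utilization figure: 600 threads on 8 cores sounds alarming, but scheduling contention only matters when runnable threads exceed cores.

```python
# Utilization law: each busy core runs exactly one thread, so the mean
# number of busy cores is also the mean number of running threads.
def mean_running_threads(cpu_utilization: float, cores: int) -> float:
    return cpu_utilization * cores

# At a measured 30% CPU on 8 cores, only ~2.4 threads run on average;
# the other ~597 of the 600 threads are blocked, not fighting for CPU.
print(mean_running_threads(0.30, 8))
```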

Modeling and analysis enhance our understanding of the system

When models don't match the measurements, there are new learnings as we revisit our assumptions.

Modeling and analysis help in extrapolating results

We might want to extrapolate results for various reasons; there can be practical limitations, like not having enough hardware to simulate the required number of users.
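A minimal extrapolation sketch, with hypothetical numbers: scale measured per-user throughput linearly, but cap it with any modeled bound, since linear scaling holds only until the next bottleneck.

```python
# Naive linear extrapolation of throughput to a larger user count,
# capped by a modeled upper bound (e.g. from a serialization model).
def extrapolate_tps(tps_per_user: float, users: int,
                    modeled_bound: float = float("inf")) -> float:
    return min(tps_per_user * users, modeled_bound)

print(extrapolate_tps(4.0, 1_000))            # 4000.0, if nothing saturates
print(extrapolate_tps(4.0, 10_000, 20_000))   # capped at a 20000 TPS bound
```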

We will discuss how we can model various performance scenarios in future posts.

Related Post:

Software bottlenecks:- Understanding performance of software systems with examples from road traffic

Wednesday, 13 November 2013

Software bottlenecks:- Understanding performance of software systems with examples from road traffic



Through simple examples from road traffic, I want to share some of the concepts of software system performance.

Modern financial systems are like a maze of interconnected distributed software systems with huge amounts of data flowing between the nodes. Different messages take different routes, using the services of different applications along their path. What is common between the performance of these systems and traveling on roads?




Bottleneck

Quite often, while driving slowly through a traffic jam, the first question that comes to my mind is: where is the bottleneck? While it is much easier to detect a bottleneck in road traffic, it is not so obvious in a complex software system.

What is a bottleneck?
Wikipedia defines it as "A bottleneck is a phenomenon where the performance or capacity of an entire system is limited by a single or limited number of components or resources."

While driving from Noida to Delhi, the slowest part of my journey is a 3-4 km stretch of highway. After crossing one particular point on the highway, the traffic becomes very fast. At this point people and vehicles cross the road and there is no traffic light. Queues of slow-moving traffic form well before this point. On days when the traffic police prohibit crossing at this point, traffic is smooth on the entire highway. This intersection is the bottleneck for the entire traffic flow on the highway.


Software bottlenecks behave the same way. There will be messages or pending requests in the queues of the component or components before the bottleneck. After the bottleneck component, you will observe very few queued requests and smooth operation. The same is true of traffic bottlenecks: just after you cross the bottleneck, the traffic is smooth and fast.

Software performance bottlenecks would be as evident as traffic bottlenecks if we could easily see where queues are forming in software systems. For this we need to instrument the application and collect good statistics to monitor its state. For a set of interconnected components in a workflow, it is quite easy to detect which component is the bottleneck if you know the pending requests at each. Detecting bottlenecks in large monolithic, multithreaded, multilayered systems is more challenging: the basic principles still apply, but the queues are more subtle, as they are internal to the application.
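For the interconnected-components case, the detection logic is almost trivial once pending-request counts are exposed; component names and counts below are made up for illustration:

```python
# Queue-based bottleneck detection in a workflow: the stage with the
# large input backlog is the bottleneck, and stages after it run
# nearly empty (just like traffic past the congested intersection).
pending = {"gateway": 3, "pricing": 5, "risk-check": 480, "settlement": 1}

bottleneck = max(pending, key=pending.get)
print("bottleneck:", bottleneck)  # risk-check
```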

Bottlenecks start becoming evident only beyond a certain load on the system. On this highway, if I go very early in the morning, the traffic is quite smooth. It is only at peak hour that it becomes really painful. If your load is smaller than the capacity of the bottleneck, you will get smooth flow. High-load performance testing is done to identify these bottlenecks.
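Why load matters so much can be seen in the simplest queueing model, the M/M/1 queue, where mean response time is 1/(mu - lambda): flat at low load, exploding as arrivals approach the bottleneck's capacity. The capacity figure here is hypothetical.

```python
# M/M/1 mean response time: 1 / (service_rate - arrival_rate).
def mean_response_time(arrival_rate: float, service_rate: float) -> float:
    assert arrival_rate < service_rate, "beyond capacity: queue grows without bound"
    return 1.0 / (service_rate - arrival_rate)

MU = 100.0  # bottleneck capacity, requests/second
for lam in (10, 50, 90, 99):
    # Early-morning traffic vs. peak hour: same road, very different delay.
    print(f"{lam:>3} req/s -> {mean_response_time(lam, MU) * 1000:.0f} ms")
```

At 10 req/s the delay is barely noticeable; at 99 req/s, still below capacity, it is two orders of magnitude worse.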

When one bottleneck is resolved, it shifts to the next slowest part, but the overall capacity of the system is larger. In my road-traffic example, when the bottleneck at the intersection is removed, we see queues at the next traffic light, but overall traffic flow is smoother than before. As the overall capacity of the system is larger, you will hit the next bottleneck at a higher load.
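The shifting-bottleneck behavior follows from flow capacity being the minimum stage capacity; stage names and vehicles-per-hour figures below are illustrative:

```python
# Pipeline capacity = min of stage capacities. Raising the current
# bottleneck's capacity moves the bottleneck to the next-slowest stage.
capacities = {"intersection": 800, "traffic-light": 1200, "open-road": 5000}

def bottleneck(caps: dict) -> tuple:
    stage = min(caps, key=caps.get)
    return stage, caps[stage]

print(bottleneck(capacities))        # ('intersection', 800)
capacities["intersection"] = 3000    # crossings prohibited: capacity jumps
print(bottleneck(capacities))        # ('traffic-light', 1200)
```

Note the overall capacity rose from 800 to 1200 vehicles/hour even though the fixed stage can now handle 3000: the system is only as fast as its new slowest stage.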

The next post will be about latency and throughput: why providing low latency is more difficult than providing high throughput, and what the various components of latency are. We will discuss these issues with simple examples from road travel.