Through simple examples from road traffic, I want to share some of the concepts of software system performance.
Modern financial systems are like a maze of interconnected, distributed software systems, with huge amounts of data flowing between their nodes. Different messages take different routes and use the services of different applications along their path. What do the performance of these systems and traveling on roads have in common?
Bottleneck
Quite often, while driving slowly through a traffic jam, the first question that comes to my mind is: where is the bottleneck? While it's much easier to detect the bottleneck in road traffic, it is not so obvious in a complex software system.
What is a bottleneck?
Wikipedia defines it as "A bottleneck is a phenomenon where the performance or capacity of an entire system is limited by a single or limited number of components or resources."
While driving from Noida to Delhi, the slowest part of my journey is a 3-4 km stretch of a highway. After I cross one particular point on the highway, the traffic becomes very fast. At this point people and vehicles cross the road, and there is no traffic light. Slow-moving traffic queues form well before this point. On days when the traffic police prohibit crossing at this point, traffic is smooth along the entire highway. This intersection is the bottleneck for all the traffic flowing on this highway.
Software bottlenecks behave the same way. Messages or pending requests pile up in the queues of the component or components before the bottleneck. After the bottleneck component, you will observe very few queued requests and smooth operation. Road traffic is no different: just after you cross the bottleneck, the traffic is quite smooth and fast.
Software performance bottlenecks would be as evident as traffic bottlenecks if we could easily see where queues are forming in software systems. For this we need to instrument the application and collect good statistics to monitor its state. For a set of interconnected components in a workflow, it is quite easy to detect which component is the bottleneck if you know the number of pending requests with each. Detecting bottlenecks in large monolithic, multithreaded, multilayered systems is more challenging. The basic principles still apply, but the queues are more subtle because they are internal to the application.
Bottlenecks start becoming evident only after the load on the system reaches a certain level. On this highway, if I go very early in the morning, the traffic is quite smooth. It's only during peak traffic hours that it becomes really painful. If your load is smaller than the capacity of the bottleneck, then you will get smooth flow. High-load performance testing is done to identify these bottlenecks.
When one bottleneck is resolved, it shifts to the next slowest part, but the overall capacity of the system is larger. In my road traffic example, when the bottleneck at the intersection is removed, we can see queues at the next traffic light, but the overall traffic flow is smoother than before. Because the overall capacity of the system is larger, you will hit the next bottleneck at a higher load.
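For the simple workflow case, this rule of thumb can be sketched in a few lines of Python. It is a minimal illustration, not a monitoring tool: it assumes you already collect a count of pending requests per component, listed in workflow order, and the function name `find_bottleneck`, the component names, the counts and the `threshold` value are all hypothetical. Since queues build up before the bottleneck and drain after it, the sketch reports the most downstream component whose backlog is still large.

```python
from typing import Optional, Sequence, Tuple


def find_bottleneck(stages: Sequence[Tuple[str, int]], threshold: int) -> Optional[str]:
    """Return the most downstream stage whose pending count is at or above threshold.

    `stages` is ordered from the first component in the workflow to the last;
    each entry is (component name, number of pending requests).
    Queues build up before the bottleneck and drain after it, so the last
    stage that still has a large backlog is the likely bottleneck.
    Returns None if no stage reaches the threshold.
    """
    bottleneck = None
    for name, pending in stages:
        if pending >= threshold:
            bottleneck = name  # keep overwriting: we want the last one
    return bottleneck


if __name__ == "__main__":
    # Hypothetical snapshot of pending requests per component, in workflow order
    snapshot = [("gateway", 120), ("validator", 340), ("pricer", 15), ("router", 2)]
    print(find_bottleneck(snapshot, threshold=50))
```

With this sample snapshot the sketch points at `validator`: the gateway also has a backlog, but the queues drain after the validator. Picking the threshold is a judgment call that depends on what normal queue depths look like in your own system.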
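Why load matters can be shown with a toy model of a single component, sketched here in Python under simplifying assumptions: requests arrive at a constant rate, the component serves at most a fixed number per second, and `backlog_after` and its parameters are hypothetical names. Real arrivals are bursty, so in practice queues start to appear somewhat before the load reaches full capacity; this deterministic sketch only shows the basic threshold behavior.

```python
def backlog_after(arrivals_per_sec: int, capacity_per_sec: int, seconds: int) -> int:
    """Simulate one component's queue second by second and return the final backlog.

    Assumes a constant arrival rate and a fixed service capacity (a simplification).
    """
    backlog = 0
    for _ in range(seconds):
        # New requests join the queue, then the component serves up to its capacity.
        backlog = max(0, backlog + arrivals_per_sec - capacity_per_sec)
    return backlog


if __name__ == "__main__":
    # Hypothetical component that can serve 100 requests per second
    for load in (50, 90, 100, 110, 150):
        print(load, backlog_after(load, capacity_per_sec=100, seconds=60))
```

With a capacity of 100 requests per second, loads of 50, 90 and 100 leave no backlog, while at 110 the backlog grows by 10 requests every second and reaches 600 after a minute.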
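The shifting bottleneck can be sketched by treating the route as a chain of stages whose throughput is capped by the slowest one. This is a simplified Python illustration that assumes each stage has a fixed capacity; the function name `bottleneck`, the stage names and the capacities are made up for the example and do not come from measurements.

```python
from typing import Dict, Tuple


def bottleneck(capacities: Dict[str, int]) -> Tuple[str, int]:
    """Return the slowest stage and its capacity, which caps the whole chain's throughput."""
    name = min(capacities, key=capacities.get)
    return name, capacities[name]


if __name__ == "__main__":
    # Hypothetical capacities in vehicles (or requests) per hour
    route = {"crossing": 2000, "traffic_light": 3000, "toll_plaza": 4500}
    print(bottleneck(route))

    # Relieve the bottleneck, e.g. by prohibiting crossing at that point.
    route["crossing"] = 6000
    print(bottleneck(route))
```

After the crossing is relieved, the bottleneck moves to the traffic light and the chain's capacity rises from 2000 to 3000 per hour, so the next jam appears only at a higher load.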
The next post will be about latency and throughput: why providing low latency is more difficult than providing high throughput, and what the various components of latency are. We will discuss these issues with simple examples from road travel.