Response time and capacity are the two fundamental indicators of software performance. In industries like finance, low latency has a lot of value for traders. High-frequency trading, which drives a large percentage of trades in many markets, requires very fast trading software that can react to changes in market depth within microseconds or even nanoseconds. Read about low-latency challenges in the finance industry here.
Performance challenges in web technologies lie more on the scalability side. Whereas trading software works in milliseconds or microseconds, in the web world even 250 milliseconds is a good enough response time for end users. Web developers still want low service time on the server side, because high service time indirectly means low capacity, but they won't care much if the Internet adds 50-100 milliseconds.
Why is providing low latency more challenging?
The problem of scalability has largely been solved with load balancing and horizontal scaling. There are technical challenges, but a load of millions of users can be distributed across thousands of servers with proper software design, load balancing, and capacity planning.
Latency is the sum of the time spent in all the activities required to do some work. That time may be spent on the CPU or in waits. Basic service time is the latency of a request that does not have to wait in a queue before being serviced. At very high capacity utilization, queuing can push the response time to 10-100 times the basic service time. So the first important step is good capacity planning to keep response time under control. If you get this wrong, then even a very low basic service time will be practically wasted.
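To put rough numbers on the queuing effect, here is a minimal sketch that assumes the simplest textbook model, a single-server M/M/1 queue, where mean response time is the basic service time divided by (1 - utilization). Real systems rarely match this model exactly, and the function name and the 100 microsecond figure are made up for illustration. Under this assumption, 90% utilization gives about 10 times the basic service time and 99% gives about 100 times.

```python
def response_time(service_time: float, utilization: float) -> float:
    """Mean response time under an assumed M/M/1 model: R = S / (1 - rho)."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1.0 - utilization)


if __name__ == "__main__":
    basic_service_time_us = 100.0  # hypothetical basic service time
    for rho in (0.5, 0.9, 0.99):
        r = response_time(basic_service_time_us, rho)
        # Show how far response time drifts from the basic service time.
        print(f"utilization {rho:.0%}: {r:.0f} us "
              f"({r / basic_service_time_us:.0f}x basic service time)")
```

The curve is steep near full utilization, which is why capacity planning usually leaves headroom rather than running servers close to 100%.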
Let's say you have achieved this and still want lower latency. What do you do? With the right load balancing and appropriate capacity utilization, response time stays close to the basic service time except during bursts. The focus now shifts to reducing the basic service time itself.
Two basic things that can reduce the basic service time are:
- Better algorithms
- Parallelism
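As a rough illustration of both ideas, the sketch below uses made-up helper names (`find_linear`, `find_bisect`, `fetch`, `fetch_all_parallel`). The first pair shows a better algorithm: a binary search over a sorted price list does the same job as a linear scan in far fewer steps. The second part shows parallelism: independent waits, simulated here with `time.sleep`, overlap when run on a thread pool instead of one after another.

```python
import bisect
import time
from concurrent.futures import ThreadPoolExecutor


def find_linear(sorted_prices, target):
    """Index of the first price >= target, scanning every element: O(n)."""
    for i, price in enumerate(sorted_prices):
        if price >= target:
            return i
    return len(sorted_prices)


def find_bisect(sorted_prices, target):
    """Same result using binary search on the sorted list: O(log n)."""
    return bisect.bisect_left(sorted_prices, target)


def fetch(delay_s):
    """Hypothetical stand-in for an independent wait, such as a remote call."""
    time.sleep(delay_s)
    return delay_s


def fetch_all_parallel(delays):
    """Run the independent waits concurrently so that they overlap."""
    with ThreadPoolExecutor(max_workers=max(1, len(delays))) as pool:
        return list(pool.map(fetch, delays))


if __name__ == "__main__":
    prices = list(range(0, 1_000_000, 5))
    assert find_linear(prices, 777_777) == find_bisect(prices, 777_777)

    start = time.perf_counter()
    fetch_all_parallel([0.05] * 4)
    print(f"4 overlapping waits took {time.perf_counter() - start:.3f} s")
```

Threads help here only because the work is waiting rather than computing. CPU-bound work in Python would need processes or a different runtime to run in parallel.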
Comparing performance optimization and road travel
Compare performance optimization to road travel. You can achieve higher capacity by adding more lanes to the road. Basic service time is the time it takes to travel when the road is completely empty. Your travel time increases when there is a lot of traffic on the road, just as latency rises at high capacity utilization. Now, if you want to go below the basic service time, the travel time on an empty road, what can you do? Either you find a shorter path or you drive faster. Both have limits: you cannot reduce the distance between two points below the shortest distance, and you cannot keep increasing speed beyond a point. Similarly, reducing latency becomes more and more difficult beyond a point. The industry is hacking into operating systems, libraries, networks, switches, and hardware to remove sources of latency, but each further reduction is harder and more expensive. There are market gateways that can deliver a market update from the wire to application memory in 1 microsecond. A single memory fetch from RAM can take 40 to 100 nanoseconds, so as few as 10 memory fetches can use up that entire budget.
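The memory-fetch arithmetic can be checked with a few lines. This is only a back-of-the-envelope sketch using the figures quoted above (1 microsecond wire-to-app, 40 to 100 nanoseconds per RAM fetch). The constant and function names are invented for illustration, and the model ignores caches, prefetching, and overlapping fetches.

```python
WIRE_TO_APP_NS = 1_000          # 1 microsecond budget, from the text above
RAM_FETCH_NS_BEST = 40          # lower end of the quoted RAM fetch time
RAM_FETCH_NS_WORST = 100        # upper end of the quoted RAM fetch time


def fetches_within_budget(budget_ns: int, fetch_ns: int) -> int:
    """How many sequential RAM fetches fit into the given time budget."""
    return budget_ns // fetch_ns


def cost_of_fetches_ns(count: int, fetch_ns: int) -> int:
    """Total time spent on `count` sequential RAM fetches."""
    return count * fetch_ns


if __name__ == "__main__":
    low = cost_of_fetches_ns(10, RAM_FETCH_NS_BEST)
    high = cost_of_fetches_ns(10, RAM_FETCH_NS_WORST)
    print(f"10 fetches cost between {low} ns and {high} ns "
          f"of a {WIRE_TO_APP_NS} ns budget")
```

At the slow end, ten sequential fetches already equal the whole gateway latency, which is why low-latency code works so hard to keep its data in cache.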