The final objective of all performance engineering, validation, and related work is to deliver performance that meets service requirements in production, giving the customer a smooth experience. One aspect is engineering software that has high capacity and low response time and makes optimal use of resources (CPU, memory, storage, and network). Another aspect is managing operations so that the software works optimally. Even well-engineered software, if used beyond its capacity, will provide a very poor user experience.
Providing performance in the production environment is more important than just demonstrating it in performance labs.
Software uses hardware infrastructure to function and provide service. Saturation in the infrastructure (which is easy to detect) implies saturation of the software.
In the world of distributed software, modules and components use each other's services to provide service. Any one module that gets saturated will saturate the entire system or a major subsystem.
Problem detection implies that monitoring of the user experience is in place, so that problems are detected well before the service levels become completely unacceptable.
Diagnostics implies that we can identify the subsystem (infrastructure + software service) that has saturated and take corrective action.
Predictive analytics implies that we can foresee capacity issues before they arise. A general response time vs. workload graph is below. The major characteristics of this curve are the same for a hardware or a software service.
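The general shape of such a curve can be reproduced with a very simple queueing model. The following is a minimal sketch, assuming a single-server M/M/1 queue; that model, the function name `mm1_response_time`, and the 10 ms service time are illustrative assumptions, not something measured from a real system. Under this assumption the mean response time is the service time divided by (1 - utilization), so it stays nearly flat at low load and rises steeply as utilization approaches 100%.

```python
import math


def mm1_response_time(service_time_s: float, arrival_rate_per_s: float) -> float:
    """Mean response time of an M/M/1 queue (assumed model, for illustration).

    utilization = arrival rate * service time
    response time = service time / (1 - utilization)
    Returns infinity once the server is saturated (utilization >= 1).
    """
    utilization = arrival_rate_per_s * service_time_s
    if utilization >= 1.0:
        return math.inf
    return service_time_s / (1.0 - utilization)


if __name__ == "__main__":
    service_time_s = 0.010  # hypothetical 10 ms per request
    for rate in (10, 50, 80, 90, 95, 99):
        rt_ms = mm1_response_time(service_time_s, rate) * 1000
        print(f"{rate:3d} req/s -> {rt_ms:8.1f} ms")
```

Real services rarely follow M/M/1 exactly, but the knee in the curve near saturation is the characteristic the text refers to.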
If the capacity and utilization of each subsystem are clear, we have the decision-support information to ensure smooth operations.
Each subsystem should operate within its operational capacity; otherwise performance will suffer.
The subsystem with the highest utilization is the one that will saturate first at higher workload and become the bottleneck.
Headroom, shown in the graph, is the additional workload that the system can endure before it saturates. Depending on the risk tolerance, additional capacity can be added while operational headroom still remains.
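The relationship between utilization, bottleneck, and headroom can be sketched in a few lines. This is only an illustration: the `Subsystem` class, the helper names, and the sample numbers are hypothetical, and it assumes that each subsystem's workload and capacity are expressed in the same unit and that load on every subsystem grows in proportion to the overall workload.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Subsystem:
    name: str
    workload: float  # current load, e.g. requests/s (hypothetical unit)
    capacity: float  # operational capacity in the same unit

    @property
    def utilization(self) -> float:
        return self.workload / self.capacity

    @property
    def headroom(self) -> float:
        # Additional workload this subsystem can take before it saturates.
        return self.capacity - self.workload


def find_bottleneck(subsystems: List[Subsystem]) -> Subsystem:
    """The subsystem with the highest utilization saturates first."""
    return max(subsystems, key=lambda s: s.utilization)


def growth_factor_before_saturation(subsystems: List[Subsystem]) -> float:
    """How much overall workload can grow, assuming proportional scaling."""
    return 1.0 / find_bottleneck(subsystems).utilization


if __name__ == "__main__":
    system = [
        Subsystem("web", workload=400, capacity=1000),
        Subsystem("db", workload=450, capacity=500),
        Subsystem("cache", workload=100, capacity=2000),
    ]
    b = find_bottleneck(system)
    print(b.name, f"{b.utilization:.0%}", b.headroom)
    print(f"{growth_factor_before_saturation(system):.2f}")
```

In this made-up example the database is the bottleneck even though the web tier handles a similar workload, because what matters is load relative to capacity.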
Sounds simple! What are the challenges?
In a distributed solution there are too many subsystems. Let's say thousands of pieces of hardware equipment and a similar number of software components.
- Are you monitoring all these subsystems?
- What is the workload in these subsystems?
- Do you know the capacity of these subsystems?
- Do you know the utilization of these subsystems?
- What is the headroom in these subsystems to endure more load?
- Do you have the historical workload for your services and the operational baseline of the workload?
- Do you know the workload trends (regular + seasonal)?
- Can you forecast the workload for an important business event?
- Is your capacity elastic, i.e. if you know you need twice the capacity, can you add it in time (more hardware + more software services + load distribution)?
- Do you have the required models?
- Are you capturing the data required for these models? In all subsystems?
  - The OS provides monitoring info about the hardware.
  - What about the middleware?
  - What about the application software?
  - What about the DB?
- Do you know the highest load to which you can drive the system while the response time still meets the service requirements, not in labs but in production?
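As a rough illustration of the forecasting questions above, the sketch below fits a straight-line trend to historical peak workload and estimates how many periods remain before the trend reaches a known capacity. It assumes evenly spaced samples and a purely linear trend, and it ignores the seasonal component entirely; the function names and sample data are hypothetical, and a real forecast would need a richer model.

```python
from typing import Optional, Sequence, Tuple


def fit_linear_trend(values: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares line through (0, v0), (1, v1), ...

    Returns (slope, intercept). Needs at least two samples.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def periods_until_capacity(values: Sequence[float], capacity: float) -> Optional[float]:
    """Periods from the latest sample until the trend line hits capacity.

    Returns None if the workload is flat or shrinking (no crossing ahead).
    """
    slope, intercept = fit_linear_trend(values)
    if slope <= 0:
        return None
    crossing_index = (capacity - intercept) / slope
    return max(0.0, crossing_index - (len(values) - 1))


if __name__ == "__main__":
    weekly_peak_rps = [100, 110, 120, 130, 140]  # made-up history
    print(periods_until_capacity(weekly_peak_rps, capacity=200))
```

The answer is only as good as the capacity figure and the history behind it, which is why the data-capture questions in the list come first.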