In this section, we'll discuss developing testing procedures. You can't test everything, so you need to consider which items can act as indicators. How do you prove that the solution is working properly? How do you know whether the solution is highly available or scalable?

This design illustrates that in some cases, technical design is not sufficient on its own and must be followed up with human procedures. The original design was intended to handle a maximum of 1,000 queries per second, and at the time it was a functional design. However, over time, the load has grown. While the system is still operational, if one of the front-end servers were to fail and the other front end took up the combined traffic, the total would be 1,200 queries per second, which is above capacity. For this reason, resiliency requires identifying key metrics, in this case total load, and periodically adjusting capacity to stay ahead of growth. On the other hand, if load were diminishing consistently over time, there could be cost savings in downsizing the front-end servers.

A common mistake is forgetting that auto-scaling follows the law of diminishing returns. Imagine that you're considering the target CPU utilization for the entire group of VMs. The share of capacity that an additional VM contributes depends on the size of the group. The fourth VM added to a group accounts for 25 percent of the group's capacity. The tenth VM added to the group accounts for only 10 percent, even though the VMs are the same size. In this example, removing one VM doesn't get close enough to the target of 75 percent. Removing a second VM would exceed the target. The auto-scaler behaves conservatively, so it will shut down one VM rather than two VMs. It would prefer underutilization over running out of resources when they're needed.

Now, here's a tip. Consider using Stackdriver custom metrics for auto-scaling. The reason is that CPU utilization is rarely a good measure of customer experience.
A custom metric can enable auto-scaling on a more meaningful value. For example, a game service might scale with the number of players, which might be more directly related to application performance than something like CPU utilization.
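
To make the arithmetic from the failover and scale-in discussion above concrete, here is a minimal Python sketch. It is a simplified model, not the actual auto-scaler algorithm. The function names are made up for illustration, the failover check assumes two front ends each rated at 1,000 queries per second, and the scale-in numbers (5 VMs averaging 50 percent CPU) are hypothetical values chosen so that removing one VM stays under the 75 percent target while removing two would exceed it.

```python
import math


def survives_single_failure(total_load_qps, servers, per_server_capacity_qps):
    """Return True if the remaining servers can carry the full load after one fails."""
    return (servers - 1) * per_server_capacity_qps >= total_load_qps


def marginal_share(group_size):
    """Fraction of group capacity contributed by one of `group_size` identical VMs."""
    return 1.0 / group_size


def utilization_at(current_vms, avg_utilization, new_vms):
    """Average utilization if the same total work were spread over `new_vms` VMs."""
    return current_vms * avg_utilization / new_vms


def conservative_group_size(current_vms, avg_utilization, target_utilization):
    """Smallest VM count that keeps average utilization at or below the target.

    Rounding up means the model prefers underutilization over overshooting
    the target, mirroring the conservative behavior described in the text.
    """
    total_work = current_vms * avg_utilization  # load measured in "VM-equivalents"
    return max(1, math.ceil(total_work / target_utilization))


# Failover check (assumed: 2 front ends, 1,000 qps each, 1,200 qps total load).
can_fail_over = survives_single_failure(1200, servers=2, per_server_capacity_qps=1000)

# Diminishing returns: the 4th VM is 25% of the group, the 10th only 10%.
fourth_vm_share = marginal_share(4)
tenth_vm_share = marginal_share(10)

# Hypothetical scale-in: 5 VMs at 50% average CPU, target 75%.
after_removing_one = utilization_at(5, 0.50, 4)   # 62.5%, still below target
after_removing_two = utilization_at(5, 0.50, 3)   # about 83%, above target
recommended_size = conservative_group_size(5, 0.50, 0.75)
```

With these assumed inputs, the failover check fails because one surviving front end cannot absorb 1,200 queries per second, and the model settles on 4 VMs: it removes one VM rather than two. Running the failover check periodically against measured total load is one way to turn the "stay ahead of growth" procedure into a routine test.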