For autoscaling, CPU and memory are like salt and pepper: they're the beginning of a flavorful dish. Latency is the golden saffron that takes autoscaling to new heights.
This talk shows why it is critical to scale based on latency, and how to do it for your own service by combining Linkerd, Prometheus, and Kubernetes. We demonstrate how to instrument a service with Linkerd to collect aggregated latency metrics, store those metrics in Prometheus, and expose them as custom metrics to the Horizontal Pod Autoscaler in Kubernetes. We then show how latency-based autoscaling outperforms CPU- and memory-based autoscaling under a variety of conditions, including live traffic from the attendees of this talk, and suggest ways to safely apply this technique to existing systems.
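The pipeline described above can be sketched as a Kubernetes HPA manifest that scales on a Prometheus-derived latency metric. This is a minimal illustration, not the talk's exact configuration: the metric name `response_latency_ms_p99`, the Deployment name, and the presence of an adapter (such as the Prometheus Adapter) that publishes Linkerd's latency histograms through the custom metrics API are all assumptions.

```yaml
# Sketch only: assumes a custom-metrics adapter is installed and configured
# to expose Linkerd's per-pod latency histograms under the metric name below.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-service-latency-hpa     # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-service               # hypothetical Deployment being scaled
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Pods
    pods:
      metric:
        name: response_latency_ms_p99   # assumed metric name from the adapter
      target:
        type: AverageValue
        averageValue: "250"             # scale out when p99 latency exceeds ~250 ms
```

With a manifest like this, the HPA compares the average per-pod latency reported by the custom metrics API against the target and adjusts replicas accordingly, rather than reacting only to CPU or memory pressure.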