If you are using Kubernetes solution as a platform to host containerized applications in any of the public clouds, Billing is one of those things that will hunt you sooner or later. Kubernetes billing largely depends on the number of nodes, and node count is decided by the number of workloads a cluster has. We know that autoscaling is one of the favorite features of Kubernetes. Hence, it would be wiser to scale down some of the workloads when there is no work at all and reduce the cloud cost.

When we talk about Kubernetes autoscaling features, Horizontal Pod Autoscaler (HPA) automatically comes to mind. By default, autoscaling can be achieved by HPA using basic metrics like CPU or RAM usage. However, in the event of complex and distributed applications which are integrating with different components outsides the Kubernetes cluster (Ex: Kafka topic lag, Redis Stream, Azure Pipeline Queue, Azure Service Bus, PubSub topic, etc.), HPA itself cannot scale the pods based on metrics from these components. 

Generated by Feedzy