Hierarchical Autoscaling for Large Language Model Serving with Chiron
arXiv:2501.08090v1 Announce Type: new
Abstract: Large language model (LLM) serving is becoming an increasingly important workload for cloud providers. Based on performance SLO requirements, LLM inference requests can be divided into (a) interactive requests that have tight SLOs in the order of seco…