When you can't scale fast enough, throttle the firehose.
A rate limiter caps how fast traffic flows downstream. The most common implementation is a token bucket: tokens refill at a fixed rate (say, 30 per tick), each request takes one, and arrivals that find no token are rejected immediately. The bucket size bounds how large a burst can pass at once, so short bursts squeak through while sustained overload is throttled down to the refill rate.
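As a rough illustration of the token bucket described above, here is a minimal tick-based sketch in Python. The names `TokenBucket`, `tick`, `try_acquire`, and `simulate` are hypothetical, and so are the bucket size of 60 and the choice to start with a full bucket; only the refill rate of 30 per tick comes from the text.

```python
class TokenBucket:
    """Tick-based token bucket (illustrative sketch, not a production limiter)."""

    def __init__(self, refill_per_tick: int, capacity: int) -> None:
        self.refill_per_tick = refill_per_tick
        self.capacity = capacity      # bucket size: the largest burst admitted at once
        self.tokens = capacity        # assumption: the bucket starts full

    def tick(self) -> None:
        # Refill at a fixed rate, never exceeding the bucket size.
        self.tokens = min(self.capacity, self.tokens + self.refill_per_tick)

    def try_acquire(self) -> bool:
        # Each request takes one token; with no token left it is rejected immediately.
        if self.tokens > 0:
            self.tokens -= 1
            return True
        return False


def simulate(arrivals_per_tick, refill_per_tick=30, capacity=60):
    """Return a list of (admitted, rejected) counts, one pair per tick."""
    bucket = TokenBucket(refill_per_tick, capacity)
    results = []
    for arrivals in arrivals_per_tick:
        bucket.tick()
        admitted = sum(bucket.try_acquire() for _ in range(arrivals))
        results.append((admitted, arrivals - admitted))
    return results


if __name__ == "__main__":
    # A quiet tick, two overloaded ticks, then quiet again.
    print(simulate([10, 100, 100, 10]))
    # → [(10, 0), (60, 40), (30, 70), (10, 0)]
```

Under these assumptions, the first overloaded tick admits a burst of 60 because the bucket is full, while the second admits only the 30 tokens refilled that tick: the burst allowance is spent and sustained overload is held to the refill rate.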
Bursty traffic is overwhelming the downstream server. Insert a rate limiter to throttle arrivals to a sustainable rate; the limiter drops the excess so the server stays healthy.