What Is Tail Latency ?

Table of Contents

Context

Tail latency refers to the slowest response times experienced by a small percentage of requests in a system. These are typically measured at the 99th percentile (p99) or 99.9th percentile (p999).

How tail latency arises in a fan-out model

In the fan-out model (where a request is split across multiple components and waits for all to complete), the overall request latency is determined by the slowest component.

Even if each component is usually fast, a small probability of slowness in each one can add up, making slow responses much more frequent when multiple components are involved.

Probability equation

$$P(at least one slow) = 1 - P(fast request)^n = 1 - (1 - P(slow request))^n$$

Why tail latency matters

  • Bad user experience: Some users get slow responses, even if the average latency looks fine.
  • Scaling challenges: Adding more components makes tail latency worse instead of better.
  • System reliability issues: A few slow components can delay the entire system.

Examples of tail latency in real life

  • A cloud service like AWS Lambda may complete most requests very quickly, but some can take a longer time to execute because of cold starts

Cold starts typically occur in under 1% of invocations. The duration of a cold start varies from under 100 ms to over 1 second

  • A search engine (like Google) may query multiple servers, but the overall search time is dictated by the slowest responding server.

Application

Problem 1: If you fan out processing to 10 components and wait for all of them to complete what is the probability that a user request experiences the 99th percentile latency?

$$ \begin{aligned} P(\text{at least one slow}) &= 1 - (1 - P(\text{slow request}))^{10} \\ P(\text{at least one slow}) &= 1 - (1 - 0.01)^{10} \\ P(\text{at least one slow}) &= 1 - 0.99^{10} \\ P(\text{at least one slow}) &\approx 0.09 \end{aligned} $$

If each component has a 1% chance of being slow, a request fanned out to 10 components will experience at least one slow response in 9% of cases.

Problem 2: If you fan out processing to 10 components and wait for all of them to complete what is the probability that a user request experiences the 95th percentile latency?

$$ \begin{aligned} P(\text{at least one slow}) &= 1 - (1 - P(\text{slow request}))^{10} \\ P(\text{at least one slow}) &= 1 - (1 - 0.05)^{10} \\ P(\text{at least one slow}) &= 1 - 0.95^{10} \\ P(\text{at least one slow}) &\approx 0.4 \end{aligned} $$

If each component has a 5% chance of being slow, a request fanned out to 10 components will experience at least one slow response in 40% of cases.

More cases on p99

n (number of components)P(at least one slow)
1 (no fan-out)1%
21.99%
54.9%
109.56%
2018.21%
5039.5%

Conclusion

  • The more components in the fan-out, the higher the probability of hitting a slow response.
  • This results in a long tail of slow responses, meaning some users will see significantly worse performance than the average user.

Miscellaneous

Why this probability equation

$$P(at least one slow) = 1 - P(fast request)^n = 1 - (1 - P(slow request))^n$$

Let’s try to breakdown it using the complementary event:

$$P(at least one slow) = 1 - P(all fast)$$

If

$$P(fast request)$$ is the probability of a fast component, then:

$$P(fastrequest) = 1 - P(slowrequest)$$

Therefore:

$$ \begin{aligned} P(\text{allfast}) &= P(\text{fastrequest}))^{n} \\ P(\text{allfast}) &= (1 - P(\text{slowrequest}))^{n} \end{aligned} $$

So

$$P(at least one slow) = 1 - P(fast request)^n = 1 - (1 - P(slow request))^n$$

How to calculate a worst case scenario

If you fan out processing to 5 components and wait for all of them to complete what is the probability that a user request experiences the worst case scenario which is all components are slow?

We suppose here the probability of a single component being slow is

$$P(slowrequest)=0.1$$

So:

$$ \begin{aligned} P(\text{allcomponentsslow}) &= \prod_{i=1}^{5} P(\text{slowrequest}) = P(\text{slowrequest})^{5} \\ P(\text{allcomponentsslow}) &= 0.1^{5} \\ P(\text{allcomponentsslow}) &= 10^{-5} = 0.00001 = 0.001% \\ \end{aligned} $$

If each component has a 10% chance of being slow, a request fanned out to 5 components will experience a worst scenario response in 0.001% of cases.

Resources