
Discord's API is powered by a unified Python codebase containing over 1700 API endpoints and around 700 background tasks. Engineers make changes to this shared code every day as it's continuously deployed to several hundred separate Kubernetes deployments through a phased rollout process.
That is a lot of code, engineers, endpoints, and deployments! It can be challenging to keep track of all of the changes made every single day, but we have good instrumentation that allows us to keep an eye on latency, throughput, and error rates to help detect regressions that may negatively impact users or our systems.
One observability gap that we wanted to close last year was our understanding of how hosting costs were allocated across product features. For example, how much does it cost to operate the parts of the API that are used to send and receive messages? To start a stream? To send a friend a Nitro gift? How do these values change over time? Did that change someone landed last week meaningfully affect a team’s spend on hosting? We’d like to know these answers for both a single endpoint (e.g. sending a message in a text channel) and for an entire feature (e.g. chat; more on features later).
Most cloud providers will happily split out your costs by Kubernetes deployment, which is helpful but is only the first step due to how we deploy the API. We run the same codebase in all of our Kubernetes deployments, each of which handles a specific subset of HTTP traffic or background tasks. Since we already have so many deployments, breaking them up further to facilitate cost tracking isn’t tenable. We needed to find a way to add better tracking to our existing system without changing our deployment topology.
An additional challenge is that each API worker process handles multiple tasks concurrently. At any moment, it will be juggling work related to any number of features (we do isolate certain traffic to particular deployments, but not in a way that helps us here). Ultimately, in order to understand the cost of serving the API traffic related to a given feature, we need to be able to allocate the cost for a deployment based on how much time it spent on code related to that feature. By extending our application’s profiling tooling, we were able to do exactly this.
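To make the allocation idea concrete, here is a minimal sketch of splitting one deployment's cost across features in proportion to the time a profiler attributed to each. The function name `allocate_cost`, the `(feature, seconds)` sample format, and the feature labels are all assumptions made for illustration. This is not our actual tooling.

```python
from collections import defaultdict


def allocate_cost(deployment_cost, samples):
    """Split a deployment's cost across features by share of sampled time.

    `samples` is a hypothetical list of (feature, seconds) pairs, standing in
    for whatever a profiler reports about where a worker spent its time.
    """
    seconds_by_feature = defaultdict(float)
    for feature, seconds in samples:
        seconds_by_feature[feature] += seconds

    total_seconds = sum(seconds_by_feature.values())
    if total_seconds == 0:
        # No attributed time, so there is nothing to allocate against.
        return {}

    return {
        feature: deployment_cost * seconds / total_seconds
        for feature, seconds in seconds_by_feature.items()
    }


# Illustrative numbers only: a deployment costing 100 units, with samples
# from two made-up features interleaved on the same worker.
costs = allocate_cost(100.0, [("chat", 3.0), ("streaming", 1.0), ("chat", 1.0)])
```

Because the split depends only on each feature's share of observed time, it works even when a single worker interleaves work for many features, and it needs no change to the deployment topology.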
Note: all numbers and code in this post are for illustrative purposes only.
