The Problem: Legacy Tooling and Its Limitations
Slack uses a hybrid approach to network measurement, covering both internal traffic (such as traffic between AWS Availability Zones) and external traffic (from the public internet into Slack’s infrastructure). The tooling is a combination of commercial SaaS offerings and custom-built network testing solutions developed by our internal teams over time. This combination was sufficient for our needs.
When we began rolling out HTTP/3 support on the edge, we encountered a significant challenge: a lack of client-side observability.
Since HTTP/3 is built on top of the QUIC transport protocol, it uses UDP instead of the traditional TCP. This fundamental shift to a new transport meant that existing monitoring tools and SaaS solutions were not capable of probing our new HTTP/3 endpoints for metrics.
At that time, there was a major gap in the market:
- None of the SaaS observability tools we investigated supported HTTP/3 probing out of the box.
- Our internal Prometheus Blackbox Exporter (BBE), a cornerstone of our monitoring, didn’t have native support for QUIC.
Without the ability to probe hundreds of thousands of HTTP/3 endpoints in our new infrastructure, we couldn’t get the client-side visibility we needed to detect regressions to HTTP/2 or to collect accurate round-trip measurements.
The Intern Who Made It Happen
The Open Source Contribution
Our intern, Sebastian Feliciano, scoped, implemented, and ultimately open-sourced QUIC support for Prometheus BBE.
Choosing the Right HTTP Client: The first step was selecting a QUIC-capable HTTP client. After careful consideration, he chose quic-go as the foundation for the new functionality, because it is widely adopted across other open source projects and provides first-class support for creating HTTP clients in Go.
Here’s how Sebastian integrated quic-go into BBE’s HTTP client:
http3Transport := &http3.Transport{
    TLSClientConfig: tlsConfig,
    QUICConfig:      &quic.Config{},
}
client = &http.Client{
    Transport: http3Transport,
}
Maintaining Composability: Sebastian had to add this new logic while following the Blackbox Exporter’s existing architecture, ensuring the new features maintained the tool’s configuration patterns.
The result of this work was a functional and configurable HTTP/3 probe within Prometheus BBE, and by open-sourcing his contribution, Sebastian provided a solution that the entire Prometheus community could use. By following existing patterns and earning community buy-in, he successfully landed the HTTP/3 feature.
Final Step: Integration
Making an open-source contribution as an intern is a huge accomplishment. As many of us know, maintainers don’t always merge PRs quickly, especially for new features. Sebastian’s internship timeline was limited, so he couldn’t wait. He took matters into his own hands and architected an in-house system that used the new upstream features to probe HTTP/3 endpoints.
Operational Improvements
Single Pane of Glass: We now have a unified view of HTTP/1.1, HTTP/2, and HTTP/3 metrics in Grafana, allowing for easier comparison across protocol versions.
Better and More Reliable Alerts: With the new probes, we can create more reliable alerts on the health and performance of our HTTP/3 endpoints.
Easier Correlation: Having all our data in one place makes it easier to correlate HTTP/3 performance with other metrics and debug issues faster.
The Open Source Win
Community Benefit: This contribution benefits the wider Prometheus community, helping other organizations facing the same challenges with HTTP/3 adoption. By building this support, we have future-proofed our observability for the ongoing adoption of QUIC and HTTP/3.
Looking Ahead
While this is a major step, our work isn’t done. Future improvements could add more advanced features, such as:
- Server Name Indication (SNI) routing tests
  - Validating that the SNI extension is correctly handled by our edge infrastructure. This ensures that when a client requests a specific hostname over a shared IP (like a CDN or a multi-tenant load balancer), the gateway correctly routes the traffic to the intended backend and serves the matching TLS certificate, preventing misrouting errors.
- End-to-end path visualization
  - Moving beyond simple “up/down” checks by mapping the entire network path hop by hop from the monitoring agent to the service endpoint. This provides a visual representation of the network path, making it possible to pinpoint exactly where latency spikes or packets are lost.
We invite others in the community to try out this new QUIC support in Prometheus Blackbox Exporter and join us in building the next generation of observability tools. You can find the HTTP/3 options in the configuration documentation in the Prometheus Blackbox Exporter repository.
Conclusion
There were a few takeaways from this project:
1. Monitor first, and migrate second
This should go without saying, but getting observability right before a migration makes everything faster. We know that the industry is moving toward QUIC, but proving to ourselves that it’s the right move long term enables us to invest more in its future.
2. Contributing to open source pays dividends
It feels good to give back to the open source communities that provide us so much. When a game-changing protocol like QUIC comes along and there’s a gap in existing tools supporting it, everyone wins when we fill the gap, and we win when everyone decides to support it long term.
3. Bet on your interns
We were incredibly fortunate to have landed Sebastian as an intern for our team. His proactiveness and creativity in problem solving helped us push the QUIC migration across the line, and gave us tangible exposure to the benefits of black-box monitoring.
This journey from having an observability gap to an open-sourced solution perfectly illustrates our commitment to simplicity and scalability. As HTTP/3 adoption grows industry-wide, we’re committed to keeping our monitoring tools ahead of the curve. We welcome community feedback and contributions to help evolve these capabilities further.
