
During production debugging, the fastest route is often broad access such as cluster-admin (a ClusterRole that grants administrator-level access), shared bastions/jump boxes, or long-lived SSH keys. It works in the moment, but it comes with two common problems: auditing becomes difficult, and temporary exceptions have a way of becoming routine.

This post offers my recommendations for good practices applicable to existing Kubernetes environments with minimal tooling changes:

  • Least privilege with RBAC
  • Short-lived, identity-bound credentials
  • An SSH-style handshake model for cloud native debugging

A good architecture for securing production debugging workflows is to use a just-in-time secure shell gateway (often deployed as an on-demand pod in the cluster).

It acts as an SSH-style “front door” that makes temporary access actually temporary. You authenticate with short-lived, identity-bound credentials and establish a session to the gateway; the gateway then uses the Kubernetes API and RBAC to control what you can do, such as pods/log, pods/exec, and pods/portforward.

Sessions expire automatically, and both the gateway logs and Kubernetes audit logs capture who accessed what and when without shared bastion accounts or long-lived keys.

1) Using an access broker on top of Kubernetes RBAC

RBAC controls who can do what in Kubernetes. Many Kubernetes environments rely primarily on RBAC for authorization, although Kubernetes also supports other authorization modes such as Webhook authorization. You can enforce access directly with Kubernetes RBAC, or put an access broker in front of the cluster that still relies on Kubernetes permissions under the hood. In either model, Kubernetes RBAC remains the source of truth for what the Kubernetes API allows and at what scope.

An access broker adds controls that RBAC does not cover well. For example, it can decide whether a request is auto-approved or requires manual approval, whether a user can run a command, and which commands are allowed in a session. It can also manage group membership so that you grant permissions to groups instead of individual users. Kubernetes RBAC can allow actions such as pods/exec, but it cannot restrict which commands run inside an exec session.

With that model, Kubernetes RBAC defines the allowed actions for a user or group (for example, an on-call team in a single namespace). I recommend defining access rules that grant rights only to groups or ServiceAccounts, never to individual users. The broker or identity provider then adds or removes users from that group as needed.
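As a minimal sketch of the group-based model, a broker could record time-boxed memberships and drop them as soon as they expire. The `GroupMembership` class and its in-memory store are hypothetical; in practice the broker would call your identity provider's group API, and Kubernetes would only ever see the resulting group name.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Tuple


@dataclass
class GroupMembership:
    """Hypothetical broker-side store: (user, group) -> expiry time."""

    grants: Dict[Tuple[str, str], datetime] = field(default_factory=dict)

    def grant(self, user: str, group: str, ttl: timedelta, now: datetime) -> datetime:
        # Every grant carries an expiry; there is no permanent membership.
        expires_at = now + ttl
        self.grants[(user, group)] = expires_at
        return expires_at

    def revoke(self, user: str, group: str) -> None:
        # Early removal, for example when the incident is closed.
        self.grants.pop((user, group), None)

    def groups_for(self, user: str, now: datetime) -> List[str]:
        # Purge expired grants before answering, so stale access never reaches a credential.
        for key in [k for k, exp in self.grants.items() if exp <= now]:
            del self.grants[key]
        return sorted(group for (member, group) in self.grants if member == user)


now = datetime(2026, 2, 18, 9, 0, tzinfo=timezone.utc)
membership = GroupMembership()
membership.grant("alice", "oncall-payments", timedelta(hours=1), now)
print(membership.groups_for("alice", now))                       # ['oncall-payments']
print(membership.groups_for("alice", now + timedelta(hours=2)))  # []
```

The group names returned here are what would end up in the credential (for example, a groups claim or the certificate's organization field), which a RoleBinding then matches.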

The broker can also enforce extra policy on top, like which commands are permitted in an interactive session and which requests can be auto-approved versus require manual approval. That policy can live in a JSON or XML file and be maintained through code review, so updates go through a formal pull request and are reviewed like any other production change.
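A broker policy of this kind might look like the following sketch. The JSON schema (`auto_approve_actions`, `manual_approval_actions`, `allowed_commands`) and the `evaluate` function are hypothetical, not any particular product's format. It assumes the gateway passes the approved command to the container as an argument vector without a shell; if commands run through `sh -c`, checking only the first word would not be sufficient.

```python
import json
import shlex
from typing import Optional

# Hypothetical policy file, kept in version control and changed only via reviewed PRs.
POLICY_JSON = """
{
  "oncall-payments": {
    "auto_approve_actions": ["pods/log"],
    "manual_approval_actions": ["pods/exec"],
    "allowed_commands": ["ps", "df", "ls"]
  }
}
"""


def evaluate(policy: dict, group: str, action: str, command: Optional[str] = None) -> str:
    """Return 'allow', 'needs-approval', or 'deny' for a requested action."""
    rules = policy.get(group)
    if rules is None:
        return "deny"
    if action == "pods/exec":
        # RBAC only sees "create pods/exec"; the broker inspects the command itself.
        if not command:
            return "deny"
        argv = shlex.split(command)
        if not argv or argv[0] not in rules["allowed_commands"]:
            return "deny"
    if action in rules["auto_approve_actions"]:
        return "allow"
    if action in rules["manual_approval_actions"]:
        return "needs-approval"
    return "deny"


policy = json.loads(POLICY_JSON)
print(evaluate(policy, "oncall-payments", "pods/log"))               # allow
print(evaluate(policy, "oncall-payments", "pods/exec", "ps aux"))    # needs-approval
print(evaluate(policy, "oncall-payments", "pods/exec", "rm -rf /"))  # deny
```

Defaulting to "deny" for unknown groups and unlisted actions keeps the policy file an explicit allowlist, which is easier to review in a pull request.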

Example: a namespaced on-call debug Role

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: oncall-debug
  namespace: <namespace>
rules:
  # Discover what’s running
  - apiGroups: [""]
    resources: ["pods", "events"]
    verbs: ["get", "list", "watch"]
  # Read logs
  - apiGroups: [""]
    resources: ["pods/log"]
    verbs: ["get"]
  # Interactive debugging actions
  - apiGroups: [""]
    resources: ["pods/exec", "pods/portforward"]
    verbs: ["create"]
  # Understand rollout/controller state
  - apiGroups: ["apps"]
    resources: ["deployments", "replicasets"]
    verbs: ["get", "list", "watch"]
  # Optional: allow kubectl debug ephemeral containers
  # (kubectl debug adds the container with a PATCH to this subresource)
  - apiGroups: [""]
    resources: ["pods/ephemeralcontainers"]
    verbs: ["patch", "update"]
```

Bind the Role to a group (rather than individual users) so membership can be managed through your identity provider:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: oncall-debug
  namespace: <namespace>
subjects:
  - kind: Group
    name: oncall-<team-name>
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: oncall-debug
  apiGroup: rbac.authorization.k8s.io
```
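To see why RBAC alone cannot limit which commands run, the following simplified model transcribes the required rules of the oncall-debug Role (the optional ephemeral-container rule is left out) and checks requests against them. It is a sketch that ignores wildcards, `resourceNames`, and aggregation; the API server's authorizer remains the real source of truth.

```python
from typing import List, NamedTuple, Tuple


class Rule(NamedTuple):
    api_groups: Tuple[str, ...]
    resources: Tuple[str, ...]
    verbs: Tuple[str, ...]


# Required rules of the oncall-debug Role, transcribed by hand.
ONCALL_DEBUG: List[Rule] = [
    Rule(("",), ("pods", "events"), ("get", "list", "watch")),
    Rule(("",), ("pods/log",), ("get",)),
    Rule(("",), ("pods/exec", "pods/portforward"), ("create",)),
    Rule(("apps",), ("deployments", "replicasets"), ("get", "list", "watch")),
]


def allowed(rules: List[Rule], api_group: str, resource: str, verb: str) -> bool:
    """Simplified RBAC check: a request is allowed if any single rule matches all three fields."""
    return any(
        api_group in r.api_groups and resource in r.resources and verb in r.verbs
        for r in rules
    )


print(allowed(ONCALL_DEBUG, "", "pods/exec", "create"))       # True
print(allowed(ONCALL_DEBUG, "", "secrets", "get"))            # False
print(allowed(ONCALL_DEBUG, "apps", "deployments", "patch"))  # False
```

The `pods/exec` check only sees the verb `create`; the command is not part of the attributes RBAC evaluates. On a live cluster, `kubectl auth can-i create pods --subresource=exec -n <namespace> --as=<user> --as-group=oncall-<team-name>` asks the API server the same question, provided you are allowed to impersonate.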

2) Short-lived, identity-bound credentials

The goal is to use short-lived, identity-bound credentials that clearly tie a session to a real person and expire quickly. These credentials can include the user’s identity and the scope of what they’re allowed to do. They’re typically bound to a private key that stays with the engineer, such as a hardware-backed key (for example, a YubiKey), so they cannot be used without access to that key.

You can implement this with Kubernetes-native authentication (for example, client certificates or an OIDC-based flow), or have the access broker from the previous section issue short-lived credentials on the user’s behalf. In many setups, Kubernetes still uses RBAC to enforce permissions based on the authenticated identity and groups/claims. If you use an access broker, it can also encode additional scope constraints in the credential and enforce them during the session, such as which cluster or namespace the session applies to and which actions (or approved commands) are allowed against pods or nodes. In either case, the credentials should be signed by a certificate authority (CA), and that CA should be rotated on a regular schedule (for example, quarterly) to limit long-term risk.

Option A: short-lived OIDC tokens

A lot of managed Kubernetes clusters already give you short-lived tokens. The main thing is to make sure your kubeconfig refreshes them automatically instead of copying a long-lived token into the file.

For example:

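```yaml
users:
  - name: oncall
    user:
      exec:
        apiVersion: client.authentication.k8s.io/v1
        # cred-helper stands in for your provider's credential plugin
        command: cred-helper
        args: ["--cluster=prod", "--ttl=30m"]
        # required when using the client.authentication.k8s.io/v1 API
        interactiveMode: IfAvailable
```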
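The `cred-helper` in that kubeconfig is a placeholder. Whatever plugin you use, kubectl expects it to print an `ExecCredential` object on stdout and calls it again once `expirationTimestamp` has passed. The sketch below shows that output shape; `fetch_oidc_token` is a hypothetical stand-in for your identity provider's flow, and this sketch ignores the example `--cluster`/`--ttl` arguments.

```python
import json
from datetime import datetime, timedelta, timezone


def fetch_oidc_token(cluster: str) -> str:
    # Hypothetical placeholder: a real helper would run an OIDC flow against your IdP.
    return f"example-token-for-{cluster}"


def exec_credential(token: str, ttl: timedelta, now: datetime) -> dict:
    """Build the ExecCredential object that kubectl reads from the plugin's stdout."""
    expires = (now + ttl).astimezone(timezone.utc)
    return {
        "apiVersion": "client.authentication.k8s.io/v1",
        "kind": "ExecCredential",
        "status": {
            "token": token,
            # RFC 3339 timestamp; the client re-runs the plugin after this time.
            "expirationTimestamp": expires.strftime("%Y-%m-%dT%H:%M:%SZ"),
        },
    }


if __name__ == "__main__":
    current = datetime.now(timezone.utc)
    print(json.dumps(exec_credential(fetch_oidc_token("prod"), timedelta(minutes=30), current)))
```

Because the expiry travels with the credential, nothing long-lived ever needs to be written into the kubeconfig itself.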

Option B: short-lived client certificates (X.509)

If your API server (or your access broker from the previous section) is set up to trust a client CA, you can use short-lived client certificates for debugging access. The idea is:

  • The private key is created and kept on the engineer’s machine (ideally hardware-backed, like a non-exportable key in a YubiKey/PIV token).
  • A short-lived certificate with a TTL is issued, often via the CertificateSigningRequest API or by your access broker from the previous section.
  • RBAC maps the authenticated identity to a minimal Role.

This is straightforward to operationalize with the Kubernetes CertificateSigningRequest API.

Generate a key and CSR locally:

```shell
# Generate a private key.
# This could instead be generated within a hardware token;
# OpenSSL and several similar tools include support for that.
openssl genpkey -algorithm Ed25519 -out oncall.key

openssl req -new -key oncall.key -out oncall.csr \
  -subj "/CN=user/O=oncall-payments"
```

Create a CertificateSigningRequest with a short expiration:

```yaml
apiVersion: certificates.k8s.io/v1
kind: CertificateSigningRequest
metadata:
  name: oncall-<user>-20260218
spec:
  request: <base64-encoded oncall.csr>
  signerName: kubernetes.io/kube-apiserver-client
  expirationSeconds: 1800 # 30 minutes
  usages:
    - client auth
```

After the CSR is approved and signed, you extract the issued certificate and use it together with the private key to authenticate, for example via kubectl.
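Filling in `spec.request` by hand is error-prone, so a small helper can base64-encode the PEM CSR and later decode the issued certificate. This is a sketch with hypothetical function names; it assumes you submit the resulting JSON with `kubectl apply -f`, approve it with `kubectl certificate approve <name>`, and pass the parsed output of `kubectl get csr <name> -o json` to `issued_certificate`.

```python
import base64


def csr_manifest(name: str, csr_pem: bytes, expiration_seconds: int = 1800) -> dict:
    """Build a CertificateSigningRequest object equivalent to the YAML above."""
    if expiration_seconds < 600:
        # The API rejects expirationSeconds below 600 (10 minutes).
        raise ValueError("expirationSeconds must be at least 600")
    return {
        "apiVersion": "certificates.k8s.io/v1",
        "kind": "CertificateSigningRequest",
        "metadata": {"name": name},
        "spec": {
            # spec.request carries the PEM-encoded CSR, base64-encoded once more.
            "request": base64.b64encode(csr_pem).decode("ascii"),
            "signerName": "kubernetes.io/kube-apiserver-client",
            "expirationSeconds": expiration_seconds,
            "usages": ["client auth"],
        },
    }


def issued_certificate(csr_object: dict) -> bytes:
    """Return the PEM certificate from status.certificate once the CSR is approved and signed."""
    encoded = csr_object.get("status", {}).get("certificate")
    if not encoded:
        raise LookupError("certificate not issued yet")
    return base64.b64decode(encoded)
```

The signer may issue a certificate with a shorter lifetime than requested, so it is worth checking the certificate's actual expiry rather than assuming `expirationSeconds` was honored exactly.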

3) Use a just-in-time access gateway to run debugging commands

Once you have short-lived credentials, you can use them to open a secure shell session to a just-in-time access gateway, often exposed over SSH and created on demand. If the gateway is exposed over SSH, a common pattern is to issue the engineer a short-lived OpenSSH user certificate for the session. The gateway trusts your SSH user CA, authenticates the engineer at connection time, and then applies the approved session policy before making Kubernetes API calls on the user’s behalf. OpenSSH certificates are separate from Kubernetes X.509 client certificates, so these are usually treated as distinct layers.
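For the OpenSSH layer, the broker might sign the engineer's public key with `ssh-keygen`. The sketch below only builds the argument list (passing it to `subprocess.run(cmd, check=True)` would perform the signing); the CA path, key ID format, and principal names are hypothetical. A single `-V +30m` value makes the certificate valid from now until 30 minutes from now, and `ssh-keygen` writes the result next to the public key as `<name>-cert.pub`.

```python
from typing import List


def ssh_user_cert_command(ca_key: str, user_pubkey: str, key_id: str,
                          principals: List[str], validity: str = "+30m") -> List[str]:
    """Build an ssh-keygen invocation that signs a short-lived OpenSSH user certificate."""
    return [
        "ssh-keygen",
        "-s", ca_key,                # SSH user CA private key that the gateway trusts
        "-I", key_id,                # key identity, useful for correlating audit logs
        "-n", ",".join(principals),  # principals the certificate is valid for
        "-V", validity,              # validity interval, e.g. "+30m" from now
        user_pubkey,                 # engineer's public key; the private key stays on their device
    ]


cmd = ssh_user_cert_command("/etc/jit/user_ca", "alice.pub", "alice@incident-42", ["alice"])
print(" ".join(cmd))
```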

The resulting session should also be scoped so it cannot be reused outside of what was approved. For example, the gateway or broker can limit it to a specific cluster and namespace, and optionally to a narrower target such as a pod or node. That way, even if someone tries to reuse the access, it will not work outside the intended scope. After the session is established, the gateway executes only the allowed actions and records what happened for auditing.
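A gateway-side scope check could look like this sketch. `ApprovedScope` and `Request` are hypothetical types standing in for whatever the broker recorded at approval time; the check runs before the gateway calls the Kubernetes API, and RBAC still applies to the call itself.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class ApprovedScope:
    """What the broker approved for this session (hypothetical shape)."""
    cluster: str
    namespace: str
    allowed_actions: FrozenSet[str]
    expires_at: datetime
    pod: Optional[str] = None  # optional narrower target


@dataclass(frozen=True)
class Request:
    cluster: str
    namespace: str
    pod: str
    action: str  # e.g. "pods/log" or "pods/exec"


def authorize(scope: ApprovedScope, req: Request, now: datetime) -> bool:
    """Gateway-side check before any Kubernetes API call; RBAC still applies to the call."""
    if now >= scope.expires_at:
        return False  # expired sessions are not renewed without a new approval
    if (req.cluster, req.namespace) != (scope.cluster, scope.namespace):
        return False  # attempted reuse outside the approved cluster/namespace
    if scope.pod is not None and req.pod != scope.pod:
        return False  # approval covered a single pod only
    return req.action in scope.allowed_actions


scope = ApprovedScope("prod", "payments", frozenset({"pods/log", "pods/exec"}),
                      datetime(2026, 2, 18, 9, 30, tzinfo=timezone.utc), pod="api-7f9c")
now = datetime(2026, 2, 18, 9, 10, tzinfo=timezone.utc)
print(authorize(scope, Request("prod", "payments", "api-7f9c", "pods/log"), now))  # True
print(authorize(scope, Request("prod", "billing", "api-7f9c", "pods/log"), now))   # False
```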

Example: Namespace-scoped role bindings

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: jit-debug
  namespace: <namespace>
  annotations:
    kubernetes.io/description: >
      Colleagues performing semi-privileged debugging, with access provided
      just in time and on demand.
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/log"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["pods/exec"]
    verbs: ["create"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: jit-debug
  namespace: <namespace>
subjects:
  - kind: Group
    name: jit:oncall:<namespace> # mapped from the short-lived credential (cert/OIDC)
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: jit-debug
  apiGroup: rbac.authorization.k8s.io
```

These RBAC objects, and the rules they define, allow debugging only within the specified namespace; attempts to access other namespaces are not allowed.
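One way for a gateway (or a reviewer) to confirm that the namespace boundary holds is to ask the API server through a `SubjectAccessReview`. The builder below is a sketch covering core-group resources only; the user name is hypothetical, and creating a review requires permission to create `subjectaccessreviews`. Submitting the object with `kubectl create -f <file> -o yaml` returns `status.allowed`, which you would expect to be true for the approved namespace and false elsewhere, assuming no other bindings apply to that subject.

```python
import json
from typing import Dict, List


def jit_group(namespace: str) -> str:
    # Same naming convention as the RoleBinding subject above.
    return f"jit:oncall:{namespace}"


def subject_access_review(user: str, groups: List[str], namespace: str, verb: str,
                          resource: str, subresource: str = "") -> Dict:
    """Build an authorization.k8s.io/v1 SubjectAccessReview for a core-group resource."""
    attrs = {"namespace": namespace, "verb": verb, "group": "", "resource": resource}
    if subresource:
        attrs["subresource"] = subresource
    return {
        "apiVersion": "authorization.k8s.io/v1",
        "kind": "SubjectAccessReview",
        "spec": {"user": user, "groups": groups, "resourceAttributes": attrs},
    }


if __name__ == "__main__":
    # Same subject, approved namespace versus a different one.
    for ns in ("payments", "billing"):
        review = subject_access_review("alice", [jit_group("payments")], ns, "create", "pods", "exec")
        print(json.dumps(review))
```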

Example: Cluster-scoped role binding

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: jit-cluster-read
rules:
  - apiGroups: [""]
    resources: ["nodes", "namespaces"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: jit-cluster-read
subjects:
  - kind: Group
    name: jit:oncall:cluster
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: jit-cluster-read
  apiGroup: rbac.authorization.k8s.io
```

These RBAC rules grant cluster-wide read access (for example, to nodes and namespaces) and should be used only for workflows that truly require cluster-scoped resources.

Finer-grained restrictions like “only this pod/node” or “only these commands” are typically enforced by the access gateway/broker during the session, but Kubernetes also offers other options, such as ValidatingAdmissionPolicy for restricting writes and webhook authorization for custom authorization across verbs.

In environments with stricter access controls, you can add an extra, short-lived session mediation layer to separate session establishment from privileged actions. Both layers are ephemeral, use identity-bound expiring credentials, and produce independent audit trails. The mediation layer handles session setup/forwarding, while the execution layer performs only RBAC-authorized Kubernetes actions. This separation can reduce exposure by narrowing responsibilities, scoping credentials per step, and enforcing end-to-end session expiry.
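The two-layer design can be sketched as a credential chain in which the execution credential is clamped so it never outlives the mediation layer, and the session is live only while every layer is. `LayerCredential`, `issue_chain`, and the scope strings are hypothetical names for illustration, not a specific product's model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, Tuple


@dataclass(frozen=True)
class LayerCredential:
    layer: str           # "mediation" or "execution"
    scope: str           # what this step may do
    expires_at: datetime


def issue_chain(now: datetime, mediation_ttl: timedelta, execution_ttl: timedelta,
                namespace: str) -> Tuple[LayerCredential, LayerCredential]:
    """Issue one credential per layer, each scoped to its own step."""
    mediation = LayerCredential("mediation", "session-forwarding", now + mediation_ttl)
    # Clamp: the execution credential can never outlive the session that carries it.
    execution = LayerCredential("execution", f"jit:oncall:{namespace}",
                                min(now + execution_ttl, mediation.expires_at))
    return mediation, execution


def session_active(chain: Iterable[LayerCredential], now: datetime) -> bool:
    # End-to-end expiry: the session ends as soon as any layer's credential expires.
    return all(now < cred.expires_at for cred in chain)


start = datetime(2026, 2, 18, 9, 0, tzinfo=timezone.utc)
chain = issue_chain(start, timedelta(minutes=15), timedelta(minutes=60), "payments")
print(session_active(chain, start + timedelta(minutes=10)))  # True
print(session_active(chain, start + timedelta(minutes=20)))  # False
```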

References

  • Authorization
  • Using RBAC Authorization
  • Authenticating
  • Certificates and Certificate Signing Requests
  • Issue a Certificate for a Kubernetes API Client Using a CertificateSigningRequest
  • Role Based Access Control Good Practices

Disclaimer: The views expressed in this post are solely those of the author and do not reflect the views of the author’s employer or any other organization.
