Preamble
In 2025 and 2026, we watched a pattern play out across the industry. Attackers stopped going after production servers directly and started targeting the automation that deploys to them. Compromised developer credentials, a modified workflow file, and suddenly every secret in a CI/CD environment is streaming to an attacker-controlled endpoint. We saw this play out across incidents involving major open-source projects, Fortune 500 companies, and critical infrastructure tooling.
The attack chain is deceptively simple:
Stolen developer credentials → Modified workflow file → Harvested CI secrets → Lateral movement to cloud and production
Today we are open-sourcing cicd-abuse-detector, a drop-in CI template that uses regex-based signal extraction and LLM analysis to detect suspicious changes to CI/CD pipelines. It works across GitHub Actions, GitLab CI, and Azure DevOps, and is designed around the real-world attack techniques documented in public security research.
Key takeaways
- CI/CD environments are high-value targets because a single compromised workflow can exfiltrate cloud credentials, package registry tokens, code signing keys, deploy keys, and OIDC tokens simultaneously
- The tool extracts 50+ regex and metadata signals from diffs, then passes them with the full diff to Claude for structured threat analysis. No Python, no dependencies beyond bash and the Claude Code CLI
- Detection patterns were tested against offensive toolkits like Nord Stream and Gato-X, and against real incidents including ArtiPACKED and HackerBot-Claw
- The project ships with 19 malicious and four benign example diffs modeled after specific incidents, and an automated test suite that validates every signal
Why CI/CD pipelines are a top target
If you spend time reviewing GitHub Actions or GitLab CI configurations, you might notice how much trust is concentrated in these files. A typical deployment workflow has access to AWS credentials, npm publish tokens, Docker Hub passwords, and a GitHub token with write permissions, all at the same time. The attack surface isn't a server with a CVE, it's a YAML file.
Credential harvesting at scale
An attacker with stolen developer credentials modifies a workflow to exfiltrate secrets available in the CI environment. The GhostAction campaign in September 2025 demonstrated this at scale, compromising 327 GitHub users across 817 repositories and stealing 3,325 secrets through injected workflow files that POSTed credentials to attacker endpoints.
The Shai-Hulud npm worm went further. This self-propagating attack harvested GitHub Personal Access Tokens via gh auth token, ran TruffleHog for secret reconnaissance, and used compromised tokens to silently inject malicious code into other packages owned by the same developer. Over 46,000 malicious packages were published in the first wave alone.
Privileged trigger exploitation
The pull_request_target trigger is one of the most dangerous features in GitHub Actions. Unlike a regular pull_request trigger, it runs workflows in the context of the base repository with access to secrets, but it can execute code from an untrusted fork. The Orca "Pull Request Nightmare" research demonstrated this against repositories maintained by Google, Microsoft, and NVIDIA.
In February 2026, an automated campaign called HackerBot-Claw systematically scanned public repositories for this exact misconfiguration. It used five different exploitation techniques: poisoned Go `init()` functions, branch name command injection, filename-based injection, direct script injection, and AI prompt injection against Claude-based code reviewers. In the most severe case, Aqua Security's Trivy repository was fully compromised, leading to a downstream supply chain attack that exposed 33,000 secrets across nearly 7,000 machines. As documented, this supply chain attack was made possible by compromised tokens that remained valid weeks after they were initially stolen.
The rest of the taxonomy
Beyond credential harvesting and trigger exploitation, the threat model covers four additional categories that appear consistently in public research:
- Permission escalation, where adding permissions: write-all or id-token: write broadens the blast radius of any compromise
- Runner targeting, redirecting jobs to self-hosted runners that often have network access to internal infrastructure, or specifying attacker-controlled container images
- Supply chain manipulation through mutable action references (using @main instead of SHA-pinned versions), remote script execution (`curl | bash`), lockfile registry swaps, and dependency poisoning
- Defense evasion via commit timestamp manipulation, making malicious files appear old and trusted. KL4R10N documented this technique in DPRK-linked campaigns where backdated commits reference infrastructure that did not exist at the claimed date
Each of these maps to specific MITRE ATT&CK techniques: T1552 (Unsecured Credentials), T1195 (Supply Chain Compromise), T1070.006 (Timestomp), and T1059 (Command and Scripting Interpreter).
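The timestamp manipulation category lends itself to a simple metadata check: compare the date a commit claims with a trusted reference time, such as when the push or pull request was first observed. The following is a minimal sketch under stated assumptions; the helper names and the 30-day default threshold are hypothetical and are not taken from the project.

```shell
# days_backdated COMMIT_EPOCH REFERENCE_EPOCH
# Prints how many whole days the claimed commit time precedes the reference
# time. Commits dated after the reference are reported as 0.
days_backdated() {
  local diff
  diff=$(( $2 - $1 ))
  if [ "$diff" -lt 0 ]; then
    diff=0
  fi
  echo $(( diff / 86400 ))
}

# is_backdated COMMIT_EPOCH REFERENCE_EPOCH [MAX_DAYS]
# Succeeds when the commit predates the reference by more than MAX_DAYS
# (default 30, an arbitrary illustrative threshold).
is_backdated() {
  [ "$(days_backdated "$1" "$2")" -gt "${3:-30}" ]
}
```

In a git checkout, `git log -1 --format=%at <sha>` prints the author date as a Unix epoch, which could serve as the first argument. A large gap is only a hint, since rebases and long-lived branches also produce old author dates.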
How the detector works
We wanted the templates to work without requiring Python, custom runtimes, or complex dependencies. Everything runs in standard shell utilities on a default ubuntu-latest runner, and the only installed tool is the Claude Code CLI via npm, which handles authentication, retries, and model routing.
Stage 1: Filter and diff
When a pull request is opened (or a push lands on a protected branch), the workflow identifies changed files across three tiers of CI/CD-relevant paths. The first tier covers core CI files like workflow definitions, pipeline configs, and Makefiles. The second covers build and release artifacts like Dockerfiles, package manifests, lockfiles, and signing or deploy scripts. The third tier picks up developer environment configs like .vscode/tasks.json and .devcontainer files.
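The three tiers can be pictured as a small path classifier. This sketch is illustrative only: the function name `cicd_tier` and the specific file names are our assumptions, and the project's real path list is likely longer.

```shell
# cicd_tier PATH
# Prints 1 (core CI files), 2 (build and release artifacts),
# 3 (developer environment configs), or 0 (not CI/CD-relevant).
cicd_tier() {
  local path=$1
  local base=${1##*/}

  # Match on the full path first for directory-scoped files.
  case "$path" in
    .github/workflows/*|.gitlab-ci.yml|azure-pipelines.yml) echo 1; return ;;
    .vscode/tasks.json|.devcontainer/*) echo 3; return ;;
  esac

  # Fall back to the file name so nested copies are still classified.
  case "$base" in
    Makefile) echo 1 ;;
    Dockerfile|package.json|package-lock.json|yarn.lock) echo 2 ;;
    *) echo 0 ;;
  esac
}
```

For example, `cicd_tier services/api/Dockerfile` prints `2`, while an ordinary source file prints `0` and would be skipped.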
Each file is diffed individually and capped at 10,000 characters. We do this per-file rather than globally because a single cap on the combined diff is a bypass vector. An attacker can pad a malicious workflow change with a large benign Dockerfile edit to push the exploit past the character limit.
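A per-file cap is easy to express in shell. The sketch below is an assumption about how such a step might look, not the project's actual script: `cap_text` and `capped_diffs` are hypothetical names, and `head -c` counts bytes, which we use as a stand-in for characters.

```shell
MAX_DIFF_CHARS=10000

# cap_text N: copy at most N bytes of stdin to stdout.
cap_text() {
  head -c "$1"
}

# capped_diffs BASE HEAD
# Emits one capped diff per changed file, so a large benign edit in one file
# cannot push a change in another file past the limit.
capped_diffs() {
  local base=$1
  local head=$2
  local file
  git diff --name-only "$base" "$head" | while IFS= read -r file; do
    printf '=== %s ===\n' "$file"
    git diff "$base" "$head" -- "$file" | cap_text "$MAX_DIFF_CHARS"
    printf '\n'
  done
}
```

With a single global cap, the order in which files are concatenated decides what survives truncation. Capping inside the loop removes that dependency.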
Stage 2: Signal extraction
Before the LLM sees anything, 50+ regex patterns scan each diff for known-dangerous patterns. These signals are advisory. They never gate the analysis, but they provide the LLM with a pre-screened threat summary. A few examples:
| Signal | Pattern | What it catches |
|---|---|---|
| `secrets_context` | `${{.*secrets.` | Direct secret interpolation in workflows |
| `pull_request_target` | `pull_request_target` | The dangerous trigger that grants secrets to PR code |
| `checkout_ref` | `ref:.*github.event.pull_request.head.(sha\|ref)` | Untrusted PR code checked out in a privileged context |
| `double_base64` | `base64.*\|.*base64` | Double-encoding to evade log masking (Nord Stream technique) |
| `ld_preload` | `LD_PRELOAD` | Arbitrary code execution via environment variable injection |
| `vscode_auto_task` | `runOn.*folderOpen` | VS Code task that executes on folder open (Contagious Interview) |
The signal list is based on real adversarial tooling, including Nord Stream and Gato-X, and tested against 19 malicious example diffs modeled after specific incidents.
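To make the mechanics concrete, here is a minimal sketch of signal extraction using the six patterns from the table above. The function name `extract_signals`, the decision to scan only added lines, and the exact regex escaping are our assumptions; the project's real implementation may differ.

```shell
# extract_signals: read a unified diff on stdin and print the name of every
# signal whose pattern matches at least one added line.
extract_signals() {
  local diff added name pattern
  diff=$(cat)

  # Keep lines starting with "+" but drop the "+++ b/file" headers.
  added=$(printf '%s\n' "$diff" | grep -E '^\+' | grep -vE '^\+\+\+ ' || true)

  # Each table row is "name pattern" (extended regex). The literal pipe in
  # double_base64 is written as [|] to avoid any escaping ambiguity.
  while read -r name pattern; do
    if printf '%s\n' "$added" | grep -Eq -- "$pattern"; then
      printf '%s\n' "$name"
    fi
  done <<'EOF'
secrets_context \$\{\{.*secrets\.
pull_request_target pull_request_target
checkout_ref ref:.*github\.event\.pull_request\.head\.(sha|ref)
double_base64 base64.*[|].*base64
ld_preload LD_PRELOAD
vscode_auto_task runOn.*folderOpen
EOF
}
```

Because the output is just a list of names, it can be appended to the prompt as a pre-screened summary without ever deciding the verdict on its own.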
The detector runs identically across GitHub Actions, GitLab CI, and Azure DevOps.
[Screenshots: detections firing on each platform]
Stage 3: LLM analysis
The signal summary, full diff, author profile, and commit metadata are bundled and sent to Claude via the Claude Code CLI. The analysis prompt walks the model through several areas:
- Diff comprehension and per-file risk assessment
- Signal interpretation with context (a signal alone is not a verdict)
- Temporal analysis for backdated commits
- Author trust assessment using account age, contribution history, and org membership
- Severity calibration against a signal combination table with 60+ entries
- False positive recognition (e.g., cURL for downloading known tools is not exfiltration)
- Concrete, actionable recommendations ("Pin actions/setup-node@main to a specific SHA" instead of "review carefully")
The output is a structured JSON verdict containing severity, confidence, reasoning, evidence, and recommendations, all validated against a JSON Schema.
Stage 4: Alert and gate
Based on the verdict severity, the workflow posts a step summary, creates an issue, sends a Slack notification, and optionally fails the PR check if severity meets a configured threshold.
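The optional gate reduces to an ordered comparison between the verdict severity and the configured threshold. A minimal sketch, assuming a four-level scale of low, medium, high, and critical; the level names and the helper names are our assumptions rather than the project's documented values.

```shell
# severity_rank LEVEL: map a severity name to a number; unknown names map to 0.
severity_rank() {
  case "$1" in
    low)      echo 1 ;;
    medium)   echo 2 ;;
    high)     echo 3 ;;
    critical) echo 4 ;;
    *)        echo 0 ;;
  esac
}

# should_fail VERDICT_SEVERITY THRESHOLD
# Succeeds when the check should fail the PR. An unknown or empty threshold
# disables gating, so alerts still go out but nothing is blocked.
should_fail() {
  local sev thr
  sev=$(severity_rank "$1")
  thr=$(severity_rank "$2")
  [ "$thr" -gt 0 ] && [ "$sev" -ge "$thr" ]
}
```

A workflow step could then end with `if should_fail "$severity" "$threshold"; then exit 1; fi`, keeping notification and gating as separate decisions.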
Alerts in Slack and GitHub Issues solve the immediate notification problem, but they don't give you a queryable history. Every verdict the detector produces (benign, suspicious, or malicious) can optionally ship to Elasticsearch as a structured document in the logs-cicd.abuse-default data stream. The workflow ships the verdict along with CI/CD metadata (platform, repository, actor, event type, run URL) into a single index that spans all three supported platforms.
This is where cross-platform correlation becomes practical. A GitHub Actions alert and a GitLab CI alert from the same actor land in the same data stream, queryable in a single ES|QL statement:
    FROM logs-cicd.abuse-*
    | WHERE verdict.verdict IN ("malicious", "suspicious") AND @timestamp > NOW() - 7 days
    | EVAL platform = cicd.platform, repo = cicd.repository, actor = cicd.actor, severity = verdict.severity
    | KEEP @timestamp, platform, repo, actor, severity
    | SORT @timestamp DESC
The schema includes cicd.platform, cicd.repository, cicd.actor, and the full verdict object (verdict, severity, confidence, summary, reasons, evidence), making it straightforward to build detection rules. Patterns that can be correlated this way include a coordinated campaign that hits multiple repos within an hour, a repeat offender flagged across platforms, or a spike in critical findings that warrants an incident response page.
Validating against real attacks
To validate coverage, we compared our detection patterns against the actual source code of offensive tools, published research, and public post-mortems.
Nord Stream: verbatim payload matching
Nord Stream is Synacktiv's open-source CI/CD secret extraction tool supporting GitHub, GitLab, and Azure DevOps. We pulled the YAML generator source (`nordstream/yaml/github.py`) and compared its output templates against our example diffs.
- The GitHub payload template uses `env -0 | awk -v RS='\0' '/^secret_/ {print $0}' | base64 -w0 | base64 -w0`. Our `nord-stream-pipeline-exfil.diff` contains this line verbatim, and our `double_base64`, `env_null_dump`, and `env_secret_grep` signals all fire.
- The OIDC Azure template uses `azure/login@v1` with `id-token: write` permissions followed by `az account get-access-token | base64 -w0 | base64 -w0`. Our diff captures this exact flow and triggers `cloud_auth_action` and `id_token_write`.
- The Azure DevOps pipeline techniques (`addSpnToEnvironment` for SPN credential exposure, `DownloadSecureFile` for secure file theft, SSH task source patching via `ssh.js` modification) are all present in `nord-stream-azure-devops.diff` and detected by platform-specific signals.
ArtiPACKED: the artifact race condition
The ArtiPACKED research from Palo Alto Unit 42 showed that uploading the entire checkout directory as an artifact leaks the `.git/config` file containing the `GITHUB_TOKEN`. With the v4 artifact API allowing mid-run downloads, an attacker can extract and use the token before the job completes.
Our `artifact-token-leak.diff` models this exact pattern, using `upload-artifact` with `path: .` (the entire workspace). The `upload_artifact` signal catches it, and the LLM evaluates whether the upload scope includes the `.git` directory.
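The scope question, whether an upload covers the whole workspace, can also be approximated mechanically. This is a rough sketch and not the project's logic: the function name is hypothetical, it only recognizes `path: .` or `path: ./` appearing after the `uses:` line of the same step, and it does not parse YAML properly.

```shell
# uploads_whole_workspace: read workflow YAML on stdin and succeed if an
# upload-artifact step sets its path to the workspace root.
uploads_whole_workspace() {
  awk '
    # Entering an upload-artifact step.
    /uses:[ \t]*actions\/upload-artifact@/ { in_upload = 1; next }
    # A new list item means the next step has started.
    in_upload && /^[ \t]*-[ \t]/ { in_upload = 0 }
    # path: . or path: ./ inside the upload step.
    in_upload && /^[ \t]*path:[ \t]*\.\/?[ \t]*$/ { found = 1 }
    END { exit (found ? 0 : 1) }
  '
}
```

A narrower path such as `dist/` does not trip the check, which matches the idea that the upload itself is routine and only its scope is risky.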
GITHUB_ENV injection: LD_PRELOAD to RCE
Legit Security's research on Google Firebase and Apache showed that writing untrusted input to `$GITHUB_ENV` allows an attacker to set arbitrary environment variables like `LD_PRELOAD` and `NODE_OPTIONS`, achieving code execution in privileged workflows.
Our `github-env-injection.diff` reproduces this technique with three distinct payloads: `LD_PRELOAD` pointing to a malicious shared object, `NODE_OPTIONS` with a `--require` injection, and `$GITHUB_PATH` manipulation. The `github_env_write`, `ld_preload`, and `github_path_write` signals all trigger as expected.
Contagious Interview: IDE config as initial access
The Contagious Interview campaign attributed to DPRK targets developers through fake job interviews, distributing repositories with `.vscode/tasks.json` files that auto-execute on folder open. The presentation is hidden (`reveal: never`, `echo: false`), and the payload uses `curl | node` for silent execution.
Our `ide-config-poisoning.diff` captures the full attack chain, including the auto-execute trigger (`runOn: folderOpen`), the hidden presentation, the `curl | node` payload, the `files.exclude` entry that hides the `.vscode` directory, and a trojanized postinstall hook with base64-encoded URLs and `eval()` for code execution. Six signals pick this up at once.
Defensive recommendations
Beyond deploying the detector, here are some hardening measures that came directly out of the attack patterns we studied:
- Pin all actions to SHA, not tags, not branches. SHA-pinned references prevent retroactive tag modification attacks like `tj-actions` (CVE-2025-30066).
- Scope secrets to individual steps rather than using job-level environment variables. Each step should only have access to the secrets it actually needs.
- Use short-lived, ephemeral tokens when possible to reduce attack surface.
- Avoid `pull_request_target` unless strictly necessary. If you must use it, never check out the PR head code in the same workflow. Use a separate `workflow_run`-triggered workflow for operations that need both secrets and PR context.
- Set explicit permissions on every workflow because the default token permissions are far too broad. Set `permissions: {}` at the workflow level and add specific permissions per job.
- Set `persist-credentials: false` on checkout since the default behavior of actions/checkout persists the `GITHUB_TOKEN` in the `.git` directory. If you upload artifacts, this token goes with them.
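The first recommendation is simple to audit with standard shell tools. The sketch below is one possible approach, not part of the detector: `find_unpinned_actions` is a hypothetical name, local (`./`) and `docker://` references are skipped as an assumption, and the 40-character hex string in the usage example is a placeholder rather than a real commit.

```shell
# find_unpinned_actions: read workflow YAML on stdin and print every `uses:`
# reference that is not pinned to a full 40-character commit SHA.
find_unpinned_actions() {
  grep -E '^[[:space:]-]*uses:[[:space:]]*[^[:space:]]+' \
    | sed -E 's/^[[:space:]-]*uses:[[:space:]]*//; s/[[:space:]]*(#.*)?$//' \
    | grep -Ev '^(\./|docker://)' \
    | grep -Ev '@[0-9a-f]{40}$' || true
}
```

Given a workflow containing `actions/checkout@0123456789abcdef0123456789abcdef01234567 # v4` and `actions/setup-node@main`, only the second reference is printed. Running this over `.github/workflows/*.yml` in a pre-merge check would surface mutable references before they land.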
Summary
CI/CD pipelines have become a major attack surface for supply chain compromise. The same automation that makes modern software delivery possible is what attackers exploit to harvest credentials, poison packages, and pivot to cloud infrastructure. Traditional code review doesn't catch these patterns well because they're subtle, platform-specific, and designed to look like legitimate DevOps changes.
Combining regex-based signal extraction with LLM reasoning lets us surface these patterns at the pull request stage, before they reach production. The repo includes the full threat model, test suite, and example diffs if you want to dig into the details or adapt it to your own environment.
To get started, check out the cicd-abuse-detector repo for setup instructions, the full threat model, and example diffs. We're always interested in hearing about new attack patterns and detection ideas. Chat with us in our community Slack, and ask questions in our Discuss forums.
CI/CD abuse through MITRE ATT&CK
We use the MITRE ATT&CK framework to map the tactics, techniques, and procedures that adversaries use against CI/CD pipelines.
Tactics
| Tactic | CI/CD Relevance |
|---|---|
| Credential Access (TA0006) | Harvesting secrets from CI environments |
| Execution (TA0002) | Running commands in pipeline runners |
| Persistence (TA0003) | Scheduled triggers, cron-based workflows |
| Defense Evasion (TA0005) | Commit timestamp manipulation, log masking evasion |
| Initial Access (TA0001) | Compromised developer credentials, phishing for PATs |
| Lateral Movement (TA0008) | Using harvested cloud credentials to pivot |
Techniques
| Technique | CI/CD Application |
|---|---|
| T1552: Unsecured Credentials | Secrets exposed in CI environment variables, artifacts, and runner memory |
| T1195.002: Compromise Software Supply Chain | Poisoned actions, dependencies, and lockfiles |
| T1059: Command and Scripting Interpreter | Remote script execution (`curl \| bash`) |
| T1070.006: Timestomp | Backdated commit dates to evade review |
| T1098: Account Manipulation | Permission escalation via write-all, id-token: write |
| T1078: Valid Accounts | Stolen developer PATs used to modify workflows |
References
The following were referenced throughout the above research:
- https://github.com/elastic/cicd-abuse-detector
- https://github.com/synacktiv/nord-stream
- https://github.com/AdnaneKhan/Gato-X
- https://unit42.paloaltonetworks.com/github-repo-artifacts-leak-tokens/
- https://blog.gitguardian.com/ghostaction-campaign-3-325-secrets-stolen
- https://www.reversinglabs.com/blog/shai-hulud-worm-npm
- https://orca.security/resources/blog/pull-request-nightmare-github-actions-rce/
- https://orca.security/resources/blog/hackerbot-claw-github-actions-attack/
- https://www.stepsecurity.io/blog/hackerbot-claw-github-actions-exploitation
- https://www.legitsecurity.com/blog/github-privilege-escalation-vulnerability-0
- https://www.abstract.security/blog/contagious-interview-tracking-the-vs-code-tasks-infection-vector
- https://about.codecov.io/apr-2021-post-mortem/
- https://kl4r10n.tech/blog/when-git-history-lies
- https://www.synacktiv.com/en/publications/github-actions-exploitation-dependabot
- https://docs.anthropic.com/en/docs/claude-code
About Elastic Security Labs
Elastic Security Labs is the threat intelligence branch of Elastic Security dedicated to creating positive change in the threat landscape. Elastic Security Labs provides publicly available research on emerging threats with an analysis of strategic, operational, and tactical adversary objectives, then integrates that research with the built-in detection and response capabilities of Elastic Security.
Follow Elastic Security Labs on Twitter @elasticseclabs and check out our research at www.elastic.co/security-labs/.
Facts Only
In 2025 and 2026, attackers shifted focus from production servers to CI/CD automation, compromising developer credentials to modify workflow files and exfiltrate secrets.
Major incidents included the GhostAction campaign (September 2025), which stole 3,325 secrets from 327 GitHub users across 817 repositories.
The Shai-Hulud npm worm (2025-2026) harvested GitHub Personal Access Tokens and published over 46,000 malicious packages.
The HackerBot-Claw campaign (February 2026) exploited misconfigured `pull_request_target` triggers in GitHub Actions, compromising repositories like Aqua Security's Trivy and exposing 33,000 secrets.
Elastic Security Labs developed *cicd-abuse-detector*, an open-source tool for detecting malicious CI/CD pipeline changes across GitHub Actions, GitLab CI, and Azure DevOps.
The tool uses regex-based signal extraction (50+ patterns) and LLM analysis to evaluate diffs for threats like secret harvesting, privilege escalation, and supply chain manipulation.
Detection patterns were tested against offensive toolkits (Nord Stream, Gato-X) and real incidents (ArtiPACKED, HackerBot-Claw).
The detector includes 19 malicious and 4 benign example diffs, with an automated test suite validating signal coverage.
MITRE ATT&CK techniques mapped to CI/CD threats include T1552 (Unsecured Credentials), T1195.002 (Supply Chain Compromise), and T1070.006 (Timestomp).
Defensive recommendations include pinning actions to SHA, scoping secrets to individual steps, and avoiding `pull_request_target` triggers.
The tool integrates with Elasticsearch for cross-platform correlation of CI/CD abuse alerts.
Executive Summary
CI/CD pipelines have emerged as a critical attack vector in software supply chain security, with attackers increasingly targeting automation workflows rather than production servers directly. Between 2025 and 2026, incidents across major open-source projects, Fortune 500 companies, and critical infrastructure demonstrated a consistent pattern: compromised developer credentials leading to modified workflow files, which then harvested secrets from CI/CD environments. Notable campaigns like GhostAction (3,325 stolen secrets) and Shai-Hulud (46,000 malicious npm packages) exploited these vulnerabilities at scale. The attack chain typically involves stolen credentials, modified workflow files, secret harvesting, and lateral movement to cloud infrastructure.
In response, Elastic Security Labs has open-sourced *cicd-abuse-detector*, a tool designed to detect suspicious changes in CI/CD pipelines across GitHub Actions, GitLab CI, and Azure DevOps. The tool uses regex-based signal extraction and LLM analysis to identify malicious patterns, such as secret exfiltration, privilege escalation, and supply chain manipulation. It has been validated against real-world offensive toolkits like Nord Stream and Gato-X, as well as documented incidents like ArtiPACKED and HackerBot-Claw. The detector operates in four stages: filtering and diffing changed files, extracting signals via regex patterns, analyzing diffs with an LLM for structured threat assessment, and alerting or gating based on severity. The system also supports cross-platform correlation through Elasticsearch, enabling organizations to track coordinated attacks across multiple repositories and CI/CD platforms.
Full Take
The rise of CI/CD pipeline attacks reflects a broader shift in cybersecurity threats, where automation—once a force multiplier for development—has become a force multiplier for adversaries. The *cicd-abuse-detector* tool addresses a critical gap: traditional code review struggles to catch subtle, platform-specific exploits disguised as legitimate DevOps changes. By combining regex-based signal extraction with LLM reasoning, the tool surfaces patterns that might otherwise evade detection until it’s too late. This hybrid approach acknowledges the limitations of both rule-based systems (which miss novel attacks) and pure AI (which lacks contextual grounding). The inclusion of real-world validation against offensive toolkits and documented incidents lends credibility, though the effectiveness of LLM-based analysis in production environments remains an open question—false positives and negatives could undermine trust.
The broader implication is that CI/CD security is no longer just about protecting secrets but about defending the integrity of the software delivery process itself. The attack patterns described—credential harvesting, privilege escalation, and supply chain manipulation—echo historical paradigms in cybersecurity, where trust boundaries (like the perimeter) are exploited once they’re assumed to be secure. The tool’s cross-platform correlation capability is particularly notable, as it recognizes that modern attacks are rarely confined to a single system or vendor. However, the reliance on Elasticsearch for logging and correlation may introduce complexity for organizations without existing Elastic deployments.
**Bridge Questions:**
How might adversaries adapt to evade regex + LLM detection, and what countermeasures could mitigate these evasions?
What are the trade-offs between false positives (alert fatigue) and false negatives (missed attacks) in this hybrid detection model?
How does this tool integrate with existing CI/CD security practices, such as static analysis or runtime protection?
**Counterstrike Scan:**
A coordinated influence campaign pushing this narrative might emphasize the inevitability of CI/CD attacks to drive adoption of specific security tools or services. However, the article’s focus on open-sourcing the detector, providing testable examples, and referencing peer-reviewed incidents (e.g., MITRE ATT&CK mappings) aligns with legitimate threat research rather than manipulative messaging. No structural alignment with an attack playbook is detected.
**Patterns detected:** None.
