Security Automation with Elastic Workflows: From Alert to Response

The daily loop

An alert fires. You open it. You read through the details. You gather context from the surrounding activity. You check for related signals across your environment. You decide what it means and what to do next. Sometimes you escalate. Sometimes you close it and move on.

You do this dozens of times a day. The steps are almost always the same. The data you need is already in your SIEM. The actions you take are predictable. But the work is still manual.

This is the kind of work that automation should handle. Not because it's hard, but because it's repetitive, and every minute spent on repetitive manual triage is a minute not spent on the alerts that actually need a human.

Elastic Workflows brings that automation into the SIEM itself. No separate tool. No integration to build. Your detection rule fires, and a workflow runs, with direct access to your alerts, cases, and security data.

This blog post walks through building a security playbook with Workflows, step by step. We'll start simple and build up to a workflow that runs when an alert fires, checks threat intel, gathers context, creates cases, notifies the team, and brings in AI when the investigation calls for it.

If you're new to Workflows, the introductory technical deep dive blog and video cover the core concepts of Workflows. This post focuses on applying these concepts in a security context.

Quick orientation

Workflows are YAML definitions that run inside Kibana. You define what should happen, and the platform handles execution. At a high level, a workflow is composed of three main parts: triggers (when it runs), steps (what it does), and data flow (how information moves between steps).

Triggers decide when the workflow runs. An alert trigger runs on a detection. A scheduled trigger runs on a cadence. A manual trigger runs on demand. A workflow can have more than one.

Steps define what the workflow does. They run in order and can use outputs from earlier steps. They can query data in Elasticsearch, update alerts and cases in Kibana, and call external systems, for example to send a Slack message or scan a hash on VirusTotal. They can also apply logic such as conditionals and loops, and use AI to summarize text, prompt an LLM, or invoke agents when deeper reasoning is needed.

This is the toolkit. With these primitives, you can build workflows that take a signal, gather context, and drive a response.
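The trigger/steps/data-flow split can be made concrete with a small model. The Python sketch below is a hypothetical illustration of the execution semantics described above, not Kibana's engine:

- steps run in order;
- each output is stored under the step's name;
- later steps read earlier outputs through a shared context.

`run_workflow` and the two toy steps are invented for the example.

```python
from typing import Any, Callable

# A step is modeled as (name, function). Each function receives the shared
# execution context and returns that step's output. Names are hypothetical.
Step = tuple[str, Callable[[dict[str, Any]], Any]]


def run_workflow(event: dict[str, Any], steps: list[Step]) -> dict[str, Any]:
    """Run steps in order; expose each output as context["steps"][name]["output"]."""
    context: dict[str, Any] = {"event": event, "steps": {}}
    for name, fn in steps:
        # Every step sees the trigger payload plus all earlier outputs.
        context["steps"][name] = {"output": fn(context)}
    return context


# Usage: the second step reads the first step's output, the same way the YAML
# reads steps.check_virustotal.output. Both steps are toy stand-ins.
demo_steps: list[Step] = [
    ("lookup", lambda ctx: {"score": len(ctx["event"]["hash"])}),
    ("decide", lambda ctx: ctx["steps"]["lookup"]["output"]["score"] > 5),
]
result = run_workflow({"hash": "abc123"}, demo_steps)
print(result["steps"]["decide"]["output"])
```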

Building a security playbook

We'll build an alert triage workflow incrementally. Each section adds a capability, and by the end, you'll have a working playbook that handles the full triage loop.

Start with the trigger

Security workflows start with an event. It could be an alert, a case update, a user action, or a scheduled check. The workflow takes that signal, gathers context, and decides what to do next.

We’ll start with alert triage. It’s the most common path, and it shows the full loop end to end.

Here’s a minimal workflow with an alert trigger:

```yaml
description: Enriches alerts, checks threat intel, creates a case, and notifies the team.
enabled: true
tags:
  - security
  - triage
triggers:
  - type: alert
steps:
  # we'll build these out
```

The `alert` trigger connects this workflow to detection rules. You link a specific rule to this workflow from the rule's Actions settings in Kibana. When the rule fires, the workflow runs and receives the full alert context through the `event` variable. That includes `event.alerts` (the alert documents), `event.rule` (the rule metadata), and every field on the alert.
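The `event` variable is an ordinary nested structure. A path like `event.alerts[0].file.hash.sha256` is just a chain of key and index lookups.

Because `event.alerts` is a list, it may hold more than one alert, and the examples in this post read only the first. The Python sketch below contrasts first-alert access with collecting a field across every alert. It uses a trimmed, hypothetical payload; real alert documents carry far more fields. `get_path` is a helper invented for the illustration.

```python
from typing import Any, Optional

# Hypothetical, trimmed trigger payload; real alert documents carry many more fields.
event: dict[str, Any] = {
    "rule": {"name": "Malware Detection"},
    "alerts": [
        {"_id": "a1", "host": {"name": "web-01"}, "file": {"hash": {"sha256": "hash-aaa"}}},
        {"_id": "a2", "host": {"name": "web-02"}, "file": {"hash": {"sha256": "hash-bbb"}}},
        {"_id": "a3", "host": {"name": "web-01"}},  # this alert has no file fields
    ],
}


def get_path(doc: Any, *keys: str) -> Optional[Any]:
    """Follow nested keys; return None as soon as one is missing."""
    current = doc
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


# What the YAML in this post does: read the first alert only.
first_hash = get_path(event["alerts"][0], "file", "hash", "sha256")

# Alternative: every distinct hash across all alerts, in first-seen order.
all_hashes = list(
    dict.fromkeys(
        h
        for alert in event["alerts"]
        if (h := get_path(alert, "file", "hash", "sha256")) is not None
    )
)
print(first_hash, all_hashes)
```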

From here, you start adding steps.

Check threat intel

The first real step: take the file hash from the alert and check it against VirusTotal. Workflows have a built-in VirusTotal connector, so you don't need to construct HTTP requests or manage API keys in your YAML. Connector credentials, such as VirusTotal API keys or Slack tokens, are configured once under Stack Management > Connectors, and the step only references the connector by ID:

```yaml
- name: check_virustotal
  type: virustotal.scanFileHash
  connector-id: "my-virustotal"
  with:
    hash: "{{ event.alerts[0].file.hash.sha256 }}"
  on-failure:
    retry:
      max-attempts: 2
      delay: 3s
    continue: true
```

Every step in a workflow follows a simple, consistent structure. It starts with a `name`, which gives the step a clear identity, and a `type`, which defines the action being performed. In this case, the step calls the VirusTotal file hash scan capability. Because this is a connector-backed action, it also includes a `connector-id`, which tells the workflow which configured integration to use, including its credentials.

The `with` block is where you pass inputs into the step. Each step type defines the parameters it accepts. Here, you provide the file hash to scan.

Rather than hardcoding values, workflows use a built-in templating engine powered by LiquidJS. The `{{ }}` syntax lets you reference data from the execution context, so the hash is pulled directly from the alert that triggered the workflow.

Finally, the `on-failure` block defines how the step behaves if something goes wrong. In this case, it retries twice with a short delay and continues execution even if the lookup fails. This is important in production workflows, where a transient external API issue should not block the entire triage process.
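The retry-then-continue behavior is easy to model. The Python sketch below follows the post's description: retry twice after a short delay, then carry on even if the lookup still fails.

It assumes `max-attempts` counts retries after the first try. Whether the engine also counts the initial call is worth confirming in the Workflows documentation. `run_with_retry` and `flaky_lookup` are hypothetical.

```python
import time
from typing import Any, Callable, Optional


def run_with_retry(
    call: Callable[[], Any],
    retries: int = 2,  # assumption: "max-attempts: 2" read as two retries
    delay_s: float = 3.0,
    continue_on_failure: bool = True,
) -> dict[str, Any]:
    """Run `call`, retrying on exceptions; optionally swallow the final failure."""
    last_error: Optional[Exception] = None
    for attempt in range(retries + 1):  # the first try plus `retries` retries
        try:
            return {"output": call(), "error": None, "attempts": attempt + 1}
        except Exception as exc:  # e.g. a timeout from the external API
            last_error = exc
            if attempt < retries:
                time.sleep(delay_s)
    if not continue_on_failure:
        raise RuntimeError("step failed after all retries") from last_error
    # continue: true -> execution moves on, but this step produced no output.
    return {"output": None, "error": str(last_error), "attempts": retries + 1}


# Hypothetical lookup that fails twice, then succeeds on the third attempt.
calls = {"count": 0}


def flaky_lookup() -> dict[str, Any]:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("VirusTotal request timed out")
    return {"stats": {"malicious": 12}}


recovered = run_with_retry(flaky_lookup, delay_s=0.0)
failed = run_with_retry(lambda: 1 / 0, delay_s=0.0)
print(recovered)
print(failed)
```

The `failed` case is the one to keep in mind. With `continue: true`, the workflow keeps going, but any later step that reads this step's output has to cope with it being absent.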

Gather context with ES|QL

Next, query for related alerts on the same host. ES|QL runs directly against your security indices, so there's no API bridging or credential management:

```yaml
- name: related_alerts
  type: elasticsearch.esql.query
  with:
    query: |
      FROM .alerts-security*
      | WHERE host.name == "{{ event.alerts[0].host.name }}"
      | WHERE @timestamp > NOW() - 24 hours
      | STATS
          alert_count = COUNT(*),
          rules_triggered = VALUES(kibana.alert.rule.name),
          users_involved = VALUES(user.name)
    format: json
```

This tells you whether the host has been generating other alerts, which rules triggered, and which users were involved. That context goes into the case description later, where it helps an analyst judge how serious the incident is.

The same approach works for any enrichment that touches data in Elasticsearch: looking up a user's first-seen date, checking how many times a hash has appeared in your logs, or pulling the process tree from endpoint data. If the data is in your cluster, ES|QL can get it.
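Later steps read this result positionally, for example `steps.related_alerts.output.values[0][0]`. That works because an ES|QL JSON response is columnar: a `columns` list names each field, and a `values` list holds the rows. The Python sketch below zips the two into named records. The sample response is made up for illustration.

One caveat is worth checking in your own data. ES|QL serializes a multivalued field as a JSON array but a single value as a scalar. So a `VALUES()` that finds exactly one rule name may return a plain string rather than a one-element list, and the `as_list` helper (hypothetical) normalizes that case.

```python
from typing import Any

# Made-up ES|QL response in the JSON shape: column metadata plus rows.
response: dict[str, Any] = {
    "columns": [
        {"name": "alert_count", "type": "long"},
        {"name": "rules_triggered", "type": "keyword"},
        {"name": "users_involved", "type": "keyword"},
    ],
    "values": [[7, ["Malware Detection", "Suspicious PowerShell"], "jdoe"]],
}


def rows_as_records(resp: dict[str, Any]) -> list[dict[str, Any]]:
    """Pair each row's values with the column names, in column order."""
    names = [col["name"] for col in resp["columns"]]
    return [dict(zip(names, row)) for row in resp["values"]]


def as_list(value: Any) -> list[Any]:
    """Normalize an ES|QL cell: null -> [], scalar -> [scalar], array unchanged."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


record = rows_as_records(response)[0]
print(record["alert_count"])                    # the same cell as values[0][0]
print(len(as_list(record["rules_triggered"])))  # rule count, robust to a single value
print(as_list(record["users_involved"]))        # a scalar user name becomes a list
```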

Branch on findings

Now the workflow needs to decide what to do. If more than five VirusTotal engines flagged the file as malicious, create a case and respond. Otherwise, close the alert as a false positive:

```yaml
- name: check_malicious
  type: if
  condition: steps.check_virustotal.output.stats.malicious > 5
  steps:
    # true positive path: steps below
  else:
    - name: close_false_positive
      type: kibana.SetAlertsStatus
      with:
        status: closed
        reason: false_positive
        signal_ids:
          - "{{ event.alerts[0]._id }}"
```

The `if` step evaluates a condition and runs different steps depending on the result. The false positive path closes the alert in a single step. The true positive path continues below.
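It is worth tracing what this branch does when the VirusTotal step failed and `continue: true` let the workflow proceed without output. If the condition evaluates a missing value as false, the alert takes the `else` path and is closed as a false positive.

The Python sketch below models the decision with an explicit third outcome for a missing lookup. The function name and the `needs_review` outcome are hypothetical. They show one way to think about the guard, not Workflows syntax.

```python
from typing import Any, Optional

MALICIOUS_THRESHOLD = 5  # mirrors "> 5" in the YAML condition


def triage_decision(vt_output: Optional[dict[str, Any]]) -> str:
    """Return 'true_positive', 'false_positive', or 'needs_review' (hypothetical labels)."""
    stats = (vt_output or {}).get("stats") or {}
    malicious = stats.get("malicious")
    if malicious is None:
        # Lookup failed or returned no stats: don't auto-close on missing evidence.
        return "needs_review"
    return "true_positive" if malicious > MALICIOUS_THRESHOLD else "false_positive"


print(triage_decision({"stats": {"malicious": 12}}))
print(triage_decision({"stats": {"malicious": 5}}))
print(triage_decision(None))
```

Because the comparison is strict, a score of exactly 5 lands on the false-positive side.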

Create a case

When the alert is confirmed malicious, open a case with context from previous steps:

```yaml
- name: create_case
  type: kibana.createCase
  with:
    title: "Malware Detected: {{ event.alerts[0].file.hash.sha256 }}"
    description: |
      Confirmed malicious file detected on {{ event.alerts[0].host.name }}.
      Detection: {{ event.rule.name }}
      User: {{ event.alerts[0].user.name }}
      VirusTotal: {{ steps.check_virustotal.output.stats.malicious }} engines flagged this file
      Related alerts (24h): {{ steps.related_alerts.output.values[0][0] }} alerts from {{ steps.related_alerts.output.values[0][1] | size }} rules
    owner: securitySolution
    severity: high
    tags:
      - automation
      - malware
    settings:
      syncAlerts: false
    connector:
      id: none
      type: ".none"
      fields: null
```

Liquid templating pulls data from the alert (`event`), from the VirusTotal results (`steps.check_virustotal.output`), and from the ES|QL query (`steps.related_alerts.output`). Every field from every previous step is available to every subsequent step.
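Here is a tiny Python stand-in for the `{{ path }}` and `{{ path | size }}` substitutions used in the case description. It shows how values from `event` and `steps` end up in one string.

This is a hypothetical illustration, not LiquidJS, which supports far more syntax. The sketch:

- handles only dotted paths with `[n]` indices and a single `size` filter;
- renders missing values as empty strings;
- formats lists with Python's `str()`.

```python
import re
from typing import Any

_TAG = re.compile(r"\{\{\s*(.+?)\s*\}\}")    # one {{ ... }} tag
_TOKEN = re.compile(r"([^.\[\]]+)|\[(\d+)\]")  # a key, or an [index]


def lookup(context: dict[str, Any], path: str) -> Any:
    """Resolve 'steps.x.output.values[0][1]'-style paths; None if anything is missing."""
    current: Any = context
    for key, index in _TOKEN.findall(path):
        if key:
            if not isinstance(current, dict) or key not in current:
                return None
            current = current[key]
        else:
            i = int(index)
            if not isinstance(current, list) or i >= len(current):
                return None
            current = current[i]
    return current


def render(template: str, context: dict[str, Any]) -> str:
    """Replace each {{ path }} or {{ path | size }} tag with its value."""
    def substitute(match: re.Match[str]) -> str:
        path, _, filt = match.group(1).partition("|")
        value = lookup(context, path.strip())
        if filt.strip() == "size":
            value = len(value) if hasattr(value, "__len__") else 0
        return "" if value is None else str(value)
    return _TAG.sub(substitute, template)


context = {
    "event": {"alerts": [{"host": {"name": "web-01"}}], "rule": {"name": "Malware Detection"}},
    "steps": {
        "check_virustotal": {"output": {"stats": {"malicious": 12}}},
        "related_alerts": {"output": {"values": [[7, ["Rule A", "Rule B"], "jdoe"]]}},
    },
}
text = render(
    "Detected on {{ event.alerts[0].host.name }} by {{ event.rule.name }}. "
    "VirusTotal: {{ steps.check_virustotal.output.stats.malicious }} engines. "
    "Related alerts (24h): {{ steps.related_alerts.output.values[0][0] }} "
    "alerts from {{ steps.related_alerts.output.values[0][1] | size }} rules",
    context,
)
print(text)
```

Applied to a plain string, this `size` (like Liquid's) returns the character count rather than 1.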

Notify the team

Send a Slack message so the team knows a confirmed case is open:

```yaml
- name: notify_team
  type: slack
  connector-id: "security-alerts"
  with:
    message: |
      Malware confirmed on {{ event.alerts[0].host.name }}.
      VirusTotal: {{ steps.check_virustotal.output.stats.malicious }} detections.
      Case created: {{ steps.create_case.output.id }}
```

Slack is one option. Jira, ServiceNow, PagerDuty, Microsoft Teams, email, and Opsgenie are all supported as connector steps.

The complete workflow

Here's the full workflow assembled:

```yaml
description: Enriches alerts, checks threat intel, creates a case, and notifies the team.
enabled: true
tags:
  - security
  - triage
triggers:
  - type: alert
steps:
  - name: check_virustotal
    type: virustotal.scanFileHash
    connector-id: "my-virustotal"
    with:
      hash: "{{ event.alerts[0].file.hash.sha256 }}"
    on-failure:
      retry:
        max-attempts: 2
        delay: 3s
      continue: true

  - name: related_alerts
    type: elasticsearch.esql.query
    with:
      query: |
        FROM .alerts-security*
        | WHERE host.name == "{{ event.alerts[0].host.name }}"
        | WHERE @timestamp > NOW() - 24 hours
        | STATS
            alert_count = COUNT(*),
            rules_triggered = VALUES(kibana.alert.rule.name),
            users_involved = VALUES(user.name)
      format: json

  - name: check_malicious
    type: if
    condition: steps.check_virustotal.output.stats.malicious > 5
    steps:
      - name: create_case
        type: kibana.createCase
        with:
          title: "Malware Detected: {{ event.alerts[0].file.hash.sha256 }}"
          description: |
            Confirmed malicious file detected on {{ event.alerts[0].host.name }}.
            Detection: {{ event.rule.name }}
            User: {{ event.alerts[0].user.name }}
            VirusTotal: {{ steps.check_virustotal.output.stats.malicious }} engines flagged this file
            Related alerts (24h): {{ steps.related_alerts.output.values[0][0] }} alerts from {{ steps.related_alerts.output.values[0][1] | size }} rules
          owner: securitySolution
          severity: high
          tags:
            - automation
            - malware
          settings:
            syncAlerts: false
          connector:
            id: none
            type: ".none"
            fields: null

      - name: notify_team
        type: slack
        connector-id: "security-alerts"
        with:
          message: |
            Malware confirmed on {{ event.alerts[0].host.name }}.
            VirusTotal: {{ steps.check_virustotal.output.stats.malicious }} detections.
            Case created: {{ steps.create_case.output.id }}
    else:
      - name: close_false_positive
        type: kibana.SetAlertsStatus
        with:
          status: closed
          reason: false_positive
          signal_ids:
            - "{{ event.alerts[0]._id }}"
```

That's the triage loop, automated. Alert fires, threat intel checked, context gathered, decision made, case created, team notified. Every execution is logged and auditable.

This is a starting point. The traditional-triage.yaml in the Elastic Workflows library on GitHub goes further: it isolates the host, looks up the on-call analyst, creates a dedicated Slack channel, assigns the case, and posts a rich incident summary. Same patterns, more steps.

Adding AI to the playbook

The workflow above handles a defined path. If the hash is malicious, do X; otherwise, do Y. That covers a lot of triage work. But not every alert fits a clean branching condition, and not every case description should be a list of raw fields.

Workflows include AI steps that handle the parts where structured logic runs out. There are three, and they work together.

Classify: let AI drive the branching

Instead of branching on a VirusTotal score threshold, use `ai.classify` to categorize the alert. It considers the full alert context, not just a single number:

```yaml
- name: classify_alert
  type: ai.classify
  with:
    input: "${{ event }}"
    categories:
      - malware
      - phishing
      - lateral_movement
      - data_exfiltration
      - false_positive
    instructions: |
      Classify this security alert based on the alert details,
      rule name, and affected entities.
    includeRationale: true
```

The output is structured: `steps.classify_alert.output.category` returns a single string such as `"malware"` or `"false_positive"`, which drives the `if` condition directly. The rationale explains the classification, and you can include it in the case for audit purposes.
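Because a model produces the category, it is worth treating the output defensively before it drives a branch. The Python sketch below validates `category` against the configured list and routes anything unexpected to human review. The routing names are hypothetical, and the sketch assumes only the `category` field described above, not the full `ai.classify` output schema.

```python
from typing import Any

# The categories configured on the classify_alert step.
CATEGORIES = {"malware", "phishing", "lateral_movement", "data_exfiltration", "false_positive"}


def route(classify_output: dict[str, Any]) -> str:
    """Map a classification to a next action (hypothetical action names)."""
    category = str(classify_output.get("category", "")).strip().lower()
    if category not in CATEGORIES:
        return "needs_review"  # unknown or empty label: don't guess
    if category == "false_positive":
        return "close_alert"
    return "open_case"


print(route({"category": "malware"}))
print(route({"category": "cryptomining"}))
```

Checking against the same list the step was configured with keeps the branch closed-world. A label the model invents falls through to review instead of silently matching a default path.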

Summarize: write case descriptions that adapt

Rather than templating raw field values into a case description, use `ai.summarize` to generate a readable overview. Run it once before case creation for the initial description, and once after the agent investigation to update the description with the full picture:

```yaml
- name: initial_summary
  type: ai.summarize
  with:
    input: "${{ event }}"
    instructions: |
      Write a one-paragraph overview of this security alert.
      State what was detected, on which host, by which user, and the severity.
      Do not include recommendations. Just the facts.
    maxLength: 300
```

The summary adapts to whatever fields are present on the alert, so you don't need to account for every possible field combination in your Liquid templates. Use `steps.initial_summary.output.content` in the case description and the Slack notification.

Agent: investigate what the playbook can't

The `ai.agent` step invokes an Agent Builder agent. Unlike classify and summarize, an agent has access to tools. It can query your indices, check threat intel, correlate signals across data sources, and reason about what it finds:

```yaml
- name: escalate_to_agent
  type: ai.agent
  agent-id: "security-agent"
  create-conversation: true
  with:
    message: |
      Investigate this alert. Search for related activity on this host,
      check for persistence mechanisms and lateral movement,
      and determine the full scope of the incident.
      Alert: {{ event | json }}
      Classification: {{ steps.classify_alert.output.category }}
      VirusTotal: {{ steps.check_virustotal.output | json }}
      Related alerts: {{ steps.related_alerts.output | json }}
  timeout: 10m
```

The agent processes the input, calls whatever tools it needs, and returns its findings. The workflow waits, then continues with the next steps: adding the investigation to the case, notifying the team, and updating the case description with a concise summary of what the agent found.

Setting `create-conversation: true` persists the conversation, so the workflow can fetch the agent's reasoning trail and add it to the case as a structured comment with clickable links to each query it ran. The analyst also gets a direct link to pick up the conversation with the agent if they want to dig deeper.
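The `| json` filter in the agent message serializes whole objects, so the agent receives structured data rather than a flattened string. The Python sketch below builds an equivalent message with `json.dumps` to show roughly what the agent sees. The input values are invented, `build_agent_message` is hypothetical, and the exact serialization the Workflows engine uses (key order, spacing) may differ.

```python
import json
from typing import Any


def build_agent_message(event: dict[str, Any], steps: dict[str, Any]) -> str:
    """Assemble the investigation prompt; objects are serialized as `| json` would."""
    lines = [
        "Investigate this alert. Search for related activity on this host,",
        "check for persistence mechanisms and lateral movement,",
        "and determine the full scope of the incident.",
        f"Alert: {json.dumps(event)}",
        f"Classification: {steps['classify_alert']['output']['category']}",
        f"VirusTotal: {json.dumps(steps['check_virustotal']['output'])}",
        f"Related alerts: {json.dumps(steps['related_alerts']['output'])}",
    ]
    return "\n".join(lines)


# Invented sample values, shaped like the step outputs used in this post.
sample_event = {"alerts": [{"host": {"name": "web-01"}}], "rule": {"name": "Malware Detection"}}
sample_steps = {
    "classify_alert": {"output": {"category": "malware"}},
    "check_virustotal": {"output": {"stats": {"malicious": 12}}},
    "related_alerts": {"output": {"values": [[7, ["Rule A", "Rule B"], "jdoe"]]}},
}
message = build_agent_message(sample_event, sample_steps)
print(message)
```

Serializing whole objects keeps the prompt faithful to the data but can make it long. Trimming the payload to the fields the agent actually needs is a trade-off to weigh against model context limits.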

Putting it together

In the full version of this workflow, the three AI steps work in sequence:

Classify the alert to drive the triage decision
Summarize the alert for the initial case description and Slack notification
Agent investigates the full scope: persistence, lateral movement, IOCs, affected systems
Summarize again, this time distilling the agent's findings into a concise, updated case description

The case starts with a clean factual overview and evolves into a comprehensive summary as the investigation completes. The agent's full analysis and reasoning trail live as case comments for analysts who want the details.
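The four-stage sequence can be sketched as plain Python with stubbed AI calls. It shows how the case description is written once and then replaced after the agent finishes. Every function here is a hypothetical stand-in: `classify`, `summarize`, and `investigate` return canned values instead of calling a model, and the case is a dict rather than a Kibana case.

```python
from typing import Any


# Stubs standing in for the AI steps; they return canned values, not model output.
def classify(alert: dict[str, Any]) -> str:
    return "false_positive" if alert.get("benign") else "malware"


def summarize(text: str, max_length: int = 300) -> str:
    return text[:max_length]  # truncation only; a real summary would rewrite the text


def investigate(alert: dict[str, Any]) -> str:
    return f"Scope on {alert['host']}: findings from the agent would go here."


def triage(alert: dict[str, Any]) -> dict[str, Any]:
    # 1. Classify: the category drives the branch.
    if classify(alert) == "false_positive":
        return {"status": "closed"}
    # 2. Summarize: the initial, factual case description.
    case: dict[str, Any] = {
        "description": summarize(f"{alert['rule']} fired on {alert['host']}."),
        "comments": [],
    }
    # 3. Agent: the full analysis is kept as a case comment.
    findings = investigate(alert)
    case["comments"].append(findings)
    # 4. Summarize again: the description is replaced with a digest of the findings.
    case["description"] = summarize(findings, max_length=120)
    return case


print(triage({"rule": "Malware Detection", "host": "web-01"}))
print(triage({"rule": "Malware Detection", "host": "web-01", "benign": True}))
```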

The complete workflow, including the AI investigation pipeline with reasoning trails, clickable Discover links, and follow-up Slack notifications, is available in the Elastic Workflows library on GitHub.

Workflows as agent tools

The integration between Workflows and Agent Builder works in both directions. Workflows can call agents (as shown above). And agents can call workflows.

When you expose a workflow as a tool in Agent Builder, an agent can invoke it during a conversation. The agent decides what needs to happen, and the workflow handles the execution reliably and repeatably.

This is the pattern demonstrated in the Chrysalis APT blog post: a two-step workflow hands the entire Attack Discovery to an agent, and the agent calls workflow-backed tools to verify malware hashes, search logs, check the on-call schedule, create a case, and spin up a Slack channel. The workflow is the trigger and the safety net. The agent is the brain.

Agents reason. Workflows execute. Together they cover the full range from judgment to action.

Open by design

Not every team starts from zero. Some already have automation running in Tines, Splunk SOAR, Palo Alto XSOAR, or another platform. Workflows don't ask you to replace any of your existing tools.

The idea is straightforward: use Workflows for the parts of your automation that are native to Elastic, such as alert triage, enrichment from your own indices, case management, and alert status updates. These touch your Elastic data directly, so a native workflow is typically simpler and faster than an external tool making API calls back into Elastic.

For everything else, connectors bridge the gap. We have native connectors for Tines, Resilient, Swimlane, TheHive, D3 Security, Torq, and XSOAR. A workflow can kick off a Tines story, push an incident to Resilient, or trigger any external system via HTTP. Your existing tools handle cross-platform orchestration. Workflows handle what's native. As the capability grows, you can consolidate at your own pace. Nobody's forcing a migration.

What's here and what's next

Workflows is available today. Here's what you can build with it:

Alert triggers connect workflows to detection and alerting rules
Case and alert management through named Kibana steps (`kibana.createCase`, `kibana.SetAlertsStatus`, `kibana.addCaseComment`, and more)
Direct data access via Elasticsearch search and ES|QL

39 workflow-compatible connectors covering threat intel (VirusTotal, AbuseIPDB, GreyNoise, Shodan, URLVoid, AlienVault OTX), ticketing (Jira, ServiceNow), communication (Slack, Teams, PagerDuty, email), SOAR platforms (Tines, Resilient, Swimlane, TheHive, and others), and AI providers
AI steps for classification, summarization, prompts, and invoking Agent Builder agents and skills
YAML authoring with autocomplete, validation, and step testing in Kibana
50+ example workflows on GitHub, including security-specific templates for detection, enrichment, and response

What's coming:

Visual workflow builder for drag-and-drop authoring
In-product template library to browse and install workflows directly in Kibana
Human-in-the-loop approvals that pause workflows for human input via Slack, email, or the Kibana UI
Natural language authoring where AI helps translate intent into working workflows

Today, authoring is YAML-based. If you've written detection rules or configured CI/CD pipelines, the learning curve is gentle. The editor has built-in autocomplete, validation, and step testing, and the example library gives you templates to start from. A visual builder is coming to make this accessible to a wider audience.

Get started

Elastic Workflows is available now. To start building:

Start an Elastic Cloud trial or enable Workflows in your existing deployment under Stack Management > Advanced Settings
Explore the Workflows documentation
Browse the Elastic Workflow Library on GitHub for security templates you can adapt
Read the introductory technical deep dive for core concepts
See the Chrysalis APT blog for a complete Attack Discovery + Workflows + Agent Builder walkthrough

Start with the workflow that would save you the most time tomorrow.

The release and timing of any features or functionality described in this post remain at Elastic's sole discretion. Any features or functionality not currently available may not be delivered on time or at all.

Facts Only

Elastic Workflows automates security tasks within SIEM systems.
Workflows are defined in YAML and executed in Kibana.
Triggers include alerts, scheduled events, or manual activation.
Steps can query Elasticsearch, update alerts, call external APIs, and use AI.
Connectors integrate with services like VirusTotal, Slack, and Jira.
AI steps include classification, summarization, and agent-based investigation.
Workflows can create cases, notify teams, and close false positives.
The system is available now with future updates planned for visual editing and approvals.
Elastic provides a library of example workflows on GitHub.
The solution is designed to work alongside existing automation tools.

Executive Summary

Elastic Workflows introduces automation directly into SIEM systems to streamline repetitive security tasks like alert triage. The platform allows users to create workflows triggered by alerts, which can then perform actions such as checking threat intelligence, gathering context from Elasticsearch, creating cases, and notifying teams. These workflows are defined in YAML and run within Kibana, eliminating the need for separate tools or complex integrations. The system supports connectors for external services like VirusTotal, Slack, and Jira, as well as AI-driven steps for classification, summarization, and deeper investigation using Elastic’s Agent Builder. The goal is to reduce manual effort in routine tasks, freeing analysts to focus on more complex threats. Elastic provides pre-built workflow templates and plans to expand capabilities with a visual builder and human-in-the-loop approvals. The solution is designed to complement existing automation tools rather than replace them, offering a flexible approach to security orchestration.

Full Take

This article presents Elastic Workflows as a solution to the repetitive, manual nature of security operations, positioning it as a native automation tool within the Elastic ecosystem. The strongest version of this narrative highlights genuine pain points in SOC workflows—analysts spending excessive time on routine triage—and offers a technically sound response by embedding automation directly into the SIEM. The integration of AI for classification, summarization, and investigation is particularly notable, as it addresses the limitations of rigid, rule-based automation in dynamic threat landscapes.
However, the pattern scan reveals subtle elements of **ARC-0024 Ambiguity** in the framing of AI’s role. While the article emphasizes AI’s ability to "handle parts where structured logic runs out," it doesn’t fully address the risks of over-reliance on AI for critical decisions or the potential for false positives in classification. The narrative also leans into **ARC-0043 Motte-and-Bailey** by presenting workflows as both a simple tool for repetitive tasks ("not because it's hard, but because it's repetitive") and a sophisticated AI-driven investigation platform ("agent investigates the full scope"). This dual framing could create unrealistic expectations about the system’s autonomy versus its actual need for human oversight.
The root cause of this narrative is the broader industry push toward automation as a panacea for security operations challenges. The unstated assumption is that reducing manual effort inherently improves security outcomes, but this overlooks the potential for automation to introduce new blind spots or amplify biases in threat detection. The implications for human agency are significant: while workflows free analysts from mundane tasks, they also risk marginalizing the nuanced judgment that experienced professionals bring to complex incidents.
Bridge questions to consider: How might the reliance on AI-driven classification introduce new vulnerabilities, such as adversarial attacks on the training data? What safeguards are in place to ensure that automated case creation doesn’t overwhelm analysts with low-quality alerts? And how does this approach balance the efficiency gains of automation with the need for transparency in security decision-making?
Counterstrike scan: If this were part of a coordinated influence campaign, the playbook would emphasize automation as a silver bullet, downplaying the need for human oversight while framing manual processes as inherently flawed. The actual content doesn’t fully match this pattern—it acknowledges the role of human analysts and provides mechanisms for auditability—but the framing still leans heavily toward automation as the primary solution. A more balanced approach would explicitly address the limitations and risks of AI-driven security workflows.
Patterns detected: ARC-0024 Ambiguity, ARC-0043 Motte-and-Bailey
