Advanced ⏱ 180 min 📋 12 Steps

Build a Multi-Server Security AI Agent

Design and implement a security AI agent that orchestrates multiple MCP servers (Sentinel, Defender XDR, and custom tools) to perform autonomous security investigations, correlate findings, and execute response actions.

📋 Overview

About This Lab

In this lab you will build an AI agent that orchestrates multiple MCP servers (Sentinel, Defender XDR, and custom tools) to perform autonomous, end-to-end security investigations. The agent discovers available tools at runtime, plans multi-step investigations using an LLM, correlates findings across data sources, and executes response actions with built-in safety controls.

๐Ÿข Enterprise Use Case

A SOC team needs an AI agent that can correlate data across Microsoft Sentinel, Defender XDR, and custom internal tools to perform complete security investigations, from initial alert triage through evidence collection to recommended (or automated) response actions.

By connecting multiple MCP servers to a single agent, the team eliminates manual context-switching between portals and enables faster, more thorough investigations that span the entire security stack.

🎯 What You Will Learn

  1. Design a multi-server MCP architecture with tool namespacing
  2. Implement dynamic tool discovery across connected servers
  3. Build an LLM-powered planning loop for multi-step investigations
  4. Maintain investigation memory and context across tool calls
  5. Construct the agent loop with iterative reasoning and action
  6. Craft effective system prompts that guide agent behavior
  7. Correlate findings across Sentinel, XDR, and custom data sources
  8. Implement safety controls and human-in-the-loop approval gates
  9. Add streaming output for real-time investigation progress
  10. Test and validate the agent with realistic investigation scenarios

🔑 Why This Matters

This lab is the culmination of the MCP series, combining everything you have learned into a fully autonomous security agent. By orchestrating multiple MCP servers behind an LLM-powered planning loop, you can handle end-to-end investigations that would otherwise require analysts to manually pivot across multiple tools and portals, dramatically accelerating detection and response.

⚙️ Prerequisites

  • Completed Lab 03: MCP servers deployed to Azure with SSE transport
  • Sentinel and Defender XDR MCP servers: running and accessible via their SSE endpoints
  • Azure OpenAI or OpenAI API access: GPT-4.1 model with function calling support
  • Python 3.10+: with async support and the openai, mcp, and httpx packages
  • Understanding of LLM agent architectures: tool orchestration and ReAct patterns
💡 Pro Tip: This lab builds the most advanced component in the MCP series: an autonomous security AI agent. Take your time to understand each step before moving on. The concepts here extend to any AI agent framework, not just MCP.

Step 1 · Understand Multi-Server MCP Architecture

A multi-server MCP architecture connects a single AI agent to multiple MCP servers, each providing specialised tools. The agent discovers all tools across servers and orchestrates them for complex, multi-step security investigations.

Architecture Diagram

Multi-Server Agent Architecture

Security AI Agent
  • Planning: LLM (GPT-4.1)
  • Execution: MCP Client
  • Memory: findings, hypotheses

Connected servers and their backing APIs:
  • Sentinel MCP Server → Log Analytics API
  • XDR MCP Server → Graph Security API
  • Entra MCP Server → Entra ID API
💡 Pro Tip: Think of the multi-server agent as a SOC analyst with access to multiple security consoles: Sentinel for log queries, Defender XDR for incident management, and Entra ID for identity investigation. The agent coordinates all of these for a coherent investigation.
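To make the namespacing concrete, here is a tiny sketch of how server-qualified tool names can be built and routed. The server and tool names are illustrative examples; the full registry is built in Step 3.

```python
# Minimal sketch of server-qualified tool names (names are examples).
def qualify(server: str, tool: str) -> str:
    """Prefix a tool name with its server name."""
    return f"{server}.{tool}"

def route(qualified: str) -> tuple[str, str]:
    """Split a qualified name back into (server, tool).
    maxsplit=1 keeps any dots inside the tool name intact."""
    server, tool = qualified.split(".", 1)
    return server, tool

# A unified catalog merges tools from every connected server
catalog = {
    qualify("sentinel", "run_kql_query"): "Run a KQL query in Sentinel",
    qualify("xdr", "get_incident"): "Fetch a Defender XDR incident",
}

print(route("sentinel.run_kql_query"))  # ('sentinel', 'run_kql_query')
```

Because every tool name carries its server prefix, the agent can route any call back to the right session with a single string split.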

Step 2 · Set Up the Agent Project

Create the project structure for the multi-server security AI agent.

Project Setup

# Create the multi-server agent project
mkdir security-ai-agent && cd security-ai-agent
python -m venv .venv && source .venv/bin/activate   # Windows: .venv\Scripts\activate

# Install dependencies:
#   mcp[cli]     - MCP client SDK for connecting to remote MCP servers
#   openai       - LLM API for planning and reasoning (GPT-4.1)
#   httpx        - Async HTTP client for SSE transport connections
#   python-dotenv - Load configuration from .env files
pip install "mcp[cli]" openai httpx python-dotenv

# Project structure: separate concerns into focused modules
mkdir -p src tests
touch src/__init__.py src/agent.py src/registry.py
touch src/memory.py src/planner.py src/config.py
touch .env requirements.txt

Configuration (src/config.py)

import os
from dotenv import load_dotenv
load_dotenv()

# MCP Server registry: each entry defines a remote MCP server endpoint
# The agent connects to all servers at startup and discovers their tools
# Tools are namespaced by server name (e.g., sentinel.run_kql_query)
MCP_SERVERS = {
    "sentinel": {
        "url": os.environ.get("SENTINEL_MCP_URL", "http://localhost:8000/sse"),
        "token": os.environ.get("SENTINEL_MCP_TOKEN", ""),
        "description": "Sentinel KQL queries and incident retrieval"
    },
    "xdr": {
        "url": os.environ.get("XDR_MCP_URL", "http://localhost:8001/sse"),
        "token": os.environ.get("XDR_MCP_TOKEN", ""),
        "description": "Defender XDR incident management and hunting"
    }
}

# LLM configuration for the planning/reasoning engine
OPENAI_MODEL = os.environ.get("OPENAI_MODEL", "gpt-4.1")
# Safety limit: max tool calls per investigation to prevent runaway agents
MAX_INVESTIGATION_STEPS = int(os.environ.get("MAX_STEPS", "20"))
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", "")
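
The config module above loads these values from a .env file via python-dotenv. One possible layout (all values are placeholders; point the URLs at your deployed MCP servers from Lab 03):

```shell
# Example .env (placeholder values - replace with your own)
SENTINEL_MCP_URL=https://sentinel-mcp.example.azurecontainerapps.io/sse
SENTINEL_MCP_TOKEN=replace-with-sentinel-token
XDR_MCP_URL=https://xdr-mcp.example.azurecontainerapps.io/sse
XDR_MCP_TOKEN=replace-with-xdr-token
OPENAI_API_KEY=replace-with-openai-key
OPENAI_MODEL=gpt-4.1
MAX_STEPS=20
```

Keep .env out of version control; it contains credentials.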

Step 3 · Build the Multi-Server Tool Registry

The tool registry discovers tools from all connected MCP servers and builds a unified catalog with server-qualified names (e.g., sentinel.run_kql_query, xdr.get_incident).

Tool Registry (src/registry.py)

import json
from mcp import ClientSession
from mcp.client.sse import sse_client

class ToolRegistry:
    """Unified tool registry across multiple MCP servers.
    Connects to each server, discovers its tools, and creates
    a merged catalog with server-qualified names.
    Example: sentinel server's 'run_kql_query' becomes 'sentinel.run_kql_query'.
    """

    def __init__(self):
        self.tools: dict[str, dict] = {}          # qualified_name → tool metadata
        self.sessions: dict[str, ClientSession] = {}  # server_name → MCP session
        self._tool_to_server: dict[str, str] = {}     # qualified_name → server_name

    async def connect_server(self, name: str, url: str, token: str = ""):
        """Connect to an MCP server via SSE and register its tools.
        Performs the MCP handshake, discovers tools, and adds them
        to the unified registry with server-qualified names.
        """
        # Establish SSE connection with optional Bearer token auth
        headers = {}
        if token:
            headers["Authorization"] = f"Bearer {token}"

        # Connect to the SSE endpoint and initialize the MCP session
        transport = sse_client(url, headers=headers)
        read, write = await transport.__aenter__()
        session = ClientSession(read, write)
        await session.__aenter__()  # start the session's background receive loop
        await session.initialize()  # MCP protocol handshake

        self.sessions[name] = session

        # Discover all tools from this server and register with qualified names
        # e.g., server "sentinel" + tool "run_kql_query" = "sentinel.run_kql_query"
        tools_response = await session.list_tools()
        for tool in tools_response.tools:
            qualified_name = f"{name}.{tool.name}"
            self.tools[qualified_name] = {
                "name": tool.name,
                "server": name,
                # Prefix description with server name for AI disambiguation
                "description": f"[{name}] {tool.description}",
                "input_schema": tool.inputSchema
            }
            self._tool_to_server[qualified_name] = name

        print(f"Connected to {name}: {len(tools_response.tools)} tools")

    async def execute(self, qualified_name: str, arguments: dict):
        """Execute a tool on the appropriate MCP server.
        Routes the call to the correct server based on the qualified name prefix.
        Example: 'sentinel.run_kql_query' → calls run_kql_query on sentinel server.
        """
        server_name = self._tool_to_server.get(qualified_name)
        if not server_name:
            return json.dumps({"error": f"Unknown tool: {qualified_name}"})

        tool_name = qualified_name.split(".", 1)[1]
        session = self.sessions[server_name]

        result = await session.call_tool(tool_name, arguments)
        return result.content[0].text if result.content else ""

    def get_openai_tools(self) -> list[dict]:
        """Format all discovered tools for OpenAI function calling.
        Converts MCP tool schemas to the OpenAI tools format
        so the LLM can select and invoke tools by qualified name.
        """
        return [
            {
                "type": "function",
                "function": {
                    "name": qname,
                    "description": info["description"],
                    "parameters": info["input_schema"]
                }
            }
            for qname, info in self.tools.items()
        ]

    async def disconnect_all(self):
        """Clean up all server connections gracefully."""
        for session in self.sessions.values():
            await session.__aexit__(None, None, None)

Step 4 · Build the Investigation Memory

The memory system tracks the investigation objective, steps taken, tool results, findings, and current hypotheses. It is included in every LLM prompt to maintain context.

Investigation Memory (src/memory.py)

from datetime import datetime, timezone
from dataclasses import dataclass, field

@dataclass
class MemoryEntry:
    """A single step in the investigation timeline."""
    timestamp: str
    type: str           # observation, finding, hypothesis, action
    tool_used: str      # Qualified tool name (e.g., sentinel.run_kql_query)
    summary: str        # Brief description of what was learned
    raw_data: str = ""  # Truncated tool output for context window efficiency

@dataclass
class InvestigationMemory:
    """Tracks the full state of an ongoing investigation.
    Included in every LLM prompt to maintain context across steps.
    Stores: objective, steps taken, key findings, and affected entities.
    """
    objective: str                                          # What we're investigating
    entries: list[MemoryEntry] = field(default_factory=list) # Timeline of steps
    findings: list[str] = field(default_factory=list)        # Confirmed findings
    affected_entities: list[str] = field(default_factory=list) # Users, devices, IPs

    def add_step(self, tool_name: str, arguments: dict,
                 result: str, summary: str, entry_type: str = "observation"):
        """Record a tool invocation step in the investigation timeline."""
        self.entries.append(MemoryEntry(
            timestamp=datetime.now(timezone.utc).isoformat(),
            type=entry_type,
            tool_used=tool_name,
            summary=summary,
            raw_data=result[:500]  # truncate to keep the context window small
        ))

    def add_finding(self, finding: str):
        self.findings.append(finding)

    def add_entity(self, entity: str):
        if entity not in self.affected_entities:
            self.affected_entities.append(entity)

    def get_context(self, max_entries: int = 15) -> str:
        """Build context string for the LLM prompt.
        Includes: objective, recent steps, key findings, affected entities.
        Truncates to max_entries to fit within the LLM context window.
        This is injected into every planning prompt to maintain state.
        """
        recent = self.entries[-max_entries:]
        steps = "\n".join([
            f"  [{e.timestamp}] ({e.type}) {e.tool_used}: {e.summary}"
            for e in recent
        ])
        return f"""INVESTIGATION OBJECTIVE: {self.objective}

STEPS COMPLETED ({len(self.entries)}):
{steps}

KEY FINDINGS: {'; '.join(self.findings) if self.findings else 'None yet'}
AFFECTED ENTITIES: {', '.join(self.affected_entities) if self.affected_entities else 'None identified yet'}
"""

Step 5 · Create the System Prompt & LLM Planner

The system prompt guides the agent’s behaviour during investigations. Include guardrails: never take destructive actions without approval, verify findings with multiple sources, and document every step.

Planner Module (src/planner.py)

from openai import AsyncOpenAI
from config import OPENAI_MODEL, OPENAI_API_KEY

# Initialize the OpenAI client for LLM-powered planning
client = AsyncOpenAI(api_key=OPENAI_API_KEY)

# System prompt: the agent's operating manual
# Defines investigation methodology, safety rules, and tool naming
# This is the most critical piece - it determines agent behavior
SYSTEM_PROMPT = """You are a Security AI Agent with access to multiple
Microsoft security products via MCP tools. Your role is to investigate
security incidents methodically and thoroughly.

INVESTIGATION METHODOLOGY:
1. Start by understanding the scope: list recent incidents or run a broad query
2. Gather details on the most critical finding
3. Enrich affected entities (users, devices, IPs) across all available sources
4. Correlate findings across Sentinel AND Defender XDR for complete picture
5. Assess severity and business impact
6. Recommend containment and remediation actions
7. Generate a comprehensive investigation report

SAFETY RULES:
- NEVER execute destructive actions (isolate, disable) without explicit approval
- Always verify critical findings with at least 2 data sources
- If confidence is below 70%, escalate to a human analyst
- Document every step taken with reasoning
- Use read-only tools before write tools
- When unsure which tool to use, check available tools first

TOOL NAMING: Tools are prefixed with their server name:
- sentinel.*: Sentinel KQL queries, table listing, incidents
- xdr.*: Defender XDR incidents, hunting, device actions, entity enrichment
"""

async def plan_next_step(context: str, tools: list[dict]):
    """Ask the LLM to decide the next investigation step.
    The LLM receives: system prompt + investigation context + available tools.
    It returns either a tool call (continue investigating) or a text
    response (investigation complete, deliver findings).
    Temperature=0.1 ensures consistent, deterministic reasoning.
    """
    response = await client.chat.completions.create(
        model=OPENAI_MODEL,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": context}
        ],
        tools=tools,
        tool_choice="auto",
        temperature=0.1  # Low temperature for consistent reasoning
    )
    return response.choices[0]
โš ๏ธ Important: The system prompt is the agent’s operating manual. Invest time in crafting clear, specific instructions. Vague prompts lead to inefficient investigations; overly restrictive prompts prevent the agent from being useful. Test and iterate.

Step 6 · Implement the Agent Loop

The agent loop is the core: receive objective → plan next step → execute tool → store result → repeat until investigation is complete or the step budget is exhausted.

Agent Core (src/agent.py)

import json
import asyncio
from registry import ToolRegistry
from memory import InvestigationMemory
from planner import plan_next_step
from config import MCP_SERVERS, MAX_INVESTIGATION_STEPS

class SecurityAgent:
    """Multi-server security AI agent.
    Orchestrates multiple MCP servers (Sentinel, XDR, Entra)
    using an LLM-powered planning loop for autonomous investigations.
    """

    def __init__(self):
        self.registry = ToolRegistry()

    async def connect(self):
        """Connect to all configured MCP servers and discover tools."""
        for name, config in MCP_SERVERS.items():
            try:
                await self.registry.connect_server(
                    name, config["url"], config.get("token", "")
                )
                print(f"✓ Connected to {name}")
            except Exception as e:
                print(f"✗ Failed to connect to {name}: {e}")

    async def investigate(self, objective: str,
                          max_steps: int = MAX_INVESTIGATION_STEPS) -> dict:
        """Run an autonomous, multi-step investigation.
        Core agent loop: plan → execute → observe → repeat.
        Continues until the LLM signals completion or the step budget runs out.
        Returns: dict with objective, steps taken, findings, and entities.
        """
        memory = InvestigationMemory(objective=objective)
        print(f"\n🔍 Starting investigation: {objective}\n")

        for step_num in range(1, max_steps + 1):
            print(f"--- Step {step_num}/{max_steps} ---")

            # Ask the LLM to plan the next step based on current context
            # The LLM sees: system prompt, investigation memory, available tools
            context = memory.get_context()
            tools = self.registry.get_openai_tools()
            choice = await plan_next_step(context, tools)

            # Check if the LLM decided the investigation is complete
            # finish_reason="stop" means no more tool calls needed
            if choice.finish_reason == "stop":
                print("✓ Investigation complete")
                break

            # Execute the tool call(s) the LLM selected
            # The LLM may request multiple parallel tool calls
            if choice.message.tool_calls:
                for tc in choice.message.tool_calls:
                    tool_name = tc.function.name
                    arguments = json.loads(tc.function.arguments)

                    print(f"  → Calling {tool_name}")
                    try:
                        result = await self.registry.execute(tool_name, arguments)
                        # Summarise the result for memory
                        summary = self._summarise_result(result)
                        memory.add_step(tool_name, arguments, result, summary)
                        print(f"    ✓ {summary[:100]}")
                    except Exception as e:
                        memory.add_step(tool_name, arguments, "", f"Error: {e}")
                        print(f"    ✗ Error: {e}")

        return {
            "objective": objective,
            "steps_taken": len(memory.entries),
            "findings": memory.findings,
            "affected_entities": memory.affected_entities,
            "complete": choice.finish_reason == "stop"
        }

    def _summarise_result(self, result: str, max_len: int = 200) -> str:
        """Create a brief summary of a tool result."""
        try:
            data = json.loads(result)
            if "incident_count" in data:
                return f"Found {data['incident_count']} incidents"
            if "row_count" in data:
                return f"Query returned {data['row_count']} rows"
            if "status" in data:
                return f"Status: {data['status']}"
        except (json.JSONDecodeError, TypeError):
            pass
        return result[:max_len]

    async def disconnect(self):
        await self.registry.disconnect_all()

# Entry point
async def main():
    agent = SecurityAgent()
    await agent.connect()
    try:
        result = await agent.investigate(
            "Investigate the most critical active incident in our environment. "
            "Identify affected users and devices, determine the attack vector, "
            "and recommend containment actions."
        )
        print("\n📋 Investigation Summary:")
        print(json.dumps(result, indent=2))
    finally:
        await agent.disconnect()

if __name__ == "__main__":
    asyncio.run(main())
💡 Pro Tip: Implement a step budget (max_steps=20). This prevents runaway investigations that consume excessive tokens and API calls. Start with 20 and adjust based on typical investigation complexity.
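The learning objectives also call for streaming output of investigation progress. One way to retrofit that onto the loop above is to push an event per step onto an asyncio.Queue that a CLI or web UI consumes as events arrive. This is a sketch, not part of agent.py: the event shape and queue plumbing are assumptions.

```python
import asyncio

async def investigate_streaming(steps, queue):
    """Toy stand-in for the agent loop: emits one progress event per step."""
    for i, step in enumerate(steps, 1):
        await queue.put({"step": i, "message": step})  # real loop: after each tool call
    await queue.put(None)  # sentinel value: investigation complete

async def main():
    queue = asyncio.Queue()
    producer = asyncio.create_task(
        investigate_streaming(["list incidents", "enrich user"], queue)
    )
    events = []
    # Consume events as they arrive instead of waiting for the final report
    while (event := await queue.get()) is not None:
        events.append(event)
    await producer
    return events

events = asyncio.run(main())
print(events)
```

In the real agent, the producer side would live inside investigate(), emitting an event after every plan and tool call.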

Step 7 · Implement Safety Controls

Classify tools by risk level and enforce approval workflows for high-impact actions.

Safety Control Implementation

# Tool risk classification - determines safety controls per tool
# LOW:    Read-only, no side effects → execute automatically
# MEDIUM: State-changing but reversible → log and execute
# HIGH:   Destructive or impactful → require human approval
# IMPORTANT: Unknown tools default to HIGH for safety!
TOOL_RISK_LEVELS = {
    # LOW: Read-only tools - safe for autonomous execution
    "sentinel.run_kql_query": "low",
    "sentinel.list_sentinel_tables": "low",
    "sentinel.get_recent_incidents": "low",
    "xdr.list_incidents": "low",
    "xdr.get_incident": "low",
    "xdr.run_hunting_query": "low",
    "xdr.enrich_user": "low",

    # MEDIUM: Reversible state changes - log but auto-execute
    "xdr.update_incident": "medium",

    # HIGH: Destructive actions - ALWAYS require human approval
    "xdr.isolate_device": "high",
    "xdr.release_device": "high",
}

import json

async def execute_with_safety(registry, tool_name, arguments):
    """Execute a tool with risk-appropriate safety controls.
    Low-risk: auto-execute. Medium: log and execute. High: human approval.
    Unknown tools default to HIGH risk - never trust unclassified tools.
    """
    risk = TOOL_RISK_LEVELS.get(tool_name, "high")  # Default to high for safety

    if risk == "low":
        return await registry.execute(tool_name, arguments)
    elif risk == "medium":
        print(f"  ⚠ Medium-risk action: {tool_name}")
        return await registry.execute(tool_name, arguments)
    elif risk == "high":
        print(f"\n  🛑 HIGH-RISK ACTION REQUIRES APPROVAL")
        print(f"  Tool: {tool_name}")
        print(f"  Args: {json.dumps(arguments, indent=2)}")
        approval = input("  Approve? (yes/no): ").strip().lower()
        if approval == "yes":
            return await registry.execute(tool_name, arguments)
        else:
            return json.dumps({
                "status": "blocked",
                "message": "Action blocked by human operator."
            })
โš ๏ธ Important: Never default an unknown tool to “low” risk. If a new tool is added to an MCP server, it should require explicit classification before autonomous execution. Default to “high” for safety.

Step 8 · Add Parallel Tool Execution

When the agent needs to enrich multiple entities, call all enrichment tools concurrently rather than sequentially to dramatically speed up investigations.

Parallel Execution

import asyncio

async def execute_parallel(registry, tool_calls: list) -> list:
    """Execute multiple independent tool calls concurrently.
    Dramatically speeds up investigations when enriching multiple entities.
    Each call runs independently - one failure doesn't block others.
    Returns: a list of {tool, status, result/error} dicts, one per call
    (a list rather than a dict keyed by tool name, because the same tool
    is often called several times with different arguments).
    """
    async def _safe_execute(tc):
        """Wrapper that catches errors per-tool instead of failing all."""
        try:
            result = await registry.execute(tc["name"], tc["arguments"])
            return {"tool": tc["name"], "status": "success", "result": result}
        except Exception as e:
            return {"tool": tc["name"], "status": "error", "error": str(e)}

    # asyncio.gather runs all tasks concurrently and waits for all to finish
    tasks = [_safe_execute(tc) for tc in tool_calls]
    return await asyncio.gather(*tasks)

# Example: enrich 3 users simultaneously instead of sequentially
# This turns a 3-second serial operation into a ~1-second parallel one
parallel_calls = [
    {"name": "xdr.enrich_user", "arguments": {"user_principal_name": "alice@contoso.com"}},
    {"name": "xdr.enrich_user", "arguments": {"user_principal_name": "bob@contoso.com"}},
    {"name": "xdr.enrich_user", "arguments": {"user_principal_name": "carol@contoso.com"}},
]
results = await execute_parallel(registry, parallel_calls)

Step 9 · Build Investigation Templates

Pre-built templates accelerate investigations by providing a structured starting point with the right objective, context, and preferred tool sequence.

Template Definitions

# Pre-built investigation templates for common security scenarios
# Each template provides: a structured objective, preferred starting tools,
# and sample KQL queries that the LLM can use as starting points
# Usage: agent.investigate(INVESTIGATION_TEMPLATES["phishing"]["objective"])
INVESTIGATION_TEMPLATES = {
    # Template 1: Phishing campaign investigation
    "phishing": {
        "objective": "Investigate a phishing campaign targeting our organization. "
                     "Identify affected users, compromised credentials, and "
                     "malicious infrastructure.",
        "initial_tools": ["xdr.run_hunting_query", "sentinel.run_kql_query"],
        "suggested_queries": [
            "EmailEvents | where ThreatTypes has 'Phish' | take 50",
            "SigninLogs | where ResultType != '0' | summarize by UserPrincipalName"
        ]
    },
    # Template 2: Malware infection response
    "malware": {
        "objective": "Investigate a malware infection. Identify patient zero, "
                     "lateral movement, and all compromised systems.",
        "initial_tools": ["xdr.list_incidents", "xdr.run_hunting_query"],
        "suggested_queries": [
            "DeviceProcessEvents | where FileName in ('powershell.exe','cmd.exe')"
        ]
    },
    # Template 3: Compromised identity investigation
    "identity_compromise": {
        "objective": "Investigate a compromised identity. Determine scope of "
                     "access, data exposure, and persistence mechanisms.",
        "initial_tools": ["xdr.enrich_user", "sentinel.run_kql_query"],
        "suggested_queries": [
            "SigninLogs | where RiskLevelDuringSignIn != 'none'"
        ]
    }
}

# Usage:
# result = await agent.investigate(
#     INVESTIGATION_TEMPLATES["phishing"]["objective"]
# )

Step 10 · Generate Investigation Reports

After the investigation completes, generate a structured report with the executive summary, timeline, findings, and recommended actions.

Report Generator

import json

from planner import client, OPENAI_MODEL
from memory import InvestigationMemory

async def generate_report(memory: InvestigationMemory) -> str:
    """Generate a structured investigation report using the LLM.
    Takes the complete investigation memory and produces a formatted
    report with executive summary, timeline, findings, MITRE mapping,
    and prioritized remediation steps.
    Returns: Markdown-formatted investigation report string.
    """
    report_prompt = f"""Generate a detailed security investigation report:

OBJECTIVE: {memory.objective}
STEPS TAKEN: {len(memory.entries)}
FINDINGS: {json.dumps(memory.findings)}
AFFECTED ENTITIES: {json.dumps(memory.affected_entities)}

FULL INVESTIGATION LOG:
{memory.get_context(max_entries=50)}

Format the report with these sections:
1. Executive Summary (2-3 sentences for leadership)
2. Investigation Timeline (chronological steps with timestamps)
3. Key Findings (with severity and confidence level)
4. Affected Entities (users, devices, IPs with context)
5. MITRE ATT&CK Mapping (techniques observed)
6. Recommended Actions (prioritized remediation steps)
7. Evidence References (tool outputs supporting findings)
"""

    response = await client.chat.completions.create(
        model=OPENAI_MODEL,
        messages=[{"role": "user", "content": report_prompt}],
        temperature=0.2
    )
    return response.choices[0].message.content

Step 11 · Test with Simulated Incidents

Run the agent against your environment and evaluate its investigation quality.

Test Scenarios

  1. Run the agent with the phishing template: python src/agent.py --template phishing
  2. Evaluate: Did it identify affected users? Did it check email events AND sign-in logs?
  3. Run the agent with the identity compromise template
  4. Evaluate: Did it correlate across Sentinel AND XDR? Did it enrich all entities?
  5. Run with a custom objective: “Find all failed sign-ins in the last 24 hours and identify if any correspond to active incidents”
  6. Check the audit log for all tool calls and verify correctness
  7. Review the generated report for accuracy and completeness
💡 Pro Tip: Create a scoring rubric: root cause identified (0-5), all entities found (0-5), cross-product correlation (0-5), appropriate recommendations (0-5). Use this to track agent improvement over time.
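The rubric from the tip above can be captured in a small helper. The criteria and 0-5 scale follow the tip; the class name and percentage helper are illustrative.

```python
from dataclasses import dataclass

@dataclass
class RubricScore:
    """Score one investigation on the four rubric criteria (0-5 each)."""
    root_cause: int          # was the root cause identified?
    entities_found: int      # were all affected entities found?
    correlation: int         # cross-product correlation quality
    recommendations: int     # appropriateness of recommendations

    def total(self) -> int:
        return (self.root_cause + self.entities_found
                + self.correlation + self.recommendations)

    def percent(self) -> float:
        return 100.0 * self.total() / 20  # 20 = max possible score

score = RubricScore(root_cause=4, entities_found=5,
                    correlation=3, recommendations=4)
print(f"{score.total()}/20 ({score.percent():.0f}%)")  # 16/20 (80%)
```

Recording one RubricScore per test run gives you a simple time series for tracking whether prompt or tooling changes actually improve investigation quality.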

Step 12 · Deploy, Monitor & Iterate

Deploy the agent as an Azure Container App and establish ongoing operational processes.

Deployment

# Deploy the multi-server agent as a Container App
# It connects to the MCP servers via their SSE endpoints
# All MCP server URLs and secrets are passed as environment variables
az containerapp create \
  --name security-ai-agent \
  --resource-group rg-mcp-servers \
  --environment mcp-environment \
  --image mcpserversacr.azurecr.io/security-agent:v1 \
  --target-port 8000 \
  --min-replicas 1 \
  --env-vars \
    SENTINEL_MCP_URL=https://sentinel-mcp.azurecontainerapps.io/sse \
    XDR_MCP_URL=https://xdr-mcp.azurecontainerapps.io/sse \
    OPENAI_API_KEY=secretref:openai-key

Operational Roadmap

  • Phase 1 - Investigation assistance (current): read-only, generates reports
  • Phase 2 - Guided response: agent recommends actions, human approves
  • Phase 3 - Autonomous response: agent acts within predefined rules for known patterns
  • Phase 4 - Proactive hunting: agent initiates investigations based on threat intelligence
  • Build dashboards: investigations completed, quality scores, tool usage, cost metrics
  • Weekly: review investigation quality and tune prompts
  • Monthly: evaluate new MCP servers and tools to add
💡 Pro Tip: Gate each autonomy phase with: proven accuracy at the current level, comprehensive audit logging, safety controls validated by red team exercises, and stakeholder sign-off. Never skip phases.
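One way to make the phased rollout enforceable in code is to derive the auto-executable risk levels from the current phase. This is a sketch; the phase keys and their mapping to the Step 7 risk levels are assumptions you should adapt to your own gates.

```python
# Map each autonomy phase from the roadmap to the tool risk levels
# (from Step 7) the agent may execute WITHOUT human approval.
PHASE_AUTO_RISK = {
    "assist": {"low"},                # Phase 1: read-only assistance
    "guided": {"low"},                # Phase 2: actions proposed, human approves
    "autonomous": {"low", "medium"},  # Phase 3: reversible actions allowed
    "proactive": {"low", "medium"},   # Phase 4: never auto-run high risk
}

def needs_approval(phase: str, risk: str) -> bool:
    """True if a tool call at this risk level requires a human gate.
    Unknown phases get an empty allow-set, so everything needs approval."""
    return risk not in PHASE_AUTO_RISK.get(phase, set())

print(needs_approval("autonomous", "high"))    # True
print(needs_approval("autonomous", "medium"))  # False
```

Wiring this into execute_with_safety keeps the approval policy in one place, so promoting the agent to the next phase is a single config change rather than a code rewrite.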

📚 Documentation Resources

  • MCP Architecture: multi-server client-host-server architecture patterns
  • MCP Sampling: enable AI model inference from within MCP servers
  • Azure OpenAI Service: foundation models for building security AI agents
  • Semantic Kernel overview: AI orchestration framework for multi-tool agents
  • MCP Tools: tool discovery and invocation across multiple servers
  • Function calling with Azure OpenAI: connect AI models to external tools and APIs

Summary

What You Accomplished

  • Designed a multi-server MCP architecture with tool namespacing across Sentinel, Defender XDR, and Entra servers
  • Built a unified tool registry that discovers and catalogues tools from all connected MCP servers
  • Implemented an LLM-powered planning loop for autonomous multi-step security investigations
  • Created an investigation memory system to maintain context and findings across tool calls
  • Constructed the agent loop with iterative reasoning, action execution, and observation processing
  • Crafted effective system prompts with safety rules and investigation methodology guidelines
  • Correlated findings across Sentinel, Defender XDR, and Entra ID data sources
  • Implemented safety controls including human-in-the-loop approval gates for destructive actions
  • Added streaming output for real-time investigation progress visibility
  • Tested and validated the agent with realistic end-to-end investigation scenarios

Next Steps

  • Deploy the multi-server agent to a production environment with proper monitoring and logging
  • Add additional MCP servers for Purview, Intune, or custom internal security tools
  • Implement persistent memory to learn from past investigations and improve future triage
  • Explore integrating the agent with ticketing systems like ServiceNow or Jira for automated incident tracking
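For the persistent-memory next step, a minimal starting point is to append each finished investigation summary (the dict returned by agent.investigate) to a JSON Lines file for later retrieval. The file format and function names here are assumptions, not part of the lab code.

```python
import json
import os
import tempfile
from pathlib import Path

def save_investigation(result: dict, path: str) -> None:
    """Append one investigation summary as a single JSON line."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(result) + "\n")

def load_investigations(path: str) -> list[dict]:
    """Load all past investigation summaries for triage or retrieval."""
    if not Path(path).exists():
        return []
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

# Demo against a throwaway file in the temp directory
demo_path = os.path.join(tempfile.gettempdir(), "investigations-demo.jsonl")
open(demo_path, "w").close()  # start fresh for the demo
save_investigation({"objective": "demo phishing triage", "steps_taken": 3}, demo_path)
loaded = load_investigations(demo_path)
print(loaded[0]["objective"])  # demo phishing triage
```

From here, past summaries could be embedded and retrieved to prime new investigations with similar historical context.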
← Previous Lab All Labs →