Project M.A.L.I.C.E.

Principal Adversarial AI Architect & Red Team Lead · 2026 · 3 Months · 1 person · 4 min read

M.A.L.I.C.E. (Malicious Agentic Linkage & Infrastructure Corruption Engine) is an automated offensive framework for red-teaming multi-agent orchestration frameworks (such as LangChain) and local LLM runtimes (such as Ollama), built to expose critical vulnerabilities in local agentic workflows.

Overview

M.A.L.I.C.E. is an autonomous offensive agent suite that executes zero-touch Indirect Prompt Injection (IPI), orchestrator poisoning, and local infrastructure jailbreaking. The framework systematically maps local AI API boundaries, exploits weak isolation barriers in tool use and function calling, and turns insecure local agent environments into arbitrary code execution vectors.

Problem

Enterprise red teams and security researchers lack automated tools to stress-test the security boundaries specific to modern agentic ecosystems. As enterprises shift toward local LLM runtimes (like Ollama) paired with multi-agent orchestration frameworks, they inherit unvetted attack surfaces: indirect injection through tool outputs, state-machine manipulation, and host infrastructure takeover via untrusted data parsing.

Constraints

  • Must weaponize Indirect Prompt Injections without direct user prompt access
  • Must autonomously breach isolation boundaries between the LLM runtime and host OS
  • Must keep a low noise footprint to evade basic semantic anomaly detection and guardrails
  • Must weaponize the agentic loop itself (using the agent's tools against it)
  • Must use headless, modular payloads that adapt quickly to different infrastructure

Approach

Built a Python-based adversarial orchestration framework that chains four stages: dynamic payload generation, semantic evasion, automated prompt-injection smuggling through multiple data streams (e.g., mock RAG databases, poisoned web scraping targets), and automated log analysis to detect successful system-level exploitation.

Key Decisions

Dynamic Semantic Mutation for Injection Payloads

Reasoning:

Evades static and semantic LLM guardrails (like Llama Guard) by programmatically mutating instructions into abstract metaphors, so that the target local model executes the exploit code without triggering its safety alignment.

Alternatives considered:
  • Static Jailbreak Templates (High failure rate against updated local weights)
  • Gradient-based adversarial suffixes (Too computationally intensive for agile deployments)

Tool-Output Vector for Primary Injection

Reasoning:

Exploits the fundamental trust flaw in agentic architectures: agents implicitly trust data returned by their own tools (e.g., file readers, web scrapers) far more than direct user inputs.
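
The trust flaw described above is easiest to see from the defender's side. The following is a minimal sketch, not part of the framework itself, of how an agent runtime could label tool output as untrusted and refuse privileged tool calls while untrusted text is in context. The names `Trust`, `ContextItem`, `wrap_tool_output`, `may_call`, and the tool names in `PRIVILEGED_TOOLS` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Trust(Enum):
    TRUSTED = "trusted"      # operator-authored text, e.g. the system prompt
    UNTRUSTED = "untrusted"  # anything a tool fetched: files, web pages, RAG hits


@dataclass(frozen=True)
class ContextItem:
    source: str
    text: str
    trust: Trust


# Hypothetical names for tools that can change host state.
PRIVILEGED_TOOLS = {"run_shell", "write_file"}


def wrap_tool_output(tool_name: str, text: str) -> ContextItem:
    """Label everything a tool returns as untrusted data, never as instructions."""
    return ContextItem(source=tool_name, text=text, trust=Trust.UNTRUSTED)


def may_call(tool_name: str, context: list) -> bool:
    """Allow a privileged tool only when no untrusted item is in the context."""
    if tool_name not in PRIVILEGED_TOOLS:
        return True
    return all(item.trust is Trust.TRUSTED for item in context)


context = [
    ContextItem("system", "Only summarise documents.", Trust.TRUSTED),
    wrap_tool_output("web_scraper", "text fetched from an external page"),
]
print(may_call("read_file", context))   # non-privileged tool
print(may_call("run_shell", context))   # privileged tool with untrusted context
```

An agent without this kind of provenance tracking treats both context items identically, which is the implicit trust the tool-output vector relies on.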

Alternatives considered:
  • Direct API Tampering (Requires pre-existing infrastructure access)
  • Model Weights Poisoning (Impractical for attacking active post-deployment infrastructure)

Orchestrator State Hijacking via Context Window Flooding

Reasoning:

Injecting token-dense filler alongside malicious instructions pushes the target agent's system prompt out of its effective context window, leaving its core alignment instructions inert.
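
The mechanism depends on how the orchestrator trims history when the context budget is exceeded. The sketch below is an illustration under stated assumptions, not the behaviour of any specific orchestrator: `count_tokens` is a crude word-count stand-in for a real tokenizer, and `trim_naive` and `trim_pinned` are hypothetical helper names. It contrasts a recency-only trimming policy, which can drop the system prompt, with one that reserves budget for it first.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str      # "system", "user", "assistant", or "tool"
    content: str


def count_tokens(text: str) -> int:
    # Assumption: one token per whitespace-separated word (not a real tokenizer).
    return len(text.split())


def trim_naive(messages: list, budget: int) -> list:
    """Keep the most recent messages that fit; no special case for the system prompt."""
    kept, used = [], 0
    for msg in reversed(messages):
        cost = count_tokens(msg.content)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))


def trim_pinned(messages: list, budget: int) -> list:
    """Reserve budget for system messages first, then fill with recent messages."""
    system = [m for m in messages if m.role == "system"]
    others = [m for m in messages if m.role != "system"]
    remaining = budget - sum(count_tokens(m.content) for m in system)
    return system + trim_naive(others, max(remaining, 0))


history = [
    Message("system", "Only summarise documents."),
    Message("user", "Summarise the report"),
    Message("tool", " ".join(["filler"] * 20)),  # oversized tool output
]
print([m.role for m in trim_naive(history, 24)])
print([m.role for m in trim_pinned(history, 24)])
```

With recency-only trimming, a single oversized tool result is enough to evict the system message, while the pinned variant keeps it and drops older conversational turns instead.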

Alternatives considered:
  • Memory Injection (Highly specific to the underlying vector database architecture)
  • Direct Agent Code Patching (Requires prior OS-level write permissions)

Tech Stack

  • Python
  • Ollama API Exploitation Suite
  • LangChain / AutoGen Orchestrator Targets
  • Asyncio for concurrent multi-agent flood testing
  • Custom Semantic Mutation Engines
  • Docker (for isolated target simulation environments)
  • Configurable via adversarial_manifest.json + CLI arguments

Result & Impact

  • 87% Jailbreak rate on popular 8B local models
    Exploitation Success
  • Zero-touch Remote Code Execution via un-sandboxed tools
    RCE Efficacy
  • 100% bypass of baseline regex-based guardrails
    Evasion

Developed an offensive framework that exposes the structural insecurity of combining local LLM runtimes with raw OS-level tool capabilities. Demonstrated that an autonomous agent, when fed maliciously crafted data through its standard tools, can be manipulated into fully compromising the private server that hosts it.

Learnings

  • Model size selection is critical: smaller models (tiny/base for whisper, 3B–8B for Ollama) provide the best balance between quality and real-time responsiveness on CPU.
  • Careful tuning of VAD aggressiveness and audio device settings dramatically improves real-world reliability across different microphones and environments.
  • Combining streaming responses from Ollama with fast Piper TTS creates a much more natural conversational flow.
  • Keeping the architecture modular makes future transition from local-only to secure remote server deployment much smoother.

Additional Context

This project is the first research milestone for M.A.L.I.C.E. While contemporary AI security work focuses heavily on cloud-hosted models, this research maps the large and mostly unexamined attack surface of local agentic ecosystems and their underlying compute infrastructure.

The framework turns the target agent’s operational pipeline into its own kill chain: Attacker-Poisoned Source Data → Target Agent Web-Scraper/File-Reader (Tool Execution) → Indirect Prompt Injection Payload Assimilation → Local LLM Alignment Bypass (Reasoning Failure) → Malicious Tool Execution → Host OS Arbitrary Code Execution (RCE).

By showing how easily a fully local, modular LLM configuration can be manipulated into executing host-level commands through its own agentic loops, this project lays the groundwork for the next generation of AI-native defense mechanisms. The findings indicate that securing an Autonomous Compute Engine requires moving beyond simple input/output filtering toward zero-trust, strictly sandboxed runtime environments for all agent-accessible tools.
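
As one possible reading of the zero-trust recommendation above, the sketch below validates every command an agent requests against an allowlist before anything is executed. It is an assumption-laden illustration rather than a complete sandbox: `ALLOWED_COMMANDS`, `ALLOWED_ROOT`, `resolve_inside_root`, and `validate_command` are hypothetical names, the check does not resolve symlinks, and a real deployment would still run the validated command inside an isolated container or similar boundary.

```python
import posixpath
import shlex

# Hypothetical policy: permitted programs and the flags each may receive.
ALLOWED_COMMANDS = {"ls": {"-l", "-a"}, "cat": set()}
# Hypothetical directory the agent is confined to.
ALLOWED_ROOT = "/srv/agent-workspace"


def resolve_inside_root(path: str) -> str:
    """Normalise a path argument and reject anything outside ALLOWED_ROOT."""
    candidate = posixpath.normpath(posixpath.join(ALLOWED_ROOT, path))
    if candidate != ALLOWED_ROOT and not candidate.startswith(ALLOWED_ROOT + "/"):
        raise PermissionError(f"path escapes workspace: {path}")
    return candidate


def validate_command(command_line: str) -> list:
    """Return a checked argv list, or raise if the request violates the policy."""
    argv = shlex.split(command_line)
    if not argv:
        raise ValueError("empty command")
    program, *args = argv
    if program not in ALLOWED_COMMANDS:
        raise PermissionError(f"program not allowed: {program}")
    checked = [program]
    for arg in args:
        if arg.startswith("-"):
            if arg not in ALLOWED_COMMANDS[program]:
                raise PermissionError(f"flag not allowed: {arg}")
            checked.append(arg)
        else:
            checked.append(resolve_inside_root(arg))
    return checked


print(validate_command("ls -l reports"))
```

Returning an argv list rather than a string means the caller can execute it without a shell, so shell metacharacters in model-generated input are never interpreted.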

The result is a red teaming asset that changes how security engineers evaluate the risks of deploying autonomous AI agents on private servers and enterprise infrastructure.