HoneyPrompt¶
GitHub Repo | Updated: May 2026
A terminal-first security tool that generates honeypot web pages with hidden prompt-injection canaries. When an AI browsing agent visits and follows the injected instructions, it makes an HTTP callback proving exactly which level of compliance it demonstrated — giving security teams measurable, auditable evidence of agentic behavior in the wild.
Tech Stack: - Rust - SQLite - Docker Compose - Caddy (for TLS in production deployments)
Why I Built This¶
AI browsing agents are increasingly deployed against real web infrastructure, and the security community has limited tooling to measure how compliant they are with prompt-injection instructions from untrusted content. Without evidence, threat modeling is guesswork. Product security teams have no way to mature their AI risk registers or demonstrate concrete risk reduction to stakeholders.
HoneyPrompt addresses that gap. The goal was a tool that security researchers and platform defenders could deploy to gather verifiable, graduated evidence of agentic web abuse — without collecting secrets, credentials, or user data. The design is deliberately conservative: payloads carry only a nonce, prompt ID, and tier level. No API keys, no session tokens, no file contents.
The framing that clarified the design was "paved road controls" — the idea that security tooling should help teams ship AI features faster by reducing risk through measurement, not by blocking. A passing test-agent scorecard is a green light for CI pipelines. A disclosure report from real captured callbacks is evidence for coordinated vulnerability disclosure and better threat models.
How It Works¶
HoneyPrompt generates static honeypot web pages with visible human warnings and hidden prompt-injection payloads embedded across multiple locations: HTML comments, <meta> tags, invisible <span> elements, JSON-LD structured data, and natural-language prose. Each payload instance gets a unique cryptographic nonce.
Detection Flow:
AI Agent visits page → follows injected instructions
↓
HTTP callback to honeyprompt server
↓
Tier identified (1–5) → nonce verified → event stored in SQLite
↓
TUI monitor / disclosure report
The five-tier graduated evidence model measures increasing levels of agent capability. Tier 1 is an arbitrary callback — the agent executed an injected outbound request. Tier 2 is a conditional callback — the agent evaluated a condition and selected the correct branch. Tier 3 is a computed callback — the agent performed a deterministic calculation and returned the result. Tier 4 is capability introspection — the agent returned a sorted base64-encoded list of its own tool names. Tier 5 is a multi-step compliance chain — the agent extracted a page-visible seed, applied a formula, and submitted a 3-digit proof the server independently re-verifies.
The honeyprompt test-agent subcommand runs a self-contained compliance test suitable for CI pipelines. It spins up a honeypot, waits for callbacks, and outputs a pass/fail scorecard with exit code 0 (no canaries triggered) or 1 (canaries triggered). The TUI monitor displays live incoming callbacks with agent fingerprinting, tier breakdown, and session grouping. A report subcommand generates a Markdown disclosure report from captured events.
What I Learned¶
Building the five-tier proof model required thinking carefully about what an agent can do without touching secrets. The constraint — no API keys, no session state, no file contents at any tier — forced creative design. Tier 4's capability introspection and Tier 5's arithmetic proof demonstrate meaningful compliance levels while keeping the safety guarantee intact regardless of who deploys the tool.
The robots.txt and ai.txt disallow rules turned out to be as informative as the callback tiers themselves. Compliant crawlers respect them; non-compliant agents do not. That signal is cheap to generate and provides a separate data point for classifying agent behavior.
Shipping as a single Rust binary with no runtime dependencies meant the deployment story could be simple enough for rapid adoption. The zero-config --domain flag — which generates a fresh site in a tempdir and starts serving immediately — removed the biggest friction point for first-time operators. The full Docker Compose + Caddy stack with auto-TLS is available for production deployments, but the single binary is sufficient for most research use cases.
Links¶
Measure what your agents actually do when no one's watching.