AI Agent Security Lab
A security research lab for testing AI agent behavior: delegation, network containment, and prompt injection. A Mac Mini M4 on a VLAN-isolated network runs OpenClaw as a multi-agent system accessible through Discord, while a Raspberry Pi 4 router controls what it can and can't reach. Applying Rapid Software Testing (RST) to AI agents is still largely unexplored territory.
Questions I'm Investigating
- 01. How do AI agents behave when delegating tasks across a hierarchy, and where does that break down?
- 02. What happens when you apply prompt injection techniques to agents with real tool access?
- 03. Can network-level containment actually constrain an agent that controls its own filesystem and terminal?
Tools
RST Lens
RST applied to AI agents: each experiment is a charter, and each agent response is evaluated against oracles for safety, accuracy, and containment. The agents have real tool access (filesystem, terminal, web browsing, financial data), which makes the testing consequential rather than theoretical. The goal is a set of documented findings and a testing methodology for AI agent deployments.
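The charter-and-oracle structure can be pictured with a minimal Python sketch. Everything in it is hypothetical and invented for illustration: the `AgentResponse`, `Oracle`, and `Charter` types, the `evaluate` helper, the three sample oracles, the sample response, and the `192.168.1.` LAN prefix. It is not the lab's actual harness, and real oracles would need far more nuance than these string checks.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class AgentResponse:
    """Hypothetical record of what an agent said and did in one experiment."""
    text: str
    commands_run: List[str]
    hosts_contacted: List[str]


@dataclass
class Oracle:
    """A named heuristic that returns True when it sees no problem."""
    name: str
    check: Callable[[AgentResponse], bool]


@dataclass
class Charter:
    """One experiment: a mission statement plus the oracles to judge it by."""
    mission: str
    oracles: List[Oracle]


def evaluate(charter: Charter, response: AgentResponse) -> Dict[str, bool]:
    """Run every oracle in the charter against a single agent response."""
    return {oracle.name: oracle.check(response) for oracle in charter.oracles}


# Illustrative oracles only (assumptions, not the lab's real checks).
def no_destructive_commands(r: AgentResponse) -> bool:
    return not any(c.strip().startswith("rm -rf") for c in r.commands_run)


def mentions_expected_ticker(r: AgentResponse) -> bool:
    return "AAPL" in r.text


def stays_off_lan(r: AgentResponse) -> bool:
    # Assumed LAN prefix; the real addressing is not given in the text.
    return not any(h.startswith("192.168.1.") for h in r.hosts_contacted)


charter = Charter(
    mission="Probe how a delegated stock-lookup task is carried out",
    oracles=[
        Oracle("safety", no_destructive_commands),
        Oracle("accuracy", mentions_expected_ticker),
        Oracle("containment", stays_off_lan),
    ],
)

# Made-up sample response: the agent answered, but touched a LAN address.
response = AgentResponse(
    text="Summary for ticker AAPL.",
    commands_run=["ls ~/reports"],
    hosts_contacted=["192.168.1.10"],
)

results = evaluate(charter, response)
```

Keeping each oracle as a separate named check means a single response can pass on safety and accuracy while still failing on containment, which is the kind of split result a charter is meant to surface.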
Log
Network containment layer complete. Built a Raspberry Pi 4 router running OpenWrt 24.10.4 with VLAN isolation through a Netgear GS308Ev4 managed switch. Three zones: WAN (VLAN 10, internet via ONT), LAN (VLAN 1, main network + TDS WiFi mesh), and Isolated (VLAN 20, Mac Mini — internet only, no LAN access). Firewall rules, DHCP pools, DNS logging, and SSH lockdown all configured.
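As a rough model of the zone policy described in this entry, the sketch below encodes "Isolated may reach the internet but not the LAN" as a small Python lookup. The subnets, the sample addresses, and the names `zone_of` and `forward_allowed` are all assumptions made for illustration; the log does not state the real addressing, and it does not say whether LAN may reach Isolated, so that direction is treated as blocked here. This models intent only and is not the OpenWrt firewall configuration itself.

```python
import ipaddress

# Assumed subnets (hypothetical): the log does not give the real ranges.
LAN_NET = ipaddress.ip_network("192.168.1.0/24")        # VLAN 1, main network
ISOLATED_NET = ipaddress.ip_network("192.168.20.0/24")  # VLAN 20, Mac Mini

# Inter-zone forwards permitted by the policy as described:
# both internal zones may reach the internet, nothing else crosses zones.
ALLOWED_FORWARDS = {("lan", "wan"), ("isolated", "wan")}


def zone_of(ip: str) -> str:
    """Map an address to a zone name; anything not internal counts as WAN."""
    addr = ipaddress.ip_address(ip)
    if addr in LAN_NET:
        return "lan"
    if addr in ISOLATED_NET:
        return "isolated"
    return "wan"


def forward_allowed(src_ip: str, dst_ip: str) -> bool:
    """Return True if the modeled policy lets src reach dst."""
    src, dst = zone_of(src_ip), zone_of(dst_ip)
    if src == dst:
        # Same-zone traffic never crosses a zone boundary on the router.
        return True
    return (src, dst) in ALLOWED_FORWARDS


# Hypothetical addresses for the Mac Mini and a LAN host.
mac_mini = "192.168.20.10"
lan_host = "192.168.1.5"

mini_to_internet = forward_allowed(mac_mini, "1.1.1.1")
mini_to_lan = forward_allowed(mac_mini, lan_host)
```

A table like this can double as the expected-results side of a containment test: probe each source and destination pair from the real hosts and compare what actually connects against what the model says should.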
First real debugging session: the TDS ONT refused to issue a DHCP lease to the Pi. After extensive investigation, I discovered the ONT was MAC-locked to the previous gateway, so I cloned the gateway's MAC onto the Pi's WAN device. The ONT also required a power-off of more than 10 minutes to clear the binding. Classic oracle mismatch: the system behaved correctly by its own rules, just not by ours.