Full project description
Sentinel is a security scanner for AI agents on Bittensor. It inspects every piece of data an agent consumes -- API responses, search results, chain queries, LLM outputs - for prompt injection attacks: hidden instructions designed to hijack agent behavior. Three detection layers run on eery scan, never short-circuiting: Layer 1 (Known Patterns): 200+ signatures matched against normalized input. Nomrmalization strips zero-width unicode, replaces homoglyphs (Cyrillic "a" -> Latin "a"), collapses whitespace. Catches "ignore all instructions" and its many variants. Layer 2 (Structural Signatures): 16+ rules detecting injection structure rather than specific phrases - instruction boundary markers, role assumption, privilege escalation, encoded payloads (base64, ROT13, URL encoding) character splitting, and JSON value inspection. Catches novel attacks that reuse known structural patterns. Layer 3 (Anomaly Baselines): Compares incoming data against registered source profiles. Flags length anomalies, type mismatches (a number failed suddenly containing a string), unexpected fields. Catches attacks that alter data structure even if the injected text is completely novel. Scores are weighted with a corroboration bonus when multiple layers agree. Results map to four threat levels: clean, suspicious, quarantine, reject. An oracle Resistance layer prevents attackers from reverse engineering detection logic by quantizing scores to 5 bands, normalizing response timing, and progressively suppressing detail at higher threat levels. Sentinel operates as an MCP server. Any MCP-compatible agent adds it to their tool config, calls sentinel_scan before processing external content, and gets a structured threat assessment. Monitor mode is the default - it flags but never blocks, letting operators tune sensitivity before enabling enforcement.
Why it works on Bittensor
Sentinel is an immune system for AI agents on Bittensor. It scans every piece of incoming data -- API responses, search results, chain queries -- for hidden instructions designed to mainuplate agent behavior. Three detection layers run in parallel: known attack patterns (200+ currently), structural analysis (encoding tricks, role manipulation, privilege escalation), and anomaly detection (flags when data deviates from registered baselines). Bittensor is the ideal environment because miners are financially incentivized to manipulate validators - injection is a rational economic strategy, not a theoretical threat. The endgame is a Bittensor subnet where miners are paid to discover new attacks and validators curate them into detection patterns, creating a self-improving security network.
