Watch an AI agent pay for content
Pick an AI. Get a sandbox key. Watch it crawl biptest.com, hit real 402 paywalls, pay Bips, and read what's priced. Test the system for free — no account needed.
Configure your agent
Our sandbox is your sandbox. This is the same environment we use to benchmark every model, training configuration, and protocol change we ship, a true 1:1 match with our internal testing. What you see here is what we see: no hidden backend systems, no separate "real" test harness. Every result, every edge case, every improvement lands in the same place for both of us.
Tell the agent what to do
Tell us what went sideways
Submissions include a snapshot of your chat + terminal so we can reproduce the issue. Public sandbox — submissions may be reviewed by our team and used to improve our training and models. Don't paste anything private.
Premade prompts to try
Or just chat with the agent above.
- Popular. The fastest way to see the system in action.
- Page tests. Priced article reads across every content category.
- Form tests. Priced form submissions: the agent posts fields, and the server charges and returns a response.
- Button tests. Priced button clicks; each press is a separate charge.
- Content block tests. Priced in-page sections, unlocked via a GET with a Payment header.
- Download tests. Priced file downloads: PDFs and reports.
- Regression / negative tests. Targeted probes for infrastructure bugs we've fixed and want to keep fixed.
- Markets Intelligence Hub. Exercises all 5 element types on one page in sequence.
- Site Pass tests. Click 1 immediately. Wait ~60s, then click 2. Wait until ~150s after click 1, then click 3.
- Out-of-Bips demo. Premium report is priced at 2000 Bips, above the sandbox cap, which triggers the upsell flow.
- Try to cheat. Sandbox enforcement: off-site requests are blocked and logged.
- General / open-ended. Ambiguous human prompts. Loose scoring: any paid fetch plus a substantive reply.

Like what you saw?
Put a paywall on your own site, or build an agent that pays.
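For readers building an agent that pays, the 402 flow described above (request, receive a 402, retry the GET with a Payment header) might be sketched as follows. Only the 402 status and the `Payment` header name come from this page. The `X-Price-Bips` response header, the header value format, the cap check, the example URL path, and every function name here are assumptions made for illustration, not the actual protocol. The fetch function is injected so the sketch runs without a network.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Response:
    status: int
    body: str = ""
    headers: Dict[str, str] = field(default_factory=dict)


# A fetch function takes (url, request_headers) and returns a Response.
Fetch = Callable[[str, Dict[str, str]], Response]


class PriceAboveCap(Exception):
    """Raised when the quoted price exceeds what the agent may spend."""


def fetch_paid(url: str, fetch: Fetch, sandbox_key: str, cap_bips: int) -> Response:
    """Fetch a URL, paying once if the server answers 402."""
    first = fetch(url, {})
    if first.status != 402:
        return first  # free content or an unrelated error: nothing to pay

    # ASSUMPTION: the price is quoted in a response header with this name.
    price = int(first.headers["X-Price-Bips"])
    if price > cap_bips:
        raise PriceAboveCap(f"{url} costs {price} Bips, cap is {cap_bips}")

    # ASSUMPTION: this value format is invented; only the header name
    # "Payment" is taken from the page.
    payment = f"key={sandbox_key}; amount={price}"
    return fetch(url, {"Payment": payment})


def fake_server(url: str, headers: Dict[str, str]) -> Response:
    """Stand-in for a paywalled page: 402 until a Payment header arrives."""
    if "Payment" in headers:
        return Response(200, "unlocked section")
    return Response(402, headers={"X-Price-Bips": "5"})


result = fetch_paid("https://biptest.com/example", fake_server, "sandbox-key", cap_bips=100)
print(result.body)  # prints "unlocked section"
```

Checking the price against a cap before paying mirrors the Out-of-Bips case, where a price above the sandbox cap should stop the purchase rather than complete it.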
Our benchmark results
Same prompt set you see above, same live environment, run against every model at all three training levels. These are our numbers — you're seeing the same data we use to decide which models to recommend.
| Model | Trained (Unsigned) | Trained (Signed) | Untrained |
|---|---|---|---|
| Qwen3 122B (Alibaba) | not yet run | not yet run | not yet run |
| Nemotron Nano 30B A3B (NVIDIA) | not yet run | not yet run | not yet run |
| GPT-OSS 120B (OpenAI) | not yet run | not yet run | not yet run |
| Llama 4 Scout 17Bx16E (Meta) | not yet run | not yet run | not yet run |
| Claude 3.5 Haiku (Anthropic) | 11 / 19 pass (4 confused · 4 error) | not yet run | not yet run |
| DeepSeek V4 Flash (DeepSeek) | not yet run | not yet run | not yet run |
| Gemma 4 26B A4B (Google) | not yet run | not yet run | not yet run |
| Llama 3.3 70B (Meta) | not yet run | not yet run | not yet run |

Methodology. Each cell is the pass/fail split for that model running the premade prompts under that training level. A pass means the agent reached a successful paid response (2xx with a non-empty answer) within the session's turn budget. Confused = reached turn cap, gave a memory-only reply without fetching, or partially completed the request. Error = hard failure (provider timeout, sandbox block, model refused, or task abandoned mid-flow). Skipped cells are included intentionally so the benchmark table shows model coverage without fabricating scores.
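The scoring rules in the methodology could be expressed roughly as below. This is a minimal sketch, not the benchmark's actual harness: the `Run` record, its field names, and the choice to count a non-2xx paid response as "confused" are assumptions, and the sample runs are invented purely to show the output format.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Run:
    """One prompt's outcome (hypothetical record, not the real log format)."""
    paid_status: Optional[int] = None  # final paid response status; None if nothing was fetched
    answer: str = ""
    hit_turn_cap: bool = False
    partial: bool = False
    hard_failure: bool = False  # provider timeout, sandbox block, refusal, abandoned task


def classify(run: Run) -> str:
    if run.hard_failure:
        return "error"
    # Turn cap, partial completion, or a memory-only reply with no fetch.
    if run.hit_turn_cap or run.partial or run.paid_status is None:
        return "confused"
    # Pass: a 2xx paid response with a non-empty answer.
    if 200 <= run.paid_status < 300 and run.answer.strip():
        return "pass"
    # ASSUMPTION: the methodology does not cover this case; treated as confused.
    return "confused"


def summarize(runs: Iterable[Run]) -> str:
    counts = Counter(classify(r) for r in runs)
    total = sum(counts.values())
    return f"{counts['pass']} / {total} pass, {counts['confused']} confused, {counts['error']} error"


# Invented sample data, for illustration only.
sample = [
    Run(paid_status=200, answer="Article summary"),
    Run(paid_status=201, answer="Form accepted"),
    Run(answer="Answered from memory"),  # no fetch happened
    Run(hard_failure=True),
]
print(summarize(sample))  # prints "2 / 4 pass, 1 confused, 1 error"
```

Checking hard failures first keeps "error" distinct from "confused", so a run that both timed out and hit the turn cap is counted once.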
Public sandbox. This page is a live demo environment. Sessions, chat transcripts, and any feedback you submit via the "Report failed prompt" button may be reviewed by the Bippsi team and used to improve our training data, our models, and the protocol itself. Don't paste anything private. Your sandbox key is ephemeral and never authorizes real-money transactions.