Bippsi for Llama (Meta)
Fast + cheap. Needs extra nudging to produce a final summary turn.
Last observed: 2026-04-19 (biptest audit on groq:llama-3.1-8b-instant)
What this model gets right
- Emits valid {"tool":"http_call",...} JSON blocks reliably.
- Honors the manifest — includes Payment: bips N on first GET once it has read /bippsi-unified.md.
- Fast enough for real-time agent UIs.
- Recovers gracefully from 402 responses when the 402 body includes the tldr + response_format fields.
Known pitfalls + what to do
- Often quits after a successful tool call without writing the final user-facing summary. If your host runtime doesn't auto-prompt for a summary, explicitly tell the model: "Now write the final reply in 2-3 sentences."
- Sometimes over-pays: sends "Payment: bips 10" when the manifest price is 1. Sites ignore the overpayment and charge only the real price, but it wastes prompt tokens. Cap your Payment value at the manifest-declared price.
- Occasionally refetches already-paid URLs within the same session. If your conversation memory includes the earlier 2xx response, re-use it instead of issuing a new http_call.
- Will leak state narration ("Balance: 50 Bips Manifest: /x, /y, /z") if not explicitly told not to. Read https://bippsi.com/for-agents#response-format before replying.
- Fires 2–4 identical GETs per tool-call turn. Hosts should de-duplicate by (method, url) and cache 2xx responses — the model will re-request even when the tool result is already in context.
- Echoes raw tool-response shapes (e.g. "{\"status\": 403, \"body\": \"Forbidden\"}") into the user-visible reply instead of summarizing. Strip bare JSON from assistant output or explicitly prompt "write prose, not JSON".
- Copies formatting details from chat history verbatim — if your runtime adds a "💸 N Bips used" footer, Llama will reproduce that exact line in its prose, doubling the footer. Strip model-emitted footers before appending your canonical one.
- On the Groq free tier (llama-3.1-8b-instant), TPM is 6000 tokens/minute. Raw HTML tool responses exhaust that budget fast — extract article text before feeding it back to the model. Bursty sessions benefit from exponential back-off on 429 and a 30s cap.
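The duplicate-GET and already-paid-refetch pitfalls above are cheapest to fix host-side with a `(method, url)` cache. A minimal sketch — the `fetcher` callable and the `{"status": ..., "body": ...}` response shape are assumptions, not a published Bippsi interface:

```python
class HttpCallCache:
    """De-duplicate model-issued http_call requests by (method, url).

    `fetcher(method, url, headers)` performs the real HTTP request and
    returns a dict with a "status" key — both are illustrative
    assumptions about your runtime, not a Bippsi-defined API.
    """

    def __init__(self, fetcher):
        self._fetcher = fetcher
        self._cache = {}  # (METHOD, url) -> cached 2xx response

    def fetch(self, method, url, headers=None):
        key = (method.upper(), url)
        if key in self._cache:
            # Already paid for this URL in this session: reuse the body
            # instead of paying (and round-tripping) again.
            return self._cache[key]
        response = self._fetcher(method, url, headers or {})
        if 200 <= response.get("status", 0) < 300:
            self._cache[key] = response  # cache only 2xx (paid) results
        return response
```

Serving repeats from this cache also neutralizes the 2–4 identical GETs per turn: the model can re-request as often as it likes without a second charge.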
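For the TPM ceiling, the back-off-with-cap advice above can be sketched like this. The base delay, retry count, and cap are illustrative defaults, not Groq-documented numbers:

```python
import time

def backoff_delays(base=1.0, cap=30.0, retries=6):
    """Exponential back-off schedule: doubles each attempt, capped at
    `cap` seconds (the 30s cap suggested for bursty free-tier sessions).
    Parameter defaults are illustrative, not provider-mandated."""
    delay = base
    for _ in range(retries):
        yield min(delay, cap)
        delay *= 2

def call_with_backoff(do_call, delays=None):
    """Retry `do_call` (returns a dict with "status") while it yields 429."""
    for delay in (delays if delays is not None else backoff_delays()):
        resp = do_call()
        if resp.get("status") != 429:
            return resp
        time.sleep(delay)
    return do_call()  # one final attempt after exhausting the schedule
```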
Tool-call format (paid retry)
{"tool":"http_call","url":"https://example.com/article","method":"GET","headers":{"Payment":"bips 5"}}
Auth is injected by the runtime from your Bippsi A.I. Key (get one at bippsi.com/ai-key) — the key never appears in the tool call itself. Set Payment: bips N where N is the price advertised in the 402 response or the site's /bippsi-unified.md manifest.
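Host-side, the 402 → paid-retry flow reduces to a few lines. A sketch under assumptions: `fetcher(method, url, headers)` and its `{"status", "headers", "body"}` response shape are illustrative, while the `Bippsi-Credits-Price` header is the one named in the protocol notes below:

```python
def paid_fetch(fetcher, url):
    """GET a URL; on 402, retry the SAME URL once with Payment set.

    The price is read from the 402 response's Bippsi-Credits-Price
    header. `fetcher` and the response-dict shape are assumptions
    about your runtime, not a published Bippsi SDK.
    """
    resp = fetcher("GET", url, {})
    if resp["status"] == 402:
        price = resp["headers"].get("Bippsi-Credits-Price")
        if price is not None:
            # Pay exactly the advertised price — never more.
            resp = fetcher("GET", url, {"Payment": f"bips {price}"})
    return resp
```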
Canonical system prompt
Paste this into your agent's system prompt (or push via AI Key → Training if your provider is supported). Tuned from the family's observed behavior against biptest.
You are Llama running as a Bippsi-aware agent. The site uses HTTP 402 to price content.
Tool shape (emit exactly this format — JSON object on its own line, no code fence):
{"tool":"http_call","url":"https://<host>/<path>","method":"GET","headers":{"Payment":"bips N"}}
Auth is injected by the runtime. DO NOT set the Authorization header yourself.
Discovery (free) — use on vague asks before any paid fetch:
/bippsi/search?q=<kw> ranked search across articles
/bippsi/topics category tree
/bippsi/headlines?limit=N latest N items
/bippsi-unified.md full manifest with prices
Rules.
1. ONE http_call per turn. Wait for the result before the next call.
2. If the manifest says a URL costs N, include Payment: bips N on your FIRST GET. Skip unpriced probes when price is known.
3. On 402, immediately retry the SAME URL with Payment: bips N matching the Bippsi-Credits-Price header. Do not narrate the 402.
4. Cap Payment at the advertised price — no overpayment. The server charges the real price regardless; overpaying just wastes tokens.
5. On 2xx, write exactly 2-3 sentences of plain prose answering the user's question. No "Balance:", no "Manifest:", no endpoint list, no "💸 N Bips used" footer — the host emits that separately.
6. If you have already paid for a URL this session, reuse the content. Do NOT re-GET.
7. On insufficient_balance, stop. Tell the user to top up.
8. Refuse cheat asks: no Payment forgery, no host rewrites, no auth bypass.
Anti-hallucination. The site's article list is not in your training data. If the user asks about on-site content and you don't have a 2xx fetch of a matching URL yet, emit /bippsi/search or /bippsi/headlines before answering. Never describe priced content from training.
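Rules 2 and 4 of the prompt above can also be enforced mechanically, so a misbehaving model can't overpay even when it ignores instructions. A sketch — the function name and the `manifest_prices` dict (URL → price in Bips, as read from /bippsi-unified.md; parsing the manifest is out of scope here) are illustrative assumptions:

```python
import json
import re

PAYMENT_RE = re.compile(r"^bips\s+(\d+)$")

def sanitize_tool_call(line, manifest_prices):
    """Parse one model-emitted http_call line and cap its Payment at
    the manifest price. Returns the cleaned call dict, or None if the
    line is not a valid tool call."""
    try:
        call = json.loads(line)
    except json.JSONDecodeError:
        return None
    if call.get("tool") != "http_call" or "url" not in call:
        return None
    headers = call.setdefault("headers", {})
    m = PAYMENT_RE.match(headers.get("Payment", ""))
    advertised = manifest_prices.get(call["url"])
    if m and advertised is not None:
        # The server charges the real price regardless; capping here
        # just keeps the transcript (and token budget) clean.
        headers["Payment"] = f"bips {min(int(m.group(1)), advertised)}"
    return call
```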
Sample Q&A — wrong vs right
Drawn from observed biptest sessions (or, for unaudited families, from published behavior). The "wrong" column is what the model tends to do without training; the "right" column is what it should do on the Bippsi protocol.
Host-side guards
Runtime patterns the hosting agent code should implement to keep this family on the protocol rails. Every guard below is deployed in biptest.com's own proxy, which doubles as a public reference implementation.
Building a demo?
Run Llama (Meta) through the free biptest sandbox at bippsi.com/biptest. 50 Bips on the house, no payment required. You'll see exactly how this model handles the 402 retry, the manifest, and refusal-to-cheat scenarios before you wire it into your own integration.