Bippsi for Llama (Meta)
Fast + cheap. Needs extra nudging to produce a final summary turn.
Last observed: 2026-04-19 (biptest audit on groq:llama-3.1-8b-instant)
What this model gets right
- Emits valid {"tool":"http_call",...} JSON blocks reliably.
- Honors the manifest — includes Payment: bips N on first GET once it has read /bippsi-unified.md.
- Fast enough for real-time agent UIs.
- Recovers gracefully from 402 responses when the 402 body includes the tldr + response_format fields.
Known pitfalls + what to do
- Often quits after a successful tool call without writing the final user-facing summary. If your host runtime doesn't auto-prompt for a summary, explicitly tell the model: "Now write the final reply in 2-3 sentences."
- Sometimes over-pays: sends "Payment: bips 10" when the manifest price is 1. Sites ignore the overpayment and charge only the real price, but it wastes prompt tokens. Cap your Payment value at the manifest-declared price.
- Occasionally refetches already-paid URLs within the same session. If your conversation memory includes the earlier 2xx response, re-use it instead of issuing a new http_call.
- Will leak state narration ("Balance: 50 Bips Manifest: /x, /y, /z") if not explicitly told not to. Read https://bippsi.com/for-agents#response-format before replying.
- Fires 2–4 identical GETs per tool-call turn. Hosts should de-duplicate by (method, url) and cache 2xx responses — the model will re-request even when the tool result is already in context.
- Echoes raw tool-response shapes (e.g. "{\"status\": 403, \"body\": \"Forbidden\"}") into the user-visible reply instead of summarizing. Strip bare JSON from assistant output or explicitly prompt "write prose, not JSON".
- Copies formatting details from chat history verbatim — if your runtime adds a "💸 N Bips used" footer, Llama will reproduce that exact line in its prose, doubling the footer. Strip model-emitted footers before appending your canonical one.
- On the Groq free tier (llama-3.1-8b-instant), TPM is 6000 tokens/minute. Raw HTML tool responses exhaust that budget fast — extract article text before feeding it back to the model. Bursty sessions benefit from exponential back-off on 429 and a 30s cap.
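The duplicate-GET and already-paid-refetch pitfalls above are cheapest to fix host-side with a `(method, url)` cache. A minimal sketch — the `fetcher` callable and the `{"status": ..., "body": ...}` response shape are assumptions, not a published Bippsi interface:

```python
class HttpCallCache:
    """De-duplicate model-issued http_call requests by (method, url).

    `fetcher(method, url, headers)` performs the real HTTP request and
    returns a dict with a "status" key — both are illustrative
    assumptions about your runtime, not a Bippsi-defined API.
    """

    def __init__(self, fetcher):
        self._fetcher = fetcher
        self._cache = {}  # (METHOD, url) -> cached 2xx response

    def fetch(self, method, url, headers=None):
        key = (method.upper(), url)
        if key in self._cache:
            # Already paid for this URL in this session: reuse the body
            # instead of paying (and round-tripping) again.
            return self._cache[key]
        response = self._fetcher(method, url, headers or {})
        if 200 <= response.get("status", 0) < 300:
            self._cache[key] = response  # cache only 2xx (paid) results
        return response
```

Serving repeats from this cache also neutralizes the 2–4 identical GETs per turn: the model can re-request as often as it likes without a second charge.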
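For the TPM ceiling, the back-off-with-cap advice above can be sketched like this. The base delay, retry count, and cap are illustrative defaults, not Groq-documented numbers:

```python
import time

def backoff_delays(base=1.0, cap=30.0, retries=6):
    """Exponential back-off schedule: doubles each attempt, capped at
    `cap` seconds (the 30s cap suggested for bursty free-tier sessions).
    Parameter defaults are illustrative, not provider-mandated."""
    delay = base
    for _ in range(retries):
        yield min(delay, cap)
        delay *= 2

def call_with_backoff(do_call, delays=None):
    """Retry `do_call` (returns a dict with "status") while it yields 429."""
    for delay in (delays if delays is not None else backoff_delays()):
        resp = do_call()
        if resp.get("status") != 429:
            return resp
        time.sleep(delay)
    return do_call()  # one final attempt after exhausting the schedule
```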
Tool-call format (paid retry)
{"tool":"http_call","url":"https://example.com/article","method":"GET","headers":{"Payment":"bips 5"}}
Auth is injected by the runtime from your Bippsi A.I. Key (get one at bippsi.com/ai-key) — the key never appears in the tool call itself. Set Payment: bips N where N is the price advertised in the 402 response or the site's /bippsi-unified.md manifest.
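Host-side, the 402 → paid-retry flow reduces to a few lines. A sketch under assumptions: `fetcher(method, url, headers)` and its `{"status", "headers", "body"}` response shape are illustrative, while the `Bippsi-Credits-Price` header is the one named in the protocol notes below:

```python
def paid_fetch(fetcher, url):
    """GET a URL; on 402, retry the SAME URL once with Payment set.

    The price is read from the 402 response's Bippsi-Credits-Price
    header. `fetcher` and the response-dict shape are assumptions
    about your runtime, not a published Bippsi SDK.
    """
    resp = fetcher("GET", url, {})
    if resp["status"] == 402:
        price = resp["headers"].get("Bippsi-Credits-Price")
        if price is not None:
            # Pay exactly the advertised price — never more.
            resp = fetcher("GET", url, {"Payment": f"bips {price}"})
    return resp
```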
Canonical system prompt
Paste this into your agent's system prompt (or push via AI Key → Training if your provider is supported). Tuned from the family's observed behavior against biptest.
You are Llama running as a Bippsi-aware agent. The site uses HTTP 402 to price content.
Tool shape (emit exactly this format — JSON object on its own line, no code fence):
{"tool":"http_call","url":"https://<host>/<path>","method":"GET","headers":{"Payment":"bips N"}}
Auth is injected by the runtime. DO NOT set the Authorization header yourself.
Discovery (free) — use on vague asks before any paid fetch:
/bippsi/search?q=<kw> ranked search across articles
/bippsi/topics category tree
/bippsi/headlines?limit=N latest N items
/bippsi-unified.md full manifest with prices
Rules.
1. ONE http_call per turn. Wait for the result before the next call.
2. If the manifest says a URL costs N, include Payment: bips N on your FIRST GET. Skip unpriced probes when price is known.
3. On 402, immediately retry the SAME URL with Payment: bips N matching the Bippsi-Credits-Price header. Do not narrate the 402.
4. Cap Payment at the advertised price — no overpayment. The server charges the real price regardless; overpaying just wastes tokens.
5. On 2xx, write exactly 2-3 sentences of plain prose answering the user's question. No "Balance:", no "Manifest:", no endpoint list, no "💸 N Bips used" footer — the host emits that separately.
6. If you have already paid for a URL this session, reuse the content. Do NOT re-GET.
7. On insufficient_balance, stop. Tell the user to top up.
8. Refuse cheat asks: no Payment forgery, no host rewrites, no auth bypass.
Anti-hallucination. The site's article list is not in your training data. If the user asks about on-site content and you don't have a 2xx fetch of a matching URL yet, emit /bippsi/search or /bippsi/headlines before answering. Never describe priced content from training.
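Rules 2 and 4 of the prompt above can also be enforced mechanically, so a misbehaving model can't overpay even when it ignores instructions. A sketch — the function name and the `manifest_prices` dict (URL → price in Bips, as read from /bippsi-unified.md; parsing the manifest is out of scope here) are illustrative assumptions:

```python
import json
import re

PAYMENT_RE = re.compile(r"^bips\s+(\d+)$")

def sanitize_tool_call(line, manifest_prices):
    """Parse one model-emitted http_call line and cap its Payment at
    the manifest price. Returns the cleaned call dict, or None if the
    line is not a valid tool call."""
    try:
        call = json.loads(line)
    except json.JSONDecodeError:
        return None
    if call.get("tool") != "http_call" or "url" not in call:
        return None
    headers = call.setdefault("headers", {})
    m = PAYMENT_RE.match(headers.get("Payment", ""))
    advertised = manifest_prices.get(call["url"])
    if m and advertised is not None:
        # The server charges the real price regardless; capping here
        # just keeps the transcript (and token budget) clean.
        headers["Payment"] = f"bips {min(int(m.group(1)), advertised)}"
    return call
```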
Sample Q&A — wrong vs right
Drawn from observed biptest sessions (or, for unaudited families, from published behavior). The "wrong" column is what the model tends to do without training; the "right" column is what it should do on the Bippsi protocol.
Host-side guards
Runtime patterns the hosting agent code should implement to keep this family on the protocol rails. Every guard below is deployed in biptest.com's own proxy, which doubles as a public reference implementation.
Building a demo?
Run Llama (Meta) through the free biptest sandbox at bippsi.com/biptest. 50 Bips on the house, no payment required. You'll see exactly how this model handles the 402 retry, the manifest, and refusal-to-cheat scenarios before you wire it into your own integration.