API reference
Chat completions
Create a model response for a conversation. The request and response formats follow OpenAI's Chat Completions API; the supported fields are listed under Request body.
Endpoint
POST /api/v1/chat/completions
Authenticate with a fw_live_ bearer key (see Authentication), and send Content-Type: application/json.
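As a rough illustration, the endpoint and headers described here could be set up in Python with only the standard library. The host is an assumption taken from the curl example on this page, build_request is a hypothetical helper name (not part of any SDK), and the key shown is a placeholder. The request is only constructed; passing it to urllib.request.urlopen would actually send it.

```python
import json
import urllib.request

# Assumption: host taken from the curl example on this page.
BASE_URL = "https://gyld.dev"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST to the chat completions endpoint."""
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # fw_live_ bearer key
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_request(
    "fw_live_example",  # placeholder, not a real key
    {"model": "fitness", "messages": [{"role": "user", "content": "Hi"}]},
)
print(req.get_method(), req.full_url)
```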
Request body
- model (string, required): The niche to answer as: the bare slug (fitness) or the served id (flywheel-fitness; an optional :tag is ignored). Browse every slug in the catalog. An unknown model returns 404.
- messages (array, required): The conversation so far, as OpenAI message objects: { "role", "content" }, where role is system, user, or assistant. Send the full history on each call, because the API is stateless.
- max_tokens (integer, optional): Maximum tokens to generate. Defaults to 512 and is clamped to a server ceiling of 1024. Generation also stops at the model’s natural end of turn.
- temperature (number, optional): Sampling temperature. Lower is more deterministic, higher is more varied. Omit for the model’s tuned default.
- stream (boolean, optional): Accepted for OpenAI-SDK compatibility. The hosted API currently returns a single complete response; incremental token streaming is on the roadmap (see Streaming).
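The field rules above can be mirrored client-side before a request is sent. The sketch below is illustrative only: build_payload is a hypothetical helper, and its clamp simply reproduces the documented default (512) and ceiling (1024). The server is documented to apply that clamp itself, so this is a convenience rather than a requirement.

```python
from typing import Optional

DEFAULT_MAX_TOKENS = 512    # documented default
MAX_TOKENS_CEILING = 1024   # documented server ceiling
ALLOWED_ROLES = {"system", "user", "assistant"}


def build_payload(model: str, messages: list,
                  max_tokens: Optional[int] = None,
                  temperature: Optional[float] = None) -> dict:
    """Assemble a request body following the documented field rules."""
    for msg in messages:
        if msg.get("role") not in ALLOWED_ROLES:
            raise ValueError(f"unsupported role: {msg.get('role')!r}")
    if max_tokens is None:
        max_tokens = DEFAULT_MAX_TOKENS
    payload = {
        "model": model,
        "messages": messages,  # full history: the API is stateless
        "max_tokens": min(max_tokens, MAX_TOKENS_CEILING),
    }
    if temperature is not None:
        # Left out entirely when unset, so the model's tuned default applies.
        payload["temperature"] = temperature
    return payload


history = [
    {"role": "system", "content": "You are a gym front desk."},
    {"role": "user", "content": "Beginner full-body workout?"},
]
payload = build_payload("fitness", history, max_tokens=4000)
print(payload["max_tokens"])  # 1024
```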
Note. Unsupported OpenAI fields (e.g. tools, logprobs, n) are ignored rather than rejected, so existing OpenAI payloads keep working. See OpenAI compatibility for the full matrix.
Example
shell
curl https://gyld.dev/api/v1/chat/completions \
  -H "Authorization: Bearer $FLYWHEEL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "fitness",
    "messages": [
      { "role": "system", "content": "You are a gym front desk." },
      { "role": "user", "content": "Beginner full-body workout?" }
    ],
    "max_tokens": 512,
    "temperature": 0.7
  }'
The response
A standard OpenAI chat.completion object. The assistant’s reply is at choices[0].message.content, and usage reports token counts for billing.
json
{
  "id": "chatcmpl-…",
  "object": "chat.completion",
  "created": 1735689600,
  "model": "fitness",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "A solid beginner full-body plan is …" },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 28, "completion_tokens": 96, "total_tokens": 124 }
}
- choices[].message.content (string, required): The model’s reply text.
- choices[].finish_reason (string, required): Why generation stopped: stop (natural end) or length (hit max_tokens).
- usage (object, required): prompt_tokens, completion_tokens, and total_tokens, which are what the request counts against your plan.
Limits
- Up to 200 messages per request.
- 24,000 characters per message, 60,000 characters across all messages.
- 4 concurrent requests per key (one model loads at a time on a worker).
- Each upstream attempt is capped at 60 seconds.
Exceeding a content cap returns 400; too many in-flight requests return 429. See Errors & rate limits for the full status-code reference.
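A client can check the content caps locally and avoid a round trip that would end in a 400. The following is a minimal sketch under stated assumptions: check_limits is a hypothetical helper, and it counts characters with Python's len (Unicode code points), which may not match exactly how the server counts.

```python
MAX_MESSAGES = 200
MAX_CHARS_PER_MESSAGE = 24_000
MAX_CHARS_TOTAL = 60_000


def check_limits(messages: list) -> list:
    """Return human-readable limit violations (an empty list if none)."""
    problems = []
    if len(messages) > MAX_MESSAGES:
        problems.append(
            f"{len(messages)} messages exceeds the cap of {MAX_MESSAGES}"
        )
    total = 0
    for i, msg in enumerate(messages):
        # Assumption: length is measured in code points, as len() does.
        n = len(msg.get("content", ""))
        total += n
        if n > MAX_CHARS_PER_MESSAGE:
            problems.append(
                f"message {i} has {n} characters, cap is {MAX_CHARS_PER_MESSAGE}"
            )
    if total > MAX_CHARS_TOTAL:
        problems.append(
            f"{total} characters in total exceeds the cap of {MAX_CHARS_TOTAL}"
        )
    return problems


violations = check_limits([{"role": "user", "content": "x" * 25_000}])
for line in violations:
    print(line)
```

The concurrency cap is not covered here; one option, again only a suggestion, is to guard calls with a threading.BoundedSemaphore(4) per key.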