API reference

Chat completions

Create a model response for a conversation. Requests and responses follow the shape of OpenAI's Chat Completions API; unsupported OpenAI fields are ignored rather than rejected.

Endpoint

POST /api/v1/chat/completions

Authenticate with a fw_live_ bearer key (see Authentication), and send Content-Type: application/json.

Request body

model (string, required)

The niche to answer as: either the bare slug (fitness) or the served id (flywheel-fitness; an optional :tag suffix is ignored). Browse every slug in the catalog. An unknown model returns 404.
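To make the accepted spellings concrete, here is a minimal client-side sketch of how the three forms relate. The function name normalize_model is hypothetical, and the flywheel- prefix is inferred from the single example above; the server's actual matching logic is not documented here.

```python
def normalize_model(model: str, prefix: str = "flywheel-") -> str:
    """Reduce a model string to its bare slug (illustrative only)."""
    # Drop an optional ":tag" suffix, which the API is documented to ignore.
    base = model.split(":", 1)[0]
    # Strip the served-id prefix if present. ASSUMPTION: the served id is
    # always "flywheel-" + slug, inferred from the one example in the docs.
    if base.startswith(prefix):
        base = base[len(prefix):]
    return base


# All three spellings refer to the same niche under these assumptions.
slugs = [normalize_model(m) for m in ("fitness", "flywheel-fitness", "flywheel-fitness:latest")]
```

You do not need to normalize before sending; the sketch only shows which strings should be treated as equivalent.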

messages (array, required)

The conversation so far, as OpenAI message objects: { "role", "content" }, where role is system, user, or assistant. Send the full history on each call, because the API is stateless.
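Because the API keeps no state, the client has to carry the conversation itself. The following is one possible way to do that; append_message and VALID_ROLES are hypothetical names, not part of the API.

```python
from typing import Dict, List

Message = Dict[str, str]
VALID_ROLES = {"system", "user", "assistant"}


def append_message(history: List[Message], role: str, content: str) -> List[Message]:
    """Return a new history list with one more message at the end."""
    if role not in VALID_ROLES:
        raise ValueError(f"unsupported role: {role!r}")
    return history + [{"role": role, "content": content}]


# First turn: system prompt plus the user's question.
history = append_message([], "system", "You are a gym front desk.")
history = append_message(history, "user", "Beginner full-body workout?")

# After a reply arrives, record it so the next request carries it too.
history = append_message(history, "assistant", "(reply text from the previous response)")
history = append_message(history, "user", "How many days per week?")
```

The whole history list is what goes into the messages field of the next request.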

max_tokens (integer, optional)

Maximum tokens to generate. Defaults to 512 and is clamped to a server ceiling of 1024. Generation also stops at the model’s natural end of turn.
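The default and the ceiling combine as in the sketch below. It models only the two rules stated above; effective_max_tokens is a hypothetical helper, and how the server treats zero or negative values is not documented, so the sketch does not cover them.

```python
from typing import Optional

DEFAULT_MAX_TOKENS = 512   # documented default
SERVER_CEILING = 1024      # documented server-side ceiling


def effective_max_tokens(requested: Optional[int] = None) -> int:
    """Return the token budget the server would apply, per the rules above."""
    if requested is None:
        return DEFAULT_MAX_TOKENS
    # Values above the ceiling are clamped rather than rejected.
    return min(requested, SERVER_CEILING)
```

For example, a request with max_tokens set to 2000 would be generated with a budget of 1024, and a request that omits the field gets 512.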

temperature (number, optional)

Sampling temperature. Lower is more deterministic, higher is more varied. Omit for the model’s tuned default.

stream (boolean, optional)

Accepted for OpenAI-SDK compatibility. The hosted API currently returns a single complete response; incremental token streaming is on the roadmap — see Streaming.

Note. Unsupported OpenAI fields (e.g. tools, logprobs, n) are ignored rather than rejected, so existing OpenAI payloads keep working. See OpenAI compatibility for the full matrix.

Example

shell

curl https://gyld.dev/api/v1/chat/completions \
  -H "Authorization: Bearer $FLYWHEEL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "fitness",
    "messages": [
      { "role": "system", "content": "You are a gym front desk." },
      { "role": "user", "content": "Beginner full-body workout?" }
    ],
    "max_tokens": 512,
    "temperature": 0.7
  }'
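The same request can be sketched in Python with only the standard library. The helper names build_request and complete are hypothetical, and the 65-second client timeout is an assumption chosen to sit just above the 60-second upstream bound listed under Limits.

```python
import json
import urllib.request

API_URL = "https://gyld.dev/api/v1/chat/completions"

payload = {
    "model": "fitness",
    "messages": [
        {"role": "system", "content": "You are a gym front desk."},
        {"role": "user", "content": "Beginner full-body workout?"},
    ],
    "max_tokens": 512,
    "temperature": 0.7,
}


def build_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Build the POST request with the bearer key and JSON content type."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def complete(payload: dict, api_key: str, timeout: float = 65.0) -> dict:
    """Send the request and return the decoded JSON body.

    ASSUMPTION: a 65 s client timeout; the docs only state a 60 s upstream bound.
    """
    request = build_request(payload, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling complete(payload, os.environ["FLYWHEEL_API_KEY"]) after importing os would perform the request. urlopen raises urllib.error.HTTPError for non-2xx statuses, so a real client would catch that around the call.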

The response

A standard OpenAI chat.completion object. The assistant’s reply is at choices[0].message.content, and usage reports token counts for billing.

json

{
  "id": "chatcmpl-…",
  "object": "chat.completion",
  "created": 1735689600,
  "model": "fitness",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "A solid beginner full-body plan is …" },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 28, "completion_tokens": 96, "total_tokens": 124 }
}

choices[].message.content (string, required)

The model’s reply text.

choices[].finish_reason (string, required)

Why generation stopped: stop (natural end) or length (hit max_tokens).

usage (object, required)

Token counts for the request: prompt_tokens, completion_tokens, and total_tokens. These are what the request counts against your plan.
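Reading these fields from a decoded response might look like the sketch below. Reply and parse_reply are hypothetical names; the sample body is the example response shown above.

```python
import json
from dataclasses import dataclass


@dataclass
class Reply:
    text: str          # choices[0].message.content
    truncated: bool    # True when finish_reason is "length"
    total_tokens: int  # usage.total_tokens


def parse_reply(body: dict) -> Reply:
    """Pull the reply text, truncation flag, and token total from a response."""
    choice = body["choices"][0]
    return Reply(
        text=choice["message"]["content"],
        truncated=choice["finish_reason"] == "length",
        total_tokens=body["usage"]["total_tokens"],
    )


sample = json.loads("""
{
  "id": "chatcmpl-…",
  "object": "chat.completion",
  "created": 1735689600,
  "model": "fitness",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "A solid beginner full-body plan is …" },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 28, "completion_tokens": 96, "total_tokens": 124 }
}
""")

reply = parse_reply(sample)
```

A truncated value of True means the reply hit max_tokens, so a client may want to raise the limit (up to the server ceiling) or ask the model to continue.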

Limits

Up to 200 messages per request.
24,000 characters per message, 60,000 characters across all messages.
4 concurrent requests per key (one model loads at a time on a worker).
Each upstream attempt is capped at 60 seconds.
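The message and character caps above can be checked before sending, which avoids a round trip that would fail. This is a sketch under assumptions: check_limits is a hypothetical helper, and it counts only the content strings using Python's len (Unicode code points). How the server counts characters is not specified here.

```python
from typing import Dict, List

MAX_MESSAGES = 200
MAX_CHARS_PER_MESSAGE = 24_000
MAX_CHARS_TOTAL = 60_000


def check_limits(messages: List[Dict[str, str]]) -> List[str]:
    """Return a list of human-readable problems; empty means within the caps."""
    problems = []
    if len(messages) > MAX_MESSAGES:
        problems.append(f"{len(messages)} messages exceeds {MAX_MESSAGES}")
    total = 0
    for index, message in enumerate(messages):
        # ASSUMPTION: only the content field counts toward the caps.
        size = len(message["content"])
        total += size
        if size > MAX_CHARS_PER_MESSAGE:
            problems.append(f"message {index} has {size} characters")
    if total > MAX_CHARS_TOTAL:
        problems.append(f"{total} total characters exceeds {MAX_CHARS_TOTAL}")
    return problems
```

Note that three messages of 24,000 characters each pass the per-message cap but fail the 60,000-character total.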

Exceeding a content cap returns 400; too many in-flight requests return 429. See Errors & rate limits for the full status-code reference.
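A 429 signals a temporary condition, so one reasonable client response is to retry with exponential backoff, while a 400 should not be retried unchanged. The policy below (four attempts, delays starting at 0.5 seconds) is an assumption for illustration, not a documented recommendation; call_with_retry and its send callback are hypothetical.

```python
import time
from typing import Callable, Tuple


def call_with_retry(
    send: Callable[[], Tuple[int, dict]],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, dict]:
    """Call send() until it returns a status other than 429 or attempts run out.

    send must return (http_status, decoded_body). Other statuses, including
    400, are returned immediately without retrying.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            # Back off: 0.5 s, 1 s, 2 s, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
    return status, body
```

Passing sleep as a parameter keeps the function easy to test; in production the default time.sleep applies.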
