API Reference

Chat Completions

POST https://api.tokenbay.com/v1/chat/completions

OpenAI-compatible chat completion endpoint

This route is registered in TokenBay Gateway and enters the OpenAI platform orchestration chain.

openai

The gateway performs API key authentication, group checks, model parsing, account scheduling, and protocol conversion or pass-through as needed, then returns the upstream response.

POST /v1/chat/completions

Request

model (string, Required)

An available model ID, as listed on the live Models page or in the console.

messages (array<object>, Required)

OpenAI-compatible array of message objects, in conversation order.

messages[].role (string, Required)

Message role, such as system, developer, user, assistant, or tool.

messages[].content (string | array<object>, Required)

Plain text or a multimodal content array.

content[].type (string, Optional)

Content block type, such as text, image_url, or input_audio.

content[].text (string, Optional)

Text of a text content block.

content[].image_url (object, Optional)

Image input, given as a URL or a data URL. Support depends on the model.

content[].input_audio (object, Optional)

Audio input block, usually containing data and format.
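As an illustration of the content array described above, here is a minimal Python sketch that builds multimodal content blocks. The helper names (`text_block`, `image_block`, `audio_block`) and the example URL are hypothetical, and the nested `{"url": ...}` and `{"data": ..., "format": ...}` layouts follow the usual OpenAI-compatible convention, which is an assumption about what a given upstream model accepts.

```python
import base64


def text_block(text: str) -> dict:
    """Build a text content block."""
    return {"type": "text", "text": text}


def image_block(url: str) -> dict:
    """Build an image block from an http(s) URL or a data URL."""
    return {"type": "image_url", "image_url": {"url": url}}


def audio_block(raw: bytes, fmt: str = "wav") -> dict:
    """Build an audio block; the bytes are base64-encoded into `data`."""
    encoded = base64.b64encode(raw).decode("ascii")
    return {"type": "input_audio", "input_audio": {"data": encoded, "format": fmt}}


# A user message that mixes text and an image (placeholder URL).
message = {
    "role": "user",
    "content": [
        text_block("What is in this image?"),
        image_block("https://example.com/cat.png"),
    ],
}
```

Whether image or audio blocks are accepted still depends on the model selected in the request.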

messages[].name (string, Optional)

Optional participant name.

messages[].tool_call_id (string, Optional)

Tool call ID for tool-role messages.

messages[].tool_calls (array<object>, Optional)

Tool calls requested by an assistant message.
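The relationship between `tool_calls` and `tool_call_id` can be sketched as follows: each tool-role message answers one tool call from the preceding assistant message by repeating its ID. This is a minimal sketch; the helper `tool_result_messages`, the `get_weather` function, and the sample values are hypothetical, and the `tool_calls` item layout assumes the usual OpenAI-compatible shape.

```python
import json


def tool_result_messages(assistant_message: dict, results: dict) -> list:
    """Return one tool-role message per tool call in the assistant message.

    `results` maps each tool call ID to the value the tool produced.
    """
    messages = []
    for call in assistant_message.get("tool_calls") or []:
        messages.append({
            "role": "tool",
            "tool_call_id": call["id"],  # must match the ID of the call being answered
            "content": json.dumps(results[call["id"]]),
        })
    return messages


# Hand-written assistant message that requests one tool call.
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather", "arguments": json.dumps({"city": "Paris"})},
    }],
}

# Conversation history to send in the follow-up request.
history = [
    {"role": "user", "content": "What is the weather in Paris?"},
    assistant_message,
    *tool_result_messages(assistant_message, {"call_1": {"temp_c": 18}}),
]
```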

stream (boolean, Optional)

Set to true to receive the response as SSE streaming chunks.

temperature (number, Optional)

Sampling temperature. Support depends on the actual model.

top_p (number, Optional)

Nucleus sampling parameter. Adjust either temperature or top_p rather than changing both significantly.

max_tokens / max_completion_tokens (integer, Optional)

Maximum number of output tokens. Some models accept only one of these two fields.

stop (string | string[], Optional)

Sequences at which generation should stop.

presence_penalty / frequency_penalty (number, Optional)

Penalties for tokens that have already appeared (presence_penalty) and for how often they have appeared (frequency_penalty). Supported ranges depend on the upstream model.

response_format (object, Optional)

Structured output configuration, such as JSON object or JSON schema.

response_format.type (string, Optional)

One of text, json_object, or json_schema.

response_format.json_schema.name (string, Optional)

JSON Schema name.

response_format.json_schema.schema (object, Optional)

JSON Schema definition.

response_format.json_schema.strict (boolean, Optional)

Whether to strictly follow the schema.
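To show how the `response_format` fields above fit together, here is a minimal Python sketch of a request body that asks for schema-constrained JSON. The helper `json_schema_format`, the schema name `city_info`, and the schema itself are hypothetical; whether strict mode is honored depends on the upstream model.

```python
import json


def json_schema_format(name: str, schema: dict, strict: bool = True) -> dict:
    """Build a response_format object of type json_schema."""
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "schema": schema, "strict": strict},
    }


# Illustrative schema: an object with two required fields and no extras.
city_schema = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "population": {"type": "integer"},
    },
    "required": ["city", "population"],
    "additionalProperties": False,
}

payload = {
    "model": "gpt-5.4",
    "messages": [{"role": "user", "content": "Give me basic facts about Paris."}],
    "response_format": json_schema_format("city_info", city_schema),
}

# Serialized request body, ready to send as the POST data.
body = json.dumps(payload)
```

For plain JSON output without a schema, `{"type": "json_object"}` can be passed as `response_format` instead.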

tools / tool_choice (array<object> | object | string, Optional)

Function and tool calling configuration.

tools[].type (string, Optional)

Tool type, usually function.

tools[].function.name (string, Optional)

Function name.

tools[].function.description (string, Optional)

Function description.

tools[].function.parameters (object, Optional)

JSON Schema describing the function parameters.

tool_choice (string | object, Optional)

One of auto, none, or required, or an object naming a specific function.

parallel_tool_calls (boolean, Optional)

Whether parallel tool calls are allowed.
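The tool fields above could be assembled as in the following minimal sketch. The helpers `function_tool` and `force_function` and the `get_weather` definition are hypothetical; the object form of `tool_choice` shown here follows the usual OpenAI-compatible convention and is an assumption about what the upstream accepts.

```python
def function_tool(name: str, description: str, parameters: dict) -> dict:
    """Build one entry of the tools array."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": parameters,  # JSON Schema for the arguments
        },
    }


def force_function(name: str) -> dict:
    """Build a tool_choice object that requires one specific function."""
    return {"type": "function", "function": {"name": name}}


weather_tool = function_tool(
    "get_weather",
    "Look up the current weather for a city.",
    {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
)

payload = {
    "model": "gpt-5.4",
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    "tools": [weather_tool],
    "tool_choice": "auto",          # or force_function("get_weather")
    "parallel_tool_calls": False,   # ask for at most one call per turn
}
```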

stream_options.include_usage (boolean, Optional)

Requests usage statistics in streaming responses. Whether they are returned depends on the upstream.

seed (integer, Optional)

Requests best-effort deterministic output. Not supported by every upstream.

logprobs / top_logprobs (boolean | integer, Optional)

Requests token probability details when the model supports them. logprobs is the boolean switch; top_logprobs is the integer number of alternatives returned per token.

modalities / audio (array<string> | object, Optional)

Output modalities and audio output settings, for models that support them.

reasoning_effort (string, Optional)

Reasoning intensity control for models that support reasoning.

metadata / user (object | string, Optional)

Client-side tracing fields. Do not include sensitive data.

Response

Non-streaming responses usually keep the OpenAI Chat Completions shape. Streaming calls return SSE chunks. Once the streaming response headers have been written, the gateway can no longer retry the request on a different account.

id (string, Optional)

Response ID returned by the upstream.

object (string, Optional)

Usually chat.completion for non-streaming responses or chat.completion.chunk for streaming chunks.

created (integer, Optional)

Creation time as a Unix timestamp in seconds.

model (string, Optional)

Model that actually produced the response.

choices[] (array<object>, Optional)

Candidate outputs. Non-streaming responses usually include message; streaming chunks usually include delta.

choices[].index (integer, Optional)

Candidate index.

choices[].message.role (string, Optional)

Non-streaming message role.

choices[].message.content (string | array, Optional)

Non-streaming message content.

choices[].message.tool_calls (array<object>, Optional)

Tool calls requested by the model.

choices[].delta (object, Optional)

Streaming delta payload.

choices[].finish_reason (string, Optional)

Stop reason, such as stop, length, tool_calls, or content_filter.

choices[].logprobs (object, Optional)

Token probability details when supported and requested.

usage (object, Optional)

Token usage. For some streaming calls it may appear only in the final event, or only when the upstream supports it.

usage.prompt_tokens (integer, Optional)

Input token count.

usage.completion_tokens (integer, Optional)

Output token count.

usage.total_tokens (integer, Optional)

Total token count.
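Because every response field above is optional, client code should read them defensively. The following minimal Python sketch pulls the commonly used values out of a non-streaming response; the helper `summarize_completion` is hypothetical and the `sample` body is hand-written for illustration, not captured gateway output.

```python
def summarize_completion(body: dict) -> dict:
    """Extract text, tool calls, stop reason, and token usage from a response."""
    choices = body.get("choices") or []
    choice = choices[0] if choices else {}
    message = choice.get("message") or {}
    usage = body.get("usage") or {}
    return {
        "text": message.get("content"),
        "tool_calls": message.get("tool_calls") or [],
        "finish_reason": choice.get("finish_reason"),
        # finish_reason "length" means the output hit the token limit.
        "truncated": choice.get("finish_reason") == "length",
        "total_tokens": usage.get("total_tokens"),
    }


# Hand-written sample in the OpenAI Chat Completions shape.
sample = {
    "id": "chatcmpl-example",
    "object": "chat.completion",
    "created": 1700000000,
    "model": "gpt-5.4",
    "choices": [{
        "index": 0,
        "message": {"role": "assistant", "content": "Hello!"},
        "finish_reason": "stop",
    }],
    "usage": {"prompt_tokens": 8, "completion_tokens": 2, "total_tokens": 10},
}

summary = summarize_completion(sample)
```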

This endpoint can schedule requests to different upstream accounts. The client still sends an OpenAI-compatible body; whether protocol conversion happens depends on the provider, the account protocol group, and the model configuration.

Standard request (bash)

curl -i -X POST https://api.tokenbay.com/v1/chat/completions \
  -H "Authorization: Bearer sk-XXXXXXX" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.4",
    "messages": [
      {"role": "user", "content": "hello"}
    ]
  }'
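
The same standard request can be built from Python with only the standard library. This is a minimal sketch, not an official client: `build_request` and `chat` are hypothetical helper names, and the 60-second timeout is an arbitrary choice.

```python
import json
import urllib.request

ENDPOINT = "https://api.tokenbay.com/v1/chat/completions"


def build_request(api_key: str, model: str, messages: list, **extra) -> urllib.request.Request:
    """Build the POST request without sending it."""
    body = {"model": model, "messages": messages, **extra}
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def chat(api_key: str, model: str, messages: list, timeout: float = 60, **extra) -> dict:
    """Send the request and return the decoded JSON response body."""
    request = build_request(api_key, model, messages, **extra)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A call such as `chat(api_key, "gpt-5.4", [{"role": "user", "content": "hello"}])` would send the same body as the curl command; reading the key from an environment variable keeps it out of source code.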

Streaming request (bash)

curl -N -X POST https://api.tokenbay.com/v1/chat/completions \
  -H "Authorization: Bearer sk-XXXXXXX" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.4",
    "messages": [{"role": "user", "content": "hello"}],
    "stream": true
  }'
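
On the client side, the SSE chunks returned by a streaming request can be reassembled as in the following minimal Python sketch. It assumes each event carries its JSON on a single `data:` line and that the stream ends with a `data: [DONE]` sentinel, as OpenAI-compatible streams usually do; the source does not confirm either detail for every upstream. The helper names and the `sample_stream` lines are hypothetical and hand-written.

```python
import json
from typing import Iterable, Iterator


def iter_sse_chunks(lines: Iterable[bytes]) -> Iterator[dict]:
    """Yield the decoded JSON payload of each `data:` line until [DONE]."""
    for raw in lines:
        line = raw.decode("utf-8").strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        yield json.loads(data)


def collect_stream(lines: Iterable[bytes]) -> dict:
    """Concatenate delta content and keep the last finish_reason and usage seen."""
    parts, finish_reason, usage = [], None, None
    for chunk in iter_sse_chunks(lines):
        if chunk.get("usage"):
            usage = chunk["usage"]  # may only appear in the final event
        for choice in chunk.get("choices") or []:
            delta = choice.get("delta") or {}
            if delta.get("content"):
                parts.append(delta["content"])
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]
    return {"text": "".join(parts), "finish_reason": finish_reason, "usage": usage}


# Hand-written sample lines standing in for a streamed HTTP body.
sample_stream = [
    b'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":"Hel"}}]}\n',
    b"\n",
    b'data: {"choices":[{"index":0,"delta":{"content":"lo"},"finish_reason":"stop"}]}\n',
    b"\n",
    b'data: {"choices":[],"usage":{"prompt_tokens":1,"completion_tokens":2,"total_tokens":3}}\n',
    b"\n",
    b"data: [DONE]\n",
]

result = collect_stream(sample_stream)
```

Any iterable of byte lines works as input, so the response object returned by `urllib.request.urlopen` could be passed to `collect_stream` directly.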
