Chat Completions
https://api.tokenbay.com/v1/chat/completionsOpenAI-compatible chat completion entry
Chat Completions
This route is registered in TokenBay Gateway and enters the OpenAI platform orchestration chain.
The gateway performs API key authentication, group checks, model parsing, account scheduling, protocol conversion or pass-through when needed, then returns the upstream response.
/v1/chat/completionsRequest
schemamodelstringRequiredAn available model ID shown on the live Models page or in the console.
messagesarray<object>Required
OpenAI-compatible messages array with roles such as system, user, assistant, and tool.
messages[].rolestringRequiredMessage role: system, developer, user, assistant, tool, and similar roles.
messages[].contentstring | array<object>Required
Plain text or a multimodal content array.
content[].typestringOptionalContent block type, such as text, image_url, or input_audio.
content[].textstringOptionalText content block.
content[].image_urlobjectOptionalImage URL or data URL input. Support depends on the model.
content[].input_audioobjectOptionalAudio input block, usually containing data and format.
messages[].namestringOptionalOptional participant name.
messages[].tool_call_idstringOptionalTool call ID for tool-role messages.
messages[].tool_callsarray<object>OptionalTool calls requested by an assistant message.
streambooleanOptionalSet true to receive SSE streaming chunks.
temperaturenumberOptionalSampling temperature. Support depends on the actual model.
top_pnumberOptionalNucleus sampling parameter. Usually avoid moving both temperature and top_p aggressively.
max_tokens / max_completion_tokensintegerOptionalMaximum output tokens. Some models support only one of these fields.
stopstring | string[]OptionalSequences where generation should stop.
presence_penalty / frequency_penaltynumberOptionalRepetition and topic penalties. Supported ranges depend on the upstream model.
response_formatobjectOptional
Structured output configuration such as JSON object or JSON schema.
response_format.typestringOptionaltext, json_object, or json_schema.
response_format.json_schema.namestringOptionalJSON Schema name.
response_format.json_schema.schemaobjectOptionalJSON Schema definition.
response_format.json_schema.strictbooleanOptionalWhether to strictly follow the schema.
tools / tool_choicearray<object> | object | stringOptional
Function/tool calling configuration.
tools[].typestringOptionalUsually function.
tools[].function.namestringOptionalFunction name.
tools[].function.descriptionstringOptionalFunction description.
tools[].function.parametersobjectOptionalFunction parameter JSON Schema.
tool_choicestring | objectOptionalauto, none, required, or a specific function.
parallel_tool_callsbooleanOptionalWhether parallel tool calls are allowed.
stream_options.include_usagebooleanOptionalAsk for usage statistics in streaming responses. Whether it is returned depends on the upstream.
seedintegerOptionalBest-effort deterministic output. Not supported by every upstream.
logprobs / top_logprobsboolean | integerOptionalRequest token probability details when supported by the model.
modalities / audioarray<string> | objectOptionalMultimodal or audio output fields.
reasoning_effortstringOptionalReasoning intensity control for models that support reasoning.
metadata / userobject | stringOptionalClient-side tracing fields. Do not include sensitive data.
Response
schemaNon-streaming responses usually keep the OpenAI Chat Completions shape. Streaming calls return SSE chunks. After streaming headers are written, the gateway cannot switch accounts for retry.
idstringOptionalResponse ID returned by the upstream.
objectstringOptionalUsually chat.completion or chat.completion.chunk.
createdintegerOptionalCreation timestamp.
modelstringOptionalActual response model.
choices[]array<object>Optional
Candidate outputs. Non-streaming responses usually include message; streaming chunks usually include delta.
choices[].indexintegerOptionalCandidate index.
choices[].message.rolestringOptionalNon-streaming message role.
choices[].message.contentstring | arrayOptionalNon-streaming message content.
choices[].message.tool_callsarray<object>OptionalTool calls requested by the model.
choices[].deltaobjectOptionalStreaming delta payload.
choices[].finish_reasonstringOptionalStop reason, such as stop, length, tool_calls, or content_filter.
choices[].logprobsobjectOptionalToken probability details when supported and requested.
usageobjectOptional
Token usage. For some streaming calls it may only appear in the final event or when supported upstream.
usage.prompt_tokensintegerOptionalInput token count.
usage.completion_tokensintegerOptionalOutput token count.
usage.total_tokensintegerOptionalTotal token count.
This entry can schedule requests to different upstream accounts. The client still sends an OpenAI-compatible body; whether protocol conversion happens depends on provider, account protocol group, and model configuration.
