
Overview

Generation functions accept configuration options as flat top-level parameters on the options object. The core options temperature, maxTokens, and topP are available on every generation call. Additional parameters like stopSequences, frequencyPenalty, and presencePenalty are provider-specific and passed via providerOptions.

Core options

These options are available on BaseGenerateOptions and apply to generate(), stream(), generateObject(), and streamObject():
type BaseGenerateOptions = {
  messages: Message[];
  temperature?: number;
  maxTokens?: number;
  topP?: number;
  reasoning?: ReasoningConfig;
  providerOptions?: GenerateProviderOptions;
  signal?: AbortSignal;
};

temperature

Controls randomness in the output. Higher values make output more creative and random, lower values make it more focused and deterministic.
Type: number
Range: 0.0 to 2.0 (provider-dependent)
Default: Usually 1.0
import { generate } from '@core-ai/core-ai';

const creative = await generate({
  model,
  messages: [{ role: 'user', content: 'Write a fantasy story opening' }],
  temperature: 1.5,
});

const factual = await generate({
  model,
  messages: [{ role: 'user', content: 'What is the capital of Germany?' }],
  temperature: 0.2,
});
Use low temperature (0.0-0.3) for factual tasks, code generation, and consistency. Use high temperature (1.0-2.0) for creative writing, brainstorming, and varied outputs.

maxTokens

Maximum number of tokens to generate in the response.
Type: number
Range: Varies by model and provider
const result = await generate({
  model,
  messages: [{ role: 'user', content: 'Explain quantum physics' }],
  maxTokens: 150,
});
Some providers (like Anthropic) require maxTokens to be set. The provider wrapper may set a default value if not specified.
Token Estimation:
  • 1 token ≈ 0.75 words (English)
  • 100 tokens ≈ 75 words
  • 1000 tokens ≈ 750 words
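The rule of thumb above can be wrapped in a quick estimator. This is a sketch for ballpark budgeting only; real tokenizers are model-specific, so use the provider's tokenizer when counts matter for limits or billing:

```typescript
// Rough token count from the ~0.75 words-per-token rule of thumb (English prose).
// Real tokenizers vary by model; use this only for ballpark estimates.
function estimateTokens(text: string): number {
  const words = text.trim().split(/\s+/).filter(Boolean).length;
  return Math.ceil(words / 0.75);
}

estimateTokens('The quick brown fox jumps over the lazy dog'); // 9 words → 12 tokens
```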

topP

Nucleus sampling: the model samples only from the smallest set of tokens whose cumulative probability reaches this threshold.
Type: number
Range: 0.0 to 1.0
Default: Usually 1.0
const result = await generate({
  model,
  messages: [{ role: 'user', content: 'Generate product names' }],
  temperature: 1.0,
  topP: 0.9,
});
Don’t use both high temperature and low topP together. They serve similar purposes and can conflict. Choose one approach.

Provider-specific options

Options like stopSequences, frequencyPenalty, and presencePenalty are not part of the core options. They are passed through providerOptions, namespaced by provider:
type GenerateProviderOptions = {
  [provider: string]: Record<string, unknown> | undefined;
};
Each provider defines and validates its own set of options. See the provider pages for the full schema:
  • OpenAI — Responses API (createOpenAI): store, serviceTier, include, parallelToolCalls, user. Chat Completions API (createOpenAICompat): store, serviceTier, parallelToolCalls, user, stopSequences, frequencyPenalty, presencePenalty, seed
  • Anthropic — topK, stopSequences, betas, outputConfig, cacheControl
  • Google GenAI — stopSequences, frequencyPenalty, presencePenalty, seed, topK
  • Mistral — stopSequences, frequencyPenalty, presencePenalty, randomSeed, parallelToolCalls, promptMode, safePrompt
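Because options are namespaced by provider, a wrapper only reads its own key and can ignore the rest. A minimal sketch of that lookup (the real wrappers also validate their options; optionsFor is a hypothetical helper, not part of the library):

```typescript
type GenerateProviderOptions = {
  [provider: string]: Record<string, unknown> | undefined;
};

// Hypothetical helper: pick out one provider's namespace, ignoring the others.
function optionsFor(
  provider: string,
  all?: GenerateProviderOptions,
): Record<string, unknown> {
  return all?.[provider] ?? {};
}

optionsFor('google', {
  google: { stopSequences: ['10'] },
  openai: { seed: 42 },
}); // → { stopSequences: ['10'] }
```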

stopSequences

Array of sequences that stop generation when encountered. Passed via providerOptions:
const result = await generate({
  model,
  messages: [{ role: 'user', content: 'Count from 1 to 100' }],
  providerOptions: {
    google: { stopSequences: ['10'] },
  },
});

console.log(result.content);
// "1, 2, 3, 4, 5, 6, 7, 8, 9, 10"
stopSequences support varies by provider. For OpenAI, it’s only available with createOpenAICompat (Chat Completions API). The default createOpenAI (Responses API) does not support it.

frequencyPenalty

Reduces likelihood of repeating tokens based on how often they’ve appeared.
Type: number
Range: -2.0 to 2.0 (provider-dependent)
const result = await generate({
  model,
  messages: [{ role: 'user', content: 'List creative product features' }],
  providerOptions: {
    google: { frequencyPenalty: 0.7 },
  },
});
Use a frequencyPenalty between 0.5 and 1.0 for creative writing or lists where you want diverse output without repetitive phrases.

presencePenalty

Reduces likelihood of tokens that have already appeared at least once.
Type: number
Range: -2.0 to 2.0 (provider-dependent)
const result = await generate({
  model,
  messages: [{ role: 'user', content: 'Suggest unique vacation destinations' }],
  providerOptions: {
    google: { presencePenalty: 1.0 },
  },
});
Difference from Frequency Penalty:
  • presencePenalty: Binary — penalizes any token that appeared at least once
  • frequencyPenalty: Proportional — penalizes based on how many times token appeared
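The distinction is easiest to see in the per-token logit adjustment most providers describe (this form follows OpenAI's published description; exact behavior is provider-specific, so treat it as a sketch):

```typescript
// frequencyPenalty scales with the repeat count; presencePenalty is a flat
// deduction applied once the token has appeared at all.
function penalizedLogit(
  logit: number,
  count: number, // times this token has already appeared in the output
  frequencyPenalty: number,
  presencePenalty: number,
): number {
  return logit - count * frequencyPenalty - (count > 0 ? 1 : 0) * presencePenalty;
}

penalizedLogit(1.0, 3, 0.5, 1.0); // → -1.5: 1.0 - 3 * 0.5 - 1.0
penalizedLogit(1.0, 0, 0.5, 1.0); // → 1.0: unseen tokens are untouched
```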

Complete configuration example

import { generate } from '@core-ai/core-ai';
import { createOpenAICompat } from '@core-ai/openai/compat';

const openai = createOpenAICompat();
const model = openai.chatModel('gpt-5-mini');

const result = await generate({
  model,
  messages: [
    { role: 'system', content: 'You are a creative writing assistant.' },
    { role: 'user', content: 'Write an engaging story opening' },
  ],
  temperature: 1.2,
  maxTokens: 500,
  topP: 0.95,
  providerOptions: {
    openai: {
      frequencyPenalty: 0.6,
      presencePenalty: 0.3,
      stopSequences: ['---'],
    },
  },
});

console.log(result.content);
console.log('Tokens used:', result.usage.outputTokens);

Reasoning configuration

For models that support extended thinking:
type ReasoningConfig = {
  effort: ReasoningEffort;
};

type ReasoningEffort =
  | 'minimal'
  | 'low'
  | 'medium'
  | 'high'
  | 'max';
Usage:
const result = await generate({
  model: anthropic.chatModel('claude-sonnet-4-6'),
  messages: [
    { role: 'user', content: 'Solve this complex logic puzzle...' },
  ],
  reasoning: {
    effort: 'high',
  },
});

if (result.reasoning) {
  console.log('Reasoning:', result.reasoning);
}
console.log('Answer:', result.content);
Reasoning configuration is provider-dependent. Check if your model supports extended thinking before using this option.
Providers interpret reasoning differently. Anthropic and OpenAI enforce model-specific restrictions, Google maps effort to thinking level or budget, and Mistral accepts the option but does not send effort to the API.

Configuration best practices

For different tasks

Code Generation:
const result = await generate({
  model,
  messages,
  temperature: 0.2,
  maxTokens: 2000,
  providerOptions: {
    google: { stopSequences: ['```\n\n'] },
  },
});
Creative Writing:
const result = await generate({
  model,
  messages,
  temperature: 1.3,
  providerOptions: {
    google: { frequencyPenalty: 0.7, presencePenalty: 0.4 },
  },
});
Question Answering:
const result = await generate({
  model,
  messages,
  temperature: 0.3,
  maxTokens: 300,
});
Brainstorming/Ideas:
const result = await generate({
  model,
  messages,
  temperature: 1.5,
  providerOptions: {
    google: { presencePenalty: 1.0 },
  },
});

Testing configurations

Start with default settings and adjust one parameter at a time. Temperature is usually the most impactful setting to tune first.
const baseline = await generate({ model, messages });

const temps = [0.3, 0.7, 1.0, 1.5];
for (const temp of temps) {
  const result = await generate({
    model,
    messages,
    temperature: temp,
  });
  console.log(`Temperature ${temp}:`, result.content);
}

Usage tracking

All generation results include token usage information:
type ChatUsage = {
  inputTokens: number;
  outputTokens: number;
  inputTokenDetails: ChatInputTokenDetails;
  outputTokenDetails: ChatOutputTokenDetails;
};

type ChatInputTokenDetails = {
  cacheReadTokens: number;
  cacheWriteTokens: number;
};

type ChatOutputTokenDetails = {
  reasoningTokens?: number;
};
Example:
const result = await generate({ model, messages });

console.log('Input tokens:', result.usage.inputTokens);
console.log('Output tokens:', result.usage.outputTokens);
console.log('Cache read:', result.usage.inputTokenDetails.cacheReadTokens);
if (result.usage.outputTokenDetails.reasoningTokens) {
  console.log('Reasoning tokens:', result.usage.outputTokenDetails.reasoningTokens);
}
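Usage numbers combine naturally with per-token pricing. A sketch with hypothetical rates (look up your model's actual pricing; only the inputTokens and outputTokens fields from ChatUsage above are used here):

```typescript
type Usage = { inputTokens: number; outputTokens: number };

// Hypothetical prices in USD per million tokens -- NOT real rates.
const PRICE_PER_MTOK = { input: 3, output: 15 };

function estimateCostUSD(usage: Usage): number {
  return (
    (usage.inputTokens / 1_000_000) * PRICE_PER_MTOK.input +
    (usage.outputTokens / 1_000_000) * PRICE_PER_MTOK.output
  );
}

estimateCostUSD({ inputTokens: 12_000, outputTokens: 800 }); // ≈ $0.048
```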

Abort signal

Cancel long-running requests with AbortSignal:
import { CoreAIError } from '@core-ai/core-ai';

const controller = new AbortController();

setTimeout(() => controller.abort(), 5000);

try {
  const result = await generate({
    model,
    messages: [{ role: 'user', content: 'Write a long essay...' }],
    signal: controller.signal,
  });
} catch (error) {
  if (error instanceof CoreAIError) {
    console.log('Request was cancelled:', error.message);
  }
}
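On runtimes that support it (Node 17.3+ and modern browsers), AbortSignal.timeout() expresses the same five-second cutoff without managing a controller yourself:

```typescript
// Auto-aborting signal: no AbortController or setTimeout bookkeeping needed.
const signal = AbortSignal.timeout(5000);

// Pass it exactly like controller.signal:
// await generate({ model, messages, signal });

console.log(signal.aborted); // false until the 5 seconds elapse
```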
