## Overview

The `ModelConfig` type provides fine-grained control over model behavior. These options are passed via the `config` parameter of generation functions.
## ModelConfig Type

```ts
type ModelConfig = {
  temperature?: number;
  maxTokens?: number;
  topP?: number;
  stopSequences?: string[];
  frequencyPenalty?: number;
  presencePenalty?: number;
};
```
## Configuration Options

### temperature

Controls randomness in the output. Higher values make output more creative and random; lower values make it more focused and deterministic.

- **Type:** `number`
- **Range:** 0.0 to 2.0 (provider-dependent)
- **Default:** usually 1.0
```ts
import { generate } from '@core-ai/core-ai';

// Creative writing
const creative = await generate({
  model,
  messages: [{ role: 'user', content: 'Write a fantasy story opening' }],
  config: {
    temperature: 1.5, // High temperature for creativity
  },
});

// Factual responses
const factual = await generate({
  model,
  messages: [{ role: 'user', content: 'What is the capital of Germany?' }],
  config: {
    temperature: 0.2, // Low temperature for consistency
  },
});
```
Use low temperature (0.0-0.3) for factual tasks, code generation, and consistency. Use high temperature (1.0-2.0) for creative writing, brainstorming, and varied outputs.
### maxTokens

Maximum number of tokens to generate in the response.

- **Type:** `number`
- **Range:** varies by model and provider
```ts
const result = await generate({
  model,
  messages: [{ role: 'user', content: 'Explain quantum physics' }],
  config: {
    maxTokens: 150, // Limit to ~150 tokens (~110 words)
  },
});
```
Some providers (such as Anthropic) require `maxTokens` to be set; the provider wrapper may supply a default value if none is specified.
**Token estimation:**

- 1 token ≈ 0.75 words (English)
- 100 tokens ≈ 75 words
- 1000 tokens ≈ 750 words
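These rules of thumb are easy to encode in a helper. A small sketch, where the function names are illustrative (not part of the library) and the 0.75 ratio is a rough English-only average:

```ts
// Rough token estimates for English text (~0.75 words per token).
// Heuristics only; use the provider's tokenizer for exact counts.
function wordsToTokens(words: number): number {
  return Math.round(words / 0.75);
}

function tokensToWords(tokens: number): number {
  return Math.round(tokens * 0.75);
}

console.log(wordsToTokens(75));   // 100
console.log(tokensToWords(1000)); // 750
```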
### topP

Nucleus sampling: the model samples only from the smallest set of tokens whose cumulative probability reaches this threshold.

- **Type:** `number`
- **Range:** 0.0 to 1.0
- **Default:** usually 1.0
```ts
const result = await generate({
  model,
  messages: [{ role: 'user', content: 'Generate product names' }],
  config: {
    temperature: 1.0,
    topP: 0.9, // Sample from the top 90% of probability mass
  },
});
```
Don’t use both high temperature and low topP together. They serve similar purposes and can conflict. Choose one approach.
### stopSequences

Array of sequences that stop generation when encountered.

- **Type:** `string[]`
```ts
const result = await generate({
  model,
  messages: [{ role: 'user', content: 'Count from 1 to 100' }],
  config: {
    stopSequences: ['10'], // Stop when "10" is generated
  },
});

console.log(result.content);
// "1, 2, 3, 4, 5, 6, 7, 8, 9, "
// (most providers exclude the stop sequence itself from the output)
```
**Common use cases:**

- Limiting list generation: `stopSequences: ['\n\n']`
- Code generation: ````stopSequences: ['```', '\n\nfunction']````
- Structured formats: `stopSequences: ['---END---']`
### frequencyPenalty

Reduces the likelihood of repeating tokens based on how often they have already appeared.

- **Type:** `number`
- **Range:** -2.0 to 2.0 (provider-dependent)
- **Default:** 0.0
```ts
const result = await generate({
  model,
  messages: [{ role: 'user', content: 'List creative product features' }],
  config: {
    frequencyPenalty: 0.7, // Discourage repetition
  },
});
```
**Effects:**

- Positive values: reduce repetition
- Negative values: allow more repetition
- Higher magnitude: stronger effect

Use a `frequencyPenalty` between 0.5 and 1.0 for creative writing or lists where you want diverse output without repetitive phrasing.
### presencePenalty

Reduces the likelihood of tokens that have already appeared at least once.

- **Type:** `number`
- **Range:** -2.0 to 2.0 (provider-dependent)
- **Default:** 0.0
```ts
const result = await generate({
  model,
  messages: [{ role: 'user', content: 'Suggest unique vacation destinations' }],
  config: {
    presencePenalty: 1.0, // Encourage new topics
  },
});
```
**Difference from frequencyPenalty:**

- `presencePenalty`: binary; penalizes any token that has appeared at least once
- `frequencyPenalty`: proportional; penalizes based on how many times a token has appeared
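Conceptually, both penalties reduce a candidate token's logit before sampling. A toy sketch of the adjustment, modeled on the formula OpenAI documents for its API (this is not Core AI code, just an illustration of the binary-vs-proportional difference):

```ts
// Toy sketch of how penalties adjust a token's logit before sampling.
// `count` is how many times the token has already appeared in the output.
function penalizedLogit(
  logit: number,
  count: number,
  frequencyPenalty: number,
  presencePenalty: number,
): number {
  return (
    logit -
    count * frequencyPenalty -             // grows with every repetition
    (count > 0 ? 1 : 0) * presencePenalty  // flat, one-time penalty
  );
}

// A token seen 3 times: the frequency penalty scales, the presence penalty does not.
console.log(penalizedLogit(2.0, 3, 0.5, 1.0)); // 2.0 - 1.5 - 1.0 = -0.5
```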
## Complete Configuration Example

```ts
import { generate } from '@core-ai/core-ai';
import { createOpenAI } from '@core-ai/openai';

const openai = createOpenAI();
const model = openai.chatModel('gpt-4-turbo');

const result = await generate({
  model,
  messages: [
    { role: 'system', content: 'You are a creative writing assistant.' },
    { role: 'user', content: 'Write an engaging story opening' },
  ],
  config: {
    temperature: 1.2,       // High creativity
    maxTokens: 500,         // Limit length
    topP: 0.95,             // Slight nucleus sampling
    frequencyPenalty: 0.6,  // Reduce repetition
    presencePenalty: 0.3,   // Encourage new ideas
    stopSequences: ['---'], // Stop at custom marker
  },
});

console.log(result.content);
console.log('Tokens used:', result.usage.outputTokens);
```
## Provider-Specific Options

Some providers support additional options via `providerOptions`:

```ts
type GenerateOptions = {
  messages: Message[];
  config?: ModelConfig;
  providerOptions?: Record<string, unknown>; // Provider-specific
  // ... other options
};
```
**OpenAI example:**

```ts
const result = await generate({
  model: openai.chatModel('gpt-4-turbo'),
  messages: [{ role: 'user', content: 'Hello' }],
  providerOptions: {
    user: 'user-12345', // OpenAI-specific: user identifier for abuse monitoring
    seed: 42,           // OpenAI-specific: deterministic sampling
  },
});
```
Provider options are not validated by Core AI; they are passed directly to the provider. Consult your provider's documentation for the available options.
## Reasoning Configuration

For models that support extended thinking (such as Claude with extended thinking enabled):

```ts
type ReasoningConfig = {
  effort: ReasoningEffort;
};

type ReasoningEffort = 'minimal' | 'low' | 'medium' | 'high' | 'max';
```
**Usage:**

```ts
const result = await generate({
  model: anthropic.chatModel('claude-3-5-sonnet-20241022'),
  messages: [
    { role: 'user', content: 'Solve this complex logic puzzle...' },
  ],
  reasoning: {
    effort: 'high', // Allow extensive internal reasoning
  },
});

if (result.reasoning) {
  console.log('Reasoning:', result.reasoning);
}
console.log('Answer:', result.content);
```
Reasoning configuration is provider-dependent. Check if your model supports extended thinking before using this option.
## Configuration Best Practices

### For Different Tasks

**Code generation:**

```ts
config: {
  temperature: 0.2,
  maxTokens: 2000,
  stopSequences: ['```\n\n'],
}
```

**Creative writing:**

```ts
config: {
  temperature: 1.3,
  frequencyPenalty: 0.7,
  presencePenalty: 0.4,
}
```

**Question answering:**

```ts
config: {
  temperature: 0.3,
  maxTokens: 300,
}
```

**Brainstorming/ideas:**

```ts
config: {
  temperature: 1.5,
  presencePenalty: 1.0,
}
```
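One convenient way to keep such task presets around is a small lookup table. This is an illustrative pattern rather than a Core AI API; the `ModelConfig` shape mirrors the type at the top of this page, and the code preset omits `stopSequences` for brevity:

```ts
// Illustrative task presets; the ModelConfig shape mirrors the type above.
type ModelConfig = {
  temperature?: number;
  maxTokens?: number;
  topP?: number;
  stopSequences?: string[];
  frequencyPenalty?: number;
  presencePenalty?: number;
};

const presets: Record<string, ModelConfig> = {
  code: { temperature: 0.2, maxTokens: 2000 },
  creative: { temperature: 1.3, frequencyPenalty: 0.7, presencePenalty: 0.4 },
  qa: { temperature: 0.3, maxTokens: 300 },
  brainstorm: { temperature: 1.5, presencePenalty: 1.0 },
};

// Usage: generate({ model, messages, config: presets.qa });
console.log(presets.qa.maxTokens); // 300
```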
### Testing Configurations

Start with default settings and adjust one parameter at a time. Temperature is usually the most impactful setting to tune first.

```ts
// Baseline
const baseline = await generate({ model, messages });

// Test temperature variations
const temps = [0.3, 0.7, 1.0, 1.5];
for (const temp of temps) {
  const result = await generate({
    model,
    messages,
    config: { temperature: temp },
  });
  console.log(`Temperature ${temp}:`, result.content);
}
```
## Usage Tracking

All generation results include token usage information:

```ts
type ChatUsage = {
  inputTokens: number;  // Total input tokens (including cached)
  outputTokens: number; // Total output tokens (including reasoning)
  inputTokenDetails: ChatInputTokenDetails;
  outputTokenDetails: ChatOutputTokenDetails;
};

type ChatInputTokenDetails = {
  cacheReadTokens: number;  // Tokens served from cache
  cacheWriteTokens: number; // Tokens written to cache
};

type ChatOutputTokenDetails = {
  reasoningTokens?: number; // Tokens used for reasoning/thinking
};
```
**Example:**

```ts
const result = await generate({ model, messages });

console.log('Input tokens:', result.usage.inputTokens);
console.log('Output tokens:', result.usage.outputTokens);
console.log('Cache read:', result.usage.inputTokenDetails.cacheReadTokens);

if (result.usage.outputTokenDetails.reasoningTokens) {
  console.log('Reasoning tokens:', result.usage.outputTokenDetails.reasoningTokens);
}
```
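Token counts can also feed a rough cost estimate. A minimal sketch; the per-million-token prices below are placeholders, not real rates, so substitute your provider's current pricing:

```ts
// Hypothetical per-million-token prices; substitute your provider's real rates.
const PRICE_PER_M_INPUT = 10.0;  // USD per 1M input tokens (placeholder)
const PRICE_PER_M_OUTPUT = 30.0; // USD per 1M output tokens (placeholder)

function estimateCostUSD(inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens / 1_000_000) * PRICE_PER_M_INPUT +
    (outputTokens / 1_000_000) * PRICE_PER_M_OUTPUT
  );
}

// e.g. estimateCostUSD(result.usage.inputTokens, result.usage.outputTokens)
console.log(estimateCostUSD(2000, 500)); // ≈ 0.035 USD
```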
## Abort Signal

Cancel long-running requests with an `AbortSignal`:

```ts
const controller = new AbortController();

// Cancel after 5 seconds
setTimeout(() => controller.abort(), 5000);

try {
  const result = await generate({
    model,
    messages: [{ role: 'user', content: 'Write a long essay...' }],
    signal: controller.signal,
  });
} catch (error) {
  if (error instanceof Error && error.name === 'AbortError') {
    console.log('Request was cancelled');
  }
}
```
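On modern runtimes (Node 17.3+ for `AbortSignal.timeout()`, Node 20+ or recent browsers for `AbortSignal.any()`), the timeout can be expressed more directly and combined with user-initiated cancellation. A sketch assuming those platform APIs are available:

```ts
// AbortSignal.timeout() aborts automatically after the given milliseconds;
// AbortSignal.any() aborts as soon as any of its input signals aborts.
const userController = new AbortController();

const signal = AbortSignal.any([
  AbortSignal.timeout(5000), // give up after 5 seconds
  userController.signal,     // or when the user cancels
]);

// Pass `signal` to generate({ ..., signal }) as in the example above.
userController.abort();      // simulate a user-initiated cancel
console.log(signal.aborted); // true
```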
## Next Steps