
Overview

The embed() function generates vector embeddings for text input using embedding models. Embeddings are useful for semantic search, clustering, recommendations, and other AI tasks that require numerical representations of text.

Function Signature

export async function embed(
    params: EmbedParams
): Promise<EmbedResult>

export type EmbedParams = EmbedOptions & {
    model: EmbeddingModel;
};
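The signature references EmbedOptions without defining it. Inferred from the Parameters section below, it presumably has roughly this shape (a sketch based on the documented parameters, not the actual source type, which may include more fields):

```typescript
// Sketch of EmbedOptions inferred from the Parameters section;
// the real type in @coreai/core may differ.
type EmbedOptions = {
  input: string | string[];                  // required; single text or batch
  dimensions?: number;                       // optional; not supported by all models
  providerOptions?: Record<string, unknown>; // passed through to the provider
};

// Example value conforming to the sketched type:
const opts: EmbedOptions = { input: ['a', 'b'], dimensions: 256 };
```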

Parameters

model
EmbeddingModel
required
The embedding model instance to use for generating embeddings.
input
string | string[]
required
Text input to embed. Can be a single string or an array of strings. Must not be empty.
dimensions
number
Optional dimension size for the output embeddings. Not all models support this parameter.
providerOptions
Record<string, unknown>
Provider-specific options that are passed through to the underlying model.

Return Value

Returns a Promise that resolves to an EmbedResult with the following properties:
embeddings
number[][]
Array of embedding vectors. Each vector is an array of numbers representing the embedding dimensions.
  • For single string input: Returns array with one embedding
  • For array input: Returns array with one embedding per input string
usage
EmbeddingUsage | undefined
Optional token usage metadata. Some providers/models do not expose token usage for embedding calls.

Examples

Single String Embedding

import { embed } from '@coreai/core';
import { openai } from '@coreai/openai';

const result = await embed({
  model: openai.embedding('text-embedding-3-small'),
  input: 'Hello, world!'
});

console.log(result.embeddings[0]); // [0.1, 0.2, -0.3, ...]
console.log(result.embeddings[0].length); // 1536 (dimension size)

Batch Embedding

const result = await embed({
  model: openai.embedding('text-embedding-3-small'),
  input: [
    'First document about AI',
    'Second document about machine learning',
    'Third document about neural networks'
  ]
});

console.log(result.embeddings.length); // 3
console.log(result.embeddings[0]); // First document's embedding
console.log(result.embeddings[1]); // Second document's embedding
console.log(result.embeddings[2]); // Third document's embedding

With Custom Dimensions

const result = await embed({
  model: openai.embedding('text-embedding-3-small'),
  input: 'This is a test',
  dimensions: 256 // Reduce from default 1536 to 256
});

console.log(result.embeddings[0].length); // 256

Checking Token Usage

const result = await embed({
  model: openai.embedding('text-embedding-3-small'),
  input: 'Sample text for embedding'
});

if (result.usage) {
  console.log('Tokens used:', result.usage.inputTokens);
} else {
  console.log('Usage information not available');
}

Semantic Search Use Case

// Embed documents
const documents = [
  'The quick brown fox jumps over the lazy dog',
  'Artificial intelligence is transforming technology',
  'Machine learning models require training data'
];

const docResult = await embed({
  model: openai.embedding('text-embedding-3-small'),
  input: documents
});

const docEmbeddings = docResult.embeddings;

// Embed query
const queryResult = await embed({
  model: openai.embedding('text-embedding-3-small'),
  input: 'What is AI?'
});

const queryEmbedding = queryResult.embeddings[0];

// Calculate cosine similarity
function cosineSimilarity(a: number[], b: number[]): number {
  const dotProduct = a.reduce((sum, val, i) => sum + val * b[i], 0);
  const magnitudeA = Math.sqrt(a.reduce((sum, val) => sum + val * val, 0));
  const magnitudeB = Math.sqrt(b.reduce((sum, val) => sum + val * val, 0));
  return dotProduct / (magnitudeA * magnitudeB);
}

// Find most similar document
const similarities = docEmbeddings.map(docEmbed => 
  cosineSimilarity(queryEmbedding, docEmbed)
);

const mostSimilarIndex = similarities.indexOf(Math.max(...similarities));
console.log('Most similar document:', documents[mostSimilarIndex]);
// Output: "Artificial intelligence is transforming technology"

Clustering Documents

const articles = [
  'Python programming tutorial',
  'JavaScript web development',
  'Cooking pasta recipes',
  'Italian cuisine guide',
  'TypeScript type system'
];

const result = await embed({
  model: openai.embedding('text-embedding-3-small'),
  input: articles
});

// Use embeddings for clustering (e.g., K-means)
// Group similar articles together based on their embeddings
const embeddings = result.embeddings;
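
The K-means step itself is out of scope for this page, but as a rough illustration, a greedy threshold-based grouping over cosine similarity could stand in for it (the cosineSimilarity helper is repeated from the semantic-search example above; the 0.8 default threshold is an arbitrary assumption you would tune for your data):

```typescript
function cosineSimilarity(a: number[], b: number[]): number {
  const dot = a.reduce((sum, val, i) => sum + val * b[i], 0);
  const magA = Math.sqrt(a.reduce((sum, val) => sum + val * val, 0));
  const magB = Math.sqrt(b.reduce((sum, val) => sum + val * val, 0));
  return dot / (magA * magB);
}

// Greedy grouping sketch: assign each embedding to the first cluster whose
// founding member is similar enough, otherwise start a new cluster.
function groupBySimilarity(embeddings: number[][], threshold = 0.8): number[][] {
  const clusters: number[][] = []; // each cluster holds input indices
  embeddings.forEach((emb, i) => {
    const match = clusters.find(
      c => cosineSimilarity(embeddings[c[0]], emb) >= threshold
    );
    if (match) match.push(i);
    else clusters.push([i]);
  });
  return clusters;
}
```

Passing `result.embeddings` from the articles above through `groupBySimilarity` would place the programming articles and the cooking articles into separate groups, at an appropriate threshold.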

Error Handling

Throws LLMError if:
  • Input is an empty string
  • Input is an empty array
  • Model encounters an error during embedding

import { LLMError } from '@coreai/core';

try {
  const result = await embed({
    model: openai.embedding('text-embedding-3-small'),
    input: '' // Empty string
  });
} catch (error) {
  if (error instanceof LLMError) {
    console.error('Embedding failed:', error.message);
    // Output: "input must not be empty"
  }
}

try {
  const result = await embed({
    model: openai.embedding('text-embedding-3-small'),
    input: [] // Empty array
  });
} catch (error) {
  if (error instanceof LLMError) {
    console.error('Embedding failed:', error.message);
    // Output: "input must not be empty"
  }
}

Provider Support

Different providers have different embedding models and capabilities:

// OpenAI
import { openai } from '@coreai/openai';
const openaiEmbed = openai.embedding('text-embedding-3-small');
const openaiLarge = openai.embedding('text-embedding-3-large');

// Other providers may have their own embedding models
// Check provider documentation for available models

Performance Tips

Batch multiple inputs in a single call instead of making a separate call for each input; batching reduces network round trips and is typically faster and cheaper.

// Good: Single batch call
const result = await embed({
  model: openai.embedding('text-embedding-3-small'),
  input: ['text1', 'text2', 'text3']
});

// Less efficient: Multiple separate calls
const result1 = await embed({ model, input: 'text1' });
const result2 = await embed({ model, input: 'text2' });
const result3 = await embed({ model, input: 'text3' });

Use smaller dimension sizes when possible to reduce storage and computation costs. Both text-embedding-3-small and text-embedding-3-large support custom dimensions.

Common Use Cases

  1. Semantic Search: Find documents similar to a query
  2. Clustering: Group similar documents together
  3. Recommendations: Recommend items based on similarity
  4. Classification: Use embeddings as features for ML models
  5. Anomaly Detection: Identify outliers based on embedding distance
  6. Deduplication: Find and remove duplicate or near-duplicate content
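
As one concrete illustration, use case 6 (deduplication) can be sketched as a pairwise similarity scan over the embeddings returned by embed() (the cosineSimilarity helper is repeated from the semantic-search example above; the 0.95 default threshold is an illustrative assumption, not a recommended value):

```typescript
function cosineSimilarity(a: number[], b: number[]): number {
  const dot = a.reduce((sum, val, i) => sum + val * b[i], 0);
  const magA = Math.sqrt(a.reduce((sum, val) => sum + val * val, 0));
  const magB = Math.sqrt(b.reduce((sum, val) => sum + val * val, 0));
  return dot / (magA * magB);
}

// Return index pairs whose embeddings are near-duplicates.
// O(n^2) pairwise scan; fine for small batches, use an ANN index at scale.
function findNearDuplicates(
  embeddings: number[][],
  threshold = 0.95
): [number, number][] {
  const pairs: [number, number][] = [];
  for (let i = 0; i < embeddings.length; i++) {
    for (let j = i + 1; j < embeddings.length; j++) {
      if (cosineSimilarity(embeddings[i], embeddings[j]) >= threshold) {
        pairs.push([i, j]);
      }
    }
  }
  return pairs;
}
```

Each returned pair indexes into the original input array, so the second element of each pair can be dropped to deduplicate the corpus.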

Source Location

~/workspace/source/packages/core-ai/src/embed.ts:12