Use the IEmbeddingGenerator interface

The IEmbeddingGenerator<TInput,TEmbedding> interface represents a generic generator of embeddings. For the generic type parameters, TInput is the type of input values being embedded, and TEmbedding is the type of generated embedding, which inherits from the Embedding class.

The Embedding class serves as a base class for embeddings generated by an IEmbeddingGenerator. It's designed to store and manage the metadata and data associated with embeddings. Derived types, like Embedding<T>, provide the concrete embedding vector data. For example, an Embedding<float> exposes a ReadOnlyMemory<float> Vector { get; } property for access to its embedding data.

The IEmbeddingGenerator interface defines a method to asynchronously generate embeddings for a collection of input values, with optional configuration and cancellation support. It also provides metadata describing the generator and allows for the retrieval of strongly typed services that can be provided by the generator or its underlying services.
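
For example, you can query a generator for its EmbeddingGeneratorMetadata through GetService. The following sketch assumes the same local OllamaSharp setup used in the rest of this article; whether metadata is available, and which values it contains, depends on the underlying provider, so the result is treated as nullable.

using Microsoft.Extensions.AI;
using OllamaSharp;

IEmbeddingGenerator<string, Embedding<float>> generator =
    new OllamaApiClient(new Uri("http://localhost:11434/"), "phi3:mini");

// Ask the generator for strongly typed metadata describing it.
var metadata = generator.GetService(typeof(EmbeddingGeneratorMetadata), null)
    as EmbeddingGeneratorMetadata;

Console.WriteLine(metadata?.ProviderName);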

Create embeddings

The primary operation performed with an IEmbeddingGenerator<TInput,TEmbedding> is embedding generation, which is accomplished with its GenerateAsync method.

using Microsoft.Extensions.AI;
using OllamaSharp;

IEmbeddingGenerator<string, Embedding<float>> generator =
    new OllamaApiClient(new Uri("http://localhost:11434/"), "phi3:mini");

foreach (Embedding<float> embedding in
    await generator.GenerateAsync(["What is AI?", "What is .NET?"]))
{
    Console.WriteLine(string.Join(", ", embedding.Vector.ToArray()));
}

Accelerator extension methods also exist to simplify common cases, such as generating an embedding vector from a single input.

ReadOnlyMemory<float> vector = await generator.GenerateVectorAsync("What is AI?");
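
GenerateAsync also accepts an EmbeddingGenerationOptions instance for per-request configuration. The following sketch requests a specific vector length via the Dimensions property; whether that request is honored depends on the underlying provider and model.

GeneratedEmbeddings<Embedding<float>> embeddings = await generator.GenerateAsync(
    ["What is AI?"],
    new EmbeddingGenerationOptions
    {
        // Request a specific vector length; support varies by provider and model.
        Dimensions = 384
    });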

Pipelines of functionality

As with IChatClient, IEmbeddingGenerator implementations can be layered. Microsoft.Extensions.AI provides delegating implementations of IEmbeddingGenerator for both caching and telemetry.

using Microsoft.Extensions.AI;
using Microsoft.Extensions.Caching.Distributed;
using Microsoft.Extensions.Caching.Memory;
using Microsoft.Extensions.Options;
using OllamaSharp;
using OpenTelemetry.Trace;

// Configure OpenTelemetry exporter
string sourceName = Guid.NewGuid().ToString();
TracerProvider tracerProvider = OpenTelemetry.Sdk.CreateTracerProviderBuilder()
    .AddSource(sourceName)
    .AddConsoleExporter()
    .Build();

// Explore changing the order of the intermediate "Use" calls to see
// what impact that has on what gets cached and traced.
IEmbeddingGenerator<string, Embedding<float>> generator = new EmbeddingGeneratorBuilder<string, Embedding<float>>(
        new OllamaApiClient(new Uri("http://localhost:11434/"), "phi3:mini"))
    .UseDistributedCache(
        new MemoryDistributedCache(
            Options.Create(new MemoryDistributedCacheOptions())))
    .UseOpenTelemetry(sourceName: sourceName)
    .Build();

GeneratedEmbeddings<Embedding<float>> embeddings = await generator.GenerateAsync(
[
    "What is AI?",
    "What is .NET?",
    "What is AI?"
]);

foreach (Embedding<float> embedding in embeddings)
{
    Console.WriteLine(string.Join(", ", embedding.Vector.ToArray()));
}

IEmbeddingGenerator enables building custom middleware that extends embedding generation functionality. The DelegatingEmbeddingGenerator<TInput,TEmbedding> class is an implementation of the IEmbeddingGenerator<TInput,TEmbedding> interface that serves as a base class for creating embedding generators that delegate their operations to another IEmbeddingGenerator<TInput,TEmbedding> instance. It allows chaining multiple generators in any order, passing calls through to an underlying generator. The class provides default implementations for methods such as GenerateAsync and Dispose that forward the calls to the inner generator instance, enabling flexible and modular embedding generation.

The following is an example implementation of such a delegating embedding generator that rate-limits embedding generation requests:

using Microsoft.Extensions.AI;
using System.Threading.RateLimiting;

public class RateLimitingEmbeddingGenerator(
    IEmbeddingGenerator<string, Embedding<float>> innerGenerator, RateLimiter rateLimiter)
        : DelegatingEmbeddingGenerator<string, Embedding<float>>(innerGenerator)
{
    public override async Task<GeneratedEmbeddings<Embedding<float>>> GenerateAsync(
        IEnumerable<string> values,
        EmbeddingGenerationOptions? options = null,
        CancellationToken cancellationToken = default)
    {
        using var lease = await rateLimiter.AcquireAsync(permitCount: 1, cancellationToken)
            .ConfigureAwait(false);

        if (!lease.IsAcquired)
        {
            throw new InvalidOperationException("Unable to acquire lease.");
        }

        return await base.GenerateAsync(values, options, cancellationToken);
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            rateLimiter.Dispose();
        }

        base.Dispose(disposing);
    }
}

This can then be layered around an arbitrary IEmbeddingGenerator<string, Embedding<float>> to rate limit all embedding generation operations.

using Microsoft.Extensions.AI;
using OllamaSharp;
using System.Threading.RateLimiting;

IEmbeddingGenerator<string, Embedding<float>> generator =
    new RateLimitingEmbeddingGenerator(
        new OllamaApiClient(new Uri("http://localhost:11434/"), "phi3:mini"),
        new ConcurrencyLimiter(new()
        {
            PermitLimit = 1,
            QueueLimit = int.MaxValue
        }));

foreach (Embedding<float> embedding in
    await generator.GenerateAsync(["What is AI?", "What is .NET?"]))
{
    Console.WriteLine(string.Join(", ", embedding.Vector.ToArray()));
}

In this way, the RateLimitingEmbeddingGenerator can be composed with other IEmbeddingGenerator<string, Embedding<float>> instances to provide rate-limiting functionality.
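
The custom generator can also participate in a builder pipeline alongside the built-in delegating generators. The following sketch assumes the EmbeddingGeneratorBuilder overload of Use that accepts a factory delegate, and adds the rate limiter as another layer around the same cached Ollama-based pipeline shown earlier.

using Microsoft.Extensions.AI;
using Microsoft.Extensions.Caching.Distributed;
using Microsoft.Extensions.Caching.Memory;
using Microsoft.Extensions.Options;
using OllamaSharp;
using System.Threading.RateLimiting;

IEmbeddingGenerator<string, Embedding<float>> generator =
    new EmbeddingGeneratorBuilder<string, Embedding<float>>(
        new OllamaApiClient(new Uri("http://localhost:11434/"), "phi3:mini"))
    .UseDistributedCache(
        new MemoryDistributedCache(
            Options.Create(new MemoryDistributedCacheOptions())))
    // Add the custom rate limiter as another layer in the pipeline.
    .Use(innerGenerator => new RateLimitingEmbeddingGenerator(
        innerGenerator,
        new ConcurrencyLimiter(new()
        {
            PermitLimit = 1,
            QueueLimit = int.MaxValue
        })))
    .Build();

As with the earlier pipeline example, the order of the Use calls determines how the layers nest around the underlying generator.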

Implementation examples

Most users don't need to implement the IEmbeddingGenerator interface. However, if you're a library author, then it might be helpful to look at these implementation examples.

The following code shows how the SampleEmbeddingGenerator class implements the IEmbeddingGenerator<TInput,TEmbedding> interface. It has a primary constructor that accepts an endpoint and model ID, which are used to identify the generator. It also implements the GenerateAsync(IEnumerable<TInput>, EmbeddingGenerationOptions, CancellationToken) method to generate embeddings for a collection of input values.

using Microsoft.Extensions.AI;

public sealed class SampleEmbeddingGenerator(
    Uri endpoint, string modelId)
        : IEmbeddingGenerator<string, Embedding<float>>
{
    private readonly EmbeddingGeneratorMetadata _metadata =
        new("SampleEmbeddingGenerator", endpoint, modelId);

    public async Task<GeneratedEmbeddings<Embedding<float>>> GenerateAsync(
        IEnumerable<string> values,
        EmbeddingGenerationOptions? options = null,
        CancellationToken cancellationToken = default)
    {
        // Simulate some async operation.
        await Task.Delay(100, cancellationToken);

        // Create random embeddings.
        return [.. from value in values
            select new Embedding<float>(
                Enumerable.Range(0, 384)
                .Select(_ => Random.Shared.NextSingle()).ToArray())];
    }

    public object? GetService(Type serviceType, object? serviceKey) =>
        serviceKey is not null
        ? null
        : serviceType == typeof(EmbeddingGeneratorMetadata)
            ? _metadata
            : serviceType?.IsInstanceOfType(this) is true
                ? this
                : null;

    void IDisposable.Dispose() { }
}

This sample implementation just generates random embedding vectors. For a more realistic, concrete implementation, see OpenTelemetryEmbeddingGenerator.cs.