OpenAI Transcriptions

Spring AI supports OpenAI’s Transcription model.

Prerequisites

You will need to create an API key with OpenAI to access its transcription models. Create an account on the OpenAI signup page and generate the token on the API Keys page. The Spring AI project defines a configuration property named spring.ai.openai.api-key that you should set to the value of the API key obtained from openai.com. One way to set that property is to export an environment variable such as SPRING_AI_OPENAI_API_KEY.

Auto-configuration

Spring AI provides Spring Boot auto-configuration for the OpenAI Transcription Client. To enable it, add the following dependency to your project’s Maven pom.xml file:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
</dependency>

or to your Gradle build.gradle build file.

dependencies {
    implementation 'org.springframework.ai:spring-ai-openai-spring-boot-starter'
}
Refer to the Dependency Management section to add the Spring AI BOM to your build file.

Transcription Properties

The prefix spring.ai.openai.audio.transcription is the property prefix that lets you configure the OpenAI transcription client.

Each property below is listed with its description and default value.

spring.ai.openai.audio.transcription.options.model
    Description: ID of the model to use. Only whisper-1 (which is powered by OpenAI’s open source Whisper V2 model) is currently available.
    Default: whisper-1

spring.ai.openai.audio.transcription.options.response-format
    Description: The format of the transcript output, one of: json, text, srt, verbose_json, or vtt.
    Default: json

spring.ai.openai.audio.transcription.options.prompt
    Description: An optional text to guide the model’s style or continue a previous audio segment. The prompt should match the audio language.
    Default: (none)

spring.ai.openai.audio.transcription.options.language
    Description: The language of the input audio. Supplying the input language in ISO-639-1 format will improve accuracy and latency.
    Default: (none)

spring.ai.openai.audio.transcription.options.temperature
    Description: The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model will use log probability to automatically increase the temperature until certain thresholds are hit.
    Default: 0

spring.ai.openai.audio.transcription.options.timestamp_granularities
    Description: The timestamp granularities to populate for this transcription. response_format must be set to verbose_json to use timestamp granularities. Either or both of these options are supported: word, segment. Note: There is no additional latency for segment timestamps, but generating word timestamps incurs additional latency.
    Default: segment
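
The constraints stated in the property descriptions above (allowed response formats, the temperature range, and the verbose_json requirement for timestamp granularities) can be sketched as a small check. This is only an illustration: the class TranscriptionOptionsCheck and its validate method are hypothetical helpers written for this page, not part of Spring AI or the OpenAI API, and they use plain strings rather than the library's types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Spring AI: checks the constraints that the
// property descriptions state for transcription options.
class TranscriptionOptionsCheck {

    static final Set<String> RESPONSE_FORMATS = Set.of("json", "text", "srt", "verbose_json", "vtt");
    static final Set<String> GRANULARITIES = Set.of("word", "segment");

    // Returns a list of problems; an empty list means the values are consistent.
    // Pass an empty set when no timestamp granularities were requested explicitly.
    static List<String> validate(String responseFormat, float temperature, Set<String> granularities) {
        List<String> problems = new ArrayList<>();
        if (!RESPONSE_FORMATS.contains(responseFormat)) {
            problems.add("unknown response-format: " + responseFormat);
        }
        if (temperature < 0f || temperature > 1f) {
            problems.add("temperature must be between 0 and 1");
        }
        for (String granularity : granularities) {
            if (!GRANULARITIES.contains(granularity)) {
                problems.add("unknown timestamp granularity: " + granularity);
            }
        }
        if (!granularities.isEmpty() && !"verbose_json".equals(responseFormat)) {
            problems.add("timestamp granularities require response-format verbose_json");
        }
        return problems;
    }

    public static void main(String[] args) {
        // Consistent: verbose_json with both granularities.
        System.out.println(validate("verbose_json", 0f, Set.of("word", "segment")));
        // Inconsistent: temperature out of range, granularity requested with vtt.
        System.out.println(validate("vtt", 1.5f, Set.of("word")));
    }
}
```

A check like this could run before building the options, so that a mismatch such as word timestamps with a vtt response format is caught locally rather than by the remote API.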

Runtime Options

The OpenAiAudioTranscriptionOptions class provides the options to use when making a transcription request. On start-up, the options specified by the spring.ai.openai.audio.transcription properties are used, but you can override them at runtime.

For example:

OpenAiAudioApi.TranscriptResponseFormat responseFormat = OpenAiAudioApi.TranscriptResponseFormat.VTT;

OpenAiAudioTranscriptionOptions transcriptionOptions = OpenAiAudioTranscriptionOptions.builder()
    .withLanguage("en")
    .withPrompt("Ask not this, but ask that")
    .withTemperature(0f)
    .withResponseFormat(responseFormat)
    .build();
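// audioFile is a Resource pointing at the audio to transcribe;
// openAiTranscriptionClient is the OpenAiAudioTranscriptionClient instance in use.
AudioTranscriptionPrompt transcriptionRequest = new AudioTranscriptionPrompt(audioFile, transcriptionOptions);
AudioTranscriptionResponse response = openAiTranscriptionClient.call(transcriptionRequest);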
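
One way to picture the override behavior described above is a field-by-field merge in which a value set at runtime wins and an unset value falls back to the start-up default. The sketch below illustrates that idea only. OptionsOverrideSketch is a hypothetical stand-in class, and treating null as "not set" is an assumption made for the example; it does not show how Spring AI merges options internally.

```java
// Hypothetical stand-in for transcription options; not the Spring AI class.
class OptionsOverrideSketch {

    final String language;
    final String responseFormat;
    final Float temperature;

    OptionsOverrideSketch(String language, String responseFormat, Float temperature) {
        this.language = language;
        this.responseFormat = responseFormat;
        this.temperature = temperature;
    }

    // A runtime value wins when it is set (non-null); otherwise the default is kept.
    static <T> T pick(T runtimeValue, T defaultValue) {
        return runtimeValue != null ? runtimeValue : defaultValue;
    }

    // Builds the effective options from per-request values and start-up defaults.
    static OptionsOverrideSketch merge(OptionsOverrideSketch runtime, OptionsOverrideSketch defaults) {
        return new OptionsOverrideSketch(
                pick(runtime.language, defaults.language),
                pick(runtime.responseFormat, defaults.responseFormat),
                pick(runtime.temperature, defaults.temperature));
    }

    public static void main(String[] args) {
        // Start-up defaults: no language, json output, temperature 0.
        var defaults = new OptionsOverrideSketch(null, "json", 0f);
        // Per-request options: set the language and switch to vtt, leave temperature unset.
        var runtime = new OptionsOverrideSketch("en", "vtt", null);
        var effective = merge(runtime, defaults);
        System.out.println(effective.language + " " + effective.responseFormat + " " + effective.temperature);
    }
}
```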

Manual Configuration

Add the spring-ai-openai dependency to your project’s Maven pom.xml file:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-openai</artifactId>
</dependency>

or to your Gradle build.gradle build file.

dependencies {
    implementation 'org.springframework.ai:spring-ai-openai'
}
Refer to the Dependency Management section to add the Spring AI BOM to your build file.

Next, create an OpenAiAudioTranscriptionClient:

var openAiAudioApi = new OpenAiAudioApi(System.getenv("OPENAI_API_KEY"));

var openAiAudioTranscriptionClient = new OpenAiAudioTranscriptionClient(openAiAudioApi);

var transcriptionOptions = OpenAiAudioTranscriptionOptions.builder()
    .withResponseFormat(TranscriptResponseFormat.TEXT)
    .withTemperature(0f)
    .build();

var audioFile = new FileSystemResource("/path/to/your/resource/speech/jfk.flac");

AudioTranscriptionPrompt transcriptionRequest = new AudioTranscriptionPrompt(audioFile, transcriptionOptions);
AudioTranscriptionResponse response = openAiAudioTranscriptionClient.call(transcriptionRequest);
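
The manual configuration above reads the API key with System.getenv("OPENAI_API_KEY"), which returns null when the variable is not set. A minimal sketch of a fail-fast lookup follows, assuming you would rather stop with a clear message than pass a missing key to the client. ApiKeyLookupSketch and requireApiKey are hypothetical names introduced for illustration, not Spring AI APIs.

```java
import java.util.Map;

// Hypothetical helper, not part of Spring AI: fails fast when the key is missing.
class ApiKeyLookupSketch {

    // Returns the value of the named variable, or throws if it is absent or blank.
    static String requireApiKey(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Environment variable " + name + " is not set");
        }
        return value;
    }

    public static void main(String[] args) {
        try {
            // Report only whether the key is present; never print the key itself.
            requireApiKey(System.getenv(), "OPENAI_API_KEY");
            System.out.println("OPENAI_API_KEY is set");
        } catch (IllegalStateException ex) {
            System.out.println(ex.getMessage());
        }
    }
}
```

The returned value could then be passed to the OpenAiAudioApi constructor shown above in place of the direct System.getenv call.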

Example Code