Llama Chat

Following the Bedrock recommendations, Spring AI is transitioning to using Amazon Bedrock’s Converse API for all Chat conversation implementations in Spring AI. While the existing InvokeModel API supports conversation applications, we strongly recommend adopting the Bedrock Converse API for several key benefits:

Unified Interface: Write your code once and use it with any supported Amazon Bedrock model
Model Flexibility: Seamlessly switch between different conversation models without code changes
Extended Functionality: Support for model-specific parameters through dedicated structures
Tool Support: Native integration with function calling and tool usage capabilities
Multimodal Capabilities: Built-in support for vision and other multimodal features
Future-Proof: Aligned with Amazon Bedrock’s recommended best practices

Meta’s Llama Chat is part of the Llama collection of large language models. It excels in dialogue-based applications with a parameter scale ranging from 7 billion to 70 billion. Leveraging public datasets and over 1 million human annotations, Llama Chat offers context-aware dialogues.

Trained on 2 trillion tokens from public data sources, Llama-Chat provides extensive knowledge for insightful conversations. Rigorous testing, including over 1,000 hours of red-teaming and annotation, ensures both performance and safety, making it a reliable choice for AI-driven dialogues.

The AWS Llama Model Page and Amazon Bedrock User Guide contains detailed information on how to use the AWS hosted model.

Prerequisites

Refer to the Spring AI documentation on Amazon Bedrock for setting up API access.

Add Repositories and BOM

Spring AI artifacts are published in Spring Milestone and Snapshot repositories. Refer to the Repositories section to add these repositories to your build system.

To help with dependency management, Spring AI provides a BOM (bill of materials) to ensure that a consistent version of Spring AI is used throughout the entire project. Refer to the Dependency Management section to add the Spring AI BOM to your build system.

Auto-configuration

Add the spring-ai-bedrock-ai-spring-boot-starter dependency to your project’s Maven pom.xml file:

<dependency>
  <groupId>org.springframework.ai</groupId>
  <artifactId>spring-ai-bedrock-ai-spring-boot-starter</artifactId>
</dependency>

or to your Gradle build.gradle build file.

dependencies {
    implementation 'org.springframework.ai:spring-ai-bedrock-ai-spring-boot-starter'
}

Refer to the Dependency Management section to add the Spring AI BOM to your build file.

Enable Llama Chat Support

By default the Bedrock Llama model is disabled. To enable it set the spring.ai.bedrock.llama.chat.enabled property to true. Exporting environment variable is one way to set this configuration property:

export SPRING_AI_BEDROCK_LLAMA_CHAT_ENABLED=true

Chat Properties

The prefix spring.ai.bedrock.aws is the property prefix to configure the connection to AWS Bedrock.

Property	Description	Default
spring.ai.bedrock.aws.region	AWS region to use.	us-east-1
spring.ai.bedrock.aws.timeout	AWS timeout to use.	5m
spring.ai.bedrock.aws.access-key	AWS access key.	-
spring.ai.bedrock.aws.secret-key	AWS secret key.	-

Property

Description

Default

spring.ai.bedrock.aws.region

AWS region to use.

us-east-1

spring.ai.bedrock.aws.timeout

AWS timeout to use.

spring.ai.bedrock.aws.access-key

AWS access key.

spring.ai.bedrock.aws.secret-key

AWS secret key.

The prefix spring.ai.bedrock.llama.chat is the property prefix that configures the chat model implementation for Llama.

Property	Description	Default
spring.ai.bedrock.llama.chat.enabled	Enable or disable support for Llama	false
spring.ai.bedrock.llama.chat.model	The model id to use (See Below)	meta.llama3-70b-instruct-v1:0
spring.ai.bedrock.llama.chat.options.temperature	Controls the randomness of the output. Values can range over [0.0,1.0], inclusive. A value closer to 1.0 will produce responses that are more varied, while a value closer to 0.0 will typically result in less surprising responses from the model. This value specifies default to be used by the backend while making the call to the model.	0.7
spring.ai.bedrock.llama.chat.options.top-p	The maximum cumulative probability of tokens to consider when sampling. The model uses combined Top-k and nucleus sampling. Nucleus sampling considers the smallest set of tokens whose probability sum is at least topP.	AWS Bedrock default
spring.ai.bedrock.llama.chat.options.max-gen-len	Specify the maximum number of tokens to use in the generated response. The model truncates the response once the generated text exceeds maxGenLen.	300

Property

Description

Default

spring.ai.bedrock.llama.chat.enabled

Enable or disable support for Llama

false

spring.ai.bedrock.llama.chat.model

The model id to use (See Below)

meta.llama3-70b-instruct-v1:0

spring.ai.bedrock.llama.chat.options.temperature

Controls the randomness of the output. Values can range over [0.0,1.0], inclusive. A value closer to 1.0 will produce responses that are more varied, while a value closer to 0.0 will typically result in less surprising responses from the model. This value specifies default to be used by the backend while making the call to the model.

0.7

spring.ai.bedrock.llama.chat.options.top-p

The maximum cumulative probability of tokens to consider when sampling. The model uses combined Top-k and nucleus sampling. Nucleus sampling considers the smallest set of tokens whose probability sum is at least topP.

AWS Bedrock default

spring.ai.bedrock.llama.chat.options.max-gen-len

Specify the maximum number of tokens to use in the generated response. The model truncates the response once the generated text exceeds maxGenLen.

300

Look at LlamaChatBedrockApi#LlamaChatModel for other model IDs. The other value supported is meta.llama2-13b-chat-v1. Model ID values can also be found in the AWS Bedrock documentation for base model IDs.

All properties prefixed with spring.ai.bedrock.llama.chat.options can be overridden at runtime by adding a request specific Runtime Options to the Prompt call.

Runtime Options

The BedrockLlChatOptions.java provides model configurations, such as temperature, topK, topP, etc.

On start-up, the default options can be configured with the BedrockLlamaChatModel(api, options) constructor or the spring.ai.bedrock.llama.chat.options.* properties.

At run-time you can override the default options by adding new, request specific, options to the Prompt call. For example to override the default temperature for a specific request:

ChatResponse response = chatModel.call(
    new Prompt(
        "Generate the names of 5 famous pirates.",
        BedrockLlamaChatOptions.builder()
            .temperature(0.4)
        .build()
    ));

In addition to the model specific BedrockLlamaChatOptions you can use a portable ChatOptions instance, created with the ChatOptionsBuilder#builder().

Sample Controller

Create a new Spring Boot project and add the spring-ai-bedrock-ai-spring-boot-starter to your pom (or gradle) dependencies.

Add a application.properties file, under the src/main/resources directory, to enable and configure the Anthropic chat model:

spring.ai.bedrock.aws.region=eu-central-1
spring.ai.bedrock.aws.timeout=1000ms
spring.ai.bedrock.aws.access-key=${AWS_ACCESS_KEY_ID}
spring.ai.bedrock.aws.secret-key=${AWS_SECRET_ACCESS_KEY}

spring.ai.bedrock.llama.chat.enabled=true
spring.ai.bedrock.llama.chat.options.temperature=0.8

replace the regions, access-key and secret-key with your AWS credentials.

This will create a BedrockLlamaChatModel implementation that you can inject into your class. Here is an example of a simple @Controller class that uses the chat model for text generations.

@RestController
public class ChatController {

    private final BedrockLlamaChatModel chatModel;

    @Autowired
    public ChatController(BedrockLlamaChatModel chatModel) {
        this.chatModel = chatModel;
    }

    @GetMapping("/ai/generate")
    public Map generate(@RequestParam(value = "message", defaultValue = "Tell me a joke") String message) {
        return Map.of("generation", this.chatModel.call(message));
    }

    @GetMapping("/ai/generateStream")
	public Flux<ChatResponse> generateStream(@RequestParam(value = "message", defaultValue = "Tell me a joke") String message) {
        Prompt prompt = new Prompt(new UserMessage(message));
        return this.chatModel.stream(prompt);
    }
}

Manual Configuration

The BedrockLlamaChatModel implements the ChatModel and StreamingChatModel and uses the Low-level LlamaChatBedrockApi Client to connect to the Bedrock Anthropic service.

Add the spring-ai-bedrock dependency to your project’s Maven pom.xml file:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-bedrock</artifactId>
</dependency>

or to your Gradle build.gradle build file.

dependencies {
    implementation 'org.springframework.ai:spring-ai-bedrock'
}

Refer to the Dependency Management section to add the Spring AI BOM to your build file.

Next, create an BedrockLlamaChatModel and use it for text generations:

LlamaChatBedrockApi api = new LlamaChatBedrockApi(LlamaChatModel.LLAMA2_70B_CHAT_V1.id(),
	EnvironmentVariableCredentialsProvider.create(),
	Region.US_EAST_1.id(),
	new ObjectMapper(),
	Duration.ofMillis(1000L));

BedrockLlamaChatModel chatModel = new BedrockLlamaChatModel(this.api,
    BedrockLlamaChatOptions.builder()
        .temperature(0.5)
        .maxGenLen(100)
        .topP(0.9).build());

ChatResponse response = this.chatModel.call(
    new Prompt("Generate the names of 5 famous pirates."));

// Or with streaming responses
Flux<ChatResponse> response = this.chatModel.stream(
    new Prompt("Generate the names of 5 famous pirates."));

Low-level LlamaChatBedrockApi Client

LlamaChatBedrockApi provides is lightweight Java client on top of AWS Bedrock Meta Llama 2 and Llama 2 Chat models.

Following class diagram illustrates the LlamaChatBedrockApi interface and building blocks:

The LlamaChatBedrockApi supports the meta.llama3-8b-instruct-v1:0,meta.llama3-70b-instruct-v1:0,meta.llama2-13b-chat-v1 and meta.llama2-70b-chat-v1 models for both synchronous (e.g. chatCompletion()) and streaming (e.g. chatCompletionStream()) responses.

Here is a simple snippet how to use the api programmatically:

LlamaChatBedrockApi llamaChatApi = new LlamaChatBedrockApi(
        LlamaChatModel.LLAMA3_70B_INSTRUCT_V1.id(),
        Region.US_EAST_1.id(),
        Duration.ofMillis(1000L));

LlamaChatRequest request = LlamaChatRequest.builder("Hello, my name is")
		.temperature(0.9)
		.topP(0.9)
		.maxGenLen(20)
		.build();

LlamaChatResponse response = this.llamaChatApi.chatCompletion(this.request);

// Streaming response
Flux<LlamaChatResponse> responseStream = this.llamaChatApi.chatCompletionStream(this.request);
List<LlamaChatResponse> responses = this.responseStream.collectList().block();

Follow the LlamaChatBedrockApi.java's JavaDoc for further information.