Managing Chat Memory in Quarkus Langchain4j


When I first started using Quarkus Langchain4j I ran into some issues because I didn't fully understand how Quarkus managed chat memory with Langchain4j. So, here's what I'm going to discuss:

  • How CDI bean scopes of AI Services affect chat memory
  • How chat memory is managed when @MemoryId is used as a parameter in Quarkus
  • How default chat memory id works in Quarkus
  • How chat memory can be leaked in some scenarios
  • How to write your own Default Memory Id Provider
  • How chat memory can affect your application

Chat Memory in Quarkus Langchain4j

Chat memory is a history of the conversation you've had with an LLM. When using Quarkus Langchain4j (or plain Langchain4j), this chat history is automatically sent to the LLM each time you interact with it. Think of chat memory as a chat session: a list of messages sent to and from the LLM. Each chat history is referenced by a unique ID and stored in a chat memory store. How the store keeps this memory depends on how you've built your app; it could be stored in memory or in a database, for instance.

@MemoryId

With Quarkus Langchain4j you either use the @MemoryId annotation on a parameter to identify the chat history to use, or you let Quarkus provide this identifier by default. Let’s look at @MemoryId first:

@RegisterAiService
public interface MyChat {
     @SystemMessage("You are a nice assistant")
     String chat(@UserMessage String msg, @MemoryId String id);
}

@RegisterAiService
public interface AnotherChat {
     @SystemMessage("You are a mean assistant")
     String chat(@UserMessage String msg, @MemoryId String id);
}

With @MemoryId, the application developer provides the chat memory identifier to use. The chat history is a concatenation of the messages of every AiService call that used the same memory id. For example:

@Inject
MyChat myChat;

@Inject 
AnotherChat another;

public void call() {
    String id = "1234";
    String my = myChat.chat("Hello!", id);
    String other = another.chat("Goodbye", id);
}

There are a couple of things to think about when sharing a @MemoryId between different AiServices (prompts).

Shared Chat History

With the call to another.chat(), the chat history of the previous myChat.chat() call is also included, because the same memory id is passed to both calls.

Only 1 SystemMessage per history

Another thing to note about running this code: the original SystemMessage from MyChat is removed from the chat history and a new SystemMessage from AnotherChat is added. Only one SystemMessage is allowed per history.

Self Management of ID

The application developer is responsible for creating and managing the @MemoryId. You have to ensure the id is unique (easily done with something like a UUID), otherwise different chat sessions could corrupt each other. If chatting is a series of REST calls, then you'll have to make sure the client passes this memory id along between HTTP invocations.
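
Here's a minimal sketch of what self-management could look like over REST, reusing the MyChat interface from above. The resource class and the X-Memory-Id header name are made up for illustration: the server mints a UUID for a new conversation and echoes the id back so the client can send it on the next call.

import java.util.UUID;

import jakarta.inject.Inject;
import jakarta.ws.rs.HeaderParam;
import jakarta.ws.rs.POST;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.core.Response;

@Path("/chat")
public class ChatResource {

    @Inject
    MyChat myChat;

    @POST
    public Response chat(@HeaderParam("X-Memory-Id") String memoryId, String message) {
        if (memoryId == null) {
            // Brand new conversation: mint a unique id
            memoryId = UUID.randomUUID().toString();
        }
        String answer = myChat.chat(message, memoryId);
        // Echo the id back so the client can pass it on the next request
        return Response.ok(answer).header("X-Memory-Id", memoryId).build();
    }
}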

Sometimes LLMs are sensitive to what is in chat history. In the case above, the chat history has a mix of chat messages from two different prompts. It also loses the context of MyChat on the second call, as the MyChat system message is removed. Usually not a big deal, but every once in a while you might see your LLM get confused.

Default Memory Id

If a @MemoryId is not specified, then Quarkus Langchain4j decides what the memory id is.

package com.acme;

@RegisterAiService
public interface MyChat {
    String chat(@UserMessage String msg);
}

In vanilla, standalone Langchain4j, the default memory id is "default". If you're using Langchain4j on its own, you should not use default memory ids in multi-user/multi-session applications, as chat history will be completely corrupted.
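
For reference, here's roughly how memory is wired up in standalone Langchain4j, a minimal sketch assuming an interface like MyChat above (minus the Quarkus annotations) and an existing ChatLanguageModel instance called model. Every call that doesn't supply an explicit memory id lands in the "default" history.

import dev.langchain4j.memory.chat.MessageWindowChatMemory;
import dev.langchain4j.service.AiServices;

MyChat chat = AiServices.builder(MyChat.class)
        .chatLanguageModel(model) // 'model' is an existing ChatLanguageModel (assumption)
        // one chat memory per memory id, capped at 10 messages
        .chatMemoryProvider(memoryId -> MessageWindowChatMemory.withMaxMessages(10))
        .build();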

Quarkus Langchain4j does something different. A unique id is generated per CDI request scope, the request scope being the HTTP invocation, Kafka invocation, etc. The fully qualified interface name and the method name of the AI service are tacked onto the end of this string, separated from the request id by a "#" character. In other words, the format of the default memory id is:

<random-per-request-id>#<fully qualified interface name>.<method-name>

So, for the above Java code, the default memory id for MyChat.chat would be:

@2342351#com.acme.MyChat.chat

There are a couple of things to think about with this default Quarkus implementation.

Default Memory Id is tied to the request scope

Since the default id is generated as a unique id tied to the request scope, when your HTTP invocation finishes and you next invoke an AI service, a different default memory id will be used, and thus you'll have a completely new chat history.

Different chat history per AI Service method

Since the default id incorporates the AI service interface and method name, there is a different chat history per AI service method, and unlike the example in the @MemoryId section, chat history is not shared between prompts.

Using the WebSocket extension gives you per-session chat histories

If you use the websocket integration to implement your chat, then the default id is unique per session instead of per request. This means the default memory id is retained and meaningful for the entire chat session, and you'll keep chat history between remote chat requests. The AI service interface name and method are still appended to the default memory id, though!
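
As a rough sketch of what that can look like with the WebSockets Next extension (the class name and path are illustrative), each connected client gets its own default memory id for the lifetime of the connection, so no @MemoryId parameter is needed:

import io.quarkus.websockets.next.OnTextMessage;
import io.quarkus.websockets.next.WebSocket;
import jakarta.inject.Inject;

@WebSocket(path = "/chat")
public class ChatSocket {

    @Inject
    MyChat myChat;

    @OnTextMessage
    public String onMessage(String message) {
        // The default memory id is tied to this websocket connection,
        // so history accumulates across messages in the same session
        return myChat.chat(message);
    }
}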

Default memory ids vs. @MemoryId

So what should you use, default memory ids or @MemoryId? If you have a remote chat app where user interactions happen across remote requests (i.e. HTTP/REST), then you should only use default memory ids for prompts that don't want or need a complete chat history. In other words, only use default ids if the prompt doesn't need chat memory. If you need chat history between remote requests, then you'll need to use @MemoryId and manage the ids yourself.

The WebSocket extension flips this. Since the default memory id is generated per websocket connection, you get a real session, and default memory ids are wonderful because you don't have to manage memory ids in your application.

Memory Lifecycle tied to CDI bean scope

AI services in Quarkus Langchain4j are CDI beans. If you do not specify a scope for the bean, it defaults to @RequestScoped. When a bean goes out of scope and is destroyed, an interesting thing happens: any memory id referenced by the bean is wiped from the chat memory store and is gone forever. ANY memory id: the default memory id or any id provided by @MemoryId parameters.

@RegisterAiService
@ApplicationScoped
public interface AppChat {
      String chat(@UserMessage String msg, @MemoryId String id);
}

@RegisterAiService
@SessionScoped
public interface SessionChat {
      String chat(@UserMessage String msg, @MemoryId String id);
}

@RegisterAiService
@RequestScoped
public interface RequestChat {
      String chat(@UserMessage String msg, @MemoryId String id);
}

So, for the above code, any memory referenced by the id parameter of RequestChat.chat() will be wiped at the end of the request scope (i.e. the HTTP request). For SessionChat, memory is wiped when the CDI session is destroyed, and for AppChat, when the application shuts down.

Memory tied to the smallest scope used

So, what if, within the same REST invocation, you use the same memory id with the AI services above?

@Inject AppChat app;
@Inject RequestChat req;

@GET
public String restCall() {
     String memoryId = "1234";
     app.chat("hello", memoryId);
     return req.chat("goodbye", memoryId);
}

So, in the restCall() method, even though AppChat is application scoped, since RequestChat uses the same memory id, "1234", the chat history will be wiped from the chat memory store at the end of the REST request.

Default memory id can cause a leak

If you are relying on default memory ids and your AI service has a scope other than @RequestScoped, then you will leak chat memory and it will grow to the limits of the memory store. For example:

@ApplicationScoped
@RegisterAiService
public interface AppChat {
     String chat(@UserMessage String msg);
}

Because Quarkus generates a new default memory id per request scope, every call to AppChat.chat() from a different request creates a new chat history, but the application-scoped bean that references those ids is never destroyed. Chat memory entries in the chat memory store will therefore grow until the application shuts down.

Never use @ApplicationScoped with default ids

So, the moral of the story is: never use @ApplicationScoped with your AI services if you're relying on default ids. If you are using the websocket extension, then you can use @SessionScoped, but otherwise make sure your AI services are @RequestScoped.

What bean scopes should you use?

For REST-based chat applications:

  • Use the combination of @ApplicationScoped and @MemoryId parameters to provide chat history between requests (see the sketch after this list)
  • Use @RequestScoped and default memory ids for prompts that don’t need a chat history
  • Do not share the same memory ids between @ApplicationScoped and @RequestScoped ai services
  • If using the Websocket extension, then use @SessionScoped on your ai services that require a chat history.
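
Putting the first two bullets together, a REST setup might look like this (SupportChat and OneShotPrompt are made-up names):

@RegisterAiService
@ApplicationScoped
public interface SupportChat {
      // Application scoped + caller-supplied id: history survives across requests
      String chat(@UserMessage String msg, @MemoryId String id);
}

@RegisterAiService
@RequestScoped
public interface OneShotPrompt {
      // No @MemoryId: the request-scoped default id is fine because this
      // prompt never needs history, and its memory is wiped after the request
      String summarize(@UserMessage String text);
}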

Chat Memory and your LLM

So, hopefully you now understand how chat memory works with Quarkus Langchain4j. Just remember:

  • Chat history is sent to your LLM with each request.
  • Limiting chat history can speed up LLM interactions and cost you less money!
  • Limiting chat history can focus your LLM.

All discussions for another blog! Cheers.

Workflow for releasing on Nexus and Graal binaries on GitHub


It took me a while to figure this out so I thought I’d share it.

I have a Java Quarkiverse project that releases a Quarkus extension to Nexus using the Maven release plugin and afterwards builds CLI binaries with Graal for Windows, macOS, and Linux. The binaries are uploaded to a GitHub release of the tagged release.

https://github.com/quarkiverse/quarkus-playpen/blob/0.9.1/.github/workflows/release.yml

https://github.com/quarkiverse/quarkus-playpen/releases/tag/0.9.1

Azure Functions + Quarkus HTTP Native/Graal


You can take an existing Quarkus HTTP application and deploy it as a native executable (Graal build) to Azure Functions using the Custom Handler runtime provided with Azure Functions.

Prerequisites: you'll need the Azure CLI (az) and the Azure Functions Core Tools (func) installed; both are used below.

You’ll need to configure a root path in application.properties:

quarkus.http.root-path=/api/{functionName}

Replace {functionName} with whatever you want to call the function you’re creating for the Quarkus HTTP application.

Next, rebuild your application to create a native executable targeting the 64-bit x86 Linux runtime.

$ mvn package -Pnative -DskipTests=true \
         -Dquarkus.native.container-build=true \
         -Dquarkus.native.builder-image=quay.io/quarkus/ubi-quarkus-mandrel-builder-image:jdk-17


After you do this, create a directory in the root of your project and generate the function deployment descriptors. Specify anything you want for the name of the function:

$ mkdir my-app
$ cd my-app
$ func new --language Custom --template HttpTrigger  \
                   --name my-func --authlevel anonymous

The func command will generate a custom handler host.json file and an http trigger function.json for you for the function named my-func.

Next copy your application’s native executable to your app directory:

$ cp ../target/code-with-quarkus-1.0.0-SNAPSHOT-runner app

Next you’ll need to edit my-func/function.json. Add a route that matches to all paths.

{
  "bindings": [
    {
      "authLevel": "Anonymous",
      "type": "httpTrigger",
      "direction": "in",
      "name": "req",
      "methods": [
        "get",
        "post"
      ],
      "route": "{*path}"
    },
    {
      "type": "http",
      "direction": "out",
      "name": "res"
    }
  ]
}

Without specifying a route, you will not be able to get requests to rest endpoints defined in your application.

Next you need to edit your host.json file to specify some configuration. You'll need to enable HTTP forwarding (enableForwardingHttpRequest), specify the executable name of your application (defaultExecutablePath), and define the HTTP port for Quarkus to bind to by passing a system property to the application as an argument (arguments). Here's what it should look like in the end:

{
  "version": "2.0",
  "logging": {
    "applicationInsights": {
      "samplingSettings": {
        "isEnabled": true,
        "excludedTypes": "Request"
      }
    }
  },
  "extensionBundle": {
    "id": "Microsoft.Azure.Functions.ExtensionBundle",
    "version": "[3.*, 4.0.0)"
  },
  "customHandler": {
    "enableForwardingHttpRequest": true,
    "description": {
    "defaultExecutablePath": "app",
    "workingDirectory": "",
	"arguments": [
       "-Dquarkus.http.port=${FUNCTIONS_CUSTOMHANDLER_PORT:8080}"
      ]
    }
  }
}

Test Locally

To test locally, use the func start command. Make sure you are in the my-app directory you created earlier!

$ func start


# Azure Functions Core Tools
# Core Tools Version:       4.0.5198 Commit hash: N/A  (64-bit)
# Function Runtime Version: 4.21.1.20667
#
#
# Functions:
#
#	my-func: [GET,POST] http://localhost:7071/api/my-func/{*path}
#
# For detailed output, run func with --verbose flag.
# [2023-08-10T19:08:59.190Z] __  ____  __  _____   ___  __ ____  ______ 
# [2023-08-10T19:08:59.191Z]  --/ __ \/ / / / _ | / _ \/ //_/ / / / __/ 
# [2023-08-10T19:08:59.191Z]  -/ /_/ / /_/ / __ |/ , _/ ,< / /_/ /\ \   
# [2023-08-10T19:08:59.191Z] --\___\_\____/_/ |_/_/|_/_/|_|\____/___/   
# [2023-08-10T19:08:59.191Z] 2023-08-10 15:08:59,187 INFO  [io.quarkus] (main) code-with-quarkus-1.0.0-SNAPSHOT native (powered by Quarkus 999-SNAPSHOT) started in 0.043s. Listening on: http://0.0.0.0:33917
# [2023-08-10T19:08:59.191Z] 2023-08-10 15:08:59,188 INFO  [io.quarkus] (main) Profile prod activated. 
# [2023-08-10T19:08:59.191Z] 2023-08-10 15:08:59,188 INFO  [io.quarkus] (main) Installed features: [cdi, resteasy, smallrye-context-propagation, vertx]
# [2023-08-10T19:08:59.209Z] Worker process started and initialized.
# [2023-08-10T19:09:04.023Z] Host lock lease acquired by instance ID '000000000000000000000000747C0C10'.

To test go to http://localhost:7071/api/my-func/hello. The hello part of the path can be replaced with your REST API.
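
For that test URL to work, your application needs a matching endpoint. A minimal sketch (the class name and greeting text are arbitrary):

import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;

@Path("/hello")
public class GreetingResource {

    @GET
    public String hello() {
        // Served at /api/{functionName}/hello because of the root-path setting above
        return "Hello from Quarkus on Azure Functions";
    }
}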

Deployment

To deploy, you'll need to create an Azure Resource Group and a Function Application.

# login
$ az login

# list subscriptions
$ az account list -o table

# set active subscription.  You do not have to do this if you only have one subscription
$ az account set --subscription <SUBSCRIPTION_ID>

# create an Azure Resource Group 
az group create -n rg-quarkus-functions \
  -l eastus

# create an Azure Storage Account (required for Azure Functions App)
az storage account create -n sargquarkusfunctions2023 \
  -g rg-quarkus-functions \
  -l eastus

# create an Azure Functions App
az functionapp create -n my-app-quarkus \
  -g rg-quarkus-functions \
  --consumption-plan-location eastus \
  --os-type Linux \
  --runtime custom \
  --functions-version 4 \
  --storage-account sargquarkusfunctions2023 

Make sure that the os-type is Linux!!! Note that your Function Application name must be unique and may collide with others.

Now you can deploy your application. Again, make sure you are in the my-app directory!

$ func azure functionapp publish my-app-quarkus

# Getting site publishing info...
# Uploading package...
# Uploading 15.32 MB 
# Upload completed successfully.
# Deployment completed successfully.
# Syncing triggers...
# Functions in my-app-quarkus:
#    my-func - [httpTrigger]
#         Invoke url: https://my-app-quarkus.azurewebsites.net/api/{*path}

A successful deployment will tell you how to access the app. Just remember the root context will always be /api/{functionName}.

QSON: New Java JSON parser for Quarkus


Quarkus has a new JSON parser and object mapper called QSON. It does bytecode generation for the Java classes you want to map to and from JSON, around a small core library. I'm not going to get into details on how to use it; just visit the GitHub page for more information.

I started this project because I noticed a huge startup time for Jackson relative to the other components within Quarkus applications. IIRC it was taking about 20% of the boot time for a simple JAX-RS microservice. So the initial prototype was to see how much I could improve boot time, and I was pleasantly surprised that the parser I implemented was a bit better than Jackson at runtime too!

The end result was that boot time improved about 20% for a simple Quarkus JAX-RS microservice. Runtime performance is also better in most instances. Here are the numbers from a JMH benchmark I did:

Benchmark                           Mode  Cnt       Score   Error  Units
MyBenchmark.testParserAfterburner  thrpt    2  223630.276          ops/s
MyBenchmark.testParserJackson      thrpt    2  218748.065          ops/s
MyBenchmark.testParserQson         thrpt    2  251086.874          ops/s
MyBenchmark.testWriterAfterburner  thrpt    2  189243.175          ops/s
MyBenchmark.testWriterJackson      thrpt    2  168637.541          ops/s
MyBenchmark.testWriterQson         thrpt    2  177855.879          ops/s

These are runtime throughput numbers, so higher is better. Qson is better than regular Jackson and Jackson+Afterburner for JSON-to-object mapping (reading/parsing). For output, Qson is better than regular Jackson, but a little behind Afterburner.

There's still some work to do on Qson. One of the big things I need is Maven and Gradle plugins to handle bytecode generation so that Qson can be used outside of Quarkus. We'll also be adding more features to Qson, like custom mappings. One thing to note, though: I won't add features that hurt performance, increase memory footprint, or hurt boot time.

Over time, we'll be integrating Qson as an option for any Quarkus extension that needs JSON object mapping. So far, I've done the integration with JAX-RS (Resteasy). Funqy is a prime candidate next.

Quarkus Funqy: Portable Function API


Quarkus Funqy is a new FaaS API that is portable across cloud runtimes like AWS Lambda, Azure Functions, Knative Events, and Google Cloud Functions. Or, if you’re deploying to a traditional environment, Funqy functions can work standalone as well.

public class MyClass {
   @Funq
   public Greeting greet(String name) {
     ...
   }
}

The idea of Funqy is simple. You write one Java method that takes one optional input parameter and returns optional output. Both primitives and POJOs are supported as input and output types. Under the covers, Funqy integrates with whatever plumbing is needed for your deployment environment. Funqy classes are Quarkus components and can accept injections using Spring DI or CDI annotations (through Arc).
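
For instance, injection in a Funqy class could look like the sketch below, where GreetingService and its greet() method are hypothetical CDI-managed helpers and Greeting is the POJO from the example above:

import javax.inject.Inject;

import io.quarkus.funqy.Funq;

public class GreetingFunction {

    @Inject
    GreetingService service; // GreetingService is a hypothetical CDI bean

    @Funq
    public Greeting greet(String name) {
        // Delegate to the injected service
        return service.greet(name);
    }
}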

Funqy Bindings

Funqy can be deployed in a number of environments.

Motivations for Funqy

Why did we create Funqy? Part of our Quarkus Serverless Strategy was to make popular REST frameworks available for use in environments like AWS Lambda. When the Quarkus team was asked to integrate with Cloud Events, we felt like traditional REST frameworks didn’t quite fit even though Cloud Events has an HTTP binding. Funqy gave us an opportunity to not only unify under one API for FaaS development, but to greatly simplify the development API and to create a framework that was written for and optimized for the Quarkus platform.

REST vs Funqy

The author of this blog loves JAX-RS, was the founder of Resteasy, and even wrote a book on JAX-RS. REST is still the preferred architecture, and REST over HTTP is still a ubiquitous way of writing service APIs. The thing is, though, if you go out into the wild you'll find that many application developers don't follow REST principles. HTTP and REST frameworks are pretty much used as an RPC mechanism. Cool features in HTTP like cache-control and content negotiation are rarely used, and JSON is the de facto representation exchanged between client and server.

If all this is true, you don't need 80% of the features that a spec like JAX-RS provides, nor do you want the overhead of supporting those unused features in your runtime. Since Funqy is a small, tightly constrained API, all the overhead of supporting unused REST features is ripped out. If you look at the implementation, it's a very thin integration layer over the ridiculously fast Vert.x Web runtime. Each Funqy binding is a handful of classes. Funqy's overhead is purely marshalling.

Who knows what the future holds for Funqy. It’s part of a quest to reduce the complexity and overhead of Java development as well as provide something that is portable to many environments so that you aren’t locked into a specific cloud vendor API. Enjoy everybody!

Quarkus Serverless Strategy


What is Serverless?

Serverless architectures allow us to scale our services from zero instances to many based on request and event traffic.  The advantages are clear.  If our services are idle most of the day, why should we have to pay a cloud provider for a full day or waste scarce resources in our company’s private cloud?  Why should we have to plan for peak load when our architecture can scale up for this peak load automatically based on volume of incoming traffic?  Serverless solves these types of problems.

Function as a Service (FaaS) is also part of the Serverless paradigm and focuses on exposing one remote function per deployment. It is a more fine-grained approach than microservices, the idea being that you can be more agile at getting functionality to market with even smaller deployments. AWS Lambda and Azure Functions are examples of FaaS implementations. FaaS frameworks like AWS Lambda and Azure Functions not only bring autoscaling to your services, but they've also made it much easier to deploy your code to the cloud. In a Lambda or Azure environment, developers don't worry about the container anymore and can just focus on pushing their code. FaaS environments have started to take the "Ops" out of "DevOps".

Java’s Disadvantages

Unless you're focusing solely on batch processing, one of the disadvantages of a Serverless architecture is instance spin-up time, in other words, cold-start latency. If you need milliseconds to respond to a client request, and your service spin-up is measured in seconds, then you have a problem.

Java frameworks like Spring, Hibernate, MicroProfile, Java EE, and others have traditionally been slow to boot; even microservices written with these technologies take seconds to start up. This is because most of these frameworks do all of their configuration and metadata processing at boot time. Spring and Hibernate scan classes for annotations. Hibernate additionally builds SQL queries. They do the same exact pre-processing every single time they are spun up.

Java also has a huge memory problem. If FaaS is the way to go and you have many more fine-grained deployments, then Java-based deployments are going to take up a huge amount of memory. Some cloud environments also charge based on the memory used, compounding the issue.

Quarkus Perfect Match for Serverless

Quarkus's core values are to drastically reduce memory footprint and boot time for Java-based applications and services. These are two of the biggest concerns when dealing with Serverless architectures. Quarkus has moved most of the pre-processing that frameworks like Spring and Hibernate do from boot time to build time. This approach has drastically reduced service spin-up time and memory footprint. Quarkus has also smoothed out the rough edges with Graal so that you can compile your Java microservices into native executables, which provide even faster boot times and a smaller memory footprint than running on the JVM.

Quarkus Serverless Strategy

The Quarkus team is tackling Serverless in a variety of ways:

  • Enhance existing Serverless Java stacks out of the box
  • Bring the Java ecosystem to existing Serverless Java stacks
  • Provide portability between Serverless stacks through traditional, mature, existing Java APIs
  • Provide a new Java function API (Funqy) that is portable across Serverless providers

Quarkus Enhances Lambda

By modifying your pom and adding a few Quarkus AWS Lambda integration dependencies like the Quarkus maven plugin, you can compile your AWS Lambda Java projects into a native binary that the AWS Lambda Custom Runtime can consume.  Watch your cold-start latency and memory footprint drop dramatically.  Try it out yourself.

The idea here is to bring Graal support to AWS Lambda through Quarkus in a seamless way. We have smoothed out the rough edges Graal introduces for a variety of AWS SDKs.

Pull in Java Ecosystem

Another part of the Quarkus Serverless strategy is to pull the Java ecosystem into existing Serverless stacks. Through Quarkus, your AWS Lambda classes can inject service components via Spring DI or CDI. You're not stuck with whatever the AWS SDK provides and can use the mature Java frameworks you've been using for years.

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import org.springframework.beans.factory.annotation.Autowired;

public class GreetingLambda implements RequestHandler<String, String> {

    @Autowired
    GreetingService service;

    @Override
    public String handleRequest(String name, Context context) {
        return service.greeting(name);
    }
}

Avoid Vendor Lock-in

Let's face it: if you use AWS, Azure, or any other cloud provider's SDKs, then you are locked into that platform. If AWS jacks up its prices down the road, you're going to have a tough time moving off the platform. Quarkus helps alleviate this issue by providing integration between REST and HTTP frameworks like JAX-RS, Spring MVC, Servlet, and Vert.x Web and serverless runtimes like AWS Lambda and Azure Functions. Let REST and HTTP be your portable architecture between cloud providers and avoid vendor lock-in by using the REST frameworks you've been using for years. Try it out with AWS Lambda or Azure Functions.

One great thing about using our JAX-RS or Spring MVC support with AWS Lambda or Azure Functions is that you’re not stuck with one REST endpoint per deployment.  You can deploy existing microservices as one Lambda deployment if you desire.  This alleviates some of the management issues that an explosion of function deployments might create down the road as you can aggregate as many endpoints as you want into one Lambda deployment.

Funqy Cross Platform Functions

The final piece of our Quarkus Serverless Strategy is a new cross-platform function API called Funqy. Quarkus Funqy is a simple API that allows you to write functions usable in a variety of FaaS environments: AWS Lambda, Azure Functions, Knative Events, and more.

public class MyClass {
   @Funq
   public MyOutput myFunction(MyInput input) {
     ...
   }
}

Funqy is still in development.  We’ll have a follow up blog as soon as it is ready to release.

More to come

Quarkus will continue to revise and expand our Serverless Strategy.  Come try out our integrations and new APIs.

Quarkus unifies reactive and imperative for REST


The latest release of Quarkus has unified Vert.x Web, Servlet (Undertow), and JAX-RS (Resteasy) under one I/O abstraction.  Specifically, Servlet and JAX-RS were written on top of Vert.x.

What this means for you is that if you use Vert.x, Servlet, and/or JAX-RS in one application, they will all share the same I/O and worker thread pools. Scarce resources are now reused. Because everything is unified under Vert.x, there are a lot of future optimizations and features we can bring to Resteasy and the JAX-RS coding model. More info on this coming soon!
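
For instance, a raw Vert.x route and a JAX-RS resource can live in the same application and be served from those shared pools. A minimal sketch (class names are arbitrary, and both classes are shown together for brevity):

import javax.enterprise.event.Observes;
import javax.ws.rs.GET;
import javax.ws.rs.Path;

import io.vertx.ext.web.Router;

public class MyRoutes {

    // Quarkus fires a CDI event with the Vert.x Web Router at startup;
    // observing it lets you register raw Vert.x routes
    public void init(@Observes Router router) {
        router.get("/vertx-hello").handler(rc -> rc.response().end("hello from Vert.x"));
    }
}

// A plain JAX-RS resource, handled by Resteasy on the same Vert.x pools
@Path("/jaxrs-hello")
class HelloResource {

    @GET
    public String hello() {
        return "hello from JAX-RS";
    }
}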