Managing Chat Memory in Quarkus Langchain4j

When I first started using Quarkus Langchain4j I ran into some issues because I didn't fully understand how Quarkus managed chat memory with Langchain4j. So, here's what I'm going to discuss:

  • How CDI bean scopes of AI Services affect chat memory
  • How chat memory is managed when @MemoryId is used as a parameter in Quarkus
  • How default chat memory id works in Quarkus
  • How chat memory can be leaked in some scenarios
  • How to write your own Default Memory Id Provider
  • How chat memory can affect your application

Chat Memory in Quarkus Langchain4j

Chat memory is the history of the conversation you've had with an LLM. When using Quarkus Langchain4j (or plain Langchain4j), this chat history is automatically sent to the LLM every time you interact with it. Think of chat memory as a chat session. Chat memory is a list of messages sent to and from the LLM. Each chat history is referenced by a unique ID and kept in a chat memory store. How the store persists memory depends on how you've built your app: it could live in memory or in a database, for instance.
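
To make the chat memory store idea concrete, here is a minimal sketch of a custom store. It assumes the Langchain4j ChatMemoryStore interface and that Quarkus Langchain4j will use a CDI bean of that type as the store (check the extension docs for your version); the class name and map-backed storage are just illustration, and you'd swap the map for JDBC or Redis calls to persist histories:

import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import dev.langchain4j.data.message.ChatMessage;
import dev.langchain4j.store.memory.chat.ChatMemoryStore;
import jakarta.enterprise.context.ApplicationScoped;

@ApplicationScoped
public class MapChatMemoryStore implements ChatMemoryStore {

    // One chat history (list of messages) per memory id
    private final Map<Object, List<ChatMessage>> store = new ConcurrentHashMap<>();

    @Override
    public List<ChatMessage> getMessages(Object memoryId) {
        return store.getOrDefault(memoryId, List.of());
    }

    @Override
    public void updateMessages(Object memoryId, List<ChatMessage> messages) {
        store.put(memoryId, messages);
    }

    @Override
    public void deleteMessages(Object memoryId) {
        store.remove(memoryId);
    }
}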

@MemoryId

With Quarkus Langchain4j you either use the @MemoryId annotation on a parameter to identify the chat history to use, or you let Quarkus provide this identifier by default. Let’s look at @MemoryId first:

@RegisterAiService
public interface MyChat {
     @SystemMessage("You are a nice assistant")
     String chat(@UserMessage String msg, @MemoryId String id);
}

@RegisterAiService
public interface AnotherChat {
     @SystemMessage("You are a mean assistant")
     String chat(@UserMessage String msg, @MemoryId String id);
}

With @MemoryId, the application developer provides the chat memory identifier to use. The chat history is shared with any other AiService that uses the same memory id. For example:

@Inject
MyChat myChat;

@Inject 
AnotherChat another;

public void call() {
    String id = "1234";
    String first = myChat.chat("Hello!", id);
    String second = another.chat("Goodbye", id);
}

There are a couple of things to think about when sharing a @MemoryId between different AiServices (prompts).

Shared Chat History

With the call to another.chat(), the chat history of the previous myChat.chat() call is also included, because the same memory id is passed to both invocations.

Only 1 SystemMessage per history

Another thing to note about running this code: the original SystemMessage from MyChat is removed from the chat history and the SystemMessage from AnotherChat is added in its place. Only one SystemMessage is allowed per history.

Self Management of ID

The application developer is responsible for creating and managing the @MemoryId. You have to ensure the id is unique (easily done with something like a UUID), otherwise different chat sessions could corrupt one another. If chatting is a series of REST calls, then you'll have to make sure the client passes this memory id along between HTTP invocations.
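
Here's a sketch of one way to handle that id plumbing in a REST resource. The ChatResource class and the X-Memory-Id header name are made up for the example: the server mints a UUID on the first call and echoes it back in a response header so the client can replay it on subsequent calls. (As discussed later in this post, MyChat must not be @RequestScoped here or the history is wiped at the end of each request.)

import java.util.UUID;

import jakarta.inject.Inject;
import jakarta.ws.rs.HeaderParam;
import jakarta.ws.rs.POST;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.core.Response;

@Path("/chat")
public class ChatResource {

    @Inject
    MyChat myChat;

    @POST
    public Response chat(@HeaderParam("X-Memory-Id") String memoryId, String message) {
        if (memoryId == null) {
            // First call: start a new chat session with a fresh, unique id
            memoryId = UUID.randomUUID().toString();
        }
        String reply = myChat.chat(message, memoryId);
        // Echo the id back so the client can pass it on the next request
        return Response.ok(reply).header("X-Memory-Id", memoryId).build();
    }
}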

Sometimes LLMs are sensitive to what is in chat history. In the case above, the chat history has a mix of chat messages from two different prompts. It also loses the context of MyChat on the second call, as the MyChat system message has been removed. Usually not a big deal, but every once in a while you might see your LLM get confused.

Default Memory Id

If a @MemoryId is not specified, then Quarkus Langchain4j decides what the memory id is.

package com.acme;

@RegisterAiService
public interface MyChat {
    String chat(@UserMessage String msg);
}

In vanilla, standalone Langchain4j, the default memory id is "default". If you're using Langchain4j on its own, you should not use default memory ids in multi-user/multi-session applications, as chat history will be completely corrupted.

Quarkus Langchain4j does something different. A unique id is generated per CDI request scope (the request scope being the HTTP invocation, Kafka invocation, etc.). The fully qualified interface name and method name of the AI service are then tacked onto the end of this id, separated from it by a "#" character. In other words, the format of the default memory id is:

<random-per-request-id>#<fully-qualified-interface-name>.<method-name>

So, for the above Java code, the default memory id for MyChat.chat would look something like:

@2342351#com.acme.MyChat.chat

There are a couple of things to think about with this default Quarkus implementation.

Default Memory Id is tied to the request scope

Since the default id is a unique id tied to the request scope, when your HTTP invocation finishes and you later invoke an AI service again, a different default memory id will be used, and thus you'll have a completely new chat history.

Different chat history per AI Service method

Since the default id incorporates the AI service interface and method name, there is a different chat history per AI service method; unlike the example in the @MemoryId section, chat history is not shared between prompts.

Using the Websocket extension gives you per-session chat histories

If you use the websocket integration to implement your chat, the default id is instead unique per websocket session rather than per request. This means the default memory id is retained and meaningful for the entire chat session, and you'll keep chat history in between remote chat requests. The AI service interface and method name are still appended to the default memory id though!
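
As a rough sketch of what that looks like, assuming the quarkus-websockets-next extension and the MyChat service from the default-memory-id example above (the one without a @MemoryId parameter); the ChatWebSocket class name is made up:

import io.quarkus.websockets.next.OnTextMessage;
import io.quarkus.websockets.next.WebSocket;
import jakarta.inject.Inject;

@WebSocket(path = "/chat")
public class ChatWebSocket {

    @Inject
    MyChat myChat;

    @OnTextMessage
    public String onMessage(String message) {
        // The default memory id lives as long as this websocket connection,
        // so every message here sees the history of the whole session.
        return myChat.chat(message);
    }
}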

Default memory ids vs. using @MemoryId

So what should you use, default memory ids or @MemoryId? If you have a remote chat app where user interactions span multiple remote requests (i.e. HTTP/REST), then you should only use default memory ids for prompts that don't want or need a complete chat history. In other words, only use default ids if the prompt doesn't need chat memory. If you need chat history to survive between remote requests, then you'll need to use @MemoryId and manage the ids yourself.

The Websocket extension flips this. Since the default memory id is generated per websocket connection, you get a real session, and default memory ids are wonderful: you don't have to manage memory ids in your application at all.

Memory Lifecycle tied to CDI bean scope

AI services in Quarkus Langchain4j are CDI beans. If you do not specify a scope for the bean, it defaults to @RequestScoped. When a bean goes out of scope and is destroyed, an interesting thing happens: any memory id referenced by the bean is wiped from the chat memory store and is gone forever. ANY memory id: the default memory id or any id provided through a @MemoryId parameter.

@RegisterAiService
@ApplicationScoped
public interface AppChat {
      String chat(@UserMessage String msg, @MemoryId String id);
}

@RegisterAiService
@SessionScoped
public interface SessionChat {
      String chat(@UserMessage String msg, @MemoryId String id);
}

@RegisterAiService
@RequestScoped
public interface RequestChat {
     String chat(@UserMessage String msg, @MemoryId String id);
}

So, for the above code, any memory referenced by the id parameter of RequestChat.chat() will be wiped at the end of the request scope (i.e. the HTTP request); for SessionChat, when the CDI session is destroyed; and for AppChat, when the application shuts down.

Memory tied to the smallest scope used

So, what if, within the same REST invocation, you use the same memory id with AI services of different scopes?

@Inject AppChat app;
@Inject RequestChat req;

@GET
public String restCall() {
     String memoryId = "1234";
     app.chat("hello", memoryId);
     return req.chat("goodbye", memoryId);
}

So, in the restCall() method, even though AppChat is application scoped, since RequestChat uses the same memory id, "1234", the chat history will be wiped from the chat memory store at the end of the REST request.

Default memory id can cause a leak

If you are relying on default memory ids and your AI service has a scope other than @RequestScoped, then you will leak chat memory, and it will grow to the limits of the memory store. For example:

@ApplicationScoped
@RegisterAiService
public interface AppChat {
     String chat(@UserMessage String msg);
}

A new default memory id is generated each and every time AppChat.chat() is called within a different request scope, but because the bean is application scoped, the memory is never cleaned up. Chat memory entries in the chat memory store will grow until the application shuts down.

Never use @ApplicationScoped with default ids

So, the moral of the story is: never use @ApplicationScoped with your AI services if you're relying on default ids. If you are using the websocket extension, then you can use @SessionScoped, but otherwise make sure your AI services are @RequestScoped.

What bean scopes should you use?

For REST-based chat applications:

  • Use the combination of @ApplicationScoped and @MemoryId parameters to provide a chat history in between requests (see the sketch after this list)
  • Use @RequestScoped and default memory ids for prompts that don’t need a chat history
  • Do not share the same memory ids between @ApplicationScoped and @RequestScoped ai services
  • If using the Websocket extension, then use @SessionScoped on your ai services that require a chat history.
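
To make the first two bullets concrete, here's a sketch of the two patterns side by side (SupportChat and Summarizer are hypothetical names):

// Chat history must survive across requests: application scoped,
// with a memory id managed by the application.
@RegisterAiService
@ApplicationScoped
public interface SupportChat {
    String chat(@UserMessage String msg, @MemoryId String id);
}

// One-shot prompt with no need for history: request scoped,
// default memory id, wiped at the end of the request.
@RegisterAiService
@RequestScoped
public interface Summarizer {
    String summarize(@UserMessage String text);
}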

Chat Memory and your LLM

So, hopefully you understand how chat memory works with Quarkus Langchain4j now. Just remember:

  • Chat history is sent to your LLM with each request.
  • Limiting chat history can speed up LLM interactions and cost you less money!
  • Limiting chat history can focus your LLM.

All discussions for another blog! Cheers.

Workflow for releasing on Nexus and Graal binaries on GitHub

It took me a while to figure this out so I thought I’d share it.

I have a Java Quarkiverse project that releases a Quarkus extension to Nexus using the Maven release plugin and then builds CLI binaries with Graal for Windows, macOS, and Linux. The binaries are uploaded to a GitHub release for the tagged release.

https://github.com/quarkiverse/quarkus-playpen/blob/0.9.1/.github/workflows/release.yml

https://github.com/quarkiverse/quarkus-playpen/releases/tag/0.9.1

QSON: New Java JSON parser for Quarkus

Quarkus has a new JSON parser and object mapper called QSON. It generates bytecode for the Java classes you want to map to and from JSON around a small core library. I'm not going to get into details on how to use it here; just visit the GitHub page for more information.

I started this project because I noticed a huge startup time for Jackson relative to the other components within Quarkus applications. IIRC it was taking about 20% of the boot time for a simple JAX-RS microservice. So the initial prototype was to see how much I could improve boot time, and I was pleasantly surprised that the parser I implemented was a bit better than Jackson at runtime too!

The end result was that boot time improved about 20% for a simple Quarkus JAX-RS microservice. Runtime performance is also better in most instances. Here are the numbers from a JMH benchmark I did:

Benchmark                           Mode  Cnt       Score   Error  Units
MyBenchmark.testParserAfterburner  thrpt    2  223630.276          ops/s
MyBenchmark.testParserJackson      thrpt    2  218748.065          ops/s
MyBenchmark.testParserQson         thrpt    2  251086.874          ops/s
MyBenchmark.testWriterAfterburner  thrpt    2  189243.175          ops/s
MyBenchmark.testWriterJackson      thrpt    2  168637.541          ops/s
MyBenchmark.testWriterQson         thrpt    2  177855.879          ops/s

These are runtime throughput numbers, so higher is better. Qson is better than regular Jackson and Jackson+Afterburner for JSON-to-object mapping (reading/parsing). For output, Qson is better than regular Jackson, but a little behind Afterburner.

There's still some work to do for Qson. One of the big things I need is a Maven and a Gradle plugin to handle bytecode generation so that Qson can be used outside of Quarkus. We'll also be adding more features to Qson, like custom mappings. One thing to note though: I won't add features that hurt performance, increase memory footprint, or hurt boot time.

Over time, we'll be integrating Qson as an option for any Quarkus extension that needs JSON object mapping. So far, I've done the integration with JAX-RS (Resteasy). Funqy is a prime candidate next.

Quarkus Serverless Strategy

What is Serverless?

Serverless architectures allow us to scale our services from zero instances to many based on request and event traffic.  The advantages are clear.  If our services are idle most of the day, why should we have to pay a cloud provider for a full day or waste scarce resources in our company’s private cloud?  Why should we have to plan for peak load when our architecture can scale up for this peak load automatically based on volume of incoming traffic?  Serverless solves these types of problems.

Function as a Service (FaaS) is also part of the Serverless paradigm and focuses on exposing one remote function per deployment. It is a more fine-grained approach than microservices, the idea being that you can be more agile at getting functionality to market with even smaller deployments. AWS Lambda and Azure Functions are examples of FaaS implementations. FaaS frameworks like AWS Lambda and Azure Functions not only bring autoscaling to your services, they've also started to make it much easier to deploy your code to the cloud. In a Lambda or Azure environment, developers don't worry about the container anymore and can just focus on pushing their code. FaaS environments have started to take the "Ops" out of "DevOps".

Java’s Disadvantages

Unless you're focusing solely on batch processing, one of the disadvantages of a Serverless architecture is instance spin-up time, in other words, cold-start latency. If you need milliseconds to respond to a client request and your service spin-up is measured in seconds, then you have a problem.

Java frameworks like Spring, Hibernate, Microprofile, and Java EE have traditionally been slow to boot, and even microservices written with these technologies take seconds to start up. This is because most of these frameworks do all of their configuration and metadata processing at boot time. Spring and Hibernate scan classes for annotations. Hibernate additionally builds SQL queries. They do the same exact pre-processing every single time they are spun up.

Java also has a huge memory problem. If FaaS is the way to go and you end up with many more fine-grained deployments, then Java-based deployments are going to take up a huge amount of memory. Some cloud environments also charge based on the memory used, compounding the issue.

Quarkus Perfect Match for Serverless

Quarkus's core values are to drastically reduce memory footprint and boot time for Java-based applications and services. These are two of the biggest concerns when dealing with Serverless architectures. Quarkus has moved most of the pre-processing that frameworks like Spring and Hibernate do from boot time to build time. This approach has drastically reduced service spin-up time and memory footprint. Quarkus has also smoothed out the rough edges with Graal so that you can compile your Java microservices into native executables, which provide even faster boot times and a smaller memory footprint than running on the JVM.

Quarkus Serverless Strategy

The Quarkus team is tackling Serverless in a variety of ways:

  • Enhance existing Serverless Java stacks out of the box
  • Bring the Java ecosystem to existing Serverless Java stacks
  • Provide portability between Serverless stacks through traditional, mature, existing Java APIs
  • Provide a new Java function API (Funqy) that is portable across Serverless providers

Quarkus Enhances Lambda

By modifying your pom to add a few Quarkus AWS Lambda integration dependencies and the Quarkus Maven plugin, you can compile your AWS Lambda Java projects into a native binary that the AWS Lambda Custom Runtime can consume. Watch your cold-start latency and memory footprint drop dramatically. Try it out yourself.

The idea here is to bring Graal support to AWS Lambda through Quarkus in a seamless way. We have smoothed out the rough edges Graal introduces for a variety of AWS SDKs.

Pull in Java Ecosystem

Another part of the Quarkus Serverless strategy is to pull the Java ecosystem into existing Serverless stacks. Through Quarkus, your AWS Lambda classes can inject service components via Spring DI or CDI. You're not stuck with whatever the AWS SDK provides and can use the mature Java frameworks you've been using for years.

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import org.springframework.beans.factory.annotation.Autowired;

public class GreetingLambda implements RequestHandler<String, String> {

    @Autowired
    GreetingService service;

    @Override
    public String handleRequest(String name, Context context) {
        return service.greeting(name);
    }
}

Avoid Vendor Lock-in

Let's face it: if you use AWS, Azure, or any other cloud provider's SDKs, then you are locked into that platform. If AWS jacks up its prices down the road, you're going to have a tough time moving off the platform. Quarkus helps alleviate this issue by providing integration between REST and HTTP frameworks like JAX-RS, Spring MVC, Servlet, and Vert.x Web and both AWS Lambda and Azure Functions. Let REST and HTTP be your portable architecture between cloud providers and avoid vendor lock-in by using the REST frameworks you've been using for years. Try it out with AWS Lambda or Azure Functions.
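
For example, a resource like the sketch below has no cloud-provider types in it at all. With the appropriate Quarkus integration on the classpath (quarkus-amazon-lambda-http for AWS, if I have the artifact name right, or the Azure Functions HTTP integration), the same class can run behind API Gateway, in an Azure Function, or in a plain container, unchanged:

import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;

@Path("/greeting")
public class GreetingResource {

    // Plain JAX-RS; nothing here knows about Lambda or Azure
    @GET
    @Path("/{name}")
    public String greet(@PathParam("name") String name) {
        return "Hello " + name;
    }
}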

One great thing about using our JAX-RS or Spring MVC support with AWS Lambda or Azure Functions is that you’re not stuck with one REST endpoint per deployment.  You can deploy existing microservices as one Lambda deployment if you desire.  This alleviates some of the management issues that an explosion of function deployments might create down the road as you can aggregate as many endpoints as you want into one Lambda deployment.

Funqy Cross Platform Functions

The final piece of our Quarkus Serverless strategy is a new cross-platform function API called Funqy. Quarkus Funqy is a simple API that allows you to write functions usable in a variety of FaaS environments: AWS Lambda, Azure Functions, Knative Events, and more.

public class MyClass {
   @Funq
   public MyOutput myFunction(MyInput input) {
     ...
   }
}

Funqy is still in development.  We’ll have a follow up blog as soon as it is ready to release.

More to come

Quarkus will continue to revise and expand our Serverless Strategy.  Come try out our integrations and new APIs.

Quarkus unifies reactive and imperative for REST

The latest release of Quarkus has unified Vert.x Web, Servlet (Undertow), and JAX-RS (Resteasy) under one I/O abstraction.  Specifically, Servlet and JAX-RS were written on top of Vert.x.

What this means for you is that if you are using Vert.x, Servlet, and/or JAX-RS in one application, they will all share the same I/O and worker thread pools. Scarce resources are now reused. Because everything is unified under Vert.x, there are a lot of future optimizations and features we can bring to Resteasy and the JAX-RS coding model. More info on this coming soon!

 

Keycloak 1.1.Beta 1 Released: SAML, Clustering, Tomcat 7

(Copied from Stian’s announcement) Pretty big feature release:
  • SAML 2.0 support.  Keycloak already supports OpenID Connect, but with this release we’re also introducing support for SAML 2.0.  We did this by pulling in and building on top of Picketlink’s SAML libraries.
  • Vastly improved clustering support.  We've significantly improved our clustering support for both the server and application adapters. The server can now be configured to use an invalidation cache for realm meta-data and user profiles, while user sessions can be stored in a distributed cache, allowing for both increased scalability and availability. Application adapters can be configured for either sticky-session or stateless operation if sticky sessions are not available. We've also added support for nodes to dynamically register with Keycloak to receive, for example, logout notifications.
  • Adapter multi-tenancy support.  Thanks to Juraci Paixão Kröhling we now have multi-tenancy support in application adapters. His contribution makes it easy to use more than one realm for a single application. It's up to you to decide which realm is used for a request, but this could, for example, depend on the domain name or context-path. For anyone interested in this feature there's a simple example that shows how to get started.
  • Tomcat 7 Adapter.  A while back Davide Ungari contributed a Tomcat 7 application adapter for Keycloak, but we haven’t had time to document, test and make it a supported adapter until now.
What’s next?
The next release of Keycloak should see the introduction of more application adapters, with support for JBoss BRMS, JBoss Fuse, UberFire, Hawt.io and Jetty.
For a complete list of all features and fixes for this release check out JIRA.
I'd like to especially thank all external contributors. Please keep contributing! For everyone wanting to contribute to Keycloak, don't hesitate; it's easy to get started and we're here to help if you need any pointers.

Resteasy 3.0.9 Released

I really want to thank Ron Sigal, Weinan Li, and the rest of the Resteasy community for having my back the last 5 months while I was focused on other things. Thanks for your hard work and patience. 3.0.9.Final is a maintenance release. There are a few minor migration notes you should read before you upgrade, but most applications shouldn't be affected. We'll try and do another maintenance release in 6-8 weeks. Check out resteasy.jboss.org for download links, JIRA release notes, and documentation.

Keycloak 1.0 Final Released

After 1 year of hard work, the team is proud to release our first final 1.0 release of Keycloak.  We’ve stabilized our database schemas, improved performance, and refactored our SPIs and you should be good to go!  I don’t want to list all the features, but check out our project website at http://keycloak.org for more information.  You can find our download links there as well as screen cast tutorials on our documentation page.

What’s Next?

Keycloak 1.1 will be our integration release where we start bringing Keycloak to different protocols, projects, and environments. Here's a priority list of what we're tackling:

  • SAML 2.0 – by merging with Picketlink IDP
  • Uberfire/BRMS adapter
  • Fuse FSW adapter
  • EAP 6.x and Wildfly console integration
  • Tomcat 7 adapter
  • …More planned, but we'll see how fast we can move before we announce any more

In parallel, we hope to look into a few new features:

  • Internationalization
  • TOTP Improvements like allowing multiple token generators
  • IP Filtering

Keycloak 1.0 RC 1 Released

Many bug fixes and cleanup. Not much in the way of new features, although we did add a ton of tooltips to the admin console. We're getting very close to a final release and are still on schedule to release the 2nd week of September.

See keycloak.org for links to download and documentation.

Keycloak Beta 4 Released

After a summer of multiple vacations by various team members, we're finally ready to release Keycloak 1.0 Beta 4. There are not a lot of new features in this release because we focused mainly on performance, creating new SPIs, refactoring code, improving usability, and lastly fixing bugs: 64 issues completed. As usual, go to the main keycloak.org page to find download links and to browse our documentation, release notes, or view our screencast tutorials. Here are some of the highlights of the release:

  • Server side memory cache for all UI pages.
  • Cache-control settings for UI pages
  • Server side cache for all backend metadata: realms, applications, and users.
  • In-memory implementation for user sessions
  • New Federation SPI.  Gives you a lot of flexibility to federate external stores into Keycloak
  • Improved LDAP/Active Directory support
  • Token validation REST API
  • Support for HttpServletRequest.logout()
  • Lots and lots of bugs fixes and minor improvements

You should see a big performance increase with this release, as everything is cacheable in memory and the database can be fully bypassed.

1.0 Final is on the way!

What's next for Keycloak? This month we will be focusing on resolving the remaining issues logged in JIRA, improving our test coverage, and updating our documentation and screencasts. No new major features. We'll have an RC release around the 3rd week of August, then our first Final release the 2nd week of September!

 
