Managing Chat Memory in Quarkus Langchain4j

When I first started using Quarkus Langchain4j I ran into some issues because I didn’t fully understand how Quarkus manages chat memory with Langchain4j. So, here’s what I’m going to discuss:

  • How CDI bean scopes of AI services affect chat memory
  • How chat memory is managed when @MemoryId is used as a parameter in Quarkus
  • How the default chat memory id works in Quarkus
  • How chat memory can be leaked in some scenarios
  • How to write your own Default Memory Id Provider
  • How chat memory can affect your application

Chat Memory in Quarkus Langchain4j

Chat memory is a history of the conversation you’ve had with an LLM. When using Quarkus Langchain4j (or plain Langchain4j), this chat history is automatically sent to the LLM each time you interact with it. Think of chat memory as a chat session: a list of messages sent to and from the LLM. Each chat history is referenced by a unique ID and kept in a chat memory store. How the store persists memory depends on how you’ve built your app; it could live in memory or in a database, for instance.
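
For example, here’s a minimal sketch of a custom store, assuming Langchain4j’s ChatMemoryStore interface (dev.langchain4j.store.memory.chat.ChatMemoryStore). Quarkus Langchain4j should pick up a CDI bean like this in place of its default in-memory store, but check the docs for your version:

import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import dev.langchain4j.data.message.ChatMessage;
import dev.langchain4j.store.memory.chat.ChatMemoryStore;
import jakarta.enterprise.context.ApplicationScoped;

// Minimal sketch: each memory id maps to its own list of messages.
@ApplicationScoped
public class MapChatMemoryStore implements ChatMemoryStore {

    private final Map<Object, List<ChatMessage>> store = new ConcurrentHashMap<>();

    @Override
    public List<ChatMessage> getMessages(Object memoryId) {
        return store.getOrDefault(memoryId, List.of());
    }

    @Override
    public void updateMessages(Object memoryId, List<ChatMessage> messages) {
        store.put(memoryId, messages);
    }

    @Override
    public void deleteMessages(Object memoryId) {
        store.remove(memoryId);
    }
}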

@MemoryId

With Quarkus Langchain4j you either use the @MemoryId annotation on a parameter to identify the chat history to use, or you let Quarkus provide this identifier by default. Let’s look at @MemoryId first:

@RegisterAiService
public interface MyChat {
     @SystemMessage("You are a nice assistant")
     String chat(@UserMessage String msg, @MemoryId String id);
}

@RegisterAiService
public interface AnotherChat {
     @SystemMessage("You are a mean assistant")
     String chat(@UserMessage String msg, @MemoryId String id);
}

With @MemoryId, the application developer provides the chat memory identifier to use. The chat history is a concatenation of the messages from any other AI service that used the same memory id. For example:

@Inject
MyChat myChat;

@Inject
AnotherChat another;

public void call() {
    String id = "1234";
    String first = myChat.chat("Hello!", id);
    String second = another.chat("Goodbye!", id);
}

There are a couple of things to think about when sharing a @MemoryId between different AI services (prompts).

Shared Chat History

With the call to another.chat(), the chat history from the previous myChat.chat() call is also included, because the same memory id is passed to both calls.

Only 1 SystemMessage per history

Another thing about running this code is that the original SystemMessage from MyChat is removed from chat history and a new SystemMessage from AnotherChat is added. Only one SystemMessage is allowed per history.

Self Management of ID

The application developer is responsible for creating and managing the @MemoryId. You have to ensure the id is unique (easily done with something like a UUID), otherwise different chat sessions can corrupt each other. If a chat is a series of REST calls, you’ll also have to make sure the client passes this memory id along between HTTP invocations.
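
For illustration, here’s a rough sketch of one way to do that over REST, using the MyChat interface from above; the /start endpoint and the X-Memory-Id header are just conventions made up for this example:

import java.util.UUID;

import jakarta.inject.Inject;
import jakarta.ws.rs.HeaderParam;
import jakarta.ws.rs.POST;
import jakarta.ws.rs.Path;

@Path("/chat")
public class ChatResource {

    @Inject
    MyChat myChat;

    // Called once by the client to start a conversation; the client keeps the returned id.
    @POST
    @Path("/start")
    public String start() {
        return UUID.randomUUID().toString();
    }

    // Every subsequent message must carry the id the client was given.
    @POST
    public String chat(@HeaderParam("X-Memory-Id") String memoryId, String userMessage) {
        return myChat.chat(userMessage, memoryId);
    }
}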

Sometimes LLMs are sensitive to what is in chat history. In the case above, the chat history has a mix of messages from two different prompts. It also loses the context of MyChat on the second call, since the MyChat system message is removed. Usually not a big deal, but every once in a while you might see your LLM get confused.

Default Memory Id

If a @MemoryId is not specified, then Quarkus Langchain4j decides what the memory id is.

package com.acme;

@RegisterAiService
public interface MyChat {
    String chat(@UserMessage String msg);
}

In vanilla, standalone Langchain4j, the default memory id is “default”. If you’re using Langchain4j on its own, you should not rely on default memory ids in multi-user or multi-session applications, as chat histories will get completely corrupted.

Quarkus Langchain4j does something different. A unique id is generated per CDI request scope (the request scope being the HTTP invocation, Kafka invocation, etc.). The fully qualified interface name and method name of the AI service are then tacked onto the end of this string, separated from the request id by a “#” character. In other words, the format of the default memory id is:

<random-per-request-id>#<fully qualified interface name>.<method name>

So, for the above Java code, the default memory id for MyChat.chat would be:

@2342351#com.acme.MyChat.chat

There are a couple of things to think about with this default Quarkus implementation.

Default Memory Id is tied to the request scope

Since the default id is generated per request scope, when your HTTP invocation finishes and you invoke the AI service again, a different default memory id will be used and you’ll get a completely new chat history.

Different chat history per AI Service method

Since the default id incorporates the AI service interface and method name, there is a different chat history per AI service method and, unlike the example in the @MemoryId section, chat history is not shared between prompts.

Using the Websocket extension gives you per session chat histories

If you use the websocket integration to implement your chat, then the default id is unique per websocket session instead of per request. This means the default memory id is retained and meaningful for the entire chat session, and you’ll keep chat history between remote chat requests. The AI service interface name and method are still appended to the default memory id, though!
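
As an illustration, here’s a hedged sketch of what that can look like with the quarkus-websockets-next extension, assuming the MyChat service above is made @SessionScoped; the default memory id then lives for the whole websocket connection:

import io.quarkus.websockets.next.OnTextMessage;
import io.quarkus.websockets.next.WebSocket;
import jakarta.inject.Inject;

@WebSocket(path = "/chat")
public class ChatSocket {

    @Inject
    MyChat myChat;   // assumed @SessionScoped, no @MemoryId parameter

    @OnTextMessage
    public String onMessage(String userMessage) {
        // The default memory id is tied to this websocket connection,
        // so chat history accumulates for the whole session.
        return myChat.chat(userMessage);
    }
}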

Default memory ids vs. using @MemoryId

So what should you use, default memory ids or @MemoryId? If you have a remote chat app where user interactions span remote requests (i.e. HTTP/REST), then you should only use default memory ids for prompts that don’t want or need a complete chat history. In other words, only use default ids if the prompt doesn’t need chat memory. If you need chat history to survive between remote requests, then you’ll need to use @MemoryId and manage the ids yourself.

The Websocket extension flips this. Since the default memory id is generated per websocket connection, you get a real session, and default memory ids are wonderful because you don’t have to manage memory ids in your application.

Memory Lifecycle tied to CDI bean scope

AI services in Quarkus Langchain4j are CDI beans. If you do not specify a scope for the bean, it defaults to @RequestScoped. When a bean goes out of scope and is destroyed, an interesting thing happens: any memory id referenced by the bean is wiped from the chat memory store and is gone forever. ANY memory id: the default memory id or any id provided through @MemoryId parameters.

@RegisterAiService
@ApplicationScoped
public interface AppChat {
    String chat(@UserMessage String msg, @MemoryId String id);
}

@RegisterAiService
@SessionScoped
public interface SessionChat {
    String chat(@UserMessage String msg, @MemoryId String id);
}

@RegisterAiService
@RequestScoped
public interface RequestChat {
    String chat(@UserMessage String msg, @MemoryId String id);
}

So, for the above code, any memory referenced by the id parameter of RequestChat.chat() will be wiped at the end of the request scope (i.e. the HTTP request); for SessionChat, when the CDI session is destroyed; and for AppChat, when the application shuts down.

Memory tied to the smallest scope used

So, what if, within the same REST invocation, you use the same memory id with all three of the AI services above?

@Inject AppChat app;
@Inject RequestChat req;

@GET
public String restCall() {
     String memoryId = "1234";
     app.chat("hello", memoryId);
     return req.chat("goodbye", memoryId);
}

So, in the restCall() method, even though AppChat is application scoped, since RequestChat uses the same memory id, “1234”, the chat history will be wiped from the chat memory store at the end of the REST request.

Default memory id can cause a leak

If you are relying on default memory ids and your AI service has a scope other than @RequestScoped, then you will leak chat memory and it will grow until it hits the constraints of the memory store. For example:

@ApplicationScoped
@RegisterAiService
public interface AppChat {
     String chat(@UserMessage String msg);
}

Since Quarkus generates a new default memory id for every request scope, each call to AppChat.chat() from a different request creates a new chat history, and because AppChat is application scoped, those histories are never cleaned up. Chat memory entries in the chat memory store will grow until the application shuts down.

Never use @ApplicationScoped with default ids

So, the moral of the story is: never use @ApplicationScoped with your AI services if you’re relying on default ids. If you are using the websocket extension, you can use @SessionScoped, but otherwise make sure your AI services are @RequestScoped.

What bean scopes should you use?

For REST-based chat applications:

  • Use the combination of @ApplicationScoped and @MemoryId parameters to provide a chat history that spans requests
  • Use @RequestScoped and default memory ids for prompts that don’t need a chat history
  • Do not share the same memory ids between @ApplicationScoped and @RequestScoped AI services
  • If using the Websocket extension, use @SessionScoped on your AI services that require a chat history.

Chat Memory and your LLM

So, hopefully you understand how chat memory works with Quarkus Langchain4j now. Just remember:

  • Chat history is sent to your LLM with each request.
  • Limiting chat history can speed up LLM interactions and cost you less money!
  • Limiting chat history can focus your LLM.

All discussions for another blog! Cheers.

LangChain4j: Using IMMEDIATE with tools is great performance boost

TLDR;

Reduce callbacks to your LLM by immediately returning from tool invocations. This can greatly improve performance, give you fine-grained control over your tool invocations, and even save you money. Check out the docs for more information, or read on.

Intro

In my Baldur’s Forge chat application, which I talked about in my last blog, I found I had a number of cases where I just wanted the LLM to understand a chat user message and route it to the appropriate tool method. I didn’t care what the LLM’s response was after a tool was invoked. I had a few different scenarios:

  • I wanted the LLM to just execute a task and return, not analyze any response
  • The tool itself might use my chat application’s architecture to take over rendering a response to the client, and I didn’t care about or want the LLM’s response to the tool invocation
  • I was just using the LLM to route the user to a different prompt chat conversation.

When you have one of those scenarios, the interaction with the LLM can be quite slow. Why? What’s happening? This is the flow of a tool invocation:

  1. The LLM client sends the user message, chat history, and a JSON document describing what tools are available to the LLM.
  2. The LLM interprets the user message, chat history, and the tool description JSON and decides what to do. If the LLM thinks it needs to invoke a tool, it responds to the client with a list of tools that should be invoked and the parameters to pass to each tool.
  3. The LLM client sees that the LLM wants to invoke a set of tools. It invokes them, then sends another message back to the LLM with the chat history and a JSON document describing the tool responses.
  4. The LLM looks at the tool responses and chat history, then decides whether to respond with more tool requests or answer the chat.

If you don’t care what the LLM’s response is after invoking a tool, steps #3 and #4 can be quite expensive, both in latency and, if you’re using OpenAI or another commercial LLM, in money. The LLM has to do a lot of work in that second round trip: it has to understand the entire chat history as well as the tool responses the client sent it. This can be quite time consuming, and even with my little app it would add seconds to an interaction depending on how busy OpenAI was that day.

@Tool(returnBehavior = ReturnBehavior.IMMEDIATE)

The returnBehavior attribute of the @Tool annotation can help you out with this. Let’s elaborate on what the LangChain4j IMMEDIATE docs say, with a Quarkus LangChain4j spin.

@ApplicationScoped
public class CalculatorWithImmediateReturn {
    @Tool(returnBehavior = ReturnBehavior.IMMEDIATE)
    public double add(int a, int b) {
        return a + b;
    }
}

@RegisterAiService
public interface Assistant {
    @ToolBox({CalculatorWithImmediateReturn.class})
    Result<String> chat(String userMessage);
}

If and only if the call to Assistant.chat() triggers only IMMEDIATE tool calls, the chat() method will return immediately after one or more tool methods are invoked. If there are any tool calls that are not IMMEDIATE then all the tool responses triggered will be sent back to the LLM for processing (Steps #3 and #4).

Here’s an example of calling our chat service:

Result<String> result = assistant.chat("What is the value of 11 + 31?");

if (result.content() != null) { // There were no tool calls.  LLM may want more info from user
    System.out.println(result.content());
    System.exit(0);
}

double val = (Double)result.toolExecutions().get(0).resultObject();

System.out.println("The return value from the Java tool method was: " + val);

String jsonVal = result.toolExecutions().get(0).result();

System.out.println("The json representation of the Java tool method response was: " + jsonVal);

If no tool calls were invoked, or, the LLM decided to invoke a tool that wasn’t IMMEDIATE, the full tool chat exchange would occur (Steps #3 and #4) and result.content() will return a non-null value. If result.content() is null, then we know that there may have been tool invocations.

The result.toolExecutions() method returns a list of tool execution objects from which you can get the tool name, the Java result of the tool method call, and the tool response as a JSON string.

If you want to see a more concrete example of using this feature, check out the MainMenuChatFrame class from my demo chat application. It invokes an AI service whose tools are all IMMEDIATE. Notice that in some instances IMMEDIATE gives me really fine-grained control, and tools can do things like ask for chat memory to be cleared.

Here’s another example in EquipmentBuilder. This prompt has a mixture of IMMEDIATE and traditional tool methods that can be called. Building a piece of equipment required full interaction with the LLM, while informational queries triggered tools that returned immediately.

NOTE: Quarkus Langchain4j will not support this feature of LangChain4j until the next release.

My CRUD AI/LLM Chat App Experiences

So, writing hello world examples did not really help me understand the limitations of LLMs. I decided to see if I could write a pure natural language chat interface for a CRUD application. My thought was that a natural language chat interface for CRUD might be a high-volume use case that most organizations would want to implement. The Read part of CRUD would be RAG. The rest would be a pattern I’d come up with.

I wanted a CRUD application that would actually do something useful. A real use case. A real application. I was playing Baldur’s Gate 3 a lot, even experimenting with writing mods for it, so I decided to write a BG3 AI/LLM chat bot application.

Here’s the code.

Features

  • Entire Baldur’s Gate 3 item and spell database is imported
  • Chat commands to search the database. Nice tooltips are rendered to display search results. For example, you can say: Show me weapons that deal radiant damage
  • You can create new items and give them enchantments
  • You can package new items into a mod that is usable within the game
  • You can import an existing mod into the internal database
  • All these features via a pure chat user interface

The app is functional, but not fully featured. It would need a lot more work to get complete coverage for item creation.

Terms you should know:

  • Tools – functions published to the LLM that the LLM can decide to invoke
  • RAG – go look it up
  • Prompt – basic term. go look it up.
  • Chat Memory – History of user chat messages and LLM responses. It is passed to the LLM with every chat request.

High level architecture:

Software Stack

6 Different prompts

  • Main Menu Prompt. List of high level commands that can delegate to other prompts and tools.
  • 2 Metadata prompts for RAG. Item type and Slot
  • Natural language to enchantment macro prompt. BG3 has a rich macro language for applying enchantments to magical items. This prompt converts natural language to that macro language
  • General prompt for equipment building
  • Prompt for gathering metadata to package the BG3 mod

Main Menu Tool Box

The Main Menu Prompt is hooked up to a list of tool methods. Each of these tools can perform a capability of the application. Creating new items, finding an item, searching for an item, editing or deleting an item, or packaging up a mod. There is a tool for each of these actions and these tools use other prompts and services to complete the actions.

Basic RAG with Metadata Prompts

I imported every BG3 item into a custom in-memory DB. From that in-memory DB I generate a document describing each item: its type, properties, enchantments, etc. These documents are semi-structured English descriptions. Each document is sent to OpenAI to create embeddings for searches. The embeddings are stored in a vector DB. Each item stored in the vector DB has metadata associated with it: ID, Type (Weapon or Armor), and Slot (Head, Body, Glove, Boot, Necklace, etc.)

For search queries, the user message is filtered through a specific prompt to identify the Type (Weapon or Armor). The same user message is sent through another prompt to extract the Slot. This metadata is used as a filter on the vector DB query. This is basic RAG. Look at my code or do a search on RAG to understand this basic, well-known pattern.

Find vs. Search

The Main Menu Prompt has a find command, which can find an item by name via a hash lookup, and a search command for broader, general RAG queries. When the user asks to find something, the LLM matches it to the findBy tool. If the item can be found by name, the tool returns it. Otherwise it throws an exception whose message states that the item was not found and that a more general search should be done. The LLM understands this and immediately calls search, which does a RAG query. It was pretty cool how this just worked (until lately, when it just decided not to work!): the combination of Langchain4j and OpenAI could catch the exception, process it, and invoke a different, more appropriate tool.

Chat Context

I found that tools need to store structured information in between chat messages. I call this the Chat Context. For example, when building a piece of armor, the current representation of the new armor is stored in the chat context so that it can be accessed between chat messages.

The client sends the Chat Context as a JSON document to the server. It contains the user message and a map of context data returned from the server. The server sends the Chat Context back as the response to any chat. It can contain one or more response objects, the client-side chat memory (see later), and context data that tools might need to perform future actions.
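
To make that concrete, here’s roughly the shape of the Chat Context; the class and field names below are illustrative, not the exact ones in the repo:

import java.util.List;
import java.util.Map;

// Illustrative shape of the Chat Context exchanged between client and server.
public class ChatContext {
    public String userMessage;                // what the user typed
    public List<Object> responses;            // structured response objects the client knows how to render
    public String serializedChatMemory;       // client-side chat memory as JSON (see later)
    public Map<String, Object> contextData;   // state that tools need between chat messages
}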

Chat Frames

Some prompts, like the natural language to enchantment macro prompt, are just input and output. There is no conversation or chat memory needed.

Other prompts need to have a conversation with the user. The web client does not make the decision on what prompt to call. It just sends posted text to the server. The server knows what the current prompt is and calls it. Tools can decide to change the current chat prompt. I call this Chat Frames. The Chat Context is used to set the current prompt between chat messages.

When a chat frame is pushed or popped, the current chat memory is cleared.

Client-side Chat Memory

I needed per-user chat memory, but I didn’t like the idea of storing chat memory on the server, as I prefer my services to be stateless. Obviously, being stateless solves a lot of architectural issues, as there’s no need for sticky sessions or a distributed cache to hold session state. So, I developed client-side chat memory, which simply serializes chat memory into JSON and stores it in the Chat Context. Client memory is piggybacked with the user message when chatting with the server.

I had to do a few hacks to make this work with Quarkus correctly. All that is documented in the code. Take a look!
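
Here’s a hedged sketch of the serialization part, assuming Langchain4j’s ChatMessageSerializer and ChatMessageDeserializer helpers:

import java.util.List;

import dev.langchain4j.data.message.ChatMessage;
import dev.langchain4j.data.message.ChatMessageDeserializer;
import dev.langchain4j.data.message.ChatMessageSerializer;

public class ClientSideMemory {

    // Turn the current chat memory into JSON so it can ride along in the Chat Context.
    public static String toJson(List<ChatMessage> messages) {
        return ChatMessageSerializer.messagesToJson(messages);
    }

    // Rebuild the chat memory from the JSON the client sent back with its next message.
    public static List<ChatMessage> fromJson(String json) {
        return ChatMessageDeserializer.messagesFromJson(json);
    }
}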

Tool/UI Messages

As you’ll see in the Lessons Learned section, it was quite difficult to get the LLM to return formatted output. So, what I implemented was that tools can piggyback additional response messages to the client. These messages tell the client to perform specific actions and can provide data for performing those actions. For example, search provides a list of equipment. This is sent back to the client as a ListEquipment message. From that message, the client renders something nice and specific.

LLM Json Document Builder

For general data gathering (equipment building and mod packaging) I used a prompt/tool pattern of building a JSON document. The prompt provides a JSON schema and the current JSON document built so far, and asks the user to fill out the fields that haven’t been set yet. I used two different tool patterns.

If you look at ModPackager you’ll see that the LLM calls the updatePackage tool method, passing in JSON of the package representation. The way OpenAI 4o works (sometimes!) is that it passes in the PackageModel with only the fields set in the user message. The updatePackage tool method figures out which fields were set, updates the current JSON object, and stores it within the Chat Context. When the user says they are finished, the finishPackage tool method is invoked by the LLM and sends a message back to the client through the Chat Context to package up the mod.

Building items (magic rings, armor, weapons, etc.) is a little different. Instead of having an update method, I have a tool method that sets each and every property of the JSON document. An example is setName. Why do I do it this way? Well… I used to use the update tool method pattern from ModPackager, but one day, after about two months, OpenAI 4o decided not to call the update method consistently.
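
Here’s a rough sketch of that per-property tool pattern; the class below is illustrative, and the real code stores the partially built item in the Chat Context rather than in a field:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import dev.langchain4j.agent.tool.Tool;
import jakarta.enterprise.context.ApplicationScoped;

@ApplicationScoped
public class EquipmentBuilderTools {

    // Placeholder for the item being built; the real app keeps this in the Chat Context
    // so it isn't shared between users.
    private final Map<String, Object> item = new ConcurrentHashMap<>();

    @Tool("Set the name of the item being built")
    public void setName(String name) {
        item.put("name", name);
    }

    @Tool("Set the enchantment boost macro of the item being built")
    public void setBoost(String boostMacro) {
        item.put("boost", boostMacro);
    }
}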

Lessons Learned

The most frustrating thing about developing an AI/LLM chatbot was how wildly inconsistent it was. Stuff that would work one day (or one week, or one month even!) would not work the next, then start working again, then stop again. This caused me to constantly refine how I was working with it.
This was most of the work: getting consistent and deterministic results from the AI. It could be the model I was using (OpenAI 4o).

Here’s more specific lessons learned:

Keep prompts focused

This is the one consistent piece of advice you’ll find on the web about how to write prompts: keep them focused on the specific thing you want to do. If you try to write one prompt to rule them all, the LLM will get confused and may or may not invoke the tools or spit out the text you want.

Provide examples with sample input and output. Or sample input and what action you want the AI to take

Initially, the main menu prompt was very simple. I relied on the LLM to match the user message to the list of tools that were provided. The LLM ended up being really inconsistent about whether or not it would call the appropriate tool. What I finally did was rewrite the main menu prompt with a set of examples of input user messages and the actions to take based on that input. Without those examples, the LLM was really inconsistent. Sometimes it would call a tool, sometimes it wouldn’t. Depended on its mood!

You cannot trust the AI to provide consistently formatted output.

I honestly gave up on trying to get the AI to output HTML. I even tried returning HTML output from tool methods, and the LLM would just ignore it, interpret the response from the tool, and output anything it wanted. No matter what I put in the prompt, OpenAI would almost always return Markdown. I was able to get it to output a strict JSON schema, but sometimes it would return JSON embedded within Markdown. It was very frustrating.

This is another reason I created Chat Context. When a tool needs something rendered, I piggyback a structured Message object on the chat response. You can find examples of this all over the place in my code.

Need to communicate session state between tools and client

For the browser client and the server-side prompts and tool methods to even be able to work together, I needed a way to pass session information between them. Chat memory just wasn’t enough. In fact, chat memory often caused problems (see the other issues).

Your client code should format complex visual responses itself

If you want a nice interface with complex output that is formatted really nicely, you WILL NOT be able to use the LLM output as-is. You’ll need to extract the appropriate data structure from LLM responses and send it back to the client to be rendered correctly by human-written code.

AI responses will be inconsistent. You cannot guarantee what the AI will output

I originally had my search return multiple items in a json array and I’d ask the AI to list and summarize the search. Sometimes it would provide a numbered list. Sometimes bulleted. Sometimes it would produce a paragraph with the names of things that it found. Thus, I couldn’t create a consistent and nice looking UI. I gave up trying to get the AI to produce nice output.

void tool responses confuse the AI

For instance, I had a tool method void listEquipment(). The LLM would invoke the tool correctly, but then output that it could not find anything to list, even though I told the prompt and the tool description not to output anything when listEquipment was called.

Tools will need access to the original user message

Multiple tool implementations in my code needed to get access to the current user message to get more accurate results from the LLM. “Why, you ask?” Read on dear reader!

You can’t guarantee what the LLM will send to tool parameters

For example, I had a search tool method that took a string query parameter. The AI would look at the user message and extract keywords before calling the search method. This would screw up search results. I had to manually make the raw user message available to the tool function. Context matters and keywords can lose context.

Take a look at the addBoost tool method. The LLM would look at the following user message and invoke the addBoost method.

create new longsword with boost of +3

Only +3 would be sent as a parameter, so the tool would not know whether the item was armor or a weapon and didn’t know whether to output AC(3) or WeaponEnchantment(3). Again, I had to get access to the original user message and send that to the boost prompt.

Chat memory can confuse the AI

When invoking the enchantment macro prompt, I originally hooked this call up to chat memory. The second time the addBoost tool was called within a chat session, the call to the enchantment prompt would convert the user message, but ALSO look in chat history and convert a previous add boost request.

For example:

User: add boost advantage on saving throws
AI:   returns Advantage(SavingThrows) from the enchantment prompt
User: add boost +3 weapon enchantment
AI:   returns Advantage(SavingThrows);WeaponEnchantment(3)

To solve this, I turned off chat memory for enchantment prompt invocations (i.e. I removed the @MemoryId parameter from the chat method call).

You can’t guarantee that the AI will call a tool

I used to have the enchantment prompt as part of the @ToolBox for WeaponBuilderChat.buildWeapon (and the other builders). For a long, long time it would consistently convert a boost description to a boost macro by calling the enchantment prompt before invoking the setBoost/addBoost tool methods. Then it just stopped working consistently!!! For no reason at all, OpenAI would or would not call the enchantment prompt tool. It was different every time.

An even simpler example: I have a setBoost tool and an addBoost tool that are called to set the enchantment macro. setBoost clears the existing value of the boost and resets it with the new one; addBoost adds an additional macro to the existing enchantment. Simple, right? Well, it sometimes works and sometimes doesn’t. For instance, if I type add boost +3 to weapon, it sometimes calls the addBoost tool and other times it calls the setBoost tool. Frustrating! The workaround was to explicitly state in the prompt when to call the add or set boost tool. At least, I hope that’s the workaround!!! I wouldn’t be surprised if this just stopped working randomly someday.

Prompts and Code will break with another LLM model

I spent most of my time with OpenAI 4o. After I got a working application, I decided to first try Ollama and llama3. I gave up quickly, as it couldn’t even process the main menu commands. The Qwen 3.2 model worked a little better. Main menu command processing sort of worked, but the LLM output had thinking/reasoning logic in it. You can turn this off, but the <thinking> XML blocks were still there, which threw off all my text processing. I quickly realized that it would take a considerable amount of time to port my app to another model. I’m going to write another blog about this.

Experiences with Cursor IDE

This project was the first time I ever used Cursor IDE. I love Cursor and can’t live without it, but when I first started using it I almost uninstalled it immediately. The first thing I asked the Cursor chat window to do was generate a Quarkus application that used quarkus-langchain4j. Oh, it generated a complete project that compiled and everything. Except this application didn’t do anything. It looked like it should do something. It created embedding interfaces and stuff that looked like it should be for quarkus-langchain4j, but it wasn’t. A complete wild goose chase and waste of time. I found that anytime you ask Cursor to do something big, it creates something that is compilable and even runnable, but doesn’t come close to what you want.

For instance, I asked it to port a C# project, LSLib, to Java. Oh, it converted it to Java. All the files. It ran for a really long time too. But the output was complete garbage. It generated a lot of placeholder interfaces and classes even though those classes existed in the port. It did work a lot better when I asked it to port a file at a time. As with the experience in the previous paragraph though, it either sort of worked or didn’t work at all. I’m honestly not sure whether it would have been faster to manually port LSLib.

What Cursor was completely amazing at was live coding. While I code, it offers suggestions that you can accept by pressing the TAB key. It was quite eerie sometimes how it guessed what I wanted. I’m completely addicted to this feature and can’t live without it.

It was also quite good when I asked it to do small concise things like

  • Create me a regular expression to match something
  • Create me a function to put a space before any capital letter in text

Stuff like that.

It was also incredible on the UI side. I asked it to add a menu button that had a pull-down menu. It generated it perfectly. I asked it to generate a tooltip with HTML I would provide whenever the user hovered over a specific link. It generated it perfectly. I asked it to show a red dot over a button whenever a certain counter was above zero. Not only did it do that, it added a cool effect to the red dot without me even asking. Again, if you ask it to do specific, small, concise tasks, it can do it. Well… most of the time it can 🙂 Sometimes, even when you’re concise, it outputs garbage.

My opinion is that generative AI (for coding and writing) requires constant, repeated human interaction. It does not replace humans; it requires our input, constantly, because more often than not it outputs garbage. What I liked about Cursor was that this back and forth between me and the AI was fast and seamless, and it allowed me to avoid adding crappy code.

Quarkus Playpen: Live coding in dev cluster

Quarkus Playpen 1.0.0 is released!

Live coding with your development cluster

Temporarily route requests for a service in a cluster to your laptop or to a temporary pod. This allows you to develop in real time against a development cluster so that you can see your service in action with the rest of your deployed service mesh.

https://github.com/quarkiverse/quarkus-playpen/tree/1.0.0

Workflow for releasing on nexus and graal binaries on github

It took me a while to figure this out so I thought I’d share it.

I have a Java Quarkiverse project that releases a Quarkus extension to Nexus using the Maven release plugin, then builds CLI binaries with Graal for Windows, macOS, and Linux. The binaries are uploaded to a GitHub release for the tagged version.

https://github.com/quarkiverse/quarkus-playpen/blob/0.9.1/.github/workflows/release.yml

https://github.com/quarkiverse/quarkus-playpen/releases/tag/0.9.1

Azure Functions + Quarkus HTTP Native/Graal

You can take an existing Quarkus HTTP application and deploy it as a native executable (Graal build) to Azure Functions using the Custom Handler runtime provided with Azure Functions.

Prerequisites: you’ll need the Azure CLI (az) and the Azure Functions Core Tools (func) installed, plus a container runtime for the container-based native build shown below.

You’ll need to configure a root path in application.properties:

quarkus.http.root-path=/api/{functionName}

Replace {functionName} with whatever you want to call the function you’re creating for the Quarkus HTTP application.

Next, rebuild your application to create a native executable targeting the Linux 64-bit x86 runtime:

$ mvn package -Pnative -DskipTests=true \
         -Dquarkus.native.container-build=true \
         -Dquarkus.native.builder-image=quay.io/quarkus/ubi-quarkus-mandrel-builder-image:jdk-17


After you do this, create a directory in the root of your project and generate the function deployment descriptors. Specify anything you want for the name of the function:

$ mkdir my-app
$ cd my-app
$ func new --language Custom --template HttpTrigger  \
                   --name my-func --authlevel anonymous

The func command will generate a custom handler host.json file and an http trigger function.json for you for the function named my-func.

Next copy your application’s native executable to your app directory:

$ cp ../target/code-with-quarkus-1.0.0-SNAPSHOT-runner app

Next you’ll need to edit my-func/function.json. Add a route that matches to all paths.

{
  "bindings": [
    {
      "authLevel": "Anonymous",
      "type": "httpTrigger",
      "direction": "in",
      "name": "req",
      "methods": [
        "get",
        "post"
      ],
      "route": "{*path}"
    },
    {
      "type": "http",
      "direction": "out",
      "name": "res"
    }
  ]
}

Without specifying a route, you will not be able to get requests to rest endpoints defined in your application.

Next you need to edit your host.json file to specify some configuration. You’ll need to enable HTTP request forwarding (enableForwardingHttpRequest), specify the executable name of your application (defaultExecutablePath), and define the HTTP port for Quarkus to bind to by passing a system property to the application as an argument (arguments). Here’s what it should look like in the end:

{
  "version": "2.0",
  "logging": {
    "applicationInsights": {
      "samplingSettings": {
        "isEnabled": true,
        "excludedTypes": "Request"
      }
    }
  },
  "extensionBundle": {
    "id": "Microsoft.Azure.Functions.ExtensionBundle",
    "version": "[3.*, 4.0.0)"
  },
  "customHandler": {
    "enableForwardingHttpRequest": true,
    "description": {
    "defaultExecutablePath": "app",
    "workingDirectory": "",
	"arguments": [
       "-Dquarkus.http.port=${FUNCTIONS_CUSTOMHANDLER_PORT:8080}"
      ]
    }
  }
}

Test Locally

To test locally, use the func start command. Make sure you are in the my-app directory you created earlier!

$ func start


# Azure Functions Core Tools
# Core Tools Version:       4.0.5198 Commit hash: N/A  (64-bit)
#Function Runtime Version: 4.21.1.20667
#
#
# Functions:
#
#	my-func: [GET,POST] http://localhost:7071/api/my-func/{*path}
#
# For detailed output, run func with --verbose flag.
# [2023-08-10T19:08:59.190Z] __  ____  __  _____   ___  __ ____  ______ 
# [2023-08-10T19:08:59.191Z]  --/ __ \/ / / / _ | / _ \/ //_/ / / / __/ 
# [2023-08-10T19:08:59.191Z]  -/ /_/ / /_/ / __ |/ , _/ ,< / /_/ /\ \   
# [2023-08-10T19:08:59.191Z] --\___\_\____/_/ |_/_/|_/_/|_|\____/___/   
# [2023-08-10T19:08:59.191Z] 2023-08-10 15:08:59,187 INFO  [io.quarkus] (main) code-with-quarkus-1.0.0-SNAPSHOT native (powered by Quarkus 999-SNAPSHOT) started in 0.043s. Listening on: http://0.0.0.0:33917
# [2023-08-10T19:08:59.191Z] 2023-08-10 15:08:59,188 INFO  [io.quarkus] (main) Profile prod activated. 
# [2023-08-10T19:08:59.191Z] 2023-08-10 15:08:59,188 INFO  [io.quarkus] (main) Installed features: [cdi, resteasy, smallrye-context-propagation, vertx]
# [2023-08-10T19:08:59.209Z] Worker process started and initialized.
# [2023-08-10T19:09:04.023Z] Host lock lease acquired by instance ID '000000000000000000000000747C0C10'.

To test, go to http://localhost:7071/api/my-func/hello. The hello part of the path can be replaced with any path from your REST API.

Deployment

To deploy you’ll need to create an Azure Group and Function Application.

# login
$ az login

# list subscriptions
$ az account list -o table

# set active subscription.  You do not have to do this if you only have one subscription
$ az account set --subscription <SUBSCRIPTION_ID>

# create an Azure Resource Group 
az group create -n rg-quarkus-functions \
  -l eastus

# create an Azure Storage Account (required for Azure Functions App)
az storage account create -n sargquarkusfunctions2023 \
  -g rg-quarkus-functions \
  -l eastus

# create an Azure Functions App
az functionapp create -n my-app-quarkus \
  -g rg-quarkus-functions \
  --consumption-plan-location eastus\
  --os-type Linux \
  --runtime custom \
  --functions-version 4 \
  --storage-account sargquarkusfunctions2023 

Make sure that the os-type is Linux!!! Note that your Function Application name must be unique and may collide with others.

Now you can deploy your application. Again, make sure you are in the my-app directory!

$ func azure functionapp publish my-app-quarkus

# Getting site publishing info...
# Uploading package...
# Uploading 15.32 MB 
# Upload completed successfully.
# Deployment completed successfully.
# Syncing triggers...
# Functions in my-app-quarkus:
#    my-func - [httpTrigger]
#         Invoke url: https://my-app-quarkus.azurewebsites.net/api/{*path}

A successful deployment will tell you how to access the app. Just remember the root context will always be /api/{functionName}.

Quarkus Live Coding and Continuous Testing with Lambda

Last week I merged Quarkus Live Coding and Continuous Testing support for Lambda, and it will be available in Quarkus 2.3. I’m excited to see how it’s received by the Quarkus community. The PR allows you to run Quarkus Dev or Test mode locally within a mock Lambda environment automatically set up by the Quarkus build and runtime. This works when writing raw lambdas, with Quarkus’s Lambda + HTTP support, as well as with Funqy and Lambda. For all these integration points with Lambda, Quarkus provides an HTTP server that you can invoke manually with curl, or via REST Assured or another HTTP client if you’re writing tests.

To completely understand what is going on here and how it works, I need to explain a little bit about how Lambdas work in general. Lambdas (at least Java Lambdas) do not receive or process requests directly. Instead, an AWS Lambda event server is polled by your Lambda deployment for new requests that need to be processed. The AWS Lambda Java runtime has this poll loop built in. For native binary deployments with GraalVM, Quarkus re-implemented this poll loop itself.

The new Quarkus Live Coding and Continuous Testing support I merged adds a new Mock Event Server that simulates the AWS Lambda event server. When running in Dev or Test mode, Quarkus boots up this Mock Event Server and runs your lambdas within the Quarkus poll loop. The Mock Event Server starts an HTTP server on port 8080 in Dev mode and port 8081 in Test mode. You can send requests to the event server with any HTTP client at http://localhost:8080 and http://localhost:8081 respectively. The payload of the request is sent as a Lambda event, picked up by the poll loop, and delivered to your lambda code.
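
For example, here’s a hedged sketch of poking the Dev mode mock event server with the JDK HTTP client; the JSON payload is a placeholder for whatever event your lambda actually expects:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class MockEventServerClient {

    public static void main(String[] args) throws Exception {
        // The mock event server listens on 8080 in Dev mode (8081 in Test mode).
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString("{\"name\":\"Bill\"}")) // placeholder event payload
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        System.out.println(response.statusCode() + ": " + response.body());
    }
}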

For Quarkus’s Lambda + HTTP support, the same Mock Event Server is started as well. In this case the Mock Event Server looks like a normal HTTP server to the client, but under the covers it automatically converts HTTP requests to the JSON HTTP event payload that the AWS API Gateway uses. If you want to send raw API Gateway HTTP events, you can by posting to http://localhost:8080/_lambda_ or http://localhost:8081/_lambda_ depending on whether you are in dev or test mode. Sending raw API Gateway HTTP events is a great way to test AWS security locally.

Anyways, I hope you all like this support. I’m very curious how what I’ve done compares to running locally with the SAM CLI. Is it better? Worse? Does it need a few more tweaks? Hope to hear from you soon.

Anybody else hate PR reviews?

I’ll start by introducing some flame bait:

PR REVIEWS ARE WORTHLESS!

How often are you waiting days or even weeks for a PR review? And when it is about to be approved you get a merge conflict and have to start all over again? How often are PR reviews subjective rather than objective? How often are you subjected to anal-retentive suggestions that have nothing to do with the efficiency or effectiveness of your code and all to do with somebody’s pet peeve?

Nobody ever reviews my tests

I’ve been paid to write code for 31 years now. It’s never good to deal in absolutes like always or never or everybody or nobody, but I can honestly say that in the last 3 decades my tests have NEVER EVER been reviewed by ANYBODY. The only time I ever get a comment on testing is if I submit a PR that doesn’t have ANY test code. This is the biggest reason why I think PR reviews are completely worthless. Your testsuite is the most important part of your project. Without a good testsuite, that CI green checkmark you waited so long to get has a lot less meaning. Without a good testsuite, developers become superstitious and fearful about changing any code. If PR reviewers aren’t reviewing the tests, what the fuck is the point of the review?

PR Reviews slow down development for little gain

PR reviews almost always slow down development and can even bring it to a standstill. I’ve experienced this at multiple companies, so it’s not just a symptom of open source development at Red Hat. At the last company I worked at, you’d dread any inter-team PR, as you would wait weeks, even months, for a PR review to get on somebody’s Sprint board. On the Quarkus team, given the nature of open source development, the sheer volume of PRs is overwhelming for those who can actually review and approve PRs.

The results of the PR review backlog are merge conflicts, unintegrated code, and often paralyzed development. Every day you wait for an approval increases the probability of a merge conflict causing additional work you didn’t plan for. As PRs pile up, you end up with a bunch of unintegrated code that has not been tested together, with a CI green checkmark that could be days or weeks old. If you need to continuously work on the same part of the code base, your PR backlog can prevent you from working on bugs or improvements as you wait for your code to become part of master.

As a result of this, you find yourself looking for people who are rubber stamp reviewers. “Hey buddy, can you just approve my PR? The sprint ends tomorrow.” We are all guilty of it at some point or another. Obviously, this defeats the purpose of the PR review.

PR reviewers look for low hanging fruit to deny your PR

The PR review process is a chore for developers. A good review takes time. Time most developers don’t have. This is especially true for cross-team PRs where the target team often does not have your PR on their sprint board. Because of this time squeeze, reviewers look for low hanging fruit to deny your PR so they don’t have to approve it. This is especially common for PRs that come from external non-team members as trust hasn’t been established. Because of the lack of time for a comprehensive review, they look for a simple subjective suggestion to bump you back to the CI/approval queue. i.e. “I don’t like the name of this method, class, or field.” Sometimes I think reviewers feel compelled to find something wrong. Like they aren’t doing their job if they don’t (especially younger devs).

PR reviews discourage iterative development

I’m old. I spent a significant number of years without the PR review process in place. What I’ve found personally is that I rarely do much iterative development anymore because the CI/PR review process takes so long. In the past I’d be fixing a small bug and find a piece of code or API that I think was poorly written and just rewrite it. Refactorings often create bigger PRs. The thing is, developers hate reviewing large PRs as it takes a lot of time. So, any large PR might spend a lot longer in the PR review queue. Because of this, I don’t do refactorings much anymore because a long review wait queue makes merge conflicts inevitable.

How to fix the problem?

How can we fix this PR review bottleneck?

Approve and open up a new issue instead

If the PR passes CI, is tested correctly, but you believe the code could be better, instead of demanding a change, open a new ticket and schedule the code improvement for later. Better yet, assign the new ticket to yourself.

Automatic approval after fixed time period

If a PR has the CI greenlight but sits un-reviewed, approve it blindly after a fixed time period.

Stan Silvert suggested this in the comments

Suggest, not demand subjective changes

Identify when you are asking for a subjective change. Be honest: aren’t the majority of PR review comments just subjective suggestions? Instead, just comment with your suggestion, or use GitHub to code the change on the PR itself (GitHub lets the suggestion be committed directly).

Introduce an “Approve if CI passes” option

I think GitHub supports this, but the idea is to have an “approve if CI passes” option on PRs. Automate it if possible.

Review tests

Read what my PR is trying to accomplish then review my test code to see if it is comprehensive enough. If you don’t have time to do that, then you are better off just approving the PR and removing yourself as a bottleneck.

Trust your test suite and CI

To speed up PR approval, you have to trust your test suite and CI builds. If your test suite sucks, you are going to be more superstitious and fearful of changes to your codebase and will look for any excuse to deny a PR. If you are a project lead, spend a Sprint or two every once in a while focused on improving your testsuite and CI builds. Introduce backward compatibility tests. Performance regression tests. Review code coverage reports. If your testsuite is strong, it’s OK to just gloss over PRs, or even to blindly approve PRs from trusted individuals.

Encourage asynchronous, pre-PR feedback, but without Slack

As a PR submitter, you will ease a lot of pain if you talk about your design before you begin coding. Talk about API changes you want to make before getting to work. Get some input. You’ll save a lot of CI/PR review time if you do some of this upfront.

You’ll be even happier if you fish for asynchronous feedback. Instead of zoom, try sending out an email about what your ideas are. This allows people to give you feedback on their own time.

Why not just drop the PR review?

I’ll conclude my blog with some controversy: why not just drop PR reviews altogether? The trouble they cause outweighs their value. PR submitters look for rubber stampers. Reviewers don’t review tests: the most important part of your PR. Let your testsuite and CI become your automated, implicit PR reviewer.

If you’re too scared to make this leap, maybe develop a circle of trust where you have a subset of developers that can be trusted to self-approve and merge their PRs. Before git, PR reviews, and CI, back in the old JBoss days of the 2000s, our open source developers earned commit rights. If you earned the trust of the project leaders, you earned the right to commit and merge whatever you wanted. Granted, in those days we also didn’t have Continuous Development, but maybe there is some middle ground we can reach.

/votetokick pr-reviews

Finally, please note that this blog is tagged as “flame bait”. I’ve received some internal emails from people who are relatively shocked that I could even suggest or question the value of the PR review. Maybe I’m just trying to show people that they need to improve/streamline their review process, or just get out of my way and trust me? 🙂

QSON: New Java JSON parser for Quarkus

Quarkus has a new JSON parser and object mapper called QSON. It does bytecode generation for the Java classes you want to map to and from JSON around a small core library. I’m not going to get into details on how to use it, just visit the github page for more information.

I started this project because I noticed a huge startup time for Jackson relative to the other components within Quarkus applications. IIRC it was taking about 20% of the boot time for a simple JAX-RS microservice. So the initial prototype was to see how much I could improve boot time, and I was pleasantly surprised that the parser I implemented was a bit better than Jackson at runtime too!

The end result was that boot time improved about 20% for a simple Quarkus JAX-RS microservice. The runtime performance is also better in most instances too. Here are the numbers from a JMH benchmark I did:

Benchmark                           Mode  Cnt       Score   Error  Units
MyBenchmark.testParserAfterburner  thrpt    2  223630.276          ops/s
MyBenchmark.testParserJackson      thrpt    2  218748.065          ops/s
MyBenchmark.testParserQson         thrpt    2  251086.874          ops/s
MyBenchmark.testWriterAfterburner  thrpt    2  189243.175          ops/s
MyBenchmark.testWriterJackson      thrpt    2  168637.541          ops/s
MyBenchmark.testWriterQson         thrpt    2  177855.879          ops/s

These are runtime throughput numbers so the higher the better. Qson is better than regular Jackson and Jackson+Afterburner for json to object mapping (reading/parsing). For output, Qson is better than regular Jackson, but is a little behind Afterburner.

There’s still some work to do for Qson. One of the big things I need is a maven and gradle plugin to handle bytecode generation so that Qson can be used outside of Quarkus. We’ll also be adding more features to Qson like custom mappings. One thing to note though is that I won’t add features that hurt performance, increase memory footprint, or hurt boot time.

Over time, we’ll be integrating Qson as an option for any Quarkus extension that needs Json object mapping. So far, I’ve done integration with JAX-RS (Resteasy). Funqy is a prime candidate next.

Quarkus Funqy: Portable Function API

Quarkus Funqy is a new FaaS API that is portable across cloud runtimes like AWS Lambda, Azure Functions, Knative Events, and Google Cloud Functions. Or, if you’re deploying to a traditional environment, Funqy functions can work standalone as well.

public class MyClass {
   @Funq
   public Greeting greet(String name) {
     ...
   }
}

The idea of Funqy is simple. You write one Java method that has one optional input parameter and that returns optional output. Either primitives or POJOs are supported as input and output types. Under the covers, Funqy integrates with whatever plumbing is needed depending on your deployment environment. Funqy classes are Quarkus components and can accept injections using Spring DI or CDI annotations (through Arc).
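
For instance, here’s a small sketch of a Funqy function using CDI injection; the GreetingService and Greeting types are hypothetical, and the imports assume a recent Quarkus (jakarta.inject):

import io.quarkus.funqy.Funq;
import jakarta.inject.Inject;

public class GreetingFunction {

    @Inject
    GreetingService service;   // hypothetical CDI bean

    @Funq
    public Greeting greet(String name) {
        // One optional input, one optional output; Funqy handles the plumbing
        // for whatever binding (Lambda, Knative Events, etc.) the app is deployed to.
        return service.greet(name);
    }
}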

Funqy Bindings

Funqy can be deployed in a number of environments.

Motivations for Funqy

Why did we create Funqy? Part of our Quarkus Serverless Strategy was to make popular REST frameworks available for use in environments like AWS Lambda. When the Quarkus team was asked to integrate with Cloud Events, we felt like traditional REST frameworks didn’t quite fit even though Cloud Events has an HTTP binding. Funqy gave us an opportunity to not only unify under one API for FaaS development, but to greatly simplify the development API and to create a framework that was written for and optimized for the Quarkus platform.

REST vs Funqy

The author of this blog loves JAX-RS, was the founder of Resteasy, and even wrote a book on JAX-RS. REST is still the preferred architecture, and REST over HTTP is still a ubiquitous way of writing service APIs. The thing is, though, if you go out into the wild you’ll find that many application developers don’t follow REST principles. HTTP and REST frameworks are pretty much used as an RPC mechanism. Cool features in HTTP like cache-control and content negotiation are rarely used, and JSON is the de facto representation exchanged between client and server.

If all this is true, you don’t need 80% of the features that a spec like JAX-RS provides, nor do you want the overhead of supporting those unused features in your runtime. Since Funqy is a small, tightly constrained API, all the overhead of supporting unused REST features is ripped out. If you look at the implementation, it’s a very thin integration layer over the ridiculously fast Vert.x Web runtime. Each Funqy binding is a handful of classes. Funqy’s overhead is purely marshalling.

Who knows what the future holds for Funqy. It’s part of a quest to reduce the complexity and overhead of Java development as well as provide something that is portable to many environments so that you aren’t locked into a specific cloud vendor API. Enjoy everybody!
