If you've ever tried integrating a large language model (LLM) into a Java application, you've probably written a lot of boilerplate: HTTP clients, JSON parsing, streaming handling, session management. Solon 4.0's ChatModel abstracts all of that away with a clean, builder-oriented API.
In this guide, I'll walk through building real, working AI features using ChatModel — from a simple chat call to a streaming chatbot with conversation memory.
1. What Is ChatModel?
ChatModel (package org.noear.solon.ai.chat) is a unified LLM client in Solon's AI ecosystem. Instead of writing raw HTTP calls for different model providers, you use a single API that supports:
- Synchronous calls — one-shot request, full response
- Streaming calls — reactive streaming via Project Reactor (Flux<ChatResponse>)
- Tool/Function Calling — let the LLM invoke your Java methods
- Chat Sessions — automatic conversation memory
- Multi-modal messages — text, images, audio
- Dialect adaptation — works with OpenAI, Ollama, Anthropic, Gemini, DashScope, and more
The best part? It uses a dialect pattern: you point it at any supported LLM endpoint, pick the matching dialect (openai by default), and ChatModel translates requests and responses into that provider's format.
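To make the dialect idea concrete, here is a minimal sketch in plain Java (JDK only). It is not Solon's internal code: the `Dialect` interface, `DialectRegistry`, and all method names are hypothetical, invented for illustration. The two request bodies follow the publicly documented OpenAI chat-completions and Ollama `/api/chat` shapes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of a dialect pattern; these are not Solon's classes.
interface Dialect {
    // Builds the provider-specific JSON body for a single user message.
    String buildRequest(String model, String userText);
}

class DialectRegistry {
    private final Map<String, Dialect> dialects = new LinkedHashMap<>();

    DialectRegistry() {
        // OpenAI-style chat completions body: model plus a messages array.
        dialects.put("openai", (model, text) -> "{" + common(model, text) + "}");
        // Ollama /api/chat body: same fields plus an explicit stream flag.
        dialects.put("ollama", (model, text) -> "{" + common(model, text) + ",\"stream\":false}");
    }

    // A null name selects "openai", mirroring the default noted in this article.
    Dialect resolve(String name) {
        Dialect d = dialects.get(name == null ? "openai" : name);
        if (d == null) throw new IllegalArgumentException("Unknown dialect: " + name);
        return d;
    }

    // Fields shared by both request shapes.
    private static String common(String model, String text) {
        return "\"model\":\"" + escape(model) + "\",\"messages\":[{\"role\":\"user\",\"content\":\""
                + escape(text) + "\"}]";
    }

    // Minimal JSON string escaping (quotes, backslashes, newlines only).
    private static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

The calling code only ever sees `resolve(name).buildRequest(...)`; adding a provider means registering one more entry rather than touching the call sites.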
2. Setting Up
Add the dependency to your pom.xml (no parent POM needed — Solon works standalone):
<dependency>
<groupId>org.noear</groupId>
<artifactId>solon-ai</artifactId>
<version>${solon.version}</version>
</dependency>
This pulls in all built-in dialects (OpenAI, Ollama, Gemini, Anthropic, DashScope).
3. Configuration
3.1 Via YAML (Recommended)
solon.ai.chat:
demo:
apiUrl: "http://127.0.0.1:11434/api/chat" # Full URL, not baseUrl
provider: "ollama" # Dialect identifier
model: "llama3.2" # Model name
headers:
x-demo: "demo1"
Then create a @Bean to get a ready-to-use ChatModel:
import org.noear.solon.ai.chat.ChatConfig;
import org.noear.solon.ai.chat.ChatModel;
import org.noear.solon.annotation.Bean;
import org.noear.solon.annotation.Configuration;
import org.noear.solon.annotation.Inject;
@Configuration
public class AiConfig {
@Bean
public ChatModel chatModel(@Inject("${solon.ai.chat.demo}") ChatConfig config) {
return ChatModel.of(config).build();
}
}
3.2 Programmatic Builder
Prefer code over config? Use the builder directly:
import java.time.Duration;
@Bean
public ChatModel chatModel() {
return ChatModel.of("http://127.0.0.1:11434/api/chat")
.standard("ollama") // or .provider("ollama") pre-4.0
.model("llama3.2")
.timeout(Duration.ofSeconds(60))
.build();
}
3.3 Supported Model Providers
The standard (or provider) field selects the dialect:
| Standard | Example apiUrl | Models |
|---|---|---|
| openai (default) | https://api.openai.com/v1/chat/completions | GPT, DeepSeek, Qwen, GLM, Kimi, etc. |
| ollama | http://127.0.0.1:11434/api/chat | Any local Ollama model |
| anthropic | https://api.anthropic.com/v1/messages | Claude |
| gemini | https://generativelanguage.googleapis.com/v1beta/models/... | Gemini |
| dashscope | Aliyun DashScope endpoint | Qwen (DashScope native) |
4. Synchronous Calls (The Simple Way)
The most basic use case — send a prompt and get a full response:
import java.io.IOException;
import org.noear.solon.ai.chat.ChatModel;
import org.noear.solon.ai.chat.ChatResponse;
import org.noear.solon.annotation.Inject;
import org.noear.solon.annotation.Component;
@Component
public class ChatService {
@Inject
ChatModel chatModel;
public String ask(String question) throws IOException {
ChatResponse resp = chatModel.prompt(question).call();
return resp.getMessage().getContent();
}
}
That's it: two lines of business code.
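Because `call()` is declared to throw `IOException`, callers often want to retry transient failures. Below is a minimal retry-with-backoff sketch using only the JDK. `RetryingCaller` and its parameters are hypothetical helpers, not part of Solon.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical helper (not part of Solon): retries an action that fails with IOException.
class RetryingCaller {
    static <T> T callWithRetry(Callable<T> action, int maxAttempts, long baseDelayMillis) throws Exception {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (IOException e) {
                last = e; // treated as transient; any other exception propagates immediately
                if (attempt < maxAttempts) {
                    // Exponential backoff: base, 2*base, 4*base, ...
                    Thread.sleep(baseDelayMillis << (attempt - 1));
                }
            }
        }
        throw last; // all attempts failed
    }
}
```

With the service above, usage might look like `RetryingCaller.callWithRetry(() -> chatModel.prompt(question).call().getMessage().getContent(), 3, 500)`.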
5. Streaming Calls (Real-Time Responses)
For chatbots and assistants, streaming is essential. ChatModel returns a Reactor Flux<ChatResponse>:
import java.io.IOException;
import org.noear.solon.ai.chat.ChatResponse;
import reactor.core.publisher.Flux;
public Flux<String> askStream(String question) throws IOException {
return chatModel.prompt(question)
.stream()
.filter(ChatResponse::hasContent) // skip empty chunks
.map(resp -> resp.getMessage().getContent());
}
You can then subscribe, or — if you're using Solon Web Reactive — return the Flux directly to an SSE endpoint:
import java.io.IOException;
import org.noear.solon.ai.chat.ChatResponse;
import org.noear.solon.web.sse.SseEvent;
import org.noear.solon.annotation.Mapping;
import reactor.core.publisher.Flux;
@Mapping("/chat/stream")
public Flux<SseEvent> chatStream(String prompt) throws IOException {
return chatModel.prompt(prompt)
.stream()
.filter(ChatResponse::hasContent)
.map(resp -> new SseEvent()
.data(resp.getMessage().getContent()));
}
The streaming protocol uses standard SSE (text/event-stream) or x-ndjson depending on the provider.
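The two wire formats differ mainly in framing. The sketch below, JDK only and not Solon's parser, extracts the JSON payloads from an already-buffered stream body in either format. `StreamFraming` and `payloads` are hypothetical names; the `[DONE]` sentinel is the end marker used by OpenAI-style SSE streams.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not Solon's parser): splits a stream body into JSON payloads.
class StreamFraming {
    // Simplification: each "data:" line is treated as one payload; multi-line SSE events are not joined.
    static List<String> payloads(String contentType, String body) {
        boolean sse = contentType.startsWith("text/event-stream");
        List<String> out = new ArrayList<>();
        for (String raw : body.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty()) continue;                 // blank lines separate SSE events
            if (sse) {
                if (!line.startsWith("data:")) continue;  // skip event:, id:, and comment lines
                line = line.substring(5).trim();
                if (line.equals("[DONE]")) break;         // OpenAI-style end-of-stream marker
            }
            out.add(line);                                 // NDJSON: the line itself is the payload
        }
        return out;
    }
}
```

Either way the consumer ends up with one JSON document per chunk, which is why a single `Flux<ChatResponse>` can sit on top of both.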
6. Conversation Memory with ChatSession
LLMs are stateless. To maintain conversation context, you need to pass history with each request. ChatSession handles this automatically.
6.1 Basic Session Usage
import org.noear.solon.ai.chat.ChatSession;
import org.noear.solon.ai.chat.session.InMemoryChatSession;
ChatSession session = InMemoryChatSession.builder()
.sessionId("user-123")
.maxMessages(10) // keep the last 10 messages
.build();
// First turn
ChatResponse resp1 = chatModel.prompt("Hello!")
.session(session)
.call();
// Second turn — model remembers context
ChatResponse resp2 = chatModel.prompt("What did I just say?")
.session(session)
.call();
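Conceptually, a bounded session is a sliding window over the message history. The following sketch shows that idea with plain JDK collections. It illustrates the general technique under my own assumptions and is not the implementation of `InMemoryChatSession`; `WindowedHistory` and its methods are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of a message window; not Solon's InMemoryChatSession.
class WindowedHistory {
    private final int maxMessages;
    private final Deque<String[]> messages = new ArrayDeque<>(); // each entry: {role, content}

    WindowedHistory(int maxMessages) {
        if (maxMessages < 1) throw new IllegalArgumentException("maxMessages must be >= 1");
        this.maxMessages = maxMessages;
    }

    // Appends a message and evicts the oldest ones that fall outside the window.
    void add(String role, String content) {
        messages.addLast(new String[] {role, content});
        while (messages.size() > maxMessages) {
            messages.removeFirst();
        }
    }

    // Returns the messages that would accompany the next request, oldest first.
    List<String> snapshot() {
        List<String> out = new ArrayList<>();
        for (String[] m : messages) {
            out.add(m[0] + ": " + m[1]);
        }
        return out;
    }
}
```

In this sketch the window is counted in messages, so a limit of 10 covers five user/assistant exchanges; size it with that in mind.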
6.2 Web Chat with Per-User Sessions
In a real web app, you'll want one session per user. Here's a controller that does exactly that:
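import org.noear.solon.ai.chat.ChatModel;
import org.noear.solon.ai.chat.ChatResponse;
import org.noear.solon.ai.chat.ChatSession;
import org.noear.solon.ai.chat.session.InMemoryChatSession;
import org.noear.solon.annotation.Controller;
import org.noear.solon.annotation.Inject;
import org.noear.solon.annotation.Mapping;
import org.noear.solon.web.sse.SseEvent;
import reactor.core.publisher.Flux;
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;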
@Controller
public class ChatController {
@Inject
ChatModel chatModel;
final Map<String, ChatSession> sessionMap = new ConcurrentHashMap<>();
@Mapping("/chat")
public Flux<SseEvent> chat(String sessionId, String prompt) throws IOException {
ChatSession session = sessionMap.computeIfAbsent(sessionId,
k -> InMemoryChatSession.builder().sessionId(k).build());
return chatModel.prompt(prompt)
.session(session)
.options(o -> o.systemPrompt("You are a helpful and friendly assistant."))
.stream()
.filter(ChatResponse::hasContent)
.map(resp -> new SseEvent().data(resp.getMessage().getContent()));
}
}
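One caveat with the controller above: a plain `ConcurrentHashMap` never evicts, so sessions accumulate for as long as the process runs. Here is a minimal sketch of a bounded, least-recently-used registry using only the JDK. `LruSessionRegistry` is a hypothetical helper, generic over the session type so that it assumes nothing about `ChatSession` itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper: keeps at most `capacity` sessions, evicting the least recently used one.
class LruSessionRegistry<S> {
    private final Map<String, S> sessions;

    LruSessionRegistry(final int capacity) {
        // accessOrder=true orders entries from least to most recently accessed.
        this.sessions = new LinkedHashMap<String, S>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, S> eldest) {
                return size() > capacity;
            }
        };
    }

    // LinkedHashMap is not thread-safe, so every access goes through the registry's lock.
    synchronized S getOrCreate(String sessionId, Function<String, S> factory) {
        S existing = sessions.get(sessionId); // get() also marks the entry as recently used
        if (existing != null) return existing;
        S created = factory.apply(sessionId);
        sessions.put(sessionId, created);     // may evict the eldest entry
        return created;
    }

    synchronized int count() {
        return sessions.size();
    }

    synchronized boolean contains(String sessionId) {
        return sessions.containsKey(sessionId); // containsKey does not change access order
    }
}
```

In the controller, `sessionMap.computeIfAbsent(sessionId, ...)` would become `registry.getOrCreate(sessionId, ...)` with the same factory lambda. For multi-node deployments an external store is the more usual answer.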
6.3 Built-in Session Implementations
| Implementation | Storage | Use Case |
|---|---|---|
| InMemoryChatSession | Local Map | Dev, single-node |
| FileChatSession | File system | CLI tools, desktop apps |
| RedisChatSession | Redis | Production, distributed |
7. Tuning Model Behavior with ChatOptions
Control model behavior per-request with ChatOptions:
chatModel.prompt("Write a poem about Java")
.options(o -> o
.temperature(0.8)
.max_tokens(500)
.top_p(0.9)
.systemPrompt("You are a creative poet."))
.call();
Common options include:
| Method | Description |
|---|---|
| temperature(val) | Sampling temperature (0.0–2.0) |
| max_tokens(val) | Max output tokens |
| top_p(val) | Nucleus sampling |
| top_k(val) | Top-K sampling |
| frequency_penalty(val) | Reduce repetition |
| presence_penalty(val) | Encourage new topics |
| tool_choice(val) | Force tool use: none, auto, required, or tool name |
| systemPrompt(val) | System message for this request |
| role(val) | Agent role (v3.9.1+) |
| instruction(val) | Agent instruction (v3.9.1+) |
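For intuition about what `temperature` and `top_p` do, here is a toy sketch over made-up logits, using only the JDK. It follows the textbook definitions of softmax with temperature and nucleus filtering; providers apply sampling server-side and their implementations may differ. `SamplingDemo` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical demo of sampling parameters; not part of Solon or any provider SDK.
class SamplingDemo {
    // Softmax with temperature: lower values sharpen the distribution, higher values flatten it.
    static double[] softmax(double[] logits, double temperature) {
        if (temperature <= 0) throw new IllegalArgumentException("temperature must be > 0");
        double max = Double.NEGATIVE_INFINITY;
        for (double l : logits) max = Math.max(max, l);
        double sum = 0;
        double[] probs = new double[logits.length];
        for (int i = 0; i < logits.length; i++) {
            probs[i] = Math.exp((logits[i] - max) / temperature); // subtract max for numerical stability
            sum += probs[i];
        }
        for (int i = 0; i < probs.length; i++) probs[i] /= sum;
        return probs;
    }

    // Nucleus (top-p) filter: keep the smallest set of most-probable tokens whose mass reaches topP.
    static List<Integer> nucleus(double[] probs, double topP) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < probs.length; i++) order.add(i);
        order.sort(Comparator.comparingDouble((Integer i) -> -probs[i])); // most probable first
        List<Integer> kept = new ArrayList<>();
        double mass = 0;
        for (int idx : order) {
            kept.add(idx);
            mass += probs[idx];
            if (mass >= topP) break;
        }
        return kept;
    }
}
```

With probabilities `{0.2, 0.5, 0.3}` and `topP = 0.7`, the filter keeps token indices 1 and 2 and drops index 0, so the least likely token can no longer be sampled.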
8. Multi-Message Prompts
Sometimes you need more than a simple string. Use Prompt and ChatMessage:
import org.noear.solon.ai.chat.Prompt;
import org.noear.solon.ai.chat.message.ChatMessage;
Prompt prompt = Prompt.of(
ChatMessage.ofSystem("You translate English to French."),
ChatMessage.ofUser("Hello, how are you?"),
ChatMessage.ofAssistant("Bonjour, comment allez-vous?"),
ChatMessage.ofUser("What is your name?")
);
ChatResponse resp = chatModel.prompt(prompt).call();
9. Putting It All Together: A Practical Example
Let's build a simple knowledge-aware chatbot — the kind of RAG-lite pattern you see in real projects. This example uses ChatMessage.ofUserAugment() to inject context into the prompt:
import org.noear.solon.ai.chat.ChatModel;
import org.noear.solon.ai.chat.ChatResponse;
import org.noear.solon.ai.chat.message.ChatMessage;
import org.noear.solon.annotation.Component;
import org.noear.solon.annotation.Inject;
@Component
public class KnowledgeChatbot {
@Inject
ChatModel chatModel;
public String answer(String question, String referenceContext) throws Exception {
// Augment the user message with reference context
ChatMessage augmented = ChatMessage.ofUserAugment(question, referenceContext);
ChatResponse resp = chatModel.prompt(augmented)
.options(o -> o
.temperature(0.3)
.systemPrompt("You are a knowledgeable assistant. Answer based on the provided references."))
.call();
return resp.getMessage().getContent();
}
}
This pattern — augment user input with context, then call the model — is the foundation of RAG (Retrieval-Augmented Generation) in Solon AI.
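To show the shape of the pattern end to end, the sketch below picks the most relevant snippets by naive keyword overlap and folds them into a single prompt string, using only the JDK. Both the scoring and the prompt template are assumptions for illustration: this article does not show what text `ChatMessage.ofUserAugment()` actually produces, and a real pipeline would typically rank by embeddings rather than word overlap. `RagLite` is a hypothetical class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical RAG-lite helper; the template below is NOT what Solon generates.
class RagLite {
    // Lowercases and splits on non-alphanumerics to get a set of words.
    static Set<String> words(String text) {
        Set<String> out = new HashSet<>(Arrays.asList(text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")));
        out.remove("");
        return out;
    }

    // Number of distinct words shared by the question and the document.
    static int overlap(Set<String> questionWords, String doc) {
        Set<String> d = words(doc);
        d.retainAll(questionWords);
        return d.size();
    }

    // Returns up to k documents sharing at least one word with the question, best match first.
    static List<String> retrieve(String question, List<String> docs, int k) {
        Set<String> q = words(question);
        List<String> ranked = new ArrayList<>();
        for (String doc : docs) {
            if (overlap(q, doc) > 0) ranked.add(doc);
        }
        // List.sort is stable, so equally scored documents keep their input order.
        ranked.sort((a, b) -> Integer.compare(overlap(q, b), overlap(q, a)));
        return new ArrayList<>(ranked.subList(0, Math.min(k, ranked.size())));
    }

    // Folds the references and the question into one prompt string (assumed template).
    static String augment(String question, List<String> references) {
        StringBuilder sb = new StringBuilder("References:\n");
        for (int i = 0; i < references.size(); i++) {
            sb.append("[").append(i + 1).append("] ").append(references.get(i)).append("\n");
        }
        return sb.append("\nQuestion: ").append(question).toString();
    }
}
```

The string returned by `augment` could be sent with `chatModel.prompt(...)`, or the retrieved snippets could be joined and passed as the `referenceContext` argument of `answer` above.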
10. What's Next?
ChatModel is just the entry point. Solon AI also offers:
- Tool Calling — define @ToolMapping methods the LLM can invoke
- Talent System — reusable capability modules (Skill-like)
- Agents — ReActAgent and TeamAgent for multi-step reasoning
- RAG — full pipeline with document loading, splitting, embedding, and retrieval
- MCP Protocol — connect to MCP servers for external tools
For the full documentation, check out the official Solon AI guide:
👉 https://solon.noear.org/article/918 (Model construction)
👉 https://solon.noear.org/article/920 (API reference)
Have you tried integrating LLMs in Java? What's your biggest pain point? Let me know in the comments — I might cover it in the next post.