Solon 4.0 ChatModel: A Practical Guide to Building LLM-Powered Applications

If you've ever tried integrating a large language model (LLM) into a Java application, you've probably written a lot of boilerplate: HTTP clients, JSON parsing, streaming handling, session management. Solon 4.0's ChatModel abstracts all of that away with a clean, builder-oriented API.

In this guide, I'll walk through building real, working AI features using ChatModel — from a simple chat call to a streaming chatbot with conversation memory.

1. What Is ChatModel?

ChatModel (package org.noear.solon.ai.chat) is a unified LLM client in Solon's AI ecosystem. Instead of writing raw HTTP calls for different model providers, you use a single API that supports:

Synchronous calls — one-shot request, full response
Streaming calls — reactive streaming via Project Reactor (Flux<ChatResponse>)
Tool/Function Calling — let the LLM invoke your Java methods
Chat Sessions — automatic conversation memory
Multi-modal messages — text, images, audio
Dialect adaptation — works with OpenAI, Ollama, Anthropic, Gemini, DashScope, and more

The best part? It uses a dialect pattern — you point it at any compatible LLM endpoint, and it adapts automatically.
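To make the dialect idea concrete, here is a minimal, library-free sketch of how such a pattern can work. The names below (DialectSketch, OpenAiStyleDialect, AnthropicStyleDialect, DialectRegistry) are hypothetical and are not Solon's actual classes. The sketch only illustrates the general technique: one interface per wire format, selected by a provider string. The one provider-specific detail it relies on is that Anthropic's Messages API requires max_tokens in the request body.

```java
import java.util.List;

// Hypothetical interface: each dialect knows one provider's request format.
interface DialectSketch {
    boolean matches(String provider);
    String buildBody(String model, String userText);
}

// OpenAI-compatible format; also used when no provider is given.
class OpenAiStyleDialect implements DialectSketch {
    public boolean matches(String provider) {
        return provider == null || "openai".equals(provider);
    }

    // No JSON escaping here: inputs are assumed to contain no quotes or control characters.
    public String buildBody(String model, String userText) {
        return "{\"model\":\"" + model + "\",\"messages\":[{\"role\":\"user\",\"content\":\""
                + userText + "\"}]}";
    }
}

// Anthropic-style format: the Messages API requires max_tokens.
class AnthropicStyleDialect implements DialectSketch {
    public boolean matches(String provider) {
        return "anthropic".equals(provider);
    }

    public String buildBody(String model, String userText) {
        return "{\"model\":\"" + model + "\",\"max_tokens\":1024,\"messages\":[{\"role\":\"user\",\"content\":\""
                + userText + "\"}]}";
    }
}

// Picks the first dialect that claims the provider name.
class DialectRegistry {
    private static final List<DialectSketch> DIALECTS =
            List.of(new AnthropicStyleDialect(), new OpenAiStyleDialect());

    static DialectSketch select(String provider) {
        for (DialectSketch d : DIALECTS) {
            if (d.matches(provider)) {
                return d;
            }
        }
        throw new IllegalArgumentException("No dialect for provider: " + provider);
    }
}
```

Calling code only ever talks to DialectRegistry.select(provider).buildBody(...), which is why switching providers can come down to a configuration change.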

2. Setting Up

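Add the dependency to your pom.xml (no parent POM needed; Solon works standalone). Without the Solon parent, define the solon.version property yourself or replace the placeholder with a concrete version: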

<dependency>
    <groupId>org.noear</groupId>
    <artifactId>solon-ai</artifactId>
    <version>${solon.version}</version>
</dependency>

This pulls in all built-in dialects (OpenAI, Ollama, Gemini, Anthropic, DashScope).

3. Configuration

3.1 Via YAML (Recommended)

solon.ai.chat:
  demo:
    apiUrl: "http://127.0.0.1:11434/api/chat"   # Full URL, not baseUrl
    provider: "ollama"                           # Dialect identifier
    model: "llama3.2"                            # Model name
    headers:
      x-demo: "demo1"

Then create a @Bean to get a ready-to-use ChatModel:

import org.noear.solon.ai.chat.ChatConfig;
import org.noear.solon.ai.chat.ChatModel;
import org.noear.solon.annotation.Bean;
import org.noear.solon.annotation.Configuration;
import org.noear.solon.annotation.Inject;

@Configuration
public class AiConfig {
    @Bean
    public ChatModel chatModel(@Inject("${solon.ai.chat.demo}") ChatConfig config) {
        return ChatModel.of(config).build();
    }
}

3.2 Programmatic Builder

Prefer code over config? Use the builder directly:

import java.time.Duration;

// Goes inside a @Configuration class such as AiConfig above
@Bean
public ChatModel chatModel() {
    return ChatModel.of("http://127.0.0.1:11434/api/chat")
            .standard("ollama")      // or .provider("ollama") pre-4.0
            .model("llama3.2")
            .timeout(Duration.ofSeconds(60))
            .build();
}

3.3 Supported Model Providers

The standard (or provider) field selects the dialect:

Standard	Example `apiUrl`	Models
`openai` (default)	`https://api.openai.com/v1/chat/completions`	GPT, DeepSeek, Qwen, GLM, Kimi, etc.
`ollama`	`http://127.0.0.1:11434/api/chat`	Any local Ollama model
`anthropic`	`https://api.anthropic.com/v1/messages`	Claude
`gemini`	`https://generativelanguage.googleapis.com/v1beta/models/...`	Gemini
`dashscope`	Aliyun DashScope endpoint	Qwen (DashScope native)

4. Synchronous Calls (The Simple Way)

The most basic use case — send a prompt and get a full response:

import java.io.IOException;

import org.noear.solon.ai.chat.ChatModel;
import org.noear.solon.ai.chat.ChatResponse;
import org.noear.solon.annotation.Inject;
import org.noear.solon.annotation.Component;

@Component
public class ChatService {
    @Inject
    ChatModel chatModel;

    public String ask(String question) throws IOException {
        ChatResponse resp = chatModel.prompt(question).call();
        return resp.getMessage().getContent();
    }
}

That's it: two lines of business code.

5. Streaming Calls (Real-Time Responses)

For chatbots and assistants, streaming is essential. ChatModel returns a Reactor Flux<ChatResponse>:

import java.io.IOException;

import org.noear.solon.ai.chat.ChatResponse;
import reactor.core.publisher.Flux;

public Flux<String> askStream(String question) throws IOException {
    return chatModel.prompt(question)
            .stream()
            .filter(ChatResponse::hasContent)       // skip empty chunks
            .map(resp -> resp.getMessage().getContent());
}

You can then subscribe, or — if you're using Solon Web Reactive — return the Flux directly to an SSE endpoint:

import java.io.IOException;

import org.noear.solon.ai.chat.ChatResponse;
import org.noear.solon.annotation.Mapping;
import org.noear.solon.web.sse.SseEvent;
import reactor.core.publisher.Flux;

@Mapping("/chat/stream")
public Flux<SseEvent> chatStream(String prompt) throws IOException {
    return chatModel.prompt(prompt)
            .stream()
            .filter(ChatResponse::hasContent)
            .map(resp -> new SseEvent()
                    .data(resp.getMessage().getContent()));
}

The streaming wire format is either standard SSE (text/event-stream) or newline-delimited JSON (application/x-ndjson), depending on the provider.
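If you are curious what the two wire formats look like underneath, here is a small stdlib-only sketch that extracts the JSON payloads from each. StreamLineParser is a hypothetical helper, not part of Solon. It relies on two conventions: OpenAI-compatible endpoints send "data:" lines and finish with "data: [DONE]", while Ollama sends one JSON document per line.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper showing how the two streaming formats differ on the wire.
class StreamLineParser {

    // SSE: payloads arrive on "data:" lines; blank lines separate events.
    static List<String> ssePayloads(String body) {
        List<String> out = new ArrayList<>();
        for (String raw : body.split("\\r?\\n")) {
            String line = raw.trim();
            if (!line.startsWith("data:")) {
                continue; // skip blank separators, comments, and other SSE fields
            }
            String payload = line.substring("data:".length()).trim();
            if (payload.equals("[DONE]")) {
                break; // end-of-stream sentinel used by OpenAI-compatible endpoints
            }
            out.add(payload);
        }
        return out;
    }

    // NDJSON: every non-empty line is one complete JSON document.
    static List<String> ndjsonPayloads(String body) {
        List<String> out = new ArrayList<>();
        for (String raw : body.split("\\r?\\n")) {
            String line = raw.trim();
            if (!line.isEmpty()) {
                out.add(line);
            }
        }
        return out;
    }
}
```

With ChatModel you never write this yourself; the point is only that both formats reduce to a sequence of JSON chunks, which is what lets one Flux<ChatResponse> API cover every provider.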

6. Conversation Memory with ChatSession

LLMs are stateless. To maintain conversation context, you need to pass history with each request. ChatSession handles this automatically.

6.1 Basic Session Usage

import org.noear.solon.ai.chat.ChatResponse;
import org.noear.solon.ai.chat.ChatSession;
import org.noear.solon.ai.chat.session.InMemoryChatSession;

ChatSession session = InMemoryChatSession.builder()
        .sessionId("user-123")
        .maxMessages(10)     // keep the last 10 messages
        .build();

// First turn
ChatResponse resp1 = chatModel.prompt("Hello!")
        .session(session)
        .call();

// Second turn — model remembers context
ChatResponse resp2 = chatModel.prompt("What did I just say?")
        .session(session)
        .call();
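To picture what a message cap does, here is a minimal sliding-window history in plain Java. BoundedHistorySketch is a hypothetical class, and I am assuming the simplest possible policy (drop the oldest message once the cap is exceeded). The real InMemoryChatSession may trim differently, for example by treating system messages specially.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sliding-window history: keeps only the newest maxMessages entries.
class BoundedHistorySketch {
    private final int maxMessages;
    private final Deque<String[]> messages = new ArrayDeque<>(); // each entry is {role, content}

    BoundedHistorySketch(int maxMessages) {
        this.maxMessages = maxMessages;
    }

    // Appends a message, then drops the oldest ones until the cap is respected.
    void add(String role, String content) {
        messages.addLast(new String[] {role, content});
        while (messages.size() > maxMessages) {
            messages.removeFirst();
        }
    }

    // What would be sent along with the next request, oldest first.
    List<String> snapshot() {
        List<String> out = new ArrayList<>();
        for (String[] m : messages) {
            out.add(m[0] + ": " + m[1]);
        }
        return out;
    }
}
```

Note that a user message and the assistant reply count as two entries, so a cap of 10 covers roughly five exchanges under this policy.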

6.2 Web Chat with Per-User Sessions

In a real web app, you'll want one session per user. Here's a controller that does exactly that:

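import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.noear.solon.ai.chat.ChatModel;
import org.noear.solon.ai.chat.ChatResponse;
import org.noear.solon.ai.chat.ChatSession;
import org.noear.solon.ai.chat.session.InMemoryChatSession;
import org.noear.solon.annotation.Controller;
import org.noear.solon.annotation.Inject;
import org.noear.solon.annotation.Mapping;
import org.noear.solon.web.sse.SseEvent;
import reactor.core.publisher.Flux;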

@Controller
public class ChatController {
    @Inject
    ChatModel chatModel;

    final Map<String, ChatSession> sessionMap = new ConcurrentHashMap<>();

    @Mapping("/chat")
    public Flux<SseEvent> chat(String sessionId, String prompt) throws IOException {
        ChatSession session = sessionMap.computeIfAbsent(sessionId,
                k -> InMemoryChatSession.builder().sessionId(k).build());

        return chatModel.prompt(prompt)
                .session(session)
                .options(o -> o.systemPrompt("You are a helpful and friendly assistant."))
                .stream()
                .filter(ChatResponse::hasContent)
                .map(resp -> new SseEvent().data(resp.getMessage().getContent()));
    }
}
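One thing to watch with a plain ConcurrentHashMap is that it never forgets a session, so a long-running server keeps every conversation in memory. Below is a rough sketch of one way to add idle eviction. SessionRegistrySketch is a hypothetical helper, generic over the session type so it stays library-free. The clock is passed in explicitly, which is a design choice I made purely to keep the example testable.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical per-user session registry with idle eviction.
class SessionRegistrySketch<S> {
    private static final class Entry<S> {
        final S session;
        volatile long lastAccess;

        Entry(S session, long lastAccess) {
            this.session = session;
            this.lastAccess = lastAccess;
        }
    }

    private final Map<String, Entry<S>> entries = new ConcurrentHashMap<>();
    private final Function<String, S> factory;
    private final long maxIdleMillis;

    SessionRegistrySketch(Function<String, S> factory, long maxIdleMillis) {
        this.factory = factory;
        this.maxIdleMillis = maxIdleMillis;
    }

    // Returns the existing session for this id, or creates one atomically.
    S get(String sessionId, long nowMillis) {
        Entry<S> e = entries.computeIfAbsent(sessionId,
                k -> new Entry<>(factory.apply(k), nowMillis));
        e.lastAccess = nowMillis;
        return e.session;
    }

    // Removes sessions idle for longer than maxIdleMillis; the count is approximate under concurrency.
    int evictIdle(long nowMillis) {
        int before = entries.size();
        entries.values().removeIf(e -> nowMillis - e.lastAccess > maxIdleMillis);
        return before - entries.size();
    }

    int size() {
        return entries.size();
    }
}
```

In a controller you would call get(sessionId, System.currentTimeMillis()) per request and run evictIdle from a scheduled task. For multi-node deployments, an external store such as Redis is the more natural fit.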

6.3 Built-in Session Implementations

Implementation	Storage	Use Case
`InMemoryChatSession`	Local Map	Dev, single-node
`FileChatSession`	File system	CLI tools, desktop apps
`RedisChatSession`	Redis	Production, distributed

7. Tuning Requests with ChatOptions

Control model behavior per-request with ChatOptions:

chatModel.prompt("Write a poem about Java")
        .options(o -> o
            .temperature(0.8)
            .max_tokens(500)
            .top_p(0.9)
            .systemPrompt("You are a creative poet."))
        .call();

Common options include:

Method	Description
`temperature(val)`	Sampling temperature (0.0–2.0)
`max_tokens(val)`	Max output tokens
`top_p(val)`	Nucleus sampling
`top_k(val)`	Top-K sampling
`frequency_penalty(val)`	Reduce repetition
`presence_penalty(val)`	Encourage new topics
`tool_choice(val)`	Tool selection mode: `none`, `auto`, `required`, or a specific tool name
`systemPrompt(val)`	System message for this request
`role(val)`	Agent role (v3.9.1+)
`instruction(val)`	Agent instruction (v3.9.1+)
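If temperature, top_k, and top_p feel abstract, this small sketch shows the textbook version of what they do to a token distribution. SamplingSketch is my own illustration, not Solon code, and real providers may apply these filters in a different order or with extra rules. It assumes temperature is greater than zero.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative sampling math: temperature scaling, then top-k and top-p filtering.
class SamplingSketch {

    // softmax(logits / temperature); lower temperature sharpens the distribution.
    static double[] softmax(double[] logits, double temperature) {
        double max = Double.NEGATIVE_INFINITY;
        for (double l : logits) {
            max = Math.max(max, l / temperature);
        }
        double sum = 0.0;
        double[] probs = new double[logits.length];
        for (int i = 0; i < logits.length; i++) {
            probs[i] = Math.exp(logits[i] / temperature - max); // subtract max for numeric stability
            sum += probs[i];
        }
        for (int i = 0; i < probs.length; i++) {
            probs[i] /= sum;
        }
        return probs;
    }

    // Indices of tokens that survive top-k, then top-p (nucleus), most probable first.
    static List<Integer> candidates(double[] probs, int topK, double topP) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < probs.length; i++) {
            order.add(i);
        }
        order.sort(Comparator.comparingDouble((Integer i) -> -probs[i]));

        List<Integer> kept = new ArrayList<>();
        double cumulative = 0.0;
        for (int idx : order) {
            if (kept.size() >= topK) {
                break; // top-k: never keep more than k tokens
            }
            kept.add(idx);
            cumulative += probs[idx];
            if (cumulative >= topP) {
                break; // top-p: stop once the kept mass reaches p
            }
        }
        return kept;
    }
}
```

The model then samples the next token only from the surviving candidates, which is why a low temperature with a small top_p gives focused, repeatable answers and higher values give more varied ones.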

8. Multi-Message Prompts

Sometimes you need more than a simple string. Use Prompt and ChatMessage:

import org.noear.solon.ai.chat.Prompt;
import org.noear.solon.ai.chat.message.ChatMessage;

Prompt prompt = Prompt.of(
    ChatMessage.ofSystem("You translate English to French."),
    ChatMessage.ofUser("Hello, how are you?"),
    ChatMessage.ofAssistant("Bonjour, comment allez-vous?"),
    ChatMessage.ofUser("What is your name?")
);

ChatResponse resp = chatModel.prompt(prompt).call();
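For intuition, a multi-message prompt like this typically ends up on the wire as a role-tagged array. The sketch below builds the OpenAI-style "messages" JSON by hand. MessagesJsonSketch is a hypothetical helper for illustration only; Solon does this serialization for you, and other dialects use different shapes.

```java
import java.util.List;

// Hypothetical serializer: turns {role, content} pairs into an OpenAI-style messages array.
class MessagesJsonSketch {

    // Minimal JSON string escaping for quotes, backslashes, and control characters.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Each message is a two-element array: {role, content}.
    static String toJson(List<String[]> messages) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < messages.size(); i++) {
            String[] m = messages.get(i);
            if (i > 0) {
                sb.append(',');
            }
            sb.append("{\"role\":\"").append(escape(m[0]))
              .append("\",\"content\":\"").append(escape(m[1])).append("\"}");
        }
        return sb.append(']').toString();
    }
}
```

The ordering matters: the system message sets the task, the earlier user/assistant pair acts as a worked example, and the final user message is the one the model answers.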

9. Putting It All Together: A Practical Example

Let's build a simple knowledge-aware chatbot — the kind of RAG-lite pattern you see in real projects. This example uses ChatMessage.ofUserAugment() to inject context into the prompt:

import org.noear.solon.ai.chat.ChatModel;
import org.noear.solon.ai.chat.ChatResponse;
import org.noear.solon.ai.chat.message.ChatMessage;
import org.noear.solon.annotation.Component;
import org.noear.solon.annotation.Inject;

@Component
public class KnowledgeChatbot {
    @Inject
    ChatModel chatModel;

    public String answer(String question, String referenceContext) throws Exception {
        // Augment the user message with reference context
        ChatMessage augmented = ChatMessage.ofUserAugment(question, referenceContext);

        ChatResponse resp = chatModel.prompt(augmented)
                .options(o -> o
                    .temperature(0.3)
                    .systemPrompt("You are a knowledgeable assistant. Answer based on the provided references."))
                .call();

        return resp.getMessage().getContent();
    }
}

This pattern — augment user input with context, then call the model — is the foundation of RAG (Retrieval-Augmented Generation) in Solon AI.
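Where does referenceContext come from? In a full RAG setup it comes from a vector search, but the shape of the flow is easy to show with a toy version. RagLiteSketch below is hypothetical: it ranks documents by naive keyword overlap and formats them into one augmented message. The "References / Question" layout is my own assumption, not the exact text ChatMessage.ofUserAugment() produces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy retrieve-then-augment flow; a real pipeline would use embeddings instead of word overlap.
class RagLiteSketch {

    // Lower-cased word set of a text.
    static Set<String> words(String text) {
        Set<String> out = new HashSet<>(Arrays.asList(text.toLowerCase().split("\\W+")));
        out.remove(""); // split can yield an empty leading token
        return out;
    }

    // Number of distinct words shared by the question and the document.
    static int overlap(String question, String doc) {
        Set<String> shared = words(question);
        shared.retainAll(words(doc));
        return shared.size();
    }

    // Returns up to topN documents, best overlap first.
    static List<String> retrieve(String question, List<String> docs, int topN) {
        List<String> ranked = new ArrayList<>(docs);
        ranked.sort(Comparator.comparingInt((String d) -> -overlap(question, d)));
        return new ArrayList<>(ranked.subList(0, Math.min(topN, ranked.size())));
    }

    // Builds the augmented user message: numbered references followed by the question.
    static String augment(String question, List<String> references) {
        StringBuilder sb = new StringBuilder("References:\n");
        for (int i = 0; i < references.size(); i++) {
            sb.append('[').append(i + 1).append("] ").append(references.get(i)).append('\n');
        }
        return sb.append("\nQuestion: ").append(question).toString();
    }
}
```

The string returned by augment(...) plays the role of the referenceContext-enriched message above: the model sees the references and the question together in a single user turn.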

10. What's Next?

ChatModel is just the entry point. Solon AI also offers:

Tool Calling — define @ToolMapping methods the LLM can invoke
Talent System — reusable capability modules (Skill-like)
Agents — ReActAgent and TeamAgent for multi-step reasoning
RAG — full pipeline with document loading, splitting, embedding, and retrieval
MCP Protocol — connect to MCP servers for external tools

For the full documentation, check out the official Solon AI guide:

👉 https://solon.noear.org/article/918 (Model construction)
👉 https://solon.noear.org/article/920 (API reference)

Have you tried integrating LLMs in Java? What's your biggest pain point? Let me know in the comments — I might cover it in the next post.