I spent a whole day debugging why my agent kept hallucinating tool calls. The model would output "Action: web_search" but forget the "Action Input" part, or mix up the format entirely. LangChainGo's ConversationalAgent expects a specific pattern: Thought, Action, Action Input. Local models struggle with it constantly.
The fix wasn't tuning the prompt. It was abandoning the framework entirely.
This post shows the pipeline architecture I ended up with: three specialized agents (research, write, edit) that pass structured data between each other using direct LLM calls. No agent framework parsing. No format gymnastics. It just works, and I should have done it from the start.
Stop Overcomplicating This
Single agents are like hiring one person to be your lawyer, accountant, and chef. They can do all three, but badly. Splitting into specialists gives you clearer prompts, better error isolation, and the ability to parallelize. Specialists also transfer: a fact-checker agent, say, works across content types without rewriting its prompt.
I've tried supervisor patterns and handoff mechanisms. They add complexity you rarely need. The pipeline pattern (Researcher → Writer → Editor → Output) is the easiest to implement correctly and maps cleanly to Go's concurrency model. Supervisors make sense when you need runtime judgment about which specialist to invoke. Handoffs work for conversational bots that pivot mid-stream. For document generation and ETL workflows, use a pipeline.
The problem with LangChainGo's agent framework is that it requires models to output this exact format:
Thought: I need to search for this
Action: web_search
Action Input: AI impact on software engineering
Local models mangle this constantly. They skip the Thought section, hallucinate nonexistent actions, or merge fields into gibberish. Direct LLM calls eliminate this parsing overhead. You call GenerateContent(), manually invoke tools when needed, and pass results back. The code is simpler, debugging is easier, and local models behave predictably. No more regex parsing nightmares.
The Code
Three agents using direct LLM calls with streaming support, built on a shared BaseAgent foundation. Here's the actual project structure:
2-multi-agent-pipeline/
├── main.go
├── internal/
│ ├── agents/
│ │ ├── base.go # Shared LLM calling logic
│ │ ├── simple_researcher.go # Research agent
│ │ ├── writer.go # Writer agent
│ │ ├── editor.go # Editor agent
│ │ ├── errors.go # Structured error types
│ │ ├── options.go # Configuration options
│ │ └── pipeline.go # Pipeline runner abstraction
│ └── tools/
│ └── search.go # Brave Search tool
├── go.mod
└── go.sum
Base Agent: The Foundation
All agents share common LLM interaction logic through a BaseAgent struct. This eliminates duplication and provides consistent streaming, retries, and logging:
// filename: base.go
package agents
import (
"context"
"fmt"
"log"
"strings"
"time"
"github.com/tmc/langchaingo/llms"
)
type StreamHandler func(chunk string)
// Agent is the common interface for all pipeline agents
type Agent interface {
ExecuteWithStream(ctx context.Context, input string, handler StreamHandler) (string, error)
}
type BaseAgent struct {
llm llms.Model
}
func NewBaseAgent(llm llms.Model) BaseAgent {
return BaseAgent{llm: llm}
}
func (b *BaseAgent) callLLM(
ctx context.Context,
prompt, system string,
temp float64,
maxTokens int,
handler StreamHandler,
logPrefix string,
) (string, error) {
content := []llms.MessageContent{
llms.TextParts(llms.ChatMessageTypeSystem, system),
llms.TextParts(llms.ChatMessageTypeHuman, prompt),
}
if handler != nil {
result, err := b.streamLLM(ctx, content, temp, maxTokens, handler, logPrefix)
if err == nil {
return result, nil
}
// Streaming failed: log it and fall back to a plain, non-streaming call.
log.Printf("[%s] Streaming failed, falling back: %v", logPrefix, err)
}
return b.simpleLLM(ctx, content, temp, maxTokens)
}
func (b *BaseAgent) streamLLM(
ctx context.Context,
content []llms.MessageContent,
temp float64,
maxTokens int,
handler StreamHandler,
logPrefix string,
) (string, error) {
var response strings.Builder
chunkCount := 0
_, err := b.llm.GenerateContent(ctx, content,
llms.WithTemperature(temp),
llms.WithMaxTokens(maxTokens),
llms.WithStreamingFunc(func(ctx context.Context, chunk []byte) error {
if len(chunk) == 0 {
if chunkCount < 5 && logPrefix != "" {
log.Printf("[%s] Empty chunk #%d received", logPrefix, chunkCount+1)
}
chunkCount++
return nil
}
str := string(chunk)
response.WriteString(str)
if handler != nil {
handler(str)
}
chunkCount++
if chunkCount == 1 && logPrefix != "" {
log.Printf("[%s] First chunk received (%d bytes): %.50s...", logPrefix, len(chunk), str)
}
return nil
}),
)
if err != nil {
return "", err
}
if logPrefix != "" {
log.Printf("[%s] Total chunks: %d, Total bytes: %d", logPrefix, chunkCount, response.Len())
}
return response.String(), nil
}
func (b *BaseAgent) simpleLLM(
ctx context.Context,
content []llms.MessageContent,
temp float64,
maxTokens int,
) (string, error) {
const maxRetries = 3
var lastErr error
for i := 0; i < maxRetries; i++ {
resp, err := b.llm.GenerateContent(ctx, content,
llms.WithTemperature(temp),
llms.WithMaxTokens(maxTokens),
)
if err == nil {
if len(resp.Choices) == 0 {
return "", ErrNoResponse
}
return resp.Choices[0].Content, nil
}
lastErr = err
if i < maxRetries-1 {
log.Printf("[LLM] Retry %d/%d after error: %v", i+1, maxRetries, err)
time.Sleep(time.Second * time.Duration(i+1))
}
}
return "", fmt.Errorf("failed after %d retries: %w", maxRetries, lastErr)
}
Error Handling
Structured errors make debugging easier. Each agent failure includes context about which agent failed and in what phase:
// filename: errors.go
package agents
import (
"errors"
"fmt"
)
// Agent errors for better error handling
type Error struct {
Agent string
Phase string
Cause error
}
func (e *Error) Error() string {
return fmt.Sprintf("%s agent failed during %s: %v", e.Agent, e.Phase, e.Cause)
}
func (e *Error) Unwrap() error {
return e.Cause
}
// Common errors
var (
ErrNoResponse = errors.New("LLM returned no response")
ErrContextCancelled = errors.New("operation cancelled")
ErrSearchFailed = errors.New("search tool failed")
)
// Result holds agent output with metadata
type Result struct {
Content string
CharCount int
Phase string
}
func NewResult(content, phase string) Result {
return Result{
Content: content,
CharCount: len(content),
Phase: phase,
}
}
Configuration Options
Functional options pattern for agent configuration. Pretty standard Go approach:
// filename: options.go
package agents
import "github.com/tmc/langchaingo/llms"
// Option configures an Agent
type Option func(*config)
type config struct {
temperature float64
maxTokens int
systemPrompt string
logPrefix string
}
func defaultConfig() config {
return config{
temperature: 0.5,
maxTokens: 2000,
logPrefix: "Agent",
}
}
func WithTemperature(t float64) Option {
return func(c *config) {
c.temperature = t
}
}
func WithMaxTokens(n int) Option {
return func(c *config) {
c.maxTokens = n
}
}
func WithSystemPrompt(p string) Option {
return func(c *config) {
c.systemPrompt = p
}
}
func WithLogPrefix(p string) Option {
return func(c *config) {
c.logPrefix = p
}
}
// LLMCaller handles LLM interactions with configuration
type LLMCaller struct {
llm llms.Model
config config
}
func NewLLMCaller(llm llms.Model, opts ...Option) LLMCaller {
cfg := defaultConfig()
for _, opt := range opts {
opt(&cfg)
}
return LLMCaller{llm: llm, config: cfg}
}
The Search Tool
The researcher needs search. I use the Brave Search API because it returns clean results without the SEO noise of Google, and it's far less painful to integrate than Google's search APIs:
// filename: search.go
package tools
import (
"context"
"encoding/json"
"fmt"
"io"
"log"
"net/http"
"net/url"
"os"
"time"
)
type Search struct {
client *http.Client
apiKey string
}
func NewSearch() *Search {
return &Search{
client: &http.Client{Timeout: 10 * time.Second},
apiKey: os.Getenv("BRAVE_API_KEY"),
}
}
func (s *Search) Name() string { return "web_search" }
func (s *Search) Description() string {
return "Search the web using Brave Search API. Returns titles, descriptions, and URLs."
}
type BraveResponse struct {
Web struct {
Results []struct {
Title string `json:"title"`
URL string `json:"url"`
Description string `json:"description"`
} `json:"results"`
} `json:"web"`
}
func (s *Search) Call(ctx context.Context, input string) (string, error) {
if s.apiKey == "" {
return s.mockResults(input), nil
}
return s.braveSearch(ctx, input)
}
func (s *Search) braveSearch(ctx context.Context, input string) (string, error) {
params := url.Values{"q": {input}, "count": {"10"}}
apiURL := "https://api.search.brave.com/res/v1/web/search?" + params.Encode()
req, err := http.NewRequestWithContext(ctx, "GET", apiURL, nil)
if err != nil {
return "", fmt.Errorf("create request: %w", err)
}
req.Header.Set("X-Subscription-Token", s.apiKey)
req.Header.Set("Accept", "application/json")
resp, err := s.client.Do(req)
if err != nil {
return "", fmt.Errorf("search request: %w", err)
}
defer resp.Body.Close()
body, err := io.ReadAll(resp.Body)
if err != nil {
return "", fmt.Errorf("read response: %w", err)
}
if resp.StatusCode != http.StatusOK {
return "", fmt.Errorf("API status %d: %s", resp.StatusCode, string(body))
}
log.Printf("[Brave] Response: %.300s...", string(body))
var braveResp BraveResponse
if err := json.Unmarshal(body, &braveResp); err != nil {
return "", fmt.Errorf("parse results: %w", err)
}
return s.formatResults(input, braveResp.Web.Results), nil
}
func (s *Search) formatResults(
query string,
results []struct {
Title string `json:"title"`
URL string `json:"url"`
Description string `json:"description"`
},
) string {
formatted := make([]map[string]string, len(results))
for i, r := range results {
formatted[i] = map[string]string{
"title": r.Title,
"snippet": r.Description,
"source": r.URL,
}
}
output := map[string]interface{}{
"query": query,
"results": formatted,
}
data, _ := json.Marshal(output)
return string(data)
}
func (s *Search) mockResults(input string) string {
return fmt.Sprintf(`{"query": "%s", "results": [
{"title": "Guide to %s", "snippet": "Key facts about %s", "source": "example.com"},
{"title": "%s - Wikipedia", "snippet": "Encyclopedia article", "source": "wikipedia.org"},
{"title": "Research on %s", "snippet": "Academic papers", "source": "research-db.example"}
]}`, input, input, input, input, input)
}
The Agents
Each specialized agent embeds BaseAgent and focuses on its specific task:
// filename: simple_researcher.go
package agents
import (
"context"
"fmt"
"log"
"github.com/tmc/langchaingo/llms"
langchaintools "github.com/tmc/langchaingo/tools"
)
type SimpleResearcherAgent struct {
BaseAgent
tools []langchaintools.Tool
}
func NewSimpleResearcher(llm llms.Model, tools []langchaintools.Tool) *SimpleResearcherAgent {
return &SimpleResearcherAgent{
BaseAgent: NewBaseAgent(llm),
tools: tools,
}
}
func (r *SimpleResearcherAgent) findSearchTool() langchaintools.Tool {
for _, t := range r.tools {
if t.Name() == "web_search" {
return t
}
}
return nil
}
func (r *SimpleResearcherAgent) ExecuteWithStream(
ctx context.Context,
topic string,
handler StreamHandler,
) (string, error) {
prompt := r.buildPrompt(ctx, topic)
return r.callLLM(ctx, prompt, "You are a research specialist.", 0.5, 2000, handler, "Researcher")
}
func (r *SimpleResearcherAgent) buildPrompt(ctx context.Context, topic string) string {
searchResults := r.getSearchResults(ctx, topic)
if searchResults == "" {
return fmt.Sprintf(`Research "%s". Provide comprehensive notes with facts, statistics, trends, and sources.`, topic)
}
return fmt.Sprintf(`Research "%s" based on these search results:
%s
Provide comprehensive notes with facts, statistics, trends, and sources.`, topic, searchResults)
}
func (r *SimpleResearcherAgent) getSearchResults(ctx context.Context, topic string) string {
searchTool := r.findSearchTool()
if searchTool == nil {
return ""
}
results, err := searchTool.Call(ctx, topic)
if err != nil {
log.Printf("[Search] Failed: %v", err)
return ""
}
log.Printf("[Search] Got %d chars", len(results))
return results
}
Writer and Editor follow the same pattern with different prompts and temperatures:
// filename: writer.go
package agents
import (
"context"
"fmt"
"github.com/tmc/langchaingo/llms"
)
type WriterAgent struct {
BaseAgent
}
func NewWriter(llm llms.Model) *WriterAgent {
return &WriterAgent{BaseAgent: NewBaseAgent(llm)}
}
func (w *WriterAgent) ExecuteWithStream(
ctx context.Context,
research string,
handler StreamHandler,
) (string, error) {
prompt := fmt.Sprintf(`Transform these research notes into an engaging article:
%s
Write with compelling introduction, clear section headings (Markdown ##), factual accuracy, strong conclusion, and professional tone.`, research)
return w.callLLM(ctx, prompt, "You are a professional writer.", 0.7, 2000, handler, "Writer")
}
// filename: editor.go
package agents
import (
"context"
"fmt"
"github.com/tmc/langchaingo/llms"
)
type EditorAgent struct {
BaseAgent
}
func NewEditor(llm llms.Model) *EditorAgent {
return &EditorAgent{BaseAgent: NewBaseAgent(llm)}
}
func (e *EditorAgent) ExecuteWithStream(
ctx context.Context,
draft string,
handler StreamHandler,
) (string, error) {
prompt := fmt.Sprintf(`Edit this article for grammar, clarity, structure, and tone:
%s
Output the polished version directly. No commentary.`, draft)
return e.callLLM(ctx, prompt, "You are a meticulous editor.", 0.3, 2000, handler, "Editor")
}
Pipeline Runner
For more complex workflows, the PipelineRunner abstraction lets you compose agents declaratively:
// filename: pipeline.go
package agents
import (
"context"
"fmt"
)
// PipelineStep represents one stage in the pipeline
type PipelineStep struct {
Name string
Agent Agent
Prompt func(string) string // Transform previous output into prompt
}
// PipelineRunner executes a sequence of agents
type PipelineRunner struct {
steps []PipelineStep
}
func NewPipelineRunner(steps ...PipelineStep) *PipelineRunner {
return &PipelineRunner{steps: steps}
}
// Run executes the pipeline sequentially
func (p *PipelineRunner) Run(
ctx context.Context,
initialInput string,
onStep func(name, output string),
) (string, error) {
input := initialInput
for _, step := range p.steps {
select {
case <-ctx.Done():
return "", &Error{Agent: step.Name, Phase: "execution", Cause: ErrContextCancelled}
default:
}
prompt := input
if step.Prompt != nil {
prompt = step.Prompt(input)
}
output, err := step.Agent.ExecuteWithStream(ctx, prompt, nil)
if err != nil {
return "", &Error{Agent: step.Name, Phase: "execution", Cause: err}
}
if onStep != nil {
onStep(step.Name, output)
}
input = output
}
return input, nil
}
// RunWithStream executes with streaming for the final step only
func (p *PipelineRunner) RunWithStream(
ctx context.Context,
initialInput string,
handler StreamHandler,
) (string, error) {
if len(p.steps) == 0 {
return "", fmt.Errorf("no steps in pipeline")
}
input := initialInput
// Run all but last step without streaming
for _, step := range p.steps[:len(p.steps)-1] {
output, err := step.Agent.ExecuteWithStream(ctx, input, nil)
if err != nil {
return "", &Error{Agent: step.Name, Phase: "execution", Cause: err}
}
input = output
}
// Final step with streaming
lastStep := p.steps[len(p.steps)-1]
return lastStep.Agent.ExecuteWithStream(ctx, input, handler)
}
Main Orchestrator
The pipeline wires everything together with detailed logging. Here's what that looks like:
// filename: main.go
package main
import (
"context"
"fmt"
"log"
"os"
"time"
"github.com/k1ng440/go-llm-demo/2-multi-agent-pipeline/internal/agents"
"github.com/k1ng440/go-llm-demo/2-multi-agent-pipeline/internal/tools"
"github.com/tmc/langchaingo/llms"
"github.com/tmc/langchaingo/llms/ollama"
langchaintools "github.com/tmc/langchaingo/tools"
)
type Pipeline struct {
researcher *agents.SimpleResearcherAgent
writer *agents.WriterAgent
editor *agents.EditorAgent
}
func NewPipeline() (*Pipeline, error) {
model := "minimax-m2.7:cloud" // Ollama Cloud; or "qwen3.5:9b" / "llama3.1:8b" for local
llm, err := ollama.New(
ollama.WithModel(model),
ollama.WithPredictMirostat(0),
)
if err != nil {
return nil, err
}
testCtx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
_, err = llm.Call(testCtx, "Hi", llms.WithTemperature(0.1))
if err != nil {
return nil, fmt.Errorf("model %s not available (run: ollama pull %s): %w", model, model, err)
}
searchTools := []langchaintools.Tool{tools.NewSearch()}
return &Pipeline{
researcher: agents.NewSimpleResearcher(llm, searchTools),
writer: agents.NewWriter(llm),
editor: agents.NewEditor(llm),
}, nil
}
func (p *Pipeline) Run(ctx context.Context, topic string) (string, error) {
log.Println("[1/3] Research...")
log.Println("[Streaming] Starting...")
research, err := p.researcher.ExecuteWithStream(ctx, topic, func(chunk string) {
fmt.Print(chunk)
os.Stdout.Sync()
})
fmt.Println()
if err != nil {
return "", fmt.Errorf("research: %w", err)
}
log.Printf(" -> %d chars", len(research))
log.Printf(" -> Preview: %.100s...", research)
log.Println("")
log.Println("[2/3] Writing...")
log.Printf(" -> Input: %d chars of research", len(research))
log.Println("[Streaming] Starting...")
draft, err := p.writer.ExecuteWithStream(ctx, research, func(chunk string) {
fmt.Print(chunk)
os.Stdout.Sync()
})
fmt.Println()
if err != nil {
return "", fmt.Errorf("writing: %w", err)
}
log.Printf(" -> %d chars", len(draft))
log.Printf(" -> Preview: %.100s...", draft)
log.Println("")
log.Println("[3/3] Editing...")
log.Printf(" -> Input: %d chars of draft", len(draft))
log.Println("[Streaming] Starting...")
final, err := p.editor.ExecuteWithStream(ctx, draft, func(chunk string) {
fmt.Print(chunk)
os.Stdout.Sync()
})
fmt.Println()
if err != nil {
return "", fmt.Errorf("editing: %w", err)
}
log.Printf(" -> %d chars", len(final))
log.Printf(" -> Preview: %.100s...", final)
return final, nil
}
func main() {
log.Println("Multi-Agent Pipeline Demo")
log.Println("=========================")
pipeline, err := NewPipeline()
if err != nil {
log.Fatal(err)
}
topic := "The impact of AI on software engineering workflows"
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
defer cancel()
result, err := pipeline.Run(ctx, topic)
if err != nil {
log.Fatal(err)
}
fmt.Println("")
fmt.Println("=========================")
fmt.Println("FINAL OUTPUT")
fmt.Println("=========================")
fmt.Println(result)
fmt.Printf("\nTotal: %d characters\n", len(result))
}
To actually run this:
# Get a Brave Search API key from https://brave.com/search/api/
export BRAVE_API_KEY="your-api-key-here"
# Pull a local model (skip if using Ollama Cloud)
ollama pull qwen3.5:9b
# Run the pipeline
go run main.go
For Ollama Cloud models like minimax-m2.7:cloud, you need a Pro or Max subscription. Local models like qwen3.5:9b or llama3.1:8b run on your own hardware.
Without BRAVE_API_KEY, the search tool returns mock data, which is fine for testing.
Handling Failures
The naive pipeline fails on any error. Production needs fallback logic.
Retry the research step a few times before giving up. Here's how:
// filename: main.go
func (p *Pipeline) RunWithFallback(ctx context.Context, topic string) (string, error) {
var research string
var err error
for i := 0; i < 3; i++ {
research, err = p.researcher.ExecuteWithStream(ctx, topic, nil)
if err == nil {
break
}
log.Printf("[Pipeline] Research attempt %d failed: %v", i+1, err)
time.Sleep(time.Second * 2)
}
if err != nil {
log.Println("[Pipeline] Using fallback research")
research = fmt.Sprintf("Basic information about: %s", topic)
}
// Continue the pipeline with whatever research we have.
draft, err := p.writer.ExecuteWithStream(ctx, research, nil)
if err != nil {
return "", fmt.Errorf("writing: %w", err)
}
return p.editor.ExecuteWithStream(ctx, draft, nil)
}
Sometimes you want intermediate results even if later stages fail:
// filename: main.go
type PipelineResult struct {
Research string
Draft string
Final string
Errors []error
Completed bool
}
func (p *Pipeline) RunPartial(ctx context.Context, topic string) PipelineResult {
result := PipelineResult{}
research, err := p.researcher.ExecuteWithStream(ctx, topic, nil)
if err != nil {
result.Errors = append(result.Errors, fmt.Errorf("research: %w", err))
return result
}
result.Research = research
draft, err := p.writer.ExecuteWithStream(ctx, research, nil)
if err != nil {
result.Errors = append(result.Errors, fmt.Errorf("writer: %w", err))
result.Draft = "[Failed to generate draft]"
result.Final = research // Return research as fallback
return result
}
result.Draft = draft
final, err := p.editor.ExecuteWithStream(ctx, draft, nil)
if err != nil {
result.Errors = append(result.Errors, fmt.Errorf("editor: %w", err))
result.Final = draft // Return draft if editor fails
return result
}
result.Final = final
result.Completed = true
return result
}
Parallel Execution
When agents can work independently, use goroutines. You'll need to import sync:
// filename: main.go
import "sync"
func (p *Pipeline) ResearchParallel(ctx context.Context, topics []string) ([]string, error) {
results := make([]string, len(topics))
errList := make([]error, len(topics))
var wg sync.WaitGroup
for i, topic := range topics {
wg.Add(1)
go func(idx int, t string) {
defer wg.Done()
results[idx], errList[idx] = p.researcher.ExecuteWithStream(ctx, t, nil)
}(i, topic)
}
wg.Wait()
var combined []string
for i, err := range errList {
if err != nil {
log.Printf("[Parallel] Topic %d failed: %v", i, err)
continue
}
combined = append(combined, results[i])
}
if len(combined) == 0 {
return nil, fmt.Errorf("all parallel research tasks failed")
}
return combined, nil
}
Don't Let Context Bleed
The biggest trap in multi-agent systems is earlier agents' reasoning confusing later ones. When agents share memory, you get contamination.
Use explicit handoffs with structured data. Here's what I mean:
// filename: writer.go
// ResearchOutput is the structured handoff from the researcher. This struct
// is illustrative: populate it however you like (a JSON-output prompt, or
// extracting fields from the researcher's notes after the fact).
type ResearchOutput struct {
Topic string
KeyFacts []string
Sources []string
Statistics []string
RawNotes string
}
func (w *WriterAgent) ExecuteFromResearch(
ctx context.Context,
research *ResearchOutput,
handler StreamHandler,
) (string, error) {
prompt := fmt.Sprintf(`
Write an article about "%s" using these research findings:
Key facts: %v
Sources: %v
Statistics: %v
Notes: %s`,
research.Topic,
research.KeyFacts,
research.Sources,
research.Statistics,
research.RawNotes,
)
return w.callLLM(ctx, prompt, "You are a professional writer.", 0.7, 2000, handler, "Writer")
}
Debugging
When things break, you need to know which agent failed and why. The BaseAgent already logs extensively, but you can add a tracer wrapper:
// filename: tracer.go
package observability
import (
"context"
"log"
"time"
)
type AgentTracer struct {
AgentName string
}
func (t *AgentTracer) Trace(
ctx context.Context,
input string,
fn func() (string, error),
) (string, error) {
start := time.Now()
log.Printf("[Trace] %s started | input: %d chars", t.AgentName, len(input))
output, err := fn()
duration := time.Since(start)
if err != nil {
log.Printf("[Trace] %s FAILED after %v | error: %v", t.AgentName, duration, err)
} else {
log.Printf("[Trace] %s completed in %v | output: %d chars", t.AgentName, duration, len(output))
}
return output, err
}
Picking Models (And Knowing When to Stop)
Not every agent needs the same model. Different models excel at different tasks.
Research is mostly factual lookup. Fast models (Minimax, Qwen3.5, Llama) do this well. Writing needs creativity and flow. Premium models (Claude Opus 4, GPT-4o) shine here. Editing is about following rules. Mid-tier models (Claude Sonnet, GPT-4o-mini) are sufficient.
The quality gap is task-dependent. A fast model extracting facts from Brave Search performs nearly as well as Opus 4. But ask it to write engaging prose and the gap is obvious.
Start with fast models everywhere. Upgrade individual agents to premium only when you see specific quality gaps. Usually this means keeping research on fast models, upgrading writing to premium if the output needs to be engaging, and using mid-tier for editing, or skipping the editor entirely if the writer is good enough.
More agents are not always better. Agent overhead is real: each one adds latency, and debugging complexity grows quadratically with handoffs. I follow one rule: one agent per distinct kind of work. Research, writing, and editing are different. "Research part A" and "Research part B" are the same task. Use parallel execution, not separate agents.
Things That Will Bite You
The working code is on GitHub if you want to see the full implementation.
Before you try to productionize this, watch out for these issues:
Context Window Limits
Each agent passes its full output to the next. A 4K research output becomes a 6K draft, then an editor prompt that includes both. Cumulative growth adds up fast. Most local models have 32K-128K context windows, so you won't hit the hard limit immediately, but you pay for every token in latency and cost. Track your per-stage token counts or your pipeline will balloon unnecessarily.
Go Concurrency Traps
The parallel research example above shares the same llms.Model across goroutines. Most LLM providers rate-limit by API key, not by connection. If you spawn 10 parallel researchers against Ollama Cloud, you'll hit quota errors. Add a semaphore or pool your model instances.
Streaming and Timeouts
The ExecuteWithStream pattern looks clean but complicates error handling. If the LLM connection drops mid-stream, you get a partial response that looks like success. Check your response length against expected ranges, or validate that the output has a proper ending marker before declaring victory.
Start with a 2-agent pipeline. Add the editor only when you see specific failure modes it would fix. Measure latency at each step. Fix the slowest agent first. That's it.