Tian AI Architecture Deep Dive: Building a Multi-Engine AI System
This post takes a deep technical look at the architecture of Tian AI — an open-source, self-evolving local AI system. If you haven't read the overview, check out Tian AI: The Self-Evolving AI System Powered by Qwen2.5.
Project Architecture Overview
Tian AI is organized as a multi-engine system with six core modules. Here's the complete architecture:
┌──────────────────────────────────────────────────────────────────┐
│ CLI / Web / Gradio UI │
├──────────────────────────────────────────────────────────────────┤
│ Flask API Layer │
├──────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Thinker (LLM Engine) │ │
│ │ ┌────────────┐ ┌────────────┐ ┌──────────────────┐ │ │
│ │ │ Fast Mode │ │ CoT Mode │ │ Deep Mode │ │ │
│ │ │ (single │ │ (step-by- │ │ (multi-view + │ │ │
│ │ │ pass) │ │ step) │ │ reflection) │ │ │
│ │ └────────────┘ └────────────┘ └──────────────────┘ │ │
│ └─────────────────────────┬────────────────────────────────┘ │
│ │ │
│ ┌─────────────────────────┴────────────────────────────────┐ │
│ │ Talker (Dialog) │ │
│ │ Short-term Memory | Long-term Memory | Emotion │ │
│ └─────────────────────────┬────────────────────────────────┘ │
│ │ │
│ ┌─────────────────────────┴────────────────────────────────┐ │
│ │ Knowledge Retriever (RAG) │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │
│ │ │ Concept │ │ Pattern │ │ LLM-Augment │ │ │
│ │ │ Extraction │ │ Matching │ │ Generation │ │ │
│ │ └──────────────┘ └──────────────┘ └──────────────┘ │ │
│ └─────────────────────────┬────────────────────────────────┘ │
│ │ │
│ ┌─────────────────────────┴────────────────────────────────┐ │
│ │ Agent Scheduler │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │
│ │ │ TaskQueue │ │ Priority │ │ Security │ │ │
│ │ │ (dependency │ │ Scheduler │ │ Whitelist │ │ │
│ │ │ sorting) │ │ │ │ │ │ │
│ │ └──────────────┘ └──────────────┘ └──────────────┘ │ │
│ └─────────────────────────┬────────────────────────────────┘ │
│ │ │
│ ┌─────────────────────────┴────────────────────────────────┐ │
│ │ Self-Evolution System │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │
│ │ │ AST Analysis │ │ LLM Suggest │ │ Auto-Patch │ │ │
│ │ │ (code scan) │ │ (improvement)│ │ (backup + │ │ │
│ │ │ │ │ │ │ verify) │ │ │
│ │ └──────────────┘ └──────────────┘ └──────────────┘ │ │
│ │ ┌──────────────┐ ┌──────────────┐ │ │
│ │ │ XP System │ │ Version │ │ │
│ │ │ (leveling) │ │ Manager │ │ │
│ │ └──────────────┘ └──────────────┘ │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
├──────────────────────────────────────────────────────────────────┤
│ LLMManager (Process Lifecycle) │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Process Spawn│ │ Health Check │ │ Auto-Restart │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
├──────────────────────────────────────────────────────────────────┤
│ llama.cpp Backend (Qwen2.5-1.5B GGUF) │
└──────────────────────────────────────────────────────────────────┘
The system uses Flask as the central web server, with Gradio as an alternative UI frontend. All communication between modules goes through Python function calls (direct in-process calls) rather than microservice RPC, keeping latency minimal on constrained devices.
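As a sketch of that direct-call pattern (route and function names here are assumptions, and the stub stands in for the real Thinker), a Flask handler simply invokes the engine function in-process:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def fast_think(query: str) -> str:
    # Placeholder for the real Thinker engine call
    return f"answer to: {query}"

@app.route("/chat", methods=["POST"])
def chat():
    query = request.get_json().get("query", "")
    # Direct in-process call — no serialization, no RPC hop
    return jsonify({"response": fast_think(query)})
```

Because every module shares one interpreter, a request touches the Thinker, Retriever, and Scheduler without ever crossing a process boundary.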
1. Thinker — The Three-Layer Reasoning Engine
The Thinker is the most critical module. It wraps the local LLM with three distinct reasoning strategies, each optimized for different query types.
Fast Mode (Default)
For simple queries — greetings, fact lookup, straightforward questions. Single pass through the LLM with minimal context.
# Simplified Fast Mode flow
def fast_think(query: str, context: str = "") -> str:
prompt = _build_simple_prompt(query, context)
# Single LLM call, no chaining
response = llm.generate(prompt, max_tokens=256, temperature=0.3)
return response
Characteristics:
- Single LLM call, no chaining
- Low temperature (0.3) for deterministic answers
- 256 max tokens for speed
- ~1-3 seconds on mobile hardware
Chain-of-Thought Mode
For problems that benefit from step-by-step reasoning. The LLM is prompted to reason aloud before answering.
# Simplified CoT Mode flow
def cot_think(query: str) -> str:
prompt = f"""Question: {query}
Let's solve this step by step:
1) First, I need to understand what's being asked...
2) Let me break down the key components...
3) Considering each part...
4) Therefore, the answer is:"""
full_response = llm.generate(prompt, max_tokens=512, temperature=0.5)
# Extract final answer from reasoning chain
answer = _extract_final_answer(full_response)
return answer
Key implementation details:
- Higher temperature (0.5) allows creative reasoning paths
- 512 max tokens to accommodate the reasoning chain
- Answer extraction uses regex patterns to find the final conclusion
- Context window management: the reasoning process is truncated if it exceeds the model's limit
Deep Mode
For complex analysis requiring multi-perspective evaluation and reflection. This is the most sophisticated mode.
# Simplified Deep Mode flow
def deep_think(query: str) -> dict:
# Step 1: Generate multiple perspectives
perspectives = [
_ask_perspective(query, "technical"),
_ask_perspective(query, "ethical"),
_ask_perspective(query, "practical")
]
# Step 2: Cross-perspective synthesis
synthesis_prompt = f"""
Original query: {query}
Perspectives gathered:
1. Technical: {perspectives[0]}
2. Ethical: {perspectives[1]}
3. Practical: {perspectives[2]}
Synthesize these perspectives into a comprehensive answer.
Note areas of agreement and disagreement.
Provide a balanced final assessment.
"""
synthesis = llm.generate(synthesis_prompt, max_tokens=768, temperature=0.7)
# Step 3: Self-reflection
reflection_prompt = f"""
Original query: {query}
My synthesized answer: {synthesis}
Critically evaluate your own answer. What might be missing?
What assumptions were made? Is the reasoning sound?
"""
reflection = llm.generate(reflection_prompt, max_tokens=256, temperature=0.4)
return {
"perspectives": perspectives,
"synthesis": synthesis,
"reflection": reflection,
"final": _combine(synthesis, reflection)
}
Implementation notes:
- 3-5 independent perspective generations using different prompt frames
- Each perspective call can run independently (parallelizable)
- Synthesis phase combines viewpoints and identifies conflicts
- Reflection phase adds a meta-cognitive layer
- Total: 4-6 LLM calls per deep query
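Since each perspective call is independent, the fan-out can be sketched with a thread pool (names assumed; the stub stands in for the real LLM call):

```python
from concurrent.futures import ThreadPoolExecutor

def _ask_perspective(query: str, frame: str) -> str:
    # Stand-in for an llm.generate(...) call with a frame-specific prompt
    return f"[{frame}] view on: {query}"

def gather_perspectives(query, frames=("technical", "ethical", "practical")):
    # pool.map preserves input order, so results line up with the frames
    with ThreadPoolExecutor(max_workers=len(frames)) as pool:
        return list(pool.map(lambda f: _ask_perspective(query, f), frames))
```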
The Thinker module also handles prompt caching via an LRU+TTL cache (PromptCache), which avoids regenerating responses for identical queries within a configurable time window.
2. Knowledge Retriever — RAG Implementation
The Knowledge Retriever implements a Retrieval-Augmented Generation (RAG) pipeline using a local SQLite database as the document store.
Database Architecture
The knowledge base is a pre-built SQLite database containing millions of entries:
-- Core tables
CREATE TABLE concepts (
id INTEGER PRIMARY KEY,
name TEXT UNIQUE,
domain TEXT,
description TEXT
);
CREATE TABLE qa_pairs (
id INTEGER PRIMARY KEY,
concept_id INTEGER REFERENCES concepts(id),
pattern_id INTEGER,
question TEXT,
answer TEXT,
confidence REAL DEFAULT 1.0
);
-- Full-text search index
CREATE VIRTUAL TABLE qa_fts USING fts5(
question, answer, concept_name,
content='qa_pairs',
content_rowid='id'
);
Retrieval Flow
User Query
↓
[1] Concept Extraction (keyword matching + NER)
↓
[2] FTS5 Search on SQLite
↓
[3] Score & Rank Results (BM25 + confidence weighting)
↓
┌─────────────────┐
│ confidence > 0.8 │──Yes──→ Return KB answer directly
└────────┬────────┘
│ No
↓
[4] Context Assembly (top-3 results as context)
↓
[5] LLM Augmented Generation
↓
Final Response
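The flow above can be condensed into a single function (assumed names; `search_kb` and `llm_generate` stand in for the real FTS5 search and LLM call):

```python
CONFIDENCE_THRESHOLD = 0.8

def retrieve(query, search_kb, llm_generate):
    hits = search_kb(query)  # [(answer, confidence), ...] ranked best-first
    if hits and hits[0][1] > CONFIDENCE_THRESHOLD:
        return hits[0][0]                  # High confidence: return KB answer directly
    # Low confidence: assemble top-3 hits as context and augment with the LLM
    context = "\n".join(answer for answer, _ in hits[:3])
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    return llm_generate(prompt)
```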
Performance
| Operation | Time | Notes |
|---|---|---|
| Concept extraction | ~0.01s | Regex + keyword matching |
| FTS5 search | ~0.02s | Indexed full-text search |
| Result ranking | ~0.01s | BM25 scoring |
| Total retrieval | ~0.04s | Without LLM call |
| LLM augmentation | ~1-3s | Depends on context size |
Key Design Decisions
SQLite over vector DB: No need for embeddings or vector similarity search. The structured QA pairs with FTS5 provide faster and more predictable results than embedding-based retrieval on a mobile device.
Confidence threshold of 0.8: Tuned empirically. Below this, the LLM augmentation adds significant value. Above it, the KB answer is already reliable.
30 question patterns per concept: Each concept has 30 pre-written question templates (e.g., "What is X?", "Explain X", "How does X work?"), ensuring flexible matching against diverse user inputs.
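A hypothetical sketch of how those per-concept templates expand (only three of the templates are quoted in the source, so the list here is illustrative):

```python
# Illustrative subset — the real system uses 30 templates per concept
PATTERNS = [
    "What is {x}?",
    "Explain {x}",
    "How does {x} work?",
]

def expand_patterns(concept: str) -> list:
    # Each concept fans out into one question row per template
    return [p.format(x=concept) for p in PATTERNS]
```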
3. Agent Scheduler — TaskQueue + Security Whitelist
The Agent Scheduler is the orchestration layer that routes tasks between engines, manages concurrency, and enforces security policies.
TaskQueue with Dependency Sorting
import threading
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

class Task:
def __init__(self, task_id, func, args, kwargs,
depends_on=None, priority=0, timeout=30):
self.id = task_id
self.func = func # Callable
self.args = args
self.kwargs = kwargs
self.depends_on = depends_on or [] # List of Task IDs
self.priority = priority
self.timeout = timeout
self.status = "pending" # pending → running → done/failed
self.result = None
class TaskQueue:
def __init__(self):
self.tasks = {} # task_id → Task
self.results = {} # task_id → result
self._lock = threading.Lock()
def add_task(self, task):
self.tasks[task.id] = task
def get_ready_tasks(self):
"""Return tasks whose dependencies are all met."""
ready = []
for tid, task in self.tasks.items():
if task.status != "pending":
continue
deps_met = all(
dep_id in self.results
for dep_id in task.depends_on
)
if deps_met:
ready.append(task)
# Sort by priority (higher = first)
ready.sort(key=lambda t: -t.priority)
return ready
def execute(self, max_workers=4):
"""Execute ready tasks with thread pool."""
with ThreadPoolExecutor(max_workers=max_workers) as pool:
futures = {}
            # Loop until every task has produced a result (done or failed);
            # `while self.tasks:` would spin forever since tasks are never removed
            while len(self.results) < len(self.tasks):
ready = self.get_ready_tasks()
if not ready:
if not futures:
break # Deadlock or done
# Wait for some futures to complete
done, _ = wait(futures, return_when=FIRST_COMPLETED)
self._collect_results(done, futures)
continue
for task in ready:
task.status = "running"
future = pool.submit(self._run_task, task)
futures[future] = task.id
done, _ = wait(futures, return_when=FIRST_COMPLETED)
self._collect_results(done, futures)
Security Whitelist
All Agent actions are filtered through a security whitelist that prevents unauthorized system operations:
class SecurityError(Exception):
    """Raised when an action violates the whitelist."""

class SecurityWhitelist:
ALLOWED_FUNCTIONS = {
"thinker.fast_think",
"thinker.cot_think",
"thinker.deep_think",
"knowledge.search",
"knowledge.query",
"memory.store",
"memory.retrieve",
"evolution.add_xp",
"evolution.check_level",
"evolution.apply_patch",
}
ALLOWED_PATHS = {
"/data/data/com.termux/files/home/miniGPT_project/Tian AI/",
"/tmp/tian_ai/",
}
ALLOWED_IMPORTS = {
"json", "os", "re", "datetime",
"sqlite3", "threading", "subprocess"
}
@classmethod
def validate_action(cls, action_name, path=None, imports=None):
if action_name not in cls.ALLOWED_FUNCTIONS:
raise SecurityError(f"Function {action_name} not allowed")
if path and not any(path.startswith(p) for p in cls.ALLOWED_PATHS):
raise SecurityError(f"Path {path} not allowed")
if imports:
bad = set(imports) - cls.ALLOWED_IMPORTS
if bad:
raise SecurityError(f"Import(s) not allowed: {bad}")
return True
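One step the source doesn't show is how a candidate snippet's imports are collected before the whitelist check; a sketch of that integration using the stdlib `ast` module:

```python
import ast

def collect_imports(source: str) -> set:
    # Gather top-level module names from both `import x` and `from x import y`
    tree = ast.parse(source)
    found = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module.split(".")[0])
    return found

ALLOWED_IMPORTS = {"json", "os", "re", "datetime", "sqlite3", "threading", "subprocess"}
bad = collect_imports("import socket\nimport json") - ALLOWED_IMPORTS
```

Anything left in `bad` would trigger the `SecurityError` path in `validate_action`.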
4. Self-Evolution System — AST Analysis + Auto-Patching
The Self-Evolution system is the most distinctive component. It enables Tian AI to analyze its own source code and apply improvements autonomously.
AST Analysis Pipeline
import ast
import asttokens # Optional, for source-level annotations
class CodeAnalyzer:
def __init__(self, project_root):
self.root = project_root
self.report = {
"files": 0,
"total_lines": 0,
"functions": 0,
"classes": 0,
"complexity": {},
"duplications": [],
"issues": []
}
def analyze_file(self, filepath):
with open(filepath) as f:
source = f.read()
tree = ast.parse(source)
# Count functions and classes
functions = [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]
classes = [n for n in ast.walk(tree) if isinstance(n, ast.ClassDef)]
# Calculate cyclomatic complexity per function
for func in functions:
complexity = self._calc_complexity(func)
self.report["complexity"][f"{filepath}:{func.name}"] = complexity
# Detect long functions (>50 lines)
for func in functions:
if func.end_lineno - func.lineno > 50:
self.report["issues"].append({
"type": "long_function",
"file": filepath,
"function": func.name,
"lines": func.end_lineno - func.lineno
})
# Detect duplicate code blocks
self._detect_duplicates(filepath, tree)
self.report["files"] += 1
self.report["total_lines"] += len(source.splitlines())
self.report["functions"] += len(functions)
self.report["classes"] += len(classes)
def _calc_complexity(self, func_node):
"""McCabe cyclomatic complexity."""
base = 1
for node in ast.walk(func_node):
if isinstance(node, (ast.If, ast.While, ast.For,
ast.ExceptHandler, ast.With,
ast.Assert)):
base += 1
elif isinstance(node, ast.BoolOp):
base += len(node.values) - 1
return base
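To see the counting rule in action, here is a standalone demo of the same McCabe logic applied to a small snippet:

```python
import ast

def calc_complexity(func_node) -> int:
    # Same rule as _calc_complexity above: +1 per branch point,
    # +N-1 for each boolean expression with N operands
    base = 1
    for node in ast.walk(func_node):
        if isinstance(node, (ast.If, ast.While, ast.For,
                             ast.ExceptHandler, ast.With, ast.Assert)):
            base += 1
        elif isinstance(node, ast.BoolOp):
            base += len(node.values) - 1
    return base

source = """
def check(x):
    if x > 0 and x < 10:
        return "small"
    for i in range(x):
        if i % 2:
            return i
"""
func = ast.parse(source).body[0]
print(calc_complexity(func))  # two ifs + one for + one `and` + base
```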
Auto-Patching System
import os
import shutil
from datetime import datetime

class PatchError(Exception):
    """Raised when a patch cannot be applied safely."""

class PatchEngine:
def __init__(self):
self.backup_dir = "/data/data/com.termux/files/home/miniGPT_project/Tian AI/backups/"
def apply_patch(self, filepath, old_code, new_code):
"""Apply a code patch with automatic backup."""
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
backup_path = f"{self.backup_dir}{os.path.basename(filepath)}.{timestamp}.bak"
# 1. Backup original
shutil.copy2(filepath, backup_path)
# 2. Apply the change
with open(filepath, 'r') as f:
full_source = f.read()
if old_code not in full_source:
raise PatchError("Old code not found — patch rejected")
new_source = full_source.replace(old_code, new_code, 1)
# 3. Verify syntax before saving
try:
compile(new_source, filepath, 'exec')
except SyntaxError as e:
raise PatchError(f"Syntax error in patch: {e}")
# 4. Save
with open(filepath, 'w') as f:
f.write(new_source)
return {
"status": "applied",
"backup": backup_path,
"file": filepath
}
The Full Evolution Loop
1. SCAN ──→ AST walk all .py files
↓
2. ANALYZE ──→ Complexity metrics, duplication detection, code smells
↓
3. SUGGEST ──→ Send analysis report to LLM with structured prompt
↓
4. DECIDE ──→ LLM returns patches (old code → new code)
↓
5. APPLY ──→ PatchEngine applies with backup + syntax verification
↓
6. VERIFY ──→ Run compile(), run basic assertions
↓
7. COMMIT ──→ If verified, git commit with auto-generated message
Each successful evolution cycle grants XP to the system, which contributes to leveling up and unlocking new capabilities.
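The seven steps can be condensed into a single driver function (the function names and the XP rate here are assumptions; the real implementations would be injected):

```python
def evolution_cycle(scan, analyze, suggest, apply_patch, verify, commit, add_xp):
    report = analyze(scan())             # 1-2: SCAN + ANALYZE
    patches = suggest(report)            # 3-4: SUGGEST + DECIDE (LLM)
    applied = []
    for patch in patches:
        result = apply_patch(patch)      # 5: APPLY (backup + syntax check)
        if verify(result):               # 6: VERIFY
            applied.append(result)
        # Failed patches are restored from backup by the PatchEngine
    if applied:
        commit(applied)                  # 7: COMMIT
        add_xp(len(applied) * 10)        # XP per successful patch (assumed rate)
    return applied
```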
5. LLMManager — Process Lifecycle Management
The LLMManager is responsible for starting, monitoring, and restarting the llama.cpp server process. This is critical because the LLM backend is a separate C++ process that can crash under memory pressure.
import logging
import subprocess
import time

import requests

logger = logging.getLogger(__name__)

class LLMManager:
def __init__(self, model_path, port=8080, threads=4, context=2048):
self.model_path = model_path
self.port = port
self.threads = threads
self.context = context
self.process = None
self._health_url = f"http://localhost:{port}/health"
self.restart_count = 0
self.max_restarts = 5
def start(self):
"""Spawn the llama-server process."""
cmd = [
"llama-server",
"-m", self.model_path,
"--port", str(self.port),
"-t", str(self.threads),
"-c", str(self.context),
"--mlock", # Prevent swapping
"--no-mmap", # Use malloc instead of mmap
]
self.process = subprocess.Popen(
cmd,
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL
)
def health_check(self):
"""Check if the LLM process is responsive."""
try:
resp = requests.get(self._health_url, timeout=3)
return resp.status_code == 200
        except requests.RequestException:
return False
def wait_until_ready(self, timeout=60):
"""Poll health endpoint until the model is loaded."""
start = time.time()
while time.time() - start < timeout:
if self.health_check():
return True
time.sleep(2)
return False
def restart(self):
"""Graceful restart with automatic retry."""
if self.restart_count >= self.max_restarts:
raise RuntimeError("Max restarts exceeded")
self.stop()
self.start()
if self.wait_until_ready():
self.restart_count += 1
return True
return False
def auto_recover(self):
"""Monitor and auto-restart on crash."""
while True:
if not self.health_check():
logger.warning("LLM process unresponsive — restarting...")
if not self.restart():
logger.error("Failed to restart LLM process")
break
time.sleep(10) # Check every 10 seconds
def stop(self):
"""Terminate the LLM process."""
if self.process and self.process.poll() is None:
self.process.terminate()
try:
self.process.wait(timeout=5)
except subprocess.TimeoutExpired:
self.process.kill()
Key design decisions:
- `--mlock` + `--no-mmap` prevents the OS from swapping the model to disk, which would cause catastrophic slowdowns
- A separate health-check thread runs every 10 seconds
- Max 5 restart attempts before giving up (prevents infinite crash loops)
- Process is launched with stdout/stderr suppressed to avoid filling up logs on the phone
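A wiring sketch (assumed; the source doesn't show this step): the `auto_recover` loop belongs on a daemon thread so it never blocks the main server:

```python
import threading

def start_monitor(manager):
    # Daemon thread: dies with the main process, never blocks shutdown
    t = threading.Thread(target=manager.auto_recover, daemon=True)
    t.start()
    return t
```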
6. PromptCache — LRU + TTL Caching Strategy
To avoid regenerating responses for identical queries (common in multi-turn conversations), the system implements a combined LRU+TTL cache:
import time
from collections import OrderedDict

class PromptCache:
def __init__(self, max_size=100, ttl_seconds=300):
self.cache = OrderedDict() # LRU ordering
self.max_size = max_size
self.ttl = ttl_seconds
self.timestamps = {} # key → timestamp
def get(self, key):
if key not in self.cache:
return None
# Check TTL
if time.time() - self.timestamps[key] > self.ttl:
self._evict(key)
return None
# Move to end (LRU update)
self.cache.move_to_end(key)
return self.cache[key]
def put(self, key, value):
# Evict oldest if at capacity
if len(self.cache) >= self.max_size:
oldest = next(iter(self.cache))
self._evict(oldest)
self.cache[key] = value
self.timestamps[key] = time.time()
self.cache.move_to_end(key)
def _evict(self, key):
del self.cache[key]
del self.timestamps[key]
Cache key composition: f"{mode}:{query}:{context_hash}" where mode is the thinking mode, query is the user input, and context_hash is a SHA256 of the conversation context.
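A sketch of that key recipe (truncating the digest to 16 hex characters is an assumption for readability, not something the source specifies):

```python
import hashlib

def make_cache_key(mode: str, query: str, context: str) -> str:
    # SHA-256 of the conversation context, truncated for compact keys
    context_hash = hashlib.sha256(context.encode("utf-8")).hexdigest()[:16]
    return f"{mode}:{query}:{context_hash}"
```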
Project Statistics
Here are the raw numbers from the codebase:
| Metric | Value |
|---|---|
| Python files | 770 |
| Total lines of code | 170,041 |
| Core modules | 6 (Thinker, Talker, Knowledge, Agent, Evolution, LLMManager) |
| Extension languages | 3 (C, C++, Java) |
| C files | 1 (tian_hash — fast hashing) |
| C++ files | 1 (tian_engine — performance engine) |
| Java files | 1 (tian_tools — Android tooling) |
| Knowledge Base size | Millions of indexed concepts |
| Supported LLM | Qwen2.5-1.5B (GGUF quantized) |
| Backend framework | Flask |
| Alternative UI | Gradio |
The project is hosted on GitHub at github.com/3969129510/tian-ai.
What's Next
The architecture is designed for extensibility. Future directions include:
- Plugin system — Hot-loadable agent plugins with sandboxed execution
- Multi-modal pipeline — Image/audio understanding via local models
- Distributed agents — Multiple Tian AI instances collaborating over LAN
- Federated evolution — Privacy-preserving code improvement across instances
- Android APK — Standalone app packaging (no Termux required)
Getting Involved
Tian AI is fully open source. Contributions, issues, and forks are welcome.
git clone https://github.com/3969129510/tian-ai
cd tian-ai
# Explore the architecture
ls -la modules/
head -100 Thinker.py
Support development:
USDT (TRC-20): TNeUMpbwWFcv6v7tYHmkFkE7gC5eWzqbrs
BTC: bc1ph7qnaqkx4pkg4fmucvudlu3ydzgwnfmxy7dkv3nyl48wwa03kmnsvpc2xv
Tian AI — Open Source. Local. Self-Evolving.
GitHub: github.com/3969129510/tian-ai