Max · Personal Side Projects

Building LLM systems, end to end.

Agent harness, inference serving and routing, retrieval, data processing, evaluation, RL alignment: nine personal side projects chained into one complete LLM pipeline.

Overview

The journey of a token.

Nine projects, eight stations

AGENT

Multi-user agent: tool calling, browsing, sandboxed execution

— picobot

SERVING · ROUTING

One entry point that routes to and manages the whole model fleet

— vLLMux

INFERENCE

TensorRT acceleration and dynamic batching

— TensorRT Server

RETRIEVAL · RAG

Hybrid single-vector / multi-vector retrieval and reranking

— Tiny-RAGFlow

TOOLS

A unified LLM / Embedding / Rerank interface

— LLM Tools

DATA

Turning raw documents into clean model input

— file2md
— SEC 10-K Extraction

EVALUATION

Multi-task evaluation and LLM-as-judge

— llm-evals

ALIGNMENT · RL

Using RL to teach models when to answer and when to abstain

— BehaviorRL

Work

Nine projects, one pipeline.

From the agent to inference serving, retrieval, data processing, evaluation, and RL alignment: each project is one station on this LLM pipeline.

Agent

picobot

A multi-user web agent that chats, calls tools, operates on a workspace, and browses the web, with every conversation running in an isolated sandbox.
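
At the center of an agent like this is a step that routes the model's tool call to real code. The sketch below shows one common pattern: a registry that maps tool names to Python functions, plus a dispatcher that returns a JSON result for the model. It assumes the OpenAI-style shape `{"name": ..., "arguments": "<json string>"}`. The names `TOOLS`, `tool`, `add`, and `dispatch` are hypothetical and are not picobot's API. Sandboxing is left out.

```python
import json
from typing import Any, Callable

# Hypothetical tool registry: tool name -> Python callable.
TOOLS: dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function as a tool under its own name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def add(a: int, b: int) -> int:
    return a + b


def dispatch(tool_call: dict[str, Any]) -> str:
    """Run one tool call of the assumed form {"name": str, "arguments": json_str}.

    Returns a JSON string that would be sent back to the model as the tool result.
    """
    name = tool_call["name"]
    if name not in TOOLS:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        args = json.loads(tool_call["arguments"] or "{}")
        result = TOOLS[name](**args)
    except Exception as exc:  # report failures to the model instead of crashing
        return json.dumps({"error": str(exc)})
    return json.dumps({"result": result})
```

When a call fails, for example because of an unknown tool or a missing argument, the error goes back to the model as data. The model can then retry, and the agent loop keeps running.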

Serving · Routing

vLLMux

One-stop deployment, routing, monitoring, and evaluation for your vLLM cluster.
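
A gateway in front of several vLLM instances has to choose a backend for each request. One simple policy works in two steps. First, keep only the healthy backends that serve the requested model. Then pick the one with the fewest in-flight requests. The `Backend` and `Router` classes and the example URLs are hypothetical, and this is not necessarily how vLLMux routes.

```python
from dataclasses import dataclass, field


@dataclass
class Backend:
    url: str  # e.g. an OpenAI-compatible vLLM endpoint (hypothetical)
    models: set[str]
    in_flight: int = 0  # requests currently being served
    healthy: bool = True


@dataclass
class Router:
    backends: list[Backend] = field(default_factory=list)

    def pick(self, model: str) -> Backend:
        """Least-outstanding-requests choice among healthy backends serving `model`."""
        candidates = [b for b in self.backends if b.healthy and model in b.models]
        if not candidates:
            raise LookupError(f"no healthy backend serves {model!r}")
        best = min(candidates, key=lambda b: b.in_flight)
        best.in_flight += 1  # caller must call release() when the request finishes
        return best

    def release(self, backend: Backend) -> None:
        backend.in_flight = max(0, backend.in_flight - 1)
```

For example, take `Router([Backend("http://gpu0:8000", {"qwen"}), Backend("http://gpu1:8000", {"qwen", "llama"})])`. Two consecutive `pick("qwen")` calls land on different GPUs. A `llama` request can only go to `gpu1`.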

Inference

TensorRT Inference Server

A high-performance TensorRT-based inference server for Embedding / Reranker / NLI models.
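
Dynamic batching groups requests that arrive close together into one forward pass. The server waits until it has `max_batch_size` requests or until `max_wait_s` has elapsed, whichever comes first. Below is a minimal asyncio sketch of that idea. `DynamicBatcher` and its parameters are hypothetical. `run_batch` stands in for a batched TensorRT engine call, and this is not the server's actual code.

```python
import asyncio
from typing import Awaitable, Callable


class DynamicBatcher:
    """Collects single requests into batches for one engine call (hypothetical API)."""

    def __init__(self, run_batch: Callable[[list[str]], Awaitable[list[float]]],
                 max_batch_size: int = 8, max_wait_s: float = 0.005) -> None:
        self.run_batch = run_batch  # stand-in for a batched TensorRT engine call
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s
        self.queue: asyncio.Queue = asyncio.Queue()

    async def submit(self, item: str) -> float:
        """Enqueue one request and wait for its own result."""
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((item, fut))
        return await fut

    async def serve_forever(self) -> None:
        loop = asyncio.get_running_loop()
        while True:
            batch = [await self.queue.get()]  # block until a first request arrives
            deadline = loop.time() + self.max_wait_s
            # Keep collecting until the batch is full or the wait budget is spent.
            while len(batch) < self.max_batch_size:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), remaining))
                except asyncio.TimeoutError:
                    break
            items, futures = zip(*batch)
            try:
                results = await self.run_batch(list(items))
            except Exception as exc:  # fail every request in this batch
                for fut in futures:
                    if not fut.done():
                        fut.set_exception(exc)
                continue
            for fut, res in zip(futures, results):
                if not fut.done():  # the caller may have been cancelled
                    fut.set_result(res)
```

The two knobs trade latency for throughput. A longer `max_wait_s` fills larger batches and keeps the GPU busier, but the first request in each batch waits longer for its answer.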

Retrieval · RAG

Tiny-RAGFlow

A lightweight RAG framework: hybrid single-vector / multi-vector retrieval with no dependency on an external database.
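
Hybrid retrieval has to merge ranked lists from retrievers whose scores are not on the same scale. Reciprocal Rank Fusion (RRF) is one common way to do this without comparing scores at all. Each document's fused score is the sum of 1 / (k + rank) over the lists it appears in, and k = 60 is the usual default. The source does not say whether Tiny-RAGFlow uses RRF or weighted score fusion, so treat this as an illustrative sketch. `reciprocal_rank_fusion` is a hypothetical name, and the function uses plain Python with no database.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of doc IDs (best first) with RRF.

    score(d) = sum over lists containing d of 1 / (k + rank), with 1-based ranks.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

Suppose a single-vector search returns `["d3", "d1", "d7"]` and a multi-vector search returns `["d1", "d9", "d3"]`. Fusing them puts `d1` first, because it ranks high in both lists. A reranker would then rescore only the top of the fused list.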

Tools

LLM Tools

A unified interface for LLMs, embeddings, and rerankers.
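
A unified interface usually means one contract per capability, with interchangeable backends behind it. Here is a minimal sketch using `typing.Protocol`. The `Embedder` and `Reranker` protocols, their method names, and the toy `HashEmbedder` and `OverlapReranker` backends are all hypothetical. They are not LLM Tools' actual API, and real backends would call a model server.

```python
import hashlib
from typing import Protocol


class Embedder(Protocol):
    def embed(self, texts: list[str]) -> list[list[float]]: ...


class Reranker(Protocol):
    def rerank(self, query: str, docs: list[str]) -> list[float]: ...


class HashEmbedder:
    """Toy deterministic embedder (not semantic); stands in for a real model."""

    def __init__(self, dim: int = 8) -> None:
        self.dim = dim  # must be <= 32, the SHA-256 digest length in bytes

    def embed(self, texts: list[str]) -> list[list[float]]:
        return [[b / 255 for b in hashlib.sha256(t.encode()).digest()[: self.dim]] for t in texts]


class OverlapReranker:
    """Toy reranker: fraction of query words that also appear in the doc."""

    def rerank(self, query: str, docs: list[str]) -> list[float]:
        q = set(query.lower().split())
        return [len(q & set(d.lower().split())) / max(len(q), 1) for d in docs]


def top_doc(reranker: Reranker, query: str, docs: list[str]) -> str:
    # Depends only on the protocol, so any conforming backend can be swapped in.
    scores = reranker.rerank(query, docs)
    return docs[max(range(len(docs)), key=scores.__getitem__)]
```

Protocols use structural typing, so a backend never has to inherit from a base class. It only needs methods with the matching signatures.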

Data Processing

file2md

Converts a wide range of file formats into Markdown to feed downstream LLMs.
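
As an example of a single conversion step, here is how a CSV file could become a Markdown table using only the standard library. `csv_to_markdown` is a hypothetical helper, not file2md's code. Formats such as PDF or DOCX need dedicated parsers.

```python
import csv
import io


def csv_to_markdown(csv_text: str) -> str:
    """Render CSV text (first row = header) as a GitHub-flavored Markdown table."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""
    header, body = rows[0], rows[1:]
    width = len(header)

    def fmt(cells: list[str]) -> str:
        # Pad or trim to the header width; escape pipes so they don't split cells.
        cells = (cells + [""] * width)[:width]
        return "| " + " | ".join(c.replace("|", "\\|").strip() for c in cells) + " |"

    lines = [fmt(header), "| " + " | ".join("---" for _ in header) + " |"]
    lines += [fmt(r) for r in body]
    return "\n".join(lines)
```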

SEC 10-K Structured Extraction

Parses SEC 10-K annual reports into standardized JSON, with zero LLM cost and an average runtime under 1 second.
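
Since the extraction makes no LLM calls, it presumably relies on rules. One plausible rule-based step is splitting the filing's plain text at its "Item N." headings, such as Item 1, Item 1A, and Item 7. The regex, the `split_items` and `to_json` names, and the `{"items": ...}` JSON shape are all assumptions. Real filings also need handling for HTML, tables of contents, and inconsistent formatting.

```python
import json
import re

# Hypothetical pattern: an "Item 1A." style heading at the start of a line.
ITEM_RE = re.compile(r"^[ \t]*item[ \t]+(\d{1,2}[a-c]?)[ \t]*[.:]", re.IGNORECASE | re.MULTILINE)


def split_items(text: str) -> dict[str, str]:
    """Map normalized item IDs (e.g. "1A") to the text that follows each heading."""
    matches = list(ITEM_RE.finditer(text))
    sections: dict[str, str] = {}
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        body = text[m.end():end].strip()
        key = m.group(1).upper()
        # If an item appears twice (e.g. in a table of contents), keep the longer body.
        if len(body) > len(sections.get(key, "")):
            sections[key] = body
    return sections


def to_json(text: str) -> str:
    return json.dumps({"items": split_items(text)}, ensure_ascii=False)
```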

Evaluation

llm-evals

A cross-task LLM evaluation framework covering QA, Tool Calling, G-Eval, and RAG.
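
G-Eval (Liu et al., 2023) does not take the judge's single most likely score. Instead it computes a probability-weighted score: the sum of p(s) × s over the possible scores s. A minimal sketch follows. It assumes the judge API returns top log-probabilities for the score token. It also renormalizes over the valid score tokens, which is a common implementation choice. `g_eval_score` is a hypothetical name, not llm-evals' API.

```python
import math


def g_eval_score(top_logprobs: dict[str, float], scale: range = range(1, 6)) -> float:
    """Probability-weighted score from the judge's logprobs at the score position.

    top_logprobs maps candidate tokens (e.g. "1".."5") to log-probabilities.
    Non-score tokens are ignored and the remaining mass is renormalized.
    """
    valid = {str(s) for s in scale}
    probs: dict[int, float] = {}
    for tok, lp in top_logprobs.items():
        t = tok.strip()  # tokens like " 4" and "4" count as the same score
        if t in valid:
            probs[int(t)] = probs.get(int(t), 0.0) + math.exp(lp)
    total = sum(probs.values())
    if total == 0:
        raise ValueError("no valid score tokens in top_logprobs")
    return sum(score * p for score, p in probs.items()) / total
```

The weighted average produces a continuous score, such as 4.2 instead of a hard 4. This separates outputs that the judge would otherwise rate as ties.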

Alignment · RL

BehaviorRL-Hallucination

Teaches LLMs when to answer and when to abstain: behavior-oriented RL that reduces the hallucination rate by 65%+.
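
One way to make abstention learnable is a reward where a wrong answer costs more than saying "I don't know." Say a correct answer earns +1, a wrong one earns -c, and abstaining earns 0. If the model's answer is correct with probability p, answering has expected reward p - c(1 - p). So answering beats abstaining only when p > c / (1 + c). The reward values and function names below are hypothetical, since the source does not describe BehaviorRL's actual reward design.

```python
from enum import Enum


class Outcome(Enum):
    CORRECT = "correct"
    WRONG = "wrong"
    ABSTAIN = "abstain"


def behavior_reward(outcome: Outcome, wrong_penalty: float = 2.0, abstain_reward: float = 0.0) -> float:
    """Hypothetical reward: +1 if correct, -wrong_penalty if wrong, abstain_reward if abstaining."""
    if outcome is Outcome.CORRECT:
        return 1.0
    if outcome is Outcome.WRONG:
        return -wrong_penalty
    return abstain_reward


def expected_answer_reward(p_correct: float, wrong_penalty: float = 2.0) -> float:
    """Expected reward of answering when the answer is correct with probability p_correct."""
    return (p_correct * behavior_reward(Outcome.CORRECT, wrong_penalty)
            + (1 - p_correct) * behavior_reward(Outcome.WRONG, wrong_penalty))


def answer_threshold(wrong_penalty: float = 2.0) -> float:
    """Minimum P(correct) at which answering beats abstaining (abstain reward 0): c / (1 + c)."""
    return wrong_penalty / (1.0 + wrong_penalty)
```

With c = 2, a reward-maximizing policy should answer only when it is more than two-thirds sure. Raising c pushes the model to abstain more often.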

About

Taking the black box apart.

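While learning about LLMs, retrieval systems, and AI infrastructure, I chose not to stop at theory and instead to build real systems myself, from inference servers and RAG pipelines to an agent harness framework. These projects don't aim to be perfect or production-ready; they are an honest record of my learning process and experiments.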

What I'm exploring

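How agent harness frameworks work (tools / memory / planning / orchestration)
How to serve models efficiently (GPU / TensorRT / batching)
How to route and manage multiple LLMs
How retrieval works (dense / sparse / hybrid / multi-vector)
How to evaluate LLM outputs and reduce hallucinations
How to design practical LLM applications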

Skills

Python
FastAPI
TypeScript
Vue 3
vLLM
SGLang
TensorRT
ONNX
FAISS
Qdrant
Prometheus
Grafana
PostgreSQL
Docker

Contact

milk333445@gmail.com

github / LLMSystems
linkedin / tung-hui-kuo
résumé