
Personal AI
Training Flywheel

Turn your AI interaction traces into personalized training data. Import from Claude Code, ChatGPT, or any MCP tool — fine-tune a model that learns your patterns.

BashGym dashboard showing trace capture, model training progress, and deployment routing metrics
Self-host:
$ git clone https://github.com/GhostPeony/bashgym && cd bashgym && pip install -r requirements.txt

Your coding sessions are training data. We use them.

Every time you use Claude Code to write, debug, or refactor software, BashGym silently captures that session as a structured trace. Those traces feed an automated pipeline that fine-tunes a small model entirely on how you actually code — your conventions, your repos, your patterns. Over time that model takes over routine tasks from Claude, cutting your API costs and response times while getting more personalized every day.

STEP 01

Capture

Hooks installed into Claude Code record every tool call, file edit, and bash command as a structured execution trace — automatically, with no change to how you work.
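As a rough illustration, a capture hook can be a small script that receives the tool-call event as JSON and appends a normalized record to a JSONL trace file. This is a sketch, not BashGym's actual hook: the field names and trace location are assumptions.

```python
# Sketch of a capture-hook handler (hypothetical: event field names and the
# trace path are assumptions, not BashGym's real schema).
import json
import time
from pathlib import Path

TRACE_FILE = Path.home() / ".bashgym" / "traces.jsonl"  # assumed location

def record_trace(event: dict, path: Path = TRACE_FILE) -> dict:
    """Normalize a hook event into a trace record and append it as JSONL."""
    record = {
        "ts": time.time(),
        "tool": event.get("tool_name"),
        "input": event.get("tool_input"),
        "output": event.get("tool_response"),
    }
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a") as f:
        f.write(json.dumps(record) + "\n")
    return record
```

In a real setup, the hook runtime would pipe the event JSON to this handler on every tool call, so tracing stays invisible to your workflow.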

STEP 02

Train

Traces are scored, sanitized, and fed into a fine-tuning pipeline. SFT, DPO, GRPO, RLVR, or distillation — you choose the strategy. The result is a small local model that knows your codebase.

STEP 03

Deploy

A confidence-based router progressively shifts traffic from Claude to your local model. Simple tasks go local (~50ms). Complex ones fall back to Claude. The flywheel keeps spinning.
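The routing idea can be sketched in a few lines. Assume the local model returns an answer plus a confidence score in [0, 1]; below a threshold, the task falls back to Claude. All names here are hypothetical, not BashGym's API.

```python
# Minimal sketch of confidence-based routing (all names hypothetical).
from dataclasses import dataclass
from typing import Callable

@dataclass
class Routed:
    answer: str
    backend: str  # "local" or "claude"

def route(task: str,
          local: Callable[[str], tuple[str, float]],
          claude: Callable[[str], str],
          threshold: float = 0.8) -> Routed:
    """Try the local model first; fall back to Claude when confidence is low."""
    answer, confidence = local(task)
    if confidence >= threshold:
        return Routed(answer, "local")
    return Routed(claude(task), "claude")
```

Raising the threshold early on keeps quality high; as the local model improves, lowering it shifts more traffic local.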

Platform Walkthrough

See BashGym in action — from trace capture to model deployment.

11 Core Capabilities

Everything you need to capture traces, train models, and deploy your own coding assistant.

🔍

Trace Capture

Intercept Claude Code tool calls via hooks. Capture prompts, tool outputs, and reasoning traces automatically.

⚖️

Quality Framework

Multi-judge scoring with syntax, semantic, and execution validators. Only high-quality traces become training data.

🔒

Privacy by Design

PII detection, secret scrubbing, and path anonymization. Your code stays private throughout the pipeline.
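A minimal scrubber might run regex passes over each trace, replacing matches with typed placeholders. The patterns below are illustrative only, far from exhaustive, and not BashGym's actual rule set.

```python
# Sketch of a secret/PII scrubber (patterns are illustrative, not exhaustive).
import re

PATTERNS = {
    "aws_key": re.compile(r"AKIA[0-9A-Z]{16}"),      # AWS access key IDs
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),  # email addresses
    "home_path": re.compile(r"/(?:home|Users)/[^/\s]+"),  # user home dirs
}

def scrub(text: str) -> str:
    """Replace each match with a typed placeholder like <AWS_KEY>."""
    for name, pattern in PATTERNS.items():
        text = pattern.sub(f"<{name.upper()}>", text)
    return text
```

Typed placeholders (rather than blank redaction) keep the scrubbed trace readable enough to remain useful as training data.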

🎯

Training Strategies

SFT, DPO, GRPO, RLVR, and distillation pipelines. Choose the right strategy for your model and data.

📦

Model Registry

Version, tag, and manage trained model artifacts. Track lineage from trace to deployed checkpoint.

🔄

Progressive Routing

Confidence-based routing between your local model and Claude. Your model handles what it knows, Claude handles the rest.

📊

Real-Time Dashboard

Monitor trace collection, training progress, model performance, and routing decisions in a live dashboard.

☁️

Multi-Cloud

Train on Lambda Labs, RunPod, Vast.ai, or your own GPUs. Cloud-agnostic infrastructure provisioning.

📈

Benchmarks

SWE-bench, HumanEval, and custom project-specific benchmarks. Measure real improvement on real tasks.
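HumanEval scores like the one shown elsewhere on this page are typically reported as pass@k, computed with the unbiased estimator from the Codex paper (Chen et al., 2021): given n samples per problem with c of them correct, the probability that at least one of k drawn samples passes.

```python
# Unbiased pass@k estimator (Chen et al., 2021, HumanEval):
# 1 - C(n-c, k) / C(n, k), i.e. one minus the chance that all k drawn
# samples come from the n-c incorrect ones.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    if n - c < k:       # fewer failures than draws: a pass is guaranteed
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, with 2 samples and 1 correct, pass@1 is 0.5: each sample is equally likely to be the correct one.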

🛡️

Safety Guardrails

Harmful content filtering, bias detection, and output validation. Safe models from safe data.

🕸️

Orchestrator

Decompose a spec into a Task DAG, run parallel workers in isolated git worktrees, and feed results back into the training pipeline.
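The scheduling half of this can be sketched with Python's stdlib `graphlib.TopologicalSorter`: tasks whose prerequisites are satisfied become "ready" together and can be dispatched to parallel workers. The task names below are hypothetical; real workers would run in isolated git worktrees.

```python
# Sketch of task-DAG scheduling with the stdlib TopologicalSorter.
# Task names are hypothetical examples of a decomposed spec.
from graphlib import TopologicalSorter

# task -> set of prerequisite tasks
dag = {
    "parse_spec": set(),
    "write_tests": {"parse_spec"},
    "implement": {"parse_spec"},
    "integrate": {"write_tests", "implement"},
}

ts = TopologicalSorter(dag)
ts.prepare()
waves = []
while ts.is_active():
    ready = sorted(ts.get_ready())  # these tasks can run as parallel workers
    waves.append(ready)
    ts.done(*ready)                 # in practice: mark done as workers finish
```

Here `write_tests` and `implement` land in the same wave, so two workers can run them concurrently before `integrate` merges the results.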

The Ouroboros Flywheel

A self-reinforcing loop: use Claude, capture traces, train your model, deploy it, repeat.

ACT

Use Claude Code normally

VERIFY

Judge trace quality

SYNTHESIZE

Build training data

TRAIN

Fine-tune your model

DEPLOY

Route to your model

REPEAT

Continuously improve

Live Training Monitor

Watch your model improve in real time. Track trace collection, training epochs, loss curves, and deployment status from a single dashboard.

  • Trace collection stats and quality scores
  • Training progress with loss and metric curves
  • Model registry with version comparison
  • Routing confidence and fallback rates
  • Benchmark results across model versions
BashGym Training
$ bashgym train --strategy sft
Loading 2,847 verified traces...
Model: codellama-7b-instruct
Epoch 1/3 ████████░░ 80% loss=0.42
Epoch 2/3 ██████████ 100% loss=0.31
Epoch 3/3 ██████████ 100% loss=0.24
Training complete. Checkpoint saved.
HumanEval: 48.2% (+12.1% vs base)

Three Steps to Your Own Model

1

Install Hooks

Install BashGym hooks into Claude Code. Traces are captured automatically as you work.

2

Use Claude Code Normally

Keep coding as usual. BashGym silently captures, scores, and curates high-quality training data.

3

Train Your Model

Launch training with one command. BashGym handles data prep, fine-tuning, evaluation, and deployment.

8-Layer Architecture

A modular system from trace capture to API serving.

Arena
  • Trace Capture: hook into Claude Code tool calls
  • Session Recording: full conversation context

Judge
  • Quality Scoring: multi-judge validation
  • PII Scrubbing: privacy-first filtering

Factory
  • Data Synthesis: trace to training format
  • Augmentation: expand dataset diversity

Gym
  • SFT / DPO: fine-tuning pipelines
  • Cloud Provisioning: multi-cloud GPU training

Models
  • Registry: version and tag checkpoints
  • Lineage: trace-to-model provenance

Observability
  • Dashboard: live training monitor
  • Benchmarks: SWE-bench, HumanEval

Integrations
  • Claude Code: hook-based capture
  • BashBros: security middleware

API
  • Serving: OpenAI-compatible endpoint
  • Routing: confidence-based fallback
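Because the serving layer speaks the OpenAI chat-completions format, any compatible client can talk to it. The sketch below builds and sends such a request with only the stdlib; the base URL and model name are assumptions, not BashGym defaults.

```python
# Sketch of calling an OpenAI-compatible endpoint with the stdlib only.
# base_url and model are assumptions; any OpenAI-style client works the same way.
import json
import urllib.request

def build_payload(prompt: str, model: str = "bashgym-local") -> dict:
    """OpenAI chat-completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def chat(prompt: str, base_url: str = "http://localhost:8000/v1") -> str:
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Since the wire format is standard, switching an existing tool from a hosted API to the local model is often just a base-URL change.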

Works With Your Stack

Claude Code
Ollama
HuggingFace
NVIDIA NeMo
BashBros
Docker

Start Training Your Own Model

Upload your traces, generate training examples, push to HuggingFace. The flywheel starts with one upload.


Frequently Asked Questions

What is BashGym?

BashGym is a self-improving agentic dev gym that captures execution traces from Claude Code sessions and converts them into fine-tuning datasets. It trains smaller, personalized language models that learn your coding patterns, conventions, and workflows through the Ouroboros Flywheel.

How does the Ouroboros Flywheel work?

The Ouroboros Flywheel is a self-reinforcing loop with six stages: Act (use Claude Code normally), Verify (judge trace quality with multi-judge scoring), Synthesize (build training data from verified traces), Train (fine-tune your model via SFT, DPO, GRPO, RLVR, or distillation), Deploy (route tasks to your model with confidence-based routing), and Repeat (continuously improve as more traces are captured).

What models can I train with BashGym?

BashGym supports fine-tuning smaller open-source language models such as CodeLlama 7B. It integrates with HuggingFace, Ollama, and NVIDIA NeMo, and supports training on Lambda Labs, RunPod, Vast.ai, or your own GPUs.

Do I need ML expertise to use BashGym?

No. BashGym is designed for developers, not ML engineers. You install hooks into Claude Code, keep coding as usual, and launch training with one command. BashGym handles data preparation, quality scoring, fine-tuning, evaluation, and deployment automatically.

How does BashGym capture training data?

BashGym installs hooks into Claude Code that silently record every tool call, file edit, and bash command as a structured execution trace. These traces are automatically scored for quality, scrubbed for PII and secrets, and curated into training datasets — with no change to how you work.

Is BashGym free?

Yes. BashGym is open source under the MIT License. You can self-host it by cloning the GitHub repository, or use the hosted web app at bashgym.fly.dev. Cloud GPU costs for training are separate and depend on your chosen provider.