Train agent skills like you train neural networks: with epochs, (mini-)batch sizes, learning rates, and validation gates, but without touching model weights.
📖 For installation, data preparation, training/eval commands, the full configuration reference, and framework internals, see the Documentation & Reproduction Guide (rendered on GitHub Pages).
See docs/sleep/README.md for what it is, how to use it, and results.
Install with pip install skillopt. This initial release includes the full training loop (rollout → reflect → aggregate → select → update → evaluate), multi-backend support (OpenAI / Azure / Claude / Qwen / MiniMax), six built-in benchmarks, and a WebUI dashboard.
Modern agent skills are usually hand-crafted, generated one-shot by a strong LLM, or evolved through loosely controlled self-revision. None of these approaches behaves like a deep-learning optimizer for the skill itself, and none reliably improves over its starting point under feedback.
SkillOpt treats the skill document as the trainable state of a frozen agent, and trains it with the discipline that makes weight-space optimization reproducible. A separate optimizer model turns scored rollouts into bounded add / delete / replace edits on a single skill document; a candidate edit is accepted only when it strictly improves a held-out validation score. A textual learning-rate budget, a rejected-edit buffer, and an epoch-wise slow / meta update make skill training stable while adding zero inference-time model calls at deployment.
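The acceptance rule described above can be illustrated with a small toy loop. This is only a sketch under stated assumptions, not SkillOpt's actual implementation or API: the `Edit` class, `gated_step`, the word-count proxy for the learning-rate budget, and the `toy_validate` scorer are all hypothetical names introduced for illustration. In the real system an optimizer model proposes the edits and the validation score comes from rollouts on held-out data.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Set, Tuple


@dataclass(frozen=True)
class Edit:
    """One bounded edit to the skill document (hypothetical structure)."""
    op: str          # "add", "delete", or "replace"
    index: int       # line position the edit targets
    text: str = ""   # new line content for add / replace


def apply_edit(skill: Sequence[str], edit: Edit) -> List[str]:
    """Return a new list of lines with the edit applied; the input is not mutated."""
    lines = list(skill)
    if edit.op == "add":
        lines.insert(edit.index, edit.text)
    elif edit.op == "delete":
        del lines[edit.index]
    elif edit.op == "replace":
        lines[edit.index] = edit.text
    else:
        raise ValueError(f"unknown op: {edit.op!r}")
    return lines


def edit_size(edit: Edit) -> int:
    """Crude stand-in for a textual learning rate: words written by the edit."""
    return len(edit.text.split())


def gated_step(
    skill: Sequence[str],
    proposals: Sequence[Edit],
    validate: Callable[[Sequence[str]], float],
    lr_budget: int,
    rejected: Set[Edit],
) -> Tuple[List[str], float]:
    """Try each proposal; keep it only if the validation score strictly improves."""
    current = list(skill)
    best_score = validate(current)
    for edit in proposals:
        # Skip edits that already failed, or that exceed the size budget.
        if edit in rejected or edit_size(edit) > lr_budget:
            continue
        candidate = apply_edit(current, edit)
        score = validate(candidate)
        if score > best_score:          # strict improvement is required
            current, best_score = candidate, score
        else:
            rejected.add(edit)          # remember it so it is not retried
    return current, best_score


def toy_validate(skill: Sequence[str]) -> float:
    """Toy scorer (assumption): counts which target behaviours the skill mentions."""
    text = " ".join(skill).lower()
    return float(sum(keyword in text for keyword in ("verify", "cite")))


proposals = [
    Edit("add", 1, "Verify each claim against the retrieved passage."),
    Edit("replace", 0, "Answer briefly."),
    Edit("add", 2, "Cite every source you used and list them in order "
                   "at the end of the answer."),
]
rejected_buffer: Set[Edit] = set()
final_skill, final_score = gated_step(
    ["Answer the question."], proposals, toy_validate,
    lr_budget=8, rejected=rejected_buffer,
)
```

In this toy run the first edit is accepted because it raises the score, the second is pushed to the rejected buffer because it leaves the score unchanged, and the third is never evaluated because it exceeds the eight-word budget. Deployment only needs the final document, which is why the gate adds no inference-time calls.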
The deployed artifact is a compact best_skill.md (typically 300–2,000
tokens) that runs against the unchanged target model. Across six
benchmarks, seven target models, and three execution harnesses (direct
chat, Codex CLI, Claude Code CLI), SkillOpt is best or tied-best on all
52 evaluated (model, benchmark, harness) cells and on GPT-5.5 lifts the
average no-skill accuracy by +23.5 points in direct chat, +24.8 inside
the Codex agentic loop, and +19.1 inside Claude Code. Optimized skill
artifacts transfer across model scales, between Codex and Claude Code
harnesses, and to nearby benchmarks without further optimization.
For the full method, ablations, and per-cell results see the paper; for a visual walkthrough of the loop see the project page; for deeper API / backend / benchmark docs see docs/.
https://github.com/user-attachments/assets/eb12d3bc-371c-467f-904d-91b61f339ed7
▶ Watch the full demo on YouTube
A backend = a chat / exec target (e.g. openai_chat, claude_chat,
qwen_chat, minimax_chat, codex_exec, claude_code_exec). See
docs/guide/new-backend.md for the full
contract; in short you add a skillopt/model/<name>_backend.py module,
register it in skillopt/model/common.py + backend_config.py, and wire
it through the router in skillopt/model/__init__.py. qwen_backend.py
and minimax_backend.py are good templates.
A benchmark = a skillopt/envs/<name>/ package with a dataloader.py, a
rollout.py, and an initial.md seed skill. See
docs/guide/new-benchmark.md for the full
contract; the simplest reference is skillopt/envs/searchqa/.
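As a starting point, the package layout named above can be scaffolded with a few lines of Python. This is a hypothetical helper, not part of SkillOpt: only the three file names and the skillopt/envs/<name>/ location come from the text, the stub contents are placeholders, and the real contract in docs/guide/new-benchmark.md may require more than this.

```python
import tempfile
from pathlib import Path
from typing import List

# File names taken from the README; everything else here is an assumption.
REQUIRED_FILES = ("dataloader.py", "rollout.py", "initial.md")


def scaffold_benchmark(repo_root: Path, name: str) -> Path:
    """Create skillopt/envs/<name>/ with placeholder files, keeping existing ones."""
    pkg = repo_root / "skillopt" / "envs" / name
    pkg.mkdir(parents=True, exist_ok=True)
    for filename in REQUIRED_FILES:
        path = pkg / filename
        if path.exists():
            continue  # never overwrite work that is already there
        if filename.endswith(".py"):
            stub = "# TODO: implement per docs/guide/new-benchmark.md\n"
        else:
            stub = "<!-- TODO: write the seed skill -->\n"
        path.write_text(stub, encoding="utf-8")
    return pkg


def missing_files(pkg: Path) -> List[str]:
    """List the required files that are absent from a benchmark package."""
    return [f for f in REQUIRED_FILES if not (pkg / f).is_file()]


# Usage: scaffold into a throwaway directory and confirm nothing is missing.
with tempfile.TemporaryDirectory() as tmp:
    demo_pkg = scaffold_benchmark(Path(tmp), "my_benchmark")
    demo_missing = missing_files(demo_pkg)
```

Comparing the result against skillopt/envs/searchqa/ is the quickest way to see what the stubs need to contain.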
Launch the monitoring dashboard (optional):
pip install -e ".[webui]"
python -m skillopt_webui.app
| Flag | Default | Description |
|---|---|---|
| --port | 7860 | Server port |
| --host | 0.0.0.0 | Bind address |
| --share | off | Create a public Gradio share link |
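If the dashboard is launched from a script rather than a shell, the flags in the table can be assembled programmatically. The helper below is a hypothetical convenience, not part of SkillOpt, and it assumes that --share is a value-less switch, which the "off" default suggests but the table does not confirm.

```python
import sys
from typing import List


def webui_command(port: int = 7860, host: str = "0.0.0.0",
                  share: bool = False) -> List[str]:
    """Build the argv list for launching the dashboard with the documented flags."""
    cmd = [sys.executable, "-m", "skillopt_webui.app",
           "--port", str(port), "--host", host]
    if share:
        cmd.append("--share")  # assumption: boolean switch with no argument
    return cmd


# Example: bind to localhost only, on a non-default port.
local_cmd = webui_command(port=8080, host="127.0.0.1")
```

The returned list can be passed to `subprocess.run(local_cmd, check=True)`. Binding to 127.0.0.1 instead of the default 0.0.0.0 keeps the dashboard off the network.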
@misc{yang2026skilloptexecutivestrategyselfevolving,
title={SkillOpt: Executive Strategy for Self-Evolving Agent Skills},
author={Yifan Yang and Ziyang Gong and Weiquan Huang and Qihao Yang and Ziwei Zhou and Zisu Huang and Yan Li and Xuemei Gao and Qi Dai and Bei Liu and Kai Qiu and Yuqing Yang and Dongdong Chen and Xue Yang and Chong Luo},
year={2026},
eprint={2605.23904},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2605.23904}
}
“SkillOpt – Executive Strategy for Self-Evolving Agent Skills”
“SkillOpt from MSFT treats skills as trainable parameters”
“Microsoft’s open-source SkillOpt automatically upgrades AI agent skills without touching model weights” (VentureBeat)
“Microsoft's SkillOpt boosts GPT-5.5 by using nothing but a trained Markdown file” (the-decoder.com)
“Microsoft Makes SkillOpt, AI Agent Can Learn Without Re-training Models” (VOI.id)