Our paper is on arXiv: come read it! AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration
Just chat with OpenClaw: "Research X" → done.
We're looking for testers! Try the pipeline with your own research idea, from any field, and tell us what you think. Your feedback directly shapes the next version. → Testing Guide | Testing Guide (Chinese) | Testing Guide (Japanese)
- … experiments/arc_bench/, and also released on Hugging Face. → Domain Integration Guide
- … full-auto, gate-only, checkpoint, step-by-step, co-pilot, custom), per-stage policies, and deep human-AI collaboration. Includes: Idea Workshop for hypothesis co-creation, Baseline Navigator for experiment design review, Paper Co-Writer for collaborative drafting, SmartPause (confidence-driven dynamic intervention), ALHF intervention learning, anti-hallucination claim verification, cost budget guardrails, pipeline branching for parallel hypothesis exploration, and CLI commands (attach/status/approve/reject/guide). → Full HITL Guide
- … researchclaw skills install or drop a SKILL.md into .claude/skills/. See Skills Library.
- … --resume auto-detection, LLM retry hardening, and community-reported fixes.
- … metaclaw_bridge.enabled: true), fully backward-compatible. See Integration Guide.

```bash
# Fully autonomous: no human intervention
pip install -e . && researchclaw setup && researchclaw init && researchclaw run --topic "Your research idea here" --auto-approve

# Co-Pilot mode: collaborate with AI at key decision points
researchclaw run --topic "Your research idea here" --mode co-pilot
```
You think it. AutoResearchClaw writes it. You guide the key decisions.
Drop a research topic, get back a full academic paper with real literature from OpenAlex, Semantic Scholar & arXiv, hardware-aware sandbox experiments (GPU/MPS/CPU auto-detected), statistical analysis, multi-agent peer review, and conference-ready LaTeX targeting NeurIPS/ICML/ICLR. Run it fully autonomously, or use Co-Pilot mode to guide the AI at critical decision points: choose research directions, review experiment designs, and co-write the paper. No hallucinated references.
| Output | Description |
|---|---|
| paper_draft.md | Full academic paper (Introduction, Related Work, Method, Experiments, Results, Conclusion) |
| paper.tex | Conference-ready LaTeX (NeurIPS / ICLR / ICML templates) |
| references.bib | Real BibTeX references from OpenAlex, Semantic Scholar and arXiv, auto-pruned to match inline citations |
| verification_report.json | 4-layer citation integrity + relevance verification (arXiv, CrossRef, DataCite, LLM) |
| experiment runs/ | Generated code + sandbox results + structured JSON metrics |
| charts/ | Auto-generated condition comparison charts with error bars and confidence intervals |
| reviews.md | Multi-agent peer review with methodology-evidence consistency checks |
| evolution/ | Self-learning lessons extracted from each run |
| deliverables/ | All final outputs in one folder, compile-ready for Overleaf |
The pipeline runs end-to-end β fully autonomous or with human-in-the-loop collaboration. When experiments fail, it self-heals. When hypotheses don't hold, it pivots. When citations are fake, it kills them. When you want to steer, it pauses and listens.
Run it anywhere. AutoResearchClaw isn't locked to a single platform. Use it standalone via CLI, plug it into OpenClaw, or wire it up through any ACP-compatible agent: Claude Code, Codex CLI, Copilot CLI, Gemini CLI, Kimi CLI, you name it. And because OpenClaw bridges to messaging platforms, you can kick off a full research run from Discord, Telegram, Lark (Feishu), WeChat, or wherever your team already hangs out. One topic in, one paper out, no matter where you type it.
```bash
# 1. Clone & install
git clone https://github.com/aiming-lab/AutoResearchClaw.git
cd AutoResearchClaw
python3 -m venv .venv && source .venv/bin/activate
pip install -e .

# 2. Setup (interactive: installs OpenCode beast mode, checks Docker/LaTeX)
researchclaw setup

# 3. Configure
researchclaw init   # Interactive: choose LLM provider, creates config.arc.yaml
# Or manually: cp config.researchclaw.example.yaml config.arc.yaml

# 4. Run
export OPENAI_API_KEY="sk-..."
researchclaw run --config config.arc.yaml --topic "Your research idea" --auto-approve
```
Output → artifacts/rc-YYYYMMDD-HHMMSS-<hash>/deliverables/: compile-ready LaTeX, BibTeX, experiment code, charts.
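Because each run folder name starts with a zero-padded timestamp, a small helper can find the newest deliverables folder just by sorting names. This is a sketch assuming the rc-YYYYMMDD-HHMMSS-<hash> naming shown above and an artifacts/ folder in the working directory; latest_deliverables is a hypothetical helper, not part of the researchclaw package.

```python
from pathlib import Path
from typing import Optional


def latest_deliverables(artifacts_root: str = "artifacts") -> Optional[Path]:
    """Return <newest rc-* run>/deliverables under artifacts_root, or None if no run exists."""
    runs = sorted(p for p in Path(artifacts_root).glob("rc-*") if p.is_dir())
    if not runs:
        return None
    # rc-YYYYMMDD-HHMMSS-<hash>: the zero-padded timestamp sorts chronologically
    # as a plain string, so the last directory is the most recent run.
    return runs[-1] / "deliverables"


if __name__ == "__main__":
    print(latest_deliverables())
```

Two runs started in the same second differ only in their hash suffix, so the sketch cannot order those reliably.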
```yaml
project:
  name: "my-research"
research:
  topic: "Your research topic here"
llm:
  base_url: "https://api.openai.com/v1"
  api_key_env: "OPENAI_API_KEY"
  primary_model: "gpt-4o"
  fallback_models: ["gpt-4o-mini"]
experiment:
  mode: "sandbox"
  sandbox:
    python_path: ".venv/bin/python"
```
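The primary_model / fallback_models pair describes a fallback chain. A minimal sketch of the idea, assuming the next model is tried whenever the previous one raises; call_with_fallback and fake_call are hypothetical helpers, not AutoResearchClaw's API, and the real retry policy may differ.

```python
from typing import Callable, List, Sequence, Tuple


def call_with_fallback(models: Sequence[str], call: Callable[[str], str]) -> Tuple[str, str]:
    """Try each model in order and return (model, response) from the first success."""
    errors: List[str] = []
    for model in models:
        try:
            return model, call(model)
        except Exception as exc:  # e.g. timeout, rate limit, model unavailable
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))


# Chain order mirrors the config: primary_model first, then fallback_models.
CHAIN = ["gpt-4o", "gpt-4o-mini"]


def fake_call(model: str) -> str:
    """Stand-in for a real LLM request; the primary model always times out here."""
    if model == "gpt-4o":
        raise TimeoutError("simulated timeout")
    return f"ok from {model}"


if __name__ == "__main__":
    print(call_with_fallback(CHAIN, fake_call))  # ('gpt-4o-mini', 'ok from gpt-4o-mini')
```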
| Capability | How It Works |
|---|---|
| Co-Pilot Mode | 6 intervention modes, from fully autonomous to step-by-step. Guide the AI at critical decisions (hypotheses, baselines, paper writing) or let it run free. SmartPause auto-detects when human input would help. |
| PIVOT / REFINE Loop | Stage 15 autonomously decides: PROCEED, REFINE (tweak params), or PIVOT (new direction). Artifacts auto-versioned. |
| Multi-Agent Debate | Hypothesis generation, result analysis, and peer review each use structured multi-perspective debate. |
| Self-Learning | Lessons extracted per run (decision rationale, runtime warnings, metric anomalies) with 30-day time-decay. Future runs learn from past mistakes. |
| Knowledge Base | Every run builds a structured KB across 6 categories (decisions, experiments, findings, literature, questions, reviews). |
| Sentinel Watchdog | Background quality monitor: NaN/Inf detection, paper-evidence consistency, citation relevance scoring, anti-fabrication guard. |
| Claim Verification | Inline fact-checking: extracts claims from AI-generated text and cross-references against collected literature. Flags ungrounded citations and fabricated numbers. |
| Branch Exploration | Fork the pipeline to explore multiple research directions simultaneously, compare results side-by-side, and merge the best path forward. |
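How a "30-day time-decay" might weight older lessons: the sketch below assumes exponential decay with a 30-day half-life, which is only one plausible reading (a hard 30-day cut-off would be another). lesson_weight and rank_lessons are hypothetical helpers.

```python
from datetime import datetime, timedelta

HALF_LIFE_DAYS = 30.0  # assumption: "30-day time-decay" modelled as a 30-day half-life


def lesson_weight(created: datetime, now: datetime, half_life_days: float = HALF_LIFE_DAYS) -> float:
    """Weight 1.0 for a brand-new lesson, halving every half_life_days."""
    age_days = max((now - created).total_seconds() / 86400.0, 0.0)
    return 0.5 ** (age_days / half_life_days)


def rank_lessons(lessons, now: datetime):
    """Sort (text, created) pairs so the most heavily weighted lessons come first."""
    return sorted(lessons, key=lambda item: lesson_weight(item[1], now), reverse=True)


if __name__ == "__main__":
    now = datetime(2026, 3, 1)
    print(lesson_weight(now - timedelta(days=30), now))  # 0.5
```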
AutoResearchClaw is an OpenClaw-compatible service. Install it in OpenClaw and launch autonomous research with a single message, or use it standalone via CLI, Claude Code, or any AI coding assistant.
If you already use OpenClaw as your AI assistant:
1. Share the GitHub repo URL with OpenClaw
2. OpenClaw auto-reads RESEARCHCLAW_AGENTS.md → understands the pipeline
3. Say: "Research [your topic]"
4. Done: OpenClaw clones, installs, configures, runs, and returns results
That's it. OpenClaw handles git clone, pip install, config setup, and pipeline execution automatically. You just chat.
- RESEARCHCLAW_AGENTS.md → learns the research orchestrator role
- README.md → understands installation and pipeline structure
- config.researchclaw.example.yaml → config.yaml
- pip install -e . + researchclaw run --topic "..." --auto-approve

For deeper integration, AutoResearchClaw includes a bridge adapter system with 6 optional capabilities:
```yaml
# config.arc.yaml
openclaw_bridge:
  use_cron: true              # Scheduled research runs
  use_message: true           # Progress notifications (Discord/Slack/Telegram)
  use_memory: true            # Cross-session knowledge persistence
  use_sessions_spawn: true    # Spawn parallel sub-sessions for concurrent stages
  use_web_fetch: true         # Live web search during literature review
  use_browser: false          # Browser-based paper collection
```
Each flag activates a typed adapter protocol. When OpenClaw provides these capabilities, the adapters consume them without code changes. See docs/integration-guide.md for full details.
AutoResearchClaw can use any ACP-compatible coding agent as its LLM backend, with no API keys required. The agent communicates via acpx, maintaining a single persistent session across all 23 pipeline stages.
| Agent | Command | Notes |
|---|---|---|
| Claude Code | claude | Anthropic |
| Codex CLI | codex | OpenAI |
| Copilot CLI | gh | GitHub |
| Gemini CLI | gemini | |
| OpenCode | opencode | SST |
| Kimi CLI | kimi | Moonshot |
```yaml
# config.yaml: ACP example
llm:
  provider: "acp"
  acp:
    agent: "claude"   # Any ACP-compatible agent CLI command
    cwd: "."          # Working directory for the agent
  # No base_url or api_key needed: the agent handles its own auth.
```

```bash
# Just run: the agent uses its own credentials
researchclaw run --config config.yaml --topic "Your research idea" --auto-approve
```
| Method | How |
|---|---|
| Standalone CLI | researchclaw run --topic "..." --auto-approve (autonomous) or --mode co-pilot (collaborative) |
| Python API | from researchclaw.pipeline import Runner; Runner(config).run() |
| Claude Code | Reads RESEARCHCLAW_CLAUDE.md; just say "Run research on [topic]" |
| Copilot CLI | researchclaw run --topic "..." with llm.acp.agent: "gh" |
| OpenCode | Reads .claude/skills/: same natural language interface |
| Any AI CLI | Provide RESEARCHCLAW_AGENTS.md as context; the agent auto-bootstraps |
```text
Phase A: Research Scoping
   1. TOPIC_INIT
   2. PROBLEM_DECOMPOSE
Phase B: Literature Discovery
   3. SEARCH_STRATEGY
   4. LITERATURE_COLLECT   ← real API
   5. LITERATURE_SCREEN    [gate]
   6. KNOWLEDGE_EXTRACT
Phase C: Knowledge Synthesis
   7. SYNTHESIS
   8. HYPOTHESIS_GEN       ← debate
Phase D: Experiment Design
   9. EXPERIMENT_DESIGN    [gate]
  10. CODE_GENERATION
  11. RESOURCE_PLANNING
Phase E: Experiment Execution
  12. EXPERIMENT_RUN
  13. ITERATIVE_REFINE     ← self-healing
Phase F: Analysis & Decision
  14. RESULT_ANALYSIS      ← multi-agent
  15. RESEARCH_DECISION    ← PIVOT/REFINE
Phase G: Paper Writing
  16. PAPER_OUTLINE
  17. PAPER_DRAFT
  18. PEER_REVIEW          ← evidence check
  19. PAPER_REVISION
Phase H: Finalization
  20. QUALITY_GATE         [gate]
  21. KNOWLEDGE_ARCHIVE
  22. EXPORT_PUBLISH       ← LaTeX
  23. CITATION_VERIFY      ← relevance check
```
Gate stages (5, 9, 20) pause for human approval or auto-approve with --auto-approve. On rejection, the pipeline rolls back.

Co-Pilot mode (--mode co-pilot): Deep human-AI collaboration at Stages 7-8 (Idea Workshop), Stage 9 (Baseline Navigator), and Stages 16-17 (Paper Co-Writer). Other stages auto-execute with SmartPause monitoring.

Decision loops: Stage 15 can trigger REFINE (→ Stage 13) or PIVOT (→ Stage 8), with automatic artifact versioning.
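The gate and decision-loop rules above can be summarised as a tiny routing function. This is a sketch, not the pipeline's implementation: pauses_for_approval, next_stage and DECISION_ROUTES are hypothetical names, PROCEED is assumed to continue to Stage 16, and the rollback target after a rejected gate is not modelled because the text does not specify it.

```python
from typing import Optional

GATE_STAGES = frozenset({5, 9, 20})  # LITERATURE_SCREEN, EXPERIMENT_DESIGN, QUALITY_GATE
DECISION_STAGE = 15                   # RESEARCH_DECISION
DECISION_ROUTES = {"PROCEED": 16, "REFINE": 13, "PIVOT": 8}


def pauses_for_approval(stage: int, auto_approve: bool) -> bool:
    """Gate stages wait for a human unless --auto-approve was passed."""
    return stage in GATE_STAGES and not auto_approve


def next_stage(stage: int, decision: Optional[str] = None) -> int:
    """Return the stage that runs next; only Stage 15 can jump backwards."""
    if stage == DECISION_STAGE:
        if decision not in DECISION_ROUTES:
            raise ValueError(f"Stage 15 needs PROCEED, REFINE or PIVOT, got {decision!r}")
        return DECISION_ROUTES[decision]
    return stage + 1
```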
| Phase | What Happens |
|---|---|
| A: Scoping | LLM decomposes the topic into a structured problem tree with research questions |
| A+: Hardware | Auto-detects GPU (NVIDIA CUDA / Apple MPS / CPU-only), warns if local hardware is limited, adapts code generation accordingly |
| B: Literature | Multi-source search (OpenAlex → Semantic Scholar → arXiv) for real papers, screens by relevance, extracts knowledge cards |
| C: Synthesis | Clusters findings, identifies research gaps, generates testable hypotheses via multi-agent debate |
| D: Design | Designs experiment plan, generates hardware-aware runnable Python (GPU tier → package selection), estimates resource needs |
| E: Execution | Runs experiments in sandbox, detects NaN/Inf and runtime bugs, self-heals code via targeted LLM repair |
| F: Analysis | Multi-agent analysis of results; autonomous PROCEED / REFINE / PIVOT decision with rationale |
| G: Writing | Outlines → section-by-section drafting (5,000-6,500 words) → peer reviews (with methodology-evidence consistency) → revises with length guard |
| H: Finalization | Quality gate, knowledge archival, LaTeX export with conference template, citation integrity + relevance verification |
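The NaN/Inf detection in Phase E boils down to checking every reported metric with math.isfinite before anything else consumes it. A minimal sketch, assuming the metrics arrive as a flat name-to-number mapping (the actual structured JSON may nest them); first_non_finite and fast_fail are hypothetical helpers.

```python
import math
from typing import Mapping, Optional


def first_non_finite(metrics: Mapping[str, float]) -> Optional[str]:
    """Return the name of the first metric that is NaN or +/-Inf, or None if all are finite."""
    for name, value in metrics.items():
        if not math.isfinite(float(value)):
            return name
    return None


def fast_fail(metrics: Mapping[str, float]) -> None:
    """Raise immediately on a non-finite metric so the repair step sees it right away."""
    bad = first_non_finite(metrics)
    if bad is not None:
        raise ValueError(f"non-finite metric {bad!r} = {metrics[bad]}")
```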
| Feature | Description |
|---|---|
| Multi-Source Literature | Real papers from OpenAlex, Semantic Scholar & arXiv: query expansion, deduplication, circuit breaker with graceful degradation |
| 4-Layer Citation Verification | arXiv ID check → CrossRef/DataCite DOI → Semantic Scholar title match → LLM relevance scoring. Hallucinated refs auto-removed. |
| Hardware-Aware Execution | Auto-detects GPU (NVIDIA CUDA / Apple MPS / CPU-only) and adapts code generation, imports, and experiment scale accordingly |
| OpenCode Beast Mode | Complex experiments auto-routed to OpenCode, which generates multi-file projects with custom architectures, training loops, and ablation studies. Install via researchclaw setup. |
| Sandbox Experiments | AST-validated code, immutable harness, NaN/Inf fast-fail, self-healing repair, iterative refinement (up to 10 rounds), partial result capture |
| Conference-Grade Writing | NeurIPS/ICML/ICLR templates, section-by-section drafting (5,000-6,500 words), anti-fabrication guard, revision length guard, anti-disclaimer enforcement |
| Template Switching | neurips_2025, iclr_2026, icml_2026; Markdown → LaTeX with math, tables, figures, cross-refs, \cite{} |
| Anti-Fabrication | VerifiedRegistry enforces ground-truth experiment data in papers. Auto-diagnoses failed experiments and repairs them before writing. Unverified numbers sanitized. |
| Quality Gates | 3 human-in-the-loop gates (Stages 5, 9, 20) with rollback. Skip with --auto-approve. |
| HITL Co-Pilot | 6 intervention modes with per-stage policies. Idea Workshop, Baseline Navigator, Paper Co-Writer for deep collaboration. SmartPause, cost guardrails, escalation policies, and intervention learning for production safety. CLI/WebSocket/MCP adapters. |
| Cost Guardrails | Budget monitoring with configurable threshold alerts (50%/80%/100%). Pipeline auto-pauses when cost exceeds budget. |
| Reproducibility | SHA256 checksums for all stage artifacts. Immutable manifests for verification. Multi-level undo with versioned snapshots. |
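The four verification layers can be pictured as a cascade: three lookups that try to prove a reference exists, followed by a relevance threshold. This sketch assumes any one successful lookup is enough to confirm existence, and uses offline stub checks plus a hypothetical 0.5 relevance cut-off; Ref, verify and prune are illustrative names, not AutoResearchClaw's API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class Ref:
    title: str
    arxiv_id: Optional[str] = None
    doi: Optional[str] = None


Check = Callable[[Ref], bool]


def verify(ref: Ref, existence_checks: Sequence[Check],
           relevance: Callable[[Ref], float], min_relevance: float = 0.5) -> bool:
    """Keep a reference only if one existence layer confirms it and it is relevant enough."""
    exists = any(check(ref) for check in existence_checks)  # stops at the first layer that succeeds
    return exists and relevance(ref) >= min_relevance


def prune(refs: Sequence[Ref], existence_checks: Sequence[Check],
          relevance: Callable[[Ref], float], min_relevance: float = 0.5) -> List[Ref]:
    """Drop every reference that fails verification."""
    return [r for r in refs if verify(r, existence_checks, relevance, min_relevance)]


# Offline stand-ins for the lookup layers (real ones would query arXiv, CrossRef/DataCite, Semantic Scholar).
KNOWN_ARXIV = {"1706.03762"}
CHECKS: List[Check] = [
    lambda r: r.arxiv_id in KNOWN_ARXIV,  # arXiv ID stub
    lambda r: False,                      # DOI lookup stub
    lambda r: False,                      # title-match stub
]

if __name__ == "__main__":
    refs = [Ref("Attention Is All You Need", arxiv_id="1706.03762"), Ref("A Paper That Does Not Exist")]
    kept = prune(refs, CHECKS, relevance=lambda r: 0.9)
    print([r.title for r in kept])  # ['Attention Is All You Need']
```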
AutoResearchClaw v0.4.0 introduces a complete Human-in-the-Loop (HITL) system that transforms the pipeline from purely autonomous to a human-AI collaborative research engine. Choose your level of involvement:
| Mode | Command | What It Does |
|---|---|---|
| Full Auto | --auto-approve | Original behavior β no human intervention |
| Gate Only | --mode gate-only | Pause at 3 gate stages (5, 9, 20) for approval |
| Checkpoint | --mode checkpoint | Pause at each phase boundary (8 checkpoints) |
| Co-Pilot | --mode co-pilot | Deep collaboration at critical stages, auto elsewhere |
| Step-by-Step | --mode step-by-step | Pause after every stage β learn the pipeline |
| Express | --mode express | Quick review β only 3 most critical gates |
| Custom | --mode custom | Define per-stage policies via stage_policies config |
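One way to read the mode table is as a mapping from mode to the stages after which the pipeline waits for you. The sketch below is an interpretation, not the shipped policy: PAUSE_POLICY and should_pause are hypothetical, checkpoint mode is assumed to pause at the last stage of each of the eight phases, co-pilot lists only the deep-collaboration stages (7-9, 16-17) named in the pipeline overview and ignores SmartPause, and Express and Custom are left out because their stage sets are not spelled out here.

```python
PAUSE_POLICY = {
    "full-auto": frozenset(),                                # --auto-approve
    "gate-only": frozenset({5, 9, 20}),                      # the three gate stages
    "checkpoint": frozenset({2, 6, 8, 11, 13, 15, 19, 23}),  # last stage of phases A-H
    "co-pilot": frozenset({7, 8, 9, 16, 17}),                # Idea Workshop, Baseline Navigator, Paper Co-Writer
    "step-by-step": frozenset(range(1, 24)),                 # every one of the 23 stages
}


def should_pause(mode: str, stage: int) -> bool:
    """True if the given mode stops for a human after this stage."""
    try:
        return stage in PAUSE_POLICY[mode]
    except KeyError:
        raise ValueError(f"mode {mode!r} is not modelled in this sketch") from None
```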
```text
You: researchclaw run --topic "Quantum noise as neural network regularization" --mode co-pilot

Pipeline runs Stages 1-7 automatically...

┌────────────────────────────────────────────────────────┐
│ HITL | Stage 08: HYPOTHESIS_GEN                        │
│ Post-stage review                                      │
│                                                        │
│ Hypotheses mentioned: 3                                │
│ Novelty score: 0.72 (moderate)                         │
│                                                        │
│ [a] Approve  [r] Reject  [e] Edit  [c] Collaborate     │
│ [i] Inject guidance  [v] View output  [q] Abort        │
└────────────────────────────────────────────────────────┘

You: c (start collaborative chat)
You: Hypothesis 3 is interesting but needs Dropout/Label Smoothing as baselines
AI:  Updated: added Dropout, Label Smoothing, MixUp, CutMix as baselines...
You: approve

Pipeline continues with your refined hypothesis...
```
```bash
# Start with HITL mode
researchclaw run --topic "..." --mode co-pilot

# Attach to a paused pipeline (from another terminal)
researchclaw attach artifacts/rc-2026-xxx

# Check pipeline and HITL status
researchclaw status artifacts/rc-2026-xxx

# Approve/reject from another terminal or script
researchclaw approve artifacts/rc-2026-xxx --message "LGTM"
researchclaw reject artifacts/rc-2026-xxx --reason "Missing key baseline"

# Inject guidance for a stage (even before it runs)
researchclaw guide artifacts/rc-2026-xxx --stage 9 --message "Use ResNet-50 as primary baseline"
```
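Because approve, reject and guide are plain CLI commands, a script can drive them with subprocess. The subcommands and flags below are the ones listed above; approve_cmd, reject_cmd, guide_cmd and send are hypothetical helpers. Passing an argument list instead of a shell string means messages containing quotes or spaces need no escaping.

```python
import subprocess
from typing import List


def approve_cmd(run_dir: str, message: str) -> List[str]:
    return ["researchclaw", "approve", run_dir, "--message", message]


def reject_cmd(run_dir: str, reason: str) -> List[str]:
    return ["researchclaw", "reject", run_dir, "--reason", reason]


def guide_cmd(run_dir: str, stage: int, message: str) -> List[str]:
    return ["researchclaw", "guide", run_dir, "--stage", str(stage), "--message", message]


def send(cmd: List[str]) -> int:
    """Run one command without a shell and return its exit code."""
    return subprocess.run(cmd, check=False).returncode
```

For example, send(reject_cmd("artifacts/rc-2026-xxx", "Missing key baseline")) rejects the paused run from any Python script.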
| Feature | Description |
|---|---|
| Idea Workshop | Brainstorm, evaluate, and refine hypotheses collaboratively (Stage 7-8) |
| Baseline Navigator | AI suggests baselines + human adds/removes + reproducibility checklist (Stage 9) |
| Paper Co-Writer | Section-by-section drafting with human editing and AI polishing (Stage 16-19) |
| SmartPause | Confidence-driven dynamic pausing β auto-detects when human input would help |
| Claim Verification | Inline fact-checking against collected literature β flags ungrounded claims |
| Cost Guardrails | Budget monitoring with 50%/80%/100% threshold alerts |
| Intervention Learning | ALHF β learns from your review patterns to optimize future pause decisions |
| Branch Exploration | Fork pipeline to explore multiple hypotheses, compare, merge the best |
| Escalation Policy | Tiered notification (terminal → Slack → email → auto-halt) when unattended |
| 3 Adapters | CLI (terminal), WebSocket (web dashboard), MCP (external agents) |
```yaml
# config.arc.yaml
hitl:
  enabled: true
  mode: co-pilot                      # full-auto | gate-only | checkpoint | step-by-step | co-pilot | custom
  cost_budget_usd: 50.0               # Pause when cost exceeds budget (0 = no limit)
  notifications:
    on_pause: true
    on_quality_drop: true
    channels: ["terminal"]            # terminal | slack | webhook
  timeouts:
    default_human_timeout_sec: 86400  # 24h default wait
    auto_proceed_on_timeout: false
  collaboration:
    max_chat_turns: 50
    save_chat_history: true
  # Per-stage custom policies (optional, for 'custom' mode)
  stage_policies:
    8: { require_approval: true, enable_collaboration: true }
    9: { require_approval: true, allow_edit_output: true }
```
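The budget behaviour (alerts at 50%, 80% and 100%, a pause once spend exceeds cost_budget_usd, and 0 meaning no limit) fits in two small functions. This sketch assumes each alert fires once, when spend first crosses its threshold; new_alerts and over_budget are hypothetical names.

```python
from typing import List

ALERT_FRACTIONS = (0.5, 0.8, 1.0)  # 50% / 80% / 100% of cost_budget_usd


def new_alerts(prev_cost: float, cost: float, budget: float) -> List[float]:
    """Fractions of the budget crossed when spend moves from prev_cost to cost."""
    if budget <= 0:  # cost_budget_usd: 0 means no limit, so never alert
        return []
    return [f for f in ALERT_FRACTIONS if prev_cost < f * budget <= cost]


def over_budget(cost: float, budget: float) -> bool:
    """Pause once spend exceeds a non-zero budget."""
    return budget > 0 and cost > budget
```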
Without hitl.enabled: true or --mode, the pipeline behaves exactly as before. --auto-approve still works and overrides HITL mode.

AutoResearchClaw + MetaClaw = a pipeline that learns from every run.
MetaClaw adds cross-run knowledge transfer to AutoResearchClaw. When enabled, the pipeline automatically captures lessons from failures and warnings, converts them into reusable skills, and injects those skills into all 23 pipeline stages on subsequent runs β so the same mistakes are never repeated.
```text
Run N executes → failures/warnings captured as Lessons
        ↓
MetaClaw Lesson → Skill conversion
        ↓
arc-* Skill files stored in ~/.metaclaw/skills/
        ↓
Run N+1 → build_overlay() injects skills into every LLM prompt
        ↓
LLM avoids known pitfalls → higher quality, fewer retries
```
```bash
# 1. Install MetaClaw (if not already)
pip install metaclaw
```

```yaml
# 2. Enable in your config (config.arc.yaml)
metaclaw_bridge:
  enabled: true
  proxy_url: "http://localhost:30000"        # MetaClaw proxy (optional)
  skills_dir: "~/.metaclaw/skills"           # Where skills are stored
  fallback_url: "https://api.openai.com/v1"  # Direct LLM fallback
  fallback_api_key: ""                       # API key for fallback URL
  lesson_to_skill:
    enabled: true
    min_severity: "warning"                  # Convert warnings + errors
    max_skills_per_run: 3
```

```bash
# 3. Run as usual: MetaClaw works transparently
researchclaw run --config config.arc.yaml --topic "Your idea" --auto-approve
```
After each run, check ~/.metaclaw/skills/arc-*/SKILL.md to see the skills your pipeline has learned.
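The lesson_to_skill settings act as a filter and a cap. A minimal sketch of that selection step, assuming lessons carry one of three severities (info, warning, error, where info is an assumption) and that the most severe lessons win when the cap is hit; select_lessons is hypothetical and does not write any arc-* skill files.

```python
from typing import Dict, List

SEVERITY_RANK = {"info": 0, "warning": 1, "error": 2}  # assumed ordering of lesson severities


def select_lessons(lessons: List[Dict[str, object]], min_severity: str = "warning",
                   max_skills_per_run: int = 3) -> List[Dict[str, object]]:
    """Pick the lessons to turn into skills this run, most severe first."""
    floor = SEVERITY_RANK[min_severity]
    eligible = [lesson for lesson in lessons if SEVERITY_RANK[lesson["severity"]] >= floor]
    # sorted() is stable, so lessons of equal severity keep their original order
    eligible = sorted(eligible, key=lambda lesson: SEVERITY_RANK[lesson["severity"]], reverse=True)
    return eligible[:max_skills_per_run]
```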
In controlled A/B experiments (same topic, same LLM, same configuration):
| Metric | Baseline | With MetaClaw | Improvement |
|---|---|---|---|
| Stage retry rate | 10.5% | 7.9% | -24.8% |
| Refine cycle count | 2.0 | 1.2 | -40.0% |
| Pipeline stage completion | 18/19 | 19/19 | +5.3% |
| Overall robustness score (composite) | 0.714 | 0.845 | +18.3% |
Composite robustness score is a weighted average of stage completion rate (40%), retry reduction (30%), and refine cycle efficiency (30%).
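The composite definition and the Improvement column can be reproduced in a few lines. The weights come from the sentence above; treating each component as already normalised to [0, 1] is an assumption, since the normalisation is not given, and composite and relative_change_pct are illustrative names. Running the numbers shows that three rows are relative changes, while the completion row's +5.3% matches the absolute gain of 1/19 (5.3 points) rather than the relative gain of 1/18 (about 5.6%).

```python
from typing import Dict

WEIGHTS = {"completion": 0.4, "retry_reduction": 0.3, "refine_efficiency": 0.3}


def composite(scores: Dict[str, float]) -> float:
    """Weighted average of the three components, each assumed to be normalised to [0, 1]."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def relative_change_pct(baseline: float, new: float) -> float:
    """Percentage change relative to the baseline value."""
    return 100.0 * (new - baseline) / baseline
```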
If metaclaw_bridge is absent or enabled: false, the pipeline behaves exactly as before.

AutoResearchClaw now supports loading open-source and custom skills to further enhance your research experience. We also ship with 20 pre-loaded built-in skills (scientific writing, literature search, chemistry, biology, and more) as ready-to-use references, offering a high degree of flexibility out of the box. Disable any skill by adding enabled: false to its frontmatter.
Sample built-in skills:
| Category | Skill | Description |
|---|---|---|
| Writing | scientific-writing | IMRAD structure, citation formatting, reporting guidelines |
| Domain | chemistry-rdkit | Molecular analysis, SMILES, fingerprints, drug discovery |
| Experiment | literature-search | Systematic review, PRISMA methodology |
See all 20 skills with researchclaw skills list.
```bash
# Option 1: Install a skill (persists across projects)
researchclaw skills install /path/to/my-skill/

# Option 2: Drop a SKILL.md into the project
mkdir -p .claude/skills/my-custom-skill
# Then create a SKILL.md with YAML frontmatter (name, description, trigger-keywords, applicable-stages)

# Option 3: Configure shared skill directories in config.arc.yaml
# skills:
#   custom_dirs:
#     - /path/to/team-shared-skills
```
Skills are loaded and injected into LLM prompts automatically; no manual activation is needed. Use the CLI to inspect:
```bash
researchclaw skills list                  # Show all loaded skills with sources
researchclaw skills validate ./my-skill   # Check SKILL.md format
```
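To see what a SKILL.md frontmatter check involves, here is a stdlib-only sketch that reads flat key: value pairs and honours enabled: false. It is not a YAML parser and not the logic behind researchclaw skills validate; treating all four fields from the install example as required is an assumption, and read_frontmatter, is_enabled and missing_fields are hypothetical names.

```python
from typing import Dict, List

REQUIRED_FIELDS = ("name", "description", "trigger-keywords", "applicable-stages")


def read_frontmatter(text: str) -> Dict[str, str]:
    """Parse flat `key: value` lines between a leading '---' and the next '---'."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip().strip("\"'")
    return {}  # no closing '---', so treat the file as having no frontmatter


def is_enabled(meta: Dict[str, str]) -> bool:
    """Skills are on unless the frontmatter says `enabled: false`."""
    return meta.get("enabled", "true").lower() != "false"


def missing_fields(meta: Dict[str, str]) -> List[str]:
    return [field for field in REQUIRED_FIELDS if field not in meta]
```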
Browse community skills: K-Dense-AI/claude-scientific-skills (150+ scientific skills across multiple disciplines).
```yaml
# === Project ===
project:
  name: "my-research"                  # Project identifier
  mode: "docs-first"                   # docs-first | semi-auto | full-auto

# === Research ===
research:
  topic: "..."                         # Research topic (required)
  domains: ["ml", "nlp"]               # Research domains for literature search
  daily_paper_count: 8                 # Target papers per search query
  quality_threshold: 4.0               # Minimum quality score for papers

# === Runtime ===
runtime:
  timezone: "America/New_York"         # For timestamps
  max_parallel_tasks: 3                # Concurrent experiment limit
  approval_timeout_hours: 12           # Gate stage timeout
  retry_limit: 2                       # Retry count on stage failure

# === LLM ===
llm:
  provider: "openai-compatible"        # openai | openrouter | deepseek | minimax | acp | openai-compatible
  base_url: "https://..."              # API endpoint (required for openai-compatible)
  api_key_env: "OPENAI_API_KEY"        # Env var for API key (required for openai-compatible)
  api_key: ""                          # Or hardcode key here
  primary_model: "gpt-4o"              # Primary model
  fallback_models: ["gpt-4o-mini"]     # Fallback chain
  s2_api_key: ""                       # Semantic Scholar API key (optional, higher rate limits)
  acp:                                 # Only used when provider: "acp"
    agent: "claude"                    # ACP agent CLI command (claude, codex, gemini, etc.)
    cwd: "."                           # Working directory for the agent

# === Experiment ===
experiment:
  mode: "sandbox"                      # simulated | sandbox | docker | ssh_remote
  time_budget_sec: 300                 # Max execution time per run (default: 300s)
  max_iterations: 10                   # Max optimization iterations
  metric_key: "val_loss"               # Primary metric name
  metric_direction: "minimize"         # minimize | maximize
  sandbox:
    python_path: ".venv/bin/python"
    gpu_required: false
    allowed_imports: [math, random, json, csv, numpy, torch, sklearn]
    max_memory_mb: 4096
  docker:
    image: "researchclaw/experiment:latest"
    network_policy: "setup_only"       # none | setup_only | pip_only | full
    gpu_enabled: true
    memory_limit_mb: 8192
    auto_install_deps: true            # Auto-detect imports → requirements.txt
  ssh_remote:
    host: ""                           # GPU server hostname
    gpu_ids: []                        # Available GPU IDs
    remote_workdir: "/tmp/researchclaw_experiments"
  opencode:                            # OpenCode Beast Mode (auto-installed via `researchclaw setup`)
    enabled: true                      # Master switch (default: true)
    auto: true                         # Auto-trigger without confirmation (default: true)
    complexity_threshold: 0.2          # 0.0-1.0; higher = only trigger on complex experiments
    model: ""                          # Override model (empty = use llm.primary_model)
    timeout_sec: 600                   # Max seconds for OpenCode generation
    max_retries: 1                     # Retry count on failure
    workspace_cleanup: true            # Remove temp workspace after collection
  code_agent:                          # CodeAgent v2: multi-phase code generation
    enabled: true                      # Use CodeAgent instead of legacy single-prompt codegen
    architecture_planning: true        # Generate deep implementation blueprint before coding
    sequential_generation: true        # Generate files one-by-one following dependency DAG
    hard_validation: true              # AST-based validation gates (blocks identical ablations, hardcoded metrics)
    hard_validation_max_repairs: 2     # Max repair attempts when validation fails
    exec_fix_max_iterations: 3         # Execution-in-the-loop fix attempts
    exec_fix_timeout_sec: 60           # Timeout per exec-fix attempt
  benchmark_agent:                     # BenchmarkAgent: automated dataset & baseline selection
    enabled: true                      # Enable 4-agent benchmark pipeline (Surveyor→Selector→Acquirer→Validator)
    enable_hf_search: true             # Search HuggingFace Datasets
    enable_web_search: true            # Search Google Scholar for benchmarks
    tier_limit: 2                      # Dataset tier filtering (1=small/cached, 2=medium, 3=large)
    min_benchmarks: 1                  # Minimum datasets required
    min_baselines: 2                   # Minimum baseline methods required
  figure_agent:                        # FigureAgent: academic figure generation
    enabled: true                      # Enable 5-agent figure pipeline (Planner→CodeGen→Renderer→Critic→Integrator)
    min_figures: 3                     # Minimum figures to generate
    max_figures: 8                     # Maximum figures
    max_iterations: 3                  # Critic-driven refinement iterations
    dpi: 300                           # Output resolution
    strict_mode: false                 # Fail pipeline if figure generation fails
  repair:                              # Anti-fabrication experiment repair
    enabled: true                      # Auto-diagnose and repair failed experiments
    max_cycles: 3                      # Repair retry loops
    min_completion_rate: 0.5           # >=50% conditions must complete to proceed
    min_conditions: 2                  # At least 2 conditions for valid experiment
    use_opencode: true                 # Route repairs through OpenCode Beast Mode

# === Web Search (Optional) ===
web_search:
  enabled: true                        # Enable web-augmented literature search
  tavily_api_key_env: "TAVILY_API_KEY" # Tavily API key env var (optional)
  enable_scholar: true                 # Google Scholar search
  enable_pdf_extraction: true          # Extract text from PDFs
  max_web_results: 10                  # Max web results per query

# === Export ===
export:
  target_conference: "neurips_2025"    # neurips_2025 | iclr_2026 | icml_2026
  authors: "Anonymous"
  bib_file: "references"

# === Prompts ===
prompts:
  custom_file: ""                      # Path to custom prompts YAML (empty = defaults)

# === HITL Co-Pilot (NEW in v0.4.0) ===
hitl:
  enabled: false                       # Set to true to enable HITL
  mode: co-pilot                       # full-auto | gate-only | checkpoint | step-by-step | co-pilot | custom
  cost_budget_usd: 0.0                 # Cost limit in USD (0 = no limit)
  notifications:
    on_pause: true                     # Notify when pipeline pauses
    on_quality_drop: true              # Notify on quality issues
    channels: ["terminal"]             # terminal | slack | webhook
  timeouts:
    default_human_timeout_sec: 86400   # Wait up to 24h for human input
    auto_proceed_on_timeout: false     # If true, auto-approve on timeout
  collaboration:
    max_chat_turns: 50                 # Max turns per collaboration session
    save_chat_history: true            # Persist chat logs
  stage_policies: {}                   # Per-stage overrides (for 'custom' mode)

# === Security ===
security:
  hitl_required_stages: [5, 9, 20]     # Stages requiring human approval
  allow_publish_without_approval: false
  redact_sensitive_logs: true

# === Knowledge Base ===
knowledge_base:
  backend: "markdown"                  # markdown | obsidian
  root: "docs/kb"

# === Notifications ===
notifications:
  channel: "console"                   # console | discord | slack
  target: ""

# === MetaClaw Bridge (Optional) ===
metaclaw_bridge:
  enabled: false                       # Set to true to enable cross-run learning
  proxy_url: "http://localhost:30000"  # MetaClaw proxy URL
  skills_dir: "~/.metaclaw/skills"     # Where arc-* skills are stored
  fallback_url: ""                     # Direct LLM fallback when proxy is down
  fallback_api_key: ""                 # API key for fallback endpoint
  lesson_to_skill:
    enabled: true                      # Auto-convert lessons to skills
    min_severity: "warning"            # Minimum severity to convert
    max_skills_per_run: 3              # Max new skills per pipeline run
  prm:                                 # Process Reward Model quality gate (optional)
    enabled: false                     # Use LLM-as-judge to score stage outputs
    model: "gpt-5.4"                   # PRM judge model
    votes: 3                           # Majority vote count
    gate_stages: [5, 9, 15, 20]        # Stages to apply PRM gates

# === OpenClaw Bridge ===
openclaw_bridge:
  use_cron: false                      # Scheduled research runs
  use_message: false                   # Progress notifications
  use_memory: false                    # Cross-session knowledge persistence
  use_sessions_spawn: false            # Spawn parallel sub-sessions
  use_web_fetch: false                 # Live web search
  use_browser: false                   # Browser-based paper collection
```
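The repair thresholds (min_completion_rate, min_conditions) amount to a simple go/no-go check. A minimal sketch, assuming min_conditions counts completed conditions rather than planned ones; experiment_can_proceed is a hypothetical helper, not a researchclaw function.

```python
def experiment_can_proceed(completed: int, total: int,
                           min_completion_rate: float = 0.5,
                           min_conditions: int = 2) -> bool:
    """Go/no-go after repair: enough conditions finished, both in count and in share."""
    if total <= 0:
        return False
    enough_count = completed >= min_conditions
    enough_share = completed / total >= min_completion_rate
    return enough_count and enough_share
```

With the defaults, 2 of 4 conditions is enough, while 2 of 5 (40%) or 1 of 2 (only one condition) is not.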
When project.profile=hep_ph and experiment.mode=collider_agent, the pipeline routes Stage 12 through ColliderAgent (Lagrangian → FeynRules → MadGraph5 → figures via Magnus cloud) instead of the default Python ML sandbox.
Stage 10 (CODE_GENERATION) becomes a HITL gate. The pipeline pauses with collider_plan.md open in $EDITOR so you can review or edit the physics prompt before ColliderAgent runs. Rejecting sends control back to Stage 9 (EXPERIMENT_DESIGN); the hypothesis from Stage 8 stays intact.
… (--incremental-experiment)

To add new mass points or analyses to a completed run without redoing the heavy simulation, re-launch with --incremental-experiment and either --from-stage CODE_GENERATION (also edit the prompt) or --from-stage EXPERIMENT_RUN (reuse the existing prompt):
```bash
python -m researchclaw run --profile hep_ph --output artifacts/<run_id> \
    --from-stage CODE_GENERATION --incremental-experiment
```
The Stage 12 sandbox will:

- … stage-12/ tree to stage-12_v{N}/.
- … collider_plan.md as collider_plan.prev.md.
- … workspace_manifest.json listing reusable artifacts.
- … CONTINUATION CONTEXT + PRIOR PLAN + your new delta.
- … results.json with the snapshot's prior one (metrics: new wins on collisions, old-only kept; artifact lists: concat + dedupe). The merge is recorded in incremental_merge.json.

Stage 13 then promotes the merged state to experiment_final/ as before.
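The results.json merge rule can be illustrated with two plain dictionaries. This sketch assumes a hypothetical layout with a "metrics" mapping and an "artifacts" list of strings (the real file may be structured differently), and merge_results is an illustrative name.

```python
from typing import Any, Dict


def merge_results(old: Dict[str, Any], new: Dict[str, Any]) -> Dict[str, Any]:
    """Merge a snapshot's results with those of an incremental run.

    metrics:   new values win on key collisions; keys only in the old run are kept.
    artifacts: old list then new list, duplicates dropped, first-seen order preserved.
    """
    metrics = {**old.get("metrics", {}), **new.get("metrics", {})}
    artifacts = list(dict.fromkeys(list(old.get("artifacts", [])) + list(new.get("artifacts", []))))
    return {"metrics": metrics, "artifacts": artifacts}
```

dict.fromkeys is used for the dedupe because it keeps insertion order, which a set would not.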
Note: re-entering at Stage 13 alone is a no-op in collider mode and will NOT run any new physics: Stage 13 is a shutil.copy2 passthrough. PIVOT (Stage 15 decision) intentionally remains destructive because changing the hypothesis makes prior events invalid.
Inspired by:
MIT: see LICENSE for details.
If you find AutoResearchClaw useful, please cite:
```bibtex
@misc{liu2026autoresearchclawselfreinforcingautonomousresearch,
  title={AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration},
  author={Jiaqi Liu and Shi Qiu and Mairui Li and Bingzhou Li and Haonian Ji and Siwei Han and Xinyu Ye and Peng Xia and Zihan Dong and Congyu Zhang and Letian Zhang and Guiming Chen and Haoqin Tu and Xinyu Yang and Lu Feng and Xujiang Zhao and Haifeng Chen and Jiawei Zhou and Xiao Wang and Weitong Zhang and Hongtu Zhu and Yun Li and Jieru Mei and Hongliang Fei and Jiaheng Zhang and Linjie Li and Linjun Zhang and Yuyin Zhou and Sheng Wang and Caiming Xiong and James Zou and Zeyu Zheng and Cihang Xie and Mingyu Ding and Huaxiu Yao},
  year={2026},
  eprint={2605.20025},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2605.20025},
}
```
Built with 🦞 by the AutoResearchClaw team