Tools & models

Supported tools

Semtest supports 14 LLM CLI tools. Each requires its respective CLI installed and available on your PATH.

Tool | CLI command | Prompt delivery
Claude Code | claude | Command-line argument
Gemini CLI | gemini | stdin
Codex CLI | codex | stdin
Aider | aider | --message argument
OpenCode | opencode | stdin
Goose | goose | stdin
Crush | crush | stdin
Qwen | qwen | stdin
GitHub Copilot | copilot | stdin
Forge | forge | stdin
Plandex | plandex | stdin
OpenHands | openhands | stdin
Cursor | agent | stdin
Amp | amp | stdin
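
To make the "Prompt delivery" column concrete, here is a minimal sketch of how a runner might turn a prompt into a process invocation. The delivery modes are taken from the table above; the `Invocation` shape, the `buildInvocation` helper, and the stdin fallback for unlisted commands are hypothetical and not part of semtest's API. Real invocations likely carry further arguments that are not shown here.

```typescript
// Hypothetical sketch: how a runner might hand a prompt to each CLI.
// Only the delivery modes come from the table; all names here are assumptions.
type Delivery = "argument" | "message-flag" | "stdin";

interface Invocation {
  command: string;
  args: string[];
  stdin: string | null; // text to write to the child's stdin, or null if unused
}

// CLI command -> delivery mode (subset of the table, for brevity).
const DELIVERY = new Map<string, Delivery>([
  ["claude", "argument"],
  ["aider", "message-flag"],
  ["gemini", "stdin"],
  ["codex", "stdin"],
  ["goose", "stdin"],
]);

function buildInvocation(command: string, prompt: string): Invocation {
  // Assumption: commands missing from the map read the prompt from stdin.
  const mode: Delivery = DELIVERY.get(command) ?? "stdin";
  switch (mode) {
    case "argument":
      // Prompt is passed as a plain command-line argument.
      return { command, args: [prompt], stdin: null };
    case "message-flag":
      // Prompt follows the --message flag.
      return { command, args: ["--message", prompt], stdin: null };
    case "stdin":
      // Prompt is piped to the process on standard input.
      return { command, args: [], stdin: prompt };
  }
}
```

The returned object could then be passed to something like Node's `child_process.spawnSync(command, args, { input: stdin ?? undefined })`.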

Model keys

Instead of choosing a runner and capability tier separately, you specify a single model key that identifies both the tool and the model. Set it in your config:

export default defineConfig({
  llm: "claude-code-sonnet-4-6",
});

Or override per-test via frontmatter:

---
llm: gemini-2.5-pro
---

Run semtest list to see all available model keys, or semtest list --json for machine-readable output.
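
The precedence described above (frontmatter overrides config) can be sketched as follows. This is an illustration only: `frontmatterLlm` and `effectiveModelKey` are hypothetical names, and the parser assumes the frontmatter is a leading block delimited by `---` lines containing simple `key: value` pairs. Semtest's actual parsing may differ.

```typescript
// Hypothetical sketch: resolve the model key for a single test file.
// Returns the `llm` value from a leading frontmatter block, if present.
function frontmatterLlm(source: string): string | undefined {
  const lines = source.split(/\r?\n/);
  if (lines.length === 0 || lines[0].trim() !== "---") {
    return undefined; // file does not start with frontmatter
  }
  for (const line of lines.slice(1)) {
    if (line.trim() === "---") break; // end of the frontmatter block
    const match = /^llm:\s*(\S+)\s*$/.exec(line);
    if (match) {
      // Strip optional surrounding quotes around the value.
      return match[1].replace(/^["']|["']$/g, "");
    }
  }
  return undefined;
}

// Per-test frontmatter wins; otherwise fall back to the config value.
function effectiveModelKey(configLlm: string, testSource: string): string {
  return frontmatterLlm(testSource) ?? configLlm;
}
```

For example, a test file beginning with the frontmatter shown above would resolve to `gemini-2.5-pro` even when the config sets `claude-code-sonnet-4-6`.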

Claude Code

Key Model
claude-code-opus-4-6 claude-opus-4-6
claude-code-sonnet-4-6 claude-sonnet-4-6
claude-code-sonnet-4-5 claude-sonnet-4-5-20250929
claude-code-haiku-4-5 claude-haiku-4-5-20251001

Gemini CLI

Key Model
gemini-2.5-pro gemini-2.5-pro
gemini-2.5-flash gemini-2.5-flash
gemini-2.5-flash-lite gemini-2.5-flash-lite
gemini-2.0-flash gemini-2.0-flash

Codex CLI

Key Model
codex-o3 o3
codex-o4-mini o4-mini
codex-gpt-4.1 gpt-4.1
codex-gpt-4.1-mini gpt-4.1-mini
codex-gpt-4.1-nano gpt-4.1-nano

Aider

Key Model
aider-claude-opus-4-6 claude-opus-4-6
aider-claude-sonnet-4-6 claude-sonnet-4-6
aider-claude-sonnet-4-5 claude-sonnet-4-5
aider-claude-haiku-4-5 claude-haiku-4-5
aider-gpt-4.1 gpt-4.1
aider-gpt-4.1-mini gpt-4.1-mini
aider-o3 o3
aider-o4-mini o4-mini
aider-gemini-2.5-pro gemini-2.5-pro
aider-gemini-2.5-flash gemini-2.5-flash
aider-deepseek-r1 deepseek-r1
aider-deepseek-v3 deepseek-v3

OpenCode

Key Model
opencode-claude-opus-4-6 claude-opus-4-6
opencode-claude-sonnet-4-6 claude-sonnet-4-6
opencode-claude-sonnet-4-5 claude-sonnet-4-5-20250929
opencode-claude-haiku-4-5 claude-haiku-4-5-20251001
opencode-gpt-4.1 gpt-4.1
opencode-gpt-4.1-mini gpt-4.1-mini
opencode-o3 o3
opencode-o4-mini o4-mini
opencode-gemini-2.5-pro gemini-2.5-pro
opencode-gemini-2.5-flash gemini-2.5-flash

Goose

Key Model
goose-claude-opus-4-6 claude-opus-4-6
goose-claude-sonnet-4-6 claude-sonnet-4-6
goose-claude-sonnet-4-5 claude-sonnet-4-5-20250929
goose-claude-haiku-4-5 claude-haiku-4-5-20251001
goose-gpt-4.1 gpt-4.1
goose-gpt-4.1-mini gpt-4.1-mini
goose-o3 o3
goose-o4-mini o4-mini
goose-gemini-2.5-pro gemini-2.5-pro
goose-gemini-2.5-flash gemini-2.5-flash

Crush

Key Model
crush-claude-opus-4-6 claude-opus-4-6
crush-claude-sonnet-4-6 claude-sonnet-4-6
crush-claude-sonnet-4-5 claude-sonnet-4-5-20250929
crush-claude-haiku-4-5 claude-haiku-4-5-20251001
crush-gpt-4.1 gpt-4.1
crush-gpt-4.1-mini gpt-4.1-mini
crush-o3 o3
crush-o4-mini o4-mini
crush-gemini-2.5-pro gemini-2.5-pro
crush-gemini-2.5-flash gemini-2.5-flash

Qwen

Key Model
qwen3-coder-plus qwen3-coder-plus
qwen3-coder qwen3-coder
qwen3-coder-fast qwen3-coder-fast

GitHub Copilot

Key Model
copilot-claude-opus-4-6 claude-opus-4-6
copilot-claude-sonnet-4-6 claude-sonnet-4-6
copilot-claude-sonnet-4-5 claude-sonnet-4-5-20250929
copilot-gpt-4.1 gpt-4.1
copilot-gpt-4.1-mini gpt-4.1-mini
copilot-o3 o3
copilot-o4-mini o4-mini
copilot-gemini-2.5-pro gemini-2.5-pro

Forge

Key Model
forge-claude-opus-4-6 claude-opus-4-6
forge-claude-sonnet-4-6 claude-sonnet-4-6
forge-claude-sonnet-4-5 claude-sonnet-4-5-20250929
forge-claude-haiku-4-5 claude-haiku-4-5-20251001
forge-gpt-4.1 gpt-4.1
forge-gpt-4.1-mini gpt-4.1-mini
forge-o3 o3
forge-o4-mini o4-mini
forge-gemini-2.5-pro gemini-2.5-pro
forge-gemini-2.5-flash gemini-2.5-flash

Plandex, OpenHands, Cursor, Amp

These tools use their default model configuration and do not accept a model parameter:

Key Tool
plandex-default Plandex
openhands-default OpenHands
cursor-default Cursor
amp-default Amp
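
One way to picture how a single key identifies both tool and model is a lookup table in which the model is optional. The sketch below uses a handful of rows copied from the tables in this section; the `ModelKeyEntry` type, the `SAMPLE_KEYS` map, and `resolveModelKey` are hypothetical names and do not describe semtest's internals.

```typescript
// Hypothetical sketch: a model key resolves to a tool and, optionally, a model.
interface ModelKeyEntry {
  tool: string;    // display name from the tables
  command: string; // CLI command from "Supported tools"
  model?: string;  // omitted for tools that use their default model
}

// A few rows copied from the tables above (not the full list).
const SAMPLE_KEYS = new Map<string, ModelKeyEntry>([
  ["claude-code-sonnet-4-6", { tool: "Claude Code", command: "claude", model: "claude-sonnet-4-6" }],
  ["gemini-2.5-pro", { tool: "Gemini CLI", command: "gemini", model: "gemini-2.5-pro" }],
  ["aider-o3", { tool: "Aider", command: "aider", model: "o3" }],
  ["cursor-default", { tool: "Cursor", command: "agent" }],
  ["amp-default", { tool: "Amp", command: "amp" }],
]);

function resolveModelKey(key: string): ModelKeyEntry {
  const entry = SAMPLE_KEYS.get(key);
  if (entry === undefined) {
    throw new Error(`Unknown model key: ${key}`);
  }
  return entry;
}

// A "-default" key resolves to a tool with no model to pass along.
const cursor = resolveModelKey("cursor-default");
console.log(cursor.command, cursor.model ?? "(tool default)");
```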

Choosing a tool and model

Use whichever LLM CLI you already have installed. All tools receive the same prompt and are expected to produce the same JSON output format. The choice comes down to:

  • Which LLM provider you have access to
  • Which CLI tool is already on your machine
  • Model quality and cost preferences

As a general rule:

Model size | Trade-off | When to use
Large (Opus, o3, 2.5-pro) | Most capable, slowest, most expensive | Critical specs, complex codebases, highest accuracy needed
Medium (Sonnet, o4-mini, 2.5-flash, gpt-4.1) | Good accuracy, moderate cost | Daily development (default)
Small (Haiku, gpt-4.1-mini/nano, 2.5-flash-lite) | Fastest, cheapest, least capable | Large test suites, quick iteration, cost-sensitive environments
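
If you want to encode this rule of thumb rather than pick a key by hand, a small helper might look like the sketch below. The tier-to-key mapping uses Claude Code keys from the tables above; the `RunContext` shape, the `pickTier` function, and the threshold of 200 tests are illustrative assumptions, not semtest behavior.

```typescript
// Hypothetical sketch: map the size guidance above to a concrete model key.
type Tier = "large" | "medium" | "small";

// Claude Code keys grouped by tier (Opus, Sonnet, Haiku).
const CLAUDE_CODE_BY_TIER: Record<Tier, string> = {
  large: "claude-code-opus-4-6",
  medium: "claude-code-sonnet-4-6",
  small: "claude-code-haiku-4-5",
};

interface RunContext {
  critical: boolean; // critical specs that need the highest accuracy
  testCount: number; // number of tests in the run
}

// The threshold is an arbitrary illustrative default; tune it to your budget.
function pickTier(ctx: RunContext, largeSuiteThreshold = 200): Tier {
  if (ctx.critical) return "large";
  if (ctx.testCount >= largeSuiteThreshold) return "small";
  return "medium"; // daily development default
}

const key = CLAUDE_CODE_BY_TIER[pickTier({ critical: false, testCount: 40 })];
console.log(key);
```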

Permission bypass

LLM CLIs often prompt for user confirmation before executing actions. In automated contexts (CI, batch runs), these prompts block execution. The skipPermissionsIfPossible config option (or the --skip-permissions-if-possible CLI flag) tells semtest to append the tool-specific permission-bypass flag when one is available. For tools without a bypass flag, the option is silently ignored.

Tool | Flag appended | Notes
Claude Code | --dangerously-skip-permissions |
Gemini CLI | -y |
Codex CLI | --dangerously-bypass-approvals-and-sandbox |
Aider | --yes-always | Added alongside existing --yes
Cursor | --force |
Amp | (none) | Auto-executes by default
OpenHands | (none) | Docker sandbox
Goose | (none) | Headless auto-execute
OpenCode, Crush, Qwen, Copilot, Forge, Plandex | (none) | No known flag; silently ignored
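
The "append when available" behavior can be sketched as a lookup keyed by CLI command. The flags are the ones listed in the table above; the `BYPASS_FLAGS` map and the `withPermissionBypass` function are hypothetical names used only for illustration.

```typescript
// Hypothetical sketch of the "append the bypass flag if one exists" logic.
// Keys are CLI commands; values are the flags from the table above.
const BYPASS_FLAGS = new Map<string, string>([
  ["claude", "--dangerously-skip-permissions"],
  ["gemini", "-y"],
  ["codex", "--dangerously-bypass-approvals-and-sandbox"],
  ["aider", "--yes-always"],
  ["agent", "--force"], // Cursor
]);

function withPermissionBypass(
  command: string,
  args: string[],
  skipPermissionsIfPossible: boolean,
): string[] {
  if (!skipPermissionsIfPossible) return args;
  const flag = BYPASS_FLAGS.get(command);
  // Tools with no known flag are left untouched, with no warning.
  return flag === undefined ? args : [...args, flag];
}

// Aider keeps its existing --yes and gains --yes-always.
console.log(withPermissionBypass("aider", ["--yes"], true));
// Goose has no bypass flag, so its arguments are unchanged.
console.log(withPermissionBypass("goose", [], true));
```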

Prerequisites

Semtest does not install LLM CLIs for you. Each tool's CLI must be installed separately.

Verify a CLI is available:

which claude   # or gemini, codex, aider, opencode, goose, etc.

If the command is not found, semtest will exit with code 2 and an error message indicating the CLI is not available.
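
If you want to run the same check from a script before invoking semtest, a rough TypeScript equivalent of `which` is sketched below. It assumes a Node.js runtime on a POSIX-like system (it does not handle Windows PATHEXT extensions), and `isOnPath` is a hypothetical helper, not something semtest exports. The exit code 2 simply mirrors the behavior described above.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical helper: true if `command` is an executable file in any
// directory listed in `pathVar` (defaults to the current PATH).
function isOnPath(command: string, pathVar: string = process.env.PATH ?? ""): boolean {
  for (const dir of pathVar.split(path.delimiter)) {
    if (dir === "") continue; // skip empty PATH entries
    const candidate = path.join(dir, command);
    try {
      if (fs.statSync(candidate).isFile()) {
        fs.accessSync(candidate, fs.constants.X_OK); // throws if not executable
        return true;
      }
    } catch {
      // Missing or not executable in this directory; keep looking.
    }
  }
  return false;
}

// Pre-flight check for the CLI behind your configured model key.
function requireCli(command: string): void {
  if (!isOnPath(command)) {
    console.error(`${command} is not available on PATH`);
    process.exit(2);
  }
}
```

Calling `requireCli("claude")` at the top of a CI script fails fast before any tests are scheduled.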