Supported tools
Semtest supports 14 LLM CLI tools. Each one requires its own CLI to be installed and available on your PATH.
| Tool | CLI command | Prompt delivery |
|---|---|---|
| Claude Code | claude | Command-line argument |
| Gemini CLI | gemini | stdin |
| Codex CLI | codex | stdin |
| Aider | aider | --message argument |
| OpenCode | opencode | stdin |
| Goose | goose | stdin |
| Crush | crush | stdin |
| Qwen | qwen | stdin |
| GitHub Copilot | copilot | stdin |
| Forge | forge | stdin |
| Plandex | plandex | stdin |
| OpenHands | openhands | stdin |
| Cursor | agent | stdin |
| Amp | amp | stdin |
Model keys
Instead of choosing a runner and capability tier separately, you specify a single model key that identifies both the tool and the model. Set it in your config:
export default defineConfig({
llm: "claude-code-sonnet-4-6",
});
Or override per-test via frontmatter:
---
llm: gemini-2.5-pro
---
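The precedence between the two settings can be sketched as follows, assuming the frontmatter value simply wins when present. `frontmatterLlm` and `effectiveModelKey` are hypothetical helpers written for illustration; the minimal parser handles only an unquoted `llm:` line and is not a full YAML reader.

```typescript
// Extracts the `llm:` value from a leading frontmatter block, or null if absent.
// Assumption: the value is a bare, unquoted key on its own line.
function frontmatterLlm(source: string): string | null {
  const match = /^---\r?\n([\s\S]*?)\r?\n---/.exec(source);
  if (match === null) return null;
  for (const line of match[1].split(/\r?\n/)) {
    const kv = /^llm:\s*(\S+)\s*$/.exec(line);
    if (kv !== null) return kv[1];
  }
  return null;
}

// Per-test frontmatter overrides the project-wide config value.
function effectiveModelKey(configLlm: string, testSource: string): string {
  return frontmatterLlm(testSource) ?? configLlm;
}
```

With the config above, a test file that starts with the frontmatter shown would resolve to `gemini-2.5-pro`, while a file without frontmatter would fall back to `claude-code-sonnet-4-6`.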
Run semtest list to see all available model keys, or semtest list --json for machine-readable output.
Claude Code
| Key | Model |
|---|---|
| claude-code-opus-4-6 | claude-opus-4-6 |
| claude-code-sonnet-4-6 | claude-sonnet-4-6 |
| claude-code-sonnet-4-5 | claude-sonnet-4-5-20250929 |
| claude-code-haiku-4-5 | claude-haiku-4-5-20251001 |
Gemini CLI
| Key | Model |
|---|---|
| gemini-2.5-pro | gemini-2.5-pro |
| gemini-2.5-flash | gemini-2.5-flash |
| gemini-2.5-flash-lite | gemini-2.5-flash-lite |
| gemini-2.0-flash | gemini-2.0-flash |
Codex CLI
| Key | Model |
|---|---|
| codex-o3 | o3 |
| codex-o4-mini | o4-mini |
| codex-gpt-4.1 | gpt-4.1 |
| codex-gpt-4.1-mini | gpt-4.1-mini |
| codex-gpt-4.1-nano | gpt-4.1-nano |
Aider
| Key | Model |
|---|---|
| aider-claude-opus-4-6 | claude-opus-4-6 |
| aider-claude-sonnet-4-6 | claude-sonnet-4-6 |
| aider-claude-sonnet-4-5 | claude-sonnet-4-5 |
| aider-claude-haiku-4-5 | claude-haiku-4-5 |
| aider-gpt-4.1 | gpt-4.1 |
| aider-gpt-4.1-mini | gpt-4.1-mini |
| aider-o3 | o3 |
| aider-o4-mini | o4-mini |
| aider-gemini-2.5-pro | gemini-2.5-pro |
| aider-gemini-2.5-flash | gemini-2.5-flash |
| aider-deepseek-r1 | deepseek-r1 |
| aider-deepseek-v3 | deepseek-v3 |
OpenCode
| Key | Model |
|---|---|
| opencode-claude-opus-4-6 | claude-opus-4-6 |
| opencode-claude-sonnet-4-6 | claude-sonnet-4-6 |
| opencode-claude-sonnet-4-5 | claude-sonnet-4-5-20250929 |
| opencode-claude-haiku-4-5 | claude-haiku-4-5-20251001 |
| opencode-gpt-4.1 | gpt-4.1 |
| opencode-gpt-4.1-mini | gpt-4.1-mini |
| opencode-o3 | o3 |
| opencode-o4-mini | o4-mini |
| opencode-gemini-2.5-pro | gemini-2.5-pro |
| opencode-gemini-2.5-flash | gemini-2.5-flash |
Goose
| Key | Model |
|---|---|
| goose-claude-opus-4-6 | claude-opus-4-6 |
| goose-claude-sonnet-4-6 | claude-sonnet-4-6 |
| goose-claude-sonnet-4-5 | claude-sonnet-4-5-20250929 |
| goose-claude-haiku-4-5 | claude-haiku-4-5-20251001 |
| goose-gpt-4.1 | gpt-4.1 |
| goose-gpt-4.1-mini | gpt-4.1-mini |
| goose-o3 | o3 |
| goose-o4-mini | o4-mini |
| goose-gemini-2.5-pro | gemini-2.5-pro |
| goose-gemini-2.5-flash | gemini-2.5-flash |
Crush
| Key | Model |
|---|---|
| crush-claude-opus-4-6 | claude-opus-4-6 |
| crush-claude-sonnet-4-6 | claude-sonnet-4-6 |
| crush-claude-sonnet-4-5 | claude-sonnet-4-5-20250929 |
| crush-claude-haiku-4-5 | claude-haiku-4-5-20251001 |
| crush-gpt-4.1 | gpt-4.1 |
| crush-gpt-4.1-mini | gpt-4.1-mini |
| crush-o3 | o3 |
| crush-o4-mini | o4-mini |
| crush-gemini-2.5-pro | gemini-2.5-pro |
| crush-gemini-2.5-flash | gemini-2.5-flash |
Qwen
| Key | Model |
|---|---|
| qwen3-coder-plus | qwen3-coder-plus |
| qwen3-coder | qwen3-coder |
| qwen3-coder-fast | qwen3-coder-fast |
GitHub Copilot
| Key | Model |
|---|---|
| copilot-claude-opus-4-6 | claude-opus-4-6 |
| copilot-claude-sonnet-4-6 | claude-sonnet-4-6 |
| copilot-claude-sonnet-4-5 | claude-sonnet-4-5-20250929 |
| copilot-gpt-4.1 | gpt-4.1 |
| copilot-gpt-4.1-mini | gpt-4.1-mini |
| copilot-o3 | o3 |
| copilot-o4-mini | o4-mini |
| copilot-gemini-2.5-pro | gemini-2.5-pro |
Forge
| Key | Model |
|---|---|
| forge-claude-opus-4-6 | claude-opus-4-6 |
| forge-claude-sonnet-4-6 | claude-sonnet-4-6 |
| forge-claude-sonnet-4-5 | claude-sonnet-4-5-20250929 |
| forge-claude-haiku-4-5 | claude-haiku-4-5-20251001 |
| forge-gpt-4.1 | gpt-4.1 |
| forge-gpt-4.1-mini | gpt-4.1-mini |
| forge-o3 | o3 |
| forge-o4-mini | o4-mini |
| forge-gemini-2.5-pro | gemini-2.5-pro |
| forge-gemini-2.5-flash | gemini-2.5-flash |
Plandex, OpenHands, Cursor, Amp
These tools use their default model configuration and do not accept a model parameter:
| Key | Tool |
|---|---|
| plandex-default | Plandex |
| openhands-default | OpenHands |
| cursor-default | Cursor |
| amp-default | Amp |
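Because one key identifies both the tool and the model, resolving a key amounts to a table lookup. The sketch below copies a handful of rows from the tables above into a map; `MODEL_KEYS`, `ResolvedModel`, and `resolveModelKey` are hypothetical names used for illustration and do not describe how semtest stores its registry.

```typescript
interface ResolvedModel {
  tool: string;         // tool name from the "Supported tools" table
  model: string | null; // null when the tool takes no model parameter
}

// A few entries copied from the tables above; not the full registry.
const MODEL_KEYS = new Map<string, ResolvedModel>([
  ["claude-code-sonnet-4-6", { tool: "Claude Code", model: "claude-sonnet-4-6" }],
  ["gemini-2.5-pro", { tool: "Gemini CLI", model: "gemini-2.5-pro" }],
  ["aider-gemini-2.5-pro", { tool: "Aider", model: "gemini-2.5-pro" }],
  ["codex-o3", { tool: "Codex CLI", model: "o3" }],
  ["cursor-default", { tool: "Cursor", model: null }],
]);

// Hypothetical resolver: one key in, tool and model out.
function resolveModelKey(key: string): ResolvedModel {
  const resolved = MODEL_KEYS.get(key);
  if (resolved === undefined) {
    throw new Error(`Unknown model key "${key}". Run "semtest list" to see valid keys.`);
  }
  return resolved;
}
```

A lookup is used here rather than splitting the key on a prefix because the keys are not uniformly prefixed: `gemini-2.5-pro` and `qwen3-coder` carry no tool prefix, while `aider-gemini-2.5-pro` selects the same model through a different tool.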
Choosing a tool and model
Use whichever LLM CLI you already have installed. All tools receive the same prompt and are expected to produce the same JSON output format. The choice comes down to:
- Which LLM provider you have access to
- Which CLI tool is already on your machine
- Model quality and cost preferences
As a general rule:
| Model size | Trade-off | When to use |
|---|---|---|
| Large (Opus, o3, 2.5-pro) | Most capable, slowest, most expensive | Critical specs, complex codebases, highest accuracy needed |
| Medium (Sonnet, o4-mini, 2.5-flash, gpt-4.1) | Good accuracy, moderate cost | Daily development (default) |
| Small (Haiku, gpt-4.1-mini/nano, 2.5-flash-lite) | Fastest, cheapest, least capable | Large test suites, quick iteration, cost-sensitive environments |
Permission bypass
LLM CLIs often prompt for user confirmation before executing actions. In automated contexts (CI, batch runs), these prompts block execution. The skipPermissionsIfPossible config option (or --skip-permissions-if-possible CLI flag) tells semtest to append the tool-specific permission-bypass flag when available. Tools without a bypass flag are silently unaffected.
| Tool | Flag appended | Notes |
|---|---|---|
| Claude Code | --dangerously-skip-permissions | |
| Gemini CLI | -y | |
| Codex CLI | --dangerously-bypass-approvals-and-sandbox | |
| Aider | --yes-always | Added alongside existing --yes |
| Cursor | --force | |
| Amp | (none) | Auto-executes by default |
| OpenHands | (none) | Docker sandbox |
| Goose | (none) | Headless auto-execute |
| OpenCode, Crush, Qwen, Copilot, Forge, Plandex | (none) | No known flag; option is silently ignored |
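The "append when available, otherwise do nothing" behaviour can be sketched as below. The flag values come from the table above; `BYPASS_FLAGS` and `withPermissionBypass` are hypothetical names, and the position of the flag within the real argument list is an assumption.

```typescript
// Tool-specific bypass flags, taken from the table above.
// Tools without an entry have no known flag.
const BYPASS_FLAGS = new Map<string, string>([
  ["Claude Code", "--dangerously-skip-permissions"],
  ["Gemini CLI", "-y"],
  ["Codex CLI", "--dangerously-bypass-approvals-and-sandbox"],
  ["Aider", "--yes-always"],
  ["Cursor", "--force"],
]);

// Returns a new argument list; the input array is never mutated.
function withPermissionBypass(
  tool: string,
  args: string[],
  skipPermissionsIfPossible: boolean,
): string[] {
  if (!skipPermissionsIfPossible) return [...args];
  const flag = BYPASS_FLAGS.get(tool);
  // No flag for this tool: the option is silently ignored.
  return flag === undefined ? [...args] : [...args, flag];
}
```

For Aider, an argument list that already contains `--yes` would gain `--yes-always` in addition, matching the note in the table; for Goose or Amp the list comes back unchanged.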
Prerequisites
Semtest does not install LLM CLIs for you. Each tool's CLI must be installed separately.
Verify a CLI is available:
which claude
If the command is not found, semtest exits with code 2 and prints an error message indicating that the CLI is not available.
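The same check can be expressed programmatically. The sketch below walks a PATH string the way `which` does and maps a missing CLI to exit code 2; `findOnPath` and `preflightExitCode` are hypothetical helpers, and the file-existence test is injected as a parameter so the logic stays independent of any particular runtime. How semtest itself performs the check is not documented here.

```typescript
// Exit code stated above for a missing CLI.
const CLI_NOT_AVAILABLE_EXIT_CODE = 2;

// Returns the first PATH entry that contains `command`, or null if none does.
// `exists` is supplied by the caller, e.g. a wrapper around a file-system check.
function findOnPath(
  command: string,
  pathVar: string,
  exists: (candidate: string) => boolean,
  delimiter: string = ":",
): string | null {
  for (const dir of pathVar.split(delimiter)) {
    if (dir === "") continue; // skip empty PATH segments
    const candidate = dir.endsWith("/") ? dir + command : `${dir}/${command}`;
    if (exists(candidate)) return candidate;
  }
  return null;
}

// 0 when the CLI is found, 2 when it is not.
function preflightExitCode(
  command: string,
  pathVar: string,
  exists: (candidate: string) => boolean,
): number {
  return findOnPath(command, pathVar, exists) === null
    ? CLI_NOT_AVAILABLE_EXIT_CODE
    : 0;
}
```

In a Node.js script the predicate could be backed by a file-system lookup and `pathVar` by the PATH environment variable; on Windows the delimiter would be `;` rather than `:`.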