The what and why
What is a semtest?
Semtests, short for semantic tests, are natural-language test cases run against your codebase by an AI (LLM) of your choosing, on your machine, verifying that your code meets the specified assertions. Because the LLM can read and understand your code, has world knowledge including good practices and patterns, can run shell commands, and even search the web, semtests open the door to another layer of assurance.
```markdown
# Authentication

## Expectation

All API routes under /api/admin require authentication middleware.
```
That's a semtest — a statement of what should be true about the code, evaluated by an LLM alongside your existing test suite.
Why this matters
Semtests add a new testing vector that complements what you already have. There are things about your codebase that matter deeply — security posture, structural consistency, content accuracy — that are difficult to express with conventional assertions. Semtest gives you a way to automate those checks too.
Because AI can understand our codebases, run shell commands, search the web, and much more, it can automate and repeat checks that were previously manual and often skipped — ensuring that as our codebases evolve, they maintain the rigor, functionality, and overall quality we expect.
Semtest examples
Below are semtest's selling points: examples that illustrate the value these tests can provide to any codebase.
```markdown
# Security

No sensitive information is hardcoded. All secrets and config
values are pulled from environment variables or appConfig.
```

```markdown
# Content accuracy

The contact information displayed in our application matches
the contact details published on our main site https://example.com.
```

```markdown
# Internationalisation

All of our provided translations are close in meaning to
each other and to the English source strings.
```

```markdown
# Functional

The checkout flow applies the correct tax rate for each
supported region before processing payment.
```

```markdown
# Copy quality

There are no spelling mistakes in any user-facing text
across the application.
```

```markdown
# API Architecture

All API endpoints are built using the route, controller, service
pattern — are fully typesafe, use Zod to define request and response
schemas, are OpenAPI compliant, and are exposed through our
swagger/contract documentation.
```
These aren't hypothetical — they're the kind of requirements that teams already care about but previously had no way to enforce automatically. Semtest makes them repeatable, automated, and optionally part of your CI pipeline so that as your codebase grows, the quality standards you set on day one still hold.
Every run produces a clear verdict for each scenario — pass, fail, or error — along with the LLM's reasoning. Results are output as a human-readable Markdown report, a machine-readable JSON file for pipelines, and optionally JUnit XML for CI test reporters.
```
✔ auth.spec.md · 2 passed
✘ i18n.spec.md · 1 passed · 1 failed
✔ security.spec.md · 3 passed

5 passed · 1 failed · 3 specs
```
For a full breakdown of every output format and what's in it, see Reports & Output.
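As a sketch of how the JSON output could be consumed in a pipeline, the snippet below tallies verdicts and decides whether the build should break. Note that the `ci-results.json` shape shown here — a top-level `results` list with a `status` per scenario — is an assumption for illustration, not the documented schema; see Reports & Output for the real format.

```python
import json

# Hypothetical ci-results.json shape: one entry per scenario, each
# carrying one of the statuses semtest defines
# (pass, fail, error, invalid, skip).
sample = {
    "results": [
        {"spec": "auth.spec.md", "scenario": "Admin routes", "status": "pass"},
        {"spec": "i18n.spec.md", "scenario": "Translations", "status": "fail"},
    ]
}

def count_failures(report: dict) -> int:
    """Count scenarios whose verdict should break the build."""
    return sum(1 for r in report["results"] if r["status"] in ("fail", "error"))

failures = count_failures(sample)
print(f"{failures} failing scenario(s)")
# A real pipeline would exit nonzero here when failures > 0.
```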
Semtests vs conventional tests
While semtests are easier to write and can overlap with some of the capabilities conventional tests have, they are not a replacement for unit tests, integration tests, type checking, or other established testing methods. Those still provide an excellent level of confidence and a manner of execution that semtests cannot match. Where unit, integration, and e2e testing methods can be applied, they should be applied. Semtests exist to cover the gaps they leave and test the wide surface area they do not cover at all, as illustrated in the examples above. Users find major quality improvements when using both in tandem.
How it works
Semtest is open source and doesn't connect you to any cloud service of ours. It orchestrates LLM CLI tools you already have installed locally — Claude Code, Gemini CLI, Codex, Aider, OpenCode, Goose, and 8 more. There is no middle layer; semtest just coordinates.
- You write spec files (`.spec.md`) describing expectations in plain language
- `semtest run` discovers your specs and builds prompts from them
- Your local LLM CLI runs in your project directory, reads the spec, examines the codebase, and returns pass/fail verdicts
- semtest parses the verdicts and generates reports (Markdown, JSON, JUnit XML)
Your code never passes through semtest. The LLM CLI runs as a subprocess in your project directory and reads files through its own workspace access — the same way it would in an interactive session. You use your own API keys, your own rate limits, and your own model configuration. Semtest just turns that into a structured testing workflow.
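The orchestration pattern described above can be sketched as follows. The prompt wording, the `parse_verdict` convention, and the example CLI invocation are all assumptions for illustration — semtest's real prompts and verdict parsing are internal to the tool.

```python
import subprocess

def build_prompt(spec_text: str) -> str:
    """Wrap a spec in instructions asking the LLM for a verdict.
    The exact wording is hypothetical."""
    return (
        "Evaluate the following expectations against the code in the "
        "current directory. For each scenario answer PASS or FAIL with "
        f"a short justification.\n\n{spec_text}"
    )

def parse_verdict(output: str) -> str:
    """Map the CLI's free-form answer to a status (simplified)."""
    upper = output.upper()
    if "FAIL" in upper:
        return "fail"
    if "PASS" in upper:
        return "pass"
    return "error"

def run_spec(spec_text: str, project_dir: str, cli: list[str]) -> str:
    """Run a local LLM CLI as a subprocess in the project directory —
    the tool reads files through its own workspace access."""
    result = subprocess.run(
        cli + [build_prompt(spec_text)],
        cwd=project_dir, capture_output=True, text=True,
    )
    return parse_verdict(result.stdout)

# e.g. run_spec(spec, ".", ["claude", "-p"])  # hypothetical invocation
```

Because the CLI runs as a child process in your own checkout, the API keys, rate limits, and model configuration are exactly the ones that tool already uses.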
Key terms
- Spec file — A `.spec.*` file containing one or more test scenarios. Written in Markdown, plain text, or any format an LLM can parse.
- Scenario — A single testable expectation within a spec file. The LLM identifies scenarios from headings, numbered items, or other structural markers.
- Tool — The LLM CLI that evaluates your code. Semtest supports 14 tools including Claude Code, Gemini CLI, Codex, Aider, and more.
- Model key — A string like `claude-code-sonnet-4-6` or `gemini-2.5-pro` that identifies both the tool and the model. Set via the `llm` config property or per-test frontmatter. Run `semtest list` to see all options.
- Status — The verdict for each scenario: `pass`, `fail`, `error`, `invalid`, or `skip`.
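Putting those terms together, a spec file might look like the sketch below. The frontmatter key name (`llm`) mirrors the config property mentioned above, but treat the exact frontmatter format as an assumption and check `semtest list` and your config for the real options.

```markdown
---
llm: claude-code-sonnet-4-6
---

# Secrets

No sensitive information is hardcoded; all secrets come from
environment variables.

# Logging

Every API error path writes a structured log entry before
returning a response.
```

Each heading is identified as a separate scenario, and each scenario receives its own status in the report.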
The workflow
```shell
# Scaffold a new setup
semtest init

# Run all tests
semtest run

# Run specific tests
semtest run semtests/auth.spec.md

# Run with JUnit output for CI
semtest run --junit
```
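In CI, this slots in as an ordinary test step. A hedged GitHub Actions sketch, assuming semtest and your chosen LLM CLI are already installed on the runner and the relevant API key is provided as a repository secret — adjust names to your setup:

```yaml
- name: Run semtests
  env:
    ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}  # your key, your rate limits
  run: semtest run --junit

- name: Upload semtest report
  if: always()
  uses: actions/upload-artifact@v4
  with:
    name: semtest-results
    path: semtest-results/
```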
Results go to `semtest-results/` by default:

- `latest.md` — Human-readable Markdown report
- `ci-results.json` — Machine-readable JSON for pipelines
- `junit-results.xml` — JUnit XML for CI test reporters (opt-in)