The what and why
What is a semtest?
Semtests, short for semantic tests, are natural-language test cases run against your codebase by an AI (LLM) of your choosing, on your machine, verifying that your code meets the specified assertions. Because the LLM can read and understand your code, has world knowledge including good practices and patterns, can run shell commands, and even search the web, semtests open the door to another layer of assurance.
```markdown
# Authentication

## Expectation

All API routes under /api/admin require authentication middleware.
```
That's a semtest — a statement of what should be true about the code, evaluated by an LLM alongside your existing test suite.
Why this matters
Semtests add a new testing vector that complements what you already have. There are things about your codebase that matter deeply — security posture, structural consistency, content accuracy — that are difficult to express with conventional assertions. Semtest gives you a way to automate those checks too.
Because AI can understand our codebases, run shell commands, search the web, and much more, it can automate and repeat checks that were previously manual and often skipped — ensuring that as our codebases evolve, they maintain the rigor, functionality, and overall quality we expect.
Semtest examples
Below are semtest's selling points: examples that illustrate the value these tests can provide to any codebase.
```markdown
# Security

No sensitive information is hardcoded. All secrets and config
values are pulled from environment variables or appConfig.
```

```markdown
# Content accuracy

The contact information displayed in our application matches
the contact details published on our main site https://example.com.
```

```markdown
# Internationalisation

All of our provided translations are close in meaning to
each other and to the English source strings.
```

```markdown
# Functional

The checkout flow applies the correct tax rate for each
supported region before processing payment.
```

```markdown
# Copy quality

There are no spelling mistakes in any user-facing text
across the application.
```

```markdown
# API Architecture

All API endpoints are built using the route, controller, service
pattern — are fully typesafe, use Zod to define request and response
schemas, are OpenAPI compliant, and are exposed through our
swagger/contract documentation.
```
These aren't hypothetical — they're the kind of requirements that teams already care about but previously had no way to enforce automatically. Semtest makes them repeatable, automated, and optionally part of your CI pipeline so that as your codebase grows, the quality standards you set on day one still hold.
Every run produces a clear verdict for each scenario — pass, fail, or error — along with the LLM's reasoning. Results are output as a human-readable Markdown report, a machine-readable JSON file for pipelines, and optionally JUnit XML for CI test reporters.
```
✔ auth.spec.md · 2 passed
✘ i18n.spec.md · 1 passed · 1 failed
✔ security.spec.md · 3 passed

5 passed · 1 failed · 3 specs
```
For a full breakdown of every output format and what's in it, see Reports & Output.
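As a sketch of how the JSON output could be consumed in a pipeline, the snippet below tallies verdicts and decides whether the build should break. Note that the `ci-results.json` shape shown here — a top-level `results` list with a `status` per scenario — is an assumption for illustration, not the documented schema; see Reports & Output for the real format.

```python
import json

# Hypothetical ci-results.json shape: one entry per scenario, each
# carrying one of the statuses semtest defines
# (pass, fail, error, invalid, skip).
sample = {
    "results": [
        {"spec": "auth.spec.md", "scenario": "Admin routes", "status": "pass"},
        {"spec": "i18n.spec.md", "scenario": "Translations", "status": "fail"},
    ]
}

def count_failures(report: dict) -> int:
    """Count scenarios whose verdict should break the build."""
    return sum(1 for r in report["results"] if r["status"] in ("fail", "error"))

failures = count_failures(sample)
print(f"{failures} failing scenario(s)")
# A real pipeline would exit nonzero here when failures > 0.
```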
Semtests vs conventional tests
While semtests are easier to write and can overlap with some of the capabilities conventional tests have, they are not a replacement for unit tests, integration tests, type checking, or other established testing methods. Those still provide an excellent level of confidence and a manner of execution that semtests cannot match. Where unit, integration, and e2e testing methods can be applied, they should be applied. Semtests exist to cover the gaps they leave and test the wide surface area they do not cover at all, as illustrated in the examples above. Users find major quality improvements when using both in tandem.
How it works
Semtest is open source and doesn't connect you to any cloud service of ours. It orchestrates LLM CLI tools you already have installed locally — Claude Code, Gemini CLI, Codex, Aider, OpenCode, Goose, and 8 more. There is no middle layer; semtest just coordinates.
- You write spec files (`.spec.md`) describing expectations in plain language
- `semtest run` discovers your specs and builds prompts from them
- Your local LLM CLI runs in your project directory, reads the spec, examines the codebase, and returns pass/fail verdicts
- semtest parses the verdicts and generates reports (Markdown, JSON, JUnit XML)
Your code never passes through semtest. The LLM CLI runs as a subprocess in your project directory and reads files through its own workspace access — the same way it would in an interactive session. You use your own API keys, your own rate limits, and your own model configuration. Semtest just turns that into a structured testing workflow.
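The orchestration pattern described above can be sketched as follows. The prompt wording, the `parse_verdict` convention, and the example CLI invocation are all assumptions for illustration — semtest's real prompts and verdict parsing are internal to the tool.

```python
import subprocess

def build_prompt(spec_text: str) -> str:
    """Wrap a spec in instructions asking the LLM for a verdict.
    The exact wording is hypothetical."""
    return (
        "Evaluate the following expectations against the code in the "
        "current directory. For each scenario answer PASS or FAIL with "
        f"a short justification.\n\n{spec_text}"
    )

def parse_verdict(output: str) -> str:
    """Map the CLI's free-form answer to a status (simplified)."""
    upper = output.upper()
    if "FAIL" in upper:
        return "fail"
    if "PASS" in upper:
        return "pass"
    return "error"

def run_spec(spec_text: str, project_dir: str, cli: list[str]) -> str:
    """Run a local LLM CLI as a subprocess in the project directory —
    the tool reads files through its own workspace access."""
    result = subprocess.run(
        cli + [build_prompt(spec_text)],
        cwd=project_dir, capture_output=True, text=True,
    )
    return parse_verdict(result.stdout)

# e.g. run_spec(spec, ".", ["claude", "-p"])  # hypothetical invocation
```

Because the CLI runs as a child process in your own checkout, the API keys, rate limits, and model configuration are exactly the ones that tool already uses.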
Key terms
- Spec file — A `.spec.*` file containing one or more test scenarios. Written in Markdown, plain text, or any format an LLM can parse.
- Scenario — A single testable expectation within a spec file. The LLM identifies scenarios from headings, numbered items, or other structural markers.
- Tool — The LLM CLI that evaluates your code. Semtest supports 14 tools including Claude Code, Gemini CLI, Codex, Aider, and more.
- Model key — A string like `claude-code-sonnet-4-6` or `gemini-2.5-pro` that identifies both the tool and the model. Set via the `llm` config property or per-test frontmatter. Run `semtest list` to see all options.
- Status — The verdict for each scenario: `pass`, `fail`, `error`, `invalid`, or `skip`.
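Putting those terms together, a spec file might look like the sketch below. The frontmatter key name (`llm`) mirrors the config property mentioned above, but treat the exact frontmatter format as an assumption and check `semtest list` and your config for the real options.

```markdown
---
llm: claude-code-sonnet-4-6
---

# Secrets

No sensitive information is hardcoded; all secrets come from
environment variables.

# Logging

Every API error path writes a structured log entry before
returning a response.
```

Each heading is identified as a separate scenario, and each scenario receives its own status in the report.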
The workflow
```shell
# Scaffold a new setup
semtest init

# Run all tests
semtest run

# Run specific tests
semtest run semtests/auth.spec.md

# Run with JUnit output for CI
semtest run --junit
```
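In CI, this slots in as an ordinary test step. A hedged GitHub Actions sketch, assuming semtest and your chosen LLM CLI are already installed on the runner and the relevant API key is provided as a repository secret — adjust names to your setup:

```yaml
- name: Run semtests
  env:
    ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}  # your key, your rate limits
  run: semtest run --junit

- name: Upload semtest report
  if: always()
  uses: actions/upload-artifact@v4
  with:
    name: semtest-results
    path: semtest-results/
```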
Results go to `semtest-results/` by default:

- `latest.md` — Human-readable Markdown report
- `ci-results.json` — Machine-readable JSON for pipelines
- `junit-results.xml` — JUnit XML for CI test reporters (opt-in)