# Reports & output

## Output directory

All reports are written to the configured output directory (default: `semtest-results/`).
```
semtest-results/
├── latest.md               # Always generated
├── ci-results.json         # Always generated
├── junit-results.xml       # With --junit
├── 2026-03-03T10-30-45.md  # With --timestamp
└── debug/                  # With --debug
    ├── auth.spec.md.json
    └── api.spec.md.json
```
## Markdown report (`latest.md`)
Human-readable report generated on every run. Contains:
- Header — "Semantic Test Report" with run timestamp and project name
- Summary table — Total, passed, failed, errored, skipped, and invalid counts
- Test results — Grouped by directory, with per-test details
For failing tests, the report includes:
- Expectation — What the spec requires
- Observed — What the code actually does
- Location — The relevant file path
- Resolution — A suggestion to fix the issue
Passing tests are excluded by default. Use `--include-passing` to show them.
If validation ran and found issues, they appear in a dedicated section at the bottom.
## JSON report (`ci-results.json`)
Machine-readable report generated on every run. Designed for CI pipelines and custom tooling.
```json
{
  "status": "pass",
  "summary": {
    "total": 4,
    "passed": 4,
    "failed": 0
  },
  "tests": [
    {
      "id": "session-expiry",
      "sourceFile": "auth.spec.md",
      "status": "pass",
      "group": "auth"
    },
    {
      "id": "password-hashing",
      "sourceFile": "auth.spec.md",
      "status": "pass",
      "group": "auth"
    }
  ]
}
```
### Schema
| Field | Type | Description |
|---|---|---|
| `status` | `"pass" \| "fail" \| "error"` | Overall run status |
| `summary.total` | `number` | Total test count |
| `summary.passed` | `number` | Passing test count |
| `summary.failed` | `number` | Failing test count |
| `summary.errored` | `number` | Error count (omitted if 0) |
| `summary.invalid` | `number` | Invalid count (omitted if 0) |
| `summary.skipped` | `number` | Skipped count (omitted if 0) |
| `tests[].id` | `string` | Test scenario identifier |
| `tests[].sourceFile` | `string` | Spec file name |
| `tests[].status` | `string` | Scenario status |
| `tests[].group` | `string` | Directory group (if applicable) |
| `tests[].location` | `string` | Relevant file path (failures only) |
| `tests[].error` | `string` | Error message (errors only) |
| `validation` | `object` | Validation results (if validation ran) |
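Because this shape is stable across runs, a CI step can gate directly on the JSON report. Below is a minimal TypeScript sketch; the file name `ci-gate.ts`, the type names, and the gating logic are illustrative, not part of semtest.

```ts
// ci-gate.ts: a minimal, hypothetical CI gate over ci-results.json.
// The interfaces below mirror the schema table above; they are not
// published types from semtest itself.
import { readFileSync } from "node:fs";

interface SemtestTest {
  id: string;
  sourceFile: string;
  status: string;
  group?: string;
  location?: string; // failures only
  error?: string;    // errors only
}

interface SemtestReport {
  status: "pass" | "fail" | "error";
  summary: {
    total: number;
    passed: number;
    failed: number;
    errored?: number; // omitted if 0
    invalid?: number; // omitted if 0
    skipped?: number; // omitted if 0
  };
  tests: SemtestTest[];
}

const report: SemtestReport = JSON.parse(
  readFileSync("semtest-results/ci-results.json", "utf8"),
);

if (report.status !== "pass") {
  // Print every non-passing scenario, then exit with the matching code.
  for (const t of report.tests.filter((t) => t.status !== "pass")) {
    console.error(`${t.status.toUpperCase()}: ${t.id} (${t.sourceFile})`);
  }
  process.exit(report.status === "error" ? 2 : 1);
}
```

Run as a pipeline step after semtest; any non-pass status fails the build with the same exit codes semtest itself uses.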
## JUnit XML report (`junit-results.xml`)

Opt-in via `--junit`. Standard JUnit XML format compatible with GitHub Actions, GitLab CI, Jenkins, CircleCI, and other CI test reporters.
```xml
<?xml version="1.0" encoding="UTF-8"?>
<testsuites tests="4" failures="1" errors="0" skipped="0" timestamp="2026-03-03T10:30:45.000Z">
  <testsuite name="auth" tests="2" failures="1" errors="0" skipped="0">
    <testcase name="session-expiry" classname="auth.spec.md">
      <failure message="Spec requirement not met">
        Expected: Sessions expire after 30 minutes
        Observed: No session timeout configured
      </failure>
    </testcase>
    <testcase name="password-hashing" classname="auth.spec.md" />
  </testsuite>
</testsuites>
```
Structure:

- `<testsuites>` — Root element with aggregate counts
- `<testsuite>` — One per directory group
- `<testcase>` — One per scenario
  - Passing: self-closing tag
  - Failing: contains `<failure>` with details
  - Errored: contains `<error>` with message
  - Skipped/invalid: contains `<skipped>`
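If your CI system has no JUnit reporter, the aggregate counts on the root element are easy to surface in a log line. A rough sketch, assuming only the attribute layout shown above (regex-based on purpose; use a real XML parser for anything beyond a summary):

```ts
// junit-summary.ts: a hypothetical helper that reads aggregate counts
// off the <testsuites> root element. Not part of semtest.
import { readFileSync } from "node:fs";

const xml = readFileSync("semtest-results/junit-results.xml", "utf8");

// Pull a numeric attribute from the <testsuites> root element.
const attr = (name: string): number => {
  const m = xml.match(new RegExp(`<testsuites[^>]*\\b${name}="(\\d+)"`));
  return m ? Number(m[1]) : 0;
};

console.log(
  `tests=${attr("tests")} failures=${attr("failures")} ` +
    `errors=${attr("errors")} skipped=${attr("skipped")}`,
);
```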
## Timestamped reports
With `--timestamp`, a copy of the Markdown report is saved under a filename derived from the run's ISO timestamp:

```
semtest-results/2026-03-03T10-30-45.md
```
Useful for archiving results across runs. The `latest.md` file is always overwritten; timestamped copies are preserved.
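Because the timestamped names sort lexicographically in chronological order, retention is easy to script. A hypothetical cleanup step (not a semtest feature; the `KEEP` count is an arbitrary choice):

```ts
// prune-reports.ts: illustrative housekeeping, not part of semtest.
// Keeps only the newest KEEP timestamped reports; latest.md, ci-results.json,
// and other files are untouched because the regex only matches timestamps.
import { readdirSync, unlinkSync } from "node:fs";
import { join } from "node:path";

const DIR = "semtest-results";
const KEEP = 10;

const timestamped = readdirSync(DIR)
  .filter((f) => /^\d{4}-\d{2}-\d{2}T\d{2}-\d{2}-\d{2}\.md$/.test(f))
  .sort()
  .reverse(); // lexicographic sort is chronological here; newest first

for (const stale of timestamped.slice(KEEP)) {
  unlinkSync(join(DIR, stale));
}
```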
## Debug output

With `--debug`, a `debug/` subdirectory is created containing per-test JSON files with raw LLM output.
```json
{
  "test": "auth.spec.md",
  "attempts": [
    {
      "command": "claude",
      "args": ["--model", "claude-sonnet-4-6", "--print", "..."],
      "prompt": "You are a semantic test evaluator...",
      "exitCode": 0,
      "stdout": [{ "id": "session-expiry", "status": "pass" }],
      "stderr": ""
    }
  ]
}
```
Each entry in `attempts` represents one LLM invocation (including retries). This is useful for diagnosing parse failures, unexpected results, or LLM misbehavior.
The debug directory is cleared and recreated on each run.
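One way to use these files is to scan for invocations that retried or exited non-zero. A small triage sketch, assuming only the debug file shape shown above (illustrative tooling, not part of semtest):

```ts
// debug-triage.ts: a hypothetical scan of the debug/ directory that flags
// tests whose LLM invocations retried or failed. The attempt shape follows
// the example above, not a published schema.
import { readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

const DEBUG_DIR = "semtest-results/debug";

for (const file of readdirSync(DEBUG_DIR).filter((f) => f.endsWith(".json"))) {
  const { test, attempts } = JSON.parse(
    readFileSync(join(DEBUG_DIR, file), "utf8"),
  );
  const failed = attempts.filter((a: { exitCode: number }) => a.exitCode !== 0);
  // More than one attempt means at least one retry happened.
  if (attempts.length > 1 || failed.length > 0) {
    console.log(
      `${test}: ${attempts.length} attempt(s), ${failed.length} non-zero exit(s)`,
    );
  }
}
```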
## Test statuses
| Status | Meaning | Exit code effect |
|---|---|---|
| `pass` | Spec requirement met | None (exit 0) |
| `fail` | Spec requirement not met | Exit 1 |
| `error` | LLM subprocess failure or timeout | Exit 2 |
| `invalid` | Not a testable specification | None (exit 2 with `--strict`) |
| `skip` | Explicitly marked to skip | None |
A run's overall status is determined by the worst status across all tests: any error means exit 2, any fail means exit 1, otherwise exit 0.
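The worst-status rule, expressed as a function (a sketch of the described behavior, not semtest's actual implementation):

```ts
// exit-code.ts: the worst-status rule from the table above, as a function.
type Status = "pass" | "fail" | "error" | "invalid" | "skip";

function exitCode(statuses: Status[], strict = false): number {
  if (statuses.includes("error")) return 2;
  if (strict && statuses.includes("invalid")) return 2; // --strict behavior
  if (statuses.includes("fail")) return 1;
  return 0; // pass, skip, and (without --strict) invalid don't fail the run
}

console.log(exitCode(["pass", "fail", "pass"]));  // 1
console.log(exitCode(["pass", "invalid"], true)); // 2
```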