Reports & output

Output directory

All reports are written to the configured output directory (default: semtest-results/).

semtest-results/
├── latest.md                      # Always generated
├── ci-results.json                # Always generated
├── junit-results.xml              # With --junit
├── 2026-03-03T10-30-45.md         # With --timestamp
└── debug/                         # With --debug
    ├── auth.spec.md.json
    └── api.spec.md.json

Markdown report (latest.md)

Human-readable report generated on every run. Contains:

  • Header — "Semantic Test Report" with run timestamp and project name
  • Summary table — Total, passed, failed, errored, skipped, and invalid counts
  • Test results — Grouped by directory, with per-test details

For failing tests, the report includes:

  • Expectation — What the spec requires
  • Observed — What the code actually does
  • Location — The relevant file path
  • Resolution — A suggestion to fix the issue

Passing tests are excluded by default. Use --include-passing to show them.

If validation ran and found issues, they appear in a dedicated section at the bottom.
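As a rough illustration, a report with one failing test might render along these lines (the layout, project name, file path, and resolution text here are illustrative, not the tool's exact output):

# Semantic Test Report

Project: example-project (run at 2026-03-03T10:30:45)

| Total | Passed | Failed | Errored | Skipped | Invalid |
|-------|--------|--------|---------|---------|---------|
| 4     | 3      | 1      | 0       | 0       | 0       |

## auth

### session-expiry: fail

- Expectation: Sessions expire after 30 minutes
- Observed: No session timeout configured
- Location: src/auth/session.ts
- Resolution: Configure session expiry to match the 30-minute requirement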


JSON report (ci-results.json)

Machine-readable report generated on every run. Designed for CI pipelines and custom tooling.

{
  "status": "pass",
  "summary": {
    "total": 4,
    "passed": 4,
    "failed": 0
  },
  "tests": [
    {
      "id": "session-expiry",
      "sourceFile": "auth.spec.md",
      "status": "pass",
      "group": "auth"
    },
    {
      "id": "password-hashing",
      "sourceFile": "auth.spec.md",
      "status": "pass",
      "group": "auth"
    }
  ]
}

Schema

Field                 Type                        Description
status                "pass" | "fail" | "error"  Overall run status
summary.total         number                      Total test count
summary.passed        number                      Passing test count
summary.failed        number                      Failing test count
summary.errored       number                      Error count (omitted if 0)
summary.invalid       number                      Invalid count (omitted if 0)
summary.skipped       number                      Skipped count (omitted if 0)
tests[].id            string                      Test scenario identifier
tests[].sourceFile    string                      Spec file name
tests[].status        string                      Scenario status
tests[].group         string                      Directory group (if applicable)
tests[].location      string                      Relevant file path (failures only)
tests[].error         string                      Error message (errors only)
validation            object                      Validation results (if validation ran)
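
For custom tooling, the report can be read directly. A minimal TypeScript (Node.js) sketch that lists non-passing tests, assuming only the fields documented above (the CiResults type name is local to this example):

import { readFileSync } from "node:fs";

interface CiResults {
  status: "pass" | "fail" | "error";
  summary: { total: number; passed: number; failed: number };
  tests: Array<{
    id: string;
    sourceFile: string;
    status: string;
    group?: string;
    location?: string;
    error?: string;
  }>;
}

// Read the report from the default output directory.
const report: CiResults = JSON.parse(
  readFileSync("semtest-results/ci-results.json", "utf8")
);

// Print anything that did not pass, with its location when available.
for (const test of report.tests) {
  if (test.status !== "pass") {
    const where = test.location ? ` at ${test.location}` : "";
    console.log(`${test.status}: ${test.id} (${test.sourceFile})${where}`);
  }
}

// Mirror the report's overall status in this script's exit code.
process.exit(report.status === "pass" ? 0 : 1);

In CI, a script like this can run after the test step to surface non-passing tests in the job log.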

JUnit XML report (junit-results.xml)

Opt-in via --junit. Standard JUnit XML format compatible with GitHub Actions, GitLab CI, Jenkins, CircleCI, and other CI test reporters.

<?xml version="1.0" encoding="UTF-8"?>
<testsuites tests="4" failures="1" errors="0" skipped="0" timestamp="2026-03-03T10:30:45.000Z">
  <testsuite name="auth" tests="2" failures="1" errors="0" skipped="0">
    <testcase name="session-expiry" classname="auth.spec.md">
      <failure message="Spec requirement not met">
        Expected: Sessions expire after 30 minutes
        Observed: No session timeout configured
      </failure>
    </testcase>
    <testcase name="password-hashing" classname="auth.spec.md" />
  </testsuite>
</testsuites>

Structure:

  • <testsuites> — Root element with aggregate counts
  • <testsuite> — One per directory group
  • <testcase> — One per scenario
    • Passing: self-closing tag
    • Failing: contains <failure> with details
    • Errored: contains <error> with message
    • Skipped/invalid: contains <skipped>

Timestamped reports

With --timestamp, a copy of the Markdown report is also saved under a filename derived from the run's ISO 8601 timestamp (with colons replaced by hyphens):

semtest-results/2026-03-03T10-30-45.md

Useful for archiving results across runs. The latest.md file is always overwritten; timestamped copies are preserved.


Debug output

With --debug, a debug/ subdirectory is created containing one JSON file per spec file, each holding the raw LLM output for that file's run.

{
  "test": "auth.spec.md",
  "attempts": [
    {
      "command": "claude",
      "args": ["--model", "claude-sonnet-4-6", "--print", "..."],
      "prompt": "You are a semantic test evaluator...",
      "exitCode": 0,
      "stdout": [{ "id": "session-expiry", "status": "pass" }],
      "stderr": ""
    }
  ]
}

Each entry in attempts represents one LLM invocation (including retries). This is useful for diagnosing parse failures, unexpected results, or LLM misbehavior.
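
For instance, a short TypeScript sketch that flags attempts with a non-zero exit code (the directory path assumes the default output directory, and the field names follow the example above):

import { readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

const debugDir = "semtest-results/debug";

// Scan every per-spec debug file for failed LLM invocations.
for (const file of readdirSync(debugDir)) {
  if (!file.endsWith(".json")) continue;
  const entry = JSON.parse(readFileSync(join(debugDir, file), "utf8"));
  for (const attempt of entry.attempts ?? []) {
    if (attempt.exitCode !== 0) {
      console.log(`${entry.test}: attempt exited ${attempt.exitCode}`);
      console.log(attempt.stderr);
    }
  }
}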

The debug directory is cleared and recreated on each run.


Test statuses

Status     Meaning                             Exit code effect
pass       Spec requirement met                None (exit 0)
fail       Spec requirement not met            Exit 1
error      LLM subprocess failure or timeout   Exit 2
invalid    Not a testable specification        None (exit 2 with --strict)
skip       Explicitly marked to skip           None

A run's overall exit code is determined by the worst status across all tests: any error means exit 2 (as does any invalid result when --strict is set), any fail means exit 1, and otherwise the run exits 0.
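
As an illustration of that precedence (not the tool's implementation), in TypeScript:

type Status = "pass" | "fail" | "error" | "invalid" | "skip";

function exitCodeFor(statuses: Status[], strict = false): number {
  if (statuses.includes("error")) return 2;              // any error wins
  if (strict && statuses.includes("invalid")) return 2;  // invalid counts only with --strict
  if (statuses.includes("fail")) return 1;               // then any failure
  return 0;                                              // pass / skip / non-strict invalid
}

// Example: one failing test makes the whole run exit 1.
console.log(exitCodeFor(["pass", "fail", "skip"])); // 1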