Reports & output

Output directory

All reports are written to the configured output directory (default: semtest-results/).

semtest-results/
├── latest.md                      # Always generated
├── ci-results.json                # Always generated
├── junit-results.xml              # With --junit
├── 2026-03-03T10-30-45.md         # With --timestamp
└── debug/                         # With --debug
    ├── auth.spec.md.json
    └── api.spec.md.json

Markdown report (latest.md)

Human-readable report generated on every run. Contains:

  • Header — "Semantic Test Report" with run timestamp and project name
  • Summary table — Total, passed, failed, errored, skipped, and invalid counts
  • Test results — Grouped by directory, with per-test details

For failing tests, the report includes:

  • Expectation — What the spec requires
  • Observed — What the code actually does
  • Location — The relevant file path
  • Resolution — A suggestion to fix the issue

Passing tests are excluded by default. Use --include-passing to show them.

If validation ran and found issues, they appear in a dedicated section at the bottom.
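As a rough illustration, a report with one failing test might render along these lines (the layout, project name, file path, and resolution text here are illustrative, not the tool's exact output):

# Semantic Test Report

Project: example-project (run at 2026-03-03T10:30:45)

| Total | Passed | Failed | Errored | Skipped | Invalid |
|-------|--------|--------|---------|---------|---------|
| 4     | 3      | 1      | 0       | 0       | 0       |

## auth

### session-expiry: fail

- Expectation: Sessions expire after 30 minutes
- Observed: No session timeout configured
- Location: src/auth/session.ts
- Resolution: Configure session expiry to match the 30-minute requirement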


JSON report (ci-results.json)

Machine-readable report generated on every run. Designed for CI pipelines and custom tooling.

{
  "status": "pass",
  "summary": {
    "total": 4,
    "passed": 4,
    "failed": 0
  },
  "tests": [
    {
      "id": "session-expiry",
      "sourceFile": "auth.spec.md",
      "status": "pass",
      "group": "auth"
    },
    {
      "id": "password-hashing",
      "sourceFile": "auth.spec.md",
      "status": "pass",
      "group": "auth"
    }
  ]
}

Schema

Field                 Type                        Description
status                "pass" | "fail" | "error"  Overall run status
summary.total         number                      Total test count
summary.passed        number                      Passing test count
summary.failed        number                      Failing test count
summary.errored       number                      Error count (omitted if 0)
summary.invalid       number                      Invalid count (omitted if 0)
summary.skipped       number                      Skipped count (omitted if 0)
tests[].id            string                      Test scenario identifier
tests[].sourceFile    string                      Spec file name
tests[].status        string                      Scenario status
tests[].group         string                      Directory group (if applicable)
tests[].location      string                      Relevant file path (failures only)
tests[].error         string                      Error message (errors only)
validation            object                      Validation results (if validation ran)
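
For custom tooling, the report can be read directly. A minimal TypeScript (Node.js) sketch that lists non-passing tests, assuming only the fields documented above (the CiResults type name is local to this example):

import { readFileSync } from "node:fs";

interface CiResults {
  status: "pass" | "fail" | "error";
  summary: { total: number; passed: number; failed: number };
  tests: Array<{
    id: string;
    sourceFile: string;
    status: string;
    group?: string;
    location?: string;
    error?: string;
  }>;
}

// Read the report from the default output directory.
const report: CiResults = JSON.parse(
  readFileSync("semtest-results/ci-results.json", "utf8")
);

// Print anything that did not pass, with its location when available.
for (const test of report.tests) {
  if (test.status !== "pass") {
    const where = test.location ? ` at ${test.location}` : "";
    console.log(`${test.status}: ${test.id} (${test.sourceFile})${where}`);
  }
}

// Mirror the report's overall status in this script's exit code.
process.exit(report.status === "pass" ? 0 : 1);

In CI, a script like this can run after the test step to surface non-passing tests in the job log.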

JUnit XML report (junit-results.xml)

Opt-in via --junit. Standard JUnit XML format compatible with GitHub Actions, GitLab CI, Jenkins, CircleCI, and other CI test reporters.

<?xml version="1.0" encoding="UTF-8"?>
<testsuites tests="4" failures="1" errors="0" skipped="0" timestamp="2026-03-03T10:30:45.000Z">
  <testsuite name="auth" tests="2" failures="1" errors="0" skipped="0">
    <testcase name="session-expiry" classname="auth.spec.md">
      <failure message="Spec requirement not met">
        Expected: Sessions expire after 30 minutes
        Observed: No session timeout configured
      </failure>
    </testcase>
    <testcase name="password-hashing" classname="auth.spec.md" />
  </testsuite>
</testsuites>

Structure:

  • <testsuites> — Root element with aggregate counts
  • <testsuite> — One per directory group
  • <testcase> — One per scenario
    • Passing: self-closing tag
    • Failing: contains <failure> with details
    • Errored: contains <error> with message
    • Skipped/invalid: contains <skipped>

Timestamped reports

With --timestamp, a copy of the Markdown report is also saved under a filename derived from the run's ISO 8601 timestamp (with colons replaced by hyphens):

semtest-results/2026-03-03T10-30-45.md

Useful for archiving results across runs. The latest.md file is always overwritten; timestamped copies are preserved.


Debug output

With --debug, a debug/ subdirectory is created containing one JSON file per spec file, each holding the raw LLM output for that file's run.

{
  "test": "auth.spec.md",
  "attempts": [
    {
      "command": "claude",
      "args": ["--model", "claude-sonnet-4-6", "--print", "..."],
      "prompt": "You are a semantic test evaluator...",
      "exitCode": 0,
      "stdout": [{ "id": "session-expiry", "status": "pass" }],
      "stderr": ""
    }
  ]
}

Each entry in attempts represents one LLM invocation (including retries). This is useful for diagnosing parse failures, unexpected results, or LLM misbehavior.
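
For instance, a short TypeScript sketch that flags attempts with a non-zero exit code (the directory path assumes the default output directory, and the field names follow the example above):

import { readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

const debugDir = "semtest-results/debug";

// Scan every per-spec debug file for failed LLM invocations.
for (const file of readdirSync(debugDir)) {
  if (!file.endsWith(".json")) continue;
  const entry = JSON.parse(readFileSync(join(debugDir, file), "utf8"));
  for (const attempt of entry.attempts ?? []) {
    if (attempt.exitCode !== 0) {
      console.log(`${entry.test}: attempt exited ${attempt.exitCode}`);
      console.log(attempt.stderr);
    }
  }
}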

The debug directory is cleared and recreated on each run.


Test statuses

Status     Meaning                             Exit code effect
pass       Spec requirement met                None (exit 0)
fail       Spec requirement not met            Exit 1
error      LLM subprocess failure or timeout   Exit 2
invalid    Not a testable specification        None (exit 2 with --strict)
skip       Explicitly marked to skip           None

A run's overall exit code is determined by the worst status across all tests: any error means exit 2 (as does any invalid result when --strict is set), any fail means exit 1, and otherwise the run exits 0.
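
As an illustration of that precedence (not the tool's implementation), in TypeScript:

type Status = "pass" | "fail" | "error" | "invalid" | "skip";

function exitCodeFor(statuses: Status[], strict = false): number {
  if (statuses.includes("error")) return 2;              // any error wins
  if (strict && statuses.includes("invalid")) return 2;  // invalid counts only with --strict
  if (statuses.includes("fail")) return 1;               // then any failure
  return 0;                                              // pass / skip / non-strict invalid
}

// Example: one failing test makes the whole run exit 1.
console.log(exitCodeFor(["pass", "fail", "skip"])); // 1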