Test Strategy Configuration

This tutorial covers how to configure the test_execution section of project-config.yaml, how to structure your test suite using the test pyramid, when to enable the Test Execution Bridge, and how to split tests into tiers so your CI stays fast.

Why test configuration matters

Without explicit test configuration, GAIA cannot run your tests, and /gaia-dev-story cannot verify that implementations pass before committing. The test_execution section tells GAIA what commands to run, where to run them, and what "passing" looks like.

The test_execution section

The test_execution block in .gaia/config/project-config.yaml defines how tests are discovered and executed for each stack in your project.

# .gaia/config/project-config.yaml
test_execution:
  default_command: npm test
  timeout_seconds: 300
  coverage:
    enabled: true
    threshold: 80

default_command is the fallback test command: if a stack does not specify its own test_command, this one is used. timeout_seconds is the time limit, in seconds, after which a hanging test run is killed. coverage.threshold sets the minimum coverage percentage required for a pass.
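The semantics of timeout_seconds and coverage.threshold can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not GAIA's implementation: the function names run_tests and coverage_ok are hypothetical, and treating the threshold as inclusive (80 passes at exactly 80%) is an assumption.

```python
import subprocess


def run_tests(command, timeout_seconds):
    """Run a test command given as an argument list.

    Returns "passed", "failed", or "timeout". subprocess.run kills the
    child process itself when the timeout expires.
    """
    try:
        result = subprocess.run(command, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        return "timeout"
    return "passed" if result.returncode == 0 else "failed"


def coverage_ok(measured_percent, threshold):
    """Assumption: the threshold is inclusive."""
    return measured_percent >= threshold
```

With the configuration above, run_tests(["npm", "test"], 300) would give the suite five minutes before it is killed, and coverage_ok(82.5, 80) would count as a pass.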

Edit this section directly or use /gaia-config-test, which preserves comments and formatting in your YAML files.

The test pyramid

The test pyramid is a guideline for how many tests of each type to write: more tests at the bottom (fast, cheap) and fewer at the top (slow, expensive).

| Level | What it tests | Speed | Typical ratio |
|---|---|---|---|
| Unit | Individual functions and classes in isolation | Milliseconds | 70% |
| Integration | Interactions between components, database queries, API calls | Seconds | 20% |
| End-to-end | Full user workflows through the real system | Minutes | 10% |

These ratios are guidelines, not rules. A CLI tool might have 90% unit tests and 10% integration tests with no E2E. A web application might shift more weight toward integration and E2E. The goal is to keep your fast tests catching most bugs and your slow tests catching only what fast tests cannot.

Pyramid health check

If your E2E tests take longer than your unit + integration tests combined, your pyramid is inverted. This slows CI and makes failures harder to diagnose. Consider converting some E2E tests to integration tests with mocked external dependencies.
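The health check above reduces to a single comparison of suite durations. A minimal sketch, assuming you can read the three durations from your CI logs (the function name is hypothetical):

```python
def pyramid_is_inverted(unit_seconds, integration_seconds, e2e_seconds):
    """True when E2E tests take longer than unit and integration combined."""
    return e2e_seconds > unit_seconds + integration_seconds


# Example: 40 s of unit tests, 3 min of integration tests, 12 min of E2E tests.
inverted = pyramid_is_inverted(40, 180, 720)
```

Here inverted is True, because 720 seconds of E2E time exceeds the 220 seconds spent on the two lower levels.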

Per-stack test commands

In a multi-stack project, each stack likely uses a different test runner. Specify the test command per stack:

# .gaia/config/project-config.yaml
stacks:
  - name: frontend
    path: packages/web
    language: typescript
    test_command: npm test
  - name: backend
    path: packages/api
    language: python
    test_command: pytest --cov=src --cov-report=term-missing
  - name: mobile
    path: packages/mobile
    language: dart
    test_command: flutter test

When /gaia-dev-story runs tests, it uses the test_command for the stack relevant to the current story. If a story touches multiple stacks, all relevant test commands run.
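The selection rule can be sketched as follows: each touched stack contributes its own test_command, and a stack without one falls back to test_execution.default_command. This is an illustration of the rule only, not GAIA's code. The function name commands_for_story and the docs stack are hypothetical, and the dictionary mirrors the YAML shown in this tutorial.

```python
def commands_for_story(config, touched_stacks):
    """Return (path, command) pairs for every stack a story touches."""
    default = config["test_execution"]["default_command"]
    return [
        (stack["path"], stack.get("test_command", default))
        for stack in config["stacks"]
        if stack["name"] in touched_stacks
    ]


config = {
    "test_execution": {"default_command": "npm test"},
    "stacks": [
        {"name": "frontend", "path": "packages/web", "test_command": "npm test"},
        {"name": "backend", "path": "packages/api",
         "test_command": "pytest --cov=src --cov-report=term-missing"},
        # Hypothetical stack with no test_command: uses the default.
        {"name": "docs", "path": "packages/docs"},
    ],
}

commands = commands_for_story(config, {"backend", "docs"})
```

For a story touching backend and docs, commands holds the pytest command for packages/api and the fallback npm test for packages/docs.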

Edit stack configuration with /gaia-config-stack.

The Test Execution Bridge

The Test Execution Bridge lets GAIA execute your test suite and interpret the results programmatically. When enabled, GAIA can parse test output, identify which tests failed and why, and use that information to suggest fixes during /gaia-dev-story.
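To make "interpret the results programmatically" concrete, here is a minimal sketch of extracting failures from pytest's short test summary lines, which have the form FAILED path::test - reason. It is not the bridge's actual parser; the Failure type, the parse_failures function, and the sample output are all hypothetical.

```python
import re
from typing import NamedTuple


class Failure(NamedTuple):
    file: str
    test: str
    reason: str


# Matches lines such as:
#   FAILED tests/test_api.py::test_login - AssertionError: ...
_FAILED = re.compile(r"^FAILED (\S+?)::(\S+)(?: - (.*))?$")


def parse_failures(output):
    """Collect one Failure per FAILED summary line in the test output."""
    failures = []
    for line in output.splitlines():
        match = _FAILED.match(line.strip())
        if match:
            failures.append(
                Failure(match.group(1), match.group(2), match.group(3) or "")
            )
    return failures


sample = """\
FAILED tests/test_api.py::test_login - AssertionError: assert 401 == 200
FAILED tests/test_db.py::TestUsers::test_insert
==== 2 failed, 1 passed in 0.42s ====
"""
failures = parse_failures(sample)
```

Each Failure carries the file, the test identifier, and the reason (empty when the summary line gives none), which is the kind of structured information a tool needs before it can suggest a fix.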

When to enable it

  • You want GAIA to run tests automatically during story implementation.
  • You want GAIA to interpret test failures and suggest code fixes.
  • You have a test-environment.yaml file defining your test setup.

When to leave it disabled

  • You run tests manually or through a separate CI system.
  • Your test suite requires specific hardware or network access that GAIA cannot provide.
  • You are in early planning phases and have no tests yet.

Enabling the bridge

/gaia-bridge-enable

This sets test_execution_bridge.bridge_enabled: true in your project-config.yaml. The change takes effect immediately -- no restart or rebuild is needed. Disable it with /gaia-bridge-disable.

See the test-environment.yaml Reference for the full schema of the bridge manifest file.

Scaffolding with /gaia-test-strategy

If you are starting a new project or adding tests to an existing one, use /gaia-test-strategy to generate a test plan and scaffold your test framework.

# Design a test plan (analyzes your project and proposes test coverage)
/gaia-test-strategy --plan

# Scaffold the test framework (creates config files, directories, example tests)
/gaia-test-strategy --scaffold

The --plan mode reads your architecture and stories, then proposes which tests to write and where. The --scaffold mode creates the actual test framework configuration -- jest.config.js, pytest.ini, playwright.config.ts, or whatever your stack needs.

Tagging slow tests

Not all tests should run on every PR. Tag slow tests so they can be excluded from the fast PR tier and included in nightly runs.

// Jest example: name slow tests with a .slow.test.ts suffix
// jest.config.js (PR tier)
module.exports = {
  // Skip every file whose path ends in .slow.test.ts
  testPathIgnorePatterns: ['.*\\.slow\\.test\\.ts$'],
};

# Pytest example: use markers
# pytest.ini
[pytest]
markers =
    slow: marks tests as slow (deselect with '-m "not slow"')

Then configure your CI to use different commands per tier:

# .gaia/config/project-config.yaml
test_execution:
  tiers:
    pr:
      command: pytest -m "not slow"
      timeout_seconds: 120
    nightly:
      command: pytest
      timeout_seconds: 600

Nightly vs PR-tier strategy

Split your test suite into two tiers based on speed and value:

| Tier | Runs when | Includes | Time budget |
|---|---|---|---|
| PR tier | Every PR and push to a PR branch | Lint, unit tests, fast integration tests | < 5 minutes |
| Nightly tier | Once per night on the main branch | All tests: E2E, performance, security, slow integration | < 30 minutes |

The PR tier gives developers fast feedback on every change. The nightly tier catches regressions that fast tests miss. If the nightly run fails, the team investigates first thing in the morning.

For the nightly tier, use a scheduled trigger in your CI platform rather than GAIA's trigger configuration. GAIA generates the workflow; the schedule is a CI platform concern.
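If a single CI entry point serves both tiers, the tier can be chosen from the event that triggered the run. The sketch below assumes GitHub Actions, which sets the GITHUB_EVENT_NAME environment variable to schedule for scheduled runs. The TIERS dictionary mirrors the tiers block shown earlier, and select_tier is a hypothetical helper, not part of GAIA.

```python
import os

# Mirrors test_execution.tiers from project-config.yaml.
TIERS = {
    "pr": {"command": 'pytest -m "not slow"', "timeout_seconds": 120},
    "nightly": {"command": "pytest", "timeout_seconds": 600},
}


def select_tier(event_name):
    """Scheduled runs get the nightly tier; everything else gets the PR tier."""
    return "nightly" if event_name == "schedule" else "pr"


tier_name = select_tier(os.environ.get("GITHUB_EVENT_NAME", ""))
tier = TIERS[tier_name]
```

Defaulting to the PR tier when the variable is unset keeps local runs fast.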

Diagnostic questions

Use these questions to evaluate your current test strategy:

  • How long does your PR CI take? If more than 10 minutes, you need test tiering.
  • What percentage of CI failures are flaky? If more than 5%, you have test isolation problems. Consider quarantining flaky tests and fixing them separately.
  • Do you have more E2E tests than unit tests? If yes, your pyramid is inverted.
  • Can you run tests locally? If not, your feedback loop is too slow. Every test that runs in CI should also run locally.
  • Do test failures tell you what broke? If you need to read the full output to understand a failure, your test names and assertions need work.
  • When did you last add a test? If you only add tests when GAIA tells you to, consider integrating TDD into your development practice.
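The three quantitative questions above can be turned into a small check. This is a sketch only: the function name and the wording of the findings are hypothetical, and the input numbers would come from your own CI history.

```python
def diagnose(pr_ci_minutes, ci_failures, flaky_failures, unit_tests, e2e_tests):
    """Apply the numeric thresholds from the diagnostic questions."""
    findings = []
    if pr_ci_minutes > 10:
        findings.append("PR CI exceeds 10 minutes: introduce test tiering")
    if ci_failures > 0 and flaky_failures / ci_failures > 0.05:
        findings.append("more than 5% of CI failures are flaky: check test isolation")
    if e2e_tests > unit_tests:
        findings.append("more E2E tests than unit tests: pyramid is inverted")
    return findings


# Example: 14-minute PR runs, 3 flaky failures out of 40, 120 unit and 30 E2E tests.
findings = diagnose(14, 40, 3, 120, 30)
```

In this example findings contains the tiering and flakiness items, but not the pyramid item.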

Test jobs under the layered CI model

Under the gaia- prefix contract, the canonical test jobs (bats-tests, skills-bats-tests, the per-cluster e2e suites) live in .github/workflows/gaia-*.yml as generated jobs, which are rewritten on every /gaia-config-ci --regenerate. To add a project-specific test job (coverage upload, custom suite, contract test), define it in gaia-ci.user-jobs.yml, which the stitching engine picks up:

# .github/workflows/gaia-ci.user-jobs.yml
jobs:
  coverage-upload:
    runs-on: ubuntu-latest
    needs: [bats-tests]
    steps:
      - uses: actions/checkout@v4
      - uses: codecov/codecov-action@v4

Per-job setup steps (e.g., language toolchain install) shared across managed test runs go into gaia-ci.user-steps.yml:

steps_before_gaia:
  - uses: actions/setup-node@v4
    with: { node-version: '22' }
steps_after_gaia:
  - run: echo "all managed test steps complete"

Protected test jobs (cannot disable)

Five test-adjacent jobs cannot be listed under ci_cd.template_overrides.disable: commitlint, boundary-guard, no-claude-attribution, secrets-scan, and credential-audit. The schema rejects any attempt to disable them. As defense in depth, the regen-time helper canonicalizes hyphens and letter case before checking, so bypass attempts such as commit-lint or Commit-Lint are rejected too.
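The canonicalization idea can be illustrated in a few lines: strip hyphens and lowercase both the protected names and the requested names before comparing. This is a sketch of the technique, not the regen-time helper itself, and the function names are hypothetical.

```python
PROTECTED_JOBS = (
    "commitlint",
    "boundary-guard",
    "no-claude-attribution",
    "secrets-scan",
    "credential-audit",
)


def canonical(name):
    """Drop hyphens and fold case so spelling variants compare equal."""
    return name.replace("-", "").lower()


_PROTECTED_CANONICAL = {canonical(name) for name in PROTECTED_JOBS}


def rejected_disables(disable_list):
    """Return the entries that target a protected job under any spelling."""
    return [name for name in disable_list if canonical(name) in _PROTECTED_CANONICAL]


blocked = rejected_disables(["commit-lint", "Commit-Lint", "coverage-upload", "Secrets-Scan"])
```

Here blocked contains commit-lint, Commit-Lint, and Secrets-Scan, while the unprotected coverage-upload passes through.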

Migration from a legacy test surface

If your .github/workflows/ directory currently has unprefixed test-running files (e.g., ci.yml, test.yml), the first /gaia-config-ci --regenerate after upgrade triggers the auto-rename flow. For each file, you choose one of three options: rename to gaia-{base}.yml and scaffold overlays, rename to user-{base}.yml, or skip and defer. The flow is backup-first: a backup is written to .gaia-backup/ci-regen-{ts}/ with a sha256 manifest before anything changes, so a misclassification is recoverable.
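The backup-first pattern can be sketched as follows: copy each workflow file into a timestamped directory and record its SHA-256 digest so a restore can be verified later. This illustrates the pattern only. The timestamp format, the manifest.json file name, and the JSON layout are assumptions, not GAIA's actual backup format.

```python
import hashlib
import json
import shutil
import time
from pathlib import Path


def backup_workflows(workflow_dir, backup_root):
    """Copy *.yml files into backup_root/ci-regen-<ts>/ and write a manifest."""
    ts = time.strftime("%Y%m%dT%H%M%S")  # assumed timestamp format
    target = Path(backup_root) / f"ci-regen-{ts}"
    target.mkdir(parents=True)

    manifest = {}
    for src in sorted(Path(workflow_dir).glob("*.yml")):
        shutil.copy2(src, target / src.name)
        manifest[src.name] = hashlib.sha256(src.read_bytes()).hexdigest()

    # Assumed manifest layout: {"file name": "sha256 hex digest"}
    (target / "manifest.json").write_text(json.dumps(manifest, indent=2))
    return target
```

To verify a restore, recompute the digest of each restored file and compare it with the manifest entry.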

What to read next