Test Strategy Configuration
This tutorial covers how to configure the test_execution
section of project-config.yaml, how to structure your
test suite using the test pyramid, when to enable the Test Execution
Bridge, and how to split tests into tiers so your CI stays fast.
Why test configuration matters
Without explicit test configuration, GAIA cannot run your tests, and
/gaia-dev-story
cannot verify that implementations pass before committing. The
test_execution section tells GAIA what commands to run,
where to run them, and what "passing" looks like.
The test_execution section
The test_execution block in
.gaia/config/project-config.yaml defines how tests are
discovered and executed for each stack in your project.
# .gaia/config/project-config.yaml
test_execution:
  default_command: npm test
  timeout_seconds: 300
  coverage:
    enabled: true
    threshold: 80
default_command is the fallback test command. If a stack
does not specify its own test command, this one is used.
timeout_seconds is the time limit, in seconds, after which hanging tests are killed.
coverage.threshold sets the minimum coverage percentage
for a pass.
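As a rough illustration of how these three settings interact, the sketch below runs a command under a time limit and compares a coverage figure against a threshold. It is not GAIA's implementation. The function names run_tests and coverage_passes are hypothetical, and two things are assumed: the timeout covers the whole command, and a zero exit code means the tests passed.

```python
import shlex
import subprocess
import sys


def run_tests(command, timeout_seconds):
    """Run a test command and report (passed, reason)."""
    try:
        # Assumption: the timeout covers the whole command, not each test.
        result = subprocess.run(shlex.split(command), timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising.
        return False, "timed out"
    if result.returncode != 0:
        return False, f"exit code {result.returncode}"
    return True, "ok"


def coverage_passes(measured_percent, threshold):
    """True when measured coverage reaches the configured minimum."""
    return measured_percent >= threshold


# Stand-in for a real test command: a Python process that exits cleanly.
demo_command = shlex.join([sys.executable, "-c", "pass"])
print(run_tests(demo_command, timeout_seconds=300))  # (True, 'ok')
print(coverage_passes(83.5, threshold=80))  # True
```

In a real project the command would be the configured default_command (npm test in the example above), and the measured percentage would come from your coverage tool's report.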
Edit this section directly or use
/gaia-config-test,
which preserves comments and formatting in your YAML files.
The test pyramid
The test pyramid is a guideline for how many tests of each type to write: more tests at the bottom (fast, cheap) and fewer at the top (slow, expensive).
| Level | What it tests | Speed | Typical ratio |
|---|---|---|---|
| Unit | Individual functions and classes in isolation | Milliseconds | 70% |
| Integration | Interactions between components, database queries, API calls | Seconds | 20% |
| End-to-end | Full user workflows through the real system | Minutes | 10% |
These ratios are guidelines, not rules. A CLI tool might have 90% unit tests and 10% integration tests with no E2E. A web application might shift more weight toward integration and E2E. The goal is to keep your fast tests catching most bugs and your slow tests catching only what fast tests cannot.
Pyramid health check
If your E2E tests take longer than your unit + integration tests combined, your pyramid is inverted. This slows CI and makes failures harder to diagnose. Consider converting some E2E tests to integration tests with mocked external dependencies.
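The health check above can be expressed as a small calculation. The following is a minimal sketch, assuming you can obtain test counts and per-level durations from your own CI reports. The helper names level_shares and pyramid_is_inverted and the sample numbers are illustrative only.

```python
def level_shares(test_counts):
    """Return each level's share of the suite as a rounded percentage."""
    total = sum(test_counts.values())
    if total == 0:
        return {level: 0.0 for level in test_counts}
    return {level: round(100 * n / total, 1) for level, n in test_counts.items()}


def pyramid_is_inverted(unit_seconds, integration_seconds, e2e_seconds):
    """Inverted when E2E takes longer than unit + integration combined."""
    return e2e_seconds > unit_seconds + integration_seconds


# Hypothetical suite: 200 tests split 140 / 40 / 20.
print(level_shares({"unit": 140, "integration": 40, "e2e": 20}))
# {'unit': 70.0, 'integration': 20.0, 'e2e': 10.0}

# Hypothetical durations in seconds: E2E (600) exceeds unit + integration (165).
print(pyramid_is_inverted(unit_seconds=45, integration_seconds=120, e2e_seconds=600))
# True
```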
Per-stack test commands
In a multi-stack project, each stack likely uses a different test runner. Specify the test command per stack:
# .gaia/config/project-config.yaml
stacks:
  - name: frontend
    path: packages/web
    language: typescript
    test_command: npm test
  - name: backend
    path: packages/api
    language: python
    test_command: pytest --cov=src --cov-report=term-missing
  - name: mobile
    path: packages/mobile
    language: dart
    test_command: flutter test
When
/gaia-dev-story
runs tests, it uses the test_command for the stack
relevant to the current story. If a story touches multiple stacks,
all relevant test commands run.
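The selection rule described here (use the stack's own test_command, fall back to default_command, run one command per touched stack) can be sketched as follows. This is an illustration rather than GAIA's actual resolution logic. The configuration is written as a plain dictionary so the example needs no YAML library, and the function name commands_for_story and the docs stack are hypothetical.

```python
def commands_for_story(config, touched_stacks):
    """Return (stack name, test command) pairs for the stacks a story touches."""
    default = config.get("test_execution", {}).get("default_command")
    commands = []
    for stack in config.get("stacks", []):
        if stack["name"] not in touched_stacks:
            continue
        # A stack without its own test_command falls back to default_command.
        command = stack.get("test_command") or default
        if command is None:
            raise ValueError(f"no test command configured for stack {stack['name']!r}")
        commands.append((stack["name"], command))
    return commands


config = {
    "test_execution": {"default_command": "npm test"},
    "stacks": [
        {"name": "frontend", "test_command": "npm test"},
        {"name": "backend", "test_command": "pytest --cov=src --cov-report=term-missing"},
        {"name": "docs"},  # hypothetical stack with no test_command of its own
    ],
}

for name, command in commands_for_story(config, {"backend", "docs"}):
    print(f"{name}: {command}")
```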
Edit stack configuration with
/gaia-config-stack.
The Test Execution Bridge
The Test Execution Bridge lets GAIA execute your test suite and
interpret the results programmatically. When enabled, GAIA can parse
test output, identify which tests failed and why, and use that
information to suggest fixes during
/gaia-dev-story.
When to enable it
- You want GAIA to run tests automatically during story implementation.
- You want GAIA to interpret test failures and suggest code fixes.
- You have a test-environment.yaml file defining your test setup.
When to leave it disabled
- You run tests manually or through a separate CI system.
- Your test suite requires specific hardware or network access that GAIA cannot provide.
- You are in early planning phases and have no tests yet.
Enabling the bridge
/gaia-bridge-enable
This sets test_execution_bridge.bridge_enabled: true in
your project-config.yaml. The change takes effect
immediately -- no restart or rebuild is needed. Disable it with
/gaia-bridge-disable.
See the test-environment.yaml Reference for the full schema of the bridge manifest file.
Scaffolding with /gaia-test-strategy
If you are starting a new project or adding tests to an existing one,
use
/gaia-test-strategy
to generate a test plan and scaffold your test framework.
# Design a test plan (analyzes your project and proposes test coverage)
/gaia-test-strategy --plan
# Scaffold the test framework (creates config files, directories, example tests)
/gaia-test-strategy --scaffold
The --plan mode reads your architecture and stories,
then proposes which tests to write and where. The
--scaffold mode creates the actual test framework
configuration -- jest.config.js,
pytest.ini, playwright.config.ts, or
whatever your stack needs.
Tagging slow tests
Not all tests should run on every PR. Tag slow tests so they can be excluded from the fast PR tier and included in nightly runs.
// Jest example: give slow tests a .slow.test.ts suffix
// jest.config.js (PR tier)
module.exports = {
  testPathIgnorePatterns: ['.*\\.slow\\.test\\.ts$'],
};

# Pytest example: use markers
# pytest.ini
[pytest]
markers =
    slow: marks tests as slow (deselect with '-m "not slow"')
Then configure your CI to use different commands per tier:
# .gaia/config/project-config.yaml
test_execution:
  tiers:
    pr:
      command: pytest -m "not slow"
      timeout_seconds: 120
    nightly:
      command: pytest
      timeout_seconds: 600
Nightly vs PR-tier strategy
Split your test suite into two tiers based on speed and value:
| Tier | Runs when | Includes | Time budget |
|---|---|---|---|
| PR tier | Every PR and push to a PR branch | Lint, unit tests, fast integration tests | < 5 minutes |
| Nightly tier | Once per night on the main branch | All tests: E2E, performance, security, slow integration | < 30 minutes |
The PR tier gives developers fast feedback on every change. The nightly tier catches regressions that fast tests miss. If the nightly run fails, the team investigates first thing in the morning.
For the nightly tier, use a scheduled trigger in your CI platform rather than GAIA's trigger configuration. GAIA generates the workflow; the schedule is a CI platform concern.
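One way to picture the split is a small lookup from the CI event to a tier and its settings. The sketch below assumes GitHub Actions event names (schedule for the nightly trigger, anything else treated as the PR tier) and mirrors the tiers structure from the configuration example as a plain dictionary. The helper names are hypothetical, and this is not how GAIA itself selects a tier.

```python
def tier_for_event(event_name):
    """Map a CI event to a tier: scheduled runs are nightly, the rest are PR tier."""
    return "nightly" if event_name == "schedule" else "pr"


def tier_settings(config, event_name):
    """Return (tier, command, timeout_seconds) for the given CI event."""
    tier = tier_for_event(event_name)
    settings = config["test_execution"]["tiers"][tier]
    return tier, settings["command"], settings["timeout_seconds"]


config = {
    "test_execution": {
        "tiers": {
            "pr": {"command": 'pytest -m "not slow"', "timeout_seconds": 120},
            "nightly": {"command": "pytest", "timeout_seconds": 600},
        }
    }
}

print(tier_settings(config, "pull_request"))  # ('pr', 'pytest -m "not slow"', 120)
print(tier_settings(config, "schedule"))      # ('nightly', 'pytest', 600)
```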
Diagnostic questions
Use these questions to evaluate your current test strategy:
- How long does your PR CI take? If more than 10 minutes, you need test tiering.
- What percentage of CI failures are flaky? If more than 5%, you have test isolation problems. Consider quarantining flaky tests and fixing them separately.
- Do you have more E2E tests than unit tests? If yes, your pyramid is inverted.
- Can you run tests locally? If not, your feedback loop is too slow. Every test that runs in CI should also run locally.
- Do test failures tell you what broke? If you need to read the full output to understand a failure, your test names and assertions need work.
- When did you last add a test? If you only add tests when GAIA tells you to, consider integrating TDD into your development practice.
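The three quantitative questions above can be turned into a quick check. This is a minimal sketch that uses the thresholds stated in the list (10 minutes, 5% flaky failures, more E2E than unit tests). The function name diagnose and the sample figures are illustrative, and you would supply the numbers from your own CI history.

```python
def diagnose(pr_ci_minutes, ci_failures, flaky_failures, unit_tests, e2e_tests):
    """Return a list of findings based on the thresholds in the questions above."""
    findings = []
    if pr_ci_minutes > 10:
        findings.append("PR CI exceeds 10 minutes: introduce test tiering")
    # Guard against division by zero when there were no failures at all.
    if ci_failures and 100 * flaky_failures / ci_failures > 5:
        findings.append("more than 5% of CI failures are flaky: check test isolation")
    if e2e_tests > unit_tests:
        findings.append("more E2E tests than unit tests: pyramid is inverted")
    return findings


# Hypothetical project: 12-minute PR runs, 3 flaky failures out of 40 (7.5%).
for finding in diagnose(pr_ci_minutes=12, ci_failures=40, flaky_failures=3,
                        unit_tests=200, e2e_tests=30):
    print(finding)
```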
Test jobs under the layered CI model
Under the gaia- prefix contract, the canonical test jobs (bats-tests,
skills-bats-tests, the per-cluster e2e suites) live in
.github/workflows/gaia-*.yml as generated jobs, which are rewritten on every
/gaia-config-ci --regenerate. To add a project-specific test job (coverage
upload, custom suite, contract test), define it in gaia-ci.user-jobs.yml,
which is applied via the stitching engine:
# .github/workflows/gaia-ci.user-jobs.yml
jobs:
  coverage-upload:
    runs-on: ubuntu-latest
    needs: [bats-tests]
    steps:
      - uses: actions/checkout@v4
      - uses: codecov/codecov-action@v4
Per-job setup steps (e.g., language toolchain install) shared across managed test runs go
into gaia-ci.user-steps.yml:
steps_before_gaia:
  - uses: actions/setup-node@v4
    with: { node-version: '22' }
steps_after_gaia:
  - run: echo "all managed test steps complete"
Protected test jobs (cannot disable)
Five test-adjacent jobs cannot be listed in
ci_cd.template_overrides.disable:
commitlint, boundary-guard,
no-claude-attribution, secrets-scan, and
credential-audit. The schema rejects any attempt to disable them,
and, as defense in depth, the regen-time helper also rejects bypass attempts
that differ only in hyphenation or case (commit-lint, Commit-Lint).
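To show why spelling variants do not slip through, here is a minimal sketch of hyphen and case canonicalization. It is not the actual regen-time helper. The names canonical and rejected_disables are hypothetical, and the real helper may normalize names differently.

```python
PROTECTED_JOBS = frozenset({
    "commitlint",
    "boundary-guard",
    "no-claude-attribution",
    "secrets-scan",
    "credential-audit",
})


def canonical(job_name):
    """Drop hyphens and lowercase, so spelling variants compare equal."""
    return job_name.replace("-", "").lower()


CANONICAL_PROTECTED = frozenset(canonical(name) for name in PROTECTED_JOBS)


def rejected_disables(disable_list):
    """Return the entries that target a protected job under any spelling variant."""
    return [name for name in disable_list if canonical(name) in CANONICAL_PROTECTED]


print(rejected_disables(["commit-lint", "Commit-Lint", "bats-tests", "SecretsScan"]))
# ['commit-lint', 'Commit-Lint', 'SecretsScan']
```

Comparing canonical forms on both sides means the protected list itself can keep its readable hyphenated names.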
Migration from a legacy test surface
If your .github/workflows/ currently has unprefixed test-running files (e.g.,
ci.yml, test.yml), the first
/gaia-config-ci --regenerate after upgrade fires the auto-rename flow.
For each file, you choose to rename it to gaia-{base}.yml and scaffold overlays,
rename it to user-{base}.yml, or skip and defer. The flow is backup-first: it
writes a backup to .gaia-backup/ci-regen-{ts}/ with a sha256 manifest, so a
misclassification is recoverable.
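The idea of a backup with a sha256 manifest can be sketched in a few lines. This is an illustration of the general technique, not the format GAIA writes. The function name backup_with_manifest, the manifest file name manifest.sha256, and its line layout are assumptions.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def backup_with_manifest(files, backup_dir):
    """Copy files into backup_dir and record a sha256 digest for each copy."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    manifest = {}
    for source in map(Path, files):
        destination = backup_dir / source.name
        shutil.copy2(source, destination)
        manifest[source.name] = hashlib.sha256(destination.read_bytes()).hexdigest()
    # Hypothetical manifest layout: "<digest>  <file name>", one entry per line.
    lines = [f"{digest}  {name}\n" for name, digest in sorted(manifest.items())]
    (backup_dir / "manifest.sha256").write_text("".join(lines))
    return manifest


# Demonstration in a throwaway directory with a stand-in workflow file.
with tempfile.TemporaryDirectory() as tmp:
    workflow = Path(tmp) / "ci.yml"
    workflow.write_bytes(b"name: ci\n")
    manifest = backup_with_manifest([workflow], Path(tmp) / "backup")
    print(manifest["ci.yml"] == hashlib.sha256(b"name: ci\n").hexdigest())  # True
```

Re-hashing a backed-up file and comparing it with its manifest entry confirms that the copy is intact before you restore from it.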
What to read next
- Configuring CI Pipelines -- how test execution fits into the broader CI configuration.
- /gaia-test-strategy -- full command reference for test plan design and framework scaffolding.
- /gaia-bridge-enable -- enabling the Test Execution Bridge.
- /gaia-config-test -- editing test configuration.
- test-environment.yaml Reference -- the bridge manifest file schema.
- /gaia-test-gap-analysis -- finding gaps in your test coverage.