CI/CD for Quantum Experiments: Integrating Database Migrations with ClickHouse


qbitshare
2026-02-04 12:00:00
10 min read

Practical tutorial: add ClickHouse schema migrations and reproducibility tests to GitHub Actions for robust quantum experiment pipelines.

Why CI/CD for quantum experiments needs database migrations

Reproducible quantum experiments depend on more than circuits and SDK versions — they depend on dependable, versioned storage for telemetry, calibration snapshots, and large measurement artifacts. Yet teams still struggle with fragmented workflows: dataset transfers that break, schema drift across collaborators, and brittle validation that catches errors too late.

This tutorial shows a practical, production-minded way to include ClickHouse schema migrations and data validation tests inside a GitHub Actions CI pipeline so that every pull request, branch, and release runs the same migrations and verification logic. By the end you’ll have a repeatable pattern for shipping reproducible quantum experiment pipelines in 2026.

Why ClickHouse in 2026 for quantum experiment telemetry?

ClickHouse has become a top choice for analytical workloads that need fast aggregation over high-volume time series and experiment traces. After large investments in late 2025 and broader cloud integrations in 2026, ClickHouse is widely adopted for experiment telemetry, calibration histories, and large result logs where efficient columnar storage and compression matter.

For quantum teams, ClickHouse fits three needs: efficient storage for millions of measurement rows, fast analytical queries for validation and drift detection, and integration options for S3/object storage used to archive raw shot outputs.

What you’ll build in this tutorial

  • A GitHub Actions CI workflow that spins up a ClickHouse service and runs migrations.
  • Idempotent SQL migrations applied in order and tracked in a schema_migrations table.
  • Automated validation tests (pytest + clickhouse-driver) that assert reproducibility invariants.
  • Practical tips for rollback, backups, and handling large archived artifacts with S3.

Design principles (quick)

  • Make migrations idempotent: prevent partial application from breaking CI.
  • Keep validation in CI: tests must run for PRs and releases to prevent schema and data regressions.
  • Keep raw artifacts external: store heavy shot outputs in S3 and reference them from ClickHouse.
  • Record provenance: store seeds, SDK versions, hardware IDs, and dataset hashes for reproducibility.
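The provenance principle can be sketched as a small helper that every run records alongside its results. `make_provenance` and its field names are illustrative (they mirror the `experiments` table defined later), not part of any quantum SDK:

```python
import hashlib
import json


def dataset_sha256(raw_bytes: bytes) -> str:
    """Checksum of a raw measurement artifact (e.g. an S3 object body)."""
    return hashlib.sha256(raw_bytes).hexdigest()


def make_provenance(experiment_id: str, seed: int, sdk_version: str,
                    hardware_id: str, raw_bytes: bytes) -> dict:
    """Build the provenance record stored next to every experiment result."""
    return {
        "experiment_id": experiment_id,
        "seed": seed,
        "sdk_version": sdk_version,
        "hardware_id": hardware_id,
        "result_checksum": dataset_sha256(raw_bytes),
    }


row = make_provenance("exp-001", 12345, "qdk-1.2", "sim-1", b"shot-output")
print(json.dumps(row, indent=2))
```

Because the checksum is derived from the raw bytes, any silent change to an archived artifact is detectable the next time CI recomputes it.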

Organize your repo so CI can find migrations and tests predictably.

```
.
├─ migrations/
│  ├─ 001_create_schema.sql
│  └─ 002_add_experiment_metadata.sql
├─ tests/
│  ├─ test_schema.py
│  └─ test_data_validation.py
├─ tools/
│  └─ apply-migrations.sh
└─ .github/workflows/ci.yml
```

Step 1 — Idempotent SQL migrations

Migrations are plain SQL files applied in lexical order. Use IF NOT EXISTS where possible, and maintain a schema_migrations table to record applied files and their checksum so CI can detect drift.

Sample migration: 001_create_schema.sql

```sql
-- migrations/001_create_schema.sql
CREATE DATABASE IF NOT EXISTS quantum_experiments;

CREATE TABLE IF NOT EXISTS quantum_experiments.experiments (
  experiment_id String,
  run_id String,
  sdk_version String,
  hardware_id String,
  seed UInt64,
  shot_count UInt32,
  start_time DateTime64(6),
  end_time DateTime64(6),
  result_checksum String,
  metadata JSON
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(start_time)
ORDER BY (experiment_id, run_id);

CREATE TABLE IF NOT EXISTS quantum_experiments.schema_migrations (
  id UInt64,
  filename String,
  sha256 String,
  applied_at DateTime64(6)
) ENGINE = TinyLog();
```

Notes

  • Use MergeTree with sensible partitioning (monthly for many teams).
  • schema_migrations here is TinyLog for simplicity in CI; in production you may prefer MergeTree and stronger guarantees.
  • Store a SHA256 of each SQL file so you can assert the migration content matches what CI expects. Combine this with lightweight instrumentation so you catch unexpected query or schema regressions early.
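The drift check in the last note reduces to a pure function. The mapping of `filename` to `sha256` mirrors what a query against `schema_migrations` would return; `find_drift` is a hypothetical helper, shown here against an on-disk fixture rather than a live ClickHouse connection:

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def find_drift(migration_dir: Path, applied: dict) -> list:
    """Return migration filenames whose on-disk content no longer matches
    the checksum recorded in schema_migrations; CI fails if non-empty."""
    drifted = []
    for f in sorted(migration_dir.glob("*.sql")):
        recorded = applied.get(f.name)
        if recorded is not None and recorded != file_sha256(f):
            drifted.append(f.name)
    return drifted


# Tiny demo with a local fixture standing in for the real migrations dir.
demo_dir = Path("demo_migrations")
demo_dir.mkdir(exist_ok=True)
(demo_dir / "001_create_schema.sql").write_text("CREATE DATABASE IF NOT EXISTS demo;")
applied = {"001_create_schema.sql": "0" * 64}  # stale checksum -> drift
print(find_drift(demo_dir, applied))  # ['001_create_schema.sql']
```

A pytest wrapper around this function, fed by `SELECT filename, sha256 FROM schema_migrations`, gives you the fail-fast drift gate described above.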

Step 2 — apply-migrations.sh (CI-friendly runner)

ClickHouse exposes a simple HTTP interface that is perfect for CI. The following script posts each SQL file in order and records the result in the migrations table.

```bash
#!/usr/bin/env bash
set -euo pipefail

CH_HOST=${CLICKHOUSE_HOST:-localhost}
CH_PORT=${CLICKHOUSE_HTTP_PORT:-8123}
DB=quantum_experiments

# Glob expansion is already lexically sorted, so files apply in order.
for f in migrations/*.sql; do
  sha=$(sha256sum "$f" | awk '{print $1}')
  filename=$(basename "$f")

  # Check if already applied. POST the query body to avoid URL-encoding
  # pitfalls; on a fresh database the table doesn't exist yet, so -f makes
  # curl fail and we fall back to 0.
  applied=$(curl -fsS "http://$CH_HOST:$CH_PORT/" \
    --data-binary "SELECT count() FROM $DB.schema_migrations WHERE filename='$filename'" \
    | tr -d '\n') || applied=0

  if [ "${applied:-0}" -ge 1 ]; then
    echo "Skipping $filename (already applied)"
    continue
  fi

  echo "Applying $filename"
  # send SQL to ClickHouse HTTP API
  curl -fsS --data-binary "@$f" "http://$CH_HOST:$CH_PORT/" \
    || { echo "Migration failed: $f"; exit 1; }

  # record migration
  now=$(date -u +"%Y-%m-%d %H:%M:%S.%6N")
  insert="INSERT INTO $DB.schema_migrations (id, filename, sha256, applied_at) VALUES (toUInt64($(date +%s)), '$filename', '$sha', parseDateTime64BestEffort('$now'))"
  curl -fsS --data-binary "$insert" "http://$CH_HOST:$CH_PORT/"
done
```

Step 3 — GitHub Actions workflow (CI)

The CI job below runs on PRs and pushes. It starts a ClickHouse container, waits until the HTTP endpoint is ready, runs migrations with the script above, and then runs pytest-based tests to validate schema and data invariants.

```yaml
# .github/workflows/ci.yml
name: CI

on:
  push:
    branches: [ main ]
  pull_request:

jobs:
  ci:
    runs-on: ubuntu-latest
    services:
      clickhouse:
        # Pin an exact tag (example shown) so CI and dev stay reproducible.
        image: clickhouse/clickhouse-server:25.3
        ports:
          - 8123:8123
          - 9000:9000
        options: >-
          --health-cmd "clickhouse-client --query 'SELECT 1'"
          --health-interval 5s
          --health-timeout 2s
          --health-retries 30

    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Wait for ClickHouse
        run: |
          for i in {1..30}; do
            if curl -fsS http://localhost:8123/ping >/dev/null 2>&1; then
              echo "ClickHouse ready"; exit 0
            fi
            echo "Waiting for ClickHouse... ($i)"; sleep 2
          done
          echo "ClickHouse did not become ready"; exit 1

      - name: Install deps
        run: |
          python -m pip install --upgrade pip
          pip install clickhouse-driver pytest

      - name: Run migrations
        env:
          CLICKHOUSE_HOST: localhost
          CLICKHOUSE_HTTP_PORT: "8123"
        run: |
          chmod +x tools/apply-migrations.sh
          ./tools/apply-migrations.sh

      - name: Seed test data (CI small sample)
        run: |
          curl -fsS "http://localhost:8123/?query=INSERT%20INTO%20quantum_experiments.experiments%20FORMAT%20JSONEachRow" \
            --data-binary '{"experiment_id":"exp-ci-001","run_id":"r1","sdk_version":"qdk-1.2","hardware_id":"sim-1","seed":12345,"shot_count":1024,"start_time":"2026-01-01 00:00:00","end_time":"2026-01-01 00:01:00","result_checksum":"abc123","metadata":{}}'

      - name: Run tests
        env:
          CLICKHOUSE_HOST: localhost
        run: |
          pytest -q
```

Step 4 — Data validation tests (pytest)

Tests assert both schema and reproducibility invariants. Below are two examples: a schema test and a data validation test that checks dataset checksums and deterministic seed behavior.

tests/test_schema.py

```python
# tests/test_schema.py
import os

from clickhouse_driver import Client


def test_tables_exist():
    host = os.getenv('CLICKHOUSE_HOST', 'localhost')
    client = Client(host=host)

    result = client.execute("SHOW TABLES FROM quantum_experiments")
    tables = {r[0] for r in result}
    assert 'experiments' in tables
    assert 'schema_migrations' in tables
```

tests/test_data_validation.py

```python
# tests/test_data_validation.py
import os

from clickhouse_driver import Client

HOST = os.getenv('CLICKHOUSE_HOST', 'localhost')


def test_experiment_checksum_matches():
    client = Client(host=HOST)

    rows = client.execute(
        "SELECT experiment_id, result_checksum FROM quantum_experiments.experiments "
        "WHERE experiment_id='exp-ci-001'"
    )
    assert len(rows) == 1
    experiment_id, checksum = rows[0]

    # The CI fixture seeds a known checksum; in practice, recompute SHA256
    # over the raw S3 artifact and compare it to the stored value.
    assert checksum == 'abc123'


def test_seed_reproducibility():
    client = Client(host=HOST)
    rows = client.execute(
        "SELECT seed FROM quantum_experiments.experiments WHERE experiment_id='exp-ci-001'"
    )
    seed = rows[0][0]
    # Deterministic check: re-running a simulator with this seed should
    # reproduce the measurement counts; here we only assert the seed was recorded.
    assert isinstance(seed, int) and seed > 0
```

Actionable validation patterns for quantum experiments

  • Checksums: compute and store SHA256 of raw shot outputs (S3 objects). CI should re-download small CI fixtures and assert checksums match.
  • Schema drifting alerts: keep a hash of migration files in schema_migrations and fail CI if applied migrations differ from repository files.
  • Golden datasets: store small golden datasets in the repo or a protected bucket; CI loads them to assert end-to-end reproducibility for a small experiment run.
  • Stat-based checks: assert that mean, variance, or fidelities of results remain within expected bounds to detect noise model regressions.
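The stat-based check in the last bullet can be a plain pytest helper. The bounds below are illustrative and would in practice come from your golden dataset:

```python
import statistics


def assert_within_bounds(values, expected_mean, mean_tol, max_stdev):
    """Fail CI when result statistics drift outside the golden bounds."""
    mean = statistics.fmean(values)
    stdev = statistics.stdev(values)
    assert abs(mean - expected_mean) <= mean_tol, f"mean drifted: {mean:.4f}"
    assert stdev <= max_stdev, f"variance regression: stdev={stdev:.4f}"


# Example: |0> measurement probabilities from a golden CI run.
p0_samples = [0.492, 0.505, 0.498, 0.501, 0.496]
assert_within_bounds(p0_samples, expected_mean=0.5, mean_tol=0.02, max_stdev=0.05)
print("stat guard passed")
```

The same pattern extends to fidelity estimates or calibration metrics: fetch the column with clickhouse-driver and pass it through the guard.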

In 2026, teams increasingly combine OLAP stores like ClickHouse with cloud object storage and model registries for reproducibility. Here are advanced patterns gaining traction:

  • External artifacts + references: Keep heavy shot outputs in S3 and store references (S3 URL, object SHA256) in ClickHouse. ClickHouse's S3 integrations and table functions make it easy to join metadata and analytics.
  • Continuous schema gating: Gate schema changes behind feature flags and require a database migration PR to include an automated rollback plan (dump + restore) for production migrations.
  • Schema migration runners on Cloud Run: Deploy a lightweight migration runner to Cloud Run (or similar) to apply migrations during scheduled maintenance windows with strong identity and logging.
  • Provenance-first tables: Always include SDK version, commit hash, and hardware calibration ID in every experiments row for traceability.
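The external-artifact pattern amounts to storing a reference record like the one below next to each experiments row. The bucket and key are hypothetical, and in a real pipeline the bytes would be the S3 object body rather than an inline literal:

```python
import hashlib


def make_artifact_ref(bucket: str, key: str, body: bytes) -> dict:
    """Reference record for a raw shot archive kept in object storage."""
    return {
        "s3_url": f"s3://{bucket}/{key}",
        "object_sha256": hashlib.sha256(body).hexdigest(),
        "size_bytes": len(body),
    }


ref = make_artifact_ref("quantum-archive", "exp-001/shots.parquet", b"raw shot bytes")
print(ref["s3_url"])  # s3://quantum-archive/exp-001/shots.parquet
```

Keeping the SHA256 in ClickHouse lets validation jobs detect a corrupted or replaced archive without scanning the object itself on every query.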

Rollback, backups and large-data handling

Even with robust CI, production migrations require safe rollback strategies and backups. Recommended practices:

  1. Use clickhouse-backup: an open-source tool that snapshots table data to S3. Run before risky migrations.
  2. Backfill with care: For column additions or denormalizations, write backfill jobs that operate in small windows and validate via CI before merging to main.
  3. Non-blocking migrations: Add columns with defaults rather than altering existing data in-place. Create new tables and swap via renames for atomic transitions.

Security & governance considerations

  • Secrets in CI: Use GitHub Secrets for S3 credentials and production ClickHouse endpoints. For PRs from forks, you’ll want to disable secrets or use ephemeral test clusters.
  • Access control: Limit direct production migration rights—use a merge-and-run workflow where an ops service account runs the final migration job.
  • Audit logs: Persist migration logs and test results to object storage for traceability. Combine migration logs with your existing tagging and metadata strategy.

Real-world example: reproducible calibration history

A typical use case: collect nightly calibration snapshots from multiple devices, run migrations to add a new calibration column, and validate that historical calibrations can still be queried. With the CI pattern above you can:

  • Apply the migration that adds the new calibration metric column.
  • Run a validation test that loads the last 30 days and asserts no NULLs beyond a reasonable rate (e.g., less than 0.5%).
  • Checkpoint the dataset to S3 and compute checksums. If the validation fails, the migration is rolled back and the snapshot is restored.
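The NULL-rate validation above can be sketched as follows. `null_rate` is a hypothetical helper operating on a column already fetched from ClickHouse, where a missing calibration arrives as `None`:

```python
def null_rate(values) -> float:
    """Fraction of NULL (None) entries in a fetched column."""
    values = list(values)
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


# 1 missing calibration out of 1000 rows -> 0.1%, under the 0.5% budget.
calibrations = [0.97] * 999 + [None]
rate = null_rate(calibrations)
assert rate < 0.005, f"too many NULL calibrations: {rate:.2%}"
print(f"null rate {rate:.2%}")
```

Equivalently, you can push the computation into ClickHouse with `countIf(metric IS NULL) / count()` and assert on the single returned value.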

Troubleshooting tips

  • If CI times out waiting for ClickHouse, increase health-retries or add a readiness loop that checks HTTP status and query results.
  • For flaky tests, isolate environment differences: use exact ClickHouse image tags and pin clickhouse-driver versions in your test environment.
  • When local dev differs, provide a docker-compose dev profile that mirrors CI and make it easy for developers to run the same apply-migrations.sh locally.
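A dev profile that mirrors the CI service could look like this docker-compose fragment (the image tag is an example; pin whatever your CI pins):

```yaml
# docker-compose.yml -- local profile mirroring the CI service
services:
  clickhouse:
    image: clickhouse/clickhouse-server:25.3
    ports:
      - "8123:8123"
      - "9000:9000"
    healthcheck:
      test: ["CMD", "clickhouse-client", "--query", "SELECT 1"]
      interval: 5s
      timeout: 2s
      retries: 30
```

With the container up, `CLICKHOUSE_HOST=localhost ./tools/apply-migrations.sh` runs the exact migrations developers will later see in CI.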

Takeaways & checklist

  • Include ClickHouse migrations in CI so every PR validates schema and reproducibility invariants.
  • Use idempotent SQL and a schema_migrations table with file checksums.
  • Automate data validation (checksums, statistical guards) with pytest and clickhouse-driver in GitHub Actions.
  • Archive heavy artifacts to S3 and reference them in ClickHouse; test against small golden fixtures in CI.
  • Adopt backup-before-migrate policies and maintain a rollback plan for production migrations, and codify both in your team's runbooks.
"In 2026, combining robust CI with analytical engines like ClickHouse is the fastest path to reproducible quantum experiment workflows." — qbitshare engineering

Next steps (practical)

  1. Fork the repo layout above and add a small migration + test fixture.
  2. Create the GitHub Actions CI file and run it on a branch. Fix any idempotency problems revealed by CI.
  3. Integrate an S3-backed backup before running any production migration.


Conclusion & call to action

Automating ClickHouse schema migrations and embedding validation tests in your GitHub Actions pipeline reduces a major source of unreproducibility for quantum experiments: schema drift and unchecked data regressions. The patterns above are intentionally pragmatic — they work for small teams and scale to production when combined with backups and guarded rollouts.

Ready to adopt this in your workflow? Fork the recommended repo structure, add your migrations and golden fixtures, and start running the CI pipeline today. Share your migration patterns and test suites with the qbitshare community to help others build reproducible quantum experiment platforms.

Want a starter repository or an opinionated Cloud Run migration runner template? Visit qbitshare.com/tools (or join our developer Slack) and we’ll help scaffold your migration CI in under an hour.
