Kontinuum Node — Testing Strategy

Testing strategy for a distributed P2P system with E2E crypto, cert lifecycle, replication, and anti-entropy.

Audience: node developers starting on the implementation, QA, and whoever sets up CI/CD.

Related documents:

architecture.md: overview / glossary / tier model
protocols.md: the wire-level mechanics under test
operations.md: §16.1 DR / chaos planning (production-level)
implementation-notes.md: known edge cases where test coverage should concentrate

Overview

The standard Rust test pyramid (unit → integration) is not enough for our use case. We are building a distributed system with an eventually consistent CRDT, multi-tier trust, and security-critical crypto. We therefore extend the pyramid to nine layers, each with its own tooling.

Layer	What it covers	Tooling	Isolation level
1. Unit	Pure functions: crypto, CBOR roundtrip, Merkle, GC policy, quota math, bucket naming	`cargo test` (`#[test]`)	Single module
2. Property-based	Invariants under random inputs: DHT routing, mailbox GC, replication math, CBOR identity	`proptest`	Single function/module
3. Integration (in-proc)	Multi-actor flows: Handshake, DhtPut→DhtGet, replica push/pull, mailbox deposit/retrieve	`cargo test --test ...` + tokio	N nodes in one process
4. Multi-node simulation	Network behavior at N=10..1000: convergence, replication, churn, partitions	`madsim` or `turmoil` (deterministic)	Single process, virtual time
5. Chaos / fault injection	Latency, drops, partitions, clock skew, node kills	`madsim`, or `tc`/`iptables` in Docker	Docker Compose / madsim
6. Fuzzing	Security-critical parsers: CBOR, cert validation, signature verify, auth-shim	`cargo-fuzz` (libfuzzer-sys)	Single binary
7. End-to-end	Real binaries: kontinuum-node + kontinuum-app, real network, real S3	Docker Compose + scripted scenarios	Cluster
8. Performance benchmarks	Hot paths: DHT lookup latency, mailbox write rate, re-encryption throughput	`criterion`	Single process
9. Game days / chaos	Production-scale incident simulation (see `operations.md` §16.1)	Manual scripts + monitoring	Production / staging

Coverage by component

DHT (Kademlia, anti-entropy)

Property tests for routing-table invariants: after N random insertions, the k-closest result is always correct.
Simulation (madsim/turmoil) for convergence: N nodes start, and after X seconds the DHT is consistent.
Anti-entropy correctness: atomic transactions (Revoke + KeyRotation) must eventually converge even when messages are reordered.
Eclipse resistance: simulate a cluster of malicious nodes and verify that disjoint-path lookups still work.
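The routing-table invariant can be sketched without proptest. A hand-rolled xorshift PRNG drives the random insertions, and the table's `closest` answer is checked against the brute-force XOR ordering of everything the table stores. This is a simplified, hypothetical model and not the node's actual routing table: IDs are 64-bit, `BUCKET_CAP = 8`, and a full bucket simply drops newcomers. With proptest, the same check would become a `proptest!` block with shrinking.

```rust
/// Capacity of one k-bucket (assumed; Kademlia commonly uses k = 20).
const BUCKET_CAP: usize = 8;

/// Hypothetical simplified routing table: 64-bit IDs, one bucket per
/// shared-prefix length. A full bucket drops newcomers (no LRU ping).
struct RoutingTable {
    self_id: u64,
    buckets: Vec<Vec<u64>>, // buckets[i]: peers whose XOR distance has i leading zeros
}

impl RoutingTable {
    fn new(self_id: u64) -> Self {
        RoutingTable { self_id, buckets: vec![Vec::new(); 64] }
    }

    fn insert(&mut self, id: u64) {
        if id == self.self_id {
            return;
        }
        let idx = (self.self_id ^ id).leading_zeros() as usize;
        let bucket = &mut self.buckets[idx];
        if !bucket.contains(&id) && bucket.len() < BUCKET_CAP {
            bucket.push(id);
        }
    }

    /// Up to `k` stored peers, nearest to `target` by XOR distance first.
    fn closest(&self, target: u64, k: usize) -> Vec<u64> {
        let mut all: Vec<u64> = self.buckets.iter().flatten().copied().collect();
        all.sort_by_key(|id| id ^ target);
        all.truncate(k);
        all
    }
}

/// Xorshift64 PRNG: deterministic, so a failing seed can be replayed.
fn xorshift(state: &mut u64) -> u64 {
    let mut x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    x
}

/// Property: after random insertions, bucket invariants hold and `closest`
/// returns exactly the k nearest stored peers, sorted by XOR distance.
fn k_closest_invariant_holds(seed: u64, inserts: usize, k: usize) -> bool {
    let mut rng = seed | 1; // xorshift state must be non-zero
    let mut table = RoutingTable::new(xorshift(&mut rng));
    for _ in 0..inserts {
        let id = xorshift(&mut rng);
        table.insert(id);
    }
    for (i, bucket) in table.buckets.iter().enumerate() {
        let misplaced = bucket.iter().any(|id| (table.self_id ^ id).leading_zeros() as usize != i);
        if bucket.len() > BUCKET_CAP || misplaced {
            return false;
        }
    }
    let stored: Vec<u64> = table.buckets.iter().flatten().copied().collect();
    for _ in 0..50 {
        let target = xorshift(&mut rng);
        let result = table.closest(target, k);
        let sorted = result.windows(2).all(|w| (w[0] ^ target) < (w[1] ^ target));
        // No stored peer left out may be closer than the farthest one returned.
        let complete = match result.last() {
            Some(&far) => stored.iter().all(|id| result.contains(id) || (id ^ target) > (far ^ target)),
            None => k == 0 || stored.is_empty(),
        };
        if result.len() != k.min(stored.len()) || !sorted || !complete {
            return false;
        }
    }
    true
}
```

A failing seed is its own reproduction: rerunning `k_closest_invariant_holds` with that seed replays exactly the same insertions and lookups.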

Especially important scenarios:

Churn at scale (50% node turnover per hour): the DHT stays consistent.
Partition heal: signed LWW prevents split brain.
Transaction quarantine: an op from a partially received transaction is not applied until the whole group has arrived (see implementation-notes.md §20.1, §20.3).
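Transaction quarantine comes down to a buffer keyed by transaction ID that releases a group only once every part is present. The sketch below assumes a hypothetical `Op` that carries `tx_id`, `tx_size` and `index`; the real wire format may identify groups differently. `all_or_nothing` is the atomicity assertion a simulation test would run after arbitrary reordering, loss and re-delivery.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical op shape: each op names its transaction, the group size,
/// and its position in the group (e.g. Revoke = 0, KeyRotation = 1).
#[derive(Clone, Debug, PartialEq)]
struct Op {
    tx_id: u32,
    tx_size: u8,
    index: u8,
    payload: String,
}

#[derive(Default)]
struct Quarantine {
    pending: HashMap<u32, HashMap<u8, Op>>, // tx_id -> parts received so far
    applied: Vec<Op>,                       // ops visible to the rest of the node
    done: HashSet<u32>,                     // completed transactions
}

impl Quarantine {
    /// Buffer `op`; release its transaction only once every part has arrived.
    fn receive(&mut self, op: Op) {
        let (tx, size) = (op.tx_id, op.tx_size as usize);
        if self.done.contains(&tx) {
            return; // late duplicate of an already-applied transaction
        }
        let parts = self.pending.entry(tx).or_default();
        parts.insert(op.index, op); // re-delivery of the same part is idempotent
        if parts.len() == size {
            let mut group: Vec<Op> = self.pending.remove(&tx).unwrap().into_values().collect();
            group.sort_by_key(|o| o.index);
            self.applied.extend(group);
            self.done.insert(tx);
        }
    }

    /// Atomicity check: each transaction is either fully applied or absent.
    fn all_or_nothing(&self) -> bool {
        let mut seen: HashMap<u32, (usize, usize)> = HashMap::new();
        for op in &self.applied {
            seen.entry(op.tx_id).or_insert((0, op.tx_size as usize)).0 += 1;
        }
        seen.values().all(|&(count, size)| count == size)
    }
}
```

A Revoke that arrives without its KeyRotation stays in `pending` and is invisible to the rest of the node. Anti-entropy gossip (§20.3) must therefore treat such ops as "not yet applied" rather than as missing.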

Storage layer (rustfs + auth-shim)

Integration tests against a real rustfs subprocess.
Mock S3 for quota tests: LocalStack or MinIO.
Auth-shim fuzzing: random signature inputs must not crash it.
Quota enforcement under concurrent writes (race conditions).
Redirect mode (operations.md §13.6): after a cert lapse, blob lookups go to the external bucket, and the peer follows the 307 redirect.

Mailbox / GC

Property tests for the GC policy: across random sequences of (deposit, cursor-update, expire), an entry is deleted only once all live_cursors have passed it and its TTL has expired.
Atomic Revoke + KeyRotation: the receiving node applies either both ops or neither, even under message loss.
Free-tier scenario: sender → offline receiver → 30-day TTL → drop.
Adaptive quorum for Direct Space (2 members): kick works even when the friend is offline (implementation-notes.md §20.2).
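The GC property can be checked by comparing an optimized GC pass against a direct encoding of the rule. This sketch rests on several assumptions. It uses a hypothetical `Mailbox` with sequence-numbered entries. Each cursor holds the next sequence number its reader will fetch. The policy is read literally as "all live cursors passed AND TTL expired". A xorshift PRNG generates the random (deposit, cursor-update, expire) steps. Every removal must satisfy `spec_allows`, and nothing the rule permits may survive a pass.

```rust
use std::collections::BTreeMap;

/// Hypothetical mailbox model for the GC property test.
struct Mailbox {
    entries: BTreeMap<u64, u64>, // seq -> deposited_at (seconds)
    cursors: Vec<u64>,           // live cursors: next seq each reader will fetch
    next_seq: u64,
    ttl: u64,
}

/// The rule as stated (assumed reading): every live cursor is past `seq`
/// AND the TTL has expired.
fn spec_allows(cursors: &[u64], seq: u64, deposited_at: u64, ttl: u64, now: u64) -> bool {
    cursors.iter().all(|&c| c > seq) && now >= deposited_at + ttl
}

impl Mailbox {
    /// Implementation under test: only entries below the slowest cursor are candidates.
    fn gc(&mut self, now: u64) -> Vec<(u64, u64)> {
        let min_cursor = self.cursors.iter().copied().min().unwrap_or(self.next_seq);
        let ttl = self.ttl;
        let doomed: Vec<(u64, u64)> = self
            .entries
            .range(..min_cursor)
            .filter(|&(_, &at)| now >= at + ttl)
            .map(|(&seq, &at)| (seq, at))
            .collect();
        for (seq, _) in &doomed {
            self.entries.remove(seq);
        }
        doomed
    }
}

fn xorshift(state: &mut u64) -> u64 {
    let mut x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    x
}

/// Property: GC removes only what the rule allows, and leaves nothing it allows.
fn gc_never_violates_policy(seed: u64, steps: usize) -> bool {
    let mut rng = seed | 1; // xorshift state must be non-zero
    let mut mb = Mailbox { entries: BTreeMap::new(), cursors: vec![0; 3], next_seq: 0, ttl: 30 };
    let mut now = 0;
    for _ in 0..steps {
        match xorshift(&mut rng) % 3 {
            0 => {
                // deposit
                mb.entries.insert(mb.next_seq, now);
                mb.next_seq += 1;
            }
            1 => {
                // cursor-update: a random reader advances, never past next_seq
                let i = (xorshift(&mut rng) % 3) as usize;
                if mb.cursors[i] < mb.next_seq {
                    mb.cursors[i] += 1;
                }
            }
            _ => now += xorshift(&mut rng) % 20, // expire: virtual time passes
        }
        let cursors = mb.cursors.clone();
        for (seq, at) in mb.gc(now) {
            if !spec_allows(&cursors, seq, at, mb.ttl, now) {
                return false;
            }
        }
        if mb.entries.iter().any(|(&s, &at)| spec_allows(&cursors, s, at, mb.ttl, now)) {
            return false;
        }
    }
    true
}
```

If the intended policy is different, only `spec_allows` changes. For example, the free-tier TTL drop for an offline receiver may mean TTL expiry alone suffices in some cases. The randomized driver stays the same.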

Cert lifecycle

Integration: cert valid → expire → CRL propagation → node read-only freeze → cold archive → tombstone → hard delete.
Multi-sig (v1.0+): 3-of-5 signatures aggregate; a partial set of signatures is not sufficient.
Revocation race: gossip propagation under high latency (the race window for downloads, see §20.8).
Lapse timeline interaction: the re-encryption job keeps running during the read-only freeze (§5.2.3, §20.4).
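The 3-of-5 rule hides a classic bug: counting signatures instead of distinct signers. The sketch below shows the threshold check. It assumes a hypothetical `Sig` record and a `verify` callback that stands in for real Ed25519 verification. Only distinct, authorized signers with a valid signature count toward the threshold.

```rust
use std::collections::HashSet;

/// Hypothetical signature record: claimed signer id plus signature bytes.
struct Sig {
    signer: u32,
    bytes: Vec<u8>,
}

/// Accept only if at least `threshold` *distinct* authorized signers produced
/// a valid signature. `verify` stands in for a real Ed25519 verification.
fn threshold_met<F>(sigs: &[Sig], authorized: &HashSet<u32>, threshold: usize, verify: F) -> bool
where
    F: Fn(u32, &[u8]) -> bool,
{
    let mut counted = HashSet::new();
    for s in sigs {
        // Unknown signers and invalid signatures never count.
        if authorized.contains(&s.signer) && verify(s.signer, s.bytes.as_slice()) {
            counted.insert(s.signer); // a repeated signer is counted once
        }
    }
    counted.len() >= threshold
}
```

The negative cases are where such tests earn their keep: two valid signatures, one signer repeated three times, an outsider as the third signer, and a forged third signature must all be rejected.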

Crypto / security-critical

Fuzzing: wire frame parsers, CBOR, cert verification, signature checks.
Test vectors: fixed input/output pairs for Ed25519, blake3, and ChaCha20Poly1305. These guard against regressions when crypto libraries are upgraded.
Forward secrecy: after a KeyRotation, the old space_key cannot decrypt new ciphertext.
Auth-shim uniformity: the node owner's admin API has no escape hatch around the membership check (§20.7).

Inter-node protocol

Wire roundtrip property: decode(encode(X)) == X for all frame types (and, for canonically encoded input, encode(decode(B)) == B).
Version compatibility: CBOR schema evolution; optional fields are preserved, and unknown fields are skipped without error.
Inter-node integration: two nodes over localhost, checking handshake + DhtPut + replication + anti-entropy gossip.
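Both codec properties can be exercised without CBOR. To stay dependency-free, this sketch replaces CBOR with a toy tag-length-value encoding; `Frame`, the tag numbers and the field layout are all hypothetical. `decode` skips unknown tags (forward compatibility) and rejects malformed input with `None` instead of panicking. `roundtrip_holds` checks decode(encode(f)) == f over random frames.

```rust
/// Hypothetical frame; `ttl` is an optional field added in a later version.
#[derive(Debug, Clone, PartialEq)]
struct Frame {
    kind: u8,
    key: Vec<u8>,
    ttl: Option<u32>,
}

const TAG_KIND: u8 = 1;
const TAG_KEY: u8 = 2;
const TAG_TTL: u8 = 3;

/// Append one field as tag (1 byte) + length (u16 LE) + value.
fn put(out: &mut Vec<u8>, tag: u8, val: &[u8]) {
    out.push(tag);
    out.extend_from_slice(&(val.len() as u16).to_le_bytes());
    out.extend_from_slice(val);
}

fn encode(f: &Frame) -> Vec<u8> {
    let mut out = Vec::new();
    put(&mut out, TAG_KIND, &[f.kind]);
    put(&mut out, TAG_KEY, &f.key);
    if let Some(t) = f.ttl {
        put(&mut out, TAG_TTL, &t.to_le_bytes()); // optional: omitted when absent
    }
    out
}

/// Returns None on malformed input instead of panicking (fuzz targets rely on this).
fn decode(mut buf: &[u8]) -> Option<Frame> {
    let (mut kind, mut key, mut ttl) = (None, None, None);
    while !buf.is_empty() {
        if buf.len() < 3 {
            return None;
        }
        let tag = buf[0];
        let len = u16::from_le_bytes([buf[1], buf[2]]) as usize;
        let val = buf.get(3..3 + len)?;
        match tag {
            TAG_KIND if len == 1 => kind = Some(val[0]),
            TAG_KEY => key = Some(val.to_vec()),
            TAG_TTL if len == 4 => ttl = Some(u32::from_le_bytes([val[0], val[1], val[2], val[3]])),
            TAG_KIND | TAG_TTL => return None, // known tag, wrong length
            _ => {} // unknown field from a newer peer: skip it, no error
        }
        buf = &buf[3 + len..];
    }
    Some(Frame { kind: kind?, key: key?, ttl })
}

fn xorshift(state: &mut u64) -> u64 {
    let mut x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    x
}

/// Property: decode(encode(f)) == f for random frames (keys kept under 64 bytes).
fn roundtrip_holds(seed: u64, cases: usize) -> bool {
    let mut rng = seed | 1; // xorshift state must be non-zero
    (0..cases).all(|_| {
        let r = xorshift(&mut rng);
        let key: Vec<u8> = (0..r % 64).map(|i| (r >> (i % 56)) as u8).collect();
        let ttl = if r & 1 == 0 { Some((r >> 32) as u32) } else { None };
        let f = Frame { kind: r as u8, key, ttl };
        decode(&encode(&f)) == Some(f)
    })
}
```

In this toy, skipping unknown fields means encode(decode(B)) drops them. Whether the real protocol must instead carry unknown fields through re-encoding is exactly the schema-evolution question raised below.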

Replication & repair

Property test for the placement algorithm: with N=100 nodes across geo-zones, placement must respect the cross-region constraint for RF≥2.
Repair convergence: kill a replica; anti-entropy detects the gap, and repair pushes a new copy within X seconds.
Pin enforcement: a pinned blob always has a forced replica in the owner's primary storage (LocalRelayFree + one OrgPaid/ExternalPRO).
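The placement property is a plain predicate over the placement function's output. The `place` function below is a hypothetical stand-in, not the project's algorithm: it ranks nodes by hash, takes one replica per region first, then fills the remaining slots by rank. `cross_region_holds` is the part that carries over to the real code: exactly RF distinct replicas, spanning at least two regions whenever RF ≥ 2.

```rust
use std::collections::{BTreeMap, HashSet};

/// Hypothetical node descriptor.
struct Node {
    id: u64,
    region: &'static str,
}

/// Hypothetical placement: rank nodes by a hash of (node id, blob id), take at
/// most one node per region first, then fill leftover slots by rank.
fn place(nodes: &[Node], blob: u64, rf: usize) -> Vec<u64> {
    let mut ranked: Vec<&Node> = nodes.iter().collect();
    ranked.sort_by_key(|n| (n.id ^ blob).wrapping_mul(0x9E37_79B9_7F4A_7C15));
    let mut picked = Vec::new();
    let mut used_regions = HashSet::new();
    for n in &ranked {
        if picked.len() < rf && used_regions.insert(n.region) {
            picked.push(n.id); // pass 1: one replica per distinct region
        }
    }
    for n in &ranked {
        if picked.len() < rf && !picked.contains(&n.id) {
            picked.push(n.id); // pass 2: fill remaining slots by rank
        }
    }
    picked
}

/// Property: exactly `rf` distinct replicas, spanning >= 2 regions when rf >= 2.
fn cross_region_holds(nodes: &[Node], rf: usize, blobs: u64) -> bool {
    let region_of: BTreeMap<u64, &str> = nodes.iter().map(|n| (n.id, n.region)).collect();
    (0..blobs).all(|b| {
        let replicas = place(nodes, b.wrapping_mul(0xA24B_AED4_963E_E407), rf);
        let distinct: HashSet<u64> = replicas.iter().copied().collect();
        let regions: HashSet<&str> = replicas.iter().map(|id| region_of[id]).collect();
        replicas.len() == rf && distinct.len() == rf && (rf < 2 || regions.len() >= 2)
    })
}
```

A single-region cluster cannot satisfy the constraint. The property should then fail loudly rather than pass vacuously, which is worth keeping as an explicit negative test.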

Rendezvous

Property test for the rate-limit token bucket: per-IP limits are enforced.
TTL correctness: a token expires exactly 60 seconds after issue.
Toggle correctness: a disabled identity is neither published nor resolvable by lookup.
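Rate-limit and TTL tests stay deterministic when the clock is passed in as an argument instead of read from `Instant::now()`. The sketch assumes a per-IP token bucket with a burst of 5 and a refill of 2 tokens/s; both values are hypothetical. It also reads "expires exactly after 60 seconds" as validity on the half-open interval [issued, issued + 60 s). The property bounds what any single IP can obtain: its burst plus everything refilled during the run.

```rust
use std::collections::HashMap;
use std::net::IpAddr;

/// Per-IP token bucket on an explicit clock (ms). Tokens are kept in
/// thousandths to avoid floating-point drift at refill boundaries.
struct RateLimiter {
    cap_milli: u64,
    refill_milli_per_ms: u64,             // numerically equal to tokens per second
    buckets: HashMap<IpAddr, (u64, u64)>, // ip -> (milli-tokens, last refill ms)
}

impl RateLimiter {
    fn new(capacity: u64, tokens_per_sec: u64) -> Self {
        RateLimiter { cap_milli: capacity * 1000, refill_milli_per_ms: tokens_per_sec, buckets: HashMap::new() }
    }

    fn allow(&mut self, ip: IpAddr, now_ms: u64) -> bool {
        let (cap, rate) = (self.cap_milli, self.refill_milli_per_ms);
        let (tokens, last) = self.buckets.entry(ip).or_insert((cap, now_ms));
        let refill = now_ms.saturating_sub(*last).saturating_mul(rate);
        *tokens = (*tokens).saturating_add(refill).min(cap);
        *last = now_ms;
        if *tokens >= 1000 {
            *tokens -= 1000;
            true
        } else {
            false
        }
    }
}

/// Assumed semantics: a rendezvous token is valid on [issued, issued + 60 s).
const TOKEN_TTL_MS: u64 = 60_000;

fn token_valid(issued_ms: u64, now_ms: u64) -> bool {
    now_ms >= issued_ms && now_ms - issued_ms < TOKEN_TTL_MS
}

fn xorshift(state: &mut u64) -> u64 {
    let mut x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    x
}

/// Property: no IP is ever allowed more than burst + tokens refilled over the run.
fn per_ip_limit_holds(seed: u64, requests: usize) -> bool {
    let (cap, rate) = (5, 2); // assumed limits: burst of 5, 2 tokens/s
    let mut limiter = RateLimiter::new(cap, rate);
    let ips: [IpAddr; 2] = ["10.0.0.1".parse().unwrap(), "10.0.0.2".parse().unwrap()];
    let mut rng = seed | 1; // xorshift state must be non-zero
    let (mut now, mut allowed) = (0u64, [0u64; 2]);
    for _ in 0..requests {
        now += xorshift(&mut rng) % 300; // 0-299 ms between requests
        let i = (xorshift(&mut rng) % 2) as usize;
        if limiter.allow(ips[i], now) {
            allowed[i] += 1;
        }
    }
    let bound = cap + now * rate / 1000;
    allowed.iter().all(|&a| a <= bound)
}
```

The TTL boundary tests belong at 59 999 ms and 60 000 ms. Off-by-one errors in "exactly 60 seconds" hide exactly there.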

Distributed-system-specific approaches

Deterministic simulation: the killer feature

madsim and turmoil both provide a tokio-compatible runtime with virtual time and a simulated network. A single test run is a reproducible scenario that can be replayed when a regression appears.

rust

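use std::time::Duration;

// `Simulator` is a hypothetical project-local harness on top of madsim
// (not part of the madsim API): it spawns node tasks and inspects DHT state.
#[madsim::test]
async fn test_dht_convergence_under_churn() {
    let mut sim = Simulator::new();
    for i in 0..10 {
        sim.spawn_node(format!("node-{i}")).await;
    }
    // Churn: the harness should pick the victim via madsim's seeded RNG,
    // so the same seed always kills the same node.
    sim.kill_random_node().await;
    // Under madsim, sleeping advances virtual time immediately.
    madsim::time::sleep(Duration::from_secs(60)).await;
    assert!(sim.dht_consistent().await);
}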
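A seed is enough to replay a run because the simulator owns the clock, the scheduler and the randomness. The toy below is not madsim; it only illustrates that mechanism with the standard library. Each hypothetical node sends one message to every other node with a seeded random delay. A priority queue ordered by (virtual time, sequence number) then delivers them. Running with the same seed yields an identical trace, and no real time passes.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Xorshift64 PRNG: the only source of randomness, so the seed fixes the run.
fn xorshift(state: &mut u64) -> u64 {
    let mut x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    x
}

/// Toy gossip round among `nodes` hypothetical nodes: every node sends one
/// message to every other node with a seeded random delay. Returns the
/// delivery trace as (virtual_time_ms, from, to).
fn run(seed: u64, nodes: u64) -> Vec<(u64, u64, u64)> {
    let mut rng = seed | 1; // xorshift state must be non-zero
    let mut queue = BinaryHeap::new();
    let mut seq = 0u64; // tie-breaker: equal timestamps still pop in a fixed order
    for from in 0..nodes {
        for to in 0..nodes {
            if from != to {
                let delay = 1 + xorshift(&mut rng) % 100; // virtual ms; nothing sleeps
                queue.push(Reverse((delay, seq, from, to)));
                seq += 1;
            }
        }
    }
    let mut trace = Vec::new();
    // Reverse turns the max-heap into a min-heap: earliest delivery first.
    while let Some(Reverse((time, _, from, to))) = queue.pop() {
        trace.push((time, from, to));
    }
    trace
}
```

The explicit sequence tie-breaker matters. Without it, two messages due at the same virtual millisecond could be delivered in an order that depends on heap internals or hash iteration, and replay would silently stop being exact.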

Covers:

Kademlia churn (random join/leave)
Anti-entropy convergence after a partition
Cert revocation propagation via gossip
Partition recovery (heal scenarios)
Race window for membership ops (§20.8)

Choosing between madsim and turmoil:

madsim: a deterministic simulator that hooks deep into tokio (code is built against the madsim-tokio shim instead of tokio itself). Finer control over time and I/O.
turmoil: a tokio-rs project with a simpler API and fewer capabilities.

Recommendation: madsim for the primary simulation suite; turmoil for quick smoke tests.

Jepsen-style tests (long-term)

Aphyr-style: a random workload plus a nemesis (network partitions, clock skew, process pauses). Aimed at production readiness, after v1.0.

Tooling: madsim is sufficient for v1.0, and full Jepsen is overkill for now. Revisit after the v2.0 milestone.

Cross-project integration

Mock node for kontinuum-app tests

kontinuum-node exports its test fixtures as a separate crate, kontinuum-node-mock. App tests run the mock node in-process and check the pairing, recovery, and share flows.

Mock app for node tests

The reverse direction: kontinuum-app-mock emulates client behaviour (presign requests, mailbox cursor updates) for node integration tests.
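Mocks in both directions are easiest when the app talks to the node through a trait. The sketch uses a hypothetical `NodeApi` trait (mailbox deposit, fetch, cursor update) and an in-memory `InMemoryNode` of the kind kontinuum-node-mock could export. This document does not specify the real crate's API. mockall's `#[automock]` could generate an expectation-based mock from the same trait. The hand-written fake shown here keeps real mailbox semantics, which flow tests usually need.

```rust
use std::collections::HashMap;

/// Hypothetical surface the app depends on; the real node client and the
/// mock both implement it, so app tests never need a running node.
trait NodeApi {
    fn deposit(&mut self, mailbox: &str, msg: Vec<u8>) -> u64;
    fn fetch(&self, mailbox: &str, from_seq: u64) -> Vec<(u64, Vec<u8>)>;
    fn update_cursor(&mut self, mailbox: &str, reader: &str, seq: u64);
}

/// In-process fake with real mailbox semantics.
#[derive(Default)]
struct InMemoryNode {
    mailboxes: HashMap<String, Vec<Vec<u8>>>,
    cursors: HashMap<(String, String), u64>,
}

impl NodeApi for InMemoryNode {
    fn deposit(&mut self, mailbox: &str, msg: Vec<u8>) -> u64 {
        let mb = self.mailboxes.entry(mailbox.to_string()).or_default();
        mb.push(msg);
        (mb.len() - 1) as u64 // sequence number of the new entry
    }

    fn fetch(&self, mailbox: &str, from_seq: u64) -> Vec<(u64, Vec<u8>)> {
        self.mailboxes.get(mailbox).map_or(Vec::new(), |mb| {
            mb.iter().enumerate().skip(from_seq as usize).map(|(i, m)| (i as u64, m.clone())).collect()
        })
    }

    fn update_cursor(&mut self, mailbox: &str, reader: &str, seq: u64) {
        // Cursors only move forward, as a real node is expected to enforce.
        let c = self.cursors.entry((mailbox.to_string(), reader.to_string())).or_insert(0);
        *c = (*c).max(seq);
    }
}

/// App-side logic under test, written against the trait rather than a concrete node.
fn sync_inbox<N: NodeApi>(node: &mut N, mailbox: &str, reader: &str, cursor: u64) -> (Vec<Vec<u8>>, u64) {
    let items = node.fetch(mailbox, cursor);
    let next = items.last().map_or(cursor, |(seq, _)| seq + 1);
    node.update_cursor(mailbox, reader, next);
    (items.into_iter().map(|(_, m)| m).collect(), next)
}
```

The same trait boundary serves the reverse direction. A node integration test can drive `NodeApi` calls from a scripted kontinuum-app-mock client instead of a real app.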

Shared test fixtures

kontinuum-core already uses proptest and tempfile. Extend this into kontinuum-core/tests/fixtures.rs with reusable scenarios:

sample_identity(): Ed25519 keypair + identity_id
sample_cert(tier, validity): a Tier 0-signed cert
sample_space(members): Personal/Shared Space with N invited identities
sample_blob(size): encrypted blob with manifest

CI/CD

Per-PR (fast, < 10 minutes)

Command	What it checks
`cargo fmt --check`	Formatting
`cargo clippy -- -D warnings`	Lints (already in `project.json`)
`cargo test`	Unit + property + small integration
`cargo deny check`	Licenses / advisories
`cargo audit`	CVE scan
`cargo test --test integration`	Integration tests (2-3 in-process nodes)

Nightly

Command	What it checks
`cargo fuzz run ...`	Critical parsers, ~2 hours per target
`RUSTFLAGS="--cfg madsim" cargo test --test simulation`	madsim suite: large N, long virtual time
`cargo bench`	Criterion vs. baseline (regression detection)

Pre-release

E2E Docker Compose run over a real network
Performance regression check vs. the prior release
Manual game day (see operations.md §16.1)
Security review checklist

Concrete dependencies

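In the workspace Cargo.toml. Dev-dependencies declared in the root manifest do not propagate to member crates (and a virtual workspace manifest rejects them outright), so shared versions go under [workspace.dependencies] and each member crate opts in:

toml

[workspace.dependencies]
tokio-test = "0.4"
proptest = "1"          # already used in kontinuum-core
madsim = "0.2"          # deterministic simulation
turmoil = "0.6"         # alternative to madsim
criterion = "0.5"       # benchmarks
tempfile = "3"          # already present
mockall = "0.13"        # mock objects for traits
wiremock = "0.6"        # HTTP mock (S3 / Tier 0 admin)
serial_test = "3"       # sequential tests for shared state

# In each member crate's Cargo.toml:
# [dev-dependencies]
# proptest = { workspace = true }
# madsim = { workspace = true }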

Separate test crates

kontinuum-node/
├── server/
│   ├── src/
│   └── tests/                         # integration tests
│       ├── two_node_handshake.rs
│       ├── dht_basic_flow.rs
│       └── mailbox_lifecycle.rs
├── fuzz/                              # cargo-fuzz targets
│   ├── fuzz_targets/
│   │   ├── wire_frame.rs
│   │   ├── cert_verify.rs
│   │   └── cbor_roundtrip.rs
│   └── Cargo.toml
└── benches/                           # criterion
    ├── dht_lookup.rs
    ├── mailbox_throughput.rs
    └── reencryption.rs

Priorities

Priority	Layer	When to introduce
P0	Unit + property	Continuously during development, from the first commit
P0	Integration (2-3 nodes)	After the first MVP component
P1	Multi-node simulation	After DHT + replication are implemented
P1	Fuzzing	After the wire-protocol code lands
P2	E2E Docker Compose	Before the v1.0 launch
P2	Criterion benchmarks	Continuously after v1.0
P3	Chaos game days	Quarterly after v1.0 (see `operations.md` §16.1)
P3	Jepsen-style	Long-term, post-v2.0

Coverage of edge cases from implementation-notes.md

Every known edge case in implementation-notes.md must have a corresponding test:

Edge case	Test type	File
§20.1 `pending_propagation` field	Unit (local-only state)	`tests/membership_state.rs`
§20.2 DhtPut quarantine semantics	Integration	`tests/transaction_quarantine.rs`
§20.3 Anti-entropy + quarantined ops	Simulation	`tests/sim/anti_entropy_pending.rs`
§20.4 Re-encryption vs hard delete	Integration	`tests/cert_lifecycle.rs`
§20.5 Pins on downgrade	Integration	`tests/pin_lifecycle.rs`
§20.6 Ungraceful shutdown	Simulation + chaos	`tests/sim/ungraceful_shutdown.rs`
§20.7 Quota enforcement source of truth	Integration	`tests/quota_authority.rs`
§20.8 Propagation race window	Simulation	`tests/sim/revoke_race.rs`
§20.9 Direct Space SelfLeave owner	Unit + Integration	`tests/direct_space_ownership.rs`
§20.10 Mailbox cursor housekeeping classification	Integration	`tests/mailbox_lapse.rs`
§20.11 Re-encryption read speed from cold archive	Performance benchmark	`benches/reencryption_cold_archive.rs`
§20.12 Bucket nonce migration	Integration	`tests/recovery_with_external_bucket.rs`

Quality metrics

Metric	Target for v1.0
Line coverage (unit + integration)	≥ 80%
Branch coverage on critical paths	≥ 90% (crypto, cert, GC)
Fuzz corpus minimum runtime	24 h without crashes per target
Simulation: convergence time under churn	< 2 min for N=100 nodes
Benchmark regression threshold	< 10% slowdown vs. baseline

Open questions for testing

Multi-tier integration tests: do integration tests need real Tier 0 anchor instances, or is a mock cert issuer sufficient? Recommendation: mock for CI, real for pre-release.
Chaos engineering in production: after v1.0, do we enable continuous chaos (Netflix-style daily injection) or stay with quarterly game days only? To be decided before launch.
Cross-version compatibility matrix: what backward-compatibility horizon do we support (N-1, N-2)? This is tied to CBOR schema evolution.
