Kontinuum Node — Testing Strategy
Testing strategy for a distributed P2P system with E2E crypto, cert lifecycle, replication, and anti-entropy.
Audience: node developers (starting the implementation), QA, CI/CD setup.
Related documents:
- architecture.md: overview / glossary / tier model
- protocols.md: the wire-level mechanics under test
- operations.md: §16.1 DR / chaos planning (production-level)
- implementation-notes.md: known edge cases on which to focus test coverage
Overview
The standard Rust test pyramid (unit → integration) is not enough for our use case: a distributed system, eventually consistent CRDTs, multi-tier trust, and security-critical crypto. We extend it to 9 layers, each with specific tooling.
| Layer | What it covers | Tooling | Isolation level |
|---|---|---|---|
| 1. Unit | Pure functions: crypto, CBOR roundtrip, Merkle, GC policy, quota math, bucket naming | cargo test (#[test]) | Single module |
| 2. Property-based | Invariants under random inputs: DHT routing, mailbox GC, replication math, CBOR identity | proptest | Single function/module |
| 3. Integration (in-proc) | Multi-actor flows: Handshake, DhtPut→DhtGet, replica push/pull, mailbox deposit/retrieve | cargo test --test ... + tokio | N nodes in one process |
| 4. Multi-node simulation | Network behavior at N=10..1000: convergence, replication, churn, partitions | madsim or turmoil (deterministic) | Single process, virtual time |
| 5. Chaos / fault injection | Latency, drops, partitions, clock skew, node kills | madsim or tc/iptables in Docker | Docker compose / madsim |
| 6. Fuzzing | Security-critical parsers: CBOR, cert validation, signature verify, auth-shim | cargo-fuzz (libfuzzer-sys) | Single binary |
| 7. End-to-end | Real binaries: kontinuum-node + kontinuum-app, real network, real S3 | Docker compose + scripted scenarios | Cluster |
| 8. Performance benchmarks | Hot paths: DHT lookup latency, mailbox write rate, re-encryption throughput | criterion | Single process |
| 9. Game days / chaos | Production-scale incident simulation (see operations.md §16.1) | Manual scripts + monitoring | Production / staging |
Coverage by component
DHT (Kademlia, anti-entropy)
- Property tests for routing-table invariants: after N random insertions, the k-closest result is always correct.
- Simulation (madsim/turmoil) for convergence: N nodes start, and after X seconds the DHT is consistent.
- Anti-entropy correctness: atomic transactions (Revoke + KeyRotation) must eventually converge even under message reordering.
- Eclipse resistance: simulate a cluster of malicious nodes and verify that disjoint-path lookups still work.
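The k-closest invariant from the first bullet can be sketched as follows. This is a minimal illustration, not the project's routing table: IDs are shortened to `u64`, the brute-force `k_closest` serves as the oracle, and a small xorshift generator stands in for proptest so that the block has no dependencies. All names here are hypothetical.

```rust
/// Node IDs are shortened to u64 for illustration; a real Kademlia ID is wider.
type NodeId = u64;

/// Kademlia's XOR metric.
fn xor_distance(a: NodeId, b: NodeId) -> u64 {
    a ^ b
}

/// Reference oracle: brute-force k-closest by sorting every known ID.
fn k_closest(known: &[NodeId], target: NodeId, k: usize) -> Vec<NodeId> {
    let mut ids: Vec<NodeId> = known.to_vec();
    ids.sort_by_key(|id| xor_distance(*id, target));
    ids.dedup(); // equal IDs have equal distance, so duplicates are adjacent
    ids.truncate(k);
    ids
}

/// Tiny xorshift generator so the sketch needs no external crates.
/// In the real suite proptest would generate and shrink these inputs.
fn xorshift(state: &mut u64) -> u64 {
    *state ^= *state << 13;
    *state ^= *state >> 7;
    *state ^= *state << 17;
    *state
}

/// Invariant: no node outside the result is strictly closer than a node inside it.
fn check_k_closest_invariant(seed: u64, insertions: usize, k: usize) -> bool {
    let mut state = seed.max(1); // xorshift must not start at zero
    let known: Vec<NodeId> = (0..insertions).map(|_| xorshift(&mut state)).collect();
    let target = xorshift(&mut state);
    let result = k_closest(&known, target, k);
    let worst_inside = result.iter().map(|id| xor_distance(*id, target)).max();
    known.iter().all(|id| {
        result.contains(id) || Some(xor_distance(*id, target)) >= worst_inside
    })
}
```

In a real property test the same predicate would be checked against the routing table's own lookup rather than against the oracle itself.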
Particularly important scenarios:
- Churn at scale (50% node turnover per hour): the DHT stays consistent.
- Partition heal: no split brain occurs, thanks to signed LWW.
- Transaction quarantine: an op from a partial transaction is not applied until the whole group has arrived (see implementation-notes.md §20.1, §20.3).
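The partition-heal scenario relies on the LWW merge being independent of delivery order. A minimal sketch of that property, under assumptions: the `Lww` struct and its fields are hypothetical, `signer` stands in for the signing identity and is used only as a tie-break, signature verification is out of scope, and a signer is assumed never to issue two different values at the same timestamp.

```rust
/// A last-writer-wins register. `signer` stands in for the signing identity and is
/// only used as a deterministic tie-break; signature checks are out of scope here.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Lww {
    timestamp: u64,
    signer: u32,
    value: u32,
}

/// Merge keeps the entry with the greater (timestamp, signer) pair.
fn merge(a: Lww, b: Lww) -> Lww {
    if (b.timestamp, b.signer) > (a.timestamp, a.signer) {
        b
    } else {
        a
    }
}

/// Applies updates in the given order, as one side of a healed partition would.
/// Returns None when there are no updates.
fn replay(updates: &[Lww]) -> Option<Lww> {
    updates.iter().copied().reduce(merge)
}
```

A test can then replay the same updates in every order and assert that all replicas end on the same value, which is the "no split brain" condition in miniature.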
Storage layer (rustfs + auth-shim)
- Integration against a real rustfs subprocess.
- Mock S3 for quota tests: LocalStack or MinIO.
- Auth-shim fuzz: random signature inputs must not crash it.
- Quota enforcement under concurrent writes (race conditions).
- Redirect mode (operations.md §13.6): blob lookups after cert lapse go to the external bucket, and the peer follows the 307.
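For the concurrent-writes bullet, the race being tested is a check-then-update on the used-bytes counter. The sketch below shows one way to make the reservation atomic and to hammer it from several threads. `Quota`, `try_reserve` and `hammer` are illustrative names, not the auth-shim's API, and the real quota source of truth may live elsewhere.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Byte quota with an atomic reservation. The name and shape are illustrative.
struct Quota {
    limit: u64,
    used: AtomicU64,
}

impl Quota {
    fn new(limit: u64) -> Self {
        Quota { limit, used: AtomicU64::new(0) }
    }

    /// Reserves `bytes` only if the total stays within the limit.
    /// The check and the update happen in one atomic step, so two writers
    /// cannot both pass the check against the same stale value.
    fn try_reserve(&self, bytes: u64) -> bool {
        self.used
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |used| {
                used.checked_add(bytes).filter(|total| *total <= self.limit)
            })
            .is_ok()
    }

    fn used(&self) -> u64 {
        self.used.load(Ordering::Acquire)
    }
}

/// Spawns `writers` threads that each attempt `attempts` reservations of `bytes`.
/// Returns how many reservations were accepted in total.
fn hammer(quota: Arc<Quota>, writers: usize, attempts: usize, bytes: u64) -> u64 {
    let handles: Vec<_> = (0..writers)
        .map(|_| {
            let quota = Arc::clone(&quota);
            thread::spawn(move || {
                (0..attempts).filter(|_| quota.try_reserve(bytes)).count() as u64
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}
```

The assertion of interest is that the accepted total never exceeds the limit regardless of thread interleaving.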
Mailbox / GC
- Property tests for GC policy: for random sequences of (deposit, cursor-update, expire), entries are deleted only once all live_cursors have passed them and the TTL has expired.
- Atomic Revoke + KeyRotation: the receiving node applies either both ops or neither (under message loss).
- Free-tier scenario: sender → offline receiver → 30-day TTL → drop.
- Adaptive quorum for a Direct Space (2 members): kick works even when the friend is offline (implementation-notes.md §20.2).
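The GC rule from the first bullet reduces to a conjunction of two conditions, which is what the property test would assert on every step of a random sequence. A minimal sketch, assuming that a cursor value greater than an entry's sequence number means the cursor has passed it; the `Entry` fields and function names are hypothetical.

```rust
/// One mailbox entry, reduced to what the GC rule needs. Field names are illustrative.
struct Entry {
    seq: u64,        // position in the mailbox log
    expires_at: u64, // absolute expiry time in seconds
}

/// An entry may be collected only when BOTH conditions hold:
/// every live cursor has moved past it, and its TTL has expired.
fn collectable(entry: &Entry, live_cursors: &[u64], now: u64) -> bool {
    let all_passed = live_cursors.iter().all(|cursor| *cursor > entry.seq);
    let ttl_expired = now >= entry.expires_at;
    all_passed && ttl_expired
}

/// Runs one GC pass and returns the surviving entries in their original order.
fn gc(entries: Vec<Entry>, live_cursors: &[u64], now: u64) -> Vec<Entry> {
    entries
        .into_iter()
        .filter(|e| !collectable(e, live_cursors, now))
        .collect()
}
```

Note that with no live cursors at all the first condition is vacuously true, so only the TTL decides; whether that matches the intended policy is a question for the GC spec, not something this sketch settles.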
Cert lifecycle
- Integration: cert valid → expire → CRL propagation → node read-only freeze → cold archive → tombstone → hard delete.
- Multi-sig (v1.0+): 3-of-5 signatures aggregate; partial signatures are not sufficient.
- Revocation race: gossip propagation under high latency (race window for downloads, see §20.8).
- Lapse timeline interaction: the re-encryption job keeps running during read-only freeze (§5.2.3, §20.4).
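For the 3-of-5 bullet, one easy-to-miss test case is a single signer repeated three times. The sketch below shows a threshold check that counts distinct authorized signers. It assumes signatures were already verified upstream and says nothing about how the real scheme aggregates them; `VerifiedSig` and `quorum_met` are hypothetical names.

```rust
use std::collections::HashSet;

/// A signature already verified against its signer's key.
/// Cryptographic verification itself is assumed to happen upstream.
struct VerifiedSig {
    signer: u8, // index of the signer within the authorized set
}

/// Returns true when at least `threshold` DISTINCT authorized signers are present.
/// Counting raw signatures would let one signer satisfy the quorum by repeating itself.
fn quorum_met(sigs: &[VerifiedSig], authorized: &HashSet<u8>, threshold: usize) -> bool {
    let distinct: HashSet<u8> = sigs
        .iter()
        .map(|s| s.signer)
        .filter(|signer| authorized.contains(signer))
        .collect();
    distinct.len() >= threshold
}
```

Useful negative cases are two valid signers, three copies of one signer, and two valid signers plus one outside the authorized set.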
Crypto / security-critical
- Fuzzing: wire frame parsers, CBOR, cert verification, signature checks.
- Test vectors: fixed input/output pairs for Ed25519, blake3, ChaCha20Poly1305. These prevent regressions when crypto libraries are upgraded.
- Forward secrecy: after KeyRotation, the old space_key cannot decrypt new ciphertext.
- Auth-shim uniformity: the node owner's admin API has no escape hatch around the membership check (§20.7).
Inter-node protocol
- Wire roundtrip property: encode(decode(X)) == X for all frame types.
- Version compat: CBOR schema evolution, optional fields preserved, unknown fields skipped without error.
- Inter-node integration: two nodes over localhost, checking handshake + DhtPut + replication + anti-entropy gossip.
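The roundtrip and unknown-field rules can be illustrated on a toy codec. This sketch deliberately does not use CBOR: it is a hand-rolled tag-length-value layout chosen only so the example runs without dependencies, and `Frame`, the tag constants and the one-byte length limit are all assumptions made for illustration.

```rust
/// Stand-in for a wire frame. The real protocol uses CBOR; this sketch uses a
/// hand-rolled tag-length-value layout purely so it runs without dependencies.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Frame {
    kind: u8,
    payload: Vec<u8>,
}

const TAG_KIND: u8 = 1;
const TAG_PAYLOAD: u8 = 2;

fn put_field(out: &mut Vec<u8>, tag: u8, value: &[u8]) {
    out.push(tag);
    out.push(value.len() as u8); // sketch limit: values of at most 255 bytes
    out.extend_from_slice(value);
}

fn encode(frame: &Frame) -> Vec<u8> {
    let mut out = Vec::new();
    put_field(&mut out, TAG_KIND, &[frame.kind]);
    put_field(&mut out, TAG_PAYLOAD, &frame.payload);
    out
}

/// Decodes known tags and skips unknown ones instead of failing,
/// which is what lets an older node read a newer sender's frames.
fn decode(mut bytes: &[u8]) -> Option<Frame> {
    let (mut kind, mut payload) = (None, None);
    while let [tag, len, rest @ ..] = bytes {
        let len = *len as usize;
        if rest.len() < len {
            return None; // truncated field
        }
        let (value, tail) = rest.split_at(len);
        match *tag {
            TAG_KIND if len == 1 => kind = Some(value[0]),
            TAG_PAYLOAD => payload = Some(value.to_vec()),
            TAG_KIND => return None, // malformed known field
            _ => {}                  // unknown field: skip
        }
        bytes = tail;
    }
    if !bytes.is_empty() {
        return None; // dangling tag without a length byte
    }
    Some(Frame { kind: kind?, payload: payload? })
}
```

The same two assertions carry over to the real CBOR frames: a value survives encode then decode, and appending a field the decoder does not know leaves the decoded value unchanged.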
Replication & repair
- Property test for the placement algorithm: with N=100 nodes across geo-zones, placement must respect the cross-region constraint for RF≥2.
- Repair convergence: kill a replica, anti-entropy detects the gap, and repair pushes a new copy within X seconds.
- Pin enforcement: a pinned blob always has a forced replica in the owner's primary storage (LocalRelayFree + one OrgPaid/ExternalPRO).
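The cross-region constraint in the first bullet can be stated as a checkable postcondition: for RF≥2 the chosen replicas span at least two regions. The sketch below pairs a naive placement function with that check. It is not the project's placement algorithm; `Node`, `place` and the "swap the last pick" strategy are assumptions made to have something concrete to test.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, PartialEq, Eq)]
struct Node {
    id: u32,
    region: &'static str,
}

fn distinct_regions(nodes: &[Node]) -> usize {
    nodes.iter().map(|n| n.region).collect::<HashSet<_>>().len()
}

/// Picks `rf` replicas from `candidates` (assumed pre-sorted by preference).
/// For rf >= 2 the result must span at least two regions; returns None when
/// the candidate set cannot satisfy that.
fn place(candidates: &[Node], rf: usize) -> Option<Vec<Node>> {
    if rf == 0 || candidates.len() < rf {
        return None;
    }
    let mut chosen: Vec<Node> = candidates[..rf].to_vec();
    if rf >= 2 && distinct_regions(&chosen) < 2 {
        // Swap the least-preferred pick for the best candidate from another region.
        let home = chosen[0].region;
        let other = candidates[rf..].iter().find(|n| n.region != home)?;
        chosen[rf - 1] = other.clone();
    }
    Some(chosen)
}
```

A property test would generate random node sets and replication factors and assert `distinct_regions(&picked) >= 2` whenever `place` returns a result for RF≥2.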
Rendezvous
- Property test for the rate-limit token bucket: per-IP limits are enforced.
- TTL correctness: a token expires after exactly 60 seconds.
- Toggle correctness: a disabled identity is neither published nor returned by lookups.
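Rate-limit tests become deterministic when the limiter takes the current time as an argument instead of reading the system clock. A minimal per-IP token bucket along those lines is sketched below. The struct names and the `capacity` / `refill_per_sec` parameters are illustrative and do not reflect the rendezvous service's actual limits.

```rust
use std::collections::HashMap;
use std::net::IpAddr;

/// Token bucket driven by an explicit `now_ms` so tests control time.
struct Bucket {
    tokens: f64,
    last_refill_ms: u64,
}

/// Per-IP limiter. `capacity` and `refill_per_sec` are illustrative parameters.
struct RateLimiter {
    capacity: f64,
    refill_per_sec: f64,
    buckets: HashMap<IpAddr, Bucket>,
}

impl RateLimiter {
    fn new(capacity: u32, refill_per_sec: u32) -> Self {
        RateLimiter {
            capacity: capacity as f64,
            refill_per_sec: refill_per_sec as f64,
            buckets: HashMap::new(),
        }
    }

    /// Returns true if a request from `ip` at time `now_ms` is allowed.
    fn allow(&mut self, ip: IpAddr, now_ms: u64) -> bool {
        let (capacity, rate) = (self.capacity, self.refill_per_sec);
        // A new IP starts with a full bucket.
        let bucket = self
            .buckets
            .entry(ip)
            .or_insert(Bucket { tokens: capacity, last_refill_ms: now_ms });
        // Refill for the elapsed time, capped at the burst capacity.
        let elapsed_s = now_ms.saturating_sub(bucket.last_refill_ms) as f64 / 1000.0;
        bucket.tokens = (bucket.tokens + elapsed_s * rate).min(capacity);
        bucket.last_refill_ms = now_ms;
        if bucket.tokens >= 1.0 {
            bucket.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```

The same explicit-clock approach applies to the 60-second token TTL: pass the time in, and the boundary at exactly 60 seconds can be asserted without sleeping.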
Distributed-system-specific approaches
Deterministic simulation: the killer feature
madsim or turmoil provide a tokio-compatible runtime with virtual time and a simulated network. One test run is one reproducible scenario, which can be replayed when a regression appears.
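    use std::time::Duration;

    // Sketch: `Simulator` stands for a project-level harness
    // (spawn_node, kill_random_node, ...) that is assumed here, not shown.
    #[madsim::test]
    async fn test_dht_convergence_under_churn() {
        let mut sim = Simulator::new();
        // Start a 10-node network.
        for i in 0..10 {
            sim.spawn_node(format!("node-{i}")).await;
        }
        // Inject churn, then let 60 s of virtual time pass.
        sim.kill_random_node().await;
        sim.advance_time(Duration::from_secs(60)).await;
        assert!(sim.dht_consistent().await);
    }

Covers: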
- Kademlia churn (random join/leave)
- Anti-entropy convergence after a partition
- Cert revocation propagation through gossip
- Partition recovery (heal scenarios)
- Race window for membership ops (§20.8)
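Why a seeded simulation can be replayed exactly is easiest to see on a toy event loop. The sketch below is not the madsim or turmoil API. It is a hypothetical `Sim` with one seeded random source and a virtual clock, shown only to illustrate that the same seed yields the same delivery trace.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Minimal virtual-time event loop. This is NOT the madsim or turmoil API;
/// it only illustrates why a seeded simulation can be replayed exactly.
struct Sim {
    now_ms: u64,
    rng: u64,
    seq: u64,
    queue: BinaryHeap<Reverse<(u64, u64, u32)>>, // (deliver_at, sequence, node)
}

impl Sim {
    fn new(seed: u64) -> Self {
        Sim { now_ms: 0, rng: seed.max(1), seq: 0, queue: BinaryHeap::new() }
    }

    /// xorshift step: the only source of randomness, fully determined by the seed.
    fn next_random(&mut self) -> u64 {
        self.rng ^= self.rng << 13;
        self.rng ^= self.rng >> 7;
        self.rng ^= self.rng << 17;
        self.rng
    }

    /// Schedules a message to `node` with a pseudo-random latency of 1..=100 ms.
    fn send(&mut self, node: u32) {
        let latency = 1 + self.next_random() % 100;
        self.queue.push(Reverse((self.now_ms + latency, self.seq, node)));
        self.seq += 1; // sequence number breaks ties between equal delivery times
    }

    /// Drains the queue, jumping virtual time to each delivery. Returns the trace.
    fn run(&mut self) -> Vec<(u64, u32)> {
        let mut trace = Vec::new();
        while let Some(Reverse((at, _, node))) = self.queue.pop() {
            self.now_ms = at;
            trace.push((at, node));
        }
        trace
    }
}

/// Sends one message to each of ten nodes and returns the delivery trace.
fn trace_for(seed: u64) -> Vec<(u64, u32)> {
    let mut sim = Sim::new(seed);
    for node in 0..10 {
        sim.send(node);
    }
    sim.run()
}
```

Logging the seed of a failing run is therefore enough to reproduce it, provided nothing in the test reads wall-clock time or an unseeded random source.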
Choosing between madsim and turmoil:
- madsim: Madhouse-style simulator with deep hooks into tokio. Better control over time and I/O.
- turmoil: Tokio Loom-style, simple API, fewer capabilities.
Recommendation: madsim for the primary simulation suite; turmoil for quick smoke tests.
Jepsen-style tests (long-term)
Aphyr-style: random workload + nemesis (network partitions, clock skew, process pauses). Intended for production readiness, after v1.0.
Tooling: madsim is sufficient for v1.0; full Jepsen is overkill for now. Revisit after the v2.0 milestone.
Cross-project integration
Mock node for kontinuum-app tests
kontinuum-node exports its test fixtures as a separate crate, kontinuum-node-mock. App tests run the mock node in-process and check the pairing, recovery, and share flows.
Mock app for node tests
The reverse: kontinuum-app-mock models client behaviour (presign requests, mailbox cursor updates) for node integration tests.
Shared test fixtures
kontinuum-core already has proptest + tempfile. Extend this to kontinuum-core/tests/fixtures.rs with reusable scenarios:
- sample_identity(): Ed25519 keypair + identity_id
- sample_cert(tier, validity): Tier 0 signed cert
- sample_space(members): Personal/Shared Space with N invited identities
- sample_blob(size): encrypted blob with manifest
CI/CD
Per-PR (fast, < 10 minutes)
| Command | What it checks |
|---|---|
| cargo fmt --check | Formatting |
| cargo clippy -- -D warnings | Lints (already in project.json) |
| cargo test | Unit + property + small integration |
| cargo deny check | License / advisory |
| cargo audit | CVE scan |
| cargo test --test integration | Integration tests (2-3 in-process nodes) |
Nightly
| Command | What it checks |
|---|---|
| cargo fuzz run ... | Critical parsers, ~2 hours per target |
| cargo test --test simulation | Madsim suite: large N, long virtual time |
| cargo bench | Criterion vs baseline (regression detection) |
Pre-release
- E2E Docker compose over a real network
- Performance regression check vs prior release
- Manual game day (see operations.md §16.1)
- Security review checklist
Concrete dependencies
In the workspace Cargo.toml:

    [dev-dependencies]
    tokio-test = "0.4"
    proptest = "1"        # already in kontinuum-core
    madsim = "0.2"        # deterministic sim
    turmoil = "0.6"       # alternative to madsim
    criterion = "0.5"     # benchmarks
    tempfile = "3"        # already present
    mockall = "0.13"      # mock objects for traits
    wiremock = "0.6"      # HTTP mock (for S3 / Tier 0 admin)
    serial_test = "3"     # sequential tests for shared state

Separate test crates
    kontinuum-node/
    ├── server/
    │   ├── src/
    │   └── tests/                  # integration tests
    │       ├── two_node_handshake.rs
    │       ├── dht_basic_flow.rs
    │       └── mailbox_lifecycle.rs
    ├── fuzz/                       # cargo-fuzz targets
    │   ├── fuzz_targets/
    │   │   ├── wire_frame.rs
    │   │   ├── cert_verify.rs
    │   │   └── cbor_roundtrip.rs
    │   └── Cargo.toml
    └── benches/                    # criterion
        ├── dht_lookup.rs
        ├── mailbox_throughput.rs
        └── reencryption.rs

Priorities
| Priority | Layer | When to introduce |
|---|---|---|
| P0 | Unit + property | Continuously during development, from the first commit |
| P0 | Integration (2-3 nodes) | After the first MVP component |
| P1 | Multi-node simulation | After DHT + replication are implemented |
| P1 | Fuzzing | After the wire-protocol code lands |
| P2 | E2E Docker compose | Before the v1.0 launch |
| P2 | Criterion benchmarks | Continuous after v1.0 |
| P3 | Chaos game days | Quarterly after v1.0 (see operations.md §16.1) |
| P3 | Jepsen-style | Long-term, post-v2.0 |
Coverage of edge cases from implementation-notes.md
Every known edge case from implementation-notes.md must have a corresponding test:
| Edge case | Test type | File |
|---|---|---|
| §20.1 pending_propagation field | Unit (local-only state) | tests/membership_state.rs |
| §20.2 DhtPut quarantine semantics | Integration | tests/transaction_quarantine.rs |
| §20.3 Anti-entropy + quarantined ops | Simulation | tests/sim/anti_entropy_pending.rs |
| §20.4 Re-encryption vs hard delete | Integration | tests/cert_lifecycle.rs |
| §20.5 Pins on downgrade | Integration | tests/pin_lifecycle.rs |
| §20.6 Ungraceful shutdown | Simulation + chaos | tests/sim/ungraceful_shutdown.rs |
| §20.7 Quota enforcement source-of-truth | Integration | tests/quota_authority.rs |
| §20.8 Propagation race window | Simulation | tests/sim/revoke_race.rs |
| §20.9 Direct Space SelfLeave owner | Unit + Integration | tests/direct_space_ownership.rs |
| §20.10 Mailbox cursor housekeeping classification | Integration | tests/mailbox_lapse.rs |
| §20.11 Re-encryption read speed cold archive | Performance benchmark | benches/reencryption_cold_archive.rs |
| §20.12 Bucket nonce migration | Integration | tests/recovery_with_external_bucket.rs |
Quality metrics
| Metric | Target for v1.0 |
|---|---|
| Line coverage (unit + integration) | ≥ 80% |
| Branch coverage critical paths | ≥ 90% (crypto, cert, GC) |
| Fuzz corpus minimum runtime | 24h without crashes per target |
| Simulation: convergence time under churn | < 2 min for N=100 nodes |
| Benchmark regression threshold | < 10% slowdown vs baseline |
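The benchmark threshold in the last row is a relative comparison, which a CI gate could compute as sketched below. This is an illustration of the arithmetic only: the function names are hypothetical, and how baseline numbers are stored or extracted from criterion output is left open.

```rust
/// Relative slowdown of `current_ns` versus `baseline_ns` (nanoseconds per iteration).
/// Positive means slower. Returns None for a non-positive baseline.
fn slowdown(baseline_ns: f64, current_ns: f64) -> Option<f64> {
    if baseline_ns <= 0.0 {
        return None;
    }
    Some((current_ns - baseline_ns) / baseline_ns)
}

/// The gate passes only when the slowdown is below 10%, matching the "< 10%" target.
/// A missing or invalid baseline fails the gate rather than passing silently.
fn passes_gate(baseline_ns: f64, current_ns: f64) -> bool {
    matches!(slowdown(baseline_ns, current_ns), Some(s) if s < 0.10)
}
```

For example, a baseline of 1000 ns and a current run of 1050 ns is a 5% slowdown and passes, while 1150 ns is 15% and fails.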
Open questions for testing
- Multi-tier integration tests: do integration tests need real Tier 0 anchor instances, or is a mock cert issuer enough? Recommendation: mock for CI, real for pre-release.
- Chaos engineering in production: after v1.0, do we enable continuous chaos (Netflix-style daily injection) or stay quarterly only? To be decided before launch.
- Cross-version compatibility matrix: what is the backward-compatibility horizon (N-1, N-2)? This is tied to CBOR schema evolution.