Internal review

Task 2 — Key derivation review

Argon2id presets, HKDF info-string domain separation, ML-KEM-1024 keypair derivation, and master-secret handling.

Task 2 — Key derivation review

Date: 2026-05-11 Reviewer: Cho García Scope: crypto/spec/key-derivation.md, the @shieldfive/crypto KDF/HKDF/HMAC/identity/suite modules (vendored at 1.0.0-alpha.3, repo HEAD a5e99ac), and the web-side key handling (web/utils/keyring.ts, web/utils/keyCrypto.ts, web/utils/folderKeyCache.ts, web/utils/vaultKeyClient.ts, web/utils/metadataClient.ts, web/utils/sharePassword.ts, web/utils/keyHierarchyMoves.ts, web/utils/wrappingKeyStore.ts, web HEAD bfa41a57). Methodology: key-hierarchy reconstruction from spec and code, then spec-vs-implementation audit. Every hkdfSha256 call site in the library was grepped and tabulated (Step 3 below). The web app's parallel — and partly divergent — key tree was reconstructed from the keyring/vaultKeyClient/metadataClient call chains used by the encryption and folder-tree flows. Master-key in-memory lifecycle and session/auth boundaries were treated as out of crypto-layer scope.

Summary

The crypto library's per-file key derivation is sound: every chunk key, nonce prefix, and header MAC key is HKDF-SHA-256 with a distinct, purpose-named info string, file_id used as salt for content-key-keyed derivations, and content material that is fresh per file. The ML-KEM-1024 keypair is deterministically derivable from a 32-byte master secret via HKDF, which is the right tradeoff for backup ergonomics.

The integration story is weaker. The crypto spec describes a master_secret → envelope_key / metadata_key HKDF hierarchy in spec/key-derivation.md that the web app does not implement. The web instead uses a flat scheme: a 32-byte random "root key" (rk) generated at signup, stored on the server wrapped under (a) a PBKDF2-SHA-256 key derived from the user's password and (b) a 32-byte random recovery key. That same rk is then used directly in four distinct contexts (AES-GCM wrapping of folder keys, Argon2id input for filename-metadata keys, SHA-256 prefix-hash for share passwords, HMAC key for searchable name hashes) without HKDF domain separation. The crypto library's Argon2id implementation, which the spec implies for password stretching, is not consumed by the web app.

Highest-impact findings:

  • 2.1 — web app uses PBKDF2-SHA-256 (600 000 iterations) for password-wrapped vault-key derivation; the crypto library ships an Argon2id implementation (libsodium MODERATE: 3 ops, 256 MiB) that the web does not use. PBKDF2 meets OWASP 2023's minimum for that primitive but is materially weaker per dollar of attacker GPU/ASIC budget than Argon2id. Database-only decryption is still not possible, but offline password brute force becomes the cheapest path to root-key recovery.
  • 2.2 — the web app implements a flat envelope scheme rather than the spec's HKDF(master, "...envelope-key") / HKDF(master, "...metadata-key") hierarchy. rk is used directly as an AES-GCM wrapping key for folder keys, as the input to per-file Argon2id metadata-key derivation, as a hashed share-password seed, and as a searchable-hash MAC key. No concrete attack is enabled — the four consumers use different operations whose outputs are mutually computationally independent — but the spec's design intent (HKDF domain separation per purpose) is not honored, and the documented derivation tree does not describe the system that ships.
  • 2.4 — the spec's ml_kem_seed = HKDF(..., L=64) → ML-KEM-1024.KeyGen(seed) is silent on the (d, z) split convention. Test vectors pin one particular interpretation, but the spec text alone is not enough to reproduce the same keypair against a library that exposes KeyGen(d, z) rather than KeyGen(seed). Phase-3 prerequisite.

No Critical findings. Database-alone decryption is not possible, password alone fully recovers the root key (as designed), the recovery key has 256 bits of entropy, and Argon2id parameters in the library are well above OWASP minima. The findings are integration-layer hardening and spec/implementation alignment, not breaks.

Key hierarchy

The diagram below merges (a) the crypto spec's documented hierarchy with (b) what the web app actually does. Differences are flagged with (spec only — unimplemented) or (web only — not in spec).

┌─────────────────────────────────────────────────────────────────────┐
│  Password (user input, low entropy)                                 │
│        │                                                            │
│        │  PBKDF2-SHA-256, 600 000 iters, 16B random salt            │
│        │  (web/utils/vaultKeyClient.ts:79-91)                       │
│        │                                                            │
│        ▼                                                            │
│   uk (256-bit AES key, ephemeral)        ─── wraps ───►  rkWrappedByUk    │
│                                                                     │
│  ────────────────────────────────────────────────────────────────   │
│                                                                     │
│  Recovery key (32 random bytes, generateVaultKey, shown once)       │
│        │                                                            │
│        ▼                                                            │
│   rec (256-bit AES key)                  ─── wraps ───►  rkWrappedByRec   │
│                                                                     │
│  ────────────────────────────────────────────────────────────────   │
│                                                                     │
│  rk = "root key" (32 random bytes, generated client-side at signup) │
│      In spec language: this IS the master_secret.                   │
│      In web language: this is also the envelope_key, also the       │
│      metadata seed, also the share-password seed.                   │
│        │                                                            │
│        ├──(spec only — unimplemented)─►                             │
│        │       envelope_key  = HKDF(rk, "shieldfive/v1/envelope-key")│
│        │       metadata_key  = HKDF(rk, "shieldfive/v1/metadata-key")│
│        │                                                            │
│        ├──(web) AES-GCM wrap, IV random 12B, no AAD ───►            │
│        │       top-level folder_key → folders.fk_wrapped/fk_iv     │
│        │       (web/utils/keyCrypto.ts:52-68)                       │
│        │                                                            │
│        ├──(web) Argon2id(rk, 16B random salt per blob, INTERACTIVE) │
│        │   = 32B AES-GCM key for filename metadata                  │
│        │       (web/utils/metadataClient.ts:259-313)                │
│        │                                                            │
│        ├──(web) SHA-256("shieldfive:auto-share:v1:" || rk_b64)      │
│        │   truncated to 24B ASCII → auto-share password input to    │
│        │   wrapVaultKey (PBKDF2) for csk_pw_wrapped                 │
│        │       (web/utils/sharePassword.ts:18-32)                   │
│        │                                                            │
│        └──(web) HMAC-SHA-256 keyed by SHA-256(rk) →                 │
│               searchable folder/file name hash                      │
│               (web/utils/metadataClient.ts:134-145, 660-669)        │
│                                                                     │
│  ────────────────────────────────────────────────────────────────   │
│                                                                     │
│  folder_key (32 random bytes per folder, generated client-side)     │
│        │                                                            │
│        ├──(web) wrapped by parent folder_key (or rk for top-level) │
│        │       AES-GCM, random IV → folders.fk_wrapped/fk_iv       │
│        │                                                            │
│        ├──(web) AES-GCM wrap → child folder_key, file CSK          │
│        │       (no AAD binding to folder_id)                        │
│        │                                                            │
│        └──(web) input to Argon2id per-file metadata key derivation  │
│                                                                     │
│  ────────────────────────────────────────────────────────────────   │
│                                                                     │
│  CSK = content_key (32 random bytes per file, generateKeyB64)       │
│        │                                                            │
│        ├──(web) wrapped by parent folder_key                        │
│        │       AES-GCM, random IV → files.csk_wrapped/csk_iv        │
│        │                                                            │
│        ├──(web, optional, when shared) wrapped by PBKDF2(share pwd) │
│        │       → files.csk_pw_wrapped + csk_pw_{salt,iv,iterations} │
│        │                                                            │
│        └──(crypto lib) passed as contentKey into the AES-GCM-v1     │
│                        suite. Per-file derivations below.           │
│                                                                     │
│  ────────────────────────────────────────────────────────────────   │
│                                                                     │
│  file_id (16 random bytes per file)                                 │
│  content_key = CSK above                                            │
│                                                                     │
│  header_mac_key = HKDF(content_key, salt=file_id,                  │
│                        info="shieldfive/v1/header-mac", L=32)       │
│                                                                     │
│  chunk_key     = HKDF(content_key, salt=file_id,                   │
│                       info="shieldfive/v1/<suite>/chunk-key", L=32) │
│                                                                     │
│  nonce_prefix  = HKDF(file_id,    salt=absent (32 zeros),           │
│                       info="shieldfive/v1/<suite>/nonce-prefix",    │
│                       L = 4 (AES-GCM) or 16 (XChaCha))              │
│                                                                     │
│  ────────────────────────────────────────────────────────────────   │
│                                                                     │
│  (Phase 3, PQ-hybrid; not yet wired into the web app)               │
│                                                                     │
│  ml_kem_seed  = HKDF(rk, salt=absent (32 zeros),                    │
│                       info="shieldfive/v1/pq-hybrid/ml-kem-1024-seed",│
│                       L=64)                                         │
│  (pk_pq, sk_pq) = ML-KEM-1024.KeyGen(ml_kem_seed)                   │
│       where the noble library splits seed[0..32]=d, seed[32..64]=z  │
│       (Finding 2.4 — convention not in spec)                        │
│                                                                     │
│  K (combined)  = HKDF(classical_share || pq_share, salt=file_id,    │
│                       info="shieldfive/v1/pq-hybrid/combine", L=32) │
│                                                                     │
│  PQ-hybrid chunk_key  = HKDF(K, salt=file_id,                       │
│                              info="shieldfive/v1/xchacha/chunk-key",│
│                              L=32) — info reused with XChaCha       │
│                                                                     │
│  PQ-hybrid nonce_prefix = HKDF(file_id, salt=absent,                │
│                                info="shieldfive/v1/xchacha/nonce-prefix",│
│                                L=16) — info reused with XChaCha     │
└─────────────────────────────────────────────────────────────────────┘

Cross-context reachability check ("if I had this one key, what could I derive?"):

  • CSK for file A: unlocks file A only. Cannot derive CSK for file B (different random 32 bytes), cannot derive folder_key (CSKs are downstream of folder_keys), cannot derive rk.
  • folder_key for folder F: unlocks every CSK in folder F, every child folder_key, and recursively their CSKs. Cannot derive sibling or parent folder_keys (random per folder). Cannot derive rk.
  • rk: unlocks everything downstream. Cannot derive the password (PBKDF2 is one-way) or the recovery key (independent random bytes).
  • Recovery key: unwraps rkWrappedByRec, then everything downstream. Cannot derive the password.
  • Password: unwraps rkWrappedByUk (via PBKDF2), then everything downstream. Cannot derive the recovery key.
  • DB-only attacker (no password, no recovery key): sees only wrapped blobs whose keys are not in the DB. No decryption is possible. ✓
  • Phase-3 sk_pq (when shipped): unwraps PQ-hybrid suite_payloads encapsulated for this user; via HKDF determinism, sk_pq is recoverable from rk + the seed info string, so leaking sk_pq alone is no worse than leaking the derived PQ key — but holding sk_pq does NOT recover rk (HKDF is one-way).

Findings

Finding 2.1 — Web app uses PBKDF2-SHA-256 (600 000) instead of Argon2id for password-wrapped vault key

Severity: Medium Affects: web (signup, login, change-password, share with custom password) Component: implementation + spec drift

Description. The web app's vault-key wrapper, used at signup, login, password change, and per-file share-password wrapping, is web/utils/vaultKeyClient.ts:79-91:

return crypto.subtle.deriveKey(
  {
    name: 'PBKDF2',
    hash: 'SHA-256',
    salt: salt as Uint8Array<ArrayBuffer>,
    iterations,
  },
  material,
  { name: 'AES-GCM', length: 256 },
  ...
)

DEFAULT_ITERATIONS = 600_000 (vaultKeyClient.ts:16), MINIMUM_SECURE_ITERATIONS = 600_000 (vaultKeyClient.ts:22). The in-file comment cites OWASP 2023's 600 000-iteration recommendation for PBKDF2-SHA-256.

In parallel, the crypto library ships a vetted Argon2id implementation backed by libsodium-sumo (crypto/src/kdf/argon2id.ts):

'moderate' → opslimit=crypto_pwhash_OPSLIMIT_MODERATE (3 ops),
             memlimit=crypto_pwhash_MEMLIMIT_MODERATE (256 MiB)
'sensitive' → 4 ops / 1 GiB

crypto/spec/key-derivation.md's "Master secret" section names Argon2id as the typical mechanism ("…typically by stretching a user passphrase through Argon2id…"). The crypto library exports deriveMasterSecret (crypto/src/kdf/argon2id.ts:85-130). The web app does not consume this — libsodium is loaded for metadata encryption (web/utils/metadataClient.ts:1) but not for the vault-key path.

Why this matters at Medium severity, not Low:

  1. The vault key (rk) is the root of the entire client-side hierarchy. rkWrappedByUk and the user's PBKDF2 parameters live in the database (vault_keys row, served by /api/vault-key); a database breach exposes all of these to an offline attacker.
  2. PBKDF2-SHA-256 is not memory-hard. A modern GPU rig amortizes 600 000 SHA-256 iterations across thousands of guesses per second per GPU; consumer ASICs do better. Argon2id at MODERATE (3 ops, 256 MiB) needs ~256 MiB of high-bandwidth memory per guess, which collapses throughput by roughly two orders of magnitude on the same hardware budget.
  3. The user-facing impact of switching is bounded (Argon2id MODERATE is ~100-200 ms on contemporary CPUs, comparable to the current PBKDF2 target). The crypto library's helper handles wrap/unwrap symmetry, so the migration is straightforward: at next-password-change, store kdf: 'argon2id', argon2_ops, argon2_mem instead of iterations.
  4. There IS a documented migration path in the file — kdf and iterations are already persisted per-record (vaultKeyClient.ts:122-124), and unwrapVaultKey rejects non-pbkdf2-sha256 records explicitly (vaultKeyClient.ts:136-138). Extending the union to accept argon2id is a small change.

This is not Critical because the password is still the attacker's bottleneck:

  • DB-only without password/recovery key cannot decrypt (Critical threshold).
  • Password alone DOES recover rk (which is the design, see Step 6).
  • Recovery-key alone DOES recover rk.

Reproduction or evidence. Inspect wrapVaultKey and unwrapVaultKey at web/utils/vaultKeyClient.ts:100-147 and the comment at lines 3-15 documenting "OWASP 2023 recommends 600,000+ iterations for PBKDF2-SHA256". Cross-reference with crypto/src/kdf/argon2id.ts:85-130 showing the unused Argon2id helper, and crypto/spec/key-derivation.md § "Master secret" naming Argon2id as the typical mechanism.

Recommendation. Migrate the vault-key wrapper to Argon2id, lazily. Concrete plan:

  1. Extend the kdf union in vault_keys row to accept argon2id and add three columns (or one JSON column) for argon2_ops, argon2_mem, and argon2_salt — the salt is already 16 random bytes per wrapVaultKey invocation, so it can be reused.
  2. On password change for accounts with kdf = 'pbkdf2-sha256', derive the rk via the legacy path, then re-wrap with the Argon2id path and store the new fields. Old PBKDF2 records remain readable indefinitely.
  3. On signup, default to Argon2id MODERATE (3 ops, 256 MiB) — matches the crypto library's default and the spec's intent.
  4. Add a shouldUpgradeKdf(currentKdf) analogue to the existing shouldUpgradeKeyIterations so the UI can nudge legacy-PBKDF2 users to change their password.

Also apply the same change to the share-password wrap (useSharing.ts:141) which calls the same wrapVaultKey — this protects shared files' csk_pw_wrapped with the same KDF as the vault key itself.

Track as a Phase-2 hardening item; do not block on it. The migration is a database schema change (additive) plus a few code-paths, and benefits from being done before Phase 3 because the ML-KEM keypair is derived from rk and any rk-level brute force compromises post-quantum keys trivially too.


Finding 2.2 — Web app implements a flat envelope scheme rather than the spec's HKDF hierarchy; rk serves multiple roles directly

Severity: Medium Affects: web (entire vault), spec Component: spec + implementation drift

Description. crypto/spec/key-derivation.md § "Derivation tree" specifies:

master_secret
├── envelope_key  = HKDF(master, info="shieldfive/v1/envelope-key")
├── metadata_key  = HKDF(master, info="shieldfive/v1/metadata-key")
└── ml_kem_seed   = HKDF(master, info="shieldfive/v1/pq-hybrid/ml-kem-1024-seed", L=64)

with the closing rule:

"Forbidden cross-context usage. These domain strings are consumed only by the crypto layer. The host application MUST NOT reuse any of them for unrelated purposes."

The web app's actual scheme uses rk (the 32-byte random vault key, generated by generateVaultKey(), vaultKeyClient.ts:93-98) in four distinct contexts without any HKDF derivation:

  1. As an AES-GCM key for wrapping top-level folder keysuseFolderTree.ts:255-258, encryptModal.tsx:443-446:

    const envelope = await wrapKeyB64({
      wrappingKeyB64: rootKey, // raw rk used as AES-GCM key
      keyToWrapB64: folderKey,
    })
    

    wrapKeyB64 (keyCrypto.ts:52-68) passes the b64-decoded bytes directly into crypto.subtle.importKey for AES-GCM.

  2. As the "secret" input to Argon2id-based metadata-key derivationuseFolderTree.ts:248: await deriveMetadataKey(rootKey). deriveMetadataKey itself is a no-op wrapper (metadataClient.ts:124-132); the real derivation is inside getOrCreateKeyForSalt (metadataClient.ts:270-313) which feeds rootKey as the password input to sodium.crypto_pwhash Argon2id.

  3. As a SHA-256-prefix input for the auto-share passwordsharePassword.ts:24-31:

    const encoded = new TextEncoder().encode(
      `${AUTO_SHARE_PASSWORD_CONTEXT}${rootKeyB64}`,
    )
    const digest = await subtle.digest('SHA-256', encoded)
    

    The base64 form of rk is concatenated with a fixed prefix and hashed; the truncated digest becomes the share-link password.

  4. As an HMAC-keyed input for the searchable name hashmetadataClient.ts:134-145, metadataClient.ts:660-669: subtle.digest('SHA-256', encoder.encode(secret)) produces the HMAC key, then HMAC-SHA-256 over the (lowercased) name.

Why this is Medium, not High:

  • No concrete attack is enabled. Each consumer uses a different primitive (AES-GCM encryption, Argon2id one-way function, SHA-256 hash, HMAC). Outputs are mutually computationally independent: knowing any one of them does not let an attacker derive rk or any of the others, modulo each primitive's security.
  • The spec's HKDF(master, "...envelope-key") derivation would, in isolation, produce 32 bytes computationally indistinguishable from rk itself given uniformly random rk. Skipping the HKDF step yields the SAME security against an outside attacker; the defense it provides is against future bugs that touch rk in yet-another context and create a collision.

Why this is Medium, not Low:

  • An independent implementer reading key-derivation.md and the crypto library's README will build a system that the deployed web app cannot interoperate with. The "Forbidden cross-context usage" list includes shieldfive/v1/envelope-key and shieldfive/v1/metadata-key — but the crypto library never derives them and the web never uses them.
  • The Phase-3 PQ-hybrid suite WILL bring HKDF-derived per-user keys into the picture (via deriveMlKemKeypair(rk), crypto/src/suites/pq-hybrid-v1/index.ts:117-126). The "rk is also used directly for X" pattern doesn't survive contact with that — the more contexts rk participates in, the harder it is to reason about exposure.
  • The spec is the canonical document for an audit-ready system. Drift between spec and implementation at this layer is itself the finding.

Reproduction or evidence. Read key-derivation.md § "Derivation tree" and § "Forbidden cross-context usage". Grep for the info strings in web/:

grep -rn "shieldfive/v1/envelope-key\|shieldfive/v1/metadata-key" web/

returns no hits — those derivations are not performed. Then read the four call sites listed above to confirm rk's direct use.

Recommendation. Two options; both require landing alongside a spec update.

Option A — implement the spec's hierarchy in the web app (preferred for audit readiness, larger change):

  1. On unlock, derive envelope_key = HKDF(rk, "shieldfive/v1/envelope-key", 32) and metadata_key_seed = HKDF(rk, "shieldfive/v1/metadata-key", 32). Keep both in memory alongside rk for the session.
  2. Replace wrappingKeyB64: rootKey with wrappingKeyB64: envelopeKey in the four call sites above. Top-level folder keys re-wrap during a one-shot migration.
  3. Replace deriveMetadataKey(rootKey) with deriveMetadataKey(metadata_key_seed) for top-level metadata; folder metadata continues to use the parent folder_key.
  4. The share-password derivation already uses a domain-prefixed SHA-256, which is acceptable; consider switching to HKDF for consistency.
  5. The searchable-hash MAC key should be a separate HKDF derivation (HKDF(rk, "shieldfive/v1/name-hash-mac", 32)) — this is currently ad-hoc (SHA-256 of rk as the HMAC key).

Option B — update the spec to describe what ships (smaller change, weaker outcome):

  1. Rewrite key-derivation.md § "Derivation tree" to say "the application's master secret is used directly as the envelope/wrap key; the application MAY domain-separate via HKDF but is not required to. The library's envelope_key/metadata_key info strings are reserved for future use."
  2. Remove shieldfive/v1/envelope-key and shieldfive/v1/metadata-key from the "Forbidden cross-context usage" list, or move them to a separate "Reserved for future use" subsection.

Option A is the right answer for Phase 3 launch credibility: an external auditor will ask "where is envelope_key in the code" and the answer should be a grep hit, not a git log apology. Option B is acceptable as a stopgap if Option A has to wait, but it makes the spec weaker, not stronger.

Land alongside a test vector pinning HKDF(rk, "...envelope-key", 32) output for rk = zeros(32) so future implementers can self-check.


Finding 2.3 — Web app uses Argon2id INTERACTIVE on already-high-entropy inputs (metadata key path)

Severity: Low Affects: web (filename / folder-name encryption) Component: implementation

Description. web/utils/metadataClient.ts's getOrCreateKeyForSalt (lines 270-313) runs the caller's secret through Argon2id at libsodium's INTERACTIVE preset:

const params =
  level === 'interactive'
    ? {
        ops: sodium.crypto_pwhash_OPSLIMIT_INTERACTIVE,
        mem: sodium.crypto_pwhash_MEMLIMIT_INTERACTIVE,
      }
    : { ops: …OPSLIMIT_MODERATE, mem: …MEMLIMIT_MODERATE }

The secret argument, in every call site grepped, is a 32-byte random key in base64 form: the rk (useFolderTree.ts:248, useFiles.ts:337, 407), or a folder_key (encryptModal.tsx:432, useFolderTree.ts:122, 410, 485), or a share password (useSharing.ts:112 — this one IS low-entropy and the path is appropriate).

When the input is already 32 bytes of CSPRNG output, Argon2id buys nothing over HKDF — there is no password to stretch. The key-stretching work happens on every file/folder load (no cache across sessions because the per-blob salt is random and the cache is in-memory on the MetadataKeyHandle). On a mobile device this is the dominant cost in folder listing latency.

The level is INTERACTIVE explicitly "for cross-device compatibility" (comment at line 295) — i.e., to avoid 256 MiB allocations on phones. INTERACTIVE is libsodium's MIN (2 ops, 64 MiB). For a low-entropy password this would be borderline; for a 32-byte random key it is wasted work.

This is filed Low, not Medium, because:

  • The output is still a 256-bit key under the security of Argon2id, the per-blob salt prevents rainbow-table attacks, and the encrypted-output itself is AES-GCM with random IV. No confidentiality loss.
  • Replacing Argon2id with HKDF here is purely a performance/clarity improvement, not a security fix.

Reproduction or evidence. Read the four call sites cited above. Confirm that the secret parameter is either rootKey (32 random bytes, base64) or folderKey (32 random bytes, base64). The Argon2id path is unreachable from a low-entropy password input except via the share-password code path (useSharing.ts:112).

Recommendation. Split the metadata-key derivation by input type:

  • For folder-keyed metadata (filename, folder name within a known folder): replace Argon2id with HKDF(folderKey, salt=<16B per blob>, info="shieldfive/v1/metadata-key/folder", 32). Fast, no allocation.
  • For share-password-keyed metadata (the share name shown to the download-link recipient): keep Argon2id because the secret is a user-chosen password.

Apply a one-shot migration for existing folder-keyed metadata blobs; the V4 record format already carries kdf in the payload (metadataClient.ts:30-37), so adding kdf: 'hkdf-sha256' is additive. Land with decryptMetadataClient extended to handle both KDFs.

Pairs naturally with Finding 2.2's Option A — the metadata key for non-folder-scoped uses (filename hash, name search) would then be a proper HKDF derivation from rk.


Finding 2.4 — Spec underspecifies the ML-KEM-1024 seed (d, z) split convention

Severity: Low (Phase 3 prerequisite) Affects: spec Component: spec

Description. crypto/spec/key-derivation.md § "Derivation tree":

ml_kem_seed = HKDF(master, info="shieldfive/v1/pq-hybrid/ml-kem-1024-seed", L=64)
└── (ml_kem_pk, ml_kem_sk) = ML-KEM-1024.KeyGen(ml_kem_seed)

The implementation (crypto/src/suites/pq-hybrid-v1/index.ts:117-126):

const seed = await hkdfSha256({
  ikm: masterSecret,
  info: 'shieldfive/v1/pq-hybrid/ml-kem-1024-seed',
  length: 64,
})
return ml_kem1024.keygen(seed)

passes the 64-byte seed to @noble/post-quantum's ml_kem1024.keygen, which splits it as d = seed.subarray(0, 32), z = seed.subarray(32) (node_modules/@noble/post-quantum/src/ml-kem.ts:281-289):

keygen: (seed = randomBytes(seedLen)) => {
  abytes(seed, seedLen, 'seed');
  const { publicKey, secretKey: sk } = KPKE.keygen(seed.subarray(0, 32));
  const publicKeyHash = HASH256(publicKey);
  // (dkPKE||ek||H(ek)||z)
  const secretKey = secretCoder.encode([sk, publicKey, publicKeyHash, seed.subarray(32)]);
  ...
}

That is, d = seed[0..32] is used as the K-PKE keygen seed (FIPS 203 § Algorithm 16 input), and z = seed[32..64] is the implicit-rejection seed concatenated into the secret-key blob.

The spec does not say this. An independent implementer using crystals-kyber-js's generateKeyPairDerand(d, z) API, or a Python binding, or a future noble version that changes the seed handling, has no spec-level evidence for which 32-byte half to feed where. The deterministic test vectors in tests/vectors/key-derivation.json (referenced by key-derivation.md § "Test vectors") will catch a wrong choice — but only if the implementer thinks to consult them, and only post-implementation.

The two related concerns:

  1. Spec correctness. A specification that fully determines its outputs is the bar for an audit-ready system. The current spec is one line short of meeting it.
  2. Phase 3 readiness. PQ-hybrid is the default suite for new files per format-v1.md § "Suite payloads / 0x03 (default)". When Phase 3 ships, identity-bundle interop with any reimplementation (mobile future port, a third-party client) becomes a deployment-blocking concern.

There is a related Task-2-instructions point about whether d and z should be HKDF-derived with separate info strings rather than via a single 64-byte expansion. By RFC 5869, HKDF-Expand with L=64 produces T(1) || T(2)[:32] where T(1) = HMAC(PRK, info || 0x01) and T(2) = HMAC(PRK, T(1) || info || 0x02) — i.e., d = T(1) and z = T(2) are computationally indistinguishable from independent random outputs under HKDF's PRF assumption. The single-expansion approach is standard practice (Signal Protocol, NaCl, many KEM implementations) and is not a finding. The spec gap is the split convention, not the single-vs-double expansion.

Reproduction or evidence. Read key-derivation.md § "Derivation tree". Independently, read FIPS 203 § Algorithm 16 (ML-KEM.KeyGen) — it takes (d, z) as two 32-byte inputs without prescribing an ordering or combined-seed convention. Read crypto/node_modules/@noble/post-quantum/src/ml-kem.ts:281-289 to see the noble library's choice.

Recommendation. Add one paragraph to key-derivation.md after the derivation-tree code block:

"The 64-byte ml_kem_seed is split into d = ml_kem_seed[0..32] and z = ml_kem_seed[32..64], with d consumed as ML-KEM-1024.KeyGen's random-coins input (FIPS 203 § Algorithm 16, parameter d) and z stored as the implicit-rejection seed (FIPS 203 § Algorithm 17, parameter z). Implementers using a library that exposes only a combined-seed API (e.g., KeyGen(seed)) MUST verify that the library matches this split convention; the deterministic test vectors at tests/vectors/key-derivation.json pin the expected pk[0..8] / sk[0..8] bytes for master_secret = zeros(32)."

Land alongside a test vector for master_secret = zeros(32)ml_kem_seed[0..16], pk[0..16], sk[0..16] so a third-party implementation can self-check independently of the noble library version. Status: Open, Phase 3 prerequisite.


Finding 2.5 — Two HKDF info strings are constructed inline at call sites, contradicting the library's own invariant

Severity: Low Affects: crypto library Component: implementation

Description. crypto/src/internal/hkdf.ts:2-7 states:

"Every domain-separated key the library derives from another key flows through this function. Domain strings are constants in internal/types.ts — never construct an info parameter ad-hoc at the call site."

Eight of the library's ten HKDF call sites do follow this — they use the HKDF_INFO constants from crypto/src/internal/types.ts:78-86.

Two do not:

Both strings are legitimate purpose-named info strings; the security property they provide (domain separation from other HKDF calls) is preserved. The issue is the documented invariant — a future contributor who reads hkdf.ts's docstring and then needs to add a new HKDF derivation has to choose between "follow the convention" and "follow the precedent." That ambiguity is how typos become collisions.

Reproduction or evidence. Grep the library for hkdfSha256 and filter to non-HKDF_INFO call sites:

grep -rn "hkdfSha256\b" crypto/src --include="*.ts" \
  | grep -v "HKDF_INFO" | grep -v "test"

Yields the two call sites above. Cross-reference with HKDF_INFO in types.ts to confirm the absence.

Recommendation. Add the two strings to HKDF_INFO:

export const HKDF_INFO = Object.freeze({
  HEADER_MAC: 'shieldfive/v1/header-mac',
  AES_GCM_CHUNK_KEY: 'shieldfive/v1/aes-gcm/chunk-key',
  AES_GCM_NONCE_PREFIX: 'shieldfive/v1/aes-gcm/nonce-prefix',
  XCHACHA_CHUNK_KEY: 'shieldfive/v1/xchacha/chunk-key',
  XCHACHA_NONCE_PREFIX: 'shieldfive/v1/xchacha/nonce-prefix',
  PQ_HYBRID_COMBINE: 'shieldfive/v1/pq-hybrid/combine',
  ML_KEM_1024_SEED: 'shieldfive/v1/pq-hybrid/ml-kem-1024-seed', // new
  ARGON2ID_SALT_COMPRESSION: 'shieldfive/v1/argon2id/salt-compression', // new
})

Update the two call sites to reference HKDF_INFO.ML_KEM_1024_SEED and HKDF_INFO.ARGON2ID_SALT_COMPRESSION. No behavior change.


Finding 2.6 — Spec's "Forbidden cross-context usage" list omits shieldfive/v1/argon2id/salt-compression

Severity: Low Affects: spec Component: spec

Description. crypto/spec/key-derivation.md § "Forbidden cross-context usage" enumerates nine info strings:

shieldfive/v1/header-mac
shieldfive/v1/aes-gcm/chunk-key
shieldfive/v1/aes-gcm/nonce-prefix
shieldfive/v1/xchacha/chunk-key
shieldfive/v1/xchacha/nonce-prefix
shieldfive/v1/pq-hybrid/combine
shieldfive/v1/pq-hybrid/ml-kem-1024-seed
shieldfive/v1/envelope-key
shieldfive/v1/metadata-key

The library uses a tenth string, shieldfive/v1/argon2id/salt-compression (crypto/src/kdf/argon2id.ts:106), to HKDF-compress an oversize Argon2id salt down to 16 bytes. This is not in the forbidden list. A host application could legitimately read the spec, see it's not reserved, and reuse the string — with no cross-context-reuse consequence today (host apps don't typically run through deriveMasterSecret), but the list is the canonical reference for which strings the crypto layer claims.

Reproduction or evidence. Diff the strings used in crypto/src/ HKDF calls against the spec's list.

Recommendation. Add to the spec's forbidden-list block. Same edit satisfies the analogous gap if Finding 2.5's HKDF_INFO additions land (both new constants should be in the spec list).

While editing, consider whether shieldfive/v1/envelope-key and shieldfive/v1/metadata-key belong on this list at all — they describe derivations the library never performs (see Finding 2.2). Either move them to a "Reserved for future use" subsection or remove them.


Finding 2.7 — Spec doesn't document that PQ-hybrid intentionally reuses the XChaCha HKDF info strings

Severity: Informational Affects: v1 (PQ-hybrid suite), spec Component: spec

Description. The PQ-hybrid suite's chunk-key and nonce-prefix derivations reuse the XChaCha suite's HKDF info strings, with the combined key K standing in for the content key:

crypto/src/suites/pq-hybrid-v1/index.ts:166-176:

// Reuse the xchacha derivation domain — semantically the same chunk-key role.
const chunkKey = await hkdfSha256({
  ikm: combinedKey,
  salt: fileId,
  info: HKDF_INFO.XCHACHA_CHUNK_KEY, // "shieldfive/v1/xchacha/chunk-key"
  length: 32,
})
const noncePrefix = await hkdfSha256({
  ikm: fileId,
  info: HKDF_INFO.XCHACHA_NONCE_PREFIX, // "shieldfive/v1/xchacha/nonce-prefix"
  length: 16,
})

key-derivation.md § "Per-file key derivation" describes this obliquely:

"For the PQ-hybrid suite, content_key is replaced by the combined key K = HKDF(classical_share || pq_share, salt=file_id, info="shieldfive/v1/pq-hybrid/combine")."

This is correct but does not explicitly say "the chunk-key and nonce-prefix info strings are the same as the XChaCha suite's, with K substituted for content_key." An implementer might reasonably expect a fresh info string of the form shieldfive/v1/pq-hybrid/chunk-key.

The reuse is cryptographically sound. The two callers — XChaCha-v1 with ikm = contentKey (per-file random 32 bytes), and PQ-hybrid with ikm = combinedKey (HKDF output, per-file 32 bytes via a different input space) — produce identical HKDF outputs only when the IKMs collide, which has 2⁻²⁵⁶ probability and is not exploitable. Both IKMs are 256-bit uniformly random / pseudorandom under HKDF.

But the spec should say so explicitly. An auditor reading the spec at face value will flag the same info string appearing for two distinct suites; the answer "this is intentional and safe because the IKMs are fresh per file" is correct but should not be a verbal answer.

Reproduction or evidence. Read key-derivation.md § "Per-file key derivation". Read crypto/src/suites/pq-hybrid-v1/index.ts:163-178 including the inline comment at line 165.

Recommendation. Replace the spec paragraph with:

"For the PQ-hybrid suite, content_key is replaced by the combined key K = HKDF(classical_share || pq_share, salt=file_id, info="shieldfive/v1/pq-hybrid/combine", L=32). The chunk-key and nonce-prefix derivations are then identical to the XChaCha suite's, with K in the role of content_key and the same info strings (shieldfive/v1/xchacha/chunk-key and shieldfive/v1/xchacha/nonce-prefix). Reuse of those info strings across the two suites is safe because the IKM space is partitioned: XChaCha uses a fresh random content_key, PQ-hybrid uses the file-bound HKDF output K; both are 32 bytes of uniformly random / pseudorandom material per file, so the probability of cross-suite output collision is 2⁻²⁵⁶."

No code change.


Finding 2.8 — Salt-absent convention inherits the Task-1 Finding 1.2 ambiguity across all HKDF call sites

Severity: Informational Affects: crypto library + spec Component: spec

Description. key-derivation.md § "Derivation tree" states "Salt is empty unless otherwise stated." Task 1 Finding 1.2 flagged the same phrasing in format-v1.md for nonce-prefix derivations: "empty" is ambiguous between RFC 5869's "absent salt" (HashLen zeros) and a literal zero-length string, with the implementation using the former (crypto/src/internal/hkdf.ts:20):

const salt = options.salt ?? new Uint8Array(32) // SHA-256 zero-salt fallback

key-derivation.md inherits this ambiguity. The affected derivations (salt absent ⇒ 32 zero bytes in implementation):

  • envelope_key (spec only — unimplemented)
  • metadata_key (spec only — unimplemented)
  • ml_kem_seed (pq-hybrid-v1/index.ts:120-124)
  • nonce prefixes for both XChaCha and AES-GCM (already filed in Task 1 Finding 1.2)

Not re-filing as a fresh finding — it is the same root issue, just appearing in a different spec file. Cross-referencing here for completeness.

Recommendation. When fixing Task 1 Finding 1.2, also update key-derivation.md § "Derivation tree" preamble:

"All derivations are HKDF-SHA-256. The info string is a printable domain separator. When the salt is omitted in the definitions below, it is zeros(32) per RFC 5869's 'absent salt' convention. Keys are 32 bytes unless otherwise stated."

Same test-vector posture as Task 1 Finding 1.2 — the existing test vectors pin the convention if any new derivation is added.


What I checked but did not find issues with

The following areas were reviewed against both spec and implementation; no drift was found or, where flagged, severity was Informational and the property held.

Argon2id parameters (crypto library)

  • Master-secret output length is 32 bytes (argon2id.ts:32), matching the spec.
  • Salt length is enforced at ≥16 bytes; oversize salts are HKDF-compressed to 16 bytes via a domain-separated info string (argon2id.ts:94-108). The compression is sound: HKDF-SHA-256 on a salt input produces a uniformly random 16-byte output, preserving Argon2id's salt-space size.
  • Empty passphrase is rejected (argon2id.ts:90-92).
  • verifyPassphrase uses constant-time comparison (argon2id.ts:158-163).
  • The 'interactive' preset is deliberately not exposed (argon2id.ts:18-29).
  • MODERATE preset: 3 ops, 256 MiB (libsodium's crypto_pwhash_OPSLIMIT_MODERATE / crypto_pwhash_MEMLIMIT_MODERATE). Above OWASP 2024 minimum for Argon2id (m=19 MiB, t=2).
  • SENSITIVE preset: 4 ops, 1 GiB. Above OWASP 2024 first-tier (m=46 MiB, t=1).
  • The chosen preset is persisted in the output (preset: Argon2idPreset, argon2id.ts:73-75) so re-derivation works without parameter ambiguity.
  • The implementation uses sodium.crypto_pwhash_ALG_ARGON2ID13 explicitly (argon2id.ts:122), pinning the Argon2id variant.

HKDF call sites (complete table, ten sites in crypto library)

#File:lineIKMSaltInfo stringL (bytes)Output role
1format/header.ts:76contentKey (32B)fileId (16B)shieldfive/v1/header-mac32header MAC key
2kdf/argon2id.ts:104oversize saltabsent (32 zeros)shieldfive/v1/argon2id/salt-compression16Argon2id 16-byte salt
3suites/aes-gcm-v1/index.ts:68contentKey (32B)fileId (16B)shieldfive/v1/aes-gcm/chunk-key32AES-GCM chunk key
4suites/aes-gcm-v1/index.ts:80fileId (16B)absent (32 zeros)shieldfive/v1/aes-gcm/nonce-prefix4AES-GCM nonce prefix
5suites/pq-hybrid-v1/index.ts:120masterSecret (32B)absent (32 zeros)shieldfive/v1/pq-hybrid/ml-kem-1024-seed64ML-KEM keygen seed (d‖z)
6suites/pq-hybrid-v1/index.ts:146classical‖pq (64B)fileId (16B)shieldfive/v1/pq-hybrid/combine32combined key K
7suites/pq-hybrid-v1/index.ts:166combinedKey K (32B)fileId (16B)shieldfive/v1/xchacha/chunk-key32PQ-hybrid chunk key (info reused with #9 — Finding 2.7)
8suites/pq-hybrid-v1/index.ts:172fileId (16B)absent (32 zeros)shieldfive/v1/xchacha/nonce-prefix16PQ-hybrid nonce prefix (info reused with #10 — Finding 2.7)
9suites/xchacha-v1/index.ts:65contentKey (32B)fileId (16B)shieldfive/v1/xchacha/chunk-key32XChaCha chunk key
10suites/xchacha-v1/index.ts:75fileId (16B)absent (32 zeros)shieldfive/v1/xchacha/nonce-prefix16XChaCha nonce prefix

Verified:

  • No accidental info-string collisions. The only repeated info strings are #7↔#9 and #8↔#10, both intentional (Finding 2.7), and the IKMs are partitioned by suite-specific derivation (combinedKey K is a HKDF output, contentKey is fresh random; collision probability 2⁻²⁵⁶).
  • No cross-version collisions. All info strings carry the v1 segment; the library has no v0 HKDF derivations (v0 uses random IV prefix, no key derivation).
  • Domain separators are version + suite + purpose in every case.
  • No cross-context reuse of file_id salt across keys. When fileId is the salt (HKDF-Extract input), it varies per file and is paired with different info strings per output; HKDF-Extract's PRF behavior partitions the outputs.
  • No HKDF call uses the master_secret as IKM AND a low-entropy salt together (sites 5 and 2 have absent salt = 32 zeros, which is the RFC 5869 default behavior and does not weaken the construction).
  • Output lengths are appropriate to their use (32 bytes for keys, 4 for AES-GCM nonce prefix, 16 for XChaCha nonce prefix, 16 for Argon2id salt, 64 for ML-KEM seed).

ML-KEM-1024 keypair derivation

  • Determinism verified: deriveMlKemKeypair(zeros(32)) produces the same (publicKey, secretKey) on every invocation by reading the implementation (no internal randomness past the HKDF expansion; ml_kem1024.keygen(seed) is a pure function of the seed per FIPS 203 Algorithm 16).
  • Domain separation from the other rk-derived keys: the info string shieldfive/v1/pq-hybrid/ml-kem-1024-seed is unique in HKDF_INFO (or would be once Finding 2.5 lands).
  • Output length 64 bytes matches the noble library's seedLen = 64 contract (node_modules/@noble/post-quantum/src/ml-kem.ts:276).
  • ML-KEM-1024 lengths match FIPS 203 (1568 public key, 3168 secret key, 1568 ciphertext) — see pq-hybrid-v1/index.ts:36-41.
  • The single-HKDF-then-split convention is filed as Finding 2.4 (spec drift), not as a derivation defect — both d and z ARE HKDF-derived from the master secret, just from the same expansion. Standard practice and cryptographically sound.
  • The classical share S_c for the PQ-hybrid suite is generated as randomBytes(32) per file (pq-hybrid-v1/index.ts:258) — fresh per encryption, not derived.

Content-key envelope (web app)

  • CSK generation: crypto.getRandomValues(new Uint8Array(32)) via generateKeyB64 (keyCrypto.ts:22-26). 256-bit uniformly random.
  • file_id generation: randomBytes(16) from the crypto library's runtime.ts (under the hood crypto.getRandomValues); see also the crypto library's generateContentMaterial (aes-gcm-v1/index.ts:112-120). 16-byte random ID is sufficient for collision resistance over a user's vault.
  • CSK wrap algorithm: AES-GCM with random 12-byte IV stored alongside (keyCrypto.ts:52-68). IV is fresh per wrap (called once per file at upload).
  • CSK wrap IV is crypto.getRandomValues(new Uint8Array(12)); with 2⁹⁶ IV space and per-wrap freshness, AES-GCM IV reuse is not a concern at this layer.
  • Password-wrapped CSKs (for shared files) use the same wrapVaultKey → PBKDF2-SHA-256 600 000 iter path as the vault key itself (useSharing.ts:141). Consistent with the vault path; Finding 2.1's Argon2id migration recommendation covers this too.
  • Folder-key wrap chain: each folder_key is wrapped by its parent's folder_key (or rk for top-level); fk_iv is random per wrap; no cross-folder reuse of the wrapped form is possible.
  • The wrap operations do not bind a per-file or per-folder AAD (wrapKeyB64 calls AES-GCM without additionalData). This is benign in practice — a swapped csk_wrapped would unwrap to a CSK that does not match the on-disk file's file_id-bound chunk-key derivation, so decryption would fail at the chunk layer. The structural binding lives one layer down.
  • Recovery-key generation: generateVaultKey() = 32 random bytes from crypto.getRandomValues. 256 bits of entropy, far above the 128-bit threshold. The user is shown the recovery key once and instructed to back it up.

Master-secret lifecycle (Step 6 critical questions)

The three critical recoverability questions from the Task 2 instructions, evaluated against the deployed web flow:

  • Database alone (no password, no recovery key) → can attacker decrypt? No. The DB has rkWrappedByUk (AES-GCM-wrapped under PBKDF2(password)), rkWrappedByRec (AES-GCM-wrapped under recovery key), wrapped folder keys, wrapped CSKs, and Argon2id-encrypted metadata. None of the wrapping keys are in the DB; all are downstream of either the password or the recovery key. ✓
  • Password alone (no recovery key) → full access? Yes (as designed). unwrapVaultKey({password, ...}) derives the AES-GCM key via PBKDF2 and unwraps rkWrappedByUk to recover rk; everything downstream then unwraps. ✓
  • Recovery key alone (no password) → full access? Yes (as designed). resetWithRecovery calls unwrapRootKeyWithRecovery(recoveryKey, rkWrappedByRec, recIv) (keyring.ts:293-305) and unwraps rkWrappedByRec directly. Resetting via recovery key also re-wraps under a new password. ✓

Additional checks:

  • The recovery-key path is independent of the password path: both wrap the same rk but with different keys and IVs, so compromising one does not leak the other.
  • Regenerating the recovery key (keyring.ts:613-629) re-wraps rk under a fresh recovery key without changing rk itself — files remain readable.
  • Changing the password (keyring.ts:541-577) re-wraps rk under a fresh PBKDF2(newPassword) key; rk itself is not rotated. No file re-wrap is required.

Spec-vs-implementation cross-checks

  • Every HKDF info string in code matches its spec definition byte-for-byte (HKDF_INFO constants vs. spec § "Derivation tree" and § "Forbidden cross-context usage", modulo the gap in Finding 2.6).
  • content_key is fresh per file (web: generateKeyB64(); crypto: generateContentMaterial()), never derived. Matches spec.
  • header_mac_key, chunk_key, nonce_prefix use the salt and info pattern documented in key-derivation.md § "Per-file key derivation".
  • For PQ-hybrid, the combined key K is computed as HKDF(S_c‖S_pq, salt=file_id, info="shieldfive/v1/pq-hybrid/combine") — matches spec.

Out of scope (deferred to other tasks)

  • Wire format / AAD construction — Task 1 (filed).
  • AEAD nonce derivation per-chunk (uniqueness, cross-file, cross-key rotation) — Task 4.
  • Streaming worker key handling beyond derivation (key zeroization, race conditions, worker isolation) — Task 5.
  • Server-side and web-application surfaces (share handling, service-role boundary, RLS, database constraints, RPC authorization, CSP, in-memory key handling — sessionStorage encrypted blob, IndexedDB wrap-key, V2 fallback for Safari, page-close behavior — session / auth flow tying the unlock to the server token, recovery-key UX, logging) are out of crypto-layer scope and covered by the separate server-side review, kept private.
  • Threat-model coverage — Task 3.

References