Task 2 — Key derivation review
Date: 2026-05-11
Reviewer: Cho García
Scope: crypto/spec/key-derivation.md, the @shieldfive/crypto
KDF/HKDF/HMAC/identity/suite modules (vendored at 1.0.0-alpha.3, repo
HEAD a5e99ac), and the web-side key handling
(web/utils/keyring.ts, web/utils/keyCrypto.ts,
web/utils/folderKeyCache.ts, web/utils/vaultKeyClient.ts,
web/utils/metadataClient.ts, web/utils/sharePassword.ts,
web/utils/keyHierarchyMoves.ts, web/utils/wrappingKeyStore.ts, web
HEAD bfa41a57).
Methodology: key-hierarchy reconstruction from spec and code, then
spec-vs-implementation audit. Every hkdfSha256 call site in the
library was grepped and tabulated (Step 3 below). The web app's
parallel — and partly divergent — key tree was reconstructed from the
keyring/vaultKeyClient/metadataClient call chains used by the
encryption and folder-tree flows. Master-key in-memory lifecycle and
session/auth boundaries were treated as out of crypto-layer scope.
Summary
The crypto library's per-file key derivation is sound: every chunk key, nonce prefix, and header MAC key is HKDF-SHA-256 with a distinct, purpose-named info string, file_id used as salt for content-key-keyed derivations, and content material that is fresh per file. The ML-KEM-1024 keypair is deterministically derivable from a 32-byte master secret via HKDF, which is the right tradeoff for backup ergonomics.
The integration story is weaker. The crypto spec describes a
master_secret → envelope_key / metadata_key HKDF hierarchy in
spec/key-derivation.md that the web app does not implement. The web
instead uses a flat scheme: a 32-byte random "root key" (rk) generated at
signup, stored on the server wrapped under (a) a PBKDF2-SHA-256 key
derived from the user's password and (b) a 32-byte random recovery key.
That same rk is then used directly in four distinct contexts (AES-GCM
wrapping of folder keys, Argon2id input for filename-metadata keys,
SHA-256 prefix-hash for share passwords, HMAC key for searchable name
hashes) without HKDF domain separation. The crypto library's Argon2id
implementation, which the spec implies for password stretching, is not
consumed by the web app.
Highest-impact findings:
- 2.1 — web app uses PBKDF2-SHA-256 (600 000 iterations) for password-wrapped vault-key derivation; the crypto library ships an Argon2id implementation (libsodium MODERATE: 3 ops, 256 MiB) that the web does not use. PBKDF2 meets OWASP 2023's minimum for that primitive but is materially weaker per dollar of attacker GPU/ASIC budget than Argon2id. Database-only decryption is still not possible, but offline password brute force becomes the cheapest path to root-key recovery.
- 2.2 — the web app implements a flat envelope scheme rather than the
spec's
HKDF(master, "...envelope-key")/HKDF(master, "...metadata-key")hierarchy.rkis used directly as an AES-GCM wrapping key for folder keys, as the input to per-file Argon2id metadata-key derivation, as a hashed share-password seed, and as a searchable-hash MAC key. No concrete attack is enabled — the four consumers use different operations whose outputs are mutually computationally independent — but the spec's design intent (HKDF domain separation per purpose) is not honored, and the documented derivation tree does not describe the system that ships. - 2.4 — the spec's
ml_kem_seed = HKDF(..., L=64) → ML-KEM-1024.KeyGen(seed)is silent on the(d, z)split convention. Test vectors pin one particular interpretation, but the spec text alone is not enough to reproduce the same keypair against a library that exposesKeyGen(d, z)rather thanKeyGen(seed). Phase-3 prerequisite.
No Critical findings. Database-alone decryption is not possible, password alone fully recovers the root key (as designed), the recovery key has 256 bits of entropy, and Argon2id parameters in the library are well above OWASP minima. The findings are integration-layer hardening and spec/implementation alignment, not breaks.
Key hierarchy
The diagram below merges (a) the crypto spec's documented hierarchy with
(b) what the web app actually does. Differences are flagged with (spec only — unimplemented) or (web only — not in spec).
┌─────────────────────────────────────────────────────────────────────┐
│ Password (user input, low entropy) │
│ │ │
│ │ PBKDF2-SHA-256, 600 000 iters, 16B random salt │
│ │ (web/utils/vaultKeyClient.ts:79-91) │
│ │ │
│ ▼ │
│ uk (256-bit AES key, ephemeral) ─── wraps ───► rkWrappedByUk │
│ │
│ ──────────────────────────────────────────────────────────────── │
│ │
│ Recovery key (32 random bytes, generateVaultKey, shown once) │
│ │ │
│ ▼ │
│ rec (256-bit AES key) ─── wraps ───► rkWrappedByRec │
│ │
│ ──────────────────────────────────────────────────────────────── │
│ │
│ rk = "root key" (32 random bytes, generated client-side at signup) │
│ In spec language: this IS the master_secret. │
│ In web language: this is also the envelope_key, also the │
│ metadata seed, also the share-password seed. │
│ │ │
│ ├──(spec only — unimplemented)─► │
│ │ envelope_key = HKDF(rk, "shieldfive/v1/envelope-key")│
│ │ metadata_key = HKDF(rk, "shieldfive/v1/metadata-key")│
│ │ │
│ ├──(web) AES-GCM wrap, IV random 12B, no AAD ───► │
│ │ top-level folder_key → folders.fk_wrapped/fk_iv │
│ │ (web/utils/keyCrypto.ts:52-68) │
│ │ │
│ ├──(web) Argon2id(rk, 16B random salt per blob, INTERACTIVE) │
│ │ = 32B AES-GCM key for filename metadata │
│ │ (web/utils/metadataClient.ts:259-313) │
│ │ │
│ ├──(web) SHA-256("shieldfive:auto-share:v1:" || rk_b64) │
│ │ truncated to 24B ASCII → auto-share password input to │
│ │ wrapVaultKey (PBKDF2) for csk_pw_wrapped │
│ │ (web/utils/sharePassword.ts:18-32) │
│ │ │
│ └──(web) HMAC-SHA-256 keyed by SHA-256(rk) → │
│ searchable folder/file name hash │
│ (web/utils/metadataClient.ts:134-145, 660-669) │
│ │
│ ──────────────────────────────────────────────────────────────── │
│ │
│ folder_key (32 random bytes per folder, generated client-side) │
│ │ │
│ ├──(web) wrapped by parent folder_key (or rk for top-level) │
│ │ AES-GCM, random IV → folders.fk_wrapped/fk_iv │
│ │ │
│ ├──(web) AES-GCM wrap → child folder_key, file CSK │
│ │ (no AAD binding to folder_id) │
│ │ │
│ └──(web) input to Argon2id per-file metadata key derivation │
│ │
│ ──────────────────────────────────────────────────────────────── │
│ │
│ CSK = content_key (32 random bytes per file, generateKeyB64) │
│ │ │
│ ├──(web) wrapped by parent folder_key │
│ │ AES-GCM, random IV → files.csk_wrapped/csk_iv │
│ │ │
│ ├──(web, optional, when shared) wrapped by PBKDF2(share pwd) │
│ │ → files.csk_pw_wrapped + csk_pw_{salt,iv,iterations} │
│ │ │
│ └──(crypto lib) passed as contentKey into the AES-GCM-v1 │
│ suite. Per-file derivations below. │
│ │
│ ──────────────────────────────────────────────────────────────── │
│ │
│ file_id (16 random bytes per file) │
│ content_key = CSK above │
│ │
│ header_mac_key = HKDF(content_key, salt=file_id, │
│ info="shieldfive/v1/header-mac", L=32) │
│ │
│ chunk_key = HKDF(content_key, salt=file_id, │
│ info="shieldfive/v1/<suite>/chunk-key", L=32) │
│ │
│ nonce_prefix = HKDF(file_id, salt=absent (32 zeros), │
│ info="shieldfive/v1/<suite>/nonce-prefix", │
│ L = 4 (AES-GCM) or 16 (XChaCha)) │
│ │
│ ──────────────────────────────────────────────────────────────── │
│ │
│ (Phase 3, PQ-hybrid; not yet wired into the web app) │
│ │
│ ml_kem_seed = HKDF(rk, salt=absent (32 zeros), │
│ info="shieldfive/v1/pq-hybrid/ml-kem-1024-seed",│
│ L=64) │
│ (pk_pq, sk_pq) = ML-KEM-1024.KeyGen(ml_kem_seed) │
│ where the noble library splits seed[0..32]=d, seed[32..64]=z │
│ (Finding 2.4 — convention not in spec) │
│ │
│ K (combined) = HKDF(classical_share || pq_share, salt=file_id, │
│ info="shieldfive/v1/pq-hybrid/combine", L=32) │
│ │
│ PQ-hybrid chunk_key = HKDF(K, salt=file_id, │
│ info="shieldfive/v1/xchacha/chunk-key",│
│ L=32) — info reused with XChaCha │
│ │
│ PQ-hybrid nonce_prefix = HKDF(file_id, salt=absent, │
│ info="shieldfive/v1/xchacha/nonce-prefix",│
│ L=16) — info reused with XChaCha │
└─────────────────────────────────────────────────────────────────────┘
Cross-context reachability check ("if I had this one key, what could I derive?"):
- CSK for file A: unlocks file A only. Cannot derive CSK for file B (different random 32 bytes), cannot derive folder_key (CSKs are downstream of folder_keys), cannot derive rk.
- folder_key for folder F: unlocks every CSK in folder F, every child folder_key, and recursively their CSKs. Cannot derive sibling or parent folder_keys (random per folder). Cannot derive rk.
- rk: unlocks everything downstream. Cannot derive the password (PBKDF2 is one-way) or the recovery key (independent random bytes).
- Recovery key: unwraps
rkWrappedByRec, then everything downstream. Cannot derive the password. - Password: unwraps
rkWrappedByUk(via PBKDF2), then everything downstream. Cannot derive the recovery key. - DB-only attacker (no password, no recovery key): sees only wrapped blobs whose keys are not in the DB. No decryption is possible. ✓
- Phase-3 sk_pq (when shipped): unwraps PQ-hybrid suite_payloads encapsulated for this user; via HKDF determinism, sk_pq is recoverable from rk + the seed info string, so leaking sk_pq alone is no worse than leaking the derived PQ key — but holding sk_pq does NOT recover rk (HKDF is one-way).
Findings
Finding 2.1 — Web app uses PBKDF2-SHA-256 (600 000) instead of Argon2id for password-wrapped vault key
Severity: Medium Affects: web (signup, login, change-password, share with custom password) Component: implementation + spec drift
Description. The web app's vault-key wrapper, used at signup, login,
password change, and per-file share-password wrapping, is
web/utils/vaultKeyClient.ts:79-91:
return crypto.subtle.deriveKey(
{
name: 'PBKDF2',
hash: 'SHA-256',
salt: salt as Uint8Array<ArrayBuffer>,
iterations,
},
material,
{ name: 'AES-GCM', length: 256 },
...
)
DEFAULT_ITERATIONS = 600_000
(vaultKeyClient.ts:16),
MINIMUM_SECURE_ITERATIONS = 600_000
(vaultKeyClient.ts:22). The
in-file comment cites OWASP 2023's 600 000-iteration recommendation for
PBKDF2-SHA-256.
In parallel, the crypto library ships a vetted Argon2id implementation
backed by libsodium-sumo
(crypto/src/kdf/argon2id.ts):
'moderate' → opslimit=crypto_pwhash_OPSLIMIT_MODERATE (3 ops),
memlimit=crypto_pwhash_MEMLIMIT_MODERATE (256 MiB)
'sensitive' → 4 ops / 1 GiB
crypto/spec/key-derivation.md's "Master secret" section names Argon2id
as the typical mechanism ("…typically by stretching a user passphrase
through Argon2id…"). The crypto library exports deriveMasterSecret
(crypto/src/kdf/argon2id.ts:85-130).
The web app does not consume this — libsodium is loaded for metadata
encryption (web/utils/metadataClient.ts:1)
but not for the vault-key path.
Why this matters at Medium severity, not Low:
- The vault key (rk) is the root of the entire client-side hierarchy.
rkWrappedByUkand the user's PBKDF2 parameters live in the database (vault_keysrow, served by/api/vault-key); a database breach exposes all of these to an offline attacker. - PBKDF2-SHA-256 is not memory-hard. A modern GPU rig amortizes 600 000 SHA-256 iterations across thousands of guesses per second per GPU; consumer ASICs do better. Argon2id at MODERATE (3 ops, 256 MiB) needs ~256 MiB of high-bandwidth memory per guess, which collapses throughput by roughly two orders of magnitude on the same hardware budget.
- The user-facing impact of switching is bounded (Argon2id MODERATE is
~100-200 ms on contemporary CPUs, comparable to the current PBKDF2
target). The crypto library's helper handles
wrap/unwrapsymmetry, so the migration is straightforward: at next-password-change, storekdf: 'argon2id',argon2_ops,argon2_meminstead ofiterations. - There IS a documented migration path in the file —
kdfanditerationsare already persisted per-record (vaultKeyClient.ts:122-124), andunwrapVaultKeyrejects non-pbkdf2-sha256records explicitly (vaultKeyClient.ts:136-138). Extending the union to acceptargon2idis a small change.
This is not Critical because the password is still the attacker's bottleneck:
- DB-only without password/recovery key cannot decrypt (Critical threshold).
- Password alone DOES recover rk (which is the design, see Step 6).
- Recovery-key alone DOES recover rk.
Reproduction or evidence. Inspect wrapVaultKey and unwrapVaultKey
at web/utils/vaultKeyClient.ts:100-147
and the comment at lines 3-15
documenting "OWASP 2023 recommends 600,000+ iterations for
PBKDF2-SHA256". Cross-reference with
crypto/src/kdf/argon2id.ts:85-130
showing the unused Argon2id helper, and crypto/spec/key-derivation.md
§ "Master secret" naming Argon2id as the typical mechanism.
Recommendation. Migrate the vault-key wrapper to Argon2id, lazily. Concrete plan:
- Extend the
kdfunion invault_keysrow to acceptargon2idand add three columns (or one JSON column) forargon2_ops,argon2_mem, andargon2_salt— the salt is already 16 random bytes perwrapVaultKeyinvocation, so it can be reused. - On password change for accounts with
kdf = 'pbkdf2-sha256', derive the rk via the legacy path, then re-wrap with the Argon2id path and store the new fields. Old PBKDF2 records remain readable indefinitely. - On signup, default to Argon2id MODERATE (3 ops, 256 MiB) — matches the crypto library's default and the spec's intent.
- Add a
shouldUpgradeKdf(currentKdf)analogue to the existingshouldUpgradeKeyIterationsso the UI can nudge legacy-PBKDF2 users to change their password.
Also apply the same change to the share-password wrap
(useSharing.ts:141)
which calls the same wrapVaultKey — this protects shared files'
csk_pw_wrapped with the same KDF as the vault key itself.
Track as a Phase-2 hardening item; do not block on it. The migration is a database schema change (additive) plus a few code-paths, and benefits from being done before Phase 3 because the ML-KEM keypair is derived from rk and any rk-level brute force compromises post-quantum keys trivially too.
Finding 2.2 — Web app implements a flat envelope scheme rather than the spec's HKDF hierarchy; rk serves multiple roles directly
Severity: Medium Affects: web (entire vault), spec Component: spec + implementation drift
Description. crypto/spec/key-derivation.md § "Derivation tree"
specifies:
master_secret
├── envelope_key = HKDF(master, info="shieldfive/v1/envelope-key")
├── metadata_key = HKDF(master, info="shieldfive/v1/metadata-key")
└── ml_kem_seed = HKDF(master, info="shieldfive/v1/pq-hybrid/ml-kem-1024-seed", L=64)
with the closing rule:
"Forbidden cross-context usage. These domain strings are consumed only by the crypto layer. The host application MUST NOT reuse any of them for unrelated purposes."
The web app's actual scheme uses rk (the 32-byte random vault key,
generated by generateVaultKey(),
vaultKeyClient.ts:93-98)
in four distinct contexts without any HKDF derivation:
-
As an AES-GCM key for wrapping top-level folder keys —
useFolderTree.ts:255-258,encryptModal.tsx:443-446:const envelope = await wrapKeyB64({ wrappingKeyB64: rootKey, // raw rk used as AES-GCM key keyToWrapB64: folderKey, })wrapKeyB64(keyCrypto.ts:52-68) passes the b64-decoded bytes directly intocrypto.subtle.importKeyfor AES-GCM. -
As the "secret" input to Argon2id-based metadata-key derivation —
useFolderTree.ts:248:await deriveMetadataKey(rootKey).deriveMetadataKeyitself is a no-op wrapper (metadataClient.ts:124-132); the real derivation is insidegetOrCreateKeyForSalt(metadataClient.ts:270-313) which feedsrootKeyas the password input tosodium.crypto_pwhashArgon2id. -
As a SHA-256-prefix input for the auto-share password —
sharePassword.ts:24-31:const encoded = new TextEncoder().encode( `${AUTO_SHARE_PASSWORD_CONTEXT}${rootKeyB64}`, ) const digest = await subtle.digest('SHA-256', encoded)The base64 form of rk is concatenated with a fixed prefix and hashed; the truncated digest becomes the share-link password.
-
As an HMAC-keyed input for the searchable name hash —
metadataClient.ts:134-145,metadataClient.ts:660-669:subtle.digest('SHA-256', encoder.encode(secret))produces the HMAC key, then HMAC-SHA-256 over the (lowercased) name.
Why this is Medium, not High:
- No concrete attack is enabled. Each consumer uses a different primitive (AES-GCM encryption, Argon2id one-way function, SHA-256 hash, HMAC). Outputs are mutually computationally independent: knowing any one of them does not let an attacker derive rk or any of the others, modulo each primitive's security.
- The spec's
HKDF(master, "...envelope-key")derivation would, in isolation, produce 32 bytes computationally indistinguishable from rk itself given uniformly random rk. Skipping the HKDF step yields the SAME security against an outside attacker; the defense it provides is against future bugs that touch rk in yet-another context and create a collision.
Why this is Medium, not Low:
- An independent implementer reading
key-derivation.mdand the crypto library's README will build a system that the deployed web app cannot interoperate with. The "Forbidden cross-context usage" list includesshieldfive/v1/envelope-keyandshieldfive/v1/metadata-key— but the crypto library never derives them and the web never uses them. - The Phase-3 PQ-hybrid suite WILL bring HKDF-derived per-user keys
into the picture (via
deriveMlKemKeypair(rk),crypto/src/suites/pq-hybrid-v1/index.ts:117-126). The "rk is also used directly for X" pattern doesn't survive contact with that — the more contexts rk participates in, the harder it is to reason about exposure. - The spec is the canonical document for an audit-ready system. Drift between spec and implementation at this layer is itself the finding.
Reproduction or evidence. Read key-derivation.md § "Derivation
tree" and § "Forbidden cross-context usage". Grep for the info strings
in web/:
grep -rn "shieldfive/v1/envelope-key\|shieldfive/v1/metadata-key" web/
returns no hits — those derivations are not performed. Then read the four call sites listed above to confirm rk's direct use.
Recommendation. Two options; both require landing alongside a spec update.
Option A — implement the spec's hierarchy in the web app (preferred for audit readiness, larger change):
- On unlock, derive
envelope_key = HKDF(rk, "shieldfive/v1/envelope-key", 32)andmetadata_key_seed = HKDF(rk, "shieldfive/v1/metadata-key", 32). Keep both in memory alongside rk for the session. - Replace
wrappingKeyB64: rootKeywithwrappingKeyB64: envelopeKeyin the four call sites above. Top-level folder keys re-wrap during a one-shot migration. - Replace
deriveMetadataKey(rootKey)withderiveMetadataKey(metadata_key_seed)for top-level metadata; folder metadata continues to use the parent folder_key. - The share-password derivation already uses a domain-prefixed SHA-256, which is acceptable; consider switching to HKDF for consistency.
- The searchable-hash MAC key should be a separate HKDF derivation
(
HKDF(rk, "shieldfive/v1/name-hash-mac", 32)) — this is currently ad-hoc (SHA-256 of rk as the HMAC key).
Option B — update the spec to describe what ships (smaller change, weaker outcome):
- Rewrite
key-derivation.md§ "Derivation tree" to say "the application's master secret is used directly as the envelope/wrap key; the application MAY domain-separate via HKDF but is not required to. The library'senvelope_key/metadata_keyinfo strings are reserved for future use." - Remove
shieldfive/v1/envelope-keyandshieldfive/v1/metadata-keyfrom the "Forbidden cross-context usage" list, or move them to a separate "Reserved for future use" subsection.
Option A is the right answer for Phase 3 launch credibility: an external
auditor will ask "where is envelope_key in the code" and the answer
should be a grep hit, not a git log apology. Option B is acceptable as
a stopgap if Option A has to wait, but it makes the spec
weaker, not stronger.
Land alongside a test vector pinning HKDF(rk, "...envelope-key", 32)
output for rk = zeros(32) so future implementers can self-check.
Finding 2.3 — Web app uses Argon2id INTERACTIVE on already-high-entropy inputs (metadata key path)
Severity: Low Affects: web (filename / folder-name encryption) Component: implementation
Description. web/utils/metadataClient.ts's
getOrCreateKeyForSalt (lines 270-313)
runs the caller's secret through Argon2id at libsodium's INTERACTIVE
preset:
const params =
level === 'interactive'
? {
ops: sodium.crypto_pwhash_OPSLIMIT_INTERACTIVE,
mem: sodium.crypto_pwhash_MEMLIMIT_INTERACTIVE,
}
: { ops: …OPSLIMIT_MODERATE, mem: …MEMLIMIT_MODERATE }
The secret argument, in every call site grepped, is a 32-byte random
key in base64 form: the rk (useFolderTree.ts:248,
useFiles.ts:337, 407), or a folder_key
(encryptModal.tsx:432, useFolderTree.ts:122, 410, 485), or a share
password (useSharing.ts:112 — this one IS low-entropy and the path is
appropriate).
When the input is already 32 bytes of CSPRNG output, Argon2id buys
nothing over HKDF — there is no password to stretch. The
key-stretching work happens on every file/folder load (no cache across
sessions because the per-blob salt is random and the cache is in-memory
on the MetadataKeyHandle). On a mobile device this is the dominant
cost in folder listing latency.
The level is INTERACTIVE explicitly "for cross-device compatibility" (comment at line 295) — i.e., to avoid 256 MiB allocations on phones. INTERACTIVE is libsodium's MIN (2 ops, 64 MiB). For a low-entropy password this would be borderline; for a 32-byte random key it is wasted work.
This is filed Low, not Medium, because:
- The output is still a 256-bit key under the security of Argon2id, the per-blob salt prevents rainbow-table attacks, and the encrypted-output itself is AES-GCM with random IV. No confidentiality loss.
- Replacing Argon2id with HKDF here is purely a performance/clarity improvement, not a security fix.
Reproduction or evidence. Read the four call sites cited above.
Confirm that the secret parameter is either rootKey (32 random
bytes, base64) or folderKey (32 random bytes, base64). The Argon2id
path is unreachable from a low-entropy password input except via the
share-password code path
(useSharing.ts:112).
Recommendation. Split the metadata-key derivation by input type:
- For folder-keyed metadata (filename, folder name within a known
folder): replace Argon2id with
HKDF(folderKey, salt=<16B per blob>, info="shieldfive/v1/metadata-key/folder", 32). Fast, no allocation. - For share-password-keyed metadata (the share name shown to the download-link recipient): keep Argon2id because the secret is a user-chosen password.
Apply a one-shot migration for existing folder-keyed metadata blobs; the
V4 record format already carries kdf in the payload
(metadataClient.ts:30-37),
so adding kdf: 'hkdf-sha256' is additive. Land with decryptMetadataClient
extended to handle both KDFs.
Pairs naturally with Finding 2.2's Option A — the metadata key for non-folder-scoped uses (filename hash, name search) would then be a proper HKDF derivation from rk.
Finding 2.4 — Spec underspecifies the ML-KEM-1024 seed (d, z) split convention
Severity: Low (Phase 3 prerequisite) Affects: spec Component: spec
Description. crypto/spec/key-derivation.md § "Derivation tree":
ml_kem_seed = HKDF(master, info="shieldfive/v1/pq-hybrid/ml-kem-1024-seed", L=64)
└── (ml_kem_pk, ml_kem_sk) = ML-KEM-1024.KeyGen(ml_kem_seed)
The implementation
(crypto/src/suites/pq-hybrid-v1/index.ts:117-126):
const seed = await hkdfSha256({
ikm: masterSecret,
info: 'shieldfive/v1/pq-hybrid/ml-kem-1024-seed',
length: 64,
})
return ml_kem1024.keygen(seed)
passes the 64-byte seed to @noble/post-quantum's ml_kem1024.keygen,
which splits it as d = seed.subarray(0, 32), z = seed.subarray(32)
(node_modules/@noble/post-quantum/src/ml-kem.ts:281-289):
keygen: (seed = randomBytes(seedLen)) => {
abytes(seed, seedLen, 'seed');
const { publicKey, secretKey: sk } = KPKE.keygen(seed.subarray(0, 32));
const publicKeyHash = HASH256(publicKey);
// (dkPKE||ek||H(ek)||z)
const secretKey = secretCoder.encode([sk, publicKey, publicKeyHash, seed.subarray(32)]);
...
}
That is, d = seed[0..32] is used as the K-PKE keygen seed (FIPS 203 §
Algorithm 16 input), and z = seed[32..64] is the implicit-rejection
seed concatenated into the secret-key blob.
The spec does not say this. An independent implementer using
crystals-kyber-js's generateKeyPairDerand(d, z) API, or a Python
binding, or a future noble version that changes the seed handling, has
no spec-level evidence for which 32-byte half to feed where. The
deterministic test vectors in tests/vectors/key-derivation.json
(referenced by key-derivation.md § "Test vectors") will catch a wrong
choice — but only if the implementer thinks to consult them, and only
post-implementation.
The two related concerns:
- Spec correctness. A specification that fully determines its outputs is the bar for an audit-ready system. The current spec is one line short of meeting it.
- Phase 3 readiness. PQ-hybrid is the default suite for new files
per
format-v1.md§ "Suite payloads / 0x03 (default)". When Phase 3 ships, identity-bundle interop with any reimplementation (mobile future port, a third-party client) becomes a deployment-blocking concern.
There is a related Task-2-instructions point about whether d and z
should be HKDF-derived with separate info strings rather than via a
single 64-byte expansion. By RFC 5869, HKDF-Expand with L=64 produces
T(1) || T(2)[:32] where T(1) = HMAC(PRK, info || 0x01) and
T(2) = HMAC(PRK, T(1) || info || 0x02) — i.e., d = T(1) and
z = T(2) are computationally indistinguishable from independent random
outputs under HKDF's PRF assumption. The single-expansion approach is
standard practice (Signal Protocol, NaCl, many KEM implementations) and
is not a finding. The spec gap is the split convention, not the
single-vs-double expansion.
Reproduction or evidence. Read key-derivation.md § "Derivation
tree". Independently, read FIPS 203 § Algorithm 16 (ML-KEM.KeyGen) — it
takes (d, z) as two 32-byte inputs without prescribing an ordering or
combined-seed convention. Read
crypto/node_modules/@noble/post-quantum/src/ml-kem.ts:281-289
to see the noble library's choice.
Recommendation. Add one paragraph to key-derivation.md after the
derivation-tree code block:
"The 64-byte
ml_kem_seedis split intod = ml_kem_seed[0..32]andz = ml_kem_seed[32..64], withdconsumed as ML-KEM-1024.KeyGen's random-coins input (FIPS 203 § Algorithm 16, parameterd) andzstored as the implicit-rejection seed (FIPS 203 § Algorithm 17, parameterz). Implementers using a library that exposes only a combined-seed API (e.g.,KeyGen(seed)) MUST verify that the library matches this split convention; the deterministic test vectors attests/vectors/key-derivation.jsonpin the expectedpk[0..8]/sk[0..8]bytes formaster_secret = zeros(32)."
Land alongside a test vector for master_secret = zeros(32) →
ml_kem_seed[0..16], pk[0..16], sk[0..16] so a third-party
implementation can self-check independently of the noble library
version. Status: Open, Phase 3 prerequisite.
Finding 2.5 — Two HKDF info strings are constructed inline at call sites, contradicting the library's own invariant
Severity: Low Affects: crypto library Component: implementation
Description. crypto/src/internal/hkdf.ts:2-7
states:
"Every domain-separated key the library derives from another key flows through this function. Domain strings are constants in
internal/types.ts— never construct aninfoparameter ad-hoc at the call site."
Eight of the library's ten HKDF call sites do follow this — they use the
HKDF_INFO constants from
crypto/src/internal/types.ts:78-86.
Two do not:
crypto/src/suites/pq-hybrid-v1/index.ts:122:info: 'shieldfive/v1/pq-hybrid/ml-kem-1024-seed',crypto/src/kdf/argon2id.ts:106:info: 'shieldfive/v1/argon2id/salt-compression',
Both strings are legitimate purpose-named info strings; the security
property they provide (domain separation from other HKDF calls) is
preserved. The issue is the documented invariant — a future contributor
who reads hkdf.ts's docstring and then needs to add a new HKDF
derivation has to choose between "follow the convention" and "follow
the precedent." That ambiguity is how typos become collisions.
Reproduction or evidence. Grep the library for hkdfSha256 and
filter to non-HKDF_INFO call sites:
grep -rn "hkdfSha256\b" crypto/src --include="*.ts" \
| grep -v "HKDF_INFO" | grep -v "test"
Yields the two call sites above. Cross-reference with HKDF_INFO in
types.ts to confirm the absence.
Recommendation. Add the two strings to HKDF_INFO:
export const HKDF_INFO = Object.freeze({
HEADER_MAC: 'shieldfive/v1/header-mac',
AES_GCM_CHUNK_KEY: 'shieldfive/v1/aes-gcm/chunk-key',
AES_GCM_NONCE_PREFIX: 'shieldfive/v1/aes-gcm/nonce-prefix',
XCHACHA_CHUNK_KEY: 'shieldfive/v1/xchacha/chunk-key',
XCHACHA_NONCE_PREFIX: 'shieldfive/v1/xchacha/nonce-prefix',
PQ_HYBRID_COMBINE: 'shieldfive/v1/pq-hybrid/combine',
ML_KEM_1024_SEED: 'shieldfive/v1/pq-hybrid/ml-kem-1024-seed', // new
ARGON2ID_SALT_COMPRESSION: 'shieldfive/v1/argon2id/salt-compression', // new
})
Update the two call sites to reference HKDF_INFO.ML_KEM_1024_SEED and
HKDF_INFO.ARGON2ID_SALT_COMPRESSION. No behavior change.
Finding 2.6 — Spec's "Forbidden cross-context usage" list omits shieldfive/v1/argon2id/salt-compression
Severity: Low Affects: spec Component: spec
Description. crypto/spec/key-derivation.md § "Forbidden
cross-context usage" enumerates nine info strings:
shieldfive/v1/header-mac
shieldfive/v1/aes-gcm/chunk-key
shieldfive/v1/aes-gcm/nonce-prefix
shieldfive/v1/xchacha/chunk-key
shieldfive/v1/xchacha/nonce-prefix
shieldfive/v1/pq-hybrid/combine
shieldfive/v1/pq-hybrid/ml-kem-1024-seed
shieldfive/v1/envelope-key
shieldfive/v1/metadata-key
The library uses a tenth string,
shieldfive/v1/argon2id/salt-compression
(crypto/src/kdf/argon2id.ts:106),
to HKDF-compress an oversize Argon2id salt down to 16 bytes. This is not
in the forbidden list. A host application could legitimately read the
spec, see it's not reserved, and reuse the string — with no
cross-context-reuse consequence today (host apps don't typically run
through deriveMasterSecret), but the list is the canonical reference
for which strings the crypto layer claims.
Reproduction or evidence. Diff the strings used in
crypto/src/ HKDF calls against the spec's list.
Recommendation. Add to the spec's forbidden-list block. Same edit satisfies the analogous gap if Finding 2.5's HKDF_INFO additions land (both new constants should be in the spec list).
While editing, consider whether shieldfive/v1/envelope-key and
shieldfive/v1/metadata-key belong on this list at all — they describe
derivations the library never performs (see Finding 2.2). Either move
them to a "Reserved for future use" subsection or remove them.
Finding 2.7 — Spec doesn't document that PQ-hybrid intentionally reuses the XChaCha HKDF info strings
Severity: Informational Affects: v1 (PQ-hybrid suite), spec Component: spec
Description. The PQ-hybrid suite's chunk-key and nonce-prefix
derivations reuse the XChaCha suite's HKDF info strings, with the
combined key K standing in for the content key:
crypto/src/suites/pq-hybrid-v1/index.ts:166-176:
// Reuse the xchacha derivation domain — semantically the same chunk-key role.
const chunkKey = await hkdfSha256({
ikm: combinedKey,
salt: fileId,
info: HKDF_INFO.XCHACHA_CHUNK_KEY, // "shieldfive/v1/xchacha/chunk-key"
length: 32,
})
const noncePrefix = await hkdfSha256({
ikm: fileId,
info: HKDF_INFO.XCHACHA_NONCE_PREFIX, // "shieldfive/v1/xchacha/nonce-prefix"
length: 16,
})
key-derivation.md § "Per-file key derivation" describes this
obliquely:
"For the PQ-hybrid suite,
content_keyis replaced by the combined keyK = HKDF(classical_share || pq_share, salt=file_id, info="shieldfive/v1/pq-hybrid/combine")."
This is correct but does not explicitly say "the chunk-key and
nonce-prefix info strings are the same as the XChaCha suite's, with K
substituted for content_key." An implementer might reasonably expect a
fresh info string of the form shieldfive/v1/pq-hybrid/chunk-key.
The reuse is cryptographically sound. The two callers — XChaCha-v1 with
ikm = contentKey (per-file random 32 bytes), and PQ-hybrid with
ikm = combinedKey (HKDF output, per-file 32 bytes via a different
input space) — produce identical HKDF outputs only when the IKMs
collide, which has 2⁻²⁵⁶ probability and is not exploitable. Both IKMs
are 256-bit uniformly random / pseudorandom under HKDF.
But the spec should say so explicitly. An auditor reading the spec at face value will flag the same info string appearing for two distinct suites; the answer "this is intentional and safe because the IKMs are fresh per file" is correct but should not be a verbal answer.
Reproduction or evidence. Read key-derivation.md § "Per-file key
derivation". Read
crypto/src/suites/pq-hybrid-v1/index.ts:163-178
including the inline comment at line 165.
Recommendation. Replace the spec paragraph with:
"For the PQ-hybrid suite,
content_keyis replaced by the combined keyK = HKDF(classical_share || pq_share, salt=file_id, info="shieldfive/v1/pq-hybrid/combine", L=32). The chunk-key and nonce-prefix derivations are then identical to the XChaCha suite's, withKin the role ofcontent_keyand the same info strings (shieldfive/v1/xchacha/chunk-keyandshieldfive/v1/xchacha/nonce-prefix). Reuse of those info strings across the two suites is safe because the IKM space is partitioned: XChaCha uses a fresh randomcontent_key, PQ-hybrid uses the file-bound HKDF outputK; both are 32 bytes of uniformly random / pseudorandom material per file, so the probability of cross-suite output collision is 2⁻²⁵⁶."
No code change.
Finding 2.8 — Salt-absent convention inherits the Task-1 Finding 1.2 ambiguity across all HKDF call sites
Severity: Informational Affects: crypto library + spec Component: spec
Description. key-derivation.md § "Derivation tree" states "Salt is
empty unless otherwise stated." Task 1 Finding 1.2 flagged the same
phrasing in format-v1.md for nonce-prefix derivations: "empty" is
ambiguous between RFC 5869's "absent salt" (HashLen zeros) and a literal
zero-length string, with the implementation using the former
(crypto/src/internal/hkdf.ts:20):
const salt = options.salt ?? new Uint8Array(32) // SHA-256 zero-salt fallback
key-derivation.md inherits this ambiguity. The affected derivations
(salt absent ⇒ 32 zero bytes in implementation):
envelope_key(spec only — unimplemented)metadata_key(spec only — unimplemented)ml_kem_seed(pq-hybrid-v1/index.ts:120-124)- nonce prefixes for both XChaCha and AES-GCM (already filed in Task 1 Finding 1.2)
Not re-filing as a fresh finding — it is the same root issue, just appearing in a different spec file. Cross-referencing here for completeness.
Recommendation. When fixing Task 1 Finding 1.2, also update
key-derivation.md § "Derivation tree" preamble:
"All derivations are HKDF-SHA-256. The
infostring is a printable domain separator. When the salt is omitted in the definitions below, it iszeros(32)per RFC 5869's 'absent salt' convention. Keys are 32 bytes unless otherwise stated."
Same test-vector posture as Task 1 Finding 1.2 — the existing test vectors pin the convention if any new derivation is added.
What I checked but did not find issues with
The following areas were reviewed against both spec and implementation; no drift was found or, where flagged, severity was Informational and the property held.
Argon2id parameters (crypto library)
- Master-secret output length is 32 bytes
(
argon2id.ts:32), matching the spec. - Salt length is enforced at ≥16 bytes; oversize salts are
HKDF-compressed to 16 bytes via a domain-separated info string
(
argon2id.ts:94-108). The compression is sound: HKDF-SHA-256 on a salt input produces a uniformly random 16-byte output, preserving Argon2id's salt-space size. - Empty passphrase is rejected
(
argon2id.ts:90-92). verifyPassphraseuses constant-time comparison (argon2id.ts:158-163).- The
'interactive'preset is deliberately not exposed (argon2id.ts:18-29). - MODERATE preset: 3 ops, 256 MiB (libsodium's
crypto_pwhash_OPSLIMIT_MODERATE/crypto_pwhash_MEMLIMIT_MODERATE). Above OWASP 2024 minimum for Argon2id (m=19 MiB, t=2). - SENSITIVE preset: 4 ops, 1 GiB. Above OWASP 2024 first-tier (m=46 MiB, t=1).
- The chosen preset is persisted in the output (
preset: Argon2idPreset,argon2id.ts:73-75) so re-derivation works without parameter ambiguity. - The implementation uses
sodium.crypto_pwhash_ALG_ARGON2ID13explicitly (argon2id.ts:122), pinning the Argon2id variant.
HKDF call sites (complete table, ten sites in crypto library)
| # | File:line | IKM | Salt | Info string | L (bytes) | Output role |
|---|---|---|---|---|---|---|
| 1 | format/header.ts:76 | contentKey (32B) | fileId (16B) | shieldfive/v1/header-mac | 32 | header MAC key |
| 2 | kdf/argon2id.ts:104 | oversize salt | absent (32 zeros) | shieldfive/v1/argon2id/salt-compression | 16 | Argon2id 16-byte salt |
| 3 | suites/aes-gcm-v1/index.ts:68 | contentKey (32B) | fileId (16B) | shieldfive/v1/aes-gcm/chunk-key | 32 | AES-GCM chunk key |
| 4 | suites/aes-gcm-v1/index.ts:80 | fileId (16B) | absent (32 zeros) | shieldfive/v1/aes-gcm/nonce-prefix | 4 | AES-GCM nonce prefix |
| 5 | suites/pq-hybrid-v1/index.ts:120 | masterSecret (32B) | absent (32 zeros) | shieldfive/v1/pq-hybrid/ml-kem-1024-seed | 64 | ML-KEM keygen seed (d‖z) |
| 6 | suites/pq-hybrid-v1/index.ts:146 | classical‖pq (64B) | fileId (16B) | shieldfive/v1/pq-hybrid/combine | 32 | combined key K |
| 7 | suites/pq-hybrid-v1/index.ts:166 | combinedKey K (32B) | fileId (16B) | shieldfive/v1/xchacha/chunk-key | 32 | PQ-hybrid chunk key (info reused with #9 — Finding 2.7) |
| 8 | suites/pq-hybrid-v1/index.ts:172 | fileId (16B) | absent (32 zeros) | shieldfive/v1/xchacha/nonce-prefix | 16 | PQ-hybrid nonce prefix (info reused with #10 — Finding 2.7) |
| 9 | suites/xchacha-v1/index.ts:65 | contentKey (32B) | fileId (16B) | shieldfive/v1/xchacha/chunk-key | 32 | XChaCha chunk key |
| 10 | suites/xchacha-v1/index.ts:75 | fileId (16B) | absent (32 zeros) | shieldfive/v1/xchacha/nonce-prefix | 16 | XChaCha nonce prefix |
Verified:
- No accidental info-string collisions. The only repeated info strings are #7↔#9 and #8↔#10, both intentional (Finding 2.7), and the IKMs are partitioned by suite-specific derivation (combinedKey K is a HKDF output, contentKey is fresh random; collision probability 2⁻²⁵⁶).
- No cross-version collisions. All info strings carry the
v1segment; the library has nov0HKDF derivations (v0 uses random IV prefix, no key derivation). - Domain separators are version + suite + purpose in every case.
- No cross-context reuse of
file_idsalt across keys. When fileId is the salt (HKDF-Extract input), it varies per file and is paired with different info strings per output; HKDF-Extract's PRF behavior partitions the outputs. - No HKDF call uses the master_secret as IKM AND a low-entropy salt together (sites 5 and 2 have absent salt = 32 zeros, which is the RFC 5869 default behavior and does not weaken the construction).
- Output lengths are appropriate to their use (32 bytes for keys, 4 for AES-GCM nonce prefix, 16 for XChaCha nonce prefix, 16 for Argon2id salt, 64 for ML-KEM seed).
ML-KEM-1024 keypair derivation
- Determinism verified:
deriveMlKemKeypair(zeros(32))produces the same(publicKey, secretKey)on every invocation by reading the implementation (no internal randomness past the HKDF expansion;ml_kem1024.keygen(seed)is a pure function of the seed per FIPS 203 Algorithm 16). - Domain separation from the other rk-derived keys: the info string
shieldfive/v1/pq-hybrid/ml-kem-1024-seedis unique inHKDF_INFO(or would be once Finding 2.5 lands). - Output length 64 bytes matches the noble library's
seedLen = 64contract (node_modules/@noble/post-quantum/src/ml-kem.ts:276). - ML-KEM-1024 lengths match FIPS 203 (1568 public key, 3168 secret key,
1568 ciphertext) — see
pq-hybrid-v1/index.ts:36-41. - The single-HKDF-then-split convention is filed as Finding 2.4 (spec
drift), not as a derivation defect — both
dandzARE HKDF-derived from the master secret, just from the same expansion. Standard practice and cryptographically sound. - The classical share
S_cfor the PQ-hybrid suite is generated asrandomBytes(32)per file (pq-hybrid-v1/index.ts:258) — fresh per encryption, not derived.
Content-key envelope (web app)
- CSK generation:
crypto.getRandomValues(new Uint8Array(32))viagenerateKeyB64(keyCrypto.ts:22-26). 256-bit uniformly random. - file_id generation:
randomBytes(16)from the crypto library'sruntime.ts(under the hoodcrypto.getRandomValues); see also the crypto library'sgenerateContentMaterial(aes-gcm-v1/index.ts:112-120). 16-byte random ID is sufficient for collision resistance over a user's vault. - CSK wrap algorithm: AES-GCM with random 12-byte IV stored alongside
(
keyCrypto.ts:52-68). IV is fresh per wrap (called once per file at upload). - CSK wrap IV is
crypto.getRandomValues(new Uint8Array(12)); with 2⁹⁶ IV space and per-wrap freshness, AES-GCM IV reuse is not a concern at this layer. - Password-wrapped CSKs (for shared files) use the same
wrapVaultKey→ PBKDF2-SHA-256 600 000 iter path as the vault key itself (useSharing.ts:141). Consistent with the vault path; Finding 2.1's Argon2id migration recommendation covers this too. - Folder-key wrap chain: each folder_key is wrapped by its parent's folder_key (or rk for top-level); fk_iv is random per wrap; no cross-folder reuse of the wrapped form is possible.
- The wrap operations do not bind a per-file or per-folder AAD
(
wrapKeyB64calls AES-GCM withoutadditionalData). This is benign in practice — a swappedcsk_wrappedwould unwrap to a CSK that does not match the on-disk file'sfile_id-bound chunk-key derivation, so decryption would fail at the chunk layer. The structural binding lives one layer down. - Recovery-key generation:
generateVaultKey()= 32 random bytes fromcrypto.getRandomValues. 256 bits of entropy, far above the 128-bit threshold. The user is shown the recovery key once and instructed to back it up.
Master-secret lifecycle (Step 6 critical questions)
The three critical recoverability questions from the Task 2 instructions, evaluated against the deployed web flow:
- Database alone (no password, no recovery key) → can attacker decrypt?
No. The DB has
rkWrappedByUk(AES-GCM-wrapped under PBKDF2(password)),rkWrappedByRec(AES-GCM-wrapped under recovery key), wrapped folder keys, wrapped CSKs, and Argon2id-encrypted metadata. None of the wrapping keys are in the DB; all are downstream of either the password or the recovery key. ✓ - Password alone (no recovery key) → full access?
Yes (as designed).
unwrapVaultKey({password, ...})derives the AES-GCM key via PBKDF2 and unwrapsrkWrappedByUkto recover rk; everything downstream then unwraps. ✓ - Recovery key alone (no password) → full access?
Yes (as designed).
resetWithRecoverycallsunwrapRootKeyWithRecovery(recoveryKey, rkWrappedByRec, recIv)(keyring.ts:293-305) and unwrapsrkWrappedByRecdirectly. Resetting via recovery key also re-wraps under a new password. ✓
Additional checks:
- The recovery-key path is independent of the password path: both wrap the same rk but with different keys and IVs, so compromising one does not leak the other.
- Regenerating the recovery key
(
keyring.ts:613-629) re-wraps rk under a fresh recovery key without changing rk itself — files remain readable. - Changing the password
(
keyring.ts:541-577) re-wraps rk under a fresh PBKDF2(newPassword) key; rk itself is not rotated. No file re-wrap is required.
Spec-vs-implementation cross-checks
- Every HKDF info string in code matches its spec definition byte-for-byte (HKDF_INFO constants vs. spec § "Derivation tree" and § "Forbidden cross-context usage", modulo the gap in Finding 2.6).
content_keyis fresh per file (web:generateKeyB64(); crypto:generateContentMaterial()), never derived. Matches spec.header_mac_key,chunk_key,nonce_prefixuse the salt and info pattern documented inkey-derivation.md§ "Per-file key derivation".- For PQ-hybrid, the combined key
Kis computed asHKDF(S_c‖S_pq, salt=file_id, info="shieldfive/v1/pq-hybrid/combine")— matches spec.
Out of scope (deferred to other tasks)
- Wire format / AAD construction — Task 1 (filed).
- AEAD nonce derivation per-chunk (uniqueness, cross-file, cross-key rotation) — Task 4.
- Streaming worker key handling beyond derivation (key zeroization, race conditions, worker isolation) — Task 5.
- Server-side and web-application surfaces (share handling, service-role boundary, RLS, database constraints, RPC authorization, CSP, in-memory key handling — sessionStorage encrypted blob, IndexedDB wrap-key, V2 fallback for Safari, page-close behavior — session / auth flow tying the unlock to the server token, recovery-key UX, logging) are out of crypto-layer scope and covered by the separate server-side review, kept private.
- Threat-model coverage — Task 3.
References
crypto/spec/key-derivation.mdcrypto/spec/format-v1.md(cross-references for nonce-prefix derivation, header MAC keying)@shieldfive/crypto1.0.0-alpha.3, repo HEADa5e99accrypto/src/kdf/argon2id.tscrypto/src/internal/hkdf.tscrypto/src/internal/hmac.tscrypto/src/internal/types.ts(HKDF_INFO constants)crypto/src/identity/index.tscrypto/src/format/header.ts(header_mac_key derivation)crypto/src/suites/aes-gcm-v1/index.tscrypto/src/suites/xchacha-v1/index.tscrypto/src/suites/pq-hybrid-v1/index.tscrypto/node_modules/@noble/post-quantum/src/ml-kem.ts(seed split convention)
- web app, repo HEAD
bfa41a57web/utils/keyring.tsweb/utils/keyCrypto.tsweb/utils/folderKeyCache.tsweb/utils/vaultKeyClient.tsweb/utils/metadataClient.tsweb/utils/sharePassword.tsweb/utils/keyHierarchyMoves.tsweb/utils/wrappingKeyStore.tsweb/app/files/hooks/useFolderTree.ts(rootKey usage, top-level folder wrap)web/app/files/hooks/useSharing.ts(share password wrap of CSK)web/app/files/components/modals/encryptModal.tsx(CSK generation + folder-key wrap on upload)
- OWASP Password Storage Cheat Sheet (2024 revision) — Argon2id and PBKDF2 parameter recommendations referenced for Findings 2.1 and 2.3.
- FIPS 203 (ML-KEM) — Algorithms 16/17 referenced for Finding 2.4's
(d, z)semantics.