Improve diskann-disk test coverage#1193
Conversation
Add 18 targeted unit tests covering previously untested error paths, control flow branches, and pure functions across 7 files. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1193 +/- ##
==========================================
+ Coverage 89.95% 90.03% +0.07%
==========================================
Files 489 489
Lines 93127 93417 +290
==========================================
+ Hits 83772 84105 +333
+ Misses 9355 9312 -43
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
This PR increases unit test coverage in the diskann-disk crate by adding targeted tests for previously untested branches and error paths across utilities, PQ scratch handling, sector graph reconfiguration, quantization generator validation, continuation helpers, and checkpoint record management.
Changes:
- Add unit tests for partition-count estimation edge cases (clamping, odd rounding, k_base multiplier).
- Add tests covering error-path and basic-success behavior in k-means clustering, PQScratch query validation, quant generator parameter validation, sector graph reconfiguration/defaulting, continuation processing behavior, and checkpoint invalidation.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| diskann-disk/src/utils/partition.rs | Adds tests for estimate_initial_partition_count edge cases. |
| diskann-disk/src/utils/kmeans.rs | Adds tests for successful clustering output shape and cancellation error path. |
| diskann-disk/src/storage/quant/generator.rs | Adds test for missing compressed-file validation when resuming with offset. |
| diskann-disk/src/search/provider/disk_sector_graph.rs | Adds tests for reconfigure growth/no-op and block-size defaulting. |
| diskann-disk/src/search/pq/pq_scratch.rs | Adds tests for rejecting undersized queries and accepting oversized queries. |
| diskann-disk/src/build/chunking/continuation/utils.rs | Adds tests for early stop, yield-then-continue, and error propagation (sync/async). |
| diskann-disk/src/build/chunking/checkpoint/checkpoint_record_manager_with_file.rs | Adds tests for fresh-state behavior, missing-file completion state, and invalidation semantics. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
…crosoft/DiskANN into u/arrayka/diskann-disk-tests3
| #[test] | ||
| fn test_pq_scratch_set_accepts_oversized_query() { | ||
| let dim = 8; | ||
| let mut pq_scratch = PQScratch::new(64, dim, 4, 256).unwrap(); | ||
|
|
||
| // Query longer than dim should succeed (only first `dim` elements used) | ||
| let long_query: Vec<f32> = (1..=dim + 10).map(|i| i as f32).collect(); | ||
| pq_scratch.set(&long_query).unwrap(); | ||
|
|
||
| for (i, &val) in long_query.iter().enumerate().take(dim) { | ||
| assert_eq!(pq_scratch.query_scratch[i], val); | ||
| } | ||
| } |
There was a problem hiding this comment.
Is this expected behavior? Shouldn't the scratch fail for an incorrectly sized query?
There was a problem hiding this comment.
Yes, it is the expected behavior, according to the documentation of pq_scratch.set():
/// Copy the first `dim` elements of `query` into `query_scratch`.
///
/// `query` must already be in full-precision `f32` representation; quantized
/// inputs (e.g. `MinMaxElement`) should be decoded via `VectorRepr::as_f32`
/// at the caller boundary before invoking this method.
///
/// Accepts oversized `query` (only the first `dim` elements are used) for
/// backwards compatibility with callers that hold alignment-padded buffers.
/// Returns `DimensionMismatchError` if `query.len() < query_scratch.len()`.
pub fn set(&mut self, query: &[f32]) -> ANNResult<()>
There was a problem hiding this comment.
I followed the chain of calls to set() and new() and I don't think supporting larger input query dimension is actually needed here (the scratch length is derived from the dimension of the fp vectors in the PQ table, which is the same as the query dimension).
Can we fix this since we've found it? The reason I'm asking is cause this is actually incorrect behavior; I'm not actually sure how this ended up getting supported (It's probably my fault!). The f32 dimension is not larger than the dimension of minmax vectors, it's actually smaller :)
Adds 18 targeted unit tests to the
diskann-diskcrate covering previouslyuntested error paths, control flow branches, and pure functions.