Skip to content

(diversity) Refactor attribute diversity to use id trait#1186

Draft
narendatha wants to merge 64 commits into
mainfrom
u/narendatha/attribute_diversity_refactor
Draft

(diversity) Refactor attribute diversity to use id trait#1186
narendatha wants to merge 64 commits into
mainfrom
u/narendatha/attribute_diversity_refactor

Conversation

@narendatha

Copy link
Copy Markdown
Contributor

No description provided.

Mark Hildebrand and others added 30 commits April 28, 2026 14:47
Replace kind()-based string equality checks with explicit is_match()
and get() phase-shape helpers on plugin structs. This avoids fragile
ordering assumptions and makes each plugin responsible for recognising
its own phase shape.
Co-authored-by: Copilot <copilot@github.com>
…plugins

# Conflicts:
#	diskann-benchmark/src/backend/index/benchmarks.rs
#	diskann-benchmark/src/backend/index/product.rs
#	diskann-benchmark/src/backend/index/scalar.rs
#	diskann-benchmark/src/backend/index/search/plugins.rs
#	diskann-benchmark/src/backend/index/spherical.rs
#	diskann-benchmark/src/inputs/graph_index.rs
#	diskann-benchmark/src/inputs/mod.rs
Co-authored-by: Copilot <copilot@github.com>
…ojection issue

This commit explores approaches to wire real candidate vectors into async determinant-diversity post-processing.

Current state: IN COMPILATION ERROR (intentional for analysis)

Attempted approaches:
1. Initial shim-trait FullPrecisionVectorAccessor with async get_full_precision_vector()
   - Resulted in 'implementation not general enough' at search_with() call

2. Removed explicit for<'a> post_processor::DeterminantDiversity bound
   - Still fails - the constraint is inherent in search_with() signature itself

Root cause analysis:
- search_with() requires: PP: for<'a> SearchPostProcess<S::SearchAccessor<'a>, T, O>
- This means post-processor must work for ANY accessor lifetime 'a
- But query = queries.row(query_idx) is borrowed for specific loop iteration lifetime
- These are fundamentally incompatible - a borrowed value can't satisfy for<'a> generically

Compiler errors (3 total):
- 'not general enough': implementation needed for or<'a> but found specific '0
- 'does not live long enough': queries lifetime too short for 'static requirement

Files modified:
- diskann-benchmark/src/backend/index/benchmarks.rs:
  * Removed explicit for<'a> post_processor::DeterminantDiversity constraint
  * Narrowed plugin impl to FullPrecisionProvider<f32>

- diskann-benchmark/src/backend/index/post_processor/determinant_diversity.rs:
  * Added shim trait FullPrecisionVectorAccessor
  * Async method get_full_precision_vector(&mut self, id) -> impl Future<...>

Next steps to investigate:
- Move determinant-diversity outside search_with() as post-processing reranking
- This avoids HRTB entirely by applying after candidates are returned
- Benchmark impact: measure recall/QPS with external reranking vs baseline

Related context:
- Disk index determinant-diversity works correctly (uses real vectors, shows 51-53% QPS cost)
- Shared algorithm fixed (distance-to-similarity scoring direction)
- Branch already merged with origin/main
Co-authored-by: Copilot <copilot@github.com>
…educe duplication

- Use for<'a, 'b> SearchStrategy bound (user-provided fix) to break HRTB lifetime
  projection issue in the search_with post-processor constraint
- Wire FullPrecisionVectorAccessor shim trait so async det-div post-processor fetches
  real candidate vectors instead of placeholder distances
- Populate QPS/latency metrics in async det-div benchmark path (previously all 'missing')
- Extract run_topk_timed helper to eliminate ~100 lines of duplicated loop/timing/recall
  machinery from DeterminantDiversity::run
- Update async-determinant-diversity.json example tag (async-index-build -> graph-index-build)
- Fix clippy::manual_async_fn in FullPrecisionVectorAccessor shim trait
- Use for<'a, 'b> SearchStrategy bound to resolve HRTB lifetime mismatch
- Wire FullPrecisionVectorAccessor shim trait for async det-div to fetch real vectors
- Implement DeterminantDiversity post-processor for async graph-index path
- Extract run_topk_timed helper to eliminate ~100 lines of code duplication
- Wire post_processor parameter to disk-index search pipeline
- Update search parameter handling and result counting for post-processed results
- Add TopkPostProcessor input type and necessary imports
- Populate QPS/latency metrics in async det-div benchmark path
Co-authored-by: Copilot <copilot@github.com>
- Add 4th type parameter PP to KNN with default of ()
- Add with_postprocessor() constructor for post-processor support
- Keep new() for standard KNN without post-processing
- Add accessor methods for index, queries, strategy, post_processor
- Update Search impl to work with generic KNN<DP, T, S, ()>
- Maintains backward compatibility - all existing code continues to work
- Provides foundation for DeterminantDiversity plugin refactoring
narendatha added 27 commits May 13, 2026 15:19
…plugins

# Conflicts:
#	diskann-benchmark/src/backend/index/search/knn.rs
#	diskann-benchmark/src/inputs/graph_index.rs
#	diskann-benchmark/src/inputs/mod.rs
…rrors

Indented blocks in /// comments are interpreted as Rust doctests by rustdoc; the formula lines (e.g. 'alpha_i = ...', 'similarity(d) = ...') were being compiled and failed in CI. Wrap them in 	ext fences so they remain prose.
…plugins

# Conflicts:
#	diskann-disk/src/search/provider/disk_provider.rs
…plugins

# Conflicts:
#	diskann-benchmark/src/inputs/graph_index.rs
…rix-based interface

- Updated diskann-providers determinant_diversity to accept MutMatrixView instead of Vec<(Id, f32, Vec<f32>)>
- Maintains identical algorithmic behavior and recall across all test datasets (SIFT Small, Enron)
- Improves memory layout efficiency with contiguous distance/ID arrays
- Updated all call sites: diskann-benchmark inmem and diskann-disk providers
- Verified with comprehensive A/B benchmarks:
  * Recall parity: 100% match vs archived baselines
  * SIFT Small (256 vectors): -45% QPS (small dataset regression)
  * Enron (46,750 vectors): -5% to +13% QPS (larger dataset improvement)
- All tests passing, no panics or runtime errors
Replaces the previous post-processor sub-enum pattern with a dedicated
TopkDeterminantDiversity variant on the SearchPhase enum, following the
established pattern for Range, BetaFilter, and MultiHopFilter variants.

- Add TopkDeterminantDiversityPhase struct with validate() that reuses
  DeterminantDiversityParams::new for power/eta validation
- Add TopkDeterminantDiversity variants to SearchPhase and SearchPhaseKind
- Wire plugins::DeterminantDiversity to use the new variant
- Remove now-redundant post_processor field from TopkSearchPhase
- Update example config (async-determinant-diversity.json) to new format
- DiskSearchPhase intentionally unchanged (separate code path)

Addresses PR #1011 review comment r3390077824.
…r dispatch

Replace the two Search impls on KNN (one for () default, one for explicit PP) and the Option<PP> + expect(...) panic with a single AsPostProcessor trait. Defaulted resolves to the strategy's default post-processor; Forwarded<PP> carries an explicit one. KNN::new returns KNN<.., Defaulted>; KNN::with_postprocessor returns KNN<.., Forwarded<PP>>. The ConfiguredPostProcessor marker trait is removed.
…bility

Remove the brittle hardcoded checks in JSON-input validation that bailed when topk-determinant-diversity was paired with non-Float32 / PQ / SQ / spherical / streaming graph builds. The plugin registry already provides the same compatibility guarantee at run time (only Float32 FullPrecision registers DeterminantDiversity), and produces a generic 'Unsupported search phase' error listing the kinds the backend actually supports.
Adds test_disk_search_determinant_diversity exercising the new SearchPostProcessorKind::DeterminantDiversity path through DiskIndexSearcher::search end-to-end. The test asserts: result count matches k, all selected ids are a subset of the L=20 candidate pool, no duplicates, top-1 matches the nearest neighbor (greedy invariant), pure-greedy (eta=0) preserves the candidate-pool subset invariant, and the vector_filter is honored by DeterminantDiversityAndFilter.
…plugins

# Conflicts:
#	diskann-benchmark/src/backend/index/benchmarks.rs
#	diskann-benchmark/src/inputs/graph_index.rs
…plugins

# Conflicts:
#	diskann-benchmark/src/inputs/graph_index.rs
…plugins

# Conflicts:
#	diskann-benchmark/src/disk_index/search.rs
#	diskann-benchmark/src/index/benchmarks.rs
#	diskann-benchmark/src/index/search/plugins.rs
…plugins

# Conflicts:
#	diskann-benchmark-core/src/search/graph/knn.rs
…rait

Replace the AttributeValueProvider trait with a new DiverseId trait that exposes the attribute directly from the id. DiverseNeighborQueue is now generic over I: DiverseId and stores ids directly, dropping the Arc provider field. DiverseSearchParams becomes non-generic and the Diverse search adds a DP::InternalId: DiverseId bound. Unit tests rewritten around a TestId type.
@narendatha narendatha changed the base branch from main to u/narendatha/det_div_plugins June 18, 2026 15:12
@narendatha narendatha force-pushed the u/narendatha/attribute_diversity_refactor branch from b06180f to d4930c1 Compare June 18, 2026 15:55
Base automatically changed from u/narendatha/det_div_plugins to main June 20, 2026 09:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants