feat(steal): framework-aware code indexing #41

Open
justi wants to merge 1 commit into main from feat/framework-aware-indexing

@justi justi commented May 4, 2026

Summary

  • The Steal indexer now consults a per-repo profile (Rails / Ruby gem / Python web / Python lib / JS frontend / Node) detected from the manifest, and filters the output of git ls-files through include/exclude rules before windowing. Repos with no recognisable manifest fall back to the previous "everything tracked" behaviour.
  • Excludes match anywhere in the path so migrations/ catches src/<pkg>/migrations/ for src-layout Django apps, not just root-level ones.
  • Side benefits: a steal_no_results table for over-prune detection, a diagnostic armillary scan --report-profiles flag, and per-repo (profile, files_indexed, files_skipped) recorded to metadata, so a custom layout that indexes nothing is observable.
  • code_index schema_version bumps → drop-and-rebuild on first scan after upgrade.
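
As an illustration of the manifest-based detection in the first bullet, here is a minimal sketch. The function name `detect_profile` and every heuristic inside it (treating `config/application.rb` as the Rails marker, `manage.py` as the Python web marker, an `index.html` or `src/` directory as the frontend marker) are assumptions made for this example, not the checks used in this PR. The one behaviour taken from the description is that a Python repo without `src/` falls back to `unknown` rather than whitelisting a directory that does not exist.

```python
def detect_profile(tracked: list[str]) -> str:
    """Guess a repo profile from its tracked paths (e.g. `git ls-files` output)."""
    names = set(tracked)
    top_dirs = {p.split("/", 1)[0] for p in tracked if "/" in p}

    if "Gemfile" in names:
        # Assumption: config/application.rb marks a Rails app; any other
        # Gemfile-bearing repo is treated as a plain Ruby gem.
        return "rails" if "config/application.rb" in names else "ruby_gem"

    if "pyproject.toml" in names:
        # Assumption: manage.py marks a Django-style web project.
        if "manage.py" in names:
            return "python_web"
        # A flat-layout package has no src/ to whitelist, so it falls back
        # to "unknown" and keeps the everything-tracked behaviour.
        return "python_lib" if "src" in top_dirs else "unknown"

    if "package.json" in names:
        # Assumption: an index.html or a src/ directory marks a frontend.
        if "index.html" in names or "src" in top_dirs:
            return "js_frontend"
        return "node"

    return "unknown"


if __name__ == "__main__":
    print(detect_profile(["Gemfile", "config/application.rb", "app/models/billing.rb"]))
    print(detect_profile(["pyproject.toml", "mypkg/__init__.py"]))
```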

Test plan

  • pytest -q — 491 passed
  • ruff check . and ruff format --check . — clean
  • Manual smoke: armillary scan --report-profiles on the local portfolio shows the expected breakdown across rails / ruby_gem / python_lib / js_frontend / node / unknown rows
  • Smoke run with an intentionally flat-layout Python repo correctly falls back to unknown instead of whitelisting src/ and indexing zero files

🤖 Generated with Claude Code

Steal previously walked every tracked file in every repo, so a Rails
app's `db/migrate/2024_billing.rb` ranked alongside `app/models/billing.rb`
on a "billing" query — match-quality became a noise problem rather than
a ranking problem.

`code_block_service` now consults a per-repo profile detected from the
manifest (Gemfile, pyproject, package.json). The profile decides which
directories feed the 40-line window machinery: app/ + lib/ for Rails,
src/ for src-layout Python, etc. Excludes (`migrations/`, `db/migrate/`)
match anywhere in the path so they catch nested Django apps too. Repos
with no recognisable manifest fall back to `unknown` and keep the
previous "everything tracked" behaviour, so markdown- or shell-heavy
projects index normally.
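
The include/exclude step described above could look roughly like the sketch below. The `PROFILES` table, `is_excluded`, and `filter_paths` are hypothetical names for illustration, and the rule sets are trimmed to the directories this message mentions. Includes are treated as root-anchored prefixes (an assumption), while excludes match on a directory boundary anywhere in the path, which is the behaviour the message describes.

```python
# Hypothetical rule table; only the directories named in this message appear.
PROFILES = {
    "rails": {"include": ("app/", "lib/"), "exclude": ("db/migrate/",)},
    "python_lib": {"include": ("src/",), "exclude": ("migrations/",)},
    "unknown": {"include": (), "exclude": ()},  # everything tracked
}


def is_excluded(path: str, excludes: tuple[str, ...]) -> bool:
    """True if any exclude occurs at a directory boundary anywhere in path."""
    padded = "/" + path  # lets a root-level match share the nested-match test
    return any("/" + pattern in padded for pattern in excludes)


def filter_paths(tracked: list[str], profile: str) -> tuple[list[str], list[str]]:
    """Split tracked paths into (indexed, skipped) for the given profile."""
    rules = PROFILES[profile]
    kept, skipped = [], []
    for path in tracked:
        # An empty include tuple means "no whitelist": keep every path.
        included = not rules["include"] or path.startswith(rules["include"])
        if included and not is_excluded(path, rules["exclude"]):
            kept.append(path)
        else:
            skipped.append(path)
    return kept, skipped


if __name__ == "__main__":
    tracked = [
        "src/pkg/models.py",
        "src/pkg/migrations/0001_initial.py",
        "README.md",
    ]
    print(filter_paths(tracked, "python_lib"))
```

Matching on `"/" + pattern` rather than a bare substring keeps a directory such as `data_migrations/` from being excluded by the `migrations/` rule.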

Side benefits wired in alongside:

- `steal_no_results` table tracks zero-hit queries so over-pruning
  surfaces in days, not months
- diagnostic `armillary scan --report-profiles` aggregates the breakdown
- code_index SCHEMA_VERSION bump → drop-and-rebuild on first scan
  after upgrade
- per-repo (profile, files_indexed, files_skipped) recorded to
  metadata_json, observable when a custom layout indexes nothing
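
A rough sketch of how the schema-version rebuild and the zero-hit log might fit together, assuming the index lives in SQLite. The storage engine, the column layout, the `SCHEMA_VERSION` value, the use of `PRAGMA user_version`, and the `LIKE` search are all assumptions for illustration. Only the table names `code_index` and `steal_no_results` come from this message.

```python
import sqlite3

SCHEMA_VERSION = 2  # hypothetical value; bumped whenever the index layout changes


def open_index(path: str = ":memory:") -> sqlite3.Connection:
    """Open the index, dropping code_index if the stored version is stale."""
    conn = sqlite3.connect(path)
    (stored,) = conn.execute("PRAGMA user_version").fetchone()
    if stored != SCHEMA_VERSION:
        # Drop-and-rebuild: the next scan repopulates the table from scratch.
        conn.execute("DROP TABLE IF EXISTS code_index")
        conn.execute(f"PRAGMA user_version = {SCHEMA_VERSION}")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS code_index "
        "(repo TEXT, path TEXT, start_line INTEGER, body TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS steal_no_results "
        "(query TEXT NOT NULL, searched_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
    )
    conn.commit()
    return conn


def search(conn: sqlite3.Connection, query: str) -> list[tuple]:
    """Naive substring search that logs queries returning no rows."""
    rows = conn.execute(
        "SELECT repo, path, start_line FROM code_index WHERE body LIKE ?",
        (f"%{query}%",),
    ).fetchall()
    if not rows:
        # A growing zero-hit log is the signal that filtering pruned too much.
        conn.execute("INSERT INTO steal_no_results (query) VALUES (?)", (query,))
        conn.commit()
    return rows


if __name__ == "__main__":
    conn = open_index()
    conn.execute(
        "INSERT INTO code_index VALUES (?, ?, ?, ?)",
        ("shop", "app/models/billing.rb", 1, "class Billing; end"),
    )
    print(search(conn, "Billing"))
    print(search(conn, "invoice_pdf"))
```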

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>