Refactor Isolation Forest builtin and add edge-case tests #2498

Draft
PeilinChen01 wants to merge 2 commits into apache:main from PeilinChen01:lde-iforest-cleanup

@PeilinChen01

Summary

This pull request contains the midterm progress for the Isolation Forest builtin implementation in SystemDS.

Main changes:

  • Refactored the Isolation Forest training and scoring logic.
  • Clamped subsampling_size to the actual number of input rows when necessary.
  • Added support for single-row scoring in outlierByIsolationForestApply.
  • Added support for single-tree models.
  • Added handling for constant data where no valid split is possible.
  • Made seeded training reproducible.
  • Added edge-case tests for the Isolation Forest builtin.
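The clamping, constant-data, and seeding changes listed above can be illustrated with a small Python sketch. This is not the DML code from the patch: the helper names (`clamp_subsampling_size`, `draw_subsample`, `has_valid_split`) are hypothetical, and the sketch only assumes that clamping means taking the smaller of the requested size and the row count, and that a feature is splittable only when it has at least two distinct values.

```python
import random


def clamp_subsampling_size(subsampling_size, n_rows):
    """Return a subsample size that never exceeds the number of input rows.

    Hypothetical helper: illustrates the clamping idea, not the DML builtin.
    """
    if n_rows < 1 or subsampling_size < 1:
        raise ValueError("n_rows and subsampling_size must be positive")
    return min(subsampling_size, n_rows)


def draw_subsample(n_rows, subsampling_size, seed):
    """Draw row indices without replacement, reproducibly for a given seed."""
    rng = random.Random(seed)  # private generator, so global state is untouched
    k = clamp_subsampling_size(subsampling_size, n_rows)
    return sorted(rng.sample(range(n_rows), k))


def has_valid_split(column):
    """A feature can only be split if it holds at least two distinct values."""
    return min(column) < max(column)


if __name__ == "__main__":
    print(clamp_subsampling_size(256, 100))  # requested size exceeds row count
    print(draw_subsample(100, 8, seed=7) == draw_subsample(100, 8, seed=7))
    print(has_valid_split([3.0, 3.0, 3.0]))  # constant data: no split possible
```

When every feature of a node fails the `has_valid_split` check, a tree builder would typically stop and turn the node into a leaf instead of looping in search of a split value.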

Tests

The following command passes locally:

mvn -Dtest=BuiltinIsolationForestTest test

Covered test cases include:

  • Basic model training and scoring
  • Hybrid execution mode
  • Subsampling-size clamping
  • Single-row apply
  • Single-tree model
  • Constant data
  • Seed reproducibility
  • Anomaly ranking

rbbozkurt and others added 2 commits January 29, 2026 20:10
This patch promotes the existing Isolation Forest algorithm implementation from
the staging phase to builtin status, with improvements. The implementation provides
two main builtins, outlierByIsolationForest for training iForest models and
outlierByIsolationForestApply for scoring samples based on trained models.
Specifically, we optimized the algorithm with vectorized harmonic number
computation for improved scalability. The patch extends test coverage in
`staging/isolationForestTest.dml` with comprehensive tests and Python API
integration tests. Refer to JIRA for detailed discussions.

Related to apache#1980

Co-authored-by: keremaras1 <60196502+keremaras1@users.noreply.github.com>
Co-authored-by: denizzqq <denizdolanmaz@rocketmail.com>
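
For readers unfamiliar with the harmonic-number term mentioned in the commit message, the following Python sketch shows the standard Isolation Forest normalization, c(n) = 2·H(n−1) − 2·(n−1)/n, and the resulting anomaly score 2^(−E[h(x)]/c(n)). It follows the usual formulation of the algorithm rather than the DML code in this patch; the function names are hypothetical, and the running sum merely stands in for a vectorized cumulative sum.

```python
from itertools import accumulate


def harmonic_numbers(n):
    """Return [H(1), ..., H(n)] computed in one pass as a running sum."""
    return list(accumulate(1.0 / i for i in range(1, n + 1)))


def average_path_length(n):
    """c(n): average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0  # a single point is isolated without any split
    h = harmonic_numbers(n - 1)[-1]  # H(n - 1)
    return 2.0 * h - 2.0 * (n - 1) / n


def anomaly_score(mean_path_length, n):
    """Score in (0, 1]; values near 1 indicate likely anomalies."""
    c = average_path_length(n)
    if c == 0.0:
        raise ValueError("score is undefined for subsample sizes below 2")
    return 2.0 ** (-mean_path_length / c)


if __name__ == "__main__":
    n = 256
    print(average_path_length(n))
    # A point whose mean path length equals c(n) scores exactly 0.5.
    print(anomaly_score(average_path_length(n), n))
```

Shorter mean path lengths push the score toward 1, which is the ordering an anomaly-ranking test would check.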