Skip to content

pharo-containers/Containers-PersistentArray

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

25 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Containers-PersistentArray

A purely functional Persistent Array providing guaranteed O(log N) time and space complexity per update. Every modification returns a new version handle without mutating prior state, enabling safe structural sharing across versions.

Pharo 14+ License MIT


What is a Persistent Array?

A Persistent Array is a fully functional data structure that preserves all previous versions of itself after any modification. Unlike a conventional mutable array (where an update silently overwrites a cell and destroys history), a Persistent Array applies Path Copying: only the O(log N) nodes along the root-to-leaf path of the modified index are reallocated. Every other node in the tree is shared by pointer with the previous version.

The underlying representation is a perfectly balanced, pointer-based binary tree. An array of logical size N is stored as a complete binary tree of height ⌈log₂ N⌉. Each leaf holds one element; internal nodes carry no data. An index lookup or update traverses the tree by inspecting successive bits of the index, reaching the target leaf in exactly ⌈log₂ N⌉ steps.

Version 0 (root0)                 Version 1 (root1) after update at index 3
       [*]                                           [*']
      /   \                                         /    \
   [*]    [*]      ──►     shared, untouched  →   [*]   [*']   
   / \    / \                                     / \    / \
  1   2  3   4                                   1   2  3'  4  ← shared, untouched
                                                        ↑ only this leaf is new

Untouched subtrees are shared between versions. Structural sharing is directly observable in Pharo using the identity operator (==):

v0 := CTPersistentArray new: 4.
v1 := v0 at: 3 put: 99. "Update index 3 (in the right subtree)"

(v0 leftSubtree)  == (v1 leftSubtree).    "=> true  — shared by pointer"
(v0 rightSubtree) == (v1 rightSubtree).   "=> false — new path was copied"

Loading

To install Containers-PersistentArray, open the Playground (Ctrl + O + W) in your Pharo image and execute the following Metacello script (select it and press Do-it or Ctrl+D):

Metacello new
  baseline: 'ContainersPersistentArray';
  repository: 'github://pharo-containers/Containers-PersistentArray/src';
  load.

If you want to depend on it

Add the following snippet to your own Metacello baseline or configuration:

spec
  baseline: 'ContainersPersistentArray'
  with: [ spec repository: 'github://pharo-containers/Containers-PersistentArray/src' ].

Why use Containers-PersistentArray?

Standard mutable Arrays execute incredibly fast O(1) updates, but they are destructive, the previous state is permanently lost. To achieve undo/redo capabilities, developers traditionally have to Copy the entire array before every mutation, introducing catastrophic O(N) time and memory overheads.

Containers-PersistentArray bridges this gap, providing full time-travel history without the massive O(N) copying penalty:

Operation Standard Array (No History) Array + Copy (With History) CTPersistentArray (Smart History)
Read at index O(1) O(1) O(log N)
Write at index O(1) destructive O(N) full copy O(log N) path copy
Memory per write 0 in-place O(N) O(log N)
Time-Travel ✘ Lost ✓ Yes ✓ Yes

Key Benefits

  • Guaranteed O(log N) Updates: Path Copying allocates exactly ⌈log₂ N⌉ new nodes per write, never more.
  • Full Version History: Every at:put: returns a new version handle. Old handles remain valid indefinitely.
  • Structural Sharing: Unmodified subtrees are reused across versions, verified with Pharo's == identity operator.
  • No Hidden O(N) Copies: Unlike copying the whole array, profiling confirms that Array>>new: accounts for ≤ 0.8% of CPU time under adversarial load.

Basic Usage

"Create a Persistent Array of size 8 -> all slots initialised to nil"
arr0 := CTPersistentArray new: 8.

"Write a value -> returns a NEW version `arr0 is unchanged`"
arr1 := arr0 at: 3 put: 'hello'.
arr2 := arr1 at: 6 put: 42.

"Read from any version independently"
arr0 at: 3.   "=> nil"
arr1 at: 3.   "=> 'hello'"
arr2 at: 3.   "=> 'hello'"
arr2 at: 6.   "=> 42"

"Verify structural sharing -> untouched subtrees are identical objects"
(arr0 rightSubtree) == (arr1 rightSubtree).   "=> true"

"load via a collection"
arr3 := CTPersistentArray withAll: #(10 20 30 40 50 60 70 80).

"Iterate in index order"
arr3 doWithIndex: [ :val :i | Transcript show: i printString , ' -> ' , val printString; nl ].

"Functional map -> returns a fresh persistent array"
doubled := arr3 collect: [ :each | each * 2 ].
doubled at: 1.   "=> 20"

"Size"
arr3 size.   "=> 8"

Undo / Redo in Few Lines

Because CTPersistentArray is purely functional, tracking history is as simple as storing the returned arrays in a standard collection. The array is the version.

history := OrderedCollection new.
arr0 := CTPersistentArray new: 1024.

"Track the baseline state"
history add: arr0.

"Perform edits and store the resulting new arrays"
arr1 := arr0 at: 512 put: 'draft'.
history add: arr1.

arr2 := arr1 at: 512 put: 'final'.
history add: arr2.

"Undo: restore the previous state instantly by retrieving it from history"
previousState := history at: 2.
previousState at: 512. "=> 'draft'"

"The current state remains perfectly intact"
arr2 at: 512. "=> 'final'"

Performance & Empirical Proof

To bypass the biases and observer effects inherent in sampling-based profilers (where the act of recording execution changes the results), this implementation utilizes a custom isolation benchmarker. By using a high-precision microsecond clock and directly querying VM parameters, we capture the true execution time alongside full and incremental Garbage Collection overhead.


1. Standard Operations (Execution & GC Overhead)

Operation Scale / Workload Normal Array Persistent Array (With History) Relative Overhead
Traversal (Read) $N = 1,000$ 14 µs (0.00% GC) 410 µs (0.00% GC) ~29x slower
$N = 10,000$ 32 µs (0.00% GC) 7,184 µs (0.00% GC) ~224x slower
$N = 100,000$ 318 µs (0.00% GC) 110,872 µs (9.92% GC) ~349x slower
$N = 1,000,000$ 2,964 µs (0.00% GC) 1,197,892 µs (8.19% GC) ~404x slower
Insertion (Write) $N = 1,000$ 10 µs (0.00% GC) 576 µs (0.00% GC) ~58x slower
$N = 10,000$ 44 µs (0.00% GC) 14,443 µs (20.77% GC) ~328x slower
$N = 100,000$ 391 µs (0.00% GC) 354,802 µs (65.13% GC) ~907x slower
$N = 1,000,000$ 4,159 µs (0.00% GC) 4,441,238 µs (69.37% GC) ~1,068x slower
Shop Inventory Simulation 10k items / 50k trans. 419 µs (0.00% GC) 128,041 µs (47.64% GC) ~306x slower

Technical Analysis

  • Complexity Verification: Results verify strict $O(\log N)$ complexity for both traversal and insertion operations.
  • Path-Copying Signature: The significantly higher relative GC percentages (reaching 69.37%) in write-heavy workloads are the definitive empirical signature of the path-copying algorithm.
  • Memory Overhead: The 69.37% GC rate for 1,000,000 updates reflects the VM's intensive effort to reclaim short-lived internal nodes generated to preserve immutability.
  • Audit Capability: The Shop Inventory Simulation proves the primary value proposition. While a Normal Array is destructive and maintains no time-travel history, the persistent implementation allows for instantaneous access to any historical state (e.g., Version 0 vs. Version 50,000) with no manual bookkeeping.

Reproducibility: Anyone can verify these numbers. The Shop Inventory Simulation is fully documented and implemented as the runShopInventorySimulation method within the CTPersistentArrayBenchmark class. Simply execute it in your image to observe the overhead firsthand.



2. Time-Travel (History Preservation) Benchmarks

To demonstrate the primary value proposition of CTPersistentArray, we benchmarked a "Time-Travel" scenario. In this test, every state of the collection must be preserved after an update. The standard Array requires a full copy for every transaction to support Time Travel, whereas CTPersistentArray achieves this through path-copying.

Scenario (Size / Updates) Normal Array (Full Copy) Persistent Array (Path Copying) Outcome
Size: 100 / Updates: 100 27 μs (0.00% GC) 95 μs (0.00% GC) Normal Array is ~3.5x faster
Size: 1,000 / Updates: 100 81 μs (0.00% GC) 103 μs (0.00% GC) Normal Array is ~1.2x faster
Size: 1,000 / Updates: 1,000 3,968 μs (75.60% GC) 939 μs (0.00% GC) Persistent is ~4x faster
Size: 10,000 / Updates: 1,000 254,079 μs (93.67% GC) 1,088 μs (0.00% GC) Persistent is ~233x faster
Size: 100,000 / Updates: 1,000 8,314,819 μs (96.33% GC) 1,236 μs (0.00% GC) Persistent is ~6,727x faster

Time-Travel Analysis:

  • The GC Memory Wall: As the dataset scales, maintaining full copies of standard arrays causes catastrophic memory pressure. At 100,000 elements, the Pharo VM spends 96.33% of its execution time simply fighting the Garbage Collector to manage the massive influx of duplicated memory.
  • Structural Sharing Dominance: The persistent array executes the exact same 1,000 historical updates at scale with 0.00% GC overhead, operating over 6,700 times faster. This empirically proves that structural sharing is the only viable architecture for version history and time-travel mechanics.

Reproducibility: You can recreate this exact stress test to watch the standard array hit the memory wall on your own hardware. Execute the runTimeTravelComparisonSize:updates: method in the CTPersistentArrayBenchmark class to run the comparison.


Complexity Summary

Operation Time Space (incremental)
new: n O(N) O(N)
at: i O(log N) O(1)
at: i put: v O(log N) O(log N)
size O(1) O(1)
collect: O(N log N) O(N)
do: / doWithIndex: O(N) O(log N) stack

Contributing

This library is part of the Pharo Containers project. Contributions are welcome, whether implementing additional functional combinators, improving test coverage, or enhancing documentation. Please open an issue or pull request on GitHub.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors