adix/bist

Source   Edit  

Binary Indexed Sum Tree (BIST); Fenwick proposed "BIT" but that A) collides w/many uses B) takes partial (S)ums as implied, but explicit is better (though products can work) and C) does not rhyme with "dist" (for distribution - what it is mostly about). While Inet has tutorials, to my knowledge no one (yet) collects all these algos in one place. Fenwick1994 itself messed up invCDF, correcting w/a tech report a year later. This code only allocates needed space & uses 0-based array indexing. See https://en.wikipedia.org/wiki/Fenwick_tree

The idea of a standard binary heap with kids(k)@[2k],[2k+1] for dynamic distributions goes back to Wong&Easton 1980 (or earlier?). Fenwick's clever index encoding/overlaid trees idea allows using 1/4 to 1/2 that space (only max index+1 array elements vs 2*lgCeil(n)), a constant factor improvement. Good explanations truly need figures, as in the original Fenwick paper | Wikipedia.

The Bist[T] type in this module is generic over the type of counters used for partial sums|counts. For few total items, you can use a Bist[uint8] while for many you want to use Bist[uint32]. This can be space-optimized up to 2X further with adix/sequint specialized to store an array of B-bit counters. Ranked B-trees are faster for >24..28-bit index spaces as L3 CPU caching fails, but needing >7..8 decimal dynamic ranges is also rare.

Types

Bist[T] = object
  tot*: T
  data*: seq[T]
A razor thin wrapper around seq[T] Source   Edit  

Procs

proc `$`[T](t: Bist[T]): string
Source   Edit  
proc `[]`[T](t: Bist[T]; i: int): T
Source   Edit  
proc `[]`[T](t: var Bist[T]; i: int): var T
Source   Edit  
proc `[]=`[T](t: var Bist[T]; i: int; x: T)
Source   Edit  
proc cdf[T](t: Bist[T]; i: int): T
INCLUSIVE sum(pmf[0..i]), (rank,EDF,prefix sum,scan,..); Tm~1 bits in i Source   Edit  
proc clear[T](t: var Bist[T])
Source   Edit  
proc count[T](t: Bist[T]): T
Source   Edit  
proc dec[T](t: var Bist[T]; i: int; d: T)
Adjust for count decrement by d; Tm ~ 1/2..3/4 lg n Source   Edit  
proc fromCnts[T](t: var Bist[T])
In-place bulk convert/reformat t[] from counts to BIST; Max time ~1*n. Source   Edit  
proc inc[T](t: var Bist[T]; i: int; d: T)
Adjust for count increment by d; Tm ~ 1/2..3/4 lg n Source   Edit  
proc init[T](t: var Bist[T]; len: int)
Source   Edit  
proc initBist[T](len: int): Bist[T]
Source   Edit  
proc invCDF[T](t: Bist[T]; s: T): (int, T)
For 0 < s <= tot return (i0,s0) so sum(..<i0)=s0 < s and sum(..i0)>=s Source   Edit  
proc invCDF[T](t: Bist[T]; s: T; s0, s1: var T): int
For 0 < s <= tot, find i0,s0,s1 so s0 < s <= s1 and s0+pmf(i0)==s1. Source   Edit  
proc invCDF[T](t: Bist[T]; s: T; s0: var T): int
For 0 < s <= tot, bracket ECDF jump >= s. I.e. find i0, s0 so s0 = sum(..< i0) < s yet sum(..i0) >= s in lgCeil n array probes. Source   Edit  
proc len[T](t: Bist[T]): int
Source   Edit  
proc max[T](t: Bist[T]): int
Simple wrapper: invCDF(t,t.count). Source   Edit  
proc min[T](t: Bist[T]): int
Simple wrapper: invCDF(t, 1) Source   Edit  
proc nCDF[T](t: Bist[T]): seq[float32]
Return classic CDF from read-only BIST Source   Edit  
proc nPDF[T](t: Bist[T]): seq[float32]
Return classic PMF from read-only BIST Source   Edit  
proc pmf[T](t: Bist[T]; i: int): T
Probability Mass Function @i; Avg Tm ~ 2 probes; Max Tm ~ lg n Source   Edit  
proc quantile[T](t: Bist[T]; q: float): float
Parzen-interpolated quantile when no caller index mapping is needed Source   Edit  
proc quantile[T](t: Bist[T]; q: float; iL, iH: var int): float
Parzen-interpolated quantile; E.g., q=0.9 => 90th percentile. answer = result*iL + (1-result)*iH, but is left to caller to do { in case it is mapping larger numeric ranges to/from iL,iH }. Tm ~ 2*lg(addrSpace). Unlike other (broken!) quantile-interpolation methods, Parzen's connects midpoints of vertical CDF jumps, not horizontal. This makes more sense, corresponding to Wilcoxon 1945 & later tie mid-ranking recommendations. Source   Edit  
func space[T](t: Bist[T]): int
Source   Edit  
proc toCnts[T](t: var Bist[T])
In-place bulk convert/reformat t[] from BIST to counts; Max time ~1n *Unlike the others, this routine only works for power of 2-sized arrays. Source   Edit  
proc up[T](t: var Bist[T])
Simple no-op for BISTs Source   Edit