adix/embist

Source   Edit  

This is like bist.nim, but grows weight added to counters (some kind of floating point) exponentially as new updates happen. Up to floating point arithmetic, this is equivalent to ctrs vector*= wOld; ctr[i] += 1. Doing it this way allows BIST use for efficient exponentially weighted moving quantile. It does need rescaling BUT this can be made rare. For wOld=0.9, 1.11^(6736|842) =~ 1.8e308|3.4e38 ~= dbl_max. Re-scaling to tinies can ~2X that to ~13k|1.7k data points. Rescaling is very CPU-vectorizable & so should be 8..16x*lg(nBin) =~ 80..320X faster per it; <=~2X to total amortized cost at <136..270,000 bins.

Like EWMAs, IF YOU NEVER dec then this filter has technically infinite, yet usually short time-memory, but can operate only with a small histogram space. EWMoving Median inherits a usual high breakdown point/robustness in that, even if infinite memory/having enduring time-breakdown, influencing the median takes MUCH more than one wild data point long ago in history - it takes a downright different epoch / non-stationarity which is not nearly as bothersome.

Also, note that weighting styles of distributions & averages are analogous but distinct. *=wOld,+=(1-wOld)*x is a normalized average, but weight in distros is about "faked-repetition" of relative weight. So, while things like half-life & lag are the same formulae, meaning varies since distros have MANY interacting relative weights. (See LMBist for linear weight moving medians w/non-flat recency weight + strict windows or Bist for flat, strict windows.) The API is the same as Bist[F]. The bottom of this module has a small test/timing program showing the differences.

Happy to cite someone, but as far as I can tell, this is a novel (though fairly obvious) application of Fenwick BISTs for a fast EWMMedian transform. Luxenberg & Boyd (2024) "Exponentially Weighted Moving Models" does something ~100X more complex & surely slower than the one-pass O(n*lg nBin) (20 ns/item!) way done here. Coleman at https://randorithms.com/2022/01/31/streaming-histograms.html has some nice animations, but it has pedagogical/poorly scaling O(n*nBin) code. It seems likely someone doing big data analytics has this somewhere, though and I am happy to give credit when due. Similarly, please cite this github repo if this code inspires your work.

Types

EMBist[F] = object
Exponentially weighted moving distrib. Source   Edit  

Procs

proc cdf[F](d: EMBist[F]; i: int): F
wrap Bist.cdf Source   Edit  
proc clear[F](d: var EMBist[F])
Source   Edit  
proc count[F](d: EMBist[F]): F
Total Weight Source   Edit  
proc dec[F](d: var EMBist[F]; i: int; w: F = 1)
Un-count-old operation for more rare EW with strict windows; Use .scale! Source   Edit  
proc inc[F](d: var EMBist[F]; i: int; w: F = 1)
Add weight w to bin i Source   Edit  
proc init[F](d: var EMBist[F]; len: int; wOld: float = 0.9375)
Source   Edit  
proc initEMBist[F](len: int; wOld: float): EMBist[F]
Source   Edit  
proc invCDF[F](d: EMBist[F]; s: F; s0, s1: var F): int
wrap Bist.invCDF Source   Edit  
proc invCDF[F](d: EMBist[F]; s: F; s0: var F): int
wrap Bist.invCDF Source   Edit  
proc len[F](d: EMBist[F]): int
Number of bins & bytes Source   Edit  
proc max[F](d: EMBist[F]): int
Simple wrapper of Bist.max. Source   Edit  
proc min[F](d: EMBist[F]): int
Simple wrapper of Bist.min. Source   Edit  
proc nCDF[F](d: EMBist[F]): seq[F]
Source   Edit  
proc nPDF[F](d: EMBist[F]): seq[F]
Source   Edit  
proc pmf[F](d: EMBist[F]; i: int): F
wrap Bist.pdf Source   Edit  
proc quantile[F](d: EMBist[F]; q: float): float
wrap Bist.quantile Source   Edit  
proc quantile[F](d: EMBist[F]; q: float; iL, iH: var int): float
wrap Bist.quantile Source   Edit  
proc scale[F](d: EMBist[F]; age: int): F
Scale for more rare un-count old; Can re-use if dec @same relative age. Source   Edit  
func space[F](d: EMBist[F]): int
Source   Edit  
proc tot[F](d: EMBist[F]): F
Raw total Source   Edit  
proc up[F](d: var EMBist[F])
Simple no-op for EMBist Source   Edit