adix/hist

Source   Edit  

This provides a simple regular histogram with the bist.nim interface, but using vanilla bin counters mostly so it can be used with adix/xhist1. Bin incs are fast & adix/cumsum can be at least tolerable via parallel prefix sum on x86_64. Re-scaling|shifting bins can also be done (externally) post-decrement & pre-increment and is very vectorizable. This enables general weight decay, not only exponential/linear/flat, but adds per-point expense proportional to nBin. For specific weight decays this could be ~2X optimized some by "always counting up", as with Fenwick-backed embist/lmbist, but we hold off on that for now to provide a distinct calculation with distinct FP math trade offs.

Performance break-even vs BIST methods depends on (at least!) counter size / vectorization, platform, time kernel, and work load. Ballpark expectations might be to use this <= ~300 bins for 1:1 ratios of dec, inc & quantile with this being about 4X faster for 8 bins & nBin/lg(nBin) slower for larger nBin. This can be strictly faster if your use case has many counter updates per quantile query. This can also potentially be accelerated by GPUs (especially in the context of transforming whole, already existing arrays rather than online / incremental transformation). The bottom of this module has a small test/timing program against a bist.

Types

Hist[N] = object
  cnt*, csum*: seq[N]        ## PDF/PMF/counter array and its cumulative sum
  tot*: N                    ## csum[^1], but always up to date
  dirty*: bool               ## Flag indicating if .csum may be out of date from .cnt
Simple Histogram Source   Edit  

Procs

proc `$`[N](h: Hist[N]): string
Source   Edit  
proc cdf[N](h: Hist[N]; i: int): N
Source   Edit  
proc clear[N](h: var Hist[N])
Source   Edit  
proc count[N](h: Hist[N]): N
Total Weight Source   Edit  
proc dec[N](h: var Hist[N]; i: int; w: N = 1)
Subtract weight w from bin i & .tot; set dirty Source   Edit  
proc inc[N](h: var Hist[N]; i: int; w: N = 1)
Add weight w to bin i & .tot; set dirty Source   Edit  
proc init[N](h: var Hist[N]; len: int)
Source   Edit  
proc initHist[N](len: int): Hist[N]
Source   Edit  
proc invCDF[N](h: Hist[N]; s: N): (int, N)
For 0 < s <= tot return (i0,s0) so sum(..<i0)=s0 < s and sum(..i0)>=s Source   Edit  
proc invCDF[N](h: Hist[N]; s: N; s0, s1: var N): int
For 0 < s <= tot, find i0,s0,s1 so s0 < s <= s1 and s0+pmf(i0)==s1. Source   Edit  
proc invCDF[N](h: Hist[N]; s: N; s0: var N): int
For 0 < s <= tot, bracket ECDF jump >= s. I.e. find i0, s0 so s0 = sum(..< i0) < s yet sum(..i0) >= s in lgCeil n array probes. Source   Edit  
proc len[N](h: Hist[N]): int
Number of bins & bytes Source   Edit  
proc max[N](h: Hist[N]): int
Simple wrapper: invCDF(h,h.count). Source   Edit  
proc min[N](h: Hist[N]): int
Simple wrapper: invCDF(h, 1) Source   Edit  
proc nCDF[N](h: Hist[N]): seq[float32]
Source   Edit  
proc nPDF[N](h: Hist[N]): seq[float32]
Source   Edit  
proc pmf[N](h: Hist[N]; i: int): N
Source   Edit  
proc quantile[N](h: Hist[N]; q: float): float
Parzen-interpolated quantile when no caller index mapping is needed Source   Edit  
proc quantile[N](h: Hist[N]; q: float; iL, iH: var int): float
Parzen-interpolated quantile; E.g., q=0.9 => 90th percentile. answer = result*iL + (1-result)*iH, but is left to caller to do { in case it is mapping larger numeric ranges to/from iL,iH }. Tm ~ 2*lg(addrSpace). Unlike other (broken!) quantile-interpolation methods, Parzen's connects midpoints of vertical CDF jumps, not horizontal. This makes more sense, corresponding to Wilcoxon 1945 & later tie mid-ranking recommendations. Source   Edit  
func size[N](h: Hist[N]): int
Source   Edit  
func space[N](h: Hist[N]): int
Source   Edit  
proc tot[N](h: Hist[N]): N
Raw total Source   Edit  
proc up[N](h: var Hist[N])
Update .csum field after various inc/dec's Source   Edit