Source Edit

This provides a simple regular histogram with the bist.nim interface, but using vanilla bin counters mostly so it can be used with adix/xhist1. Bin incs are fast & adix/cumsum can be at least tolerable via parallel prefix sum on x86_64. Re-scaling|shifting bins can also be done (externally) post-decrement & pre-increment and is very vectorizable. This enables general weight decay, not only exponential/linear/flat, but adds per-point expense proportional to nBin. For specific weight decays this could be ~2X optimized some by "always counting up", as with Fenwick-backed embist/lmbist, but we hold off on that for now to provide a distinct calculation with distinct FP math trade offs.

Performance break-even vs BIST methods depends on (at least!) counter size / vectorization, platform, time kernel, and work load. Ballpark expectations might be to use this <= ~300 bins for 1:1 ratios of dec, inc & quantile with this being about 4X faster for 8 bins & nBin/lg(nBin) slower for larger nBin. This can be strictly faster if your use case has many counter updates per quantile query. This can also potentially be accelerated by GPUs (especially in the context of transforming whole, already existing arrays rather than online / incremental transformation). The bottom of this module has a small test/timing program against a bist.

Imports

cumsum

Types

Hist[N] = object cnt*, csum*: seq[N] ## PDF/PMF/counter array and its cumulative sum tot*: N ## csum[^1], but always up to date dirty*: bool ## Flag indicating if .csum may be out of date from .cnt: Simple Histogram Source Edit

Procs

proc `$`[N](h: Hist[N]): string: Source Edit
proc cdf[N](h: Hist[N]; i: int): N: Source Edit
proc clear[N](h: var Hist[N]): Source Edit
proc count[N](h: Hist[N]): N: Total Weight Source Edit
proc dec[N](h: var Hist[N]; i: int; w: N = 1): Subtract weight w from bin i & .tot; set dirty Source Edit
proc inc[N](h: var Hist[N]; i: int; w: N = 1): Add weight w to bin i & .tot; set dirty Source Edit
proc init[N](h: var Hist[N]; len: int): Source Edit
proc initHist[N](len: int): Hist[N]: Source Edit
proc invCDF[N](h: Hist[N]; s: N): (int, N): For 0 < s <= tot return (i0,s0) so sum(..<i0)=s0 < s and sum(..i0)>=s Source Edit
proc invCDF[N](h: Hist[N]; s: N; s0, s1: var N): int: For 0 < s <= tot, find i0,s0,s1 so s0 < s <= s1 and s0+pmf(i0)==s1. Source Edit
proc invCDF[N](h: Hist[N]; s: N; s0: var N): int: For 0 < s <= tot, bracket ECDF jump >= s. I.e. find i0, s0 so s0 = sum(..< i0) < s yet sum(..i0) >= s in lgCeil n array probes. Source Edit
proc len[N](h: Hist[N]): int: Number of bins & bytes Source Edit
proc max[N](h: Hist[N]): int: Simple wrapper: invCDF(h,h.count). Source Edit
proc min[N](h: Hist[N]): int: Simple wrapper: invCDF(h, 1) Source Edit
proc nCDF[N](h: Hist[N]): seq[float32]: Source Edit
proc nPDF[N](h: Hist[N]): seq[float32]: Source Edit
proc pmf[N](h: Hist[N]; i: int): N: Source Edit
proc quantile[N](h: Hist[N]; q: float): float: Parzen-interpolated quantile when no caller index mapping is needed Source Edit
proc quantile[N](h: Hist[N]; q: float; iL, iH: var int): float: Parzen-interpolated quantile; E.g., q=0.9 => 90th percentile. answer = result*iL + (1-result)*iH, but is left to caller to do { in case it is mapping larger numeric ranges to/from iL,iH }. Tm ~ 2*lg(addrSpace). Unlike other (broken!) quantile-interpolation methods, Parzen's connects midpoints of vertical CDF jumps, not horizontal. This makes more sense, corresponding to Wilcoxon 1945 & later tie mid-ranking recommendations. Source Edit
func size[N](h: Hist[N]): int: Source Edit
func space[N](h: Hist[N]): int: Source Edit
proc tot[N](h: Hist[N]): N: Raw total Source Edit
proc up[N](h: var Hist[N]): Update .csum field after various inc/dec's Source Edit

adix/hist

Imports

Types

Procs