This module provides an (in|un)ordered multiset/multitable representation via Linear Probing with aging-friendly Backshift Delete(Knuth TAOCPv3) & optional Robin Hood re-org (Celis,1986). Linear probing collision clusters yields "no tuning needed" locality of reference: 1 DRAM hit per access for large tables of small items. RH sorts collision clusters by search depth which adds nice properties: faster miss search (eg. for inserts, usually compensating for data motion) and min depth variance (no unlucky keys). The latter enables ~100% table space utilization (we need one empty slot to halt some loops).
Misuse/attack is always possible. Inserting many duplicate causes overlong scans just as hash collisions do and is thus misuse/self-attack. If this is likely then use V=seq[T] instead. We provide a few mitigations triggered, like table growth itself, by overlong scans on underfull tables:
- automatic rehash of user hash output with a strong integer hash,
- overlong scan warnings (disabled in danger mode),
- automatic Robin Hood re-org activation, and
- use althash.getSalt to allow hard to predict per-table hash salt.
Program-wide-tunable defaults are to rehash, warn, re-org & salt with vmAddr since this is a safe-ish portable mode, but most can also be set via module globals consulted @init time. -d:unstableHash makes the default getSalt a secure random number via std/sysrand, but debugging may be harder.
MultiSET personality ensues when the V value type generic parameter is void. Otherwise the style of interface is multiTABLE. Every attempt is made for either personality to be drop-in compatible with Nim's standard library sets & tables, but extra features are provided here.
Space-time optimization of a sentinel key (a value of K disallowed for ordinary keys) is supported through the final two generic parameters, Z, and z. If Z is void, hash codes are saved and z is ignored. If Z==K, z is the sentinel key value.
If Z is neither K nor void then compact, insertion-ordered mode is used and z means how many bits of hcode are saved beside an index into a dense seq[(K,V)]. 6..8 bits avoids most "double cache misses" for miss lookups/inserts. z=0 works if space matters more than time.
Vars
lpGrowPow2 = 1
- default growth power of 2; 1 means double Source Edit
lpInitialSize = 2
- default initial size aka capacity aka cap Source Edit
lpRobinHood = false
- default to Robin Hood re-org; auto-activated Source Edit
Procs
proc containsOrIncl[K, Z; z: static int](t: var LPTabz[K, void, Z, z]; key: K): bool {. inline.}
- Source Edit
proc depthStats[K, V, Z; z: static int](s: LPTabz[K, V, Z, z]): tuple[ m1, m2: float, mx: int]
- Performance Forensics Source Edit
proc difference[K, Z; z: static int](s1, s2: LPTabz[K, void, Z, z]): LPTabz[K, void, Z, z]
- Source Edit
proc getOrDefault[K, V, Z; z: static int](t: LPTabz[K, V, Z, z]; key: K; def = default(V)): V {.inline.}
- Source Edit
proc hasKeyOrPut[K, V, Z; z: static int](t: var LPTabz[K, V, Z, z]; key: K; val: V): bool {.inline.}
- Source Edit
proc initLPSetz[K, Z; z: static int](initialSize = lpInitialSize; numer = lpNumer; denom = lpDenom; minFree = lpMinFree; growPow2 = lpGrowPow2; rehash = lpRehash; robinhood = lpRobinHood): LPSetz[ K, Z, z] {.inline.}
- Return an LPTabz specialized to sets with a sentinel key. See initLPTabz for parameter details. Source Edit
proc initLPTab[K, V](initialSize = lpInitialSize; numer = lpNumer; denom = lpDenom; minFree = lpMinFree; growPow2 = lpGrowPow2; rehash = lpRehash; robinhood = lpRobinHood): LPTab[K, V] {.inline.}
- Return an LPTabz specialized to tables with no sentinel key. See initLPTabz for parameter details. Source Edit
proc initLPTabz[K, V, Z; z: static int](initialSize = lpInitialSize; numer = lpNumer; denom = lpDenom; minFree = lpMinFree; growPow2 = lpGrowPow2; rehash = lpRehash; robinhood = lpRobinHood): LPTabz[K, V, Z, z] {.inline.}
- Source Edit
proc intersection[K, Z; z: static int](s1, s2: LPTabz[K, void, Z, z]): LPTabz[K, void, Z, z]
- Source Edit
proc loadLPTabz[K, V, Z; z: static int](path: string): LPTabz[K, V, Z, z]
- Source Edit
proc missingOrExcl[K, V, Z; z: static int](t: var LPTabz[K, V, Z, z]; key: K): bool
- Source Edit
proc pop[K, V: not void; Z; z: static int](t: var LPTabz[K, V, Z, z]): (K, V) {. inline.}
- Source Edit
proc symmetricDifference[K, Z; z: static int](s1, s2: LPTabz[K, void, Z, z]): LPTabz[ K, void, Z, z]
- Source Edit
Iterators
iterator mostCommon[K](xs: openArray[K]; n = 10): (K, int)
- Iterate over (n most common values in xs, their counts) tuples. Source Edit
Templates
template editOrInit[K, V, Z; z: static int](t: var LPTabz[K, V, Z, z]; key: K; v, found, missing: untyped): untyped
- Source Edit
template getItOrFail[K, V, Z; z: static int](t: LPTabz[K, V, Z, z]; k: K; found, missing): untyped
- Provide value it corresponding to key k in found or do missing. Source Edit