adix/lptabz

Search:
Group by:
Source   Edit  

This module provides an (in|un)ordered multiset/multitable representation via Linear Probing with aging-friendly Backshift Delete(Knuth TAOCPv3) & optional Robin Hood re-org (Celis,1986). Linear probing collision clusters yields "no tuning needed" locality of reference: 1 DRAM hit per access for large tables of small items. RH sorts collision clusters by search depth which adds nice properties: faster miss search (eg. for inserts, usually compensating for data motion) and min depth variance (no unlucky keys). The latter enables ~100% table space utilization (we need one empty slot to halt some loops).

Misuse/attack is always possible. Inserting many duplicate causes overlong scans just as hash collisions do and is thus misuse/self-attack. If this is likely then use V=seq[T] instead. We provide a few mitigations triggered, like table growth itself, by overlong scans on underfull tables:

  1. automatic rehash of user hash output with a strong integer hash,
  2. overlong scan warnings (disabled in danger mode),
  3. automatic Robin Hood re-org activation, and
  4. use althash.getSalt to allow hard to predict per-table hash salt.

Program-wide-tunable defaults are to rehash, warn, re-org & salt with vmAddr since this is a safe-ish portable mode, but most can also be set via module globals consulted @init time. -d:unstableHash makes the default getSalt a secure random number via std/sysrand, but debugging may be harder.

MultiSET personality ensues when the V value type generic parameter is void. Otherwise the style of interface is multiTABLE. Every attempt is made for either personality to be drop-in compatible with Nim's standard library sets & tables, but extra features are provided here.

Space-time optimization of a sentinel key (a value of K disallowed for ordinary keys) is supported through the final two generic parameters, Z, and z. If Z is void, hash codes are saved and z is ignored. If Z==K, z is the sentinel key value.

If Z is neither K nor void then compact, insertion-ordered mode is used and z means how many bits of hcode are saved beside an index into a dense seq[(K,V)]. 6..8 bits avoids most "double cache misses" for miss lookups/inserts. z=0 works if space matters more than time.

Types

LPSet[K] = LPTabz[K, void, void, 0]
LPTabz for no-sentinel Set Source   Edit  
LPSetz[K; Z; z] = LPTabz[K, void, Z, z]
LPTabz for sentinel Set Source   Edit  
LPTab[K; V] = LPTabz[K, V, void, 0]
LPTabz for no-sentinel Tab Source   Edit  
LPTabz[K; V; Z; z] = object
  when Z is K or Z is void:
  else:
Robin Hood Hash Set Source   Edit  

Vars

lpDenom = 1
default denominator for lg(n) probe depth limit Source   Edit  
lpGrowPow2 = 1
default growth power of 2; 1 means double Source   Edit  
lpInitialSize = 2
default initial size aka capacity aka cap Source   Edit  
lpMaxWarn = 10
Most warnings per program invocation Source   Edit  
lpMinFree = 1
default min free slots; (>= 1) Source   Edit  
lpNumer = 3
default numerator for lg(n) probe depth limit Source   Edit  
lpRehash = false
default hcode rehashing behavior; auto-activated Source   Edit  
lpRobinHood = false
default to Robin Hood re-org; auto-activated Source   Edit  
lpWarn = stderr
Set to wherever you want warnings to go Source   Edit  

Procs

proc `$`[K, V: not void; Z; z: static int](t: LPTabz[K, V, Z, z]): string
Source   Edit  
proc `$`[K, Z; z: static int](s: LPTabz[K, void, Z, z]): string
Source   Edit  
proc `*`[K, Z; z: static int](s1, s2: LPTabz[K, void, Z, z]): LPTabz[K, void, Z,
    z]
Source   Edit  
proc `+`[K, Z; z: static int](s1, s2: LPTabz[K, void, Z, z]): LPTabz[K, void, Z,
    z]
Source   Edit  
proc `-`[K, Z; z: static int](s1, s2: LPTabz[K, void, Z, z]): LPTabz[K, void, Z,
    z]
Source   Edit  
proc `-+-`[K, Z; z: static int](s1, s2: LPTabz[K, void, Z, z]): LPTabz[K, void,
    Z, z]
Source   Edit  
proc `<`[K, Z; z: static int](s, t: LPTabz[K, void, Z, z]): bool
Source   Edit  
proc `<=`[K, Z; z: static int](s, t: LPTabz[K, void, Z, z]): bool
Source   Edit  
proc `==`[K, V: not void; Z; z: static int](x, y: LPTabz[K, V, Z, z]): bool
Source   Edit  
proc `==`[K, Z; z: static int](s, t: LPTabz[K, void, Z, z]): bool
Source   Edit  
proc `[]`[K, V, Z; z: static int](t: LPTabz[K, V, Z, z]; key: K): V {.inline.}
Source   Edit  
proc `[]`[K, V, Z; z: static int](t: var LPTabz[K, V, Z, z]; key: K): var V {.
    inline.}
Source   Edit  
proc `[]=`[K, V, Z; z: static int](t: var LPTabz[K, V, Z, z]; key: K; val: V) {.
    inline.}
Source   Edit  
proc add[K, V, Z; z: static int](t: var LPTabz[K, V, Z, z]; key: K; val: V) {.
    inline.}
Source   Edit  
proc add[K, Z; z: static int](t: var LPTabz[K, void, Z, z]; key: K) {.inline.}
Source   Edit  
proc allValues[K, V, Z; z: static int](t: LPTabz[K, V, Z, z]; key: K;
                                       vals: var seq[V]): bool {.inline.}
Source   Edit  
proc card[K, Z; z: static int](s: LPTabz[K, void, Z, z]): int {.inline.}
Source   Edit  
proc clear[K, V, Z; z: static int](t: var LPTabz[K, V, Z, z])
Source   Edit  
proc contains[K, V, Z; z: static int](t: LPTabz[K, V, Z, z]; key: K): bool {.
    inline.}
Source   Edit  
proc containsOrIncl[K, Z; z: static int](t: var LPTabz[K, void, Z, z]; key: K): bool {.
    inline.}
Source   Edit  
proc debugDump[K, V, Z; z: static int](s: LPTabz[K, V, Z, z]; label = "")
Source   Edit  
proc del[K, V, Z; z: static int](t: var LPTabz[K, V, Z, z]; key: K) {.inline.}
delete one key key from t; If you want to know if it was present then use missingOrExcl, take, or pop instead. Source   Edit  
proc depths[K, V, Z; z: static int](t: LPTabz[K, V, Z, z]): seq[int]
Compute & return exact distribution of search depths over a set Source   Edit  
proc depthStats[K, V, Z; z: static int](s: LPTabz[K, V, Z, z]): tuple[
    m1, m2: float, mx: int]
Performance Forensics Source   Edit  
proc difference[K, Z; z: static int](s1, s2: LPTabz[K, void, Z, z]): LPTabz[K,
    void, Z, z]
Source   Edit  
proc disjoint[K, Z; z: static int](s1, s2: LPTabz[K, void, Z, z]): bool
Source   Edit  
proc editKey[K, V, Z; z: static int](t: var LPTabz[K, V, Z, z]; old, new: K)
Insertion-ordered tables support editing keys while keeping iteration order. Source   Edit  
proc excl[K, Z; z: static int](s: var LPTabz[K, void, Z, z];
                               other: LPTabz[K, void, Z, z])
Source   Edit  
proc excl[K, Z; z: static int](s: var LPTabz[K, void, Z, z]; elt: K) {.inline.}
Source   Edit  
proc getCap[K, V, Z; z: static int](t: LPTabz[K, V, Z, z]): int {.inline.}
Source   Edit  
proc getOrDefault[K, V, Z; z: static int](t: LPTabz[K, V, Z, z]; key: K;
    def = default(V)): V {.inline.}
Source   Edit  
proc hash[K, Z; z: static int](s: LPTabz[K, void, Z, z]): Hash
Source   Edit  
proc hasKey[K, V, Z; z: static int](t: LPTabz[K, V, Z, z]; key: K): bool {.
    inline.}
Source   Edit  
proc hasKeyOrPut[K, V, Z; z: static int](t: var LPTabz[K, V, Z, z]; key: K;
    val: V): bool {.inline.}
Source   Edit  
proc inc[K, V, Z; z: static int](c: var LPTabz[K, V, Z, z]; key: K;
                                 amount: V = 1) {.inline.}
Increment a key counter in table t by amount. Source   Edit  
proc incl[K, Z; z: static int](s: var LPTabz[K, void, Z, z];
                               other: LPTabz[K, void, Z, z])
Source   Edit  
proc incl[K, Z; z: static int](s: var LPTabz[K, void, Z, z]; elt: K) {.inline.}
Source   Edit  
proc indexBy[A, K, V, Z; z: static int](collection: A; index: proc (x: V): K): LPTabz[
    K, V, Z, z]
Source   Edit  
proc init[K, V, Z; z: static int](t: var LPTabz[K, V, Z, z];
                                  initialSize = lpInitialSize; numer = lpNumer;
                                  denom = lpDenom; minFree = lpMinFree;
                                  growPow2 = lpGrowPow2; rehash = lpRehash;
                                  robinhood = lpRobinHood) {.inline.}
Source   Edit  
proc initLPSet[K](initialSize = lpInitialSize; numer = lpNumer; denom = lpDenom;
                  minFree = lpMinFree; growPow2 = lpGrowPow2; rehash = lpRehash;
                  robinhood = lpRobinHood): LPSet[K] {.inline.}
Return an LPTabz specialized to sets with no sentinel key. See initLPTabz for parameter details. Source   Edit  
proc initLPSetz[K, Z; z: static int](initialSize = lpInitialSize;
                                     numer = lpNumer; denom = lpDenom;
                                     minFree = lpMinFree; growPow2 = lpGrowPow2;
                                     rehash = lpRehash; robinhood = lpRobinHood): LPSetz[
    K, Z, z] {.inline.}
Return an LPTabz specialized to sets with a sentinel key. See initLPTabz for parameter details. Source   Edit  
proc initLPTab[K, V](initialSize = lpInitialSize; numer = lpNumer;
                     denom = lpDenom; minFree = lpMinFree;
                     growPow2 = lpGrowPow2; rehash = lpRehash;
                     robinhood = lpRobinHood): LPTab[K, V] {.inline.}
Return an LPTabz specialized to tables with no sentinel key. See initLPTabz for parameter details. Source   Edit  
proc initLPTabz[K, V, Z; z: static int](initialSize = lpInitialSize;
                                        numer = lpNumer; denom = lpDenom;
                                        minFree = lpMinFree;
                                        growPow2 = lpGrowPow2;
                                        rehash = lpRehash;
                                        robinhood = lpRobinHood): LPTabz[K, V,
    Z, z] {.inline.}
Source   Edit  
proc intersection[K, Z; z: static int](s1, s2: LPTabz[K, void, Z, z]): LPTabz[K,
    void, Z, z]
Source   Edit  
proc len[K, V, Z; z: static int](t: LPTabz[K, V, Z, z]): int {.inline.}
Source   Edit  
proc load[K, V, Z; z: static int](t: var LPTabz[K, V, Z, z]; path: string)
Source   Edit  
proc loadLPTabz[K, V, Z; z: static int](path: string): LPTabz[K, V, Z, z]
Source   Edit  
proc map[K, A, Z; z: static int](data: LPTabz[K, void, Z, z];
                                 op: proc (x: K): A {.closure.}): LPTabz[A,
    void, Z, z]
Source   Edit  
proc merge[K, V, Z; z: static int](c: var LPTabz[K, V, Z, z];
                                   b: LPTabz[K, V, Z, z])
Merge values from CountTable/histogram b into histogram c. Source   Edit  
proc mgetOrPut[K, V, Z; z: static int](t: var LPTabz[K, V, Z, z]; key: K; val: V): var V {.
    inline.}
Source   Edit  
proc mgetOrPut[K, V, Z; z: static int](t: var LPTabz[K, V, Z, z]; key: K;
                                       val: V; had: var bool): var V {.inline.}
Source   Edit  
proc missingOrExcl[K, V, Z; z: static int](t: var LPTabz[K, V, Z, z]; key: K): bool
Source   Edit  
proc mmap[K, V, Z; z: static int](t: var LPTabz[K, V, Z, z]; path: string)
NOTE: Read-Only memory map; Attempted edits => run-time errors (like SEGV). Source   Edit  
proc nthKey[K, Z; z: static int](t: LPTabz[K, void, Z, z]; n: int): K {.inline.}
Insertion-ordered multisets support 0-origin nth-in-order key. Source   Edit  
proc nthPair[K, V: not void; Z; z: static int](t: LPTabz[K, V, Z, z]; n: int): (
    K, V)
Insertion-ordered tables support 0-origin nth-in-order pair. Source   Edit  
proc nthPair[K, V: not void; Z; z: static int](t: var LPTabz[K, V, Z, z]; n: int): (
    K, ptr V) {.inline.}
Insertion-ordered tables support 0-origin nth-in-order pair w/editable val. Source   Edit  
proc numItems[K, Z; z: static int](t: LPTabz[K, void, Z, z]; key: K): int {.
    inline.}
Source   Edit  
proc pop[K, V: not void; Z; z: static int](t: var LPTabz[K, V, Z, z]): (K, V) {.
    inline.}
Source   Edit  
proc pop[K, V: not void; Z; z: static int](t: var LPTabz[K, V, Z, z]; key: K;
    val: var V): bool {.inline.}
Source   Edit  
proc pop[K, Z; z: static int](s: var LPTabz[K, void, Z, z]; key: var K): bool
Source   Edit  
proc pop[K, Z; z: static int](t: var LPTabz[K, void, Z, z]): K {.inline.}
Source   Edit  
proc rightSize(count: int; numer = 0; denom = 0; minFree = 0): int {.inline,
    ...deprecated: "Deprecated since 0.2; identity function", raises: [], tags: [],
    forbids: [].}
Deprecated: Deprecated since 0.2; identity function
Source   Edit  
proc save[K, V, Z; z: static int](t: LPTabz[K, V, Z, z]; pathStub: string)
Source   Edit  
proc setCap[K, V, Z; z: static int](t: var LPTabz[K, V, Z, z]; newSize = -1)
Source   Edit  
proc setOrIncl[K, Z; z: static int](t: var LPTabz[K, void, Z, z]; key: K): bool {.
    inline.}
Source   Edit  
proc setPolicy[K, V, Z; z: static int](t: var LPTabz[K, V, Z, z];
                                       numer = lpNumer; denom = lpDenom;
                                       minFree = lpMinFree;
                                       growPow2 = lpGrowPow2; rehash = lpRehash;
                                       robinhood = lpRobinHood) {.inline.}
Must call setCap after changing certain params here (e.g. rehash). Source   Edit  
proc symmetricDifference[K, Z; z: static int](s1, s2: LPTabz[K, void, Z, z]): LPTabz[
    K, void, Z, z]
Source   Edit  
proc take[K, V: not void; Z; z: static int](t: var LPTabz[K, V, Z, z]; key: K;
    val: var V): bool {.inline.}
Source   Edit  
proc take[K, Z; z: static int](t: var LPTabz[K, void, Z, z]; key: var K): bool
Source   Edit  
proc toLPTabz[K; V: not void; Z; z: static int](pairs: openArray[(K, V)];
    dups = false): LPTabz[K, V, Z, z]
Source   Edit  
proc toLPTabz[K; V: void; Z; z: static int](keys: openArray[K]; dups = false): LPTabz[
    K, V, Z, z]
Source   Edit  
proc union[K, Z; z: static int](s1, s2: LPTabz[K, void, Z, z]): LPTabz[K, void,
    Z, z]
Source   Edit  
proc `{}`[K, V, Z; z: static int](t: LPTabz[K, V, Z, z]; key: K): V {.inline.}
Source   Edit  
proc `{}`[K, V, Z; z: static int](t: var LPTabz[K, V, Z, z]; key: K): var V {.
    inline.}
Source   Edit  
proc `{}=`[K, V, Z; z: static int](t: var LPTabz[K, V, Z, z]; key: K; val: V) {.
    inline.}
Source   Edit  

Iterators

iterator allItems[K, Z; z: static int](s: LPTabz[K, void, Z, z]; key: K): K
Source   Edit  
iterator allValues[K, V, Z; z: static int](t: LPTabz[K, V, Z, z]; key: K): V
Source   Edit  
iterator allValues[K, V, Z; z: static int](t: LPTabz[K, V, Z, z];
    vals: var seq[V]): K
Source   Edit  
iterator hcodes[K, Z; z: static int](s: LPTabz[K, void, Z, z]): (int, Hash)
Source   Edit  
iterator items[K, Z; z: static int](s: LPTabz[K, void, Z, z]): K
Source   Edit  
iterator keys[K, V, Z; z: static int](t: LPTabz[K, V, Z, z]): K
Source   Edit  
iterator mitems[K, Z; z: static int](s: var LPTabz[K, void, Z, z]): var K
Source   Edit  
iterator mostCommon[K](xs: openArray[K]; n = 10): (K, int)
Iterate over (n most common values in xs, their counts) tuples. Source   Edit  
iterator mpairs[K, V, Z; z: static int](t: var LPTabz[K, V, Z, z]): (K, var V)
Source   Edit  
iterator mvalues[K, V, Z; z: static int](t: var LPTabz[K, V, Z, z]): var V
Source   Edit  
iterator numItems[K, Z; z: static int](t: LPTabz[K, void, Z, z]): (K, int)
Source   Edit  
iterator pairs[K, V, Z; z: static int](t: LPTabz[K, V, Z, z]): (K, V)
Source   Edit  
iterator pairs[K, Z; z: static int](t: LPTabz[K, void, Z, z]): (int, K)
Source   Edit  
iterator topByVal[K, V, Z; z: static int](c: LPTabz[K, V, Z, z]; n = 10;
    min = V.low; order = Cheap): (K, V)
Iterate from smallest to largest over biggest n items by value in c. order can be Cheap, Ascending, or Descending. Source   Edit  
iterator values[K, V, Z; z: static int](t: LPTabz[K, V, Z, z]): V
Source   Edit  

Templates

template editOrInit[K, V, Z; z: static int](t: var LPTabz[K, V, Z, z]; key: K;
    v, found, missing: untyped): untyped
Source   Edit  
template getItOrFail[K, V, Z; z: static int](t: LPTabz[K, V, Z, z]; k: K;
    found, missing): untyped
Provide value it corresponding to key k in found or do missing. Source   Edit  
template withIt[K, V, Z; z: static int](t: var LPTabz[K, V, Z, z]; k: K;
                                        edit, init): untyped
Provide value variable it corresponding to key k in both bodies that represents the value found|allocated in the table. Source   Edit  
template withValue[K, V, Z; z: static int](t: var LPTabz[K, V, Z, z]; key: K;
    value, found, missing: untyped): untyped
Source   Edit  
template withValue[K, V, Z; z: static int](t: var LPTabz[K, V, Z, z]; key: K;
    value, found: untyped): untyped
Source   Edit