Source Edit

This module provides several alternative hashes to the Nim stdlib targeting high entropy in low order bits for and mask conversion to a table index. Activate via e.g., proc hash(x: K): Hash {.inline.} = hashRoMu1(x), where K is your integer key type.

The fastest hash used here is a likely the multiply & rotate hash (lately called a "Fibonacci hash", after Knuth's golden ratio fascination. Knuth immediately, in context, shows it works for any irrational number and then computer arithmetic is finite anyway). This propagates entropy in key bits to the double-width product in a Pascal's Binomial Triangle sort of pattern. Middle bits have the most key related entropy (there is even a PRNG method called "middle square" based on this pattern). More specifically, consider the product of a pair of 2 digit decimal numbers: (a1*10+a0)*(b1*10+b0) = a1*b1*100 + (a1*b0+b0*a1)*10 + a0*b0. This shows the "triangle" kind of pattern that makes the mixing strongest in the middle.

The full product is easily accessed for half CPU register width and narrower numbers. For the widest integer type, C/backend arithmetic rules only give the modulus|lower order bits of said product (x86/amd64 does make the upper word available in another register). hashRoMu1 takes the simple portable way out and discards half of key entropy, just rotating down the more well mixed bits (bit reversal might be better, though expensive compared to rotr). Reducing hash codes to table address via shr is another way out. Getting high order avalanche is more natural than the low order avalanche needed for and mask, both may be https://burtleburtle.net/bob/hash/integer.html's "half avalanche". Anyway, rotating is portably fast (often 1 cycle latency, 1-per cycle tput for an immediate constant number of bits). Bit reversal is not so slow using a fully unrolled loop and 16 entry nibble lookup table. Different hashes will perform more/less well on different data. So, we just provide a few here, and one is based upon highly non-linear bit reversal.

A stronger hash is hashWangYi1 which passes SMHasher's entropy tests, but "funnels" 64 inputs into about 62-bits of output entropy. It is still the default Hash-rehasher since it is fast & tables with 2^62 entries are rare. WangYi1's primitive is the xor of the high & low parts of a double-width product of a salt with a key. The xor blends well-mixed low order bits of the high output product word with less well-mixed low order bits of the low output product word, yielding "mostly level" mixing across all bits of the hash. WangYi1 takes two rounds of such mixing to achieve avalanche. It may be possible to "nearly pass" in only one round. hashWY0 is one attempt at that, but the salt may need to be re-optimized to have any hope of doing well on SMHasher. This hash became the default hash(int) in Nim stdlib.

There is a motley selection of other hashes here. The strongest fast hash is Pelle Evensen's bijective (64-bits -> 64-bits) hashNASAM. It's 2-3x slower than WangYi1 on some CPU architectures, but has no known statistical flaws.

Incidentally, most "fast in GB/s" hashes are far too slow for just one int. Even assessing them so is misleading for lookup tables. You want time =~ a + b*nBytes where a & b maybe come from line regressions. 1/b alone says too little, especially for integer keys where a dominates. Short string hashes or a alone can similarly mislead. TLDR, want >=2 "summary numbers" not one & whole curves are best.

Imports

bitop

Vars

getSalt = vmaddrSalt: Source Edit

Procs

proc hash(hsh, salt: Hash): Hash {.inline, ...raises: [], tags: [], forbids: [].}: Source Edit
proc hashDegski[T: Ordinal | enum](x: T): Hash {.inline.}: https://gist.github.com/degski/6e2069d6035ae04d5d6f64981c995ec2 Source Edit
proc hashIdentity[T: Ordinal | enum](x: T): Hash {.inline.}: Source Edit
proc hashMoreMur[T: Ordinal | enum](x: T): Hash {.inline.}: This is from https://github.com/tommyettinger Source Edit
proc hashNASAM[T: Ordinal | enum](x: T): Hash {.inline.}: Source Edit
proc hashRevFib(x: int32 | uint32): Hash {.inline.}: Source Edit
proc hashRevFib(x: int64 | uint64): Hash {.inline.}: Source Edit
proc hashRevFib(x: uint): Hash {.inline, ...raises: [], tags: [], forbids: [].}: Source Edit
proc hashRoMu1(x: SomeOrdinal | Hash): Hash {.inline.}: 1-hop of Romul-DuoJr using x as seed Source Edit
proc hashRoMu1(x: uint): Hash {.inline, ...raises: [], tags: [], forbids: [].}: Source Edit
proc hashRoMu2(x: SomeOrdinal | Hash): Hash {.inline.}: Just the cross terms of 128-bit whole product Source Edit
proc hashSplit64[T: Ordinal | enum](x: T): Hash {.inline.}: https://nullprogram.com/blog/2018/07/31/ Source Edit
proc hashSplitMix[T: Ordinal | enum](x: T): Hash {.inline.}: This is one hop of a PRNG. For more information on the PRNG part see http://docs.oracle.com/javase/8/docs/api/java/util/SplittableRandom.html Source Edit
proc hashWangYi1[T: Ordinal | enum](x: T): Hash {.inline.}: Wang Yi's hash_v1 for 8B int. https://github.com/rurban/smhasher has more details. This passed all scrambling tests in Spring 2019 and is simple. NOTE: It's ok to define proc(x:int16): Hash=hashWangYi1(cast[Hash](x)). Source Edit
proc hashWY0[T: Ordinal | enum](x: T): Hash {.inline.}: A slightly simplified/early version of Wang Yi's hash for 8B ints. Faster, but less scrambling. Definitely has some easy weak spots. Source Edit
proc normalized[T](x: openArray[T]): seq[float]: Source Edit
proc raiseNotFound[K](key: K): Source Edit
proc secureSalt(x: pointer): Hash {.inline, ...raises: [OSError], tags: [], forbids: [].}: Source Edit
proc vmaddrSalt(x: pointer): Hash {.inline, ...raises: [], tags: [], forbids: [].}: Source Edit
proc zeroSalt(x: pointer): Hash {.inline, ...raises: [], tags: [], forbids: [].}: Source Edit

Templates

template dbg(x): Source Edit

adix/althash

Imports

Vars

Procs

Templates