adix/althash

Source   Edit  

This module provides several alternative hashes to the Nim stdlib targeting high entropy in low order bits for and mask conversion to a table index. Activate via e.g., proc hash(x: K): Hash {.inline.} = hashRoMu1(x), where K is your integer key type.

The fastest hash used here is a likely the multiply & rotate hash (lately called a "Fibonacci hash", after Knuth's golden ratio fascination. Knuth immediately, in context, shows it works for any irrational number and then computer arithmetic is finite anyway). This propagates entropy in key bits to the double-width product in a Pascal's Binomial Triangle sort of pattern. Middle bits have the most key related entropy (there is even a PRNG method called "middle square" based on this pattern). More specifically, consider the product of a pair of 2 digit decimal numbers: (a1*10+a0)*(b1*10+b0) = a1*b1*100 + (a1*b0+b0*a1)*10 + a0*b0. This shows the "triangle" kind of pattern that makes the mixing strongest in the middle.

The full product is easily accessed for half CPU register width and narrower numbers. For the widest integer type, C/backend arithmetic rules only give the modulus|lower order bits of said product (x86/amd64 does make the upper word available in another register). hashRoMu1 takes the simple portable way out and discards half of key entropy, just rotating down the more well mixed bits (bit reversal might be better, though expensive compared to rotr). Reducing hash codes to table address via shr is another way out. Getting high order avalanche is more natural than the low order avalanche needed for and mask, both may be https://burtleburtle.net/bob/hash/integer.html's "half avalanche". Anyway, rotating is portably fast (often 1 cycle latency, 1-per cycle tput for an immediate constant number of bits). Bit reversal is not so slow using a fully unrolled loop and 16 entry nibble lookup table. Different hashes will perform more/less well on different data. So, we just provide a few here, and one is based upon highly non-linear bit reversal.

A stronger hash is hashWangYi1 which passes SMHasher's entropy tests, but "funnels" 64 inputs into about 62-bits of output entropy. It is still the default Hash-rehasher since it is fast & tables with 2^62 entries are rare. WangYi1's primitive is the xor of the high & low parts of a double-width product of a salt with a key. The xor blends well-mixed low order bits of the high output product word with less well-mixed low order bits of the low output product word, yielding "mostly level" mixing across all bits of the hash. WangYi1 takes two rounds of such mixing to achieve avalanche. It may be possible to "nearly pass" in only one round. hashWY0 is one attempt at that, but the salt may need to be re-optimized to have any hope of doing well on SMHasher. This hash became the default hash(int) in Nim stdlib.

There is a motley selection of other hashes here. The strongest fast hash is Pelle Evensen's bijective (64-bits -> 64-bits) hashNASAM. It's 2-3x slower than WangYi1 on some CPU architectures, but has no known statistical flaws.

Incidentally, most "fast in GB/s" hashes are far too slow for just one int. Even assessing them so is misleading for lookup tables. You want time =~ a + b*nBytes where a & b maybe come from line regressions. 1/b alone says too little, especially for integer keys where a dominates. Short string hashes or a alone can similarly mislead. TLDR, want >=2 "summary numbers" not one & whole curves are best.

Vars

getSalt = vmaddrSalt
Source   Edit  

Procs

proc hash(hsh, salt: Hash): Hash {.inline, ...raises: [], tags: [], forbids: [].}
Source   Edit  
proc hashDegski[T: Ordinal | enum](x: T): Hash {.inline.}
https://gist.github.com/degski/6e2069d6035ae04d5d6f64981c995ec2 Source   Edit  
proc hashIdentity[T: Ordinal | enum](x: T): Hash {.inline.}
Source   Edit  
proc hashMoreMur[T: Ordinal | enum](x: T): Hash {.inline.}
This is from https://github.com/tommyettinger Source   Edit  
proc hashNASAM[T: Ordinal | enum](x: T): Hash {.inline.}
Source   Edit  
proc hashRevFib(x: int32 | uint32): Hash {.inline.}
Source   Edit  
proc hashRevFib(x: int64 | uint64): Hash {.inline.}
Source   Edit  
proc hashRevFib(x: uint): Hash {.inline, ...raises: [], tags: [], forbids: [].}
Source   Edit  
proc hashRoMu1(x: SomeOrdinal | Hash): Hash {.inline.}
1-hop of Romul-DuoJr using x as seed Source   Edit  
proc hashRoMu1(x: uint): Hash {.inline, ...raises: [], tags: [], forbids: [].}
Source   Edit  
proc hashRoMu2(x: SomeOrdinal | Hash): Hash {.inline.}
Just the cross terms of 128-bit whole product Source   Edit  
proc hashSplit64[T: Ordinal | enum](x: T): Hash {.inline.}
https://nullprogram.com/blog/2018/07/31/ Source   Edit  
proc hashSplitMix[T: Ordinal | enum](x: T): Hash {.inline.}
This is one hop of a PRNG. For more information on the PRNG part see http://docs.oracle.com/javase/8/docs/api/java/util/SplittableRandom.html Source   Edit  
proc hashWangYi1[T: Ordinal | enum](x: T): Hash {.inline.}
Wang Yi's hash_v1 for 8B int. https://github.com/rurban/smhasher has more details. This passed all scrambling tests in Spring 2019 and is simple. NOTE: It's ok to define proc(x:int16): Hash=hashWangYi1(cast[Hash](x)). Source   Edit  
proc hashWY0[T: Ordinal | enum](x: T): Hash {.inline.}
A slightly simplified/early version of Wang Yi's hash for 8B ints. Faster, but less scrambling. Definitely has some easy weak spots. Source   Edit  
proc normalized[T](x: openArray[T]): seq[float]
Source   Edit  
proc raiseNotFound[K](key: K)
Source   Edit  
proc secureSalt(x: pointer): Hash {.inline, ...raises: [OSError], tags: [],
                                    forbids: [].}
Source   Edit  
proc vmaddrSalt(x: pointer): Hash {.inline, ...raises: [], tags: [], forbids: [].}
Source   Edit  
proc zeroSalt(x: pointer): Hash {.inline, ...raises: [], tags: [], forbids: [].}
Source   Edit  

Templates

template dbg(x)
Source   Edit