The flashcards below were created by user
javatomas
on FreezingBlue Flashcards.

Definition of a Hash Table
A data structure that uses a Hash function to map identifying values, known as keys, to their associated values. It can be quickly searched by the key.
Think about an array that allows any value to be used as an index.

Applications of a Hash table
Driver's license records  with a hash table, you could quickly get information about the driver given the license number
Compiler symbol tables  the compiler uses a symbol table to keep track of the userdefined symbols in a C++ program
Internet search engines  meta tags allow the owner of a page to specify key words and concepts under which the page will be indexed. The meta tags can guide the search engine in choosing which of the several possible meanings for these words is correct.

How to choose Hash Function?
An efficient hash function  distribute the index values of inserted objects uniformly across the table.
An efficient collision resolution  computes an alternative index for a key whose hash index corresponds to a location previously inserted in the hash table.
Calculated quickly, returns values within the range of locations in our table and minimize collisions.

Characteristics of Good Hash Functions
 Minimize collision
 Easy and quick to compute
 Distribute key values evenly in the hash table
 Use all the information provided in the key
 Have a high load factor for a given set of keys

Types of Hash functions
 Division method
 Double hashing with passbit

Pigeonhole Priniciple
 If M items are placed in N buckets, and M is
 greater than N, one or more buckets contain two or more items.
 It proves that no hashing algorithm can hash every key to a unique
 index if the possible keys exceed the size of the array

What is a Load Factor
It is the decision parameter used when we want to rehash or expand the existing hash table entries.

Why use a Hash table?
They are good for doing a quick search on things. We don't have to search through each element in the array, and just access the position you want.

Types of hash function
 Additive hash
 XOR hash
 Rotating hash
 Bernstein hash
 Modified Bernstein hash
 ShiftAddXOR hash
 FNV hash
 OneataTime hash
 JSW hash
 ELF hash
 Jenkins hash

Additive hash (Very bad algorithm)
Simplest algorithm for hashing a sequence of integral values (such as a string).
Add all of the characters together and then force the range into something suitable for lookup with the remainder of division.
Generally, any hash algorithm that relies primarily on a commutative operation will have an exceptionally bad distribution. This hash fails to treat permutations differently, so “abc”, “cba”, and “cab” will all result in the same hash value.

XOR hash (Not very good)
XOR hash repeatedly folds the bytes together.
 unsigned xor_hash ( void *key, int len )
 {
 unsigned char *p = key;
 unsigned h = 0;
 int i;
 for ( i = 0; i < len; i++ )
 h ^= p[i];
 return h;
 }
the variable h, is not mixed nearly enough to come close to achieving avalanche, nor is a single XOR effective at permuting the internal state

Rotating hash
The rotating hash is identical to the XOR hash except instead of simply folding each byte of the input into the internal state, it also performs a fold of the internal state before combining it with each byte of the input. This extra mixing step is enough to give the rotating hash a much better distribution. Much of the time, the rotating hash is sufficient, and can be considered the minimal acceptable algorithm. Notice that with each improvement, the internal state is being mixed up more and more. This is a key element in a good hash function.

Bernstein hash (aka Chris Torek hash)
It is not very sound when it comes to avalanche and permutation of the internal state.
It has proven very good for small character keys, where it can outperform algorithms that result in a more random distribution:
 1 unsigned djb_hash ( void *key, int len )
 2 {
 3 unsigned char *p = key;
 4 unsigned h = 0;
 5 int i;
 6
 7 for ( i = 0; i < len; i++ )
 8 h = 33 * h + p[i];
 9
 10 return h;
 11 }
Performs very well in practice. Always test this function with sample data for every application to ensure that it does not encounter a degenerate case and cause excessive collisions.

Modified Bernstein
A minor update to Bernstein's hash replaces addition with XOR for the combining step.
The original algorithm is still recommended by nearly everyone, but the new algorithm typically results in a better distribution.

ShiftAddXOR hash
The shiftaddXOR hash was designed as a string hashing function, but because it is so effective, it works for any data as well with similar efficiency.
Similar to the rotating hash except a different choice of constants for the rotation is used and addition is a preferred operation for mixing.. All in all, this is a surprisingly powerful and flexible hash. Like many effective hashes, it will fail tests for avalanche, but that does not seem to affect its performance in practice.
 1 unsigned sax_hash ( void *key, int len )
 2 {
 3 unsigned char *p = key;
 4 unsigned h = 0;
 5 int i;
 6
 7 for ( i = 0; i < len; i++ )
 8 h ^= ( h << 5 ) + ( h >> 2 ) + p[i];
 9
 10 return h;
 11 }


OneataTime hash
This algorithm quickly reaches avalanche and performs very well.
 1 unsigned oat_hash ( void *key, int len )
 2 {
 3 unsigned char *p = key;
 4 unsigned h = 0;
 5 int i;
 6
 7 for ( i = 0; i < len; i++ ) {
 8 h += p[i];
 9 h += ( h << 10 );
 10 h ^= ( h >> 6 );
 11 }
 12
 13 h += ( h << 3 );
 14 h ^= ( h >> 11 );
 15 h += ( h << 15 );
 16
 17 return h;
 18 }




Collision resolution
Algorithm and data structure to handle two keys that hash to the same index.

Collision resolution Techniques
Opening Addressing  Array or LinkedList based
 Linear probing (linear search)
 Quadratic probing (non linear search)
 Double hashing (use 2 hash functions)

Clustering
Groups of consecutively occupied locations.

Quadratic Probing
rehash(key) = (n+k^2) % tablesize

