Data Structures - Hashtable

The flashcards below were created by user javatomas on FreezingBlue Flashcards.

  1. Definition of a Hash Table
    A data structure that uses a Hash function to map identifying values, known as keys, to their associated values. It can be quickly searched by the key.

    Think about an array that allows any value to be used as an index.
  2. Applications of a Hash table
    Driver's license records - with a hash table, you could quickly get information about the driver given the license number

    Compiler symbol tables - the compiler uses a symbol table to keep track of the user-defined symbols in a C++ program

    Internet search engines - meta tags allow the owner of a page to specify key words and concepts under which the page will be indexed. The meta tags can guide the search engine in choosing which of the several possible meanings for these words is correct.
  3. How to choose Hash Function?
    An efficient hash function - distribute the index values of inserted objects uniformly across the table.

    An efficient collision resolution - computes an alternative index for a key whose hash index corresponds to a location previously inserted in the hash table.

    Calculated quickly, returns values within the range of locations in our table and minimize collisions.
  4. Characteristics of Good Hash Functions
    • Minimize collision
    • Easy and quick to compute
    • Distribute key values evenly in the hash table
    • Use all the information provided in the key
    • Have a high load factor for a given set of keys
  5. Types of Hash functions
    • Division method
    • Double hashing with passbit
  6. Pigeonhole Priniciple
    • If M items are placed in N buckets, and M is
    • greater than N, one or more buckets contain two or more items.

    • It proves that no hashing algorithm can hash every key to a unique
    • index if the possible keys exceed the size of the array
  7. What is a Load Factor
    It is the decision parameter used when we want to rehash or expand the existing hash table entries.
  8. Why use a Hash table?
    They are good for doing a quick search on things. We don't have to search through each element in the array, and just access the position you want.
  9. Types of hash function
    • Additive hash
    • XOR hash
    • Rotating hash
    • Bernstein hash
    • Modified Bernstein hash
    • Shift-Add-XOR hash
    • FNV hash
    • One-at-a-Time hash
    • JSW hash
    • ELF hash
    • Jenkins hash
  10. Additive hash (Very bad algorithm)
    Simplest algorithm for hashing a sequence of integral values (such as a string).

    Add all of the characters together and then force the range into something suitable for look-up with the remainder of division.

    Generally, any hash algorithm that relies primarily on a commutative operation will  have an exceptionally bad distribution. This hash fails to treat permutations differently, so “abc”, “cba”, and “cab” will all result in the same hash value.
  11. XOR hash (Not very good)
    XOR hash repeatedly folds the bytes together.

    • unsigned xor_hash ( void *key, int len )
    • {
    •   unsigned char *p = key;
    •   unsigned h = 0;

    •   int i;
    •   for ( i = 0; i < len; i++ )
    •     h ^= p[i];
    •   return h;
    • }

    the variable h, is not mixed nearly enough to come close to achieving avalanche, nor is a single XOR effective at permuting the internal state
  12. Rotating hash
    The rotating hash is identical to the XOR hash except instead of simply folding each byte of the input into the internal state, it also performs a fold of the internal state before combining it with each byte of the input. This extra mixing step is enough to give the rotating hash a much better distribution. Much of the time, the rotating hash is sufficient, and can be considered the minimal acceptable algorithm. Notice that with each improvement, the internal state is being mixed up more and more. This is a key element in a good hash function.
  13. Bernstein hash (aka Chris Torek hash)
    It is not very sound when it comes to avalanche and permutation of the internal state.

    It has proven very good for small character keys, where it can outperform algorithms that result in a more random distribution:

    • 1 unsigned djb_hash ( void *key, int len )
    • 2 {
    • 3  unsigned char *p = key;
    • 4  unsigned h = 0;
    • 5  int i;
    • 6
    • 7  for ( i = 0; i < len; i++ )
    • 8    h = 33 * h + p[i];
    • 9
    • 10   return h;
    • 11 }

    Performs very well in practice. Always test this function with sample data for every application to ensure that it does not encounter a degenerate case and cause excessive collisions.
  14. Modified Bernstein
    A minor update to Bernstein's hash replaces addition with XOR for the combining step.

    The original algorithm is still recommended by nearly everyone, but the new algorithm typically results in a better distribution.
  15. Shift-Add-XOR hash
    The shift-add-XOR hash was designed as a string hashing function, but because it is so effective, it works for any data as well with  similar efficiency.

    Similar to the rotating hash except a different choice of constants for the rotation is used and addition is a preferred operation for mixing.. All in all, this is a surprisingly powerful and flexible hash. Like many  effective hashes, it will fail tests for  avalanche, but that does not seem to affect its performance in practice.

    •  1 unsigned sax_hash ( void *key, int len )
    •  2 {
    •  3   unsigned char *p = key;
    •  4   unsigned h = 0;
    •  5   int i;
    •  6
    •  7   for ( i = 0; i < len; i++ )
    •  8     h ^= ( h << 5 ) + ( h >> 2 ) + p[i];
    •  9
    • 10   return h;
    • 11 }
  16. FNV hash
  17. One-at-a-Time hash
    This algorithm quickly reaches avalanche and performs very well.

    •  1 unsigned oat_hash ( void *key, int len )
    •  2 {
    •  3   unsigned char *p = key;
    •  4   unsigned h = 0;
    •  5   int i;
    •  6
    •  7   for ( i = 0; i < len; i++ ) {
    •  8     h += p[i];
    •  9     h += ( h << 10 );
    • 10     h ^= ( h >> 6 );
    • 11   }
    • 12
    • 13   h += ( h << 3 );
    • 14   h ^= ( h >> 11 );
    • 15   h += ( h << 15 );
    • 16
    • 17   return h;
    • 18 }
  18. JSW hash
  19. ELF hash
  20. Jenkins hash
  21. Collision resolution
    Algorithm and data structure to handle two keys that hash to the same index.
  22. Collision resolution Techniques
    Opening Addressing - Array or LinkedList based

    • Linear probing (linear search)
    • Quadratic probing (non linear search)
    • Double hashing (use 2 hash functions)
  23. Clustering
    Groups of consecutively occupied locations.
  24. Quadratic Probing
    rehash(key) = (n+k^2) % tablesize
Card Set:
Data Structures - Hashtable
2013-10-06 15:58:08
Data structures

All about Hash tables
Show Answers: