C++ data structure 5

ahmeddxn7
CSC240Lab9Specifications.pdf

CSC 240

Lab 9

Hashing Strings

Perform a comparative analysis regarding collision rates for various hash codes of character strings by

using a polynomial hash algorithm that allows different values of the parameter 𝒂. It is known that for

English words using values of: 33, 37, 39, or 41 for 𝒂 is preferred and reduces collisions. Use an open-

addressing scheme of a hash table and determine collisions using these four different values of 𝒂. Count

collisions when a hashed location for any given string maps to an occupied location in the hash table.

Use linear probing for collision resolution for one trial and quadratic probing for another trial.

Let’s understand how the polynomial hash algorithm computes a hash for a given character string.

Words in English are variable-length strings that can be viewed as a tuple of (x0, x1, ..., xk-1). Choose a nonzero constant, 𝑎 ≠ 1, and calculate (x0 𝑎 k-1+ x1 𝑎 k-2+...+ xk-2 𝑎 + xk-1) as the hash code, ignoring overflows. Mathematically speaking, this is simply a polynomial in 𝑎 that takes the components

(x0, x1,..., xk-1) of a string as its coefficients.

Here is the code for the polynomial hash algorithm: int hash = 0;

int n = s.length();

for (int i = 0; i < n; i ++)

hash = a * hash + s.charAt(i);

Test the hash codes of the following text files provided in Extra Files under Content in D2L:

hashText1.txt, hashText2.txt, hashText3.txt, hashText4.txt.

The input files come in the form:

[prime number greater than or equal to number of words]

[file contents as words]

For example, hashText1.txt:

2347

The quick brown fox jumped over

the lazy river…

Give a report of at least 250 words that summarizes the number of collisions for each of the files

respective to each of the four values of 𝑎 given above. Which value of 𝑎 gives the best results for each

file and why do you think so? Which probing technique yielded more clustering over the other? Where

did clustering occur? Why? Are there other probing techniques that prove to be more effective than the

two techniques tested?

Resource: https://www.cpp.edu/~ftang/courses/CS240/lectures/hashing.htm

Use the following function to parse the words from the input files and build the hash table:

void buildHashTable(ifstream& inFile, HashType<string>& hashTable, int type){ string word; if (inFile.is_open()){ while(inFile >> word){ // This for loop was taken from: // https://www.geeksforgeeks.org/removing-punctuations-given-string/ for (int i = 0, len = word.size(); i < len; i++) { // check whether parsing character is punctuation or not if (ispunct(word[i])) { word.erase(i--, 1); len = word.size(); } } if(type == 1) hashTable.InsertItemLinear(word); else hashTable.InsertItemQuadratic(word); } } }

Here is the header for the HashType class.

template <class ItemType> class HashType { public: HashType(); // Constructor HashType(int s, int factor); //Dynamic size constructor void MakeEmpty(); // Function: Returns the list to the empty state. // Post: List is empty. bool IsFull() const; // Function: Determines whether list is full. // Pre: List has been initialized. // Post: Function value = (list is full) int GetNumItems() const; // Function: Determines the number of elements in list. // Pre: List has been initialized. // Post: Function value = number of elements in list void RetrieveItem(ItemType& item, bool& found); // Function: Retrieves list element whose key matches item's key (if // present). // Pre: List has been initialized.

// Key member of item is initialized. // Post: If there is an element someItem whose value matches // item's value, then found = true and item contains the contents // of the item if it is found. // otherwise found = false and item is returned unchanged. // List is unchanged. void InsertItemLinear(ItemType item); // Function: Adds item to list and uses a linear probing technique to // resolve collisions. // Pre: List has been initialized. // List is not full. // item is not in list. // Post: item is in list. void InsertItemQuadratic(ItemType item); // Function: Adds item to list and uses a quadratic probing technique to // resolve collisions. // Pre: List has been initialized. // List is not full. // item is not in list. // Post: item is in list. void DeleteItem(ItemType item); // Function: Deletes the element whose key matches item's key. // Pre: List has been initialized. // Key member of item is initialized. // One and only one element in list has a key matching item's key. // Post: No element in list has a key matching item's key. int Hash(string item) const; //This is the hash function for this class unsigned long int GetCollisions() const; //return the number of collisions that occured during the build of // the hash table template <class ItemHash> friend ostream& operator<<(ostream& out, const HashType<ItemHash>& items); private: int a; //the value used for a polynomial hash function int numItems; int size; ItemType* info; ItemType emptyItem = ""; //The empty string unsigned long int numCollisions; };