Make a hashtable in C.

profilehzxlove
Lab5HashTablesinC.pdf

Lab 5: Hash Tables in C In this lab we're making hashtables in C. We’ve been using C arrays for the past two labs and now that Lab 4 has given us a working linked list in C, we can implement a hash table. You will be doing the following:

● TODO1 implement the hash table ● (EXTRA CREDIT) TODO2 No memory leaks

○ valgrind will display the following message:

All heap blocks were freed -- no leaks are possible

Hash Table Refresher Put simply, a hash table is just an array that is able to store data associated with a key. In Assignment 4 we were storing data using a ​String​ ​for our keys and an ​Object​ ​for our value, which would eventually be a list. The reason we want to make an array of lists instead of just using an array is it is much more efficient with storage space since we don’t need to make space in an array for every possible hash value. Additionally we don’t have to worry about running out of space in the array to deal with ​collisions​, two keys being turned into the same array index. Recall that in Assignment 4 we needed to call the Java function ​hashCode()​ on our String​ key and then turned the hash code (named ​hashCode​ in the PDF) into an index into out hash table (making it between ​0​ and ​tableSize - 1​.

String​ ​key​ = ​"hello"​; int​ hashCode = ​key​.hashCode(); int​ arrayIndex = Math.​abs​(hashCode) % tableSize;

At a high level this code is converting a ​String​ into a valid array index. What hashCode()​ is doing behind the scenes involves, at least in C, getting a bit closer to the actual data representation in memory.

Hashing in C Our ​hash​()​ will actually be made up of three functions:

● int​ hash(Dictionary D, ​char​* ​key​) ○ The function that will return an array index

● unsigned​ ​int​ ​pre_hash​(​char​* input) ○ This function converts a string into an ​unsigned​ ​int

● unsigned​ ​int​ ​rotate_left​(​unsigned​ ​int​ value, ​int​ shift) ○ The function responsible for moving bits to semi-randomize the

resulting key Note: ​You will only need to call ​hash(Dictionary D, ​char​* ​key​)​ when you want to compute a hash value. These functions, included in the ​Dictionary.c​ file are defined as follows:

unsigned ​int​ ​rotate_left​(​unsigned ​int​ ​value​, ​int​ shift​) { int​ sizeInBits = ​8​ * ​sizeof​(unsigned ​int​); shift = shift & (sizeInBits - ​1​); if​(shift == ​0​){

return​ ​value​; }

return​ (​value​ << shift) | (​value​ >> (sizeInBits - shift)); }

unsigned​ ​int​ ​pre_hash​(​char​* input)​ { unsigned​ ​int​ result = ​0xBAE86554​ ​//this may correspond to an

address

while​ (*input) { result ^= *input++;

result = rotate_left(result, ​5​); }

return​ result; }

int​ hash(Dictionary D, ​char​* ​key​) { return​ pre_hash(​key​) % D->tableSize;

}

Note: ​You will only need to call ​hash(Dictionary D, ​char​* ​key​)​ when you want to

compute a hash value. A few things to note about these functions are that they are manipulating the actual bits of our data in memory, specifically the ​&​, ​<<​, ​>>​, and ​^=​ operators which are defined as follows:

● &​ is ​bitwise ​ AND ○ Instead of looking at a truth value it looks at the binary

representations of its arguments and creates a new binary number using a logical AND for each bit (if you have taken CE12 you will be familiar with this).

● <<​ and ​>>​ are bit shifts ○ All data is represented as some sequence of bits in memory. Doing

a left or right shift moves the bit sequence stored in a given variable (on the left) the specified number of bits (on the right) and fills in 0s when there is no more of the original value to shift into a bit.

● ^=​ is a bitwise exclusive or assignment ○ This operator performs the XOR operation between its two

arguments and assigns the result to the left hand side. Wikipedia has a more in depth explanations of bitwise operators if you’re curious ( ​https://en.wikipedia.org/wiki/Bitwise_operations_in_C​) We have also included a program called ​HashDemo.c​ to show how the above three functions work. To compile it, simply enter the following:

bash$​ gcc -std=c99 -o HashDemo HashDemo.c To run the program:

./HashDemo

It will show you a breakdown of strings and their indices as created by the above functions, the functionality of ​rotate_left()​ on a single unsigned int, and the number of bits a single character takes up.

A Note on typedefs In this lab we’ll also be seeing more of these statements and it’s one of the built in abstractions in C. We’ve done ​typedef​ ​struct​ ​node_type​ {...} Node;​ to allow us to refer to a ​Node​ without having to repeatedly say ​struct​. Inside of ​Dictionary.h​ and Dictionary.c​ are two other ​typedef​s, one for a ​struct​ ​DictionaryObj​*​ and one for a ​struct​ ​EntryObj​*​. These allow you to refer to pointers by whatever name was given after the ​typedef​, in this case ​Dictionary​ is another name for a pointer to a DictionaryObj​ and ​Entry​ is another name for a pointer to an ​EntryObj​.

TODO1: Implementing the Hash Table Similar to Assignment 4 you’ll be implementing something resembling the ​MyHashtable class but not quite because, once again, C does not have classes. The basic blueprint for our C version should look very familiar:

● A ​struct​ defined in the ​Dictionary.c​ called ​Entry​ (a lot like the ​Node​ struct you implemented in Lab 4). It will once again store Key/Value pairs:

○ char​* ​key ○ char​* ​value ○ As before it should have an associated ​Entry ​newEntry​(​char​* key,

char​* ​value​)​ function ○ As we’re in C it should also have an associated ​void​ freeEntry(Entry*

pE)​ function (more details about this in TODO2). ● Again like in Assignment 4 your hash table struct will have properties that aren’t

directly accessible outside of the ​Dictionary.c​ file: ○ int​ tableSize​ - the size of the array being used by the hash table ○ int​ ​size​ - the number of key/value entries currently in the hash table ○ List*​* ​table​ - an array of pointers to ​List​s. As in Assignment 4, the

reason for this is to handle cases where two keys have the same hash value.

● You’ll be implementing the following functions in ​Dictionary.c​: ○ Dictionary ​newDictionary​(​int​ tableSize)

■ Allocates space for a new ​Dictionary​, initializes the new Dictionary​’s variables, and returns it

○ void​ freeDictionary(Dictionary* pD) ■ Frees all data associated with the ​Dictionary​ pointed at by ​pD

(more detail about this in TODO2)

○ int isEmpty​(Dictionary D) ■ Returns an ​int​; booleans in C are integers. Traditionally a ​1

represents ​true​ and ​0​ represents ​false ○ int​ ​size​(Dictionary D)

■ Returns the number of key/value pairs in D ○ void​ ​insert​(Dictionary D, ​char​* key, ​char​* value)

■ Adds a new key/value pair into the dictionary using a linked list to deal with collisions (translating Assignment 4’s ​put​()​ is a good place to start)

○ char​* ​lookup​(Dictionary D, ​char​* ​key​) ■ Returns the value in ​Dictionary​ D​ associated with ​key

(translating Assignment 4’s ​get​()​ is a good place to start) ○ void​ ​delete​(Dictionary D, ​char​* key)

■ Removes the ​Entry​ associated with ​key​ in ​Dictionary​ D (translating Assignment 4’s ​remove()​ is a good place to start)

■ Remember that because we’re in C we need to free whatever was deleted here.

○ void​ ​makeEmpty​(Dictionary D) ■ Removes all Entries from Dictionary D (translating Assignment 4’s

clear​()​ is a good place to start) ■ Remember that because we’re in C we need to free memory we’re

not using anymore ourselves ○ void​ ​printDictionary​(​FILE* ​out​, Dictionary D​)

■ Prints the contents of ​Dictionary D​ to a file. More on this below. Each of the above functions has a brief description in ​Dictionary.h​ including specifics about return values and preconditions (what each function assumes to be true when it’s called). Making an Array of Lists and Notes About calloc Because we’re in C, we don’t get to use the nice Java array syntax:

MyLinkedList[] table = ​new​ MyLinkedList[tableSize]; instead we need to do what we did in Lab 3 and use ​calloc​()​:

● First we need a pointer to a ​List​:

List​** ​table

● Next, inside ​newDictionary()​ we’ll need to set our newly instantiated Dictionary’s table pointer to point to something on the heap:

calloc​(tableSize, ​sizeof​(List*)) The reason we need table to be of type List** is because it’s an array of pointers to List​s. Even though table looks a little strange compared to a Java array declaration and initialization, we can still access whatever is in it using the same array indexing we’ve been working with (assume D is of type Dictionary):

D->​table[0] = make_list(); To make a new instance of a ​DictionaryObj​ we still need to call ​malloc​()​, which requires a pointer:

Dictionary D = ​malloc​(​sizeof​(DictionaryObj)); Note: ​One important thing to remember about ​sizeof​()​ is you want to give it the struct​ not the pointer. So when we want to make a new dictionary, we don’t want to give ​sizeof​()​ ​Dictionary​, we want to give it ​DictionaryObj​ because remember that Dictionary​ is really just another name for a ​DictionaryObj*​. A note on ​printDictionary() The function ​void​ ​printDictionary​(​FILE* ​out​, Dictionary D​)​ is a little more flexible than what we’ve been doing in C up to this point since it’s able to take any arbitrary file and write to it. Instead of using ​printf​()​ which only lets us print to stdout​, we can use ​fprintf​()​ which functions almost identically to ​printf​()​ with one difference. A general call to ​printf​ looks like: printf("​<format string>​", ​<argument1>​, ​<argument2>​,..., ​<argumentn>​);

while a general call to ​fprintf​ looks like:

fprintf(​<file name>​, "​<format string>​", ​<argument1>​,..., <argumentn>​);

Similar to Lab 2 in Java, C treats standard in (​stdin​) and standard out (​stdout​) as files and you’ll notice that the calls to ​printDictionary()​ in ​DictionaryClient.c​ look like this:

printDictionary(​stdout​, A);

Compiling and Running As in Lab 4, we have a few different files we need to compile at once to have our object file actually work as intended. We also now have two different files for you to use to test your hash table: ​DictionaryClient.c​ and ​DictionaryClient2.c​. To compile them you’ll need to run the following:

bash$ gcc ​DictionaryClient​.​c​ ​Dictionary​.​c​ list.​c​ --std=c99 -o DictionaryClient

to make the object file for ​DictionaryClient​ and

bash$ gcc ​DictionaryClient2​.​c​ ​Dictionary​.​c​ list.​c​ --std=c99 -o DictionaryClient2

To make the object file for ​DictionaryClient2​. DictionaryClient is fairly simple and should print the following to the console:

two bar

five happy

one foo

three blah

seven blue

four galumph

six sad

key=​"one"​ value=​"foo" key=​"two"​ value=​"bar" key=​"three"​ value=​"blah" key=​"four"​ value=​"galumph" key=​"five"​ value=​"happy" key=​"six"​ value=​"sad" key=​"seven"​ value=​"blue" two bar

five happy

four galumph

six sad

key=​"one"​ not found ​"(null)" key=​"two"​ value=​"bar" key=​"three"​ not found ​"(null)" key=​"four"​ value=​"galumph" key=​"five"​ value=​"happy" key=​"six"​ value=​"sad" key=​"seven"​ not found ​"(null)" false

4

true

DictionaryClient2​ does something a little more fun and loads each line of its own .c file into the hash table and prints the hashtable to a file named DictionaryClient2-out​, which should have the following contents:

line​ ​53​: key = &keyBuffer[keyBufferOffset]; line​ ​36​: value = &valBuffer[valBufferOffset]; line​ ​63​: freeDictionary(&A); line​ ​18​: ​char​* value; line​ ​45​: key = &keyBuffer[keyBufferOffset]; line​ ​11​: ​#define MAX_LEN 180 line​ ​28​: ​while​( fgets(​line​, MAX_LEN, in)!=NULL ){ line​ ​55​: insert(A, key, value); line​ ​21​: ​int​ keyBufferOffset = ​0​, valBufferOffset = ​0​; line​ ​38​: valBufferOffset = valBufferLength; line​ ​9​: ​#include"Dictionary.h"

line​ ​65​: free(valBuffer); line​ ​47​: keyBufferOffset = keyBufferLength; line​ ​31​: lineNumber++; line​ ​2​: ​// DictionaryClient2.c line​ ​13​: ​int​ main(​int​ argc, ​char​* argv[]){ line​ ​40​: ​// put label in keyBuffer line​ ​57​: valBufferOffset += (strlen(value) + ​1​); line​ ​23​: ​char​ ​line​[MAX_LEN+​1​]; line​ ​50​: ​// put keys and values in dictionary A line​ ​67​: fclose(out); line​ ​49​: line​ ​33​: ​line​[lineLength] = ​'\0'​; ​// overwrite newline '\n' with null '\0'

line​ ​60​: printDictionary(out, A); line​ ​15​: FILE* in = fopen(​"DictionaryClient2.c"​, ​"r"​); line​ ​42​: labelLength = strlen(label); line​ ​59​: line​ ​5​: line​ ​70​: } line​ ​25​: ​int​ i, labelLength, lineLength, lineNumber = ​0​; line​ ​52​: ​for​(i=​0​; i<lineNumber; i++){ line​ ​69​: ​return​(EXIT_SUCCESS); line​ ​35​: valBuffer = realloc(valBuffer, valBufferLength*​sizeof​(​char​) ); line​ ​62​: ​// free memory and close files line​ ​17​: ​char​* key; line​ ​44​: keyBuffer = realloc(keyBuffer, keyBufferLength*​sizeof​(​char​) ); line​ ​7​: ​#include<stdlib.h> line​ ​10​: line​ ​27​: ​// read input files line​ ​54​: value = &valBuffer[valBufferOffset]; line​ ​20​: ​char​* valBuffer = NULL; line​ ​37​: strcpy(value, ​line​); line​ ​8​: ​#include<string.h> line​ ​64​: free(keyBuffer); line​ ​19​: ​char​* keyBuffer = NULL; line​ ​46​: strcpy(key, label);

line​ ​30​: ​// put line in valBuffer line​ ​1​: //-------------------------------------------------------------------

----------

line​ ​12​: line​ ​29​: line​ ​56​: keyBufferOffset += (strlen(key) + ​1​); line​ ​22​: ​int​ keyBufferLength = ​0​, valBufferLength = ​0​; line​ ​39​: line​ ​66​: fclose(in); line​ ​48​: } line​ ​32​: lineLength = strlen(​line​)​-1​; line​ ​3​: ​// Another test client for the Dictionary ADT line​ ​14​: Dictionary A = newDictionary(); line​ ​41​: sprintf(label, ​"line %d:\t"​, lineNumber); line​ ​58​: } line​ ​4​: //-------------------------------------------------------------------

----------

line​ ​24​: ​char​ label[MAX_LEN+​1​]; line​ ​51​: keyBufferOffset = valBufferOffset = ​0​; line​ ​68​: line​ ​34​: valBufferLength += (lineLength+​1​); line​ ​61​: line​ ​16​: FILE* out = fopen(​"DictionaryClient2-out"​, ​"w"​); line​ ​43​: keyBufferLength += (labelLength+​1​); line​ ​6​: ​#include<stdio.h> line​ ​26​:

Memory Management We’ve now done two labs in C and have a bit of experience doing manual memory management so you’ll notice that our ​freeDictionary()​ function is asking for something a little odd, a ​Dictionary*​ (remember that ​Dictionary​ is another name for DictionaryObj*​). The same is true of ​freeEntry()​ wanting an ​Entry*​ rather than an Entry​ (another name for ​EntryObj*​). The reason for this is ​to allow the function that is responsible for freeing memory to also be able to set the pointer that is

pointing to the freed memory to NULL ​. Put more simply, it’s an emulation of the way deleting something in Java works since in Java you don’t have to worry about a reference pointing to deallocated memory. What does this mean for you? If you look at either of the ​DictionaryClient.c​ files you’ll notice that their calls to ​freeDictionary()​ use ​&A​. ​&A​ means we’re getting the address of ​A​, which is of type ​Dictionary​ (really a ​DictionaryObj*​). So when we say freeDictionary(&A)​, we’re passing it the pointer to a pointer, which is what freeDictionary()​ wants. When we’re actually writing our destructor method, instead of simply freeing what we were passed as we have done before with List:

void​ ​free_node​(Node* node)​{ ​free​(node->data); ​free​(node); }

as an example, we need to dereference the pointer to the pointer. In freeDictionary()​’s case ​pD​ is a pointer to a pointer to an instance of a DictionaryObj:

&pD->pD​->​instance​ ​of​ ​DictionaryObj If all we have is ​&pD​, how do we get to our instance? We can use the ​*​ operator to dereference​ ​&pD​ like this:

(*pD)

Once we have (​*pD​), we can treat it just like a regular pointer! But because we also have ​&pd​ (the pointer to the pointer), this allows us to set ​pD​ to ​NULL​ ​inside the freeDictionary()​ function ​. Note:​ Remember that the ​List​ we built in Lab 4 uses a ​void​*​ for data and since we’ll be storing an ​Entry​ in a ​Node​ struct, we’ll need to loop through the ​List​ calling freeEntry()​ on each Node’s data pointer. Once we finish freeing each ​Entry​, ​then we’re allowed to call ​free_list()​.

Reminders About valgrind Make sure you’re running valgrind as follows for both DictionaryClient and DictionaryClient2:

valgrind​ ​--leak-check=full -v ./DictionaryClient

valgrind​ ​--leak-check=full -v ./DictionaryClient2 The important things to look out for (after segmentation faults) is that the number of allocations and the number of frees should be the same. Generally when these are equal you won’t have any memory leaks, though the ​in use ​ report at exit part of the HEAP SUMMARY will give you a more detailed breakdown.

Grading TODO 1: 85 points

● Entry​ and ​Dictionary​ structs completed (10 points) ● newDictionary()​, ​newEntry()​, ​freeDictionary()​, ​freeEntry()

implemented (15 points) ● size()​ and ​isEmpty()​ implemented (5 points) ● delete​()​ and ​makeEmpty()​ implemented (20 points) ● insert()​ and ​lookup()​ implemented (25 points) ● printDictionary()​ implemented (10 points)

TODO 2: (EXTRA CREDIT) 10 points (no memory leaks) Style: 15 points As a note, it is far more important to turn in code that compiles and is a TODO or only partially does a TODO, than it is to attempt a TODO and make your code uncompilable. If you code doesn’t compile, you will receive 40 points off, so make sure your code compiles before you submit it (even if you are missing a TODO).

Turning in the Lab ● Create a directory with the following name: ​<student ID>_lab5​ where you

replace ​<student ID>​ with your actual student ID. For example, if your student ID is 1234567, then the directory name is ​1234567​_lab5​.

● Put a copy of your edited ​Dictionary.h, Dictionary.c , list.h, list.c, DictionaryClient.c, and DictionaryClient2.c​ ​files in the directory.

● Compress the folder using zip. Zip is a compression utility available on mac, linux and windows that can compress a directory into a single file. This should result in a file named ​<student ID>_lab5.zip​ (with ​<student ID>​ replaced with your real ID of course).

● Upload the zip file through the ​page for Lab 5 in canvas​.