Week 6: Trees, Heaps, and Hashing
This week in Design and Analysis of Algorithms, we took a look at two types of balanced search trees, the AVL tree and the 2-3 tree, and the algorithms that maintain them. We learned what qualifies a tree as an AVL tree or a 2-3 tree and how to restore those qualifications after operations such as insertion. Additionally, we explored how to build a heap from a list of inputs and how to keep the heap maintained after inserting or removing elements. Lastly, we learned about hashing, which uses hash tables to store values for fast data lookup.
When we took a look at the AVL tree, we analyzed what qualifies it to be labeled an AVL tree. As briefly explained in the lecture videos, a tree is an AVL tree if it is a binary search tree that maintains a strict balance condition. The balance between the left and right subtrees is measured by the balance factor, which the lecture describes as the height of the left subtree minus the height of the right subtree for each node. When calculating the balance factor for each node, we check for an imbalance, which occurs whenever the balance factor is not -1, 0, or 1. If there is an imbalance, we operate on the unbalanced node using AVL tree rotations. The cases are: if the unbalanced node has a negative balance factor and its right child's balance factor is also negative, we issue a single left rotation; if both are positive, a single right rotation; if the node is negative but the child is positive, a right-left double rotation; and, vice versa, if the node is positive but the child is negative, a left-right double rotation. At first the balance factors were hard to understand in the lectures, but after re-watching the video a couple of times it finally clicked and I clearly understood the pattern. Some YouTube videos did it quite differently from what the professor was explaining, which made it a bit more confusing; the quiz questions, though, were helpful in understanding balance factors and rotations.
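To make the rotation cases concrete, here is a minimal Python sketch of the rebalancing step, assuming the balance factor is the height of the left subtree minus the height of the right subtree; the node structure and function names are my own illustration, not the course's code.

```python
# Minimal AVL rebalancing sketch (my own illustration).
# Balance factor = height(left) - height(right); imbalance when |bf| > 1.

class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def height(node):
    # Height of an empty subtree is -1, so a single leaf has height 0.
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))

def balance_factor(node):
    return height(node.left) - height(node.right)

def rotate_left(node):
    pivot = node.right
    node.right = pivot.left
    pivot.left = node
    return pivot                  # pivot becomes the new subtree root

def rotate_right(node):
    pivot = node.left
    node.left = pivot.right
    pivot.right = node
    return pivot

def rebalance(node):
    bf = balance_factor(node)
    if bf < -1:                                   # right-heavy
        if balance_factor(node.right) > 0:        # right child leans left
            node.right = rotate_right(node.right)  # right-left double rotation
        return rotate_left(node)
    if bf > 1:                                    # left-heavy
        if balance_factor(node.left) < 0:         # left child leans right
            node.left = rotate_left(node.left)     # left-right double rotation
        return rotate_right(node)
    return node                                   # already balanced
```

Recomputing heights from scratch like this is slow; a real AVL implementation would store the height in each node, but keeping it naive makes the rotation logic itself easier to follow.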
The 2-3 tree, on the other hand, is quite different: each node is either a 2-node or a 3-node. A 2-node holds one key and has two children, while a 3-node holds two keys and has three children. In addition, all of the leaves are at the same depth. The search order must also be maintained; for example, a node holding the keys 10 and 20 could have a left child of 5, a center child of 15, and a right child of 40. When inserting a value into a node, if the node temporarily ends up with three keys, the middle key is promoted upwards into the parent and the node splits, as in the sketch below.
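Here is a small sketch of how the ordering and the promotion step work; the list representation and helper names are my own assumptions, not the course implementation.

```python
# Which child to descend into during a search, given a node's sorted keys
# (my own helper, not course code).
def child_index(node_keys, key):
    for i, k in enumerate(node_keys):
        if key < k:
            return i
    return len(node_keys)

# For the node [10, 20]: key 5 goes left (0), 15 goes to the center (1), 40 goes right (2).
print(child_index([10, 20], 5), child_index([10, 20], 15), child_index([10, 20], 40))

# Split-and-promote step: a node that temporarily holds three sorted keys keeps
# its smallest and largest keys in two separate 2-nodes and promotes the middle
# key to its parent.
def split_overfull_node(keys):
    left, middle, right = keys
    return [left], middle, [right]

# Inserting 15 into a node already holding 10 and 20 temporarily gives
# [10, 15, 20]; the split promotes 15 upward.
print(split_overfull_node([10, 15, 20]))   # ([10], 15, [20])
```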
Regarding heaps, we mainly looked at the max-heap, in which every parent is greater than or equal to its children, so the root holds the maximum value. The main max-heap operations this week were finding the max value, removing the max value, and adding a value while maintaining the heap property. When adding a value to the heap, it is placed in the leftmost available spot on the bottom level, then compared with its parent; if the value is greater than the parent, we swap with the parent and keep moving upward until the parent ≥ children condition is satisfied all the way up. To remove the max value, we take the last node and move it into the root's position, then go downward, swapping it with its larger child whenever a child is greater, until the tree is properly heaped again. Additionally, we discussed how to construct a heap with the bottom-up algorithm: take the input keys, start at the rightmost parental node, and fix the heap rooted at it; if it is not a heap, swap the node with its largest child until it becomes one, then keep doing this for each node before it. I thought the algorithm was quite interesting to implement for homework 5-1.
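Here is a rough array-based sketch of those operations, assuming 0-based indexing where the children of index i sit at 2*i+1 and 2*i+2; the function names are my own and not necessarily what homework 5-1 asked for.

```python
# Array-based max-heap sketch (my own illustration).

def sift_up(heap, i):
    # After adding a value at the leftmost open spot (end of the array),
    # swap it with its parent until parent >= child holds.
    while i > 0:
        parent = (i - 1) // 2
        if heap[i] <= heap[parent]:
            break
        heap[i], heap[parent] = heap[parent], heap[i]
        i = parent

def sift_down(heap, i, size):
    # Swap the value at i with its largest child until the subtree is a heap.
    while True:
        left, right, largest = 2 * i + 1, 2 * i + 2, i
        if left < size and heap[left] > heap[largest]:
            largest = left
        if right < size and heap[right] > heap[largest]:
            largest = right
        if largest == i:
            break
        heap[i], heap[largest] = heap[largest], heap[i]
        i = largest

def insert(heap, value):
    heap.append(value)                       # leftmost available spot
    sift_up(heap, len(heap) - 1)

def remove_max(heap):
    heap[0], heap[-1] = heap[-1], heap[0]    # move the last node to the root
    maximum = heap.pop()
    sift_down(heap, 0, len(heap))
    return maximum

def build_heap(keys):
    # Bottom-up construction: start at the rightmost parental node and
    # fix each subtree, moving toward the root.
    heap = list(keys)
    for i in range(len(heap) // 2 - 1, -1, -1):
        sift_down(heap, i, len(heap))
    return heap

# Example usage
h = build_heap([3, 9, 2, 1, 4, 5])
insert(h, 10)
print(remove_max(h))   # 10
```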
Finally, the last concept was hashing, which involves creating hash tables for fast data lookups. A hash table uses a hash function to convert keys into array indices, which allows for faster searches. A problem occurs when the hash function produces the same index for more than one key, which is called a collision. This is handled by two strategies, separate chaining and linear probing. We took a look at both, but I am only going to mention linear probing because it was used more this week due to the homework. Essentially, hashing proceeds as usual, but if a later key hashes to an index that is already occupied, it is placed in the next open index. Hash tables also rely on the load factor, the number of keys divided by the table size, to stay efficient. If the load factor after an insert is greater than a pre-defined limit, a rehash is performed by creating a new table whose size is the first prime number after doubling the previous table size. A new hash function is created using the new size, and the old values from the previous table are reinserted with it.
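A minimal sketch of linear probing with a load-factor-triggered rehash could look like the following, assuming integer keys, a simple key mod table-size hash function, an initial size of 7, and a maximum load factor of 0.5; all of these are my own example choices, not the homework's parameters.

```python
# Linear-probing hash table with rehashing (my own illustration).

def is_prime(n):
    if n < 2:
        return False
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            return False
    return True

def next_prime(n):
    while not is_prime(n):
        n += 1
    return n

class LinearProbingTable:
    def __init__(self, size=7, max_load=0.5):
        self.size = size
        self.max_load = max_load
        self.slots = [None] * size
        self.count = 0

    def insert(self, key):
        i = key % self.size                  # hash function: key mod table size
        while self.slots[i] is not None:     # collision: probe the next open index
            i = (i + 1) % self.size
        self.slots[i] = key
        self.count += 1
        # If the load factor now exceeds the limit, rehash into a table whose
        # size is the first prime after doubling the old size.
        if self.count / self.size > self.max_load:
            self._rehash()

    def _rehash(self):
        old_keys = [k for k in self.slots if k is not None]
        self.size = next_prime(2 * self.size)
        self.slots = [None] * self.size
        self.count = 0
        for key in old_keys:                 # reinsert with the new hash function
            self.insert(key)

    def contains(self, key):
        i = key % self.size
        while self.slots[i] is not None:
            if self.slots[i] == key:
                return True
            i = (i + 1) % self.size
        return False

# Example usage: the fourth insert pushes the load factor past 0.5,
# so the table grows from 7 to 17 (the first prime after 14).
t = LinearProbingTable()
for k in [18, 41, 22, 44, 59]:
    t.insert(k)
print(t.contains(44), t.size)   # True 17
```

Growing to the first prime after doubling keeps the mod-based hash spreading keys reasonably well, and each rehash roughly halves the load factor, so inserts stay fast on average.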
It seems I wrote a lot again, but there was so much to review this week. My favorite concepts were the heap and hashing, and the most difficult to understand was the AVL tree. I thought this week was very informative and practical with the amount of exercise PowerPoints included in the module. The final is coming up, so I definitely have to prepare for that. Overall it was another productive, jam-packed week!