We will discuss the Huffman Coding algorithm, its pseudocode, code implementations of Huffman Coding in C, Java, JavaScript, and Python, its advantages and disadvantages, and the time complexity of Huffman Coding.
1. What is Huffman Coding Algorithm?
Huffman Coding is a widely used encoding technique for lossless data compression.
It is based on the frequency of occurrence of each data item (for example, pixel values in images). The main aim of Huffman coding is to encode the items that occur most frequently using fewer bits.
The codes constructed for a data set (such as an image) are stored in a code book. Both the code book and the encoded data are transmitted (or stored), so that the receiver can use the code book to decode the information.
Huffman codes can be constructed using the following two ideas:
- For an optimum code, symbols with higher probability of occurrence should be assigned shorter codewords.
- To make an optimum prefix code, the two symbols that occur least frequently should be assigned codewords of the same length (differing only in their last bit), as the worked example below illustrates.
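As a quick worked example, consider the alphabet used by the implementations below, with frequencies a: 5, b: 9, c: 12, d: 13, e: 16, f: 45 (out of 100). One optimal Huffman code assigns f a 1-bit codeword, c, d, and e 3-bit codewords, and a and b 4-bit codewords, so the average code length is

(45·1 + 12·3 + 13·3 + 16·3 + 5·4 + 9·4) / 100 = 224 / 100 = 2.24 bits per symbol,

compared with 3 bits per symbol for a fixed-length code over six symbols. The exact bit patterns depend on how left and right children are ordered, but every optimal prefix code for these frequencies achieves the same average length.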
2. Pseudocode for Huffman Coding
HUFFMAN(f[1...n])
1. T = empty binary tree
2. Q = priority queue of pairs (i, f[i]), i = 1...n, with f as comparison key
3. for each k = 1...n − 1
4.     i = extractMin(Q)
5.     j = extractMin(Q)
6.     f[n + k] = f[i] + f[j]
7.     insertNode(T, n + k) with children i, j
8.     insert(Q, (n + k, f[n + k]))
9. return T
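To see what the loop does, here is a trace on the example frequencies a: 5, b: 9, c: 12, d: 13, e: 16, f: 45 (the driver data used by the implementations below); at each step the two smallest keys are extracted and their sum is inserted back:

- merge 5 (a) and 9 (b) into an internal node of weight 14
- merge 12 (c) and 13 (d) into 25
- merge 14 and 16 (e) into 30
- merge 25 and 30 into 55
- merge 45 (f) and 55 into 100, which becomes the root

Reading 0 for a left edge and 1 for a right edge (with the smaller child placed on the left) then yields the codewords printed by the programs below.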
3. Code Implementation of Huffman Coding
3.1 Huffman Coding in C
#include <stdio.h>
#include <stdlib.h>

// A Huffman tree node
struct MinHeapNode {
    char data;
    unsigned freq;
    struct MinHeapNode *left, *right;
};

struct MinHeap {
    unsigned size;
    unsigned capacity;
    struct MinHeapNode** array;
};

// Create a new node
struct MinHeapNode* newNode(char data, unsigned freq) {
    struct MinHeapNode* temp = (struct MinHeapNode*)malloc(sizeof(struct MinHeapNode));
    temp->left = temp->right = NULL;
    temp->data = data;
    temp->freq = freq;
    return temp;
}

// Create a MinHeap
struct MinHeap* createMinHeap(unsigned capacity) {
    struct MinHeap* minHeap = (struct MinHeap*)malloc(sizeof(struct MinHeap));
    minHeap->size = 0;
    minHeap->capacity = capacity;
    minHeap->array = (struct MinHeapNode**)malloc(minHeap->capacity * sizeof(struct MinHeapNode*));
    return minHeap;
}

// Swap two nodes
void swapMinHeapNode(struct MinHeapNode** a, struct MinHeapNode** b) {
    struct MinHeapNode* t = *a;
    *a = *b;
    *b = t;
}

// Heapify
void minHeapify(struct MinHeap* minHeap, int idx) {
    int smallest = idx;
    int left = 2 * idx + 1;
    int right = 2 * idx + 2;
    if (left < minHeap->size && minHeap->array[left]->freq < minHeap->array[smallest]->freq)
        smallest = left;
    if (right < minHeap->size && minHeap->array[right]->freq < minHeap->array[smallest]->freq)
        smallest = right;
    if (smallest != idx) {
        swapMinHeapNode(&minHeap->array[smallest], &minHeap->array[idx]);
        minHeapify(minHeap, smallest);
    }
}

// Check if size is one
int isSizeOne(struct MinHeap* minHeap) {
    return (minHeap->size == 1);
}

// Extract minimum value node from heap
struct MinHeapNode* extractMin(struct MinHeap* minHeap) {
    struct MinHeapNode* temp = minHeap->array[0];
    minHeap->array[0] = minHeap->array[minHeap->size - 1];
    --minHeap->size;
    minHeapify(minHeap, 0);
    return temp;
}

// Insert a new node to MinHeap
void insertMinHeap(struct MinHeap* minHeap, struct MinHeapNode* minHeapNode) {
    ++minHeap->size;
    int i = minHeap->size - 1;
    while (i && minHeapNode->freq < minHeap->array[(i - 1) / 2]->freq) {
        minHeap->array[i] = minHeap->array[(i - 1) / 2];
        i = (i - 1) / 2;
    }
    minHeap->array[i] = minHeapNode;
}

// Build MinHeap
void buildMinHeap(struct MinHeap* minHeap) {
    int n = minHeap->size - 1;
    int i;
    for (i = (n - 1) / 2; i >= 0; --i)
        minHeapify(minHeap, i);
}

// Print an array of size n
void printArr(int arr[], int n) {
    int i;
    for (i = 0; i < n; ++i)
        printf("%d", arr[i]);
    printf("\n");
}

// Check if node is leaf
int isLeaf(struct MinHeapNode* root) {
    return !(root->left) && !(root->right);
}

// Create and build min heap
struct MinHeap* createAndBuildMinHeap(char data[], int freq[], int size) {
    struct MinHeap* minHeap = createMinHeap(size);
    for (int i = 0; i < size; ++i)
        minHeap->array[i] = newNode(data[i], freq[i]);
    minHeap->size = size;
    buildMinHeap(minHeap);
    return minHeap;
}

// Build Huffman Tree
struct MinHeapNode* buildHuffmanTree(char data[], int freq[], int size) {
    struct MinHeapNode *left, *right, *top;
    struct MinHeap* minHeap = createAndBuildMinHeap(data, freq, size);
    while (!isSizeOne(minHeap)) {
        left = extractMin(minHeap);
        right = extractMin(minHeap);
        top = newNode('$', left->freq + right->freq);
        top->left = left;
        top->right = right;
        insertMinHeap(minHeap, top);
    }
    return extractMin(minHeap);
}

// Print Huffman Codes
void printCodes(struct MinHeapNode* root, int arr[], int top) {
    if (root->left) {
        arr[top] = 0;
        printCodes(root->left, arr, top + 1);
    }
    if (root->right) {
        arr[top] = 1;
        printCodes(root->right, arr, top + 1);
    }
    if (isLeaf(root)) {
        printf("%c: ", root->data);
        printArr(arr, top);
    }
}

// Build Huffman Tree and print codes
void HuffmanCodes(char data[], int freq[], int size) {
    struct MinHeapNode* root = buildHuffmanTree(data, freq, size);
    int arr[100], top = 0;
    printCodes(root, arr, top);
}

// Driver code
int main() {
    char arr[] = {'a', 'b', 'c', 'd', 'e', 'f'};
    int freq[] = {5, 9, 12, 13, 16, 45};
    int size = sizeof(arr) / sizeof(arr[0]);
    HuffmanCodes(arr, freq, size);
    return 0;
}
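With the driver data above, and since the smaller of the two extracted nodes always becomes the left child, the program should print the following codes (matching the trace in Section 2):

f: 0
c: 100
d: 101
a: 1100
b: 1101
e: 111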
3.1.1 Explanation of the C Implementation
- Include Headers: #include <stdio.h>, #include <stdlib.h> for standard input-output and memory allocation.
- Struct Definitions: Define MinHeapNode and MinHeap structures for tree nodes and heap management.
- New Node Creation: Function to create a new tree node.
- MinHeap Creation: Function to create a MinHeap.
- Node Swap: Utility function to swap two nodes in the heap.
- Heapify: Maintain the heap property.
- Check Size One: Utility to check if heap size is one.
- Extract Min: Function to extract the minimum value node from the heap.
- Insert Node: Insert a new node to the heap.
- Build MinHeap: Build a min heap from an unordered array.
- Print Array: Utility to print an array.
- Check Leaf: Check if a node is a leaf.
- Create and Build MinHeap: Create and build a min heap from given data and frequency arrays.
- Build Huffman Tree: Construct the Huffman Tree.
- Print Huffman Codes: Print codes by traversing the tree.
- Generate Huffman Codes: Function to build the tree and print the codes.
- Driver Code: Main function to execute the program.
3.2 Huffman Coding in Java
import java.util.PriorityQueue;
import java.util.Comparator;

// A Huffman tree node
class Node {
    char data;
    int freq;
    Node left, right;

    Node(char data, int freq) {
        left = right = null;
        this.data = data;
        this.freq = freq;
    }
}

class Huffman {
    // Print codes by traversing the tree: 0 for left, 1 for right
    public static void printCodes(Node root, String str) {
        if (root == null)
            return;
        if (root.data != '$')
            System.out.println(root.data + ": " + str);
        printCodes(root.left, str + "0");
        printCodes(root.right, str + "1");
    }

    public static void HuffmanCodes(char data[], int freq[], int size) {
        Node left, right, top;
        // A PriorityQueue ordered by frequency plays the role of the min-heap
        PriorityQueue<Node> minHeap = new PriorityQueue<>(size, new Comparator<Node>() {
            public int compare(Node l, Node r) {
                return l.freq - r.freq;
            }
        });
        for (int i = 0; i < size; ++i)
            minHeap.add(new Node(data[i], freq[i]));
        // Repeatedly merge the two least frequent nodes
        while (minHeap.size() != 1) {
            left = minHeap.poll();
            right = minHeap.poll();
            top = new Node('$', left.freq + right.freq);
            top.left = left;
            top.right = right;
            minHeap.add(top);
        }
        printCodes(minHeap.poll(), "");
    }

    public static void main(String[] args) {
        char arr[] = {'a', 'b', 'c', 'd', 'e', 'f'};
        int freq[] = {5, 9, 12, 13, 16, 45};
        int size = arr.length;
        HuffmanCodes(arr, freq, size);
    }
}
3.3 Huffman Coding in JavaScript
// A Huffman tree node
class Node {
    constructor(data, freq) {
        this.data = data;
        this.freq = freq;
        this.left = this.right = null;
    }
}

// Print codes by traversing the tree: 0 for left, 1 for right
function printCodes(root, str) {
    if (!root) return;
    if (root.data !== '$') console.log(root.data + ": " + str);
    printCodes(root.left, str + "0");
    printCodes(root.right, str + "1");
}

function HuffmanCodes(data, freq, size) {
    // JavaScript has no built-in priority queue, so a sorted array is used
    // as a simple (though less efficient) stand-in for the min-heap
    let minHeap = [];
    for (let i = 0; i < size; ++i) {
        minHeap.push(new Node(data[i], freq[i]));
    }
    minHeap.sort((a, b) => a.freq - b.freq);
    // Repeatedly merge the two least frequent nodes
    while (minHeap.length !== 1) {
        let left = minHeap.shift();
        let right = minHeap.shift();
        let top = new Node('$', left.freq + right.freq);
        top.left = left;
        top.right = right;
        minHeap.push(top);
        minHeap.sort((a, b) => a.freq - b.freq);
    }
    printCodes(minHeap[0], "");
}

let arr = ['a', 'b', 'c', 'd', 'e', 'f'];
let freq = [5, 9, 12, 13, 16, 45];
let size = arr.length;
HuffmanCodes(arr, freq, size);
3.4 Huffman Coding in Python
import heapq

# A Huffman tree node; __lt__ lets heapq order nodes by frequency
class Node:
    def __init__(self, data, freq):
        self.data = data
        self.freq = freq
        self.left = None
        self.right = None

    def __lt__(self, other):
        return self.freq < other.freq

# Print codes by traversing the tree: 0 for left, 1 for right
def printCodes(root, code):
    if root is None:
        return
    if root.data != '$':
        print(f'{root.data}: {code}')
    printCodes(root.left, code + '0')
    printCodes(root.right, code + '1')

def HuffmanCodes(data, freq, size):
    minHeap = []
    for i in range(size):
        heapq.heappush(minHeap, Node(data[i], freq[i]))
    # Repeatedly merge the two least frequent nodes
    while len(minHeap) != 1:
        left = heapq.heappop(minHeap)
        right = heapq.heappop(minHeap)
        top = Node('$', left.freq + right.freq)
        top.left = left
        top.right = right
        heapq.heappush(minHeap, top)
    printCodes(heapq.heappop(minHeap), '')

arr = ['a', 'b', 'c', 'd', 'e', 'f']
freq = [5, 9, 12, 13, 16, 45]
size = len(arr)
HuffmanCodes(arr, freq, size)
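The implementations above only print the code book. To connect back to Section 1, where the code book plus the encoded data are what actually get transmitted, here is a minimal Python sketch of encoding a message. It is not part of the original implementations; the helper names (build_tree, build_code_table) and the sample message are illustrative assumptions.

import heapq

# Illustrative sketch: build a Huffman tree for a message, collect the
# code book into a dictionary, and encode the message with it.
class Node:
    def __init__(self, data, freq):
        self.data, self.freq = data, freq
        self.left = self.right = None

    def __lt__(self, other):
        return self.freq < other.freq

def build_tree(freq_map):
    heap = [Node(sym, f) for sym, f in freq_map.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        left, right = heapq.heappop(heap), heapq.heappop(heap)
        top = Node('$', left.freq + right.freq)
        top.left, top.right = left, right
        heapq.heappush(heap, top)
    return heap[0]

def build_code_table(root, prefix='', table=None):
    # Collect codewords into a dict instead of printing them
    if table is None:
        table = {}
    if root.left is None and root.right is None:
        table[root.data] = prefix or '0'  # handle a one-symbol alphabet
        return table
    build_code_table(root.left, prefix + '0', table)
    build_code_table(root.right, prefix + '1', table)
    return table

message = 'abracadabra'
freq_map = {}
for ch in message:
    freq_map[ch] = freq_map.get(ch, 0) + 1

root = build_tree(freq_map)
code_table = build_code_table(root)                   # the code book
encoded = ''.join(code_table[ch] for ch in message)   # the encoded data
print(code_table)
print(encoded)

Both the dictionary and the bit string would have to be transmitted (or stored) for the receiver to recover the original message.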
4. Advantages of Huffman Coding
- Optimality: Huffman coding produces an optimal prefix code, i.e. one with the minimum possible average code length, for a set of symbols with known probabilities.
- Lossless Compression: Ensures no loss of data.
- Efficiency: Variable-length encoding lets frequently occurring symbols use fewer bits, which reduces the overall size of the compressed data.
5. Disadvantages of Huffman Coding
- Complexity: More complex to implement compared to simpler methods like run-length encoding.
- Dependency on Data: Performance depends on the frequency distribution of the data; skewed distributions compress well, while nearly uniform distributions gain very little.
- Decompression Speed: Decompression can be slower because the decoder has to walk the tree bit by bit (see the sketch after this list).
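To make the last point concrete, here is a minimal Python sketch of that bit-by-bit decoding process. It is illustrative only; the code book is the one produced earlier for the example frequencies a: 5, b: 9, c: 12, d: 13, e: 16, f: 45, and the names decode_map and decode are assumptions.

# Illustrative decoding sketch: read the encoded bit string one bit at a
# time and emit a symbol whenever the accumulated bits match a codeword.
# Because Huffman codes are prefix-free, the first match is unambiguous.
code_book = {'f': '0', 'c': '100', 'd': '101', 'a': '1100', 'b': '1101', 'e': '111'}
decode_map = {code: sym for sym, code in code_book.items()}

def decode(bits):
    result, current = [], ''
    for bit in bits:
        current += bit
        if current in decode_map:
            result.append(decode_map[current])
            current = ''
    return ''.join(result)

encoded = '1100' + '1101' + '100'   # codewords for 'a', 'b', 'c'
print(decode(encoded))              # prints: abc

Matching accumulated prefixes against the code book is equivalent to walking one edge of the tree per bit; either way, the work is proportional to the total number of bits in the compressed stream.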
6. Time Complexity of Huffman Coding
Building the Huffman tree performs n − 1 merge steps, and each step uses a constant number of priority-queue operations costing O(log n) each, so the overall time complexity of Huffman's algorithm is O(n log n) for n symbols.
FAQs
What is Huffman Coding?
Huffman Coding is a widely used encoding technique for lossless data compression. It is based on the frequency of occurrence of each data item (for example, pixel values in images), and its main aim is to encode the items that occur most frequently using fewer bits.
What are the advantages of Huffman Coding?
Optimality: Huffman coding produces an optimal prefix code for a set of symbols with known probabilities.
Lossless Compression: Ensures no loss of data.
Efficiency: Variable-length encoding lets frequently occurring symbols use fewer bits, reducing the overall size of the compressed data.
What are the disadvantages of Huffman Coding?
Complexity: More complex to implement compared to simpler methods like run-length encoding.
Dependency on Data: Performance depends on the frequency distribution of the data.
Decompression Speed: Decompression can be slower because the decoder has to walk the tree bit by bit.