05 — Hashing (Hash Maps, Hash Sets, and Friends)¶

Interview coding-এ সবচেয়ে বেশি ব্যবহৃত trick হলো hashing। Array যদি তোমার হাত হয়, তাহলে hashing হলো তোমার স্মৃতি।

এই data structure-টা কী?¶

Hash table key → value pair জমা রাখে আর তিনটা প্রশ্নের উত্তর প্রায় সাথে সাথে দিয়ে দেয়:

"এই key-টা কি আমার কাছে আছে?"
"এই key-র জন্য কোন value রেখেছিলাম?"
"এই key-টা মুছে দাও।"

Python-এ hashing-এর সাথে তোমার দেখা হয় dict, set, collections.Counter, আর collections.defaultdict হিসেবে। ভেতরে ভেতরে সবগুলো একই idea ব্যবহার করে: key-কে একটা সংখ্যায় (the hash) বদলে ফেলো, আর সেই সংখ্যা দিয়ে array-র ঠিক slot-টায় সরাসরি লাফ দাও।

কেন এটা দরকার (কোন কষ্টটা দূর করে)¶

ভাবো একটা সাধারণ list-এ একটা নাম খুঁজছ:

Unsorted list: প্রতিটা item চেক করো → প্রতি lookup-এ O(n)।
Sorted list: binary search → O(log n), কিন্তু insert করলে sorted রাখতে হয়, যেটা slow।

Hashing বলে: খোঁজা বন্ধ করো। Item-টা থেকেই হিসেব করে ফেলো জিনিসটা কোথায় থাকতে বাধ্য, আর সরাসরি সেখানে চলে যাও। Average lookup, insert, আর delete — সবই হয়ে যায় O(1)। এতেই "loop-এর ভেতরে loop" solution-গুলো single-pass solution-এ বদলে যায়।

বাস্তব software-এ কোথায় ব্যবহার হয়¶

Python-এর নিজের object system: প্রতিটা attribute access (obj.name) একটা dict-এর ভেতর দিয়ে যায়।
Database index আর in-memory cache (Redis আসলে একটা বিশাল hash table)।
De-duplication: "এই user / URL / file checksum কি আগে দেখেছি?"
জিনিস গোনা: word frequency, log analysis, vote tally।
Router আর compiler: symbol table, MAC address table।

Prerequisites¶

Array আর basic loop (../02-arrays-and-strings/ যদি তোমার repo-তে থাকে, না হলে যেকোনো array basics)।
Big-O intuition: O(1), O(n), O(n^2) কেমন লাগে সেই অনুভূতি।
Python basics: function, for loop, if।

শেখার আগে কী revise করবে¶

Modulo operator % — hashing এটা ব্যবহার করে বড় সংখ্যাকে ছোট index-এ মুড়ে ফেলতে।
Python list-এ index কীভাবে কাজ করে (a[i] হলো O(1))।
Tuple বনাম list (tuple dict-এর key হতে পারে, list পারে না — কেন, সেটা এখানে শিখবে)।

Learning goals¶

[ ] এক বাক্যে বলতে পারা dict lookup average-এ কেন O(1)।
[ ] স্মৃতি থেকে buckets-and-chaining ছবিটা আঁকতে পারা।
[ ] Collision কী আর সেটা সামলানোর একটা উপায় বলতে পারা।
[ ] কখন dict, set, Counter, আর defaultdict — প্রত্যেকটা কোথায় জ্বলে ওঠে, সেটা জানা।
[ ] এক pass-এ Two Sum solve করা আর "যা দেখেছ মনে রাখো" tweak-টা ব্যাখ্যা করা।
[ ] Canonical key দিয়ে anagram group করা।
[ ] Prefix sum-এর সাথে hash map জোড়া লাগানো (subarray-sum pattern)।
[ ] একদম গোড়া থেকে একটা ছোট্ট hash map বানানো (দেখো implementation.py)।

Problem categories¶

Category	One-line idea
Frequency counting	Map দিয়ে occurrence গোনো, তারপর count নিয়ে ভাবো
Complement lookup	যা দেখেছ জমা রাখো; জিজ্ঞেস করো "আমার missing piece কি আছে?"
Grouping by canonical key	"Equivalent" item-গুলোকে একই key-তে map করো (sorted string, signature)
Seen-set / de-duplication	Set "আগে দেখা হয়েছে?" প্রশ্নের উত্তর দেয় O(1)-এ
Prefix sum + hash map	আগের prefix value মনে রেখে subarray গোনো
Hash as index accelerator	O(n) membership scan-কে O(1) লাফে বদলে দাও

Practice list¶

প্রতিটা problem তার pattern দিয়ে tag করা আছে। প্রতিটা band-এর ভেতরে ক্রমানুসারে solve করো।

Easy¶

Problem	Source	Pattern
Two Sum	https://leetcode.com/problems/two-sum/	Complement lookup
Contains Duplicate	https://leetcode.com/problems/contains-duplicate/	Seen-set
Valid Anagram	https://leetcode.com/problems/valid-anagram/	Frequency counting
First Unique Character in a String	https://leetcode.com/problems/first-unique-character-in-a-string/	Frequency counting
Ransom Note	https://leetcode.com/problems/ransom-note/	Frequency counting
Intersection of Two Arrays	https://leetcode.com/problems/intersection-of-two-arrays/	Seen-set
Missing Number	https://leetcode.com/problems/missing-number/	Seen-set (একটা math trick-ও আছে)

Medium¶

Problem	Source	Pattern
Group Anagrams	https://leetcode.com/problems/group-anagrams/	Grouping by canonical key
Top K Frequent Elements	https://leetcode.com/problems/top-k-frequent-elements/	Frequency counting
Longest Consecutive Sequence	https://leetcode.com/problems/longest-consecutive-sequence/	Seen-set + smart start
Subarray Sum Equals K	https://leetcode.com/problems/subarray-sum-equals-k/	Prefix sum + hash map
Valid Sudoku	https://leetcode.com/problems/valid-sudoku/	প্রতিটা row/col/box-এ seen-set
Encode and Decode TinyURL	https://leetcode.com/problems/encode-and-decode-tinyurl/	Hash as index accelerator
4Sum II	https://leetcode.com/problems/4sum-ii/	Complement lookup (pair sums)
Continuous Subarray Sum	https://leetcode.com/problems/continuous-subarray-sum/	Prefix sum + hash map (mod trick)

Hard¶

Problem	Source	Pattern
Minimum Window Substring	https://leetcode.com/problems/minimum-window-substring/	Frequency counting + sliding window
Substring with Concatenation of All Words	https://leetcode.com/problems/substring-with-concatenation-of-all-words/	Frequency counting + windows
Sum Symmetric / pair-count style tasks on Codeforces	https://codeforces.com/problemset/ ("hashing" tag দিয়ে filter করো)	CP scale-এ complement lookup
CSES "Sum of Two Values"	https://cses.fi/problemset/ (task name: Sum of Two Values)	Complement lookup

Visualization ideas¶

8টা bucket-কে বাক্স হিসেবে আঁকো; hash(key) % 8 দিয়ে 5টা key ছুড়ে দাও; দেখো একটা বাক্সে দুটো key পড়ে (collision) আর সেখানে একটা chain গজায়।
Two Sum-এর animation বানাও: একটা pointer array ধরে হাঁটে আর পাশে একটা "memory cloud" (the dict) ভরতে থাকে।
Load factor plot করো: table ~0.7 পেরোলে chain লম্বা হতে থাকে; একটা resize আবার সেগুলো খালি করে দেয়।
Interactive version-এর জন্য https://visualgo.net/en → Hash Table দেখো।

Common mistakes (সাধারণ ভুল)¶

যেখানে set চলত সেখানে list ব্যবহার → loop-এর ভেতরে accidental O(n) membership check → O(n^2)।
Dict-এর উপর iterate করতে করতে সেটাকে mutate করা (RuntimeError)।
List-কে dict-এর key বানানোর চেষ্টা (list unhashable; tuple-এ convert করো)।
ভুলে যাওয়া যে worst-case-এ dict operation O(n); adversarial input-ও আছে (interview-তে বিরল, CP-তে বাস্তব)।
d[k] (key না থাকলে KeyError) আর d.get(k) (None ফেরত দেয়) গুলিয়ে ফেলা।
Float-কে key বানানো: 0.1 + 0.2 != 0.3, তাই float key চুপচাপ miss করে।

Interview problem-এর সাথে সংযোগ¶

Hashing হলো default "make it faster" চাল। বেশিরভাগ O(n^2) brute-force interview উত্তর dict বা set-এ জিনিস মনে রেখে O(n)-এ নেমে আসে। Two Sum, Group Anagrams, আর Subarray Sum Equals K — এই তিনটাই canonical demonstration; প্রায় প্রতিটা big-tech interview loop-এ অন্তত একটা hashing pattern ছুঁয়ে যায়।

Competitive programming-এর সাথে সংযোগ¶

Codeforces-এর প্রায় প্রতিটা Div 2 A/B counting problem-এ frequency map হাজির থাকে।
Prefix sum + hash map subarray গোনে O(n^2)-র বদলে O(n)-এ।
Rolling hash (patterns.md-এ teaser আছে) fast string matching চালায়; দেখো https://cp-algorithms.com/string/string-hashing.html।
Codeforces-এর anti-hash test থেকে সাবধান: competitively string-hashing করলে তোমার hash base randomize করো।

Recommended learning order¶

concept.md — analogy, buckets-এর ছবি, O(1) কেন কাজ করে।
visual-explanation.md — frame-by-frame: key → hash → bucket, সাথে একটা collision।
operations.md — প্রতিটা dict/set operation, তার cost আর pitfall সহ।
patterns.md — ছয়টা named pattern, worked example সহ (Two Sum এখানেই থাকে)।
implementation.py — run করো; chaining + resize সহ from-scratch HashMap-টা পড়ো।
problems/README.md — practice tracker শুরু করো, easy থেকে hard।

Source map¶

এই folder-এর সব source, link, আর copying status: দেখো source-map.md।