Hashing — Named Patterns¶

Pattern মানে reusable চিন্তা। নিচের প্রত্যেকটা তোমাকে বলবে: কখন এটার দিকে হাত বাড়াবে (trigger words), core idea-টা কী, কোন thinking tweak একটা brute force-কে upgrade করে, একটা ছোট্ট worked example, এটা কী থেকে inherit করে, আর practice-এর problem।

Pattern 1: Frequency counting¶

Trigger words: "how many times", "most frequent", "anagram", "same letters", "can you build X from Y".
Core idea: এক pass-এ value → count বানাও, তারপর raw item নিয়ে না ভেবে count নিয়ে ভাবো।
Inherits from: সাধারণ array counting (ছোট alphabet-এর count array হলো পূর্বপুরুষ; hash map হলো unlimited, sparse index-ওয়ালা একটা count array)।
Thinking tweak: item-গুলোকে একে অপরের সাথে তুলনা করা বন্ধ করো; তাদের count profile তুলনা করো।

Worked example — "listen" আর "silent" কি anagram?

from collections import Counter
Counter("listen") == Counter("silent")    # True

Dry run: দুটোই বানায় {'l':1,'i':1,'s':1,'t':1,'e':1,'n':1}; dict equality profile তুলনা করে, ordering না।

Problems: Valid Anagram https://leetcode.com/problems/valid-anagram/, Top K Frequent Elements https://leetcode.com/problems/top-k-frequent-elements/, Ransom Note https://leetcode.com/problems/ransom-note/.

Pattern 2: Complement lookup — the Two Sum pattern (REQUIRED reading)¶

Trigger words: "find two numbers that...", "pair with sum/difference/product", "does a partner exist".
Core idea: প্রতিটা item x-এর জন্য তোমার দরকারি partner-টা হিসেব করা যায়: target - x। Partner খুঁজে বেড়ানোর বদলে hash map-কে জিজ্ঞেস করো "আমার partner-কে কি আগেই দেখেছি?"
Inherits from: frequency counting + complement thinking (যা আছে সেটা গুনছ/লিখে রাখছ, তারপর query করছ)।
Data structure: hash map (value → index) বা hash set (যখন শুধু existence দরকার)।
Thinking tweak: pair খোঁজার বদলে মনে রাখো এখন পর্যন্ত কী কী দেখেছ। Brute force সামনের দিকে partner খুঁজে scan করে (প্রতি element-এ O(n))। Pattern-টা পেছনের স্মৃতিতে তাকায় (প্রতি element-এ O(1))।

Worked example — Two Sum (https://leetcode.com/problems/two-sum/), task-টা নিজেদের ভাষায়: একটা array আর একটা target দেওয়া আছে, এমন দুটো entry-র index ফেরত দাও যাদের যোগফল target।

def two_sum(nums, target):
    seen = {}                      # value -> index
    for i, x in enumerate(nums):
        need = target - x          # compute the partner
        if need in seen:           # has the past supplied it?
            return [seen[need], i]
        seen[x] = i                # remember myself for the future
    return []

nums=[3, 8, 5, 4], target=9-এর উপর dry run:

i=0 x=3 need=6: 6 in {}?        no  → seen={3:0}
i=1 x=8 need=1: 1 in {3:0}?     no  → seen={3:0, 8:1}
i=2 x=5 need=4: 4 in seen?      no  → seen={3:0, 8:1, 5:2}
i=3 x=4 need=5: 5 in seen?      YES → return [2, 3]

O(n) time, O(n) space, এক pass। একই tweak solve করে "pair with difference k", "দুটো string যারা মিলে palindrome বানায়", আর (দ্বিগুণ করে) 4Sum II।

Problems: Two Sum https://leetcode.com/problems/two-sum/, 4Sum II https://leetcode.com/problems/4sum-ii/, CSES "Sum of Two Values" at https://cses.fi/problemset/ (task name: Sum of Two Values).

Pattern 3: Grouping by canonical key¶

Trigger words: "group the...", "which items are equivalent", "buckets of similar things".
Core idea: এমন একটা function বানাও যেটা সব "equivalent" item-কে একটাই identical key-তে (canonical form) map করে, তারপর defaultdict(list) grouping-টা ফ্রি-তে করে দেয়।
Inherits from: frequency counting (canonical key হলো একটা জমাট-বাঁধা count profile বা normalized form)।
Thinking tweak: item-গুলোকে জোড়ায় জোড়ায় তুলনা কোরো না (O(n^2) comparison) — প্রতিটা item একবার normalize করো আর hash map-কে cluster করতে দাও।

Worked example — group anagrams (task-টা নিজেদের ভাষায়: ঠিক একই অক্ষর ব্যবহার করা word-গুলোকে একসাথে জড়ো করো)। Canonical key = sorted letters।

from collections import defaultdict

def group_anagrams(words):
    groups = defaultdict(list)
    for w in words:
        groups["".join(sorted(w))].append(w)
    return list(groups.values())

group_anagrams(["eat", "tea", "tan", "ate"])
# [['eat', 'tea', 'ate'], ['tan']]

Dry run: "eat"→"aet", "tea"→"aet", "tan"→"ant", "ate"→"aet"। তিনটা word "aet" key share করে, তাই তারা একই bucket-এ পড়ে।

লম্বা word-এর জন্য বিকল্প key: 26টা letter count-এর tuple — প্রতি word-এ O(len log len)-র বদলে O(len)।

Problems: Group Anagrams https://leetcode.com/problems/group-anagrams/, Valid Sudoku https://leetcode.com/problems/valid-sudoku/ (key = "(value, unit-id)" signatures).

Pattern 4: Seen-set for duplicates / de-duplication¶

Trigger words: "contains duplicate", "first repeated", "distinct count", "remove duplicates", "have we visited".
Core idea: set হলো অতীতের একটা নিখুঁত স্মৃতি, "আগে দেখা হয়েছে?" query-র উত্তর দেয় O(1)-এ।
Inherits from: complement lookup, একটা সরলীকরণ সহ — এখানে partner-টা জিনিসটা নিজেই।
Thinking tweak: "এটা কি পরে repeat করবে?" প্রশ্নটা উল্টে যায় "এটা কি আগে এসেছিল?"-তে — অতীতকে query করা যায়, ভবিষ্যৎকে না।

Worked example — প্রথম যে সংখ্যাটা repeat করে:

def first_repeat(nums):
    seen = set()
    for x in nums:
        if x in seen:
            return x
        seen.add(x)
    return None

first_repeat([2, 5, 3, 5, 2])   # 5  (5 is the first to APPEAR twice)

Dry run: 2→add, 5→add, 3→add, 5→already in {2,5,3} → return 5।

এই একই seen-set পরে এই repo-র graph BFS/DFS-এর "visited" set — state-এর de-duplication।

একটা চালাক extension: Longest Consecutive Sequence — সবকিছু একটা set-এ ভরো, আর x-এ run গোনা শুরু করো শুধু তখনই যখন x-1 set-এ নেই (তাহলে প্রতিটা run একবারই গোনা হয়, মোট O(n))।

Problems: Contains Duplicate https://leetcode.com/problems/contains-duplicate/, Intersection of Two Arrays https://leetcode.com/problems/intersection-of-two-arrays/, Longest Consecutive Sequence https://leetcode.com/problems/longest-consecutive-sequence/.

Pattern 5: Prefix sum + hash map¶

Trigger words: "how many subarrays sum to k", "subarray with sum divisible by", "longest subarray with equal...".
Core idea: subarray sum (i..j] = prefix[j] - prefix[i]। prefix[j] - prefix[i] = k চাওয়া মানে: প্রতিটা j-তে map-কে জিজ্ঞেস করো "কতগুলো আগের prefix prefix[j] - k-এর সমান?"
Inherits from: ../01-math-based-programming-fundamentals/05-prefix-difference-contribution/-এর prefix-sum idea PLUS Pattern 2-এর complement-lookup tweak। এটা আক্ষরিক অর্থেই prefix value-র উপর করা Two Sum।
Thinking tweak: একটা range-এর প্রশ্নকে pair-of-prefixes-এর প্রশ্নে বদলে ফেলো, তারপর remember-what-you've-seen।

Worked example — [1, 1, 1]-এ k=2 যোগফলওয়ালা subarray গোনো (Subarray Sum Equals K task-টা, নিজেদের ভাষায়: যে contiguous run-গুলোর মোট k, সেগুলো গোনো)।

def subarray_sum(nums, k):
    count = 0
    prefix = 0
    seen = {0: 1}                  # empty prefix occurs once
    for x in nums:
        prefix += x
        count += seen.get(prefix - k, 0)
        seen[prefix] = seen.get(prefix, 0) + 1
    return count

subarray_sum([1, 1, 1], 2)         # 2

Dry run:

start: prefix=0, seen={0:1}, count=0
x=1: prefix=1, need 1-2=-1 → 0 found; seen={0:1, 1:1}
x=1: prefix=2, need 0 → seen[0]=1 → count=1; seen={0:1,1:1,2:1}
x=1: prefix=3, need 1 → seen[1]=1 → count=2; seen={0:1,1:1,2:1,3:1}
answer: 2   (subarrays [1,1] starting at index 0 and at index 1)

seen = {0: 1} কেন? Empty prefix-টা থাকলে index 0 থেকে শুরু হওয়া subarray-গুলোও গোনা যায়।

Problems: Subarray Sum Equals K https://leetcode.com/problems/subarray-sum-equals-k/, Continuous Subarray Sum https://leetcode.com/problems/continuous-subarray-sum/ (prefix-এর বদলে prefix % k জমা রাখো).

Pattern 6: Hash-based de-duplication of states / answers¶

Trigger words: "distinct results", "unique combinations", "avoid revisiting", "canonical output".
Core idea: একটা process যখন একই জিনিস নানা পথে বানাতে পারে, প্রতিটা result-এর canonical encoding একটা set-এ hash করো; শুধু প্রথম দেখাগুলোই রাখো।
Inherits from: seen-set (Pattern 4) + canonical key (Pattern 3) — দুটো একসাথে কাজ করছে।
Thinking tweak: duplicate বানানো ঠিক আছে যদি duplicate চেনা O(1) হয়; encoding-টা (sorted tuple, frozenset, string form) এমনভাবে design করো যাতে equivalent result-গুলো ইচ্ছে করেই collide করে।

ছোট্ট example — distinct pair sum:

def distinct_pair_sums(nums):
    sums = set()
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            sums.add(nums[i] + nums[j])
    return len(sums)

distinct_pair_sums([1, 2, 3])      # {3, 4, 5} → 3

ঠিক এই চালটাই তুমি ../06-recursion-and-backtracking/-এ ব্যবহার করবে repeated value-ওয়ালা input-এর subset/permutation de-duplicate করতে, আর BFS-এ visited board state encode করতে।

Problems: Encode and Decode TinyURL https://leetcode.com/problems/encode-and-decode-tinyurl/, plus any "count distinct X" warm-up; Codeforces tag "hashing" at https://codeforces.com/problemset/.

Pattern 7 (teaser): Rolling hash¶

Trigger words: "find this substring fast", "compare many substrings", "repeated DNA sequences", "string matching".
Core idea: string-কে কোনো base-এর একটা সংখ্যা হিসেবে দেখো; window এক character sliding করলে hash update হয় O(1)-এ (বেরিয়ে যাওয়া char-এর contribution বিয়োগ, multiply, ঢুকতে থাকা char যোগ)। Character-by-character string তুলনার বদলে hash তুলনা করো।
Inherits from: math fundamentals-এর place-value/positional চিন্তা plus seen-set pattern (দেখা window hash জমা রাখো)।
Thinking tweak: length-m string-এর equality test O(m) থেকে নেমে আসে O(1)-এ — দাম হিসেবে বিরল false collision, যেটা verification বা double hashing দিয়ে সামলানো হয়।

এটা শুধুই একটা teaser — mod arithmetic আর anti-collision trick সহ পুরো treatment আছে https://cp-algorithms.com/string/string-hashing.html-এ। Competitive note: base/mod randomize করো, কারণ fixed parameter-গুলোকে Codeforces-এর anti-hash test case attack করে।

Pattern-গুলো কীভাবে একে অপরের সাথে জোড়া লাগে¶

count array (math basics)
      │
      ▼
frequency counting ──► grouping by canonical key
      │
      ▼
complement lookup (Two Sum) ──► prefix sum + hash map
      │
      ▼
seen-set ──► de-duplication of states ──► visited sets in graphs/backtracking
      │
      ▼
rolling hash (strings, CP)

Pattern 2-এর tweak-টা আয়ত্ত করো — search না করে remember করো — বাকিগুলো তারই variation।

পরেরটা: implementation.py run করে পড়ো, তারপর problems/README.md খোলো।