Arrays and hashing, three hours, no hand-holding | Learn

StrongYes tip

Arrays and hashing is not one trick. It is the bucket interviewers reach for when they want to see whether you can trade a brute-force scan for fast lookups, clean state, and a clear explanation.

Across more than 200 candidate screens, arrays and hashing is the category where the research logs the most preventable failures. Not because the problems are hard — they usually are not. Because people start coding before they have picked the right storage shape, and then spend the rest of the screen fighting a structure that was wrong from minute two.

In the current StrongYes interview-reference set, Arrays & Hashing is the single largest tag group with 66 linked questions. That is not a coincidence. These problems show up at the start of screens, in the middle of onsite loops, and inside harder problems that later branch into , , or design work.

LeetCode's own Top Interview Questions collection opens with array and hash map problems for a reason. Interviewers want to see whether you can translate a brute-force idea into efficient code without overcomplicating the structure. The strong answer is rarely the clever one — it is the one that names the trade-off.

The goal here is not to dump fifty problems on you. It is the small set of moves that keep repeating, so you can recognize them early and talk through them calmly.

What the interviewer is actually listening for

Strip away the problem statement and most arrays-and-hashing questions reduce to one of four decisions:

Do I need fast existence checks?
Do I need to remember where I saw something before?
Do I need counts instead of just presence?
Can I turn the input range or frequency counts into a bucketed array?

If you can name that decision out loud, you are already ahead of most candidates. The mistake is starting with code before you know what your storage has to do. The research has logged senior engineers with ten years of production experience reaching for a nested loop on Two Sum because they skipped this step. It is the same failure mode every time.

The pick-your-structure decision tree

Here is the same logic as a flowchart. Walk through it once on the prompt before you start typing. Most wrong answers in this category come from reaching for the wrong storage shape, not from a broken loop.

Diagram

Rendering diagram...

Green nodes are the first-layer plays — between them they cover the majority of Arrays & Hashing questions. Amber is the next layer (counting, fixed-width keys). Purple is the "this looks like hashing but it is really a prefix sum" trap and its top-k bucket cousin — not harder, just a different question wearing hashing clothes.

The five plays that cover most of the category

1. Seen set for duplicate or membership checks

Use a set when the whole question is really "have I seen this before?"

This is the move behind Contains Duplicate, parts of Longest Consecutive Sequence, and many "find the first repeated value" variants.

PYTHON

def contains_duplicate(nums):
    seen = set()
 
    for num in nums:
        if num in seen:
            return True
        seen.add(num)
 
    return False

The interview move here is to say the out loud: "The set stores every number I have seen so far, so each new lookup is O(1) on average." That one sentence separates candidates who understand what they wrote from candidates who memorized the pattern.

2. Value-to-index map for complement problems

Reach for a hash map when you need more than yes/no membership. Usually you need the earlier index, the earlier count, or the earlier prefix sum.

Two Sum is the cleanest example:

PYTHON

def two_sum(nums, target):
    seen = {}  # value -> index
 
    for i, num in enumerate(nums):
        need = target - num
        if need in seen:
            return [seen[need], i]
        seen[num] = i
 
    return []

The talk track is what sells this:

"I want O(1) complement lookups."
"I check before I store, so I do not reuse the same element."
"The map answers a value-to-index question, not just a membership question."

That is a far stronger interview explanation than "I know Two Sum uses a hash map." I have hired engineers based partly on the second sentence and rejected candidates who could only produce the first.

3. Frequency map for counting and grouping

The next step up is when presence is not enough and you need exact counts.

That covers Valid Anagram, Group Anagrams, Top K Frequent Elements, and a lot of string-heavy interview questions.

PYTHON

from collections import defaultdict
 
def group_anagrams(words):
    groups = defaultdict(list)
 
    for word in words:
        count = [0] * 26
        for ch in word:
            count[ord(ch) - ord('a')] += 1
        groups[tuple(count)].append(word)
 
    return list(groups.values())

Sorting each word also works and it is fine as a first cut. A fixed-size count key is better because it shows you understand the structure of the problem, not just the API of the language. If the interviewer pushes on follow-up optimization, you already have the story ready.

4. Bucket by frequency when counts are bounded by n

Sometimes the best answer is "hash first, then array."

Top K Frequent Elements is the classic. Count with a map, then use bucket positions because no value can appear more than n times.

PYTHON

from collections import Counter
 
def top_k_frequent(nums, k):
    freq = Counter(nums)
    buckets = [[] for _ in range(len(nums) + 1)]
 
    for num, count in freq.items():
        buckets[count].append(num)
 
    result = []
    for count in range(len(buckets) - 1, 0, -1):
        for num in buckets[count]:
            result.append(num)
            if len(result) == k:
                return result
 
    return result

The reason this is a strong interview answer is range awareness. You are not reaching for a heap by habit. You are using the shape of the problem. In production this is the same instinct that tells you whether a counting sort beats a general sort — it is the same trade-off wearing different clothes.

The complexity story matters if the interviewer pushes back. Bucket sort on Top K Frequent is O(n) because you make one pass to count and one pass to walk the buckets. The heap version is O(n log k) because every insert and eject costs log k. For small k relative to n the heap is fine in practice, but the side-by-side analysis on Medium is worth reading because it shows the interviewer-facing talking point: "bucket sort is better when the frequency bound is known, heap is better when k is tiny and you want to skip allocating n+1 buckets." Either answer is correct — the win is being able to explain why you picked one.

5. Prefix sum plus hash map for subarray variants

This is the play most people miss. The array looks innocent, but the question is really about remembering an earlier running total.

Prefix sum deserves its own category even though the storage is a hash map — the key is a running total, not a raw value. I have seen this go wrong 50 times and it is always the same thing: someone recognizes the word "subarray," pattern-matches to a sliding window, and spends 20 minutes fighting a window that cannot shrink. The reason sliding window fails here is simple: negative numbers break the window-shrinking invariant, so the window cannot validly replace a prefix-sum pass.

If the prompt says "subarray sum equals k," "longest subarray with X," or "number of ranges matching Y," stop and ask whether a prefix sum plus hash map fits before you brute-force nested loops.

The move itself is short once you see it. Hello Interview's walkthrough frames it cleanly: subarraySum(i..j) == prefix[j] - prefix[i], so the problem reduces to "how many pairs of prefix sums differ by k" — which is Two Sum on a running total. The community-curated LeetCode study guide lists the full problem family that collapses to this pattern.

The detail most candidates miss is the initialization. You seed the map with {0: 1} before the loop starts. AlgoMaster's writeup walks through exactly why: without the seed, any subarray whose prefix sum hits k on its own — meaning it starts at index 0 — gets silently dropped. That one missing line is responsible for maybe half the off-by-one bugs I see in debriefs on this problem.

PYTHON

from collections import defaultdict
 
def subarray_sum(nums, k):
    seen = defaultdict(int)
    seen[0] = 1  # the subarray-starts-at-zero case
    running, count = 0, 0
    for num in nums:
        running += num
        count += seen[running - k]
        seen[running] += 1
    return count

The invariant to say out loud: "seen[x] is the number of prefixes I have already walked whose running sum equals x, so seen[running - k] is the number of valid subarrays ending at the current index." That sentence is the difference between memorizing the pattern and actually understanding it.

The questions that keep repeating

From the current StrongYes reference set, the most repeated Arrays & Hashing questions are:

Top K Frequent Elements — linked to 13 companies
3Sum — 12 companies
Longest Substring Without Repeating Characters — 11 companies
Best Time to Buy and Sell Stock — 11 companies
Insert Delete GetRandom O(1) — 9 companies

That list is useful for two reasons.

First, it tells you the category is broader than "easy hash map warm-ups." Second, it shows the progression interviewers like to walk through:

start with presence or complement lookups
move to counting
blend hashing with another pattern
test whether you can still explain the data structure choice under pressure

The last bullet is the one that actually matters. Most candidates who get the pass had at least one moment where they got stuck on implementation. What separated the strong hires was that they could still narrate what the data structure was doing when the code was broken.

Common mistakes that make these questions harder than they are

These are the failure modes I flag in debriefs over and over.

Sorting when order or index matters

If the problem cares about original indices, sorting is not a free move. I once watched a candidate sort the input on Two Sum and then spend six minutes of a 35-minute screen rebuilding the indices they just destroyed. The solution eventually worked. The decision signal was already lost.

Using the right data structure for the wrong job

A set answers membership. A map answers membership plus attached data. Say which one you need before you code. The number of times I have seen someone declare a map, use it only for in checks, and then realize halfway through that a set would have been cleaner — it is depressingly high.

Forgetting the input constraints

If values are in 1..n, you may be able to use the array itself as storage. If characters are lowercase English letters, a fixed-size count array is cleaner than a general map. The constraints are hints, not decoration. Read them twice.

Explaining the code instead of the invariant

Interviewers care less about "then I increment left" and more about "the window is always valid here" or "the map stores the last seen index." Lead with the state you are maintaining. The code is the easy part — the state is the thing that makes you sound like an engineer who understands what they are building.

The talk track that actually works

Try this structure out loud:

"Brute force would be O(n^2) because I would compare every pair."
"I can trade space for time by storing X in a hash map or set."
"My invariant is Y."
"Each element is processed once, so time is O(n) and space is O(n)."

That structure works on most of this category. It makes you sound deliberate instead of lucky. In the debriefs, "deliberate" is the word that comes up most often about candidates who get the pass.

90-minute practice block

If you only have one serious practice session, do these in order:

Contains Duplicate
Two Sum
Valid Anagram
Group Anagrams
Top K Frequent Elements
Product of Array Except Self
Longest Consecutive Sequence
Insert Delete GetRandom O(1)

The point is not to finish every edge case from memory. The point is to see the same storage decisions enough times that the category starts to feel boring. That is the spaced-repetition approach — repeated exposure to the same pattern family beats grinding novel problems.

When the category feels boring, you are ready.

Practice Arrays & Hashing.

Explain your thinking like you're in the interview.

Try Two Sum free

What the interviewer is actually listening for

The pick-your-structure decision tree

The five plays that cover most of the category

1. Seen set for duplicate or membership checks

2. Value-to-index map for complement problems

3. Frequency map for counting and grouping

4. Bucket by frequency when counts are bounded by n

5. Prefix sum plus hash map for subarray variants

The questions that keep repeating

Common mistakes that make these questions harder than they are

Sorting when order or index matters

Using the right data structure for the wrong job

Forgetting the input constraints

Explaining the code instead of the invariant

The talk track that actually works

90-minute practice block

Practice Arrays & Hashing.

Practice Arrays hashing.