Skip to main content
GuideArrays hashingInterview questionsHash map

Arrays and hashing, three hours, no hand-holding

A practical arrays and hashing guide for coding interviews: when to use a set, when to use a hash map, which questions keep repeating, and the mistakes that cost people easy wins.

Fin·Apr 8, 2026·11 min read
StrongYes tip

Arrays and hashing is not one trick. It is the bucket interviewers reach for when they want to see whether you can trade a brute-force scan for fast lookups, clean state, and a clear explanation.

Across more than 200 candidate screens, arrays and hashing is the category where the research logs the most preventable failures. Not because the problems are hard — they usually are not. Because people start coding before they have picked the right storage shape, and then spend the rest of the screen fighting a structure that was wrong from minute two.

In the current StrongYes interview-reference set, Arrays & Hashing is the single largest tag group with 66 linked questions. That is not a coincidence. These problems show up at the start of screens, in the middle of onsite loops, and inside harder problems that later branch into , , or design work.

LeetCode's own Top Interview Questions collection opens with array and hash map problems for a reason. Interviewers want to see whether you can translate a brute-force idea into efficient code without overcomplicating the structure. The strong answer is rarely the clever one — it is the one that names the trade-off.

The goal here is not to dump fifty problems on you. It is the small set of moves that keep repeating, so you can recognize them early and talk through them calmly.

What the interviewer is actually listening for

Strip away the problem statement and most arrays-and-hashing questions reduce to one of four decisions:

  1. Do I need fast existence checks?
  2. Do I need to remember where I saw something before?
  3. Do I need counts instead of just presence?
  4. Can I turn the input range or frequency counts into a bucketed array?

If you can name that decision out loud, you are already ahead of most candidates. The mistake is starting with code before you know what your storage has to do. The research has logged senior engineers with ten years of production experience reaching for a nested loop on Two Sum because they skipped this step. It is the same failure mode every time.

The pick-your-structure decision tree

Here is the same logic as a flowchart. Walk through it once on the prompt before you start typing. Most wrong answers in this category come from reaching for the wrong storage shape, not from a broken loop.

Diagram
Rendering diagram...

Green nodes are the first-layer plays — between them they cover the majority of Arrays & Hashing questions. Amber is the next layer (counting, fixed-width keys). Purple is the "this looks like hashing but it is really a prefix sum" trap and its top-k bucket cousin — not harder, just a different question wearing hashing clothes.

The five plays that cover most of the category

1. Seen set for duplicate or membership checks

Use a set when the whole question is really "have I seen this before?"

This is the move behind Contains Duplicate, parts of Longest Consecutive Sequence, and many "find the first repeated value" variants.

PYTHON
def contains_duplicate(nums): seen = set() for num in nums: if num in seen: return True seen.add(num) return False

The interview move here is to say the out loud: "The set stores every number I have seen so far, so each new lookup is O(1) on average." That one sentence separates candidates who understand what they wrote from candidates who memorized the pattern.

2. Value-to-index map for complement problems

Reach for a hash map when you need more than yes/no membership. Usually you need the earlier index, the earlier count, or the earlier prefix sum.

Two Sum is the cleanest example:

PYTHON
def two_sum(nums, target): seen = {} # value -> index for i, num in enumerate(nums): need = target - num if need in seen: return [seen[need], i] seen[num] = i return []

The talk track is what sells this:

  • "I want O(1) complement lookups."
  • "I check before I store, so I do not reuse the same element."
  • "The map answers a value-to-index question, not just a membership question."

That is a far stronger interview explanation than "I know Two Sum uses a hash map." I have hired engineers based partly on the second sentence and rejected candidates who could only produce the first.

3. Frequency map for counting and grouping

The next step up is when presence is not enough and you need exact counts.

That covers Valid Anagram, Group Anagrams, Top K Frequent Elements, and a lot of string-heavy interview questions.

PYTHON
from collections import defaultdict def group_anagrams(words): groups = defaultdict(list) for word in words: count = [0] * 26 for ch in word: count[ord(ch) - ord('a')] += 1 groups[tuple(count)].append(word) return list(groups.values())

Sorting each word also works and it is fine as a first cut. A fixed-size count key is better because it shows you understand the structure of the problem, not just the API of the language. If the interviewer pushes on follow-up optimization, you already have the story ready.

4. Bucket by frequency when counts are bounded by n

Sometimes the best answer is "hash first, then array."

Top K Frequent Elements is the classic. Count with a map, then use bucket positions because no value can appear more than n times.

PYTHON
from collections import Counter def top_k_frequent(nums, k): freq = Counter(nums) buckets = [[] for _ in range(len(nums) + 1)] for num, count in freq.items(): buckets[count].append(num) result = [] for count in range(len(buckets) - 1, 0, -1): for num in buckets[count]: result.append(num) if len(result) == k: return result return result

The reason this is a strong interview answer is range awareness. You are not reaching for a heap by habit. You are using the shape of the problem. In production this is the same instinct that tells you whether a counting sort beats a general sort — it is the same trade-off wearing different clothes.

The complexity story matters if the interviewer pushes back. Bucket sort on Top K Frequent is O(n) because you make one pass to count and one pass to walk the buckets. The heap version is O(n log k) because every insert and eject costs log k. For small k relative to n the heap is fine in practice, but the side-by-side analysis on Medium is worth reading because it shows the interviewer-facing talking point: "bucket sort is better when the frequency bound is known, heap is better when k is tiny and you want to skip allocating n+1 buckets." Either answer is correct — the win is being able to explain why you picked one.

5. Prefix sum plus hash map for subarray variants

This is the play most people miss. The array looks innocent, but the question is really about remembering an earlier running total.

Prefix sum deserves its own category even though the storage is a hash map — the key is a running total, not a raw value. I have seen this go wrong 50 times and it is always the same thing: someone recognizes the word "subarray," pattern-matches to a sliding window, and spends 20 minutes fighting a window that cannot shrink. The reason sliding window fails here is simple: negative numbers break the window-shrinking invariant, so the window cannot validly replace a prefix-sum pass.

If the prompt says "subarray sum equals k," "longest subarray with X," or "number of ranges matching Y," stop and ask whether a prefix sum plus hash map fits before you brute-force nested loops.

The move itself is short once you see it. Hello Interview's walkthrough frames it cleanly: subarraySum(i..j) == prefix[j] - prefix[i], so the problem reduces to "how many pairs of prefix sums differ by k" — which is Two Sum on a running total. The community-curated LeetCode study guide lists the full problem family that collapses to this pattern.

The detail most candidates miss is the initialization. You seed the map with {0: 1} before the loop starts. AlgoMaster's writeup walks through exactly why: without the seed, any subarray whose prefix sum hits k on its own — meaning it starts at index 0 — gets silently dropped. That one missing line is responsible for maybe half the off-by-one bugs I see in debriefs on this problem.

PYTHON
from collections import defaultdict def subarray_sum(nums, k): seen = defaultdict(int) seen[0] = 1 # the subarray-starts-at-zero case running, count = 0, 0 for num in nums: running += num count += seen[running - k] seen[running] += 1 return count

The invariant to say out loud: "seen[x] is the number of prefixes I have already walked whose running sum equals x, so seen[running - k] is the number of valid subarrays ending at the current index." That sentence is the difference between memorizing the pattern and actually understanding it.

The questions that keep repeating

From the current StrongYes reference set, the most repeated Arrays & Hashing questions are:

  • Top K Frequent Elements — linked to 13 companies
  • 3Sum12 companies
  • Longest Substring Without Repeating Characters11 companies
  • Best Time to Buy and Sell Stock11 companies
  • Insert Delete GetRandom O(1)9 companies

That list is useful for two reasons.

First, it tells you the category is broader than "easy hash map warm-ups." Second, it shows the progression interviewers like to walk through:

  • start with presence or complement lookups
  • move to counting
  • blend hashing with another pattern
  • test whether you can still explain the data structure choice under pressure

The last bullet is the one that actually matters. Most candidates who get the pass had at least one moment where they got stuck on implementation. What separated the strong hires was that they could still narrate what the data structure was doing when the code was broken.

Common mistakes that make these questions harder than they are

These are the failure modes I flag in debriefs over and over.

Sorting when order or index matters

If the problem cares about original indices, sorting is not a free move. I once watched a candidate sort the input on Two Sum and then spend six minutes of a 35-minute screen rebuilding the indices they just destroyed. The solution eventually worked. The decision signal was already lost.

Using the right data structure for the wrong job

A set answers membership. A map answers membership plus attached data. Say which one you need before you code. The number of times I have seen someone declare a map, use it only for in checks, and then realize halfway through that a set would have been cleaner — it is depressingly high.

Forgetting the input constraints

If values are in 1..n, you may be able to use the array itself as storage. If characters are lowercase English letters, a fixed-size count array is cleaner than a general map. The constraints are hints, not decoration. Read them twice.

Explaining the code instead of the invariant

Interviewers care less about "then I increment left" and more about "the window is always valid here" or "the map stores the last seen index." Lead with the state you are maintaining. The code is the easy part — the state is the thing that makes you sound like an engineer who understands what they are building.

The talk track that actually works

Try this structure out loud:

  1. "Brute force would be O(n^2) because I would compare every pair."
  2. "I can trade space for time by storing X in a hash map or set."
  3. "My invariant is Y."
  4. "Each element is processed once, so time is O(n) and space is O(n)."

That structure works on most of this category. It makes you sound deliberate instead of lucky. In the debriefs, "deliberate" is the word that comes up most often about candidates who get the pass.

90-minute practice block

If you only have one serious practice session, do these in order:

  1. Contains Duplicate
  2. Two Sum
  3. Valid Anagram
  4. Group Anagrams
  5. Top K Frequent Elements
  6. Product of Array Except Self
  7. Longest Consecutive Sequence
  8. Insert Delete GetRandom O(1)

The point is not to finish every edge case from memory. The point is to see the same storage decisions enough times that the category starts to feel boring. That is the spaced-repetition approach — repeated exposure to the same pattern family beats grinding novel problems.

When the category feels boring, you are ready.

Practice Arrays & Hashing.

Explain your thinking like you're in the interview.

Practice with Fin or Coco
Source note

Fin and Coco are StrongYes editorial personas from the Council of Ternary Vertices — a trinary-star animal civilization that studies Earth's coding-interview process. Anecdotes map animal-universe experience to human interview mechanics; they are NEVER human-career claims. External citations link to public primary sources.

Fin-voice guide grounded in LeetCode Top Interview Questions, Hello Interview's prefix-sum walkthroughs, AlgoMaster's subarray-sum-equals-k writeup, the community-curated LeetCode Discuss master study guide, and tech-interview-handbook (MIT).

Last verified Apr 19, 2026.

Practice Arrays hashing.

Reading builds recognition. Explaining builds recall. Run these problems with Fin or Coco.