System Design #6: Design a Typeahead / Autocomplete Widget

Hey folks, Rahul here 👋

You know that search box on Google, Amazon, or YouTube where suggestions magically appear as you type? That's a typeahead (or autocomplete) widget — and it's deceptively hard to build well.

I've seen candidates nail the basic debounce-and-fetch approach, then completely fall apart when asked about "What happens when the user types faster than the network?" or "How do you handle 10 billion possible suggestions?" Let's make sure that doesn't happen to you.

R — Requirements

Functional Requirements

Display suggestions as the user types in a search input
Highlight matched portions of each suggestion
Support keyboard navigation (↑/↓/Enter/Escape)
Show recent searches for authenticated users
Support multi-section results (products, categories, articles)
Handle selection → navigate to result or fill input

Non-Functional Requirements

Latency: Suggestions must appear within 100ms of the last keystroke (perceived)
Network efficiency: Minimize redundant API calls
Accessibility: Full ARIA combobox pattern with screen reader support
Scalability: Handle suggestion pools of billions of entries
Resilience: Graceful degradation on network failures

A — Architecture

Let me walk you through the component architecture. There are three approaches candidates typically consider:

Approach 1: Naive Fetch-on-Keystroke

Fire an API call on every keystroke. Simple, but generates ~10 requests for "javascript" — most of which are wasted. ❌ Don't do this.

Approach 2: Debounced Fetch

Classic approach: wait N ms after the last keystroke, then fetch. Reduces calls dramatically but introduces perceived latency — the user finishes typing and waits 200-300ms before seeing anything.

Approach 3: Debounce + Request Deduplication + Client Cache ✅

This is the production pattern: combine debouncing with an LRU cache and request deduplication (in-flight tracking). The cache makes repeated prefixes instant, and deduplication ensures a query that is already on the wire is never sent a second time.
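Here's a minimal TypeScript sketch of the cache + in-flight deduplication layer. `fetchSuggestions` is a hypothetical function you'd supply (for example, a thin wrapper around `fetch`). A plain `Map` stands in for the cache, so bound it in real code.

```typescript
// Sketch: cache + in-flight request deduplication for suggestion fetches.
// `fetchSuggestions` is a hypothetical, caller-supplied async function.
type Suggestion = { id: string; label: string };
type Fetcher = (query: string) => Promise<Suggestion[]>;

function createDedupedFetcher(fetchSuggestions: Fetcher) {
  const cache = new Map<string, Suggestion[]>();             // resolved results by query (unbounded here)
  const inFlight = new Map<string, Promise<Suggestion[]>>(); // requests still on the wire

  return function getSuggestions(rawQuery: string): Promise<Suggestion[]> {
    // Normalize so "React " and "react" share one cache entry and one request.
    const query = rawQuery.trim().toLowerCase();

    const cached = cache.get(query);
    if (cached) return Promise.resolve(cached); // cache hit: no network call at all

    const pending = inFlight.get(query);
    if (pending) return pending; // identical request already running: share its promise

    const request = fetchSuggestions(query).then(
      (results) => {
        inFlight.delete(query);
        cache.set(query, results);
        return results;
      },
      (error: unknown) => {
        inFlight.delete(query); // failures are not cached, so the next attempt retries
        throw error;
      },
    );
    inFlight.set(query, request);
    return request;
  };
}
```

Two calls for the same normalized prefix share one promise if the second arrives before the first response lands, so only one request goes out.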

Why 150ms debounce?

Research shows the average inter-keystroke interval for proficient typists is ~100-150ms. With a 150ms debounce, the timer keeps resetting during a fast typing burst and fires once the user pauses, which keeps perceived latency low. Google actually uses ~100ms, which they can afford with their edge infrastructure.
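A minimal debounce helper using the 150ms figure from above. The `cancel` and `flush` methods are assumed conveniences and do not come from any particular library. `flush` fires the pending call immediately, which is handy when the user presses Enter before the timer expires.

```typescript
// Sketch of a debounce helper; DEBOUNCE_MS mirrors the 150ms discussed above.
const DEBOUNCE_MS = 150;

function debounce<A extends unknown[]>(fn: (...args: A) => void, waitMs: number) {
  let timer: ReturnType<typeof setTimeout> | undefined;
  let pendingArgs: A | undefined;

  const run = (): void => {
    timer = undefined;
    if (pendingArgs === undefined) return;
    const args = pendingArgs;
    pendingArgs = undefined;
    fn(...args);
  };

  // Each call restarts the quiet-period timer and remembers only the latest arguments.
  const debounced = (...args: A): void => {
    pendingArgs = args;
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(run, waitMs);
  };

  // Drop the pending call (e.g. when the input is cleared or the widget unmounts).
  debounced.cancel = (): void => {
    if (timer !== undefined) clearTimeout(timer);
    timer = undefined;
    pendingArgs = undefined;
  };

  // Fire the pending call right now (e.g. when the user presses Enter).
  debounced.flush = (): void => {
    if (timer !== undefined) clearTimeout(timer);
    run();
  };

  return debounced;
}
```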

Component Tree

D — Data Model

Client-Side State
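Here is one possible shape for the client-side state, with a small reducer that shows how the fields change. The field and action names are illustrative assumptions and do not come from a specific library.

```typescript
// Hypothetical state shape and reducer for the typeahead widget.
type SectionType = "product" | "category" | "article";

interface SuggestionItem {
  id: string;
  label: string;
  section: SectionType; // which result section it renders under
}

interface TypeaheadState {
  query: string;                 // current input value
  suggestions: SuggestionItem[]; // flattened across sections, in display order
  activeIndex: number;           // highlighted suggestion; -1 = none
  isOpen: boolean;               // listbox visibility
  status: "idle" | "loading" | "success" | "error";
  recentSearches: string[];      // filled only for authenticated users
}

type TypeaheadAction =
  | { type: "input"; query: string }
  | { type: "results"; query: string; suggestions: SuggestionItem[] }
  | { type: "error" }
  | { type: "close" };

const initialState: TypeaheadState = {
  query: "",
  suggestions: [],
  activeIndex: -1,
  isOpen: false,
  status: "idle",
  recentSearches: [],
};

function typeaheadReducer(state: TypeaheadState, action: TypeaheadAction): TypeaheadState {
  switch (action.type) {
    case "input":
      // New text resets the highlight but keeps old suggestions on screen until fresh ones arrive.
      return { ...state, query: action.query, activeIndex: -1, isOpen: action.query.length > 0, status: "loading" };
    case "results":
      // Ignore responses for a query the user has already typed past.
      if (action.query !== state.query) return state;
      return { ...state, suggestions: action.suggestions, status: "success" };
    case "error":
      return { ...state, status: "error" }; // keep the last good suggestions visible
    case "close":
      return { ...state, isOpen: false, activeIndex: -1 };
  }
}
```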

LRU Cache Design

Why LRU and not just a plain object? Memory. If a user explores many queries, an unbounded cache grows forever. LRU with 100 entries keeps memory ~50KB while caching the most relevant prefixes.
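A minimal LRU sketch. It relies on the fact that a JavaScript `Map` iterates keys in insertion order. Re-inserting a key on every read therefore keeps the least recently used key at the front, where eviction can find it in O(1).

```typescript
// Sketch: LRU cache on top of Map's insertion order.
class LRUCache<K, V> {
  private readonly map = new Map<K, V>();
  private readonly capacity: number;

  constructor(capacity: number) {
    if (capacity <= 0) throw new RangeError("capacity must be positive");
    this.capacity = capacity;
  }

  get(key: K): V | undefined {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key) as V;
    // Re-insert so this key becomes the most recently used (last in iteration order).
    this.map.delete(key);
    this.map.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.capacity) {
      // The first key in iteration order is the least recently used one.
      const oldest = this.map.keys().next().value as K;
      this.map.delete(oldest);
    }
  }

  get size(): number {
    return this.map.size;
  }
}
```

With the numbers from this section, `new LRUCache<string, string[]>(100)` gives a prefix cache that never holds more than 100 entries.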

I — Interface Definition

API Contract
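This is a hypothetical request/response contract. The `/api/suggest` path and the `q` and `limit` parameter names are assumptions. The echoed `query` field and the multi-section `sections` array follow the requirements listed earlier.

```typescript
// Hypothetical contract: path and parameter names are assumptions.
type SectionType = "product" | "category" | "article";

interface SuggestRequest {
  q: string;      // raw prefix from the input
  limit?: number; // max items per section
}

interface SuggestResponse {
  query: string; // echo of `q`, so the client can drop stale responses
  sections: Array<{
    type: SectionType;
    items: Array<{ id: string; label: string; url: string }>;
  }>;
}

function buildSuggestUrl(req: SuggestRequest, base: string = "/api/suggest"): string {
  // encodeURIComponent keeps "&", "#", "+" and emoji from corrupting the query string.
  let url = `${base}?q=${encodeURIComponent(req.q)}`;
  if (req.limit !== undefined) url += `&limit=${req.limit}`;
  return url;
}

// Narrow an untyped JSON body to SuggestResponse before trusting it.
function isSuggestResponse(body: unknown): body is SuggestResponse {
  if (typeof body !== "object" || body === null) return false;
  const b = body as { query?: unknown; sections?: unknown };
  return typeof b.query === "string" && Array.isArray(b.sections);
}
```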

The Stale Response Problem

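This is a classic gotcha. The user types "rea" → a request fires → they keep typing to "react" → another request fires. If the "rea" response arrives after the "react" response, you'd show the wrong suggestions. Solution: tag each request with an incrementing ID captured in a closure, and discard any response whose ID is no longer the latest.

Alternatively, have the response echo back the query field, so you can compare it against the current input without closure tricks.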
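Here's a sketch of the closure-based guard. It assumes `fetchFn` and `onResults` are callbacks that the widget supplies; both names are hypothetical. Each call captures its own request ID, and the response is applied only if no newer request has started in the meantime.

```typescript
// Sketch: "latest request wins" guard for out-of-order responses.
type Results = string[];

function createLatestOnly(
  fetchFn: (query: string) => Promise<Results>,
  onResults: (query: string, results: Results) => void,
) {
  let latestId = 0;

  return async function search(query: string): Promise<void> {
    const id = ++latestId; // captured by this call's closure
    const results = await fetchFn(query);
    if (id !== latestId) return; // a newer keystroke superseded us: drop the stale response
    onResults(query, results);
  };
}
```

With the echoed-query variant, the check becomes `response.query === currentInput` instead of an ID comparison.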

ARIA Combobox Pattern

The aria-activedescendant pattern is crucial — it tells screen readers which suggestion is "focused" without actually moving DOM focus out of the input. This lets the user keep typing while navigating suggestions.
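Here's a framework-agnostic sketch of the combobox wiring. It consists of functions that return the ARIA attributes for the input, listbox, and options, plus arrow-key handling that moves `aria-activedescendant` while DOM focus stays in the input. The option ID scheme and the wrap-around behavior are assumptions (wrapping is optional in the WAI-ARIA combobox pattern). The roles and attributes themselves follow that pattern.

```typescript
// Sketch: ARIA combobox attribute helpers. The option id scheme is an assumption.
type AttrMap = Record<string, string>;

function optionId(listboxId: string, index: number): string {
  return `${listboxId}-option-${index}`;
}

// Attributes for the <input>; DOM focus never leaves it.
function inputAttrs(listboxId: string, isOpen: boolean, activeIndex: number): AttrMap {
  const attrs: AttrMap = {
    role: "combobox",
    "aria-autocomplete": "list",
    "aria-expanded": String(isOpen),
    "aria-controls": listboxId,
  };
  // Tell screen readers which option is "focused" without moving focus.
  if (isOpen && activeIndex >= 0) attrs["aria-activedescendant"] = optionId(listboxId, activeIndex);
  return attrs;
}

function listboxAttrs(listboxId: string): AttrMap {
  return { role: "listbox", id: listboxId };
}

function optionAttrs(listboxId: string, index: number, activeIndex: number): AttrMap {
  return { role: "option", id: optionId(listboxId, index), "aria-selected": String(index === activeIndex) };
}

// Next highlighted index for a key press; -1 means "nothing highlighted".
function nextActiveIndex(key: string, activeIndex: number, count: number): number {
  if (count === 0) return -1;
  switch (key) {
    case "ArrowDown":
      return activeIndex >= count - 1 ? 0 : activeIndex + 1; // wrap to first
    case "ArrowUp":
      return activeIndex <= 0 ? count - 1 : activeIndex - 1; // wrap to last
    case "Escape":
      return -1;
    default:
      return activeIndex; // Enter and typing don't move the highlight
  }
}
```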

O — Optimizations

1. Prefix-Based Cache Warming

Here's a trick Google uses: if you have cached results for "reac", you can filter them client-side and show them as provisional results for "react" while the network request is in flight.

This makes the UI feel instant. The user sees filtered results from cache while the real results load in the background.
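Here's a sketch of the lookup. It walks back from the full query to the longest cached prefix and filters those results client-side, assuming cache entries are arrays of labels keyed by lowercased query. Filtered results can be incomplete, because the true top results for "react" may not all appear among the top results for "reac". That gap is exactly why they're only provisional.

```typescript
// Sketch: provisional suggestions from the longest cached prefix.
// Assumes the cache maps lowercased queries to arrays of suggestion labels.
function provisionalResults(query: string, cache: Map<string, string[]>): string[] | undefined {
  const q = query.toLowerCase();
  // Try "react", then "reac", "rea", ... until some prefix is cached.
  for (let len = q.length; len > 0; len--) {
    const cached = cache.get(q.slice(0, len));
    if (cached) {
      // Keep only entries that still match the longer query.
      return cached.filter((label) => label.toLowerCase().startsWith(q));
    }
  }
  return undefined; // nothing cached: keep showing whatever is on screen
}
```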

2. Highlight Matching with Fuzzy Support
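Here's a sketch of match highlighting that splits a label into matched and unmatched segments for rendering (for example, by wrapping matched segments in `<mark>`). It tries a case-insensitive substring match first and falls back to a simple in-order subsequence match. That fallback is one common notion of "fuzzy", not the only one; typo-tolerant edit-distance matching is another.

```typescript
// Sketch: split a label into matched/unmatched segments.
// Assumes toLowerCase() preserves string length (true for ASCII, not for every Unicode string).
interface HighlightSegment {
  text: string;
  match: boolean;
}

function highlight(label: string, query: string): HighlightSegment[] {
  const q = query.toLowerCase();
  const lower = label.toLowerCase();
  if (q.length === 0) return [{ text: label, match: false }];

  const matched: boolean[] = new Array<boolean>(label.length).fill(false);
  const start = lower.indexOf(q);
  if (start >= 0) {
    // Contiguous match: mark the whole run.
    for (let i = start; i < start + q.length; i++) matched[i] = true;
  } else {
    // Fuzzy fallback: every query character must appear in order (subsequence match).
    let qi = 0;
    for (let i = 0; i < lower.length && qi < q.length; i++) {
      if (lower[i] === q[qi]) {
        matched[i] = true;
        qi++;
      }
    }
    if (qi < q.length) return [{ text: label, match: false }]; // not even a fuzzy match
  }

  // Merge neighbouring characters with the same flag into segments.
  const segments: HighlightSegment[] = [];
  for (let i = 0; i < label.length; i++) {
    const last = segments[segments.length - 1];
    if (last && last.match === matched[i]) last.text += label[i];
    else segments.push({ text: label[i], match: matched[i] });
  }
  return segments;
}
```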

3. Mobile-First: Full-Screen Takeover

On mobile, the dropdown pattern breaks because virtual keyboards eat half the screen. The production pattern is a full-screen search overlay.
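Here's a sketch of the layout decision behind the takeover. The 768px breakpoint is a hypothetical value. Sizing the list from the visible height assumes you read that height from `window.visualViewport.height`, which typically shrinks when the on-screen keyboard opens.

```typescript
// Sketch: choose between a dropdown and a full-screen takeover.
// MOBILE_BREAKPOINT_PX is a hypothetical value; tune it to your design system.
const MOBILE_BREAKPOINT_PX = 768;

type SearchLayout =
  | { mode: "dropdown" }
  | { mode: "fullscreen"; listMaxHeight: number };

function chooseLayout(viewportWidth: number, visibleHeight: number, inputBarHeight: number): SearchLayout {
  if (viewportWidth >= MOBILE_BREAKPOINT_PX) return { mode: "dropdown" };
  // Full-screen mode pins the input to the top; the list gets the space left above the keyboard.
  return { mode: "fullscreen", listMaxHeight: Math.max(0, visibleHeight - inputBarHeight) };
}
```

In the browser, you might call `chooseLayout(window.innerWidth, window.visualViewport?.height ?? window.innerHeight, 56)` from the visual viewport's `resize` event. The 56 is a hypothetical input bar height.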

4. Analytics & Search Intelligence

The position of the chosen suggestion (selectedIndex) is gold for ranking: if users consistently pick the 3rd suggestion, your ranking model needs tuning.
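Here's a sketch of the selection event and a helper that turns a batch of events into the share of selections at each list position. The field names are hypothetical; `selectedIndex` is the 0-based position of the chosen suggestion.

```typescript
// Sketch: a selection event and a position-share helper. Field names are hypothetical.
interface SuggestionSelectedEvent {
  query: string;         // what the user had typed
  selectedIndex: number; // 0-based position of the chosen suggestion
  suggestionId: string;
  resultCount: number;   // how many suggestions were shown
  timestamp: number;     // ms since epoch
}

// Fraction of selections that landed at each position 0..maxPositions-1.
function selectionShareByPosition(events: SuggestionSelectedEvent[], maxPositions: number): number[] {
  const counts = new Array<number>(maxPositions).fill(0);
  for (const e of events) {
    if (e.selectedIndex >= 0 && e.selectedIndex < maxPositions) counts[e.selectedIndex]++;
  }
  const total = counts.reduce((sum, c) => sum + c, 0);
  return counts.map((c) => (total === 0 ? 0 : c / total));
}
```

If a lower position keeps outdrawing position 0 for common prefixes, that's the ranking signal described above.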

5. Rate Limiting & Graceful Degradation
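Here's a sketch of client-side graceful degradation. It backs off exponentially after failures and keeps showing the last good suggestions, flagged as degraded, instead of an error. It assumes the server signals rate limiting with HTTP 429. `doFetch` is a hypothetical function, and the 500ms base and 10s cap are illustrative constants.

```typescript
// Sketch: exponential backoff plus a stale-results fallback.
// Treating 429 and any other non-2xx status as a failure is an assumption.
interface FetchOutcome {
  status: number;
  data?: string[];
}

interface SuggestResult {
  items: string[];
  degraded: boolean; // true when showing stale results instead of fresh ones
}

function createResilientSuggester(
  doFetch: (query: string) => Promise<FetchOutcome>,
  now: () => number = Date.now, // injectable clock, handy for tests
) {
  let backoffUntil = 0; // epoch ms before which we skip the network
  let failures = 0;
  let lastGood: string[] = [];

  return async function suggest(query: string): Promise<SuggestResult> {
    if (now() < backoffUntil) return { items: lastGood, degraded: true };
    try {
      const res = await doFetch(query);
      if (res.status < 200 || res.status >= 300) throw new Error(`HTTP ${res.status}`);
      failures = 0;
      lastGood = res.data ?? [];
      return { items: lastGood, degraded: false };
    } catch {
      failures++;
      // 500ms, 1s, 2s, 4s, ... capped at 10s.
      backoffUntil = now() + Math.min(10_000, 500 * 2 ** (failures - 1));
      // Keep the last good list on screen rather than flashing an error.
      return { items: lastGood, degraded: true };
    }
  };
}
```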

6. Server-Side: Trie + Ranking

On the backend (brief overview for completeness), suggestions are typically served from a trie (prefix tree) stored in Redis or from a dedicated service like Elasticsearch's completion suggester.
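Here's a minimal in-memory sketch of the trie idea. It assumes every term has a precomputed popularity score. Each node caches its top-k completions, so a lookup costs one step per prefix character rather than a walk over the whole subtree. Real deployments shard this structure, rebuild it offline, or delegate it to something like Elasticsearch's completion suggester.

```typescript
// Sketch: trie where each node caches its top-k completions by score.
interface Completion {
  term: string;
  score: number;
}

class TrieNode {
  children = new Map<string, TrieNode>();
  top: Completion[] = []; // best completions passing through this node, highest score first
}

class Trie {
  private root = new TrieNode();
  private readonly k: number;

  constructor(k: number) {
    this.k = k;
  }

  insert(term: string, score: number): void {
    let node = this.root;
    this.updateTop(node, term, score); // the root covers the empty prefix
    for (const ch of term.toLowerCase()) {
      let child = node.children.get(ch);
      if (!child) {
        child = new TrieNode();
        node.children.set(ch, child);
      }
      node = child;
      this.updateTop(node, term, score);
    }
  }

  // O(prefix length): walk down, then return the precomputed list.
  suggest(prefix: string): Completion[] {
    let node: TrieNode | undefined = this.root;
    for (const ch of prefix.toLowerCase()) {
      node = node.children.get(ch);
      if (!node) return [];
    }
    return node.top;
  }

  private updateTop(node: TrieNode, term: string, score: number): void {
    const list = node.top.filter((c) => c.term !== term); // re-inserting a term replaces its score
    list.push({ term, score });
    list.sort((a, b) => b.score - a.score);
    node.top = list.slice(0, this.k);
  }
}
```

The trade-off is memory: every node stores up to k entries, which buys constant work per keystroke at query time.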

Production Gotchas Rahul Has Debugged 🔥

IME Composition: For CJK (Chinese/Japanese/Korean) input, don't trigger searches between compositionstart and compositionend, because the input is still incomplete. Wait for compositionend before fetching.
Dropdown Positioning: Use position: fixed + floating-ui to handle scroll containers and viewport edges. An absolutely positioned dropdown gets clipped by any ancestor with overflow: hidden.
Click Outside vs. Mousedown: Use onMouseDown on suggestion items, not onClick. Why? The event order is mousedown → blur → click, so onBlur on the input closes the dropdown before onClick on the item ever fires. Mousedown runs first, so the selection registers.
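URL Encoding: Always run the query through encodeURIComponent. Users will paste emojis, special characters like & and #, and even SQL injection attempts into your search box. Encoding keeps the URL intact, but injection protection still belongs on the server.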
Flash of Empty State: When switching from cached results to loading fresh ones, don't clear the UI. Show stale results with a subtle loading indicator until fresh data arrives.
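Here's a sketch of the IME guard from the first gotcha, written as a small class so the logic can be tested without a DOM. `runSearch` is a hypothetical callback. In the browser, you'd call these methods from `compositionstart`, `compositionend`, and `input` listeners, passing `InputEvent.isComposing` as the flag.

```typescript
// Sketch: suppress searches while an IME composition is in progress.
class CompositionGuard {
  private composing = false;
  private readonly runSearch: (value: string) => void;

  constructor(runSearch: (value: string) => void) {
    this.runSearch = runSearch;
  }

  onCompositionStart(): void {
    this.composing = true;
  }

  onCompositionEnd(value: string): void {
    this.composing = false;
    this.runSearch(value); // search the committed text once
  }

  // `isComposing` mirrors InputEvent.isComposing. Browsers may order the final input event
  // and compositionend differently, so both signals are checked; an occasional duplicate
  // search is absorbed by debouncing and request deduplication.
  onInput(value: string, isComposing: boolean): void {
    if (this.composing || isComposing) return;
    this.runSearch(value);
  }
}
```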

Summary Comparison Table

Aspect	Naive	Debounce Only	Production (Cache + Dedup)
API calls for "javascript"	10	2-3	1 (rest from cache)
Perceived latency	Network RTT per key	Debounce + RTT	~0ms (cache) or Debounce + RTT
Stale responses	Frequent	Possible	Handled via query check
Memory usage	Low	Low	Bounded (LRU)
Offline support	None	None	Cache serves stale results

Next up: #7: Design a Calendar/Date Picker — where we'll tackle date math nightmares, timezone handling, range selection state machines, and why Date is the worst API in JavaScript. Stay tuned! 🗓️