1. Design a URL Shortener
Question: How would you design a URL shortener like bit.ly?
Key concepts:
- API: create_short_url(long_url), get_long_url(short_url)
- Database: Relational (mapping ID to URL) or NoSQL (Key-Value store like Redis/DynamoDB).
- Encoding: Base62 encoding of a unique ID.
- Scalability: Load balancers, caching frequently accessed URLs.
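The Base62 step above can be sketched in a few lines. This is a minimal illustration (the alphabet order and helper names are arbitrary choices, not part of any standard library): a unique integer ID from the database is converted into a short alphanumeric string.

```python
# Base62 alphabet: 0-9, a-z, A-Z (62 characters total).
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode_base62(n: int) -> str:
    """Encode a non-negative integer (e.g. an auto-increment DB id) as Base62."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))

def decode_base62(s: str) -> int:
    """Invert encode_base62 to recover the original ID."""
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return n
```

With 7 Base62 characters you can address 62^7 (about 3.5 trillion) distinct URLs, which is why short codes stay short even at large scale.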
2. Rate Limiting
Question: How would you implement a rate limiter for an API?
Strategies:
- Token Bucket Algorithm: Allow bursts of traffic up to a fixed capacity while capping the average rate.
- Leaky Bucket: Process requests at a constant rate, smoothing out bursts.
- Implementation: Redis is commonly used to store counters with expiration times.
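A minimal in-memory token bucket can illustrate the first strategy (the class and method names here are illustrative; a production version would keep the counters in Redis as noted above so state is shared across servers):

```python
import time

class TokenBucket:
    """Token bucket sketch: allows bursts up to `capacity`,
    refills at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float):
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity          # start full: a burst is allowed immediately
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1            # spend one token per request
            return True
        return False                    # bucket empty: reject (HTTP 429)
```

A bucket with capacity 10 and rate 2 lets a client burst 10 requests at once but sustain only 2 per second afterwards.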
3. Caching Strategies
Question: Compare Write-Through, Write-Back, and Cache-Aside strategies.
- Cache-Aside (Lazy Loading): App checks cache. If miss, reads DB, updates cache, returns data. Best for read-heavy workloads.
- Write-Through: App writes to cache and DB synchronously. Ensures consistency but higher write latency.
- Write-Back (Write-Behind): App writes to cache only. Cache writes to DB asynchronously. Fastest writes, but risk of data loss if cache fails.
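The Cache-Aside flow can be sketched directly. In this toy version both the cache and the "database" are plain dicts (in production the cache would typically be Redis or Memcached and the database a real store); the three-step read path mirrors the description above.

```python
cache = {}
database = {"user:1": {"name": "Alice"}}  # stand-in for a real DB

def get(key):
    # 1. Check the cache first.
    if key in cache:
        return cache[key]
    # 2. On a miss, read from the database.
    value = database.get(key)
    # 3. Populate the cache so the next read is a hit, then return.
    if value is not None:
        cache[key] = value
    return value
```

Note that the application, not the cache, owns this logic, which is what distinguishes Cache-Aside from Write-Through/Write-Back (where writes flow through the cache itself).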
4. Database Sharding
Question: What is Sharding and when should you use it?
Definition: Horizontal scaling method that splits a large database into smaller, faster, more easily managed parts called data shards.
When to use: When a single database server cannot handle the write volume or storage requirements.
Challenges: Complex queries (joins across shards), rebalancing data, and handling transactions.
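The core of shard routing is a deterministic mapping from a shard key to a shard. A minimal hash-based sketch (the shard count and function name are illustrative; real systems often use consistent hashing so that rebalancing moves less data):

```python
import hashlib

NUM_SHARDS = 4  # illustrative fixed shard count

def shard_for(key: str) -> int:
    """Map a shard key (e.g. a user_id) to a shard index.

    Uses a stable hash (md5) rather than Python's built-in hash(),
    which is randomized per process and would route inconsistently.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS
```

Because the modulus ties keys to the shard count, adding a shard remaps most keys; that is exactly the rebalancing challenge listed above.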
5. Message Queues (Celery/RabbitMQ)
Question: Why use a message queue in a system architecture?
Decoupling: Producers and consumers don't need to know about each other.
Asynchronous Processing: Offload heavy tasks (email sending, video processing) from the main request-response cycle.
Load Leveling: If traffic spikes, the queue absorbs the load, allowing workers to process tasks at their own pace without crashing the system.
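These three properties can be seen in a small producer/consumer sketch using the standard-library `queue` module (a stand-in for RabbitMQ or a Celery broker; the task names are made up). The producer enqueues a burst of work without waiting, and the worker drains it at its own pace:

```python
import queue
import threading

task_queue = queue.Queue()
results = []

def worker():
    """Consumer: processes tasks independently of the producer."""
    while True:
        task = task_queue.get()
        if task is None:        # sentinel signals shutdown
            break
        results.append(f"processed {task}")
        task_queue.task_done()

t = threading.Thread(target=worker)
t.start()

# Producer: a burst of 5 tasks is absorbed by the queue instantly;
# the producer never blocks on the actual processing (load leveling).
for i in range(5):
    task_queue.put(f"task-{i}")

task_queue.put(None)            # tell the worker to stop
t.join()
```

The producer knows nothing about the worker beyond the queue itself, which is the decoupling point: either side can be scaled or replaced independently.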
6. CAP Theorem
Question: Explain the CAP Theorem.
In a distributed system, you can only guarantee two of the following three:
- Consistency: Every read receives the most recent write or an error.
- Availability: Every request receives a (non-error) response, without the guarantee that it contains the most recent write.
- Partition Tolerance: The system continues to operate despite an arbitrary number of messages being dropped or delayed by the network between nodes.
Since network partitions are inevitable, you must choose between Consistency (CP) and Availability (AP).