૮ ⚆ﻌ⚆ა Two Pug ૮ –ﻌ⚆ა

Interview Prep: Classic System Design

meme-v0-kj8k4prt9m2g1 1

This note is more focused on classic infrastructure roles. Not every company will run interviews like this, most modern AI Infra roles include ML System interview loops, which differ significantly and align more closely with the topics covered in The ML Grind section.

First, source a good list of questions, you have a few options available on the web:

  1. Paid ones, I used bytebytego, neetcode also has a dedicated section to systems design, there are many more out there, find the one that suits your needs.
  2. You can also do a deepsearch in a chatbot to give you a list of question.

The issues with many of the solutions provided for System Design questions is that they are too basic and may work for entry/mid level positions, but they are not in-depth enough for more senior roles.

Those sites are still valuable for sourcing questions, but instead of just reading them, use a chatbot for mock interviews.

Use the prompt below:

**System Design Mock Interview Format**

Let's run a system design mock interview in a realistic way:

* You'll ask me to design a system without giving unnecessary details.
* I will ask only the clarifying questions I need.
* You may ask follow-up or leading questions based on my answers.
* I will verbalize my reasoning process as we go, and we'll iterate.
* A large part of a real system design interview is the candidate asking questions, so please don't overshare, let me extract the requirements.

Below is a question we will use as a reference. Assume I haven't read it; use it as a guide, not a script.

COPY-PASTE here the entire question & solution from the sources above (bytebytego/neetcode/etc)

Think of this as a real interview. Walk through your solution step by step. You can use the voice feature in chatbots to reason out loud as you work through the problem. When you're done, say that you'd like to end the interview, and ask to explain how you should have approached to solve the problem.

Know the numbers

Scales

You should get comfortable working with large numbers. The easiest method I've found is to use scientific notation. Here are a few examples:

While you work through the chatbot-guided mock interviews, always do the math by hand.

Power of 10 Symbol Name Power of 2 Time
10¹⁵ P peta 2⁵⁰ 31.7 million yrs
10¹² T tera 2⁴⁰ 31.7 thousand yrs
10⁹ G giga 2³⁰ 31.7 yrs
10⁶ M mega 2²⁰ 11.6 days
10³ K kilo 2¹⁰ 16 min 40 sec
10⁰ unit
10⁻³ m milli
10⁻⁶ µ micro
10⁻⁹ n nano

Just a few more random numbers

Clock sync drift

HW latencies

Latency Component Notes
0.250 ns 1 cycle @ 4 GHz
0.333 ns 1 cycle @ 3 GHz
1 ns L1 Cache
4 ns L2 Cache
100 ns DRAM DRAM access 13 ns, memory controller 40 ns, CPU cache miss handling 25 ns, bus traversal 15 ns. Assume p99 around x1.2 or the same, there is not as much jitter in the latency.
65 µs NVMe ~3x for p99
10 ms HDD ~10x for p99

Network

The numbers here are rough latency and bandwidth estimates, it's an extension of Jeff Dean's classic 2010 latency sheet, but adapted to modern AWS network tiers. It doesn't matter whether the company uses AWS specifically, the patterns and relative orders of magnitude are universal.

Why do we care about latencies across the network hierarchy?

Because SLAs around latency and availability directly constrain workload placement and distribution strategies. Your placement across failure domains is dictated by your ability to meet those SLAs.

For example:

This is a cheat sheet I created for quick reference and rehearsal. I keep all of this on a single index card, it may look like a lot, but it's actually very manageable once you learn a few memory tricks I describe below.

Start every interview with a question, with SLAs requirements for the service that needs to be built, if they don't care, suggest a reasonable SLA yourself.

I used the following notation for latencies between two hosts:

Availability SLAs

Availability Downtime / month
99.5% 3.65 h
99.95% 21.6 min
99.99% 4.4 min

Network Latencies

The latency and link columns are easy to memorize. For GByte/s column, just remember the value for 10 Gbit/s, and you can derive the rest even if you're not great at arithmetic (halve it, multiply by 5×, 10×, etc.).

The numbers below are expressed as Round-Trip Time (RTT), the time it takes to send a request and receive a response. Keep in mind that actual latency may be higher when you include the TCP handshake, TLS negotiation, or other protocol overheads. In practice, these estimates are still accurate for back-of-the-envelope calculations, especially when using long-lived connections that amortize those setup costs.

The estimates below represent p50 latencies. Should you also memorize p99 numbers, since many SLAs are defined using p99 (e.g., "99th percentile of requests must be under 200 ms")? Not really. For rough reasoning, you can approximate p99 as 2 × p50. It's not scientifically precise, but it's good enough for back-of-the-envelope calculations. For example, a 0.4 ms p50 latency would translate to ~0.8 ms p99.

Link Type <1KB Gbit/s Link GByte/s
iAZ 0.4ms 25 3.125
iAZ 0.4ms 50 6.25
iAZ 0.4ms 100 12.5
×AZ 1.2ms 10 1.25
×REG 20ms 5 0.625
×COA 65ms 5 0.625
×CON 80ms 5 0.625

While the latencies above are enough for most cases, the actual time to transfer a blob of data depends on more than just RTT. It's influenced by optimizations in the network stack, such as pipelining, where multiple packets are sent back-to-back. How many packets can be pushed at once varies and involves some networking math, including congestion control and TCP slow-start.

Back in the day I had to learn those details as I was preparing for Meta’s infra interviews. These days, most infra engineers aren't expected to go deep on networking unless the role specifically calls for it.

Here's how to think about the numbers: the RTT table applies to payloads under ~1.46 KB because that fits in a single packet (standard MTU). Sending 1 KB means you send the packet and get an ACK - one full round trip. For larger transfers, you can stream many packets in parallel, but eventually you'll hit the slowest link in the path, which creates backpressure via congestion control (e.g. by dropping packets).

The key idea is this: depending on bandwidth and latency, transfer times shift from being latency-bound to bandwidth-bound. For example, sending 1 KB vs 100 KB isn't 100× slower, because many packets can be sent at once. But with 10 MB → 100 MB → 1 GB transfers, time scales roughly 10×, so the transition from latency-bound to bandwidth-bound happens around 1–10 MB. That’s what the color coding highlights.

You mostly need to memorize the latency-bound part (on the left) the rest can be derived with:

Transfer time ~ RTT + payload ÷ bandwidth.

Link 100 KB 1 MB 10 MB 100 MB 1 GB
iAZ 0.43 ms 0.7 ms 3.6 ms 33 ms 330 ms
xAZ 1.23 ms 2.0 ms 9.2 ms 81 ms 820 ms
xREG 20.2 ms 22 ms 96 ms 180 ms 1640 ms
xCOA 65.2 ms 67 ms 98 ms 225 ms 1685 ms
xCON 80.2 ms 82 ms 96 ms 240 ms 1700 ms

Storage

This section covers the cost of persisting data. When paired with the previous latency breakdown, it ties into the trade-offs between performance and consistency.

Broadly, systems fall into two categories: those that provide strong consistency, and those that optimize for latency (See PACELC below for more details).

Classic synchronous replication with 2-Phase or 3-Phase Commit is less common today due to its impact on availability. However, it's still worth considering:

To achieve strong consistency and high availability, Paxos/Raft based systems (etcd, zookeeper) provide a reference architecture:

Distributed Blob Stores

Two architectural approaches:

The core difference lies in where the decision is made about which hosts to contact to perform read/write operations. Each model comes with trade-offs:

Latency behavior is governed by the quorum formula:

R + W > N
(Number of replicas read from) + (Number of replicas written with ACK) must exceed total replicas N.

Examples for N = 3

This model lets you trade off read vs write latency and availability. For example:

PACELC Theorem

The PACELC theorem extends the CAP theorem. It states that:

Since parts of the system can't communicate, we either shut everything down to preserve strong consistency, or continue operating and risk serving inconsistent data.

A short cheatsheet below demonstrates why it’s called PACELC:

There is a lot more to explore about storages, however, this section should be sufficient for non-storage focused roles. If you are curious to learn more, I would highly encourage you to read Designing Data-Intensive Applications.

Monitoring & Debugging

L.U.S.E.R. is an extension of the USE method for performance analyses to include additional metrics required for distributed systems monitoring and debugging.

  1. Image source