My System Design Interview Experience
Overview
This document details a candidate's experience during a system design interview. The interview question involved designing a system that infinitely scrolls through websites, extracts raw data (HTML, CSS, JS, images), and stores it. The candidate reflects on their approach, the interviewer's feedback (or lack thereof), and key takeaways for future interviews.
Interview Rounds
The interview began with clarifying questions to scope the problem: whether the design should cover the end-to-end architecture or only the frontend flow, who the end users were, and whether the system needed a UI or was purely backend.
Initial Design
The candidate proposed an initial flow:
Client → API Gateway → Backend Services (Web Content Service + others)
When prompted about automating the process without manual intervention, the candidate introduced a message queue for job distribution.
Discussion & Evolution
During the interview, the candidate's design was discussed and refined. The interviewer expected a more autonomous, backend-focused system, which prompted the candidate to change approach.
In the revised design, a scheduler enqueues jobs for worker nodes running headless browsers. Each worker extracts data and streams the results to storage asynchronously. This version aligned more closely with the interviewer's distributed systems background.
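The scheduler-to-worker handoff can be sketched in a few lines. This is a minimal illustration rather than the candidate's actual design: `asyncio.Queue` stands in for a real broker, a plain dict stands in for storage, and `ScrapeJob`, `fake_extract`, and `run_once` are hypothetical names introduced here.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class ScrapeJob:
    url: str


async def fake_extract(url: str) -> str:
    """Stand-in for a headless-browser worker; returns placeholder HTML."""
    await asyncio.sleep(0)  # yield control, as a real page load would
    return f"<html><!-- content of {url} --></html>"


async def worker(jobs: asyncio.Queue, storage: dict) -> None:
    """Pull jobs until cancelled, writing each result to storage."""
    while True:
        job = await jobs.get()
        try:
            storage[job.url] = await fake_extract(job.url)
        finally:
            jobs.task_done()  # lets jobs.join() know this job is finished


async def run_once(urls: list, worker_count: int = 3) -> dict:
    """One scheduler tick: enqueue every URL, then wait for the queue to drain."""
    jobs: asyncio.Queue = asyncio.Queue()
    storage: dict = {}
    workers = [asyncio.create_task(worker(jobs, storage)) for _ in range(worker_count)]
    for url in urls:
        jobs.put_nowait(ScrapeJob(url))
    await jobs.join()
    for task in workers:
        task.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return storage
```

Calling `asyncio.run(run_once([...]))` plays the role of a single scheduled run. The point of the queue is that the scheduler never calls a worker directly, so either side can be scaled or replaced independently.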
High-Level Design Components:
Input: List of target URLs
↓
Orchestrator / Scheduler
↓
Workers (Headless Browsers - Puppeteer / Playwright)
↓
Scrolling & Extraction Logic
↓
Data Processing / Queue
↓
Storage (S3, GCS, or Database)
↓
Monitoring & Logging
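The scrolling and extraction step in the flow above needs a stopping rule, because an infinite feed never signals that it is done. One common approach, sketched here under assumptions, is to scroll until the page height stops growing for a few rounds or a hard cap is reached. `ScrollablePage` is a hypothetical interface invented for this example; in a real worker its three methods would presumably wrap the headless browser's own calls for evaluating script, scrolling, and reading the page content. `FakeFeed` is a test double so the sketch runs without a browser.

```python
from typing import Protocol


class ScrollablePage(Protocol):
    """Hypothetical minimal page interface; not a real library type."""

    def scroll_height(self) -> int: ...
    def scroll_to_bottom(self) -> None: ...
    def html(self) -> str: ...


def scroll_and_extract(page: ScrollablePage, max_scrolls: int = 50,
                       stable_rounds: int = 2) -> str:
    """Scroll until the height is unchanged for `stable_rounds` scrolls, or the cap hits."""
    unchanged = 0
    last_height = page.scroll_height()
    for _ in range(max_scrolls):
        page.scroll_to_bottom()
        height = page.scroll_height()
        if height == last_height:
            unchanged += 1
            if unchanged >= stable_rounds:
                break  # nothing new loaded; treat the feed as exhausted
        else:
            unchanged = 0
            last_height = height
    return page.html()


class FakeFeed:
    """Test double: grows by 1000px per scroll until `pages` batches are loaded."""

    def __init__(self, pages: int) -> None:
        self.loaded = 1
        self.pages = pages
        self.scrolls = 0

    def scroll_height(self) -> int:
        return self.loaded * 1000

    def scroll_to_bottom(self) -> None:
        self.scrolls += 1
        if self.loaded < self.pages:
            self.loaded += 1

    def html(self) -> str:
        return "".join(f"<article>{i}</article>" for i in range(self.loaded))
```

The `max_scrolls` cap is what keeps a truly endless feed from holding a worker forever; the right value depends on how much of each page the system is meant to capture.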
Component Details:
- Orchestrator: Coordinates scraping jobs, handles retries & rate limiting.
- Workers: Use Puppeteer/Playwright to scroll & extract HTML.
- Queue: Decouples ingestion and processing (Kafka / RabbitMQ).
- Storage: Raw HTML → S3; Metadata → DB.
- Monitoring: Centralized logs & retry tracking.
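The orchestrator's retry and rate-limiting duties listed above could look roughly like the following. This is a sketch under assumptions: the original does not say which retry policy or limit was discussed, so exponential backoff and a per-host minimum interval are chosen here for illustration, and `HostRateLimiter` and `run_with_retries` are hypothetical names.

```python
import time
from urllib.parse import urlsplit


class HostRateLimiter:
    """Enforces a minimum interval between requests to the same host."""

    def __init__(self, min_interval: float, clock=time.monotonic, sleep=time.sleep) -> None:
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = {}  # host -> earliest time the next request may start

    def wait(self, url: str) -> None:
        host = urlsplit(url).netloc
        now = self.clock()
        ready_at = self.next_allowed.get(host, now)
        if ready_at > now:
            self.sleep(ready_at - now)
        self.next_allowed[host] = max(now, ready_at) + self.min_interval


def run_with_retries(job, url, limiter, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call job(url); on failure, retry with exponential backoff (1s, 2s, 4s, ...)."""
    for attempt in range(max_attempts):
        limiter.wait(url)
        try:
            return job(url)
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error for retry tracking
            sleep(base_delay * 2 ** attempt)
```

The clock and sleep functions are injectable so the policy can be exercised without real delays. In a distributed deployment the per-host state would have to live somewhere shared rather than in one process's memory.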
Scaling and Performance:
- Horizontally scale workers (Kubernetes / ECS).
- Reuse browser sessions, limit tabs, cache pages.
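Limiting tabs and caching pages can be illustrated with a semaphore and a dictionary. This is a simplified sketch, assuming a single worker process: `TabLimiter` and its `load` callback are hypothetical, and the callback stands in for whatever opens a tab in a shared browser session.

```python
import asyncio


class TabLimiter:
    """Caps concurrent page loads and caches results by URL."""

    def __init__(self, max_tabs: int) -> None:
        self._slots = asyncio.Semaphore(max_tabs)
        self._cache = {}
        self.active = 0  # tabs currently open
        self.peak = 0    # highest number of tabs open at once

    async def fetch(self, url: str, load) -> str:
        if url in self._cache:
            return self._cache[url]  # cache hit: no tab needed
        async with self._slots:      # waits here if max_tabs are already open
            self.active += 1
            self.peak = max(self.peak, self.active)
            try:
                html = await load(url)
            finally:
                self.active -= 1
        self._cache[url] = html
        return html
```

Create the limiter inside the running event loop, then pass every page load through `fetch`. Two simultaneous requests for the same uncached URL would both load it in this version; deduplicating in-flight requests is left out to keep the sketch short.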
Tradeoffs:
| Aspect      | Option A               | Option B                 |
| ----------- | ---------------------- | ------------------------ |
| Scalability | More worker containers | Centralized browser pool |
| Storage     | Raw HTML               | Parsed content           |
| Network     | Cache                  | Re-fetch every time      |
System Summary:
The proposed system utilizes an orchestrator to manage a scalable pool of headless browsers that scroll, extract content, and push it asynchronously to storage.
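The split between raw content and metadata at the storage end can be sketched as a pure function that decides what goes where. Everything in it is an assumption made for illustration: the `raw/<host>/<sha256>.html` key layout, the `PageMetadata` fields, and the `plan_storage` name do not come from the interview, and the actual upload and database write are deliberately omitted.

```python
import hashlib
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class PageMetadata:
    """Row destined for the metadata database (hypothetical schema)."""
    url: str
    object_key: str
    sha256: str
    size_bytes: int
    fetched_at: str


def plan_storage(url: str, html: str, fetched_at: str):
    """Return the bytes for the object store and the matching metadata row."""
    body = html.encode("utf-8")
    digest = hashlib.sha256(body).hexdigest()
    host = urlsplit(url).netloc
    # Content-addressed key: identical page bodies map to the same object.
    key = f"raw/{host}/{digest}.html"
    return body, PageMetadata(url, key, digest, len(body), fetched_at)
```

Keying objects by content hash means a page that has not changed since the last crawl is not stored twice, which ties back to the raw-versus-parsed and cache-versus-re-fetch tradeoffs.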
Final Whiteboard Diagram:
┌───────────────────┐
│   URL Scheduler   │
└───────┬───────────┘
        │
┌───────▼───────────┐
│    Worker Pool    │
│ (Puppeteer/Play)  │
└───────┬───────────┘
        │ Extracts content
        ▼
┌───────────────────┐
│   Message Queue   │
└───────┬───────────┘
        │
┌───────▼───────────┐
│   Storage Layer   │
│  (S3, DB, Logs)   │
└───────────────────┘
Key Takeaways
The candidate identified several things that went well:
- Clarifying questions: Demonstrated systems thinking.
- Modular design: Showed clear separation of concerns.
- Adaptability: Introduced message queues when prompted.
- Full-stack awareness: Considered frontend, backend, and infra links.
Divergence in Expectations:
The candidate noted a difference in focus, stemming from the interviewer's backend/distributed systems background versus the intended frontend role.
| Area        | Candidate's Focus    | Interviewer's Expectation     |
| ----------- | -------------------- | ----------------------------- |
| Entry Point | UI-triggered flow    | Autonomous system             |
| Automation  | Added queue later    | Expected scheduler from start |
| Focus       | Architecture clarity | Distributed scalability       |
| Terminology | Frontend/system mix  | Backend/infrastructure terms  |
Had the candidate started from a scheduler-driven, automated system, the design would have addressed the interviewer's concerns more directly.
Despite not moving forward, the experience provided valuable learning and reinforced the importance of understanding the interviewer's perspective and tailoring the response accordingly.
Original Source
This experience was originally published on Medium. Support the author by visiting the original post.