Understanding System Design

Mayur SSoni
6 min read · May 12, 2021

This article documents my process of learning system design and covers the core topics related to it.

To start with — What is System Design?

System design defines the architecture, components, modules, and data of a system based on its requirements. It is an integral part of software engineering.

Key Principles of System Design

  1. Availability
  2. Performance
  3. Reliability
  4. Scalability
  5. Manageability
  6. Cost

Availability

Design a system to be constantly available and resilient to failure. This is critical, because unavailability can damage a business's reputation and cause a large loss of revenue. That said, failure is a normal part of software systems; the key is how rapidly and gracefully the system recovers.

Performance

This is another factor that directly correlates with revenue and retention. The performance of an application affects usage and user satisfaction. Hence, it is important to design a system that is optimised for fast responses and low latency.

Reliability

A system needs to be reliable: users need to know that anything written to or stored in the system will persist and can be relied on to be in place for future retrieval.

Scalability

For a large distributed system, scalability is very important. Size is not the only factor to consider; how quickly and how easily the system can scale matter just as much. Scalability parameters include extending storage size, handling additional traffic, processing more transactions per second, etc.

Manageability

Manageability is how simple the system is to operate, which includes scaling operations, maintenance, and updates.

Cost

Cost is an important factor. This obviously can include hardware and software costs, but it is also important to consider other facets needed to deploy and maintain the system. The amount of developer time the system takes to build, the amount of operational effort required to run the system, and even the amount of training required should all be considered. Cost is the total cost of ownership.

Each of these principles provides the basis for decisions in designing a distributed system. However, achieving one objective can come at the cost of another. For example, addressing capacity by simply adding more servers (scalability) can come at the price of manageability (you have to operate an additional server) and cost (the price of the servers).

System Design

Building blocks for a large-scale system

DNS

In layman's terms, DNS translates a person's name into a telephone number; it is the phonebook of the Internet. DNS maps a host name to an IP address and vice versa.
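The mapping described above can be seen directly from Python's standard library. This is a minimal sketch; `localhost` is used here only because it resolves on any machine, but any public host name works the same way.

```python
import socket

# Forward lookup: map a host name to an IP address.
# "localhost" conventionally resolves to the loopback address 127.0.0.1.
ip = socket.gethostbyname("localhost")
print(ip)
```

Reverse lookups (IP address back to host name) are done with `socket.gethostbyaddr`.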

CDN

In layman's terms, a CDN is a distribution system similar to the post office network: it makes your content available at the closest point of presence, i.e. the nearest post office.

A content delivery network (CDN) is a large collection of high-powered, globally distributed servers that work together to provide fast delivery of content over the Internet.

Why? One of the main reasons is to address latency: a CDN server caches your content. A CDN does not host a website on its servers; rather, it caches content from the origin web server and delivers it quickly.

Load Balancer

Load balancing is a core networking solution responsible for distributing incoming traffic among the servers of an application. It prevents any single application server from becoming a single point of failure, thus improving overall application availability and responsiveness.

A load balancer performs the following functions:

  • Distributes client requests or network load efficiently across multiple servers
  • Ensures high availability and reliability by sending requests only to servers that are up and running
  • Provides the flexibility to add or subtract servers as demand dictates
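The simplest distribution strategy is round robin, where requests are handed to servers in rotation. This is a minimal sketch (real load balancers add health checks, weighting, and connection tracking); the server names are placeholders:

```python
from itertools import cycle

class RoundRobinBalancer:
    """Minimal round-robin load balancer: hands requests to servers in rotation."""

    def __init__(self, servers):
        self.servers = list(servers)
        self._cycle = cycle(self.servers)  # endless iterator over the pool

    def next_server(self):
        """Pick the server that should handle the next request."""
        return next(self._cycle)

lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
picks = [lb.next_server() for _ in range(5)]
print(picks)  # ['app-1', 'app-2', 'app-3', 'app-1', 'app-2']
```

Adding or removing a server is just a matter of rebuilding the pool, which is what makes the "add or subtract servers as demand dictates" point above cheap in practice.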

Compute Service (Servers)

Servers can be dedicated physical servers on-premises or virtual servers provided by cloud services (AWS, GCP, etc.). They are the part of your infrastructure that handles an application's computation workload.

Data Lake

A data lake is a storage repository that holds a vast amount of raw data. It can store structured, semi-structured, or unstructured data, which means data can be kept in a flexible format for future use. In essence, it is the storage layer of the big data world; it could be Hadoop, S3, or any other cheap big data storage system.

Data Warehouse

Data warehouse is like a neatly arranged wardrobe.

In simple words, a data warehouse is a central repository containing historical data collected from various source systems (ERP, finance, HRM, CRM, sales). Data is extracted periodically and moved into the warehouse through a process of cleansing, formatting, and summarising (Extract, Transform, Load: ETL).

Ordinary databases store MBs and GBs of recently collected data. They are not meant to store historical data, which can range up to TBs, and for this reason they do not provide good support for analytics. Data warehouses were introduced to store all this large-scale historical data.

Data warehouses help in insightful and strategic decision making for the business by providing analytical support.

e.g. Redshift, Teradata
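The ETL flow described above can be sketched end to end. This is a toy illustration with invented sales rows, not how a real warehouse pipeline is built; the point is the extract → transform (clean, normalise) → load (summarise into the warehouse store) shape:

```python
# Extract: raw rows pulled from a source system (invented sample data).
source_sales = [
    {"region": "EU ", "amount": "100"},
    {"region": "eu", "amount": "50"},
    {"region": "US", "amount": "200"},
]

def transform(rows):
    """Cleanse and format: normalise region names, parse amounts to numbers."""
    return [
        {"region": r["region"].strip().upper(), "amount": int(r["amount"])}
        for r in rows
    ]

def load(rows):
    """Summarise into the warehouse store: total sales per region."""
    totals = {}
    for r in rows:
        totals[r["region"]] = totals.get(r["region"], 0) + r["amount"]
    return totals

warehouse = load(transform(source_sales))
print(warehouse)  # {'EU': 150, 'US': 200}
```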

Database

Most readers are probably already familiar with databases, but in simple words: a database is an organised collection of data. Data is organised into rows, columns, and tables, and it is indexed to make it easier to find relevant information. Data gets updated, expanded, and deleted as new information is added.

A database can be relational or non-relational, e.g. Postgres (relational) or DynamoDB (non-relational).
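The rows-columns-tables-index structure can be demonstrated with Python's built-in SQLite driver, using an in-memory database so nothing touches disk:

```python
import sqlite3

# An in-memory relational database: a table with rows, columns, and an index.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE INDEX idx_users_name ON users (name)")  # speeds up lookups by name
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

rows = conn.execute("SELECT name FROM users ORDER BY name").fetchall()
print(rows)  # [('alice',), ('bob',)]
```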

Caching

Here we are talking about in-memory data storage system (e.g. Redis, Memcached). Caching is storing data in a location different than the main data source such that it’s faster to access the data. Generally, caches contain the most recently accessed data as there is a chance that recently requested data is likely to be asked again and again. So, in such scenarios, we need to use caching to minimise data retrieval operation from the database.

It helps reduce calls to the source database and improves performance as well.

e.g. caching a popular Facebook celebrity profile. Say a celebrity posts a new update with photos, and a lot of people are checking that profile. If every user requests the same update and the database has to retrieve the same data each time, it puts huge pressure on the database. In such cases, we can cache the popular profile and serve the data from the cache service.

If data is mostly static, caching is an easy way to improve performance. In the case of data that is edited often, the cache is a bit tricky to implement.
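One common pattern for this is cache-aside: check the cache first, fall back to the database on a miss, then populate the cache with an expiry so edited data eventually refreshes. This sketch uses a plain dict standing in for Redis/Memcached and an invented profile record:

```python
import time

DB = {"celebrity_profile": {"name": "Star", "posts": 42}}  # stands in for the real database
cache = {}          # {key: (value, expires_at)} — stands in for Redis/Memcached
TTL_SECONDS = 60    # expiry handles the "data edited often" case

def get_profile(key):
    """Cache-aside read: serve from cache if fresh, else read-through and cache."""
    entry = cache.get(key)
    if entry is not None and entry[1] > time.time():
        return entry[0]                            # cache hit
    value = DB[key]                                # cache miss: go to the source
    cache[key] = (value, time.time() + TTL_SECONDS)
    return value

first = get_profile("celebrity_profile")   # miss: hits the database, fills cache
second = get_profile("celebrity_profile")  # hit: served from the cache
```

When the underlying data is updated, the write path should also invalidate or overwrite the cached entry, which is the tricky part the paragraph above alludes to.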

Distributed Queue System

Queues are everywhere in today’s modern distributed systems architecture — adopted across various industries for different use cases.

A queue provides temporary storage between the sender and the receiver so that the sender can keep operating without interruption when the destination program is busy or not connected.

(Source: https://ctoasaservice.org/2019/08/30/the-aws-messaging-stack-sqs-sns-kinesis/)

There are client applications called producers that create messages and deliver them to the message queue. Another application, called a consumer, connects to the queue and gets the messages to be processed. Messages placed onto the queue are stored until the consumer retrieves them.

In Synchronous processing, we

  1. call a service, then
  2. wait for the service to finish, and then
  3. move on to the next task.

Asynchronous processing, on the other hand, allows a task to call a service and move on to the next task while the service processes the request at its own pace. That is why a queue is an elegant way to unblock your systems: it puts a layer in front of your services and allows them to tackle tasks at their own pace.
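The producer/consumer decoupling above can be sketched with Python's thread-safe `queue.Queue`, which plays the role of the message broker; in production this would be SQS, RabbitMQ, or similar:

```python
import queue
import threading

q = queue.Queue()   # the broker: temporary storage between sender and receiver
processed = []

def producer():
    """Sender: enqueues messages and keeps operating without waiting."""
    for i in range(5):
        q.put(f"msg-{i}")
    q.put(None)  # sentinel signalling "no more messages"

def consumer():
    """Receiver: drains the queue at its own pace."""
    while True:
        msg = q.get()
        if msg is None:
            break
        processed.append(msg)

t_prod = threading.Thread(target=producer)
t_cons = threading.Thread(target=consumer)
t_prod.start(); t_cons.start()
t_prod.join(); t_cons.join()
print(processed)  # ['msg-0', 'msg-1', 'msg-2', 'msg-3', 'msg-4']
```

Because the queue buffers the messages, the producer finishes regardless of how slowly the consumer works — exactly the unblocking behaviour described above.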

Real-time Streaming Service

Real-time streaming is the process of taking action on data at the time it is generated or published. Some of the best examples of real-time systems are those used in the stock market.

Kinesis and Kafka are examples of real-time streaming services.
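The "act on data as it arrives" idea can be sketched with a generator that computes a rolling average over a stream of stock ticks. The tick prices are invented; a real system would consume from a Kafka or Kinesis client instead of a list:

```python
from collections import deque

def rolling_average(ticks, window=3):
    """Emit the average of the last `window` prices after every incoming tick."""
    buf = deque(maxlen=window)  # only the most recent prices are kept
    for price in ticks:
        buf.append(price)
        yield round(sum(buf) / len(buf), 2)  # act immediately on each event

ticks = [100.0, 102.0, 101.0, 105.0]  # invented stream of stock prices
averages = list(rolling_average(ticks))
print(averages)  # [100.0, 101.0, 101.0, 102.67]
```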

Thanks for reading. If you enjoyed this article, comment below: what other components would you include to cover almost all use cases of designing a large-scale system?

Follow me on LinkedIn.
