To support an enterprise-class distributed database management system, performance optimization techniques must be implemented. Key principles of a high-performing, scalable system include fast data retrieval and efficient traffic distribution. Techniques such as caching and load balancing can be used to meet these requirements.
Caching for faster data retrieval
Database caching is a technique that increases data retrieval speed, improving overall database performance. A cache is a high-speed data storage layer that stores a subset of data that is frequently requested or of high value (Emerich, n.d.). Caching allows data to be retrieved faster than by repeatedly accessing the primary database, thereby reducing the database's workload (Rangarajan & Prudhviraj, 2024).
Modern enterprises like Netflix use caching to ensure data availability on a global scale, specifically through distributed caching, which stores data across multiple servers for quick access and high availability (Rangarajan & Prudhviraj, 2024). Netflix enhances system reliability, minimizes latency, and delivers a scalable system through EVCache, its caching solution built on the popular Memcached. Memcached stores data in RAM, offering fast access times while maintaining horizontal scalability, with the ability to add more servers as needed (Memcached, n.d.).
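The cache-aside pattern behind such systems can be sketched in a few lines. This is a minimal in-process illustration, not Netflix's actual EVCache code: the `fetch_from_database` stand-in and the TTL value are assumptions for the example.

```python
import time

class SimpleCache:
    """A tiny in-memory cache with per-entry time-to-live (TTL)."""

    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, expiry timestamp)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.time() >= expires_at:
            del self._store[key]  # evict the stale entry
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, time.time() + self.ttl)

def fetch_from_database(key):
    # Stand-in for a query against the primary database.
    return f"row-for-{key}"

cache = SimpleCache(ttl_seconds=300)

def get_record(key):
    value = cache.get(key)
    if value is None:              # cache miss: go to the primary database
        value = fetch_from_database(key)
        cache.set(key, value)      # populate the cache for later reads
    return value
```

On the first call the cache misses and the primary database is queried; subsequent reads within the TTL are served from memory, which is what reduces the database's workload.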
Based on this analysis of caching strategies, a Memcached-based solution will best support the global DDBMS in improving data retrieval speeds. A Memcached cache will be deployed within each of the three datacenters in the US, Japan, and Argentina.
Load balancing for efficient traffic distribution
Round-robin load balancing is a technique that distributes requests across servers in a cyclical manner (Sid, 2024). It works best when servers have similar computing capabilities and storage capacities (What Is Round Robin Load Balancing? Definition & FAQs | VMware, 2024), since it assumes all servers receive about the same amount of traffic.
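The cyclical rotation can be sketched with a simple iterator; the server names here are placeholders, not part of any real deployment.

```python
import itertools

# Round-robin sketch: each request is handed to the next server in a
# fixed rotation, looping back to the first after the last.
servers = ["server-a", "server-b", "server-c"]
rotation = itertools.cycle(servers)

def next_server():
    return next(rotation)
```

Four consecutive calls return server-a, server-b, server-c, and then server-a again, illustrating the cycle.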
Least connections load balancing is a technique that distributes requests to the server having the least number of active connections (Sid, 2024). However, this technique may lose efficiency in scenarios where servers have differing capacities.
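The selection rule reduces to picking the minimum of a connection count table; the counts below are illustrative assumptions.

```python
# Least-connections sketch: route each new request to the server that
# currently holds the fewest active connections.
def pick_server(active_connections):
    return min(active_connections, key=active_connections.get)

counts = {"server-a": 12, "server-b": 3, "server-c": 7}
```

With these counts, `pick_server(counts)` selects server-b. Note the sketch only reads the counts; a real balancer would also increment and decrement them as connections open and close.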
Geolocation routing directs requests to the nearest server based on the location of the client (HAProxy Technologies, 2025). This technique helps reduce response times and can be effective for content localization.
After analyzing these load balancing techniques, geolocation routing is best aligned with the goal of efficiently distributing network traffic across a global DDBMS. Geolocation routing considers the locations from which requests originate, improving the responsiveness of the DDBMS. For example, the database network would route each user's requests to the nearest of the datacenters located in the US, Japan, and Argentina.
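The routing decision above can be sketched as a region-to-datacenter lookup. The region names, datacenter labels, and fallback default are assumptions for illustration, mirroring the three sites named in the text.

```python
# Geolocation routing sketch: map a client's region to the nearest of
# the three datacenters, falling back to a default site when the
# region is unrecognized.
DATACENTERS = {
    "north_america": "us-datacenter",
    "asia": "japan-datacenter",
    "south_america": "argentina-datacenter",
}

def route_request(client_region, default="us-datacenter"):
    return DATACENTERS.get(client_region, default)
```

A request tagged "asia" lands on the Japan datacenter, while an unmapped region such as "europe" falls back to the US site; in practice the client's region would come from a GeoIP lookup rather than a literal string.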
To ensure that an enterprise-class distributed database management system can adapt to varying workloads and changing demands, effective scaling strategies are essential. Cloud-based databases are becoming more common and offer new ways to enhance scalability. Auto-scaling, AI-driven database scaling, and distributed cloud databases are all emerging approaches used to achieve this.
Serverless databases for auto-scaling
Serverless databases delegate server management tasks to a service provider (GeeksforGeeks, 2024). They can allocate and deallocate resources automatically, in contrast to manual resource allocation (Hellerstein et al., 2019). With auto-scaling, resources are configured to scale automatically in response to an event or threshold that closely correlates with degraded performance, as defined by the organization (Datadog, 2022). For example, a system that manages performance reviews for a business experiences high traffic and usage during the period when reviews are due; to handle the increased demand, the system can auto-scale to allocate more servers during this peak period. However, auto-scaling is generally not recommended in the early stages of a database implementation that is still growing.
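A threshold-based scaling decision of this kind can be sketched as a small policy function. The utilization thresholds and server bounds below are illustrative stand-ins for the organization-defined values the text mentions.

```python
# Auto-scaling sketch: adjust the server count when average CPU
# utilization crosses organization-defined thresholds, while staying
# within a configured minimum and maximum fleet size.
def desired_servers(current, cpu_utilization,
                    scale_up_at=0.75, scale_down_at=0.25,
                    minimum=2, maximum=20):
    if cpu_utilization >= scale_up_at and current < maximum:
        return current + 1   # add capacity during peak demand
    if cpu_utilization <= scale_down_at and current > minimum:
        return current - 1   # release idle capacity to save cost
    return current           # within the healthy band: no change
```

During the review-deadline peak in the example above, sustained high utilization would step the fleet up one server at a time; once traffic subsides, low utilization steps it back down toward the minimum.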
AI-Driven database scaling
AI-driven scaling features allow auto-scaling to be optimized for cost, for performance, or for a balance of both, as defined by the current needs of the organization. Amazon Redshift Serverless, a cloud data warehousing solution, uses AI techniques to learn patterns such as query complexity and data volume and to scale resources up or down accordingly (Stormacq, 2023). Examples of AI-driven database scaling implementations are limited, since this is a newer and still-developing strategy, so it may not be the best approach for our DDBMS.
Distributed databases
Using distributed databases is an effective strategy for handling large amounts of data and user activity. Our DDBMS will need to store personnel data for over 2,000 employees and be capable of managing increased data volumes as necessary. Compared to a centralized database, a distributed database offers advantages such as increased scalability, speed, and overall higher performance (Aswal, 2020).
Oracle and SQL Server can both be used to implement distributed databases. Oracle DBMS is suitable for large enterprises and offers high scalability with architectures such as Oracle Real Application Clusters. Although more costly than SQL Server, it simplifies budget management by charging only for data currently being used or stored. SQL Server, however, holds an advantage over Oracle with its faster query execution times (Ilić, 2021). SQL Server is also simpler to implement, owing to its use of the SQL language and a simpler security model, whereas Oracle uses multilayer security. Both are compatible with the distributed database prototype running on enterprise Linux server virtual machines.
Case Study: How Netflix Scales Its Global Streaming Service
Overview of the scaling challenge
Scaling resources can be a challenge for database systems because it requires strategies that can evolve with changing business needs. Early in Netflix's growth, it became apparent that the existing vertical scaling strategy (Netflix's monolith) was not reliable given the company's rapid expansion. This realization motivated the distribution of datacenters across different regions, yielding a more reliable database system that could route users to alternative regions in the event of a failure (Scalability, 2017).
SQL for User Data & Viewing History: To handle its data storage needs, Netflix utilizes both MySQL and NoSQL databases. Using two different databases allows Netflix to benefit from the distinct strengths of each type. MySQL excels at supporting transactional data, making it ideal for supporting business operations. Meanwhile, NoSQL is well-suited for systems that require high scalability and high-volume data processing, like the user viewing history feature in the Netflix interface (Saxena, 2023).
Caching: For high-speed data access, Netflix implements EVCache, discussed in the caching section above. EVCache is a Memcached-based solution designed to store frequently requested data in RAM, allowing read and write operations with low latency. RAM is significantly faster at retrieving data than the primary database, enhancing the overall performance of the system (Saxena, 2023).
Load Balancing: To manage network traffic among servers, Netflix uses a two-tier load balancing technique. In the first tier, a round-robin method routes network traffic across multiple zones. The second tier then distributes traffic within those zones. For network security, the Zuul edge gateway service oversees the platform's inbound and outbound traffic, giving Netflix centralized control over interactions with its clients (Saxena, 2023).
Cloud Infrastructure: To support its cloud infrastructure, Netflix employs a variety of Amazon Web Services offerings, including AWS Route 53 and Amazon S3. AWS Route 53 is used for DNS and global routing, while Amazon S3 provides the large-scale data storage essential for an enterprise of Netflix's size. These, along with other AWS services, contribute to a highly reliable and scalable system (Saxena, 2023).
In conclusion, Netflix's ability to scale a global system of its size rests on methods and best practices that have evolved with the company's growth. Understanding these strategies is important for designing our distributed DBMS because they provide real-world examples of addressing scalability, performance, and reliability. Considering these aspects will be important for us as a business to ensure the delivery of an efficient DBMS and a successful enterprise.
References

Aswal, S. (2023). Distributed database systems for large-scale data management. https://doi.org/10.52783/tojqi.v11i4.10020

Datadog. (2022, September 27). What is auto-scaling? How it works & use cases. Datadog. https://www.datadoghq.com/knowledge-center/auto-scaling/

Emerich, A. (n.d.). Database caching: Overview, types, strategies, and their benefits. Prisma's Data Guide. https://www.prisma.io/dataguide/managing-databases/introduction-database-caching

GeeksforGeeks. (2024, June 15). What is a serverless database? GeeksforGeeks. https://www.geeksforgeeks.org/what-is-a-serverless-database/

HAProxy Technologies. (2025). Global server load balancing. HAProxy.com. https://www.haproxy.com/documentation/haproxy-aloha/load-balancing/global-server-load-balancing/

Hellerstein, J. M., Faleiro, J., Gonzalez, J. E., Schleier-Smith, J., Sreekanti, V., Tumanov, A., & Wu, C. (2019). Serverless computing: One step forward, two steps back. UC Berkeley. https://bibbase.org/network/publication/hellerstein-faleiro-gonzalez-schleiersmith-sreekanti-tumanov-wu-serverlesscomputingonestepforwardtwostepsback-2019

Ilić, M., Kopanja, L., Zlatković, D., Trajković, M., & Ćurguz, D. (2021, June). Microsoft SQL Server and Oracle: Comparative performance analysis. In The 7th International Conference on Knowledge Management and Informatics (pp. 33–40).

Kaur, S., Kumar, K., Singh, J., & Ghumman, N. S. (2015). Round-robin-based load balancing in software-defined networking. 2015 2nd International Conference on Computing for Sustainable Global Development (INDIACom), 2136–2139. https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7100616&isnumber=7100186

Memcached. (n.d.). Memcached - A distributed memory object caching system. https://memcached.org/

Mohan, C., Ooi, B. C., & Vossen, G. (2019). Distributed computing with permissioned blockchains and databases (Dagstuhl Seminar 19261). Dagstuhl Reports, 9(6), 69–94. https://doi.org/10.4230/dagrep.9.6.69

Netflix Technology Blog. (2023, March 6). Scaling media machine learning at Netflix. Medium. https://netflixtechblog.com/scaling-media-machine-learning-at-netflix-f19b400243

Rangarajan, S., & Prudhviraj, K. (2024, October 11). Building a global caching system at Netflix: A deep dive into global replication. InfoQ. https://www.infoq.com/articles/netflix-global-cache/

Saxena, S. (2023, June 21). System design of Netflix - Sanket Saxena. Medium. https://saxenasanket.medium.com/system-design-of-netflix-part-1-4d65642ed738

Scalability, H. (2017, December 11). Netflix: What happens when you press play? High Scalability. https://highscalability.com/netflix-what-happens-when-you-press-play/

Sid. (2024, August 3). How load balancers distribute traffic? Medium. https://medium.com/@sidharth_m/how-load-balancers-distribute-traffic-d904cd11d321

Singh, D., & Reddy, C. K. (2015). A survey on platforms for big data analytics. Journal of Big Data, 2(1).

Stormacq, S. (2023, November 29). Amazon Redshift adds new AI capabilities, including Amazon Q, to boost efficiency and productivity. Amazon Web Services. https://aws.amazon.com/blogs/aws/amazon-redshift-adds-new-ai-capabilities-to-boost-efficiency-and-productivity/