Performance Optimization in Distributed Database Systems: Caching, Load Balancing, and Scaling Strategies
Julissa Paramo
Performance Optimization Techniques  

To support an enterprise-class distributed database management system (DDBMS), performance optimization techniques must be implemented. Fast data retrieval and efficient traffic distribution are key characteristics of a high-performing, scalable system, and techniques such as caching and load balancing help meet these requirements.

Caching for faster data retrieval

Database caching is a technique that increases data retrieval speed, improving overall database performance. A cache is a high-speed data storage layer that stores a subset of data that is frequently requested or of high value (Emerich, n.d.). Retrieving data from the cache is faster than repeatedly accessing the primary database, which also reduces the primary database's workload (Rangarajan & Prudhviraj, 2024).

Modern enterprises like Netflix use caching to ensure data availability on a global scale, specifically through distributed caching, which stores data across multiple servers to ensure quick access and high availability (Rangarajan & Prudhviraj, 2024). Netflix enhances system reliability, minimizes latency, and delivers a scalable system through EVCache, its caching solution built on the popular Memcached. Memcached stores data in RAM, offering fast access times while maintaining horizontal scalability, since more servers can be added as needed (Memcached, n.d.).

Based on this analysis of caching strategies, Memcached will best support the global DDBMS in improving data retrieval speeds. A Memcached caching layer will be deployed within each of the three datacenters in the US, Japan, and Argentina, primarily to speed up data retrieval.
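To illustrate how such a cache layer could sit in front of the primary database, the following is a minimal cache-aside sketch using the pymemcache client library (an assumption; any Memcached client would work similarly). The host address, key naming, and the query_employee_record helper are hypothetical placeholders, not part of the actual deployment.

```python
# Minimal cache-aside sketch for one datacenter's Memcached node.
# Assumes a Memcached instance at localhost:11211 and a hypothetical
# query_employee_record() helper that reads from the primary database.
import json
from pymemcache.client.base import Client

cache = Client(("localhost", 11211))
CACHE_TTL_SECONDS = 300  # keep cached records for five minutes (assumed value)


def query_employee_record(employee_id: int) -> dict:
    # Placeholder for the real primary-database query.
    return {"id": employee_id, "name": "example"}


def get_employee_record(employee_id: int) -> dict:
    key = f"employee:{employee_id}"
    cached = cache.get(key)
    if cached is not None:
        # Cache hit: skip the primary database entirely.
        return json.loads(cached)
    # Cache miss: read from the primary database, then populate the cache.
    record = query_employee_record(employee_id)
    cache.set(key, json.dumps(record), expire=CACHE_TTL_SECONDS)
    return record
```

Repeated reads of the same employee record would then be served from RAM instead of the primary database, which is the behavior described above.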

Load balancing for efficient traffic distribution

Load balancing efficiently distributes incoming network traffic among servers so that no single server becomes overwhelmed (Sid, 2024), contributing to the system's overall performance. Load balancing relies on load balancers, hardware or software components that follow specific algorithms to distribute workloads evenly across servers. Networks in a global distributed database system must handle large amounts of traffic (Kaur et al., 2015). Therefore, implementing load balancing techniques such as round-robin, least connections, and geolocation routing can improve database network traffic and optimize performance.

Round-robin load balancing distributes requests among servers in a cyclical manner (Sid, 2024). It works best when servers have similar computing capabilities and storage capacities (What Is Round Robin Load Balancing? Definition & FAQs | VMware, 2024), because it assumes every server can handle roughly the same share of the traffic.
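As a simple illustration of this cyclical distribution, the sketch below rotates through a fixed pool of servers; the server names are hypothetical.

```python
# Round-robin selection: each incoming request is handed to the next
# server in a fixed, repeating order.
from itertools import cycle

servers = ["db-us-1", "db-us-2", "db-us-3"]  # hypothetical pool
rotation = cycle(servers)


def pick_server() -> str:
    return next(rotation)


# Six requests cycle through the pool twice: db-us-1, db-us-2, db-us-3, ...
assignments = [pick_server() for _ in range(6)]
print(assignments)
```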

Least-connections load balancing distributes each request to the server with the fewest active connections (Sid, 2024). However, this technique can lose efficiency when servers have differing capacities.
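The least-connections rule can be sketched the same way: the balancer tracks active connections per server and always picks the smallest count. The connection counts shown are hypothetical.

```python
# Least-connections selection: route each new request to the server
# currently serving the fewest active connections.
active_connections = {"db-us-1": 12, "db-us-2": 4, "db-us-3": 9}  # hypothetical counts


def pick_least_loaded() -> str:
    return min(active_connections, key=active_connections.get)


server = pick_least_loaded()     # "db-us-2" in this example
active_connections[server] += 1  # the chosen server gains a connection
```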

Geolocation routing directs requests to the nearest server based on the client's location (HAProxy Technologies, 2025). This technique helps reduce response times and can be effective for content localization.

Of the load balancing techniques analyzed, geolocation routing is the best fit for efficiently distributing network traffic in a global DDBMS. With geolocation routing, the location a request originates from determines which datacenter serves it, improving the DDBMS's response times. For example, the database network would route users' requests based on their proximity to the datacenters located in the US, Japan, and Argentina.
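A minimal sketch of that proximity rule is shown below. The region-to-datacenter mapping and the fallback choice are assumptions for illustration; a production deployment would more likely rely on a DNS-based geolocation service than on application code.

```python
# Geolocation routing sketch: map a client's region to the nearest of the
# three datacenters (US, Japan, Argentina). The mapping is illustrative only.
NEAREST_DATACENTER = {
    "north-america": "us-datacenter",
    "asia-pacific": "japan-datacenter",
    "south-america": "argentina-datacenter",
}
DEFAULT_DATACENTER = "us-datacenter"  # assumed fallback for unmapped regions


def route_request(client_region: str) -> str:
    return NEAREST_DATACENTER.get(client_region, DEFAULT_DATACENTER)


print(route_request("asia-pacific"))  # japan-datacenter
print(route_request("europe"))        # us-datacenter (fallback)
```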

Cloud-Native Scaling Strategies and Emerging Trends  

To ensure that an enterprise-class distributed database management system can adapt to varying workloads and changing demands, effective scaling strategies are essential. Cloud-based databases are becoming increasingly common and continue to evolve toward greater scalability. Auto-scaling, AI-driven database scaling, and distributed cloud databases are emerging approaches for achieving this.

Serverless databases for auto-scaling

Serverless databases delegate server management tasks to a service provider (GeeksforGeeks, 2024). They can allocate and deallocate resources automatically, rather than requiring manual provisioning (Hellerstein et al., 2019). With auto-scaling, resources are configured to scale automatically in response to an event or threshold that closely correlates with degraded performance, as defined by the organization (Datadog, 2022). However, auto-scaling is generally not recommended in the early stages of a database implementation that is still growing. For example, a system that manages performance reviews for a business experiences high traffic during the period when reviews are due; to handle the increased demand, the system uses auto-scaling to allocate more servers during this peak period.
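The threshold-driven behavior described above can be sketched as a simple scaling policy: when a monitored metric crosses a limit defined by the organization, capacity is added, and when load falls, capacity is released. The metric, thresholds, and replica bounds below are hypothetical, not values from any particular provider.

```python
# Threshold-based auto-scaling sketch: adjust the replica count when the
# monitored metric (here, CPU utilization) crosses organization-defined limits.
SCALE_UP_THRESHOLD = 0.75    # add capacity above 75% utilization (assumed)
SCALE_DOWN_THRESHOLD = 0.30  # release capacity below 30% utilization (assumed)
MIN_REPLICAS, MAX_REPLICAS = 2, 10


def desired_replicas(current_replicas: int, cpu_utilization: float) -> int:
    if cpu_utilization > SCALE_UP_THRESHOLD and current_replicas < MAX_REPLICAS:
        return current_replicas + 1
    if cpu_utilization < SCALE_DOWN_THRESHOLD and current_replicas > MIN_REPLICAS:
        return current_replicas - 1
    return current_replicas


# During the peak review period utilization spikes, so the replica count grows.
print(desired_replicas(current_replicas=2, cpu_utilization=0.92))  # 3
```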

AI-driven database scaling

AI-driven scaling features allow a system to auto-scale for cost, for performance, or for a balance of both, depending on the organization's current needs. Amazon Redshift Serverless, a cloud data warehousing solution, uses AI techniques to learn patterns such as query complexity and data volume and to scale resources up or down automatically (Stormacq, 2023). Because this is a newer, still-developing scaling strategy, implementation examples are limited, so it may not be the best approach for our DDBMS.

Distributed databases

Using distributed databases is an effective strategy for handling large amounts of data and user activity. Our DDBMS will need to store data for over 2,000 employees and be capable of managing increased data volumes as necessary. Compared to a centralized database, a distributed database offers advantages such as increased scalability, speed, and overall higher performance (Aswal, 2020).

Oracle and SQL Server can both be used to implement distributed databases. Oracle DBMS is suitable for large enterprises and offers high scalability with architectures such as Oracle Real Application Clusters. Although more costly than SQL Server, Oracle simplifies budget management by charging only for the data currently being used and stored. However, SQL Server holds an advantage over Oracle with its faster query execution times (Ilić et al., 2021). SQL Server is also simpler to implement than Oracle because of the SQL dialect it uses and its simpler security model, whereas Oracle uses multilayer security. Both are compatible with the distributed database prototype running on enterprise Linux server virtual machines.

Case Study: How Netflix Scales Its Global Streaming Service  

Overview of the scaling challenge

Scaling resources can be a challenge for database systems because it requires strategies that can evolve with changing business needs. Early in Netflix's growth, it became apparent that the existing vertical scaling strategy (Netflix's monolith) was not reliable given the company's rapid growth and expansion. This realization motivated the distribution of datacenters across different regions, producing a more reliable database system that could route users to alternative regions in the event of a failure (Scalability, 2017).

SQL and NoSQL for User Data & Viewing History: To handle its data storage needs, Netflix utilizes both MySQL and NoSQL databases. Using two different types of database allows Netflix to benefit from the distinct strengths of each. MySQL excels at supporting transactional data, making it ideal for business operations. Meanwhile, NoSQL is well suited for systems that require high scalability and high-volume data processing, like the user viewing history feature in the Netflix interface (Saxena, 2023).

Caching: For high-speed data access, Netflix implements a caching solution known as EVCache, as discussed in the caching section above. EVCache is a Memcached-based solution designed to store frequently requested data in RAM, allowing read and write operations with low latency. Retrieving data from RAM is significantly faster than querying the primary database, enhancing the overall performance of the system (Saxena, 2023).

Load Balancing: To manage network traffic among servers, Netflix uses a two-tier load balancing technique. In the first tier, a round-robin method routes network traffic across multiple zones; the second tier then distributes traffic within those zones. For network security, the Zuul edge gateway service oversees the platform's inbound and outbound traffic, giving Netflix centralized control over interactions with its clients (Saxena, 2023).

Cloud Infrastructure: To support its cloud infrastructure, Netflix employs a variety of Amazon Web Services offerings, including AWS Route 53 and Amazon S3. Route 53 is used for DNS and global routing, while Amazon S3 provides large-scale data storage, essential for a large enterprise like Netflix. These, along with other AWS services, contribute to a highly reliable and scalable system (Saxena, 2023).

In conclusion, Netflix's ability to scale such a large global system rests on methods and best practices that have evolved with the company's growth. Understanding these strategies is important for designing our distributed DBMS, as they provide real-world examples of addressing scalability, performance, and reliability. Considering these aspects will be important for us as a business to ensure the delivery of an efficient DBMS and a successful enterprise.




References

Aswal, S. (2023). Distributed database systems for large-scale data management. https://doi.org/10.52783/tojqi.v11i4.10020

Netflix Technology Blog. (2023, March 6). Scaling media machine learning at Netflix. Medium. https://netflixtechblog.com/scaling-media-machine-learning-at-netflix-f19b400243

Datadog. (2022, September 27). What is auto-scaling? How it works & use cases. Datadog. https://www.datadoghq.com/knowledge-center/auto-scaling/

Emerich, A. (n.d.). Database caching: Overview, types, strategies, and their benefits. Prisma’s Data Guide. https://www.prisma.io/dataguide/managing-databases/introduction-database-caching

GeeksforGeeks. (2024, June 15). What is a serverless database? GeeksforGeeks. https://www.geeksforgeeks.org/what-is-a-serverless-database/

HAProxy Technologies. (2025). Global server load balancing. HAProxy.com. https://www.haproxy.com/documentation/haproxy-aloha/load-balancing/global-server-load-balancing/

Hellerstein, J. M., Faleiro, J., Gonzalez, J. E., Schleier-Smith, J., Sreekanti, V., Tumanov, A., & Wu, C. (2019). Serverless computing: One step forward, two steps back. UC Berkeley. https://bibbase.org/network/publication/hellerstein-faleiro-gonzalez-schleiersmith-sreekanti-tumanov-wu-serverlesscomputingonestepforwardtwostepsback-2019

Ilić, M., Kopanja, L., Zlatković, D., Trajković, M., & Ćurguz, D. (2021, June). Microsoft SQL Server and Oracle: Comparative performance analysis. In The 7th International Conference on Knowledge Management and Informatics (pp. 33–40).

Kaur, S., Kumar, K., Singh, J., & Ghumman, N. S. (2015). Round-robin-based load balancing in software-defined networking. 2015 2nd International Conference on Computing for Sustainable Global Development (INDIACom), 2136–2139. https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7100616&isnumber=7100186

Memcached. (n.d.). Memcached - A distributed memory object caching system. https://memcached.org/

Mohan, C., Ooi, B. C., & Vossen, G. (2019). Distributed computing with permissioned blockchains and databases (Dagstuhl Seminar 19261). Dagstuhl Reports, 9(6), 69–94. https://doi.org/10.4230/dagrep.9.6.69

Rangarajan, S., & Prudhviraj, K. (2024, October 11). Building a global caching system at Netflix: A deep dive into global replication. InfoQ. https://www.infoq.com/articles/netflix-global-cache/

Saxena, S. (2023, June 21). System design of Netflix - Sanket Saxena. Medium. https://saxenasanket.medium.com/system-design-of-netflix-part-1-4d65642ed738

Scalability, H. (2017, December 11). Netflix: What happens when you press play? High Scalability. https://highscalability.com/netflix-what-happens-when-you-press-play/

Sid. (2024, August 3). How load balancers distribute traffic? Medium. https://medium.com/@sidharth_m/how-load-balancers-distribute-traffic-d904cd11d321

Singh, D., & Reddy, C. K. (2015). A survey on platforms for big data analytics. Journal of Big Data, 2(1).

Stormacq, S. (2023, November 29). Amazon Redshift adds new AI capabilities, including Amazon Q, to boost efficiency and productivity. Amazon Web Services. https://aws.amazon.com/blogs/aws/amazon-redshift-adds-new-ai-capabilities-to-boost-efficiency-and-productivity/

