Performance evaluation of various deployment scenarios of the 3-replicated Cassandra NoSQL cluster on AWS

Anatoliy Gorbenko, Andrii Karpenko, Olga Tarasyuk

Abstract


Distributed replicated NoSQL data stores such as Cassandra, HBase and MongoDB have been proposed to effectively manage Big Data sets whose volume, velocity and variability are difficult to deal with using traditional Relational Database Management Systems. Tradeoffs between consistency, availability, partition tolerance and latency are intrinsic to such systems. Although the relations between these properties have been previously identified by the well-known CAP and PACELC theorems in qualitative terms, it is still necessary to quantify how different consistency settings, deployment patterns and other properties affect system performance. This experience report analyses the performance of the Cassandra NoSQL database cluster and studies the tradeoff between data consistency guarantees and performance in distributed data stores. The primary focus is on investigating the quantitative interplay between Cassandra response time, throughput and its consistency settings across different single- and multi-region deployment scenarios. The study uses the YCSB benchmarking framework and reports the results of read and write performance tests of a three-replicated Cassandra cluster deployed in Amazon AWS. We also put forward a notation that can be used to formally describe the distributed deployment of a Cassandra cluster and its nodes relative to each other and to a client application. We present quantitative results showing how different consistency settings and deployment patterns affect Cassandra performance under different workloads. In particular, our experiments show that strong consistency costs up to 22 % of performance in the case of a centralized Cassandra cluster deployment and can cause a 600 % increase in the read/write requests if Cassandra replicas and their clients are globally distributed across different AWS Regions.
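
To make the notion of "strong consistency" in a three-replicated cluster concrete, the following minimal sketch applies the general quorum-overlap rule (a read is guaranteed to see the latest write when the replicas read plus the replicas written exceed the replication factor). The function names are hypothetical, only the ONE, QUORUM and ALL levels are modelled, and the snippet is not taken from the paper; the exact combinations benchmarked by the authors may differ.

```python
# Illustrative only: quorum-overlap rule R + W > N for a 3-replica cluster.
# Function names are hypothetical; this is not the authors' test harness.

REPLICATION_FACTOR = 3  # the cluster in the study keeps three replicas


def replicas_required(level: str, rf: int = REPLICATION_FACTOR) -> int:
    """Replicas that must acknowledge a request at the given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return rf // 2 + 1  # majority of replicas
    if level == "ALL":
        return rf
    raise ValueError(f"unsupported consistency level: {level}")


def is_strongly_consistent(read_level: str, write_level: str,
                           rf: int = REPLICATION_FACTOR) -> bool:
    """True when every read set must overlap every write set (R + W > N)."""
    return replicas_required(read_level, rf) + replicas_required(write_level, rf) > rf


LEVELS = ("ONE", "QUORUM", "ALL")
strong_pairs = [(r, w) for r in LEVELS for w in LEVELS
                if is_strongly_consistent(r, w)]

if __name__ == "__main__":
    for read_level, write_level in strong_pairs:
        print(f"read={read_level:<6} write={write_level:<6} -> strong")
```

Under this rule, weaker pairs such as read ONE with write ONE wait for fewer replicas, which is the source of the performance gap that the study quantifies.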

Keywords


Cassandra; NoSQL; distributed databases; replication; performance benchmarking; YCSB; data consistency; throughput; latency; deployment scenarios; Amazon AWS




DOI: https://doi.org/10.32620/reks.2021.4.13
