Explore the challenges of integrating Apache Spark applications with NoSQL databases in this conference talk from Open Source 101 2023. Discover how Capital One, the first US bank to fully migrate to the cloud, developed a custom transaction-processing application using open-source technologies such as Apache Spark, MongoDB, and Apache Cassandra. Learn about the issues encountered when processing data from MongoDB and Cassandra backends to serve millions of customers daily: the importance of Cassandra key sequences and data modeling, effective use of Cassandra batching with Spark partitions, managing MongoDB connections at the JVM level, and the implications of using the MongoSpark connector. Gain insight into real-world problems faced during application development and deployment, and into strategies for mitigating them in applications built on Spark, MongoDB, and Cassandra.
Challenges of Spark Application Coexisting with NoSQL Databases