Contents:
1. Intro
2. Informatica ETL Pipeline
3. Dealing with buggy pipelines
4. Data Preview - Feature Requirements
5. What spark-submit-based data preview achieved
6. Execution Profiling Results - Spark-submit
7. Compare Spark-submit with Spark Job Server
8. Spark-submit-based Architecture
9. SJS-based Architecture
10. Execution Flow
11. Spark Job Server vs Spark-submit
12. Setup Details
13. Getting started
14. Environment Variables (local.sh.template)
15. Application Code Migration
16. WordCount Example
17. Running Jobs
18. Handling Job Dependencies
19. Multiple Spark Job Servers
20. Concurrency
21. Support for Kerberos
22. HTTPS/SSL-Enabled Server
23. Logging
24. Key Takeaways
25. Timeouts (in local.conf.template)
26. Complex Data Representation in Informatica Developer Tool
27. Monitoring: Binaries
28. Monitoring: Spark Context
29. Monitoring: Jobs
30. Monitoring: YARN Job
Description:
Explore a 32-minute conference talk from Databricks on leveraging Spark-Jobserver to enhance data integration pipeline execution. Learn how Informatica utilizes Spark-Jobserver's capabilities to solve data visualization challenges for hierarchical data in Big Data pipelines. Discover the benefits of Spark context reuse for faster task execution, integration techniques using REST APIs, and strategies for managing parallel job execution and monitoring. Gain insights into configuring Spark-Jobserver with YARN cluster mode, handling secure SSL-enabled clusters, and managing multiple Spark-Jobserver instances. Delve into topics such as concurrent job execution, dependency resolution, and the journey of adopting Spark-Jobserver in a data integration product.
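The "Spark context reuse" the description highlights comes from Spark Job Server owning a long-lived SparkContext and passing it into each job, so repeated runs skip context startup cost. Instead of a `main()` launched through spark-submit, a job is a class implementing the server's `SparkJob` trait. A minimal sketch in the spirit of the WordCount example from the slides (object name and config key here are illustrative, following the pattern of the spark-jobserver project's sample job):

```scala
import com.typesafe.config.Config
import org.apache.spark.SparkContext
import spark.jobserver.{SparkJob, SparkJobInvalid, SparkJobValid, SparkJobValidation}

// The server constructs and manages the SparkContext; the job only
// receives it, which is what enables context reuse across submissions.
object WordCountJob extends SparkJob {

  // Called before runJob: reject the request early if required
  // configuration is missing, instead of failing mid-execution.
  override def validate(sc: SparkContext, config: Config): SparkJobValidation =
    if (config.hasPath("input.string")) SparkJobValid
    else SparkJobInvalid("Missing config parameter input.string")

  // The job body: count word occurrences in the supplied string.
  // The return value is serialized back to the REST caller.
  override def runJob(sc: SparkContext, config: Config): Any =
    sc.parallelize(config.getString("input.string").split(" ").toSeq)
      .countByValue()
}
```

Jobs like this are packaged into a jar, uploaded to the server over its REST API, and then invoked by app name and class path against a named, pre-created context.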

Faster Data Integration Pipeline Execution Using Spark-Jobserver

Databricks