Central Delta Lake
Data Ingestion (on-premises and external data)
Core Data Processing
Data Mesh Core Principles
Limitations to Data Mesh Implementation
Data Product Teams
Hybrid Distributed Data Mesh
Scalable, collaborative ETL
Terraform Managed Databricks Workspaces
End to End Data Pipeline
ETL Core Code
DLT Pipeline Flow View
Data Quality Overview
Data Quality Details
Data Quality: File Arrival Checks Workflow
Description:
Explore a 33-minute conference talk detailing HSBC's journey in constructing a petabyte-scale cybersecurity data mesh using Azure and Delta Lake. Discover how HSBC tackled unique cybersecurity challenges, including high data volumes and strict regulatory requirements, to transform disparate data silos into a unified, scalable platform. Learn about the innovative infrastructure and architecture employed, from landing zone concepts and secure access workstations to data lake structures and isolated data ingestion. Gain insights into the hybrid data mesh leveraging Delta Lake, resulting in a flexible, secure, self-service environment that empowers cyber analysts. Delve into topics such as Azure Landing Zone overview, data ingestion processes, core data processing, and the implementation of data mesh principles. Understand the limitations faced and solutions devised, including the creation of data product teams and the use of Terraform-managed Databricks workspaces. Examine the end-to-end data pipeline, ETL core code, and data quality measures implemented to ensure robust and reliable cybersecurity analytics.
Building a Petabyte-Scale Cybersecurity Data Mesh in Azure with Delta Lake - HSBC Case Study