Chapters:
1. Intro
2. Aim & Intuition behind variant calling
3. What is GATK?
4. Somatic vs Germline variants
5. GATK best practice workflow steps
6. Data pre-processing steps - alignment
7. A note on Read Groups
8. Data pre-processing steps - mark duplicate reads
9. Data pre-processing steps - Base Quality Score Recalibration
10. Variant discovery
11. Data used for demonstration
12. System requirements
13. Setting up directories
14. Download data
15. Download reference FASTA and known sites; create supporting files (.fai, .dict)
16. Setting directory paths
17. Step 1: Perform QC - FastQC
18. Step 2: Align reads - BWA-MEM
19. Step 3: Mark Duplicate Reads - GATK MarkDuplicatesSpark
20. Step 4: Base Quality Score Recalibration - GATK BaseRecalibrator + ApplyBQSR
21. Step 5: Post-alignment QC - GATK CollectAlignmentSummaryMetrics and CollectInsertSizeMetrics
22. Create MultiQC report of post-alignment metrics
23. Step 6: Call variants - GATK HaplotypeCaller
Description:
Dive into a comprehensive tutorial on variant calling from whole genome sequencing (WGS) data using the GATK best practice workflow. Learn how to set up a pipeline in bash (Linux) that pre-processes and aligns reads and then calls variants, producing a VCF file. Follow step-by-step instructions for quality control with FastQC, alignment with BWA-MEM, marking duplicate reads, Base Quality Score Recalibration (BQSR), and variant calling with HaplotypeCaller. Gain insight into the intuition behind each step, expected runtimes, and memory requirements. The provided code, data sources, and additional resources will deepen your understanding of the SAM file format, SAM flags, and the VCF file format. Perfect for bioinformaticians and researchers looking to master variant calling in genomic analysis.
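Since SAM flags come up when inspecting aligned and duplicate-marked reads, here is a small illustration in Python. It is not part of the video's bash pipeline. The sketch decodes the bitwise FLAG field of a SAM record. The bit values follow the SAM format specification, the descriptions are paraphrased, and the helper names `decode_flag` and `is_duplicate` are hypothetical, chosen only for this example. Bit 0x400 is how duplicate-marking tools such as MarkDuplicatesSpark flag duplicates; by default they mark these reads rather than delete them.

```python
from typing import List

# SAM FLAG bits from the SAM format specification (descriptions paraphrased).
SAM_FLAGS = {
    0x1: "read paired",
    0x2: "read mapped in proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "not primary alignment",
    0x200: "read fails platform/vendor quality checks",
    0x400: "read is PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def decode_flag(flag: int) -> List[str]:
    """Return the description of every bit set in a SAM FLAG value (hypothetical helper)."""
    if flag < 0 or flag >= 1 << 12:
        raise ValueError(f"SAM FLAG must be in [0, 4095], got {flag}")
    # Dicts keep insertion order, so results are listed from lowest to highest bit.
    return [desc for bit, desc in SAM_FLAGS.items() if flag & bit]


def is_duplicate(flag: int) -> bool:
    """True if bit 0x400 (duplicate) is set (hypothetical helper)."""
    return bool(flag & 0x400)


if __name__ == "__main__":
    # 99 and 147 are a typical properly paired read and its mate;
    # 1123 is 99 with the duplicate bit (1024) added.
    for f in (99, 147, 1123):
        print(f, decode_flag(f), "duplicate" if is_duplicate(f) else "")
```

For example, `decode_flag(99)` lists a paired, properly mapped first-in-pair read whose mate is on the reverse strand.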

WGS Variant Calling - Variant Calling with GATK - Part 1 - Detailed NGS Analysis Workflow

Bioinformagician