Intro

Aim & Intuition behind variant calling

What is GATK?

Somatic vs Germline variants

GATK Best Practices workflow steps

Data pre-processing steps - alignment

A note on Read Groups

Data pre-processing steps - mark duplicate reads

Data pre-processing steps - Base Quality Score Recalibration

Variant discovery

Data used for demonstration

System requirements

Setting up directories

Download data

Download reference FASTA and known sites; create supporting files (.fai, .dict)

Setting directory paths

Step 1: Perform QC - FastQC

Step 2: Align reads - BWA-MEM

Step 3: Mark Duplicate Reads - GATK MarkDuplicatesSpark

Step 4: Base Quality Score Recalibration - GATK BaseRecalibrator + ApplyBQSR

Step 5: Post-alignment QC - GATK CollectAlignmentSummaryMetrics and CollectInsertSizeMetrics

Create MultiQC report of post-alignment metrics

Step 6: Call variants - GATK HaplotypeCaller

Description:

Dive into a comprehensive tutorial on variant calling from whole genome sequencing (WGS) data using the GATK Best Practices workflow. Learn how to set up a bash (Linux) pipeline that pre-processes and aligns reads and ultimately generates a VCF file. Follow step-by-step instructions for quality control with FastQC, alignment with BWA-MEM, marking duplicate reads, Base Quality Score Recalibration (BQSR), post-alignment QC, and variant calling with HaplotypeCaller. Gain insight into the intuition behind each step, along with expected runtimes and memory requirements. The provided code, data sources, and additional resources on the SAM file format, SAM flags, and the VCF file format will deepen your understanding. Ideal for bioinformaticians and researchers looking to master variant calling in genomic analysis.
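The workflow described above is a chain of command-line tools, each consuming the previous step's output. As a minimal sketch, the Python script below only assembles and prints the command for each step in order. It assumes hypothetical file names (`hg38.fa`, `sample_1.fastq.gz`, `sample_2.fastq.gz`, `dbsnp.vcf`), a hypothetical sample ID, a `results` output directory, and 4 threads. It is not the code provided with the video, which builds the pipeline in bash.

```python
"""Dry-run sketch of the WGS germline pipeline steps (commands are printed, never run).

All file names, the sample ID, the output directory and the thread count
are hypothetical placeholders.
"""
from __future__ import annotations

import shlex
from pathlib import Path


def build_commands(ref: Path, r1: Path, r2: Path, known_sites: Path,
                   sample: str, outdir: Path, threads: int = 4) -> list[str]:
    """Return one shell command string per pipeline step, in execution order."""
    sam = outdir / f"{sample}.paired.sam"
    dedup = outdir / f"{sample}_sorted_dedup_reads.bam"
    table = outdir / "recal_data.table"
    recal = outdir / f"{sample}_sorted_dedup_bqsr_reads.bam"
    # Read-group line passed to bwa mem -R; bwa expands the literal "\t" escapes.
    # GATK tools need read groups, and HaplotypeCaller takes the sample name from SM.
    rg = f"@RG\\tID:{sample}\\tPL:ILLUMINA\\tSM:{sample}"

    steps = [
        # Step 1: read-level QC on the raw FASTQ files
        ["fastqc", str(r1), str(r2), "-o", str(outdir)],
        # Step 2: align paired-end reads (SAM goes to stdout, redirected below)
        ["bwa", "mem", "-t", str(threads), "-R", rg, str(ref), str(r1), str(r2)],
        # Step 3: flag duplicate reads (SAM flag 0x400)
        ["gatk", "MarkDuplicatesSpark", "-I", str(sam), "-O", str(dedup)],
        # Step 4a: build the recalibration table; known sites are treated as
        # real variation rather than sequencing errors
        ["gatk", "BaseRecalibrator", "-I", str(dedup), "-R", str(ref),
         "--known-sites", str(known_sites), "-O", str(table)],
        # Step 4b: apply the recalibration table to the base qualities
        ["gatk", "ApplyBQSR", "-I", str(dedup), "-R", str(ref),
         "--bqsr-recal-file", str(table), "-O", str(recal)],
        # Step 5: post-alignment metrics
        ["gatk", "CollectAlignmentSummaryMetrics", "-R", str(ref), "-I", str(recal),
         "-O", str(outdir / "alignment_metrics.txt")],
        ["gatk", "CollectInsertSizeMetrics", "-I", str(recal),
         "-O", str(outdir / "insert_size_metrics.txt"),
         "-H", str(outdir / "insert_size_histogram.pdf")],
        # Step 6: germline SNP and indel calling
        ["gatk", "HaplotypeCaller", "-R", str(ref), "-I", str(recal),
         "-O", str(outdir / "raw_variants.vcf")],
    ]

    commands = []
    for args in steps:
        cmd = shlex.join(args)  # quote each argument safely for a POSIX shell
        if args[:2] == ["bwa", "mem"]:
            # bwa mem writes SAM to stdout, so redirect it into the SAM file
            cmd += " > " + shlex.quote(str(sam))
        commands.append(cmd)
    return commands


def main() -> None:
    cmds = build_commands(
        ref=Path("hg38.fa"),
        r1=Path("sample_1.fastq.gz"),
        r2=Path("sample_2.fastq.gz"),
        known_sites=Path("dbsnp.vcf"),
        sample="sample",
        outdir=Path("results"),
    )
    for cmd in cmds:
        print(cmd)


if __name__ == "__main__":
    main()
```

Running the script prints eight commands, one per tool invocation. Keeping the step list as data makes the ordering and the hand-off of file names between steps easy to inspect before anything is executed; the printed lines could be pasted into a bash script once the placeholder paths match a real setup.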

WGS Variant Calling - Variant Calling with GATK - Part 1 - Detailed NGS Analysis Workflow

Bioinformagician


