1. Intro
2. Hunting for data
3. Inspecting the VCF
4. Finding population labels for the samples
5. Parsing VCF with pysam
6. Going from alleles to numbers for a numpy array
7. When to work in colab versus python script
8. Saving data with pandas
9. Adding population labels from the panel file
10. To Colab!
11. PCA
12. First plot! Mission accomplished
13. Using Altair for plotting with labels
14. Second plot with population labels!
15. Merging with the igsr_population.tsv data
16. TSNE
17. Exercise: PCA on the SNPs
18. Conclusion and origin story for this project
Description:
Embark on a comprehensive bioinformatics project walkthrough that explores the relationship between genes and geography through population genotype data analysis. Learn to run Principal Component Analysis (PCA) and t-Distributed Stochastic Neighbor Embedding (t-SNE) on genetic data from the 1000 Genomes Project. Follow step-by-step instructions to download and parse VCF files using pysam, create numpy arrays, and utilize pandas for data manipulation. Transition between Python scripts and Google Colab environments while mastering visualization techniques with both matplotlib and Altair. Gain insights into population genetics by coloring data points based on ancestry labels and merging additional population information. Conclude with an exercise on performing PCA on SNPs and discover the origin story behind this illuminating project.
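
As a rough illustration of the "alleles to numbers" and PCA steps mentioned above, here is a minimal sketch. It is not the code from the video: the genotype strings are made-up toy data rather than 1000 Genomes records, the helper names `genotype_to_count` and `pca` are hypothetical, missing alleles (`.`) are assumed to count as zero, and PCA is done with a plain numpy SVD instead of any particular library the walkthrough may use.

```python
import numpy as np


def genotype_to_count(gt):
    """Turn a VCF-style genotype string such as '0|1' into an alternate-allele count.

    Assumption: missing alleles ('.') are counted as zero.
    """
    alleles = gt.replace("/", "|").split("|")
    return sum(1 for allele in alleles if allele not in ("0", "."))


def pca(matrix, n_components=2):
    """Project rows of `matrix` onto the top principal components via SVD."""
    centered = matrix - matrix.mean(axis=0)  # center each variant (column)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


# Toy data: 4 samples (rows) x 4 variants (columns), invented for illustration.
genotypes = [
    ["0|0", "0|1", "1|1", "0|0"],
    ["0|0", "1|1", "1|1", "0|1"],
    ["1|1", "0|0", "0|0", "1|1"],
    ["1|0", "0|0", "0|1", "1|1"],
]

counts = np.array(
    [[genotype_to_count(gt) for gt in row] for row in genotypes], dtype=float
)
coords = pca(counts)  # one (PC1, PC2) pair per sample
print(coords.shape)
```

With real data, each row of `coords` could then be plotted and colored by the sample's population label.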

Genes and Geography - A Bioinformatics Project

OMGenomics