layout | title | permalink |
page | Day 3 | /day3/ |
A large fraction of biological science these days involves the analysis of genomic data.
There are many de facto standards for representing genomic data, including but not limited to the FASTA, FASTQ, SAM/BAM, VCF, GTF, and BED formats. The ability to read and manipulate data files stored in these formats is a key skill for genome scientists.
Effective visualization of genomic data and analysis results is essential for communicating clearly in a research community that handles large amounts of genomic data.
Today, we will learn about the widely used genomic data formats and practice reading and manipulating data in these formats. We will also learn the basics of data visualization through practice on real data.
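As a small preview of the kind of file manipulation practiced today, here is a minimal sketch of reading FASTQ records in plain Python. It assumes the common layout of exactly four lines per record (no wrapped sequences) and Phred+33 quality encoding; the function name `read_fastq` and the two example reads are made up for illustration and are not part of the workshop materials.

```python
import io
from itertools import islice


def read_fastq(handle):
    """Yield (name, sequence, quality) tuples from a FASTQ stream.

    Assumes exactly four lines per record (hypothetical helper for illustration).
    """
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if not record:
            return  # end of input
        if len(record) < 4 or not record[0].startswith("@") or not record[2].startswith("+"):
            raise ValueError("malformed FASTQ record")
        name, seq, _, qual = record
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        yield name[1:], seq, qual


# Two made-up reads, held in memory so the example needs no input file.
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCCA\n+\nIIII#\n")
records = list(read_fastq(example))

# Mean base quality per read, assuming Phred+33 encoding (ASCII code minus 33).
mean_quals = {
    name: sum(ord(c) - 33 for c in qual) / len(qual)
    for name, seq, qual in records
}
print(mean_quals)
```

For real data, the same generator could be given an open file handle instead of the in-memory `io.StringIO` object.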
Session | Time | Topics |
I | 9:00-10:15 AM | Mini-Practice: FASTQ File Manipulation |
 | 10:15-10:30 AM | Coffee Break |
II | 10:30-11:15 AM | Lecture: Data Formats and Conversions |
III | 11:15 AM-12:00 PM | Mini-Practice: Select a subset of variant/genotype calls |
 | 12:00-1:00 PM | Lunch |
IV | 1:00-2:15 PM | Practice: Analysis with Genomic Data Formats |
 | 2:15-2:30 PM | Coffee Break |
V | 2:30-4:00 PM | Visualization: Overview and Practice |
Hyun Min Kang (HMK)
Jacob Kitzman (JK)
---- Coffee Break [15 mins] ---
---- Lunch Break [1 hr] ---
- Understanding how to access FASTA files
- Accessing aligned sequence reads in SAM/BAM format
- Representing genes and transcripts using GTF and genePred format
- Working with BED files
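One detail worth keeping in mind when working with BED files is that intervals are 0-based and half-open: the start coordinate is included and the end coordinate is not. The sketch below illustrates this with made-up intervals; the helper names `parse_bed_line` and `overlaps` are hypothetical and only the first three BED columns are used.

```python
def parse_bed_line(line):
    """Return (chrom, start, end) from the first three tab-separated BED columns."""
    fields = line.rstrip("\n").split("\t")
    return fields[0], int(fields[1]), int(fields[2])


def overlaps(a, b):
    """True if two half-open intervals on the same chromosome share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


# Made-up example intervals (not from the workshop data).
first = parse_bed_line("chr1\t100\t200\tfeatureA")
second = parse_bed_line("chr1\t200\t300\tfeatureB")
third = parse_bed_line("chr1\t150\t250\tfeatureC")

# With half-open coordinates, the length is simply end minus start.
length = first[2] - first[1]

print(length)
print(overlaps(first, second))  # adjacent intervals do not overlap
print(overlaps(first, third))
```

Because the end coordinate is excluded, two intervals that merely touch (one ending at 200, the next starting at 200) share no bases.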
---- Coffee Break [15 mins] ---
- Data visualization for exploratory analysis
- Useful plotting links
- IPython notebook - basic plotting
- IPython notebook - read counts, X vs. autosomes
---- End/Wrap-Up ---