Genotype IO and R

Patrick Deelen edited this page Jan 8, 2020 · 3 revisions

Reading genotype data into R is often difficult because the files are large and many file formats are not supported. Using rJava and the Genotype-IO API, any format supported by Genotype-IO can be read directly from R.

Note: for reading BGEN files we recommend: https://bitbucket.org/gavinband/bgen/wiki/rbgen

After installing rJava, first start the JVM and load the jar with dependencies. If you get a Java heap space error, increase the reserved memory by raising the Xmx (maximum heap size) and Xms (initial heap size) values.

library(rJava)
# Each JVM option must be its own element; rJava does not split a space-separated string.
.jinit(classpath="Genotype-IO-0.0.4-SNAPSHOT-jar-with-dependencies.jar", parameters=c("-Xmx2g", "-Xms2g"))
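To confirm that the heap settings took effect, you can ask the running JVM for its maximum heap size. The snippet below is a minimal sketch that only uses the standard `java.lang.Runtime` class; the variable names are illustrative.

```r
library(rJava)
.jinit()  # does nothing if the JVM is already running

# Runtime.getRuntime() is a static method; maxMemory() returns a long ("J") in bytes.
runtime <- .jcall("java/lang/Runtime", "Ljava/lang/Runtime;", "getRuntime")
maxHeapBytes <- .jcall(runtime, "J", "maxMemory")

cat(sprintf("JVM maximum heap: %.2f GiB\n", maxHeapBytes / 1024^3))
```

If the reported value is lower than expected, the JVM was probably started earlier in the same R session with different settings. JVM options only apply at startup, so restart R and call .jinit with the desired parameters before anything else uses rJava.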

Using the example function below you can read any supported file format. Pass the base path of the input files and the name of the file format. If necessary, you can also change the cache size and supply filters for samples and variants.

# Load genotype data through Genotype-IO. Both filters default to a Java null, which means no filtering.
loadGenotypeData <- function(basePath, dataType, cacheSize = 1000,
                             variantFilter = .jnull("org/molgenis/genotype/variantFilter/VariantFilter"),
                             sampleFilter = .jnull("org/molgenis/genotype/sampleFilter/SampleFilter")) {
  # The format is looked up with valueOf, so convert the name to upper case first.
  dataType <- toupper(dataType)
  genotypeDataFormat <- .jcall("org/molgenis/genotype/RandomAccessGenotypeDataReaderFormats", "Lorg/molgenis/genotype/RandomAccessGenotypeDataReaderFormats;", "valueOf", dataType)
  return(.jcall(genotypeDataFormat, "Lorg/molgenis/genotype/RandomAccessGenotypeData;", "createFilteredGenotypeData", basePath, as.integer(cacheSize), variantFilter, sampleFilter))
}

The code below gives examples of a variant filter and a sample filter. "includedSnps" and "includedSamples" are R character vectors containing the variant IDs and sample IDs to keep.

variantFilter <- .jcast(.jnew("org/molgenis/genotype/variantFilter/VariantIdIncludeFilter", includedSnps), "org/molgenis/genotype/variantFilter/VariantFilter")
sampleFilter <- .jcast(.jnew("org/molgenis/genotype/sampleFilter/SampleIdIncludeFilter", includedSamples), "org/molgenis/genotype/sampleFilter/SampleFilter")
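One rJava detail matters when building these ID lists: a character vector of length one is passed to Java as a single String rather than a String array, which may not match the constructor you intend to call. The sketch below shows one way to always produce a Java String array. `toJavaStringArray` is a hypothetical helper name and the IDs are made-up placeholders.

```r
library(rJava)
.jinit()  # does nothing if the JVM is already running

# Hypothetical helper: tidy a vector of IDs and always return a Java String[],
# even when only one ID is left.
toJavaStringArray <- function(ids) {
  ids <- trimws(as.character(ids))
  ids <- unique(ids[!is.na(ids) & nzchar(ids)])
  .jarray(ids)
}

# Placeholder IDs for illustration only.
includedSnps <- toJavaStringArray(c("rs123", " rs456 ", "rs123", ""))
print(.jevalArray(includedSnps))
```

The resulting object can be passed to .jnew in place of the plain character vector. Whether the filter constructors accept a String array in your version of Genotype-IO is an assumption to verify against the Java source.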

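A short example of a use case of the API in R: we read a small subset of the data in Plink BED format, select a specific SNP and plot a histogram of its dosages. In the code, replace "chr" and position with the chromosome name and base-pair position of the SNP of interest. Note that variantFilter and sampleFilter are optional; you can omit these arguments to load all variants and samples.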

genotypeData <- loadGenotypeData(basePath = "PathToFiles", dataType = "Plink_BED", variantFilter = variantFilter, sampleFilter = sampleFilter)
snp <- .jcall(genotypeData, "Lorg/molgenis/genotype/variant/GeneticVariant;", "getSnpVariantByPos", "chr", as.integer(position))
hist(as.numeric(.jcall(snp, "[F", "getSampleDosages")))
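The example above assumes that a SNP exists at the requested position. The sketch below adds a guard for the case where it does not, assuming that getSnpVariantByPos returns a Java null when nothing is found (this is an assumption; check the Genotype-IO Java documentation). `getDosagesAt` is a hypothetical helper name.

```r
library(rJava)

# Hypothetical helper: return the dosages of the SNP at chr:position as a
# numeric vector, or NULL (with a warning) when no SNP is found there.
getDosagesAt <- function(genotypeData, chr, position) {
  snp <- .jcall(genotypeData, "Lorg/molgenis/genotype/variant/GeneticVariant;",
                "getSnpVariantByPos", chr, as.integer(position))
  # Assumption: a missing SNP is reported as a Java null reference.
  if (is.jnull(snp)) {
    warning(sprintf("No SNP found at %s:%d", chr, as.integer(position)))
    return(NULL)
  }
  as.numeric(.jcall(snp, "[F", "getSampleDosages"))
}
```

With genotypeData loaded as shown earlier, calling getDosagesAt(genotypeData, "chr", position) returns either a numeric vector that can be passed to hist, or NULL when no SNP is found.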

See the rJava vignette for more information on rJava, and check the Java code examples for more help with using Genotype-IO in R.