This bug report concerns a new tool, GtfToBed, which was introduced in PR #8942. The following code is a reproducible example of the error:
Get the necessary files
Reference genome
if [ ! -f 'hg38.fa.gz' ]; then
    echo 'Downloading the reference genome'
    wget https://hgdownload.soe.ucsc.edu/goldenpath/hg38/bigZips/latest/hg38.fa.gz
fi
sha256sum 'hg38.fa.gz'
GTF file
if [ ! -f 'hg38.ncbiRefSeq.gtf.gz' ]; then
    echo 'Downloading the GTF file'
    wget https://hgdownload.soe.ucsc.edu/goldenpath/hg38/bigZips/genes/hg38.ncbiRefSeq.gtf.gz
fi
sha256sum 'hg38.ncbiRefSeq.gtf.gz'
Using GATK jar /home/mehrad/tmp/gatk-4.6.1.0/gatk-package-4.6.1.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/mehrad/tmp/gatk-4.6.1.0/gatk-package-4.6.1.0-local.jar GtfToBed --gtf-path hg38.ncbiRefSeq.gtf --sequence-dictionary hg38.dict --output blah.bed --verbosity WARNING
SLF4J(W): Class path contains multiple SLF4J providers.
SLF4J(W): Found provider [org.apache.logging.slf4j.SLF4JServiceProvider@4ee8051c]
SLF4J(W): Found provider [ch.qos.logback.classic.spi.LogbackServiceProvider@53125718]
SLF4J(W): See https://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J(I): Actual provider is of type [org.apache.logging.slf4j.SLF4JServiceProvider@4ee8051c]
[February 27, 2025, 12:26:04 PM EET] org.broadinstitute.hellbender.tools.walkers.conversion.GtfToBed done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=134217728
java.lang.NullPointerException: Cannot invoke "org.broadinstitute.hellbender.utils.codecs.gtf.GencodeGtfGeneFeature.addTranscript(org.broadinstitute.hellbender.utils.codecs.gtf.GencodeGtfTranscriptFeature)" because "gene" is null
at org.broadinstitute.hellbender.utils.codecs.gtf.AbstractGtfCodec.aggregateRecordsIntoGeneFeature(AbstractGtfCodec.java:339)
at org.broadinstitute.hellbender.utils.codecs.gtf.AbstractGtfCodec.decode(AbstractGtfCodec.java:170)
at org.broadinstitute.hellbender.utils.codecs.gtf.AbstractGtfCodec.decode(AbstractGtfCodec.java:23)
at htsjdk.tribble.TribbleIndexedFeatureReader$WFIterator.readNextRecord(TribbleIndexedFeatureReader.java:377)
at htsjdk.tribble.TribbleIndexedFeatureReader$WFIterator.<init>(TribbleIndexedFeatureReader.java:344)
at htsjdk.tribble.TribbleIndexedFeatureReader.iterator(TribbleIndexedFeatureReader.java:311)
at org.broadinstitute.hellbender.engine.FeatureDataSource.iterator(FeatureDataSource.java:531)
at java.base/java.lang.Iterable.spliterator(Unknown Source)
at org.broadinstitute.hellbender.utils.Utils.stream(Utils.java:1182)
at org.broadinstitute.hellbender.engine.FeatureWalker.traverse(FeatureWalker.java:97)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1119)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:150)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:203)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:222)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:166)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:209)
at org.broadinstitute.hellbender.Main.main(Main.java:306)
Hi @mmahmoudian
This tool is written to comply with Gencode-style GTF files. The UCSC GTF file that you provided lacks the gene-level entries the tool needs to build the map it uses to sort and prioritize records based on the tags in the GTF. The tool is not implemented to ignore missing gene-level entries and simply create a BED file from the GTF coordinates, so you may need to convert that GTF to BED yourself using Python or another scripting language, or use the options in the UCSC Table Browser to extract BED format from the RefSeq table.
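As a rough illustration of the scripting route mentioned above, the following shell sketch uses awk to write one BED6 line per `transcript` record. It is not what GtfToBed does: the function name `gtf_transcripts_to_bed` is hypothetical, there is no tag-based sorting or prioritization, and it assumes the GTF contains `transcript` records carrying a `transcript_id` attribute.

```shell
# Hypothetical helper: print one BED6 line per "transcript" record of a GTF file.
# GTF coordinates are 1-based and inclusive; BED starts are 0-based, so the start is shifted by 1.
gtf_transcripts_to_bed() {
    awk -F'\t' 'BEGIN { OFS = "\t" }
        /^#/ { next }                        # skip header and comment lines
        $3 == "transcript" {
            name = "."                       # fallback when no transcript_id is present
            # extract the quoted value that follows transcript_id in column 9
            if (match($9, /transcript_id "[^"]+"/)) {
                name = substr($9, RSTART + 15, RLENGTH - 16)
            }
            print $1, $4 - 1, $5, name, 0, $7
        }' "$1"
}
```

It could be called as `gtf_transcripts_to_bed hg38.ncbiRefSeq.gtf > transcripts.bed`, where the output file name is only an example.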
@gokalpcelik thanks for the explanation. Considering that this information is not mentioned in the documentation (or at least my colleague and I missed it if it is there), and considering that it is generally not good practice to throw ambiguous errors at the user, may I suggest:
Update the documentation (website and --help) to clarify which GTF files are suitable for this tool.
Add a step to GtfToBed that first validates the input and produces a clear, user-friendly error when the file does not meet the tool's expectations.
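Until such a check exists in the tool, a user-side pre-flight check along these lines may help. This is only a sketch under assumptions: `check_gtf_has_genes` is a hypothetical name, the error text is invented for illustration, and the only condition tested is whether any record has `gene` in the feature column, which may not cover everything GtfToBed requires.

```shell
# Hypothetical pre-flight check: fail with a readable message when a GTF has no gene-level records.
check_gtf_has_genes() {
    # awk exits 0 as soon as it sees a non-comment line whose feature column is "gene", else 1
    if ! awk -F'\t' '!/^#/ && $3 == "gene" { found = 1; exit } END { exit !found }' "$1"; then
        echo "ERROR: $1 has no gene-level records; GtfToBed expects a Gencode-style GTF." >&2
        return 1
    fi
}
```

Running `check_gtf_has_genes hg38.ncbiRefSeq.gtf` before the GtfToBed command would then stop early with a clear message instead of the NullPointerException shown in the report.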
This bug report concerns a new tool, GtfToBed, which was introduced in PR #8942. The following code is a reproducible example of the error:
Get the necessary files
Reference genome
GTF file
Prepare files
Unpack the compressed files
Create the dict file
./gatk-4.6.1.0/gatk CreateSequenceDictionary \
    --REFERENCE 'hg38.fa' \
    --VERBOSITY WARNING
Convert GTF to BED