I think your idea of pre-processing index reads to correct a limited number of mismatches is interesting! You could potentially avoid throwing out a lot of reads that way, but you'd need to ensure that no index read ever gets "corrected" to the wrong index. For example, the algorithm could accept a correction only when exactly one known index lies within the allowed number of mismatches of the ambiguous read.
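To make the "only one right index" condition concrete, here is a minimal sketch, assuming mismatches are counted as Hamming distance between equal-length sequences. The names `hamming`, `correct_index`, and `max_mismatches` are hypothetical and not taken from your plan.

```python
def hamming(a, b):
    """Count the positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def correct_index(observed, known_indexes, max_mismatches=1):
    """Return the one known index within max_mismatches of observed.

    Returns None when no known index is close enough, or when more than
    one is, so an ambiguous read is never silently "corrected".
    """
    candidates = [
        known
        for known in known_indexes
        if len(known) == len(observed)
        and hamming(observed, known) <= max_mismatches
    ]
    return candidates[0] if len(candidates) == 1 else None


# Hypothetical index set, for illustration only.
known = ["AAAA", "TTTT", "AATT"]
unique_hit = correct_index("TTTA", known)  # only TTTT is within 1 mismatch
ambiguous = correct_index("AAAT", known)   # AAAA and AATT are both 1 away
```

If every pair of known indexes differs in more than `2 * max_mismatches` positions, the ambiguous case cannot occur at all, so it may be worth checking that once up front.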
I'm somewhat concerned that you may be building some untenably large dictionaries. If you run into memory errors while building them, you may want to switch to reading the files line by line, so that you don't have to keep everything in working memory.
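As a rough illustration of the line-by-line approach, the sketch below assumes the inputs are standard four-line FASTQ records and keeps only running counts in memory. `fastq_records` and `count_indexes` are hypothetical helper names, and the `io.StringIO` object stands in for a real file handle.

```python
import io
import itertools
from collections import Counter


def fastq_records(handle):
    """Yield (header, sequence, plus, quality) one record at a time."""
    while True:
        # Pull the next four lines; only one record is held in memory.
        record = [line.rstrip("\n") for line in itertools.islice(handle, 4)]
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        yield tuple(record)


def count_indexes(handle):
    """Tally index sequences without storing the reads themselves."""
    counts = Counter()
    for _header, sequence, _plus, _quality in fastq_records(handle):
        counts[sequence] += 1
    return counts


# Tiny in-memory stand-in for a real index file.
demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\nIIII\n@r3\nTTTT\n+\nIIII\n")
demo_counts = count_indexes(demo)
```

Memory use then scales with the number of distinct index sequences rather than with the number of reads.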
I like how you track the different kinds of bad reads and report them separately at the end!
I think it would be good to flesh out your sub-functions more thoroughly: specify each one's inputs, return values, and side effects (file I/O, etc.), and give each a name so that you can reference it in your larger algorithm, which would improve clarity. That would also let you write quick tests for each function to confirm that it does what you want.
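For instance, one sub-function might be specified like the sketch below. The name `classify_index_pair` and the three category labels are hypothetical, and the sketch assumes the second index has already been oriented so that it can be compared directly with the first.

```python
def classify_index_pair(index1, index2, known_indexes):
    """Classify one pair of index reads.

    Inputs:  index1, index2 -- index sequences as strings
             known_indexes  -- set of expected index sequences
    Returns: "matched", "hopped", or "unknown"
    Effects: none (no file I/O, no global state)
    """
    if index1 not in known_indexes or index2 not in known_indexes:
        return "unknown"
    return "matched" if index1 == index2 else "hopped"


# Quick tests specific to this function, using a made-up index set.
known = {"AAAA", "TTTT"}
assert classify_index_pair("AAAA", "AAAA", known) == "matched"
assert classify_index_pair("AAAA", "TTTT", known) == "hopped"
assert classify_index_pair("AAAA", "NNNN", known) == "unknown"
```

With the inputs, returns, and effects written down like this, the larger algorithm can refer to the function by name, and each test fits on one line.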