Solu Platform's long-read assembly pipeline graduates from experimental status

Summary

We recently launched long-read assembly as an experimental feature.
After fine-tuning the parameters, we've completed an internal validation of Nanopore-only assemblies, benchmarking them against hybrid assemblies (long reads polished with short reads).
In our initial validation on a dataset of 4 microbial species, Solu Platform outperformed 2 popular long-read assembly pipelines.
We are removing the ‘experimental’ tag, but more formal validation will follow.

Creating Nanopore-only assemblies with Solu Platform

Recently, we introduced long-read assembly as an experimental feature on our genomic surveillance platform. This post shares some early internal validation results for Nanopore-only samples.

We used a dataset of Candida auris, Escherichia coli, Pseudomonas aeruginosa, and Acinetobacter baumannii samples with both short- and long-read sequencing data. We created Nanopore-only assemblies with 3 different pipelines:

Solu Platform
Dragonflye with Medaka
Hybracter with default parameters

These were compared to hybrid assemblies (Flye + Polypolish), which are considered the gold standard.

We conducted genomic characterization and phylogenetic analysis on all of the resulting assemblies and compared the differences in the results.

Nanopore-only assemblies are good enough for AMR genotyping, but have implications for phylogenetics

The choice of assembly pipeline had minimal impact on species identification, clade assignment, MLST, or AMR genotyping. Results were nearly identical across all methods and matched the hybrid assemblies, with one exception: in one Pseudomonas aeruginosa isolate, the sequence type could not be determined from any of the long-read assemblies.
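A concordance check of this kind can be sketched in a few lines of Python. This is only an illustration: the function name `discordant_fields`, the field names, and all values below are hypothetical placeholders, not the platform's actual output format or results from this validation.

```python
from typing import Dict, Optional, Tuple


def discordant_fields(
    reference: Dict[str, str], candidate: Dict[str, str]
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return the fields where the candidate result differs from the reference.

    Each entry maps a field name to (reference value, candidate value).
    A field missing from the candidate is reported with None.
    """
    return {
        field: (ref_value, candidate.get(field))
        for field, ref_value in reference.items()
        if candidate.get(field) != ref_value
    }


# Hypothetical typing results for one isolate (placeholder values only).
hybrid = {"species": "Pseudomonas aeruginosa", "mlst": "ST-A", "amr_genes": "geneX;geneY"}
long_read = {"species": "Pseudomonas aeruginosa", "mlst": "unassigned", "amr_genes": "geneX;geneY"}

print(discordant_fields(hybrid, long_read))
```

An empty result means the long-read assembly reproduced every typing result of the hybrid reference for that isolate.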

More significant differences were seen in SNP distances. Solu's long-read assemblies generally aligned closest to the hybrid results, particularly for E. coli and Acinetobacter baumannii, where SNP distances were minimal. Pseudomonas aeruginosa showed greater variation, likely due to known basecalling errors, but Solu's pipeline still produced the assemblies with the fewest errors.
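For readers unfamiliar with the metric, here is a minimal sketch of one common way to define a SNP distance between two aligned sequences: count the positions where both carry an unambiguous base and the bases differ. This is an assumption for illustration; the exact method used in this validation is not described here, and the function name and toy sequences are hypothetical.

```python
def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count mismatching positions between two aligned sequences.

    Positions where either sequence has a gap or an ambiguous base
    (anything other than A, C, G, T) are skipped rather than counted.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in valid and b in valid and a != b
    )


# Toy aligned sequences (placeholders, not real data).
hybrid_seq = "ACGTACGTAC"
long_read_seq = "ACGTTCGTNC"

print(snp_distance(hybrid_seq, long_read_seq))
```

Under this definition, a long-read assembly that sits at a distance of 0 from its hybrid counterpart would place the isolate identically in a SNP-based phylogeny, which is why small distances to the hybrid result are the goal.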

Next steps for Solu's long-read pipeline

For general genomic surveillance purposes, Solu's long-read pipeline appears to be a strong alternative to popular pipelines, and we are dropping the 'experimental' label for now. The pipeline's performance in our tests may be attributable to the use of Racon for polishing, which seemed to outperform Medaka, at least on this limited dataset.

We will follow this up with a larger formal validation covering more organisms and a broader range of inputs. Stay tuned!
