BiSpark: a Spark-based highly scalable aligner for bisulfite sequencing data
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Soe, Seokjun | - |
dc.contributor.author | Park, Yoonjae | - |
dc.contributor.author | Chae, Heejoon | - |
dc.date.available | 2021-02-22T07:46:22Z | - |
dc.date.issued | 2018-12 | - |
dc.identifier.issn | 1471-2105 | - |
dc.identifier.uri | https://scholarworks.sookmyung.ac.kr/handle/2020.sw.sookmyung/4152 | - |
dc.description.abstract | Background: Bisulfite sequencing is one of the major high-resolution DNA methylation measurement methods. Because sodium bisulfite treatment selectively converts unmethylated cytosines, processing bisulfite-treated sequencing reads requires additional, computationally demanding steps. However, the dearth of efficient aligners designed for bisulfite-treated sequencing reads has become a bottleneck for large-scale DNA methylome analyses. Results: In this study, we present a highly scalable, efficient, and load-balanced bisulfite aligner, BiSpark, which is designed for processing large volumes of bisulfite sequencing data. We implemented the BiSpark algorithm on Apache Spark, a memory-optimized distributed data processing framework, to achieve maximum data-parallel efficiency. The BiSpark algorithm is designed to support redistribution of imbalanced data to minimize delays in large-scale distributed environments. Conclusions: Experimental results on methylome datasets show that BiSpark significantly outperforms other state-of-the-art bisulfite sequencing aligners in terms of alignment speed and scalability with respect to dataset size and the number of computing nodes, while providing highly consistent and comparable mapping results. Availability: The BiSpark software package and source code are available at https://github.com/bhi-kimlab/BiSpark/. | - |
dc.language | English | - |
dc.language.iso | ENG | - |
dc.publisher | BMC | - |
dc.title | BiSpark: a Spark-based highly scalable aligner for bisulfite sequencing data | - |
dc.type | Article | - |
dc.publisher.location | United Kingdom | - |
dc.identifier.doi | 10.1186/s12859-018-2498-2 | - |
dc.identifier.scopusid | 2-s2.0-85058225965 | - |
dc.identifier.wosid | 000452725300002 | - |
dc.identifier.bibliographicCitation | BMC BIOINFORMATICS, v.19, no.1 | - |
dc.citation.title | BMC BIOINFORMATICS | - |
dc.citation.volume | 19 | - |
dc.citation.number | 1 | - |
dc.type.docType | Article | - |
dc.description.isOpenAccess | Y | - |
dc.description.journalRegisteredClass | scie | - |
dc.description.journalRegisteredClass | scopus | - |
dc.relation.journalResearchArea | Biochemistry & Molecular Biology | - |
dc.relation.journalResearchArea | Biotechnology & Applied Microbiology | - |
dc.relation.journalResearchArea | Mathematical & Computational Biology | - |
dc.relation.journalWebOfScienceCategory | Biochemical Research Methods | - |
dc.relation.journalWebOfScienceCategory | Biotechnology & Applied Microbiology | - |
dc.relation.journalWebOfScienceCategory | Mathematical & Computational Biology | - |
dc.subject.keywordPlus | ALIGNMENT | - |
dc.subject.keywordPlus | SEQ | - |
dc.subject.keywordAuthor | DNA methylation | - |
dc.subject.keywordAuthor | Bisulfite sequencing | - |
dc.subject.keywordAuthor | Alignment | - |
dc.subject.keywordAuthor | Apache Spark | - |
dc.identifier.url | https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-018-2498-2 | - |