Improving data workflow systems with cloud services and use of open data for bioinformatics research

dc.contributor.author: Karim, Md Rezaul
dc.contributor.author: Michel, Audrey
dc.contributor.author: Zappa, Achille
dc.contributor.author: Baranov, Pavel V.
dc.contributor.author: Sahay, Ratnesh
dc.contributor.author: Rebholz-Schuhmann, Dietrich
dc.contributor.funder: Science Foundation Ireland [en]
dc.date.accessioned: 2019-09-09T11:59:00Z
dc.date.available: 2019-09-09T11:59:00Z
dc.date.issued: 2017-04-16
dc.description.abstract: Data workflow systems (DWFSs) enable bioinformatics researchers to combine components for data access and data analytics, and to share the final data analytics approach with their collaborators. Increasingly, such systems have to cope with large-scale data, such as full genomes (about 200 GB each), public fact repositories (about 100 TB of data) and 3D imaging data at even larger scales. As moving the data becomes cumbersome, the DWFS needs to embed its processes into a cloud infrastructure, where the data are already hosted. As the standardized public data play an increasingly important role, the DWFS needs to comply with Semantic Web technologies. This advancement to DWFS would reduce overhead costs and accelerate the progress in bioinformatics research based on large-scale data and public resources, as researchers would require less specialized IT knowledge for the implementation. Furthermore, the high data growth rates in bioinformatics research drive the demand for parallel and distributed computing, which then imposes a need for scalability and high-throughput capabilities onto the DWFS. As a result, requirements for data sharing and access to public knowledge bases suggest that compliance of the DWFS with Semantic Web standards is necessary. In this article, we will analyze the existing DWFS with regard to their capabilities toward public open data use as well as large-scale computational and human interface requirements. We untangle the parameters for selecting a preferable solution for bioinformatics research with particular consideration to using cloud services and Semantic Web technologies. Our analysis leads to research guidelines and recommendations toward the development of future DWFS for the bioinformatics research community. [en]
dc.description.status: Peer reviewed [en]
dc.description.version: Published Version [en]
dc.format.mimetype: application/pdf [en]
dc.identifier.citation: Karim, M.R., Michel, A., Zappa, A., Baranov, P., Sahay, R. and Rebholz-Schuhmann, D., 2017. Improving data workflow systems with cloud services and use of open data for bioinformatics research. Briefings in bioinformatics, 19(5), (15pp). DOI: 10.1093/bib/bbx039 [en]
dc.identifier.doi: 10.1093/bib/bbx039 [en]
dc.identifier.eissn: 1477-4054
dc.identifier.endpage: 1050 [en]
dc.identifier.issn: 1467-5463
dc.identifier.issued: 5 [en]
dc.identifier.journaltitle: Briefings in bioinformatics [en]
dc.identifier.startpage: 1035 [en]
dc.identifier.uri: https://hdl.handle.net/10468/8487
dc.identifier.volume: 19 [en]
dc.language.iso: en [en]
dc.publisher: NLM (Medline) [en]
dc.relation.project: info:eu-repo/grantAgreement/SFI/SFI Research Centres/12/RC/2289/IE/INSIGHT - Irelands Big Data and Analytics Research Centre/ [en]
dc.relation.uri: https://academic.oup.com/bib/article/19/5/1035/3737318
dc.rights: © The Author 2017. Published by Oxford University Press [en]
dc.rights.uri: http://creativecommons.org/licenses/by-nc/4.0/ [en]
dc.subject: Data workflow system [en]
dc.subject: Semantic web [en]
dc.subject: Linked data [en]
dc.subject: Cloud computing [en]
dc.subject: Genome sequencing [en]
dc.subject: Drug discovery [en]
dc.title: Improving data workflow systems with cloud services and use of open data for bioinformatics research [en]
dc.type: Article (peer-reviewed) [en]
Files
Original bundle
Name: bbx039.pdf
Size: 982.64 KB
Format: Adobe Portable Document Format
Description: Published version
License bundle
Name: license.txt
Size: 2.71 KB
Format: Item-specific license agreed upon to submission