Parallel Protein Classification with IBM BigInsights

Büchi, Christof and Mathys, Susanne (2013) Parallel Protein Classification with IBM BigInsights. Student Research Project thesis, HSR Hochschule für Technik Rapperswil.

semesterThesisBuechiMathys.pdf - Supplemental Material

Abstract

Big Data is a growing topic in information technology, driven by the huge volume of data available today on IT systems all over the world. Processing large numbers of large files and analyzing unstructured data in real time could benefit institutions and enterprises that store large volumes of data generated by their transactions. Coping with this rapid data growth, and analyzing the data, exceeds the limits of existing IT infrastructures. Google and Yahoo! have introduced their own ways of handling such datasets. Storing massive data efficiently and processing it with minimal overhead requires a completely new architecture beyond well-established tools and principles. Big Data systems and frameworks such as IBM BigInsights with Hadoop provide a distributed, fault-tolerant file system running on commodity hardware. They also allow writing custom applications in Java based on the MapReduce principle. How difficult would it be to perform classification with a given single-process application on a Big Data system? In our research we wanted to show that it is as simple as setting up a cluster and running the tool from a bash script that is used within a Hadoop streaming job. We examined the overhead of using such a complex framework to run simple applications in parallel. We also examined how the system scales out as the cluster size grows.

Item Type: Thesis (Student Research Project)
Subjects: Area of Application > Industry
Area of Application > Data Mining
Area of Application > Healthcare, Medical Sector
Metatags > ITA (Institute for Internet Technologies and Applications)
Divisions: Bachelor of Science FHO in Informatik > Student Research Project
Creators: Büchi, Christof; Mathys, Susanne
Contributors: Joller, Josef (Thesis advisor); Kienzler, Romeo (Thesis advisor)
Funders: IBM Switzerland
Depositing User: HSR Deposit User
Date Deposited: 23 Jul 2013 09:20
Last Modified: 23 Jul 2013 09:20
URI: http://eprints.hsr.ch/id/eprint/291
