The role of the tutorials is to provide a platform for more intensive scientific exchange among researchers interested in a particular topic and to serve as a meeting point for the community. Tutorials complement the depth-oriented technical sessions by providing participants with broad overviews of emerging fields. A tutorial can be scheduled for 1.5 or 3 hours.

Big Data Analytics in Biomedical Informatics
Lecturer(s): Dr. Hesham Ali, University of Nebraska at Omaha, United States
Estimated Session Time: 3 hours

The availability of massive medical and biological data in recent years has required a matching increase in the scale and sophistication of the computational systems and intelligent tools that enable biomedical researchers to take full advantage of such data. In addition, the high degree of heterogeneity in the available data continues to present great challenges as well as great opportunities in biomedical research. Developing innovative data integration and mining techniques, along with parallel computational methods to implement them, will play an important role in efficiently extracting useful knowledge from the raw data currently available. In particular, the use of graph modeling and network analysis as the backbone of big data analytics algorithms will be critical in developing data-driven decision support systems for the next generation of biomedical research. This tutorial focuses on the development of innovative big data analytics tools based on graph modeling and network analysis, and on how to effectively utilize HPC and private/cloud computing in implementing the proposed tools. Case studies will also be presented, illustrating how the proposed tools were used to analyze data associated with infectious diseases and how that analysis led to new biological discoveries. The tutorial will also address several important issues associated with big data analytics in the medical domain, such as the need to maintain the necessary level of privacy when executing computationally intensive biomedical applications and the need to implement the proposed tools in an energy-aware environment.

Keywords: Big Data Analytics, Bioinformatics, Health Informatics, Data Integration Tools, Network Models, Data-Driven Biomedical Research, Cloud Computing.

Aims and Learning Objectives
The field of Biomedical Informatics has been attracting a lot of attention in recent years. The massive size of the currently available biological and medical databases and their high rate of growth have a great influence on the types of research being conducted, and researchers are focusing more than ever on maximizing the use of these databases. Hence, it would be of great advantage for researchers to use the information stored in the available databases to extract new information as well as to understand various biological and medical phenomena. In addition, from the IT point of view, the problem of efficiently collecting, sharing, mining and analyzing the wealth of information available in a growing set of biological and clinical data has common roots in many IT applications. This is particularly critical in managing biological and clinical data, since relevant data is available in many different forms, and hence employing all available data to extract meaningful properties is an enormous task. Heterogeneous data obtained from microarrays, high throughput sequencers, mass spectrometry experiments and clinical records can all be used to find potential correlations between genes/proteins and susceptibility to a particular disease. The proposed tutorial will address these issues with a particular focus on the following objectives:
  • 1- Provide an overview of the exciting disciplines of Biomedical Informatics, including medical informatics, public health informatics and bioinformatics, with a focus on the interdisciplinary nature of these fields of study.
  • 2- Introduce the main computational problems in biomedical research with a focus on problems related to data collection, integration and analysis, then survey the currently available algorithmic tools and address the advantages and shortcomings of each tool.
  • 3- Introduce the audience to various big data analytics tools. Such tools are critical for leveraging data collected from different resources to produce useful information that can further advance biomedical research and has the potential to lead to new discoveries directly related to patient care.
Target Audience
The tutorial is intended primarily for computational scientists who are interested in Biomedical Research and the impact of high performance computing in advancing Biomedical Informatics. Bio-scientists with some background in computational concepts are a second intended audience.

Detailed Outline
The proposed tutorial is designed for a half-day format and is divided into two parts. The first part, covered in points 1-3 below, provides the introduction, the background and an overview of key problems, algorithms and current tools in the area of Biomedical Informatics. The second part, covered in points 4-6 below, introduces the audience to models for integrating HPC systems in Biomedical research, with a focus on the concept of next generation big data analysis and integration tools. Two specific case studies related to the efficient utilization of HPC in biomedical research will be covered in detail. A demo of how network models can be used for biological discoveries will illustrate the power of advanced data analytics tools.
  • 1. Introduction to Biomedical Informatics - Brief discussion on the various aspects of Biomedical Informatics that include Bioinformatics, Medical Informatics, Public Health Informatics, and Biomedical Imaging.
  • 2. Background – The Bioscience aspect, the computational perspective, and the need for efficient HPC models for addressing key problems in Biomedical Informatics.
  • 3. Biomedical Informatics now – current state of the emerging discipline and overview of key Biomedical Research problems, plus an overview of selected current, first generation, data analysis tools.
  • 4. Next Generation Big Data Integration and Analysis Tools: Introduction to Intelligent, Collaborative and Dynamic Analytics Tools with a focus on using graph modeling and network analysis tools for analyzing big and heterogeneous biological/medical data.
  • 5. High Performance Computing (HPC) and cloud computing in Biomedical Informatics Research: current practices, pros and cons, with a focus on HPC and big data integration and analysis tools. A basic analysis of security and privacy issues related to private and public clouds will also be presented.
  • 6. Advanced Big Data Analytics using biological networks – Case Studies: Correlation Networks and the identification of genes and cellular systems associated with various infectious diseases.
A demo of how the network-based tools led to new biological discoveries will also be presented.
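
For readers unfamiliar with the correlation networks named in point 6, the general idea can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the lecturer's tools or method: the gene names, the expression values, the 0.9 threshold and the function names (`pearson`, `correlation_network`, `hubs`) are all hypothetical. Genes become nodes, and an edge joins two genes whose expression profiles are strongly correlated across samples.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / sqrt(var_x * var_y)


def correlation_network(profiles, threshold=0.9):
    """Build an undirected graph as an adjacency dict: genes are nodes, and
    an edge joins two genes whose |Pearson r| is at least `threshold`."""
    graph = {gene: set() for gene in profiles}
    for g1, g2 in combinations(profiles, 2):
        if abs(pearson(profiles[g1], profiles[g2])) >= threshold:
            graph[g1].add(g2)
            graph[g2].add(g1)
    return graph


def hubs(graph, min_degree=2):
    """Return genes with at least `min_degree` neighbours, sorted by name."""
    return sorted(g for g, nbrs in graph.items() if len(nbrs) >= min_degree)


# Hypothetical toy data: expression of four made-up genes across five samples.
toy_profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.0, 4.0, 6.0, 8.0, 10.0],  # rises with geneA
    "geneC": [5.0, 4.0, 3.0, 2.0, 1.0],   # falls as geneA rises
    "geneD": [1.0, 3.0, 2.0, 3.0, 1.0],   # unrelated to the others
}

network = correlation_network(toy_profiles)
for gene in sorted(network):
    print(gene, "->", sorted(network[gene]))
print("hubs:", hubs(network))
```

In this toy example the three mutually correlated genes form a connected cluster while the unrelated gene stays isolated. Real analyses work with thousands of genes and typically add significance testing and dedicated graph libraries, which this sketch omits.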

Biography of Dr. Hesham Ali
Hesham H. Ali is a Professor of Computer Science and the Lee and Wilma Seaman Distinguished Dean of the College of Information Science and Technology (IS&T) at the University of Nebraska at Omaha (UNO). He currently serves as the director of the UNO Bioinformatics Core Facility, which supports a large number of biomedical research projects in Nebraska. He has published numerous articles in various IT areas, including scheduling, distributed systems, data analytics, wireless networks, and Bioinformatics. He has also published two books on scheduling and graph algorithms, and several book chapters in Bioinformatics. He is currently serving as the PI or Co-PI of several projects funded by NSF, NIH and the Nebraska Research Initiative (NRI) in the areas of data analytics, wireless networks and Bioinformatics. He has been leading a Research Group at UNO that focuses on developing innovative computational approaches to classify biological organisms and analyze big bioinformatics data. The research group is currently developing several next generation data analysis tools for mining various types of large-scale biological data. This includes the development of new graph theoretic models for assembling short reads obtained from high throughput instruments, as well as a novel correlation networks approach for analyzing large heterogeneous biological data associated with various biomedical research areas, particularly projects associated with aging and infectious diseases. He has also been leading two funded projects for developing secure and energy-aware wireless infrastructure to address tracking and monitoring problems in medical environments, particularly to study mobility profiling for healthcare research.