The mission of our lab is to develop deep learning, machine learning, data mining, and evolutionary algorithms for knowledge discovery and innovation in bioinformatics, genomics, drug design, materials informatics, and engineering design. We gratefully acknowledge the support of NVIDIA Corporation through its donation of the Titan X Pascal GPU and Tesla K40 GPU used in our research. Our lab is also equipped with a Titan V GPU, which offers 5120 cores, 12 GB of memory, and 110 TeraFLOPS of deep learning performance.


Sponsors of our research


National Science Foundation

NVIDIA GPU Donation

South Carolina Department of Transportation

Reports on the grants that we work on:

NSF CAREER Award: Computational Analysis and Prediction of Genome-Wide Protein Targeting Signals and Localization
NSF $20 million MADE in SC: USC and SC Institutions of Higher Education Receive $20 Million NSF Grant to Enhance Advanced Materials and Manufacturing Research.
South Carolina Researchers Receive $3.1 Million NIH Grant to Reduce HIV Infections and Improve HIV/AIDS Healthcare in South Carolina



Currently, we are working on these problems:

Deep learning and its application in Audio Processing

Audible and inaudible sounds are ubiquitous in our daily life and in manufacturing. Modern audio processing poses several interesting problems: 1) how to separate noise from the human voice to help people hear better; 2) how to detect rare sound events in a recording; 3) how to recognize the acoustic scene (environment) from a sound recording; 4) how to detect emotions from recordings of spoken language; 5) how to refine voice recordings to obtain high-definition (HD) recordings; and 6) how to diagnose faults in manufacturing equipment using audible and ultrasonic sound signals. We are working on all of these questions using the latest deep learning techniques, which have achieved great success in computer vision.
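To make problem 2 concrete, the sketch below shows the simplest possible baseline for sound event detection: split the waveform into frames, compute each frame's RMS energy, and report runs of frames above a threshold. This is not a deep learning method and not the lab's approach; the function names, frame sizes, and threshold are illustrative assumptions. Deep models typically replace the energy threshold with a learned per-frame classifier over spectrogram features, but the framing-and-grouping logic stays much the same.

```python
import math
from typing import List, Tuple


def frame_rms(signal: List[float], frame_len: int, hop: int) -> List[float]:
    """Root-mean-square (RMS) energy of each frame; a new frame starts every `hop` samples."""
    return [
        math.sqrt(sum(x * x for x in signal[s:s + frame_len]) / frame_len)
        for s in range(0, len(signal) - frame_len + 1, hop)
    ]


def detect_events(signal: List[float], frame_len: int = 1024, hop: int = 512,
                  threshold: float = 0.1) -> List[Tuple[int, int]]:
    """Return (first_frame, end_frame) spans (end exclusive) whose RMS exceeds `threshold`.

    Hypothetical baseline: frame sizes and threshold are illustrative, not tuned values.
    """
    energies = frame_rms(signal, frame_len, hop)
    events: List[Tuple[int, int]] = []
    start = None
    for i, e in enumerate(energies):
        if e > threshold and start is None:
            start = i                          # an event begins
        elif e <= threshold and start is not None:
            events.append((start, i))          # the event ends
            start = None
    if start is not None:                      # event still active at the end of the signal
        events.append((start, len(energies)))
    return events
```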

Bioinformatics: Protein-ligand binding prediction

Accurate determination of protein–ligand binding affinity is a fundamental problem in biochemistry with many applications, including drug design and protein–ligand docking. A number of scoring functions have been proposed to predict protein–ligand binding affinity. However, accurate prediction remains challenging, because existing scoring functions often perform poorly in evaluations.
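Scoring functions for affinity prediction are commonly judged by how well their predicted affinities agree with experimentally measured ones. A minimal sketch of two such agreement measures, the Pearson correlation coefficient and the root-mean-square error (RMSE), follows; which metrics a given benchmark uses is an assumption here, and the function names are illustrative.

```python
import math
from typing import Sequence


def pearson_r(pred: Sequence[float], expt: Sequence[float]) -> float:
    """Pearson correlation between predicted and experimental affinities."""
    n = len(pred)
    if n != len(expt) or n < 2:
        raise ValueError("need at least two paired values")
    mp, me = sum(pred) / n, sum(expt) / n
    cov = sum((p - mp) * (e - me) for p, e in zip(pred, expt))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    se = math.sqrt(sum((e - me) ** 2 for e in expt))
    return cov / (sp * se)


def rmse(pred: Sequence[float], expt: Sequence[float]) -> float:
    """Root-mean-square error, in the same units as the affinities (e.g. pKd)."""
    return math.sqrt(sum((p - e) ** 2 for p, e in zip(pred, expt)) / len(pred))
```

Pearson r rewards getting the ranking trend right even if predictions are offset, while RMSE penalizes absolute errors, so the two are usually reported together.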

Material Informatics: Material Property Prediction and Phase Mapping of High-throughput XRD data

Progress in materials science is strongly intertwined with the progress of human civilization. Instead of relying on traditional trial-and-error experiments, modern materials scientists use a combination of high-throughput calculations and machine learning to discover next-generation materials for energy applications, including batteries, fuel cells, and LEDs. For example, a team led by engineers at the University of California San Diego used data mining and computational tools to discover a new phosphor material for white LEDs that is inexpensive and easy to make. The researchers built prototype white LED light bulbs using the new phosphor, and the prototypes exhibited better color quality than many commercial LEDs currently on the market.

Computational materials science is a powerful way to quickly sift through possible material compositions and identify new candidates for a variety of applications. Materials informatics seeks to establish structure–property relationships in a high-throughput, statistically robust, and physically meaningful manner. Researchers are seeking connections in materials datasets to find new compounds, make performance predictions, accelerate computational model development, and gain new insights from characterization techniques.

Although great strides have been made, the field of materials informatics is set to experience an even greater explosion of data, with more complex models being developed and increasing emphasis on national and global initiatives related to the materials genome. To meet these challenges, scientists are increasingly using machine learning, which involves the study and construction of algorithms that can learn from and make predictions on data without being explicitly programmed. These algorithms can be as simple as an ordinary least squares fit to a data set or as complicated as the neural networks used by Google and Facebook to connect our social circles.
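At the simple end of that spectrum, an ordinary least squares (OLS) fit of a property y against a single descriptor x has a closed-form solution: the slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the means. The sketch below assumes one hypothetical descriptor per material; real materials models use many descriptors and regularization.

```python
from typing import Sequence, Tuple


def ols_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Fit y ≈ a + b*x by ordinary least squares; returns (intercept a, slope b)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("descriptor x has zero variance")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx          # slope = cov(x, y) / var(x)
    a = my - b * mx        # line passes through (mean x, mean y)
    return a, b
```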
High-throughput materials experiments at synchrotron facilities generate hundreds of samples every day, and mapping these samples onto a phase diagram to identify the phases synthesized is a bottleneck for materials discovery. We develop materials informatics and deep learning algorithms to mine such data sets; see our algorithms Autophase, GPhase, etc.
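As a rough illustration of why phase mapping is a data-mining problem, the sketch below groups XRD patterns (intensity vectors sampled on a common 2θ grid) whose shapes are similar under cosine similarity. This naive greedy grouping is a stand-in written for illustration only; it is not Autophase or GPhase, and it ignores peak shifts and multi-phase mixtures, which real phase-mapping algorithms must handle. The threshold value is an assumption.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two intensity vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def group_patterns(patterns: List[Sequence[float]], threshold: float = 0.9) -> List[List[int]]:
    """Greedy grouping: each pattern joins the first group whose representative
    (the group's first member) has cosine similarity >= threshold, else starts a new group."""
    groups: List[List[int]] = []
    for i, p in enumerate(patterns):
        for g in groups:
            if cosine(patterns[g[0]], p) >= threshold:
                g.append(i)
                break
        else:
            groups.append([i])
    return groups
```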

Breast Cancer Histological Image Analysis

We develop algorithms for automated analysis of breast cancer histopathology images. This research area has become particularly relevant with the advent of whole slide imaging (WSI) scanners, which enable cost-effective, high-throughput digitization of histopathology slides and aim to replace the optical microscope as the primary tool used by pathologists. Breast cancer is the most prevalent form of cancer among women, and image analysis methods that target this disease have huge potential to reduce the workload in a typical pathology lab and to improve the quality of interpretation. This work requires an understanding of tissue preparation, staining, and slide digitization processes, as well as a range of image processing techniques and applications, from analysis of tissue staining to computer-aided diagnosis and prognosis of breast cancer patients.

Protein localization prediction

The question is how to integrate different sources of information for precise prediction of protein target locations.
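One simple way to integrate different sources of information is late fusion: each predictor (for example, a hypothetical sequence-based predictor and a hypothetical homology-based one) outputs a probability for each subcellular location, and the outputs are combined by a weighted average. The sketch below assumes hand-picked weights; it is not necessarily the lab's method, and learning the combination (stacking) is a common alternative.

```python
from typing import Dict, List


def fuse_predictions(preds: List[Dict[str, float]], weights: List[float]) -> Dict[str, float]:
    """Weighted average of per-location probabilities from several predictors.

    A location missing from one predictor's output is treated as probability 0 there.
    """
    if len(preds) != len(weights) or sum(weights) <= 0:
        raise ValueError("need one positive-sum weight per predictor")
    total_w = sum(weights)
    locations = set().union(*preds)            # all location labels seen by any predictor
    return {
        loc: sum(w * p.get(loc, 0.0) for p, w in zip(preds, weights)) / total_w
        for loc in locations
    }
```

If every predictor's probabilities sum to 1, the fused scores also sum to 1, so the top-scoring location can be read off directly with `max(fused, key=fused.get)`.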


Protein Targeting Motif analysis

How can we map out all the protein targeting signals to essentially decode the targeting "zip codes"?

We are developing novel data mining algorithms for this problem.
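A basic data-mining idea for finding candidate "zip codes" is k-mer enrichment: count short subsequences in proteins known to go to a compartment and compare their frequencies with a background set. The sketch below uses a pseudocount-smoothed log2 frequency ratio; the sequences in the test are toy strings ending in KDEL, a well-known ER retention signal. This is an illustrative baseline, not the lab's algorithm, and it ignores the positional constraints many real targeting signals have.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple


def kmer_counts(seqs: Iterable[str], k: int) -> Counter:
    """Count every overlapping k-mer across all sequences."""
    counts: Counter = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def enriched_kmers(fg: List[str], bg: List[str], k: int,
                   pseudo: float = 1.0, top: int = 5) -> List[Tuple[str, float]]:
    """Rank foreground k-mers by log2 of (smoothed foreground freq / smoothed background freq)."""
    fc, bc = kmer_counts(fg, k), kmer_counts(bg, k)
    f_tot = max(sum(fc.values()), 1)
    b_tot = max(sum(bc.values()), 1)
    scores = {
        m: math.log2(((fc[m] + pseudo) / f_tot) / ((bc[m] + pseudo) / b_tot))
        for m in fc
    }
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top]
```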


Heme Protein binding residue prediction

We are developing computational algorithms for predicting the heme-binding residues of proteins involved in protein–ligand interactions, using both sequence and structural information. The web server can be accessed at hemeBIND.
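Residue-level predictors usually turn a protein sequence into one feature vector per residue by looking at a window of neighbors around it. The sketch below shows that sequence part only: sliding windows padded at the ends, each encoded as concatenated one-hot vectors over the 20 standard amino acids. The window size and the "X" padding symbol are assumptions; hemeBIND's actual features (including its structural ones) are not reproduced here.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def residue_windows(seq: str, half_width: int, pad: str = "X") -> List[str]:
    """One window of length 2*half_width+1 centred on each residue; the ends are padded."""
    padded = pad * half_width + seq + pad * half_width
    return [padded[i:i + 2 * half_width + 1] for i in range(len(seq))]


def one_hot(window: str) -> List[int]:
    """Concatenate a 20-dimensional one-hot vector per position;
    padding or non-standard residues map to all zeros."""
    vec: List[int] = []
    for aa in window:
        vec.extend(1 if aa == a else 0 for a in AMINO_ACIDS)
    return vec
```

Each residue's vector can then be fed, together with any structural features, to a classifier that labels the residue as binding or non-binding.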


Automated Design of Scoring functions for Virtual screening

There are more than 50 scoring functions available for virtual screening in structure-based drug design. We are asking: can we do better by utilizing data mining and machine learning tools?
Milestone: Guoyu Lu (Summer 2008) developed a software pipeline that allows us to perform large-scale protein–ligand or protein–decoy docking and to score the results for ranking candidate drug molecules. The system runs efficiently on a Linux cluster. Potential collaborators are encouraged to contact us at jianjunh@cse.sc.edu about potential projects. At the same time, we are developing novel machine learning algorithms to improve ranking and scoring.
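Because the goal of screening is to rank true binders (actives) above decoys, a scoring function is often judged by its enrichment factor (EF): the hit rate among the top-ranked fraction of the library divided by the overall active rate. A minimal sketch follows; whether a higher or lower score means "better" depends on the scoring function, so it is a parameter here, and the function name is illustrative.

```python
from typing import Sequence


def enrichment_factor(scores: Sequence[float], is_active: Sequence[bool],
                      fraction: float = 0.01, higher_is_better: bool = True) -> float:
    """EF at `fraction`: (actives in top-ranked fraction / size of that fraction)
    divided by (total actives / library size)."""
    n = len(scores)
    if n == 0 or n != len(is_active):
        raise ValueError("scores and labels must be non-empty and the same length")
    total_actives = sum(1 for a in is_active if a)
    if total_actives == 0:
        raise ValueError("library contains no actives")
    order = sorted(range(n), key=lambda i: scores[i], reverse=higher_is_better)
    n_top = max(1, int(n * fraction))
    hits = sum(1 for i in order[:n_top] if is_active[i])
    return (hits / n_top) / (total_actives / n)
```

An EF of 1 means the ranking is no better than random selection; the maximum possible value at a given fraction is bounded by how many actives fit in the top slice.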


Computational Synthesis of Systems and Materials Using Genetic Programming

We are developing evolutionary algorithms for automated synthesis of engineering systems and materials based on genetic programming and simulation software. The purpose is to convert raw computational cycles into innovations and inventions.
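Genetic programming evolves programs, typically represented as expression trees, against a fitness measure computed by evaluation or simulation. The sketch below shows that core loop on a toy symbolic-regression target: random tree generation, evaluation with protected division, mean-squared-error fitness, subtree mutation, and a mutation-only (1+1) search with a depth limit. Crossover and populations are omitted for brevity, and the operator set, probabilities, and limits are illustrative assumptions; a real engineering-synthesis system would compute fitness with simulation software instead.

```python
import math
import random
from typing import List, Tuple, Union

# A program tree is "x", a float constant, or (op, left_subtree, right_subtree).
Tree = Union[str, float, tuple]

FUNCS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "div": lambda a, b: a / b if abs(b) > 1e-9 else 1.0,  # protected division
}


def random_tree(rng: random.Random, depth: int) -> Tree:
    """Grow a random tree with at most `depth` levels of function nodes."""
    if depth == 0 or rng.random() < 0.3:
        return "x" if rng.random() < 0.5 else round(rng.uniform(-2.0, 2.0), 2)
    op = rng.choice(sorted(FUNCS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def depth(tree: Tree) -> int:
    """Number of function levels (a leaf has depth 0)."""
    return 1 + max(depth(tree[1]), depth(tree[2])) if isinstance(tree, tuple) else 0


def evaluate(tree: Tree, x: float) -> float:
    if isinstance(tree, tuple):
        op, left, right = tree
        return FUNCS[op](evaluate(left, x), evaluate(right, x))
    return x if tree == "x" else tree


def mse(tree: Tree, samples: List[Tuple[float, float]]) -> float:
    """Fitness: mean squared error on (x, y) samples; non-finite errors count as +inf."""
    total = 0.0
    for x, y in samples:
        err = evaluate(tree, x) - y
        total += err * err          # multiplication (not **) never raises on overflow
    total /= len(samples)
    return total if math.isfinite(total) else math.inf


def mutate(tree: Tree, rng: random.Random) -> Tree:
    """Subtree mutation: walk down a random path and replace a node with a fresh subtree."""
    if not isinstance(tree, tuple) or rng.random() < 0.3:
        return random_tree(rng, 2)
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, mutate(left, rng), right)
    return (op, left, mutate(right, rng))


def evolve(samples, generations=200, max_depth=6, seed=0):
    """Mutation-only (1+1) evolutionary search; returns the best tree and its error history."""
    rng = random.Random(seed)
    best = random_tree(rng, 3)
    best_err = mse(best, samples)
    history = [best_err]
    for _ in range(generations):
        child = mutate(best, rng)
        if depth(child) <= max_depth:          # simple bloat control
            child_err = mse(child, samples)
            if child_err <= best_err:          # accept ties to allow neutral drift
                best, best_err = child, child_err
        history.append(best_err)
    return best, history
```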


Evolutionary algorithms, Genetic programming, Genetic Algorithms

We are interested in developing sustainable evolutionary algorithms that maintain their search capability without prematurely converging to local optima.
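Fitness sharing is one well-known technique for preserving diversity: each individual's raw fitness is divided by a niche count that grows with the number of similar individuals, so crowded regions of the search space become less attractive. The sketch below uses the standard triangular sharing function sh(d) = 1 - (d/σ)^α for d < σ on bit-string genomes with Hamming distance. It illustrates the general idea only and is not presented as the lab's sustainable evolutionary algorithm; σ and α are assumed parameters.

```python
from typing import List, Sequence


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length genomes differ."""
    return sum(x != y for x, y in zip(a, b))


def shared_fitness(pop: List[Sequence[int]], raw: List[float],
                   sigma: float, alpha: float = 1.0) -> List[float]:
    """Divide each raw fitness by its niche count sum_j sh(d_ij),
    where sh(d) = 1 - (d / sigma) ** alpha if d < sigma, else 0."""
    shared = []
    for i, ind in enumerate(pop):
        niche = 0.0
        for other in pop:                 # includes the individual itself (d = 0, sh = 1)
            d = hamming(ind, other)
            if d < sigma:
                niche += 1.0 - (d / sigma) ** alpha
        shared.append(raw[i] / niche)
    return shared
```

In the test case, two identical high-fitness genomes split their credit, so a distinct genome with lower raw fitness ends up with the highest shared fitness and is more likely to survive selection.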


Text Mining for Knowledge Discovery

How can we build a system that automatically monitors the literature and builds a domain knowledge base for reference by, e.g., materials scientists?
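A common first step toward such a system is to pull out the terms that characterize each document. The sketch below computes TF-IDF scores (term frequency times log inverse document frequency) over a small list of texts, such as paper abstracts, and returns each document's top-scoring terms. The tokenizer and ranking are simplifying assumptions; a real literature-mining pipeline would add domain-specific entity recognition for materials, properties, and methods.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase alphanumeric tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf(docs: List[str]) -> List[Dict[str, float]]:
    """TF-IDF per document: (count / doc length) * log(N / document frequency)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df: Counter = Counter()
    for toks in tokenized:
        df.update(set(toks))
    scores = []
    for toks in tokenized:
        if not toks:
            scores.append({})
            continue
        tf = Counter(toks)
        scores.append({t: (c / len(toks)) * math.log(n / df[t]) for t, c in tf.items()})
    return scores


def top_terms(docs: List[str], k: int = 3) -> List[List[str]]:
    """The k highest-scoring terms per document (ties broken alphabetically)."""
    return [sorted(s, key=lambda t: (-s[t], t))[:k] for s in tfidf(docs)]
```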


Collaborations with the following Labs

USC Normal Molecular Microbial Ecology
USC Toxicogenomics lab
USC biology department, Sean Place Lab on Environmental genomics
SC Colon Cancer Center
