The mission of our lab is to develop deep learning, machine learning, data mining, and evolutionary algorithms for knowledge discovery and innovation in bioinformatics, genomics, drug design, material informatics, and engineering design. We gratefully acknowledge the support of NVIDIA Corporation, which donated the Titan X Pascal GPU and Tesla K40 GPU used in our research.

Sponsors of our research

National Science Foundation


South Carolina Department of Transportation

Currently, we are working on these problems:

Protein-ligand binding prediction

Accurate determination of protein–ligand binding affinity is a fundamental problem in biochemistry, with applications including drug design and protein–ligand docking. A number of scoring functions have been proposed for predicting protein–ligand binding affinity. However, accurate prediction remains challenging: existing scoring functions often perform poorly when evaluated.

Material Informatics: Phase Mapping of High-throughput XRD data

High-throughput material experiments on synchrotron machines generate hundreds of samples every day. Mapping these samples onto a phase diagram and identifying the phases synthesized is a bottleneck for material discovery. We develop material informatics algorithms to mine such data sets. See our algorithms Autophase, GPhase, etc.

Breast Cancer Histological Image Analysis

We develop algorithms for automated analysis of breast cancer histopathology images. This research area has become particularly relevant with the advent of whole slide imaging (WSI) scanners, which perform cost-effective, high-throughput digitization of histopathology slides and aim to replace the optical microscope as the pathologist's primary tool. Breast cancer is the most prevalent form of cancer among women, and image analysis methods that target this disease have great potential to reduce the workload of a typical pathology lab and to improve the quality of interpretation. This work requires knowledge of tissue preparation, staining, and slide digitization processes, as well as image processing techniques and applications ranging from the analysis of tissue staining to computer-aided diagnosis and prognosis of breast cancer patients.

Protein localization prediction

The question is how to integrate different sources of information for precise prediction of protein target locations.

Protein Targeting Motif analysis

How can we map out all the protein targeting signals to essentially decode the targeting "zip codes"?

We are developing novel data mining algorithms for this problem.

Heme Protein binding residue prediction

We are developing computational algorithms for predicting heme protein binding residues involved in protein-ligand interaction. Both sequence and structural information are used. The web server can be accessed at hemeBIND.

Automated Design of Scoring functions for Virtual screening

There are more than 50 scoring functions available for virtual screening in structure-based drug design. We are asking: is there any way to do better by using data mining and machine learning tools?
Milestone: Guoyu Lu (Summer 2008) has developed a software pipeline for large-scale protein-ligand or protein-decoy docking and for scoring and ranking candidate drug molecules. The system runs efficiently on a Linux cluster. Potential collaborators are encouraged to contact us about possible projects. At the same time, we are developing novel machine learning algorithms to improve the ranking and scoring.

Computational Synthesis of Systems and Materials Using Genetic Programming

We are developing evolutionary algorithms for automated synthesis of engineering systems and materials based on genetic programming and simulation software. The purpose is to convert raw computational cycles into innovations and inventions.

Evolutionary algorithms, Genetic programming, Genetic Algorithms

We are interested in developing sustainable evolutionary algorithms that retain their search capability instead of converging prematurely to local optima.

Text Mining for Knowledge Discovery

How do we build a system that automatically monitors the literature and builds up domain knowledge for reference by, e.g., material scientists?

Collaborations with the following Labs

USC Normal Molecular Microbial Ecology
USC Toxicogenomics lab
USC biology department, Sean Place Lab on Environmental genomics
SC Colon Cancer Center
