We combined a machine learning algorithm with knowledge gleaned from hundreds of biological experiments to develop a technique that allows biomedical researchers to figure out the functions of the proteins that turn genes on and off in cells, called transcription factors. This knowledge could make it easier to develop drugs for a wide range of diseases.
Early on during the COVID-19 pandemic, scientists who worked out the genetic code of the RNA molecules of cells in the lungs and intestines found that only a small group of cells in these organs were most vulnerable to being infected by the SARS-CoV-2 virus. That allowed researchers to focus on blocking the virus’s ability to enter these cells. Our technique could make it easier for researchers to find this kind of information.
The biological knowledge we work with comes from this kind of RNA sequencing, which gives researchers a snapshot of the hundreds of thousands of RNA molecules in a cell as they are being translated into proteins. A widely praised machine learning tool, the Seurat analysis platform, has helped researchers all across the world discover new cell populations in healthy and diseased organs. This machine learning tool processes data from single-cell RNA sequencing without any information ahead of time about how these genes function and relate to each other.