bacLIFE: How AI distinguishes the good from the bad and the ugly


by Julian Voet

A study published in Nature last month describes how researchers used artificial intelligence to predict the "lifestyle" of certain bacteria, that is, whether they are beneficial or harmful. To do so, the researchers created an algorithm named "bacLIFE", which compares the genome of a species whose lifestyle is unknown with the genomes of similar species whose lifestyles are known.

Bacteria come in many shapes and sizes, and they have different lifestyles in relation to their plant or animal host. In plants, some bacteria increase growth and survival (e.g. by helping with nutrient uptake or protecting against insects), while others are harmful (e.g. by producing toxins that destroy plant cells). For many bacteria, the genes that determine their lifestyle are unknown. That is why the researchers developed bacLIFE, a tool that uses artificial intelligence to predict which genes determine a bacterium's lifestyle. bacLIFE can be used by geneticists everywhere and could be applied in agriculture, for example, to estimate the risk of bacterial infections.

How the bacLIFE algorithm works

bacLIFE takes a bacterial genome sequence as input and works in three steps. In the clustering step (i), the algorithm extracts functional genes from the genome and groups them by function. The result is a database of genes. In the lifestyle prediction step (ii), an AI model trained on bacteria with known lifestyles uses the database from step i to predict the lifestyle of the bacterium. In the analytical step (iii), the prediction is displayed in user-friendly graphs and can serve as a starting point for finding new genes related to lifestyle.
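To make the three steps concrete, here is a minimal sketch in Python. It is not bacLIFE's actual code: the toy genomes, the function labels, and the helper names are all invented for illustration, and a simple nearest-neighbour vote on gene-function overlap stands in for the trained AI model, which this article does not describe in detail.

```python
from collections import Counter

# Hypothetical toy data: species -> (known lifestyle, [(gene_id, function), ...]).
# Real input would be annotated genome sequences; every name here is invented.
REFERENCE = {
    "species_A": ("plant pathogen", [("a1", "toxin"), ("a2", "secretion"), ("a3", "sugar_transport")]),
    "species_B": ("plant pathogen", [("b1", "toxin"), ("b2", "sugar_transport")]),
    "species_C": ("environmental", [("c1", "nitrogen_fixation"), ("c2", "siderophore")]),
    "species_D": ("environmental", [("d1", "nitrogen_fixation"), ("d2", "sugar_transport")]),
}

def cluster_genes(genes):
    """Step i: group genes by function and keep the set of functions present."""
    return {function for _gene_id, function in genes}

def jaccard(a, b):
    """Share of gene functions two genomes have in common (0 to 1)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def predict_lifestyle(profile, reference_profiles, k=3):
    """Step ii (stand-in model): majority vote among the k most similar genomes."""
    ranked = sorted(reference_profiles, key=lambda item: jaccard(profile, item[1]), reverse=True)
    votes = Counter(lifestyle for lifestyle, _profile in ranked[:k])
    return votes.most_common(1)[0][0]

def summarise(name, profile, lifestyle):
    """Step iii: a plain-text stand-in for the graphical report."""
    return f"{name}: predicted '{lifestyle}' from {len(profile)} gene functions"

def run_pipeline(name, genes, reference=REFERENCE):
    """Chain the three steps for one query genome."""
    reference_profiles = [(lifestyle, cluster_genes(g)) for lifestyle, g in reference.values()]
    profile = cluster_genes(genes)
    lifestyle = predict_lifestyle(profile, reference_profiles)
    return lifestyle, summarise(name, profile, lifestyle)

lifestyle, text = run_pipeline("query_species", [("q1", "toxin"), ("q2", "secretion")])
print(text)
```

The point of the sketch is the data flow: step i turns a genome into a compact profile, step ii compares that profile with profiles of known lifestyle, and step iii reports the outcome.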

Testing the bacLIFE algorithm with case studies

To test the accuracy of bacLIFE's predictions, the researchers performed a case study with two bacterial genera, Burkholderia and Pseudomonas. Each genus contains multiple species, and each species is classified into one of three lifestyles: environmental (beneficial), animal pathogen, or plant pathogen.

The researchers tested bacLIFE with the "leave one species out" method. They withheld one Burkholderia and two Pseudomonas species, all with known lifestyles, from bacLIFE. They then ran the algorithm several times to test whether it correctly predicted the lifestyles of the withheld species. For Burkholderia, bacLIFE's prediction was correct in 90% of cases; for the Pseudomonas species, the accuracy was 70%.
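The idea behind "leave one species out" can be sketched in a few lines of Python. This is an illustrative assumption, not the study's procedure: the sketch withholds every species in turn (the researchers withheld three specific species and repeated their runs), the toy gene-function profiles are invented, and a nearest-neighbour rule again stands in for the real model.

```python
# Hypothetical toy data: species -> (known lifestyle, set of gene functions).
DATASET = {
    "species_A": ("plant pathogen", {"toxin", "secretion", "sugar_transport"}),
    "species_B": ("plant pathogen", {"toxin", "sugar_transport"}),
    "species_C": ("environmental", {"nitrogen_fixation", "siderophore"}),
    "species_D": ("environmental", {"nitrogen_fixation", "siderophore", "sugar_transport"}),
    "species_E": ("animal pathogen", {"toxin", "secretion", "adhesin"}),
}

def jaccard(a, b):
    """Share of gene functions two genomes have in common (0 to 1)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def nearest_lifestyle(profile, training):
    """Stand-in model: copy the lifestyle of the most similar training genome."""
    lifestyle, _profile = max(training, key=lambda item: jaccard(profile, item[1]))
    return lifestyle

def leave_one_species_out(dataset):
    """Withhold each species in turn, predict it from the rest, and score the result."""
    outcomes = {}
    for held_out, (true_lifestyle, profile) in dataset.items():
        training = [entry for name, entry in dataset.items() if name != held_out]
        outcomes[held_out] = nearest_lifestyle(profile, training) == true_lifestyle
    accuracy = sum(outcomes.values()) / len(outcomes)
    return accuracy, outcomes

accuracy, outcomes = leave_one_species_out(DATASET)
print(f"accuracy: {accuracy:.0%}")
for species, correct in outcomes.items():
    print(species, "correct" if correct else "wrong")
```

Because the withheld species never contributes to the comparison set, a correct prediction shows that the model generalises rather than merely remembering what it has already seen. In this toy data the only animal pathogen cannot be predicted once it is withheld, which illustrates why a lifestyle needs more than one known example.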

Another method the researchers applied was a principal coordinate analysis (PCoA), in which the genomes of all species with known lifestyles were arranged by similarity. Each lifestyle formed its own cluster. The researchers then let bacLIFE make predictions for species with unknown lifestyles and added them to the PCoA. The predicted lifestyles neatly overlapped with the known clusters, with an average accuracy of 85%.
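For readers curious how a PCoA arranges genomes by similarity, the following Python sketch computes only the first coordinate axis. It rests on several assumptions that are not from the study: the toy profiles are invented, Jaccard distance is used as the dissimilarity measure, and the leading axis is found with a simple power iteration instead of a full eigendecomposition.

```python
import math

# Hypothetical toy data: species -> (known lifestyle or None, set of gene functions).
PROFILES = {
    "species_A": ("plant pathogen", {"toxin", "secretion", "sugar_transport"}),
    "species_B": ("plant pathogen", {"toxin", "sugar_transport"}),
    "species_C": ("environmental", {"nitrogen_fixation", "siderophore"}),
    "species_D": ("environmental", {"nitrogen_fixation", "siderophore", "sugar_transport"}),
    "query": (None, {"toxin", "secretion"}),
}

def jaccard_distance(a, b):
    """Dissimilarity of two gene-function sets (0 = identical, 1 = nothing shared)."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def gower_centre(dist):
    """Double-centre the squared distances: B = -1/2 * J * D^2 * J."""
    n = len(dist)
    a = [[-0.5 * dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in a]
    grand_mean = sum(row_mean) / n
    return [[a[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)] for i in range(n)]

def top_eigenpair(matrix, iterations=500):
    """Power iteration: dominant eigenvalue and eigenvector of a symmetric matrix."""
    n = len(matrix)
    vec = [1.0 / (i + 1) for i in range(n)]  # arbitrary, non-constant start vector
    value = 0.0
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0.0:
            break
        vec = [x / norm for x in product]
        value = sum(vec[i] * sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n))
    return value, vec

def first_axis(profiles):
    """Coordinates of every genome on the first principal coordinate."""
    n = len(profiles)
    dist = [[jaccard_distance(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]
    value, vec = top_eigenpair(gower_centre(dist))
    scale = math.sqrt(max(value, 0.0))
    return [scale * x for x in vec]

names = list(PROFILES)
coords = dict(zip(names, first_axis([PROFILES[name][1] for name in names])))

def nearest_cluster(name):
    """Lifestyle whose known members lie closest, on average, along the first axis."""
    groups = {}
    for other, (lifestyle, _profile) in PROFILES.items():
        if lifestyle is not None:
            groups.setdefault(lifestyle, []).append(coords[other])
    return min(groups, key=lambda l: abs(coords[name] - sum(groups[l]) / len(groups[l])))

print(nearest_cluster("query"))
```

A full PCoA would keep two or more axes and plot them; a single axis is enough here to show that genomes sharing a lifestyle end up on the same side, and that a species of unknown lifestyle can be placed next to the cluster it most resembles.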

Finding genes associated with lifestyle

With the mystery of lifestyles resolved, the researchers set out to find the genes that are responsible for these lifestyles. First, they tested whether bacLIFE can accurately predict known lifestyle-genes in one Burkholderia and one Pseudomonas species. In total, bacLIFE predicted 786 genes in Burkholderia and 377 in Pseudomonas. About 70% of these genes were known to be involved in plant toxicity. In Burkholderia, some of these genes corresponded to plant toxins, structures involved in releasing toxins, and signaling molecules used to communicate with other bacteria ("quorum sensing"). In Pseudomonas, several genes corresponded to toxin release and copper resistance (copper is toxic to Pseudomonas). The two gene sets even overlap by 1-3%; the shared genes correspond to sugar transporters, which the bacteria need to draw sugars from plants.

The remaining 30% of predicted lifestyle-genes had no known function. From these, the researchers selected 13 for lab testing, in which they examined how each gene affected the growth of young rice plants. Five of the 13 genes decreased plant growth, and three of those five had been predicted with high accuracy by bacLIFE. The researchers noted that these three genes were located near many other lifestyle-genes, and suggested that regions rich in lifestyle-genes are a promising lead for future research.

The future of bacLIFE

bacLIFE was developed by a consortium of researchers from Leiden University, the Netherlands Institute of Ecology, and the University of Málaga. The consortium is currently working to expand bacLIFE so that it can also take the genomes of fungi and yeasts as input.