If you ask a biologist where the future is heading in life science research, the answer beyond doubt would be omics. So, what is omics? Why do we just append omics to everything? How do scientists use omics to decipher the workings of the fascinating human machine?
“If human and chimp DNA is 98.8 percent the same, why are we so different? ”
The answer to that is just 1.2 percent of the difference is equal to around 35 million differences.
What is Omics?
Omics or Multi-omics is an informally coined terminology that refers to the study of information-rich areas in biology which ends with a suffix of the same term ‘-omics’. Members of this group include genomics, transcriptomics, proteomics, metabolomics, and phenomics to name a few. Omics allows scientists to look into life from a bigger picture. It is a relatively new area with a lot of possibilities to be explored and problems to be addressed. Omics collectively refers to technologies used to explore the role, structure, and the various types of molecules in a biological system. The impact of omics is mostly seen in medicine but has also recently been exploited in space biology research.
“Surprising to know we share 60% of our DNA with bananas, right? Who knew a mere 40% difference in DNA between a human and a banana would make so much of a difference and make humans such a complex organism?”
Human Genome Project(HGP)
Doesn't that ring a bell? The Human Genome Project (HGP) is considered to be one of the greatest scientific achievements in history. An international group of researchers embarked on this enlightening biological journey to examine the entire genome, the entire set of DNA of a few organisms. The HGP, which started in October 1990 and ended in April 2003, is best known for producing the very first sequence of the human genome. This feat gave scientists important insights into the human genetic code and has since accelerated the research in human biology and medical care. It has enabled the development of personalised medicine, where treatments can be altered to one’s genetic makeup. Researchers can now determine how a patient will respond to a certain drug using the genetic information helping in designing a better, effective and safer treatment. The mapping of the human genome has helped researchers to pinpoint the mutations or deviations from the normal set in many diseases like cystic fibrosis, cancer and Alzheimer’s to name a few. It has helped the biotechnology industries bloom and give rise to gems like gene therapy, CRISPR- Cas9 etc. which have given patients a ray of hope. How fascinating is it to know that you can tweak a gene to help your body fight against a disease? Silencing or upregulating a gene can help your body survive the ravages of a disease. This came as a blessing in disguise. But did we reach the endgame? Or are we only looking at the tip of the iceberg?
“Did you know that the entire information encoded in our DNA which is fundamentally the code of life can be fit into a standard CD-ROM, not that anybody uses it anymore?”
Yes, you read that right. The raw nucleotide sequence of DNA when encoded in a 2-bit binary value for the four nucleotides, takes up almost about 715 megabytes of storage. But in reality, to store genetic information after sequencing and assigning quality scores to each read, the file size usually sums up to 200 gigabytes. A text-based bioinformatic data format called FASTA is used to store amino acid or nucleotide sequences.. Here’s how a FASTA file looks,
Let’s look into some of the most commonly studied ‘omes’. You can jump to a specific ome of interest to learn in great detail.
Genomics: The cell’s hardware or the Life’s code.
To state it in simple terms, genomics refers to the information stored in the genetic material of an organism. All living things from tiny bacteria to large trees have a genome, and humans in particular have it as DNA. Genes are nothing but sections of DNA like beads on a string which are responsible for carrying out a specific function. Tiny changes in the sequence of genes called mutations can lead to a disease. Of the 100,000 estimated genes in humans, only 20,000 genes are identified and assigned a protein-coding function. This just opens the horizon for genomics research leading to immense impact in medicine and how we treat disease.
Genes are just a trailer and not the whole story. Genomics accounts for the entire protein-coding and non-coding sections of DNA while genetics focuses on genes, heredity and variation in living organisms. With the advancements in Next Generation Sequencing, exciting avenues are being opened in genomics research. The cost of sequencing the entire genome has seen a significant decrease from $1000 in 2014 to $600 at present. The cost is only going to decrease as advancements are being made in NGS. But there is a catch, as costs go down, sequence data will only rise exponentially. Will data analysis catch up with the speed of data being generated? This means we have to gear up to handle the data flood in the field of genomics which will eventually occur in the near future.
Transcriptomics - The raven that brings the message.
Though it's the gene that holds the information for life to sustain, it directly can’t perform any of the functions it is intended for. Proteins are the molecular machines which get the work done. Genes transcribe the information like a clue to make the proteins which carry out the function. Transcriptomics entails information on the structure and function of the gene transcripts, the RNAs. RNAs are genetic material that differs from DNA by a single nucleotide residue; thymine is replaced by its analogue, uracil. It is the RNA that is being used widely in diagnostic applications. We all have heard the term RT-PCR during the pandemic which is nothing but a diagnostic tool to quantity the amount of COVID-19 RNA present in our body.
RNAs are versatile and volatile; they can produce multiple protein copies and a variety of protein structures that can perform a range of functions. RNAs accomplish this task by something called post-transcriptional modifications. In certain diseases, when the gene responsible for the abnormal phenotype is not known, RNAs act as a guiding source. By determining the relative expression of an RNA in diseased conditions, the genes can be mapped and contribute to the treatment of such diseases. RNAs are studied by sequencing them just like DNA but since RNAs are single-stranded and unstable; they have to be converted to DNA molecules using specific enzymes called reverse transcriptases. Microarray experiments are carried out to determine the relative expression that is whether the gene is overexpressing or under expressing itself.
“Angelina Jolie, an American actress underwent a preventative double mastectomy when she underwent genetic testing and it reported a faulty BRCA1 gene that had an 87% risk of developing breast cancer.”
Proteomics - Proteins get it done.
Proteins are molecular machines which fundamentally perform the function that the gene codes for. While transcriptomics informs whether the gene is turned on or off, proteins once formed can modify their structure to carry out a range of functions. In 99% of diseases, proteins are usually the targets to cure them. When we are sick we take paracetamol which reduces the fever by binding to a protein that is responsible for causing the fever. It does so by reducing the activity of the protein or entirely stopping it from performing its function. Since, proteomics techniques face sensitivity issues, and interpreting the results can be difficult, genomics studies are carried out and also cheap.
Compared to proteins, genes can be easily detected and sequences as proteins are mobile structures and can be present anywhere in the cell. But with technology advancing, the gap is only becoming narrow as the days pass by. To study proteome in a tissue sample, scientists use mass spectrometry which breaks down the protein into smaller fragments and detects them from their characteristic charge-to-mass ratio.
With the use of advanced techniques like multi-omics, scientists can now examine living systems from a different perspective by integrating information from many biomolecular levels like proteins, DNA, RNA, metabolites and epigenetic markings. In biomedical research, multi-omics has been used to resolve many problems pertaining to discoveries of disease, drugs, and treatments. The identification and validation of novel drug targets for numerous diseases is one of the most important applications of multi-omics. Drug targets are molecules that can be can be tweaked to change the phenotype or state of the disease[Ivanisevic and Sewduth, 2023].
“99.9% of the genome is the same in all humans. It is that 0.1% difference that makes us who we are.”
Properties of Omics Data
Omics data is
- Multidimensional - Omics data is multidimensional meaning it has multiple columns/attributes/features. These features are complex covering the cellular conditions, profiles, physical parameters, etc.,
- High throughput - With advancements in next-generation sequencing technologies, biological molecules can be read in less time and cost than ever before. In the current scenario, researchers can sequence the entire genome for $1000 in less than a day.
- Heterogeneous - Omics data is highly heterogeneous with data being stored in different formats, structures, types and sources. Datasets within a database can be stored in various task-specific formats.
- Geographically Distributed - Different ethnic groups have different genetic profiles. People in India are more prone to diabetes than the Western countries. Omics data profiled in different geographical areas are entirely different and need to be analysed with the genetic variations in mind.
“Cheetahs are genetic clones of each other. This is because their ancestors almost went extinct during the last ice age. The remaining small group of wild cats interbred, and hence, we now have genetically identical cheetahs.”
Challenges with Omics data analysis
- Often data is obtained from a variety of sources which is of variable quality and makes the integration of the datasets difficult.
- Data infrastructure is another major issue in storing and managing large amounts of data.
- Data visualisation is crucial to interpret the information in omics data and current bioinformatics tools fall short to effectively visualise data.
- The diversity of omics data makes it difficult to preprocess and analyse. Researchers spend more time on data wrangling than spending time on extraction of novel insights.
- Most of the time in omics data, the number of attributes exceeds the number of samples which leads to data sparsity.
- Omics data have a lot of missing values. One possible reason can be attributed to the fact that when tests get expensive to obtain data, the samples are fragmented leaving only data in fragments.
What is GenAI?
Generative AI is a subset of AI that learns patterns on the training data given as input and generates various forms of task-specific content for a given prompt. Since the advent of OpenAI’s ChatGPT, the use of generative AI has been profound in the field of biological sciences.
“Did you know that GenAI has been around since the 1960s? it's not new”
How are traditional AI and Generative AI different?
Traditional AI or discriminative AI concentrates on the recognition of patterns and makes data-driven decisions. Generally, these systems are designed in such a way that they can group data, predict results or make decisions. A typical AI model can determine if an email is spam or not based on past data. Pretty basic!
Tom Freston rightly said, “Innovation is taking two things that exist and putting them together in a new way”. Similarly, Generative AI, aka GenAI uses the patterns observed in the data to create new content. Instead of simply grouping or predicting, it produces new data which is similar to the input data. Amusingly, it can write, compose music or draw.
GenAI’s application in Omics technology
By analysing large data from omics studies, GenAI can:
- Generate plots to visualise omics data.
- Determine biomarker signature profiles from RNA-seq data.
- Assist in functional genomics where function is associated with newly discovered genes.
- Analyse significant differences in metabolite profiles and identify disease-causing pathways.
- Assist in personalised treatment and care for patients from different ethnic groups and races.
- GenAI offers an effective way to integrate the information spread across multiple datasets of various formats.
In the ever-changing landscape of life sciences, clinical investigators working in proteomics or genomics are provided with an option to lead the drug discovery process from the protracted conventional omics approaches to AI-powered solutions. Agilisium’s Gene Inspector leverages the abilities of GenAI to predict accurate gene targets for drug action which is one of the main factors leading to a 90% failure rate in clinical trials. The research study is organised and the target identification is concise. With these, the loopholes in target selection or poor choices made in target selection are covered.
Life is made easy, thanks to Gene Inspector!