Bio-informatics in Drug Development

Bioinformatics – Introduction
The genomic era has seen a massive explosion in the amount of biological information available due to huge advances in the fields of molecular biology and genomics.
Bioinformatics is the application of computer technology to the management and analysis of biological data. The result is that computers are being used to gather, store, analyze and merge biological data.
Bioinformatics – the genomic era
Bioinformatics is an interdisciplinary research area that is the interface between the biological and computational sciences. The ultimate goal of bioinformatics is to uncover the wealth of biological information hidden in the mass of data and obtain a clearer insight into the fundamental biology of organisms. This new knowledge could have profound impacts on fields as varied as human health, agriculture, the environment, energy and biotechnology.
Why is bioinformatics important?
The greatest challenge facing the molecular biology community today is to make sense of the wealth of data that has been produced by the genome sequencing projects. Traditionally, molecular biology research was carried out entirely at the experimental laboratory bench but the huge increase in the scale of data being produced in this genomic era has seen a need to incorporate computers into this research process.
Sequence generation, and its subsequent storage, interpretation and analysis, are entirely computer-dependent tasks. However, the molecular biology of an organism is very complex, with research being carried out at different levels including the genome, proteome, transcriptome and metabolome. Following the explosion in the volume of genomic data, similar increases in data have been observed in the fields of proteomics, transcriptomics and metabolomics.
The first challenge facing the bioinformatics community today is the intelligent and efficient storage of this mass of data. It is then their responsibility to provide easy and reliable access to this data. The data itself is meaningless before analysis and the sheer volume present makes it impossible for even a trained biologist to begin to interpret it manually. Therefore, incisive computer tools must be developed to allow the extraction of meaningful biological information.

There are three central biological processes around which bioinformatics tools must be developed:
• DNA sequence determines protein sequence
• Protein sequence determines protein structure
• Protein structure determines protein function
The integration of information learned about these key biological processes should allow us to achieve the long term goal of the complete understanding of the biology of organisms.
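The first of these processes, DNA sequence determining protein sequence, can be illustrated with a minimal Python sketch. It assumes the standard genetic code, a coding-strand DNA sequence read in the first frame, and a hypothetical helper name (translate); it is an illustration only, not the method used by any particular tool.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding-strand DNA sequence, frame 1, up to the first stop codon.

    Codons containing characters other than A, C, G, T are reported as 'X'.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")
        if amino_acid == "*":  # stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCAAGTAA"))
```

Here the codons ATG, GCC and AAG are read as methionine, alanine and lysine, and TAA ends translation. The later steps (sequence to structure, structure to function) have no comparably simple lookup table, which is part of why dedicated tools are needed.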
Biological databases
Biological databases are archives of consistent data that are stored in a uniform and efficient manner. These databases contain data from a broad spectrum of molecular biology areas. Primary or archived databases contain information and annotation of DNA and protein sequences, DNA and protein structures and DNA and protein expression profiles.
Secondary or derived databases are so called because they contain the results of analysis on the primary resources including information on sequence patterns or motifs, variants and mutations and evolutionary relationships. Information from the literature is contained in bibliographic databases, such as Medline.
It is essential that these databases are easily accessible and that an intuitive query system is provided to allow researchers to obtain very specific information on a particular biological subject. The data should be provided in a clear, consistent manner with some visualization tools to aid biological interpretation.
Specialist databases for particular subjects have been set up, for example the EMBL database for nucleotide sequence data, the UniProtKB/Swiss-Prot protein database and PDB, a 3D protein structure database.
Scientists also need to be able to integrate the information obtained from the underlying heterogeneous databases in a sensible manner in order to get a clear overview of their biological subject. SRS (Sequence Retrieval System) is a powerful querying tool provided by the EBI that links information from more than 150 heterogeneous resources.
Biological applications

Once all of the biological data is stored consistently and is easily available to the scientific community, the requirement is then to provide methods for extracting the meaningful information from the mass of data. Bioinformatics tools are software programs that are designed to carry out this analysis step.

Factors that must be taken into consideration when designing these tools are:
• The end user (the biologist) may not be a frequent user of computer technology
• These software tools must be made available over the internet given the global distribution of the scientific research community
The EBI provides a wide range of biological data analysis tools that fall into the following four major categories:
• Similarity Searching Tools
• Protein Function Analysis
• Structural Analysis
• Sequence Analysis
Similarity Searching Tools
Homologous sequences are sequences that are related by divergence from a common ancestor. Homology is therefore either true or false, whereas the degree of similarity between two sequences can be measured. This set of tools can be used to identify similarities between novel query sequences of unknown structure and function and database sequences whose structure and function have been elucidated.
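To make the idea of a measurable degree of similarity concrete, the sketch below computes a global alignment score between two sequences by dynamic programming (the Needleman-Wunsch approach). The scoring values (match +1, mismatch -1, gap -1) and the function name are assumptions chosen for illustration. Database search tools rely on faster heuristics and on substitution matrices rather than this exhaustive calculation.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 1,
                           mismatch: int = -1,
                           gap: int = -1) -> int:
    """Return the best global alignment score of a and b (Needleman-Wunsch).

    Only the score is computed, not the alignment itself, so two rows
    of the dynamic programming table are enough.
    """
    # Row 0: aligning the empty prefix of a against prefixes of b costs only gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of a against the empty prefix of b
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diagonal,            # align a[i-1] with b[j-1]
                            prev[j] + gap,       # gap in b
                            curr[j - 1] + gap))  # gap in a
        prev = curr
    return prev[-1]


print(global_alignment_score("GATTACA", "GATTACA"))
print(global_alignment_score("GATTACA", "GATTTACA"))
```

A higher score means greater similarity under the chosen scoring scheme. Whether two similar sequences are actually homologous remains an inference drawn from that score, not something the score states directly.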
Protein Function Analysis
This group of programs allows you to compare your protein sequence to the secondary (or derived) protein databases that contain information on motifs, signatures and protein domains. Highly significant hits against these different pattern databases allow you to approximate the biochemical function of your query protein.
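As a rough illustration of motif searching, the sketch below converts a pattern written in PROSITE-style syntax into a regular expression and scans a protein sequence with it. The example pattern, N-{P}-[ST]-{P}, is the PROSITE N-glycosylation site pattern. The helper names and the test sequence are made up for illustration, and only a subset of the syntax is handled (x, [..], {..} and repeat counts in parentheses), so this should not be read as how any pattern database implements its search.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE-style pattern into a regular expression string.

    Handles only: x (any residue), [ABC] (one of), {ABC} (none of),
    and repeats such as x(2) or x(2,4). Anchors (< and >) are not supported.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        element = element.replace("{", "[^").replace("}", "]")  # {P}    -> [^P]
        element = element.replace("(", "{").replace(")", "}")   # x(2,4) -> x{2,4}
        element = element.replace("x", ".")                     # x      -> .
        parts.append(element)
    return "".join(parts)


def find_motif(sequence: str, pattern: str) -> list:
    """Return (0-based start, matched text) for every hit, including overlapping ones."""
    # A lookahead with a capture group lets finditer report overlapping matches.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence)]


print(prosite_to_regex("N-{P}-[ST]-{P}"))
print(find_motif("MNGSAANPSQ", "N-{P}-[ST]-{P}"))
```

In the made-up sequence, the asparagine at index 1 is followed by G, S, A and so matches, while the asparagine at index 6 is followed by proline and does not. Short patterns like this one match by chance quite often, which is why the significance of a hit matters.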
Structural Analysis
This set of tools allows you to compare structures with the known structure databases. The function of a protein is more directly a consequence of its structure than of its sequence, and structural homologs tend to share functions. The determination of a protein’s 2D/3D structure is therefore crucial in the study of its function.
Sequence Analysis
This set of tools allows you to carry out further, more detailed analysis on your query sequence, including evolutionary analysis and the identification of mutations, hydropathy regions, CpG islands and compositional biases. These and other biological properties are all clues that help to elucidate the specific function of your sequence.
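Two of the properties mentioned above, compositional bias and CpG islands, can be sketched in a few lines of Python. The functions below compute GC content and the observed/expected CpG ratio of a DNA window, then apply the commonly cited Gardiner-Garden and Frommer thresholds (length of at least 200 bp, GC content above 50%, observed/expected ratio above 0.6). The function names are hypothetical and the thresholds are an assumption about which definition to use. Real tools scan a sequence with sliding windows and merge the hits.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G)."""
    seq = seq.upper()
    c_count, g_count = seq.count("C"), seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c_count * g_count)


def looks_like_cpg_island(window: str) -> bool:
    """Apply the assumed thresholds to a single window of sequence."""
    return (len(window) >= 200
            and gc_content(window) > 0.5
            and cpg_obs_exp(window) > 0.6)


print(gc_content("CGCGCGAT"))
print(looks_like_cpg_island("CG" * 100))
print(looks_like_cpg_island("AT" * 100))
```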
Real world applications of bioinformatics

The science of bioinformatics has many beneficial uses in the modern day world.

1. Molecular medicine
The human genome sequence will have profound effects on the fields of biomedical research and clinical medicine. Every disease has a genetic component. This may be inherited (as is the case with an estimated 3000-4000 hereditary diseases, including cystic fibrosis and Huntington's disease) or a result of the body’s response to an environmental stress which causes alterations in the genome (e.g. cancers, heart disease, diabetes).
The completion of the human genome means that we can search for the genes directly associated with different diseases and begin to understand the molecular basis of these diseases more clearly. This new knowledge of the molecular mechanisms of disease will enable better treatments, cures and even preventative tests to be developed.
2. Microbial genome applications
Microorganisms are ubiquitous; that is, they are found everywhere. They have been found surviving and thriving in extremes of heat, cold, radiation, salt, acidity and pressure. They are present in the environment, our bodies, the air, food and water.
Traditionally, a variety of microbial properties have been used in the baking, brewing and food industries. The arrival of complete genome sequences, and their potential to provide greater insight into the microbial world and its capacities, could have broad and far-reaching implications for environmental, health, energy and industrial applications. For these reasons, in 1994, the US Department of Energy (DOE) initiated the MGP (Microbial Genome Program) to sequence genomes of bacteria useful in energy production, environmental cleanup, industrial processing and toxic waste reduction.
By studying the genetic material of these organisms, scientists can begin to understand these microbes at a very fundamental level and isolate the genes that give them their unique abilities to survive under extreme conditions.

3. Agriculture
The sequencing of the genomes of plants and animals should have enormous benefits for the agricultural community. Bioinformatics tools can be used to search for the genes within these genomes and to elucidate their functions. This specific genetic knowledge could then be used to produce stronger crops that are more resistant to drought, disease and insects, and to improve the quality of livestock, making them healthier, more disease resistant and more productive.
4. Animals
Sequencing projects of many farm animals including cows, pigs and sheep are now well under way in the hope that a better understanding of the biology of these organisms will have huge impacts for improving the production and health of livestock and ultimately have benefits for human nutrition.
The following resources contain more information:
• Ensembl genome browser
• Animal databases at the Roslin Institute
5. Comparative studies

Analyzing and comparing the genetic material of different species is an important method for studying the functions of genes, the mechanisms of inherited diseases and species evolution. Bioinformatics tools can be used to make comparisons between the numbers, locations and biochemical functions of genes in different organisms.

Organisms that are suitable for use in experimental research are termed model organisms. They have a number of properties that make them ideal for research purposes: they have short life spans, reproduce rapidly, are easy to handle and inexpensive, and can be manipulated at the genetic level. An example of a model organism for humans is the mouse. Mouse and human are very closely related (>98%) and for the most part we see a one-to-one correspondence between genes in the two species. Manipulation of the mouse at the molecular level and genome comparisons between the two species can reveal, and are revealing, detailed information on the functions of human genes, the evolutionary relationship between the two species and the molecular mechanisms of many human diseases.

