Epigenomics is the study of the complete set of epigenetic modifications on DNA such as DNA methylation. DNA methylation is the process by which a methyl group is added to DNA. Data represents the methylated-cytosine sites identified by high-throughput sequencing technologies. The methylation patterns of the experimental group can be analyzed by comparing the control group with the experimental group.
The genomic profile is the main characteristic of individuals that decides phenotypes. Mutations like SNP (Single Nucleotide Polymorphism), CNV (Copy Number Variation) are detected by the sequencing techniques and represented as the binary vectors. These feature can be used to diagnose whether the sample will get a disease or predict the prognosis of tumor.
RNA is the main carrier of genetic information that is responsible for the process of converting DNA into the protein. The expression strength represents the density of sequencing reads corresponding to each sample. Therefore, data means the expression level of each gene to each sample. Differentially expressed gene can be analyzed by comparing the control group with the experimental group.
The metabolome represents the complete set of metabolites in a biological cell, tissue, organ or organism, which are the end products of cellular processes. The data represents the amount of metabolites obtained by liquid chromatography mass spectrometry or NMR spectroscopy. Biomarker metabolites can be discovered by comparing the control group with the experimental group.
As biological processes in a cell is a densly connected large map, it can be represented as the graph. The interactome is the graph-based data that represents those biological process; interactions between proteins, interactions between a protein and a compound. Furthermore the relationship between a disease and a gene, a drug can be good example of interactome. Those interactome helps to analyze the result from other type of data. As verified relations in the interactome are the small parts of real biological process, predicting unverified interactomes is another important problem in the computational biology field.
A drug is a chemical substance(compound). Because compound (molecule) does not have a start point and endpoint, to represent compound in computer, SMILES or graph are often used. SMILES(simplified molecular-input line-entry system) is a specification in the form of a line notation for describing the structure of chemical species using short ASCII strings. The graph is a set of vertices and edges where the atom is represented as node and bond as the edge.
A human has DNAs, which contain genetic information. DNAs are transcribed to RNA. RNAs are translated into protein. Proteins consist of chains of amino acid residues. To represent proteins in the computer, sequence information such as AAC(Amino Acid Composition) and protein domain is used.