- 
              ANNOVAR
            ANNOVAR is an efficient software tool that uses up-to-date information to functionally
            annotate genetic variants detected from diverse genomes (including human genome hg18, hg19, as
            well as mouse, worm, fly, yeast and many others). Given a list of variants with
            chromosome, start position, end position, reference nucleotide and observed
            nucleotides, ANNOVAR can perform: (i) Gene-based annotation: identify whether
            SNPs or CNVs cause protein coding changes and the amino acids that are affected.
            (ii) Region-based annotations: identify variants in specific genomic regions, for
            example, conserved regions among 44 species, predicted transcription factor binding
            sites, segmental duplication regions, GWAS hits, database of genomic variants, DNase
            I hypersensitivity sites, ENCODE H3K4Me1/H3K4Me3/H3K27Ac/CTCF sites, ChIP-Seq peaks,
            RNA-Seq peaks, or many other annotations on genomic intervals. (iii) Filter-based
            annotation: identify variants that are reported in dbSNP, identify the subset
            of common SNPs (MAF>1%) in the 1000 Genomes Project, identify the subset of non-synonymous
            SNPs with SIFT score>0.05, find intergenic variants with GERP++ score>2, or many
            other annotations on specific mutations.
          
           - 
              bedtools
            Collectively, the bedtools utilities are a Swiss Army knife of tools for
            a wide range of genomics analysis tasks. The most widely used tools enable
            genome arithmetic: that is, set theory on the genome. For example, bedtools
            allows one to intersect, merge, count, complement, and shuffle genomic
            intervals from multiple files in widely-used genomic file formats such
            as BAM, BED, GFF/GTF, and VCF.
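
The idea of set theory on genomic intervals can be sketched in a few lines of plain Python. This is only an illustration of the concept, not how bedtools is implemented; the function names `merge` and `intersect` and the sample intervals are hypothetical, and coordinates are assumed to be 0-based and half-open, as in BED files.

```python
# Illustrative only: plain-Python interval arithmetic, not bedtools' own code.
# Intervals are (chrom, start, end) tuples with 0-based, half-open coordinates.

def merge(intervals):
    """Collapse overlapping or directly adjacent intervals on the same chromosome."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            last = merged[-1]
            merged[-1] = (chrom, last[1], max(last[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def intersect(a, b):
    """Return the segments shared by two interval lists (quadratic, for clarity)."""
    out = []
    for chrom_a, start_a, end_a in a:
        for chrom_b, start_b, end_b in b:
            if chrom_a != chrom_b:
                continue
            start, end = max(start_a, start_b), min(end_a, end_b)
            if start < end:  # non-empty overlap
                out.append((chrom_a, start, end))
    return sorted(out)


# Hypothetical example data
peaks = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 10, 50)]
genes = [("chr1", 180, 400), ("chr2", 60, 90)]

merged_peaks = merge(peaks)
overlaps = intersect(merged_peaks, genes)
print(merged_peaks)
print(overlaps)
```

Real tools avoid the quadratic comparison shown here, for example by sweeping over sorted input.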
          
           - 
              Bowtie
            Bowtie is an ultrafast, memory-efficient short read aligner. It aligns short
            DNA sequences (reads) to the human genome at a rate of over 25 million 35-bp
            reads per hour. Bowtie indexes the genome with a Burrows-Wheeler index to keep
            its memory footprint small: typically about 2.2 GB for the human genome
            (2.9 GB for paired-end).  
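
To give a feel for what a Burrows-Wheeler index allows, the sketch below builds the transform naively from sorted rotations and counts exact pattern matches by backward search. It is a toy under stated assumptions: the function names are hypothetical, the construction is far less efficient than anything a real aligner would use, and it handles exact matching only.

```python
# Toy Burrows-Wheeler transform and exact-match counting; illustrative only.

def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (naive construction)."""
    text += "$"  # sentinel; sorts before the letters A, C, G, T
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(bwt_text, pattern):
    """Count exact occurrences of pattern using backward search on the BWT."""
    # first[c] = number of characters in the text that sort before c
    first, total = {}, 0
    for c in sorted(set(bwt_text)):
        first[c] = total
        total += bwt_text.count(c)

    lo, hi = 0, len(bwt_text)  # half-open range of matching sorted rotations
    for c in reversed(pattern):
        if c not in first:
            return 0
        lo = first[c] + bwt_text[:lo].count(c)
        hi = first[c] + bwt_text[:hi].count(c)
        if lo >= hi:
            return 0
    return hi - lo


genome_bwt = bwt("ACGTACGTAC")  # hypothetical miniature "genome"
print(count_occurrences(genome_bwt, "ACG"))
```

The pattern is matched from its last character to its first, narrowing a range of rows at each step, so the original text never has to be scanned.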
          
           - 
              Circos
            Circos is a software package for visualizing data and information. It
            visualizes data in a circular layout for exploring relationships between
            objects or positions. Circos creates publication-quality infographics and
            illustrations with a high data-to-ink ratio, layered data and symmetries.
          
           - 
              Cluster 3.0
            Cluster 3.0 is an implementation of k-means clustering, hierarchical clustering and self-organizing
            maps in a single multi-purpose open-source library of C routines, callable
            from other C and C++ programs. This library is an improved version of
            Michael Eisen's well-known Cluster program for Windows, Mac OS X and
            Linux/Unix. Additionally, Python and Perl interfaces to the C Clustering
            Library are implemented to combine the flexibility of a scripting language
            with the speed of C.
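
As a rough illustration of k-means clustering, one of the methods named above, here is a minimal pure-Python version of Lloyd's algorithm. It does not use the C Clustering Library or its Python interface; the function name `kmeans`, the fixed starting centers and the sample profiles are assumptions made for the example.

```python
# Minimal k-means (Lloyd's algorithm); illustrative only, not the library's API.

def kmeans(points, centers, iterations=10):
    """Cluster tuples of floats, starting from the given initial centers."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            # assign each point to the nearest center (squared Euclidean distance)
            nearest = min(
                range(len(centers)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centers[k])),
            )
            clusters[nearest].append(p)
        # move each center to the mean of its cluster; keep it if the cluster is empty
        centers = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
    return centers, clusters


# Hypothetical two-condition expression profiles
profiles = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.2, 4.8)]
final_centers, final_clusters = kmeans(profiles, centers=[(0.0, 0.0), (6.0, 6.0)])
print(final_centers)
```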
          
           - 
              DAVID
            DAVID is able to extract biological features and meanings associated with large gene lists.
            DAVID is able to handle any type of gene list, no matter which genomic platform or software
            package generated it. DAVID systematically maps a large number of interesting genes in a
            list to the associated biological annotation (e.g., gene ontology terms), and then
            statistically highlights the most overrepresented (enriched) biological annotation out
            of thousands of linked terms and contents. 
          
           - 
              FANMOD
            FANMOD is a tool for fast network motif detection. It relies on recently developed
            algorithms to improve the efficiency of network motif detection by orders of magnitude.
            This facilitates the detection of larger motifs in bigger networks than previously
            possible. Additional benefits of FANMOD are the ability to analyze colored networks,
            a graphical user interface and the ability to export results to a variety of machine-readable
            and human-readable file formats, including comma-separated values and HTML.
          
           - 
              F-seq
            F-seq is a software package that generates a continuous density estimation of sequence
            tags mapped to a reference genome, which can be displayed using the UCSC Genome Browser.
            The continuous density plots are more intuitive than discrete histogram-like plots used
            by some applications. Using kernel density estimation, F-seq can aid the identification
            of biologically meaningful sites.      
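
The difference between a histogram and a continuous density can be illustrated with a small kernel density estimate over tag positions. This sketch assumes a Gaussian kernel and a hand-picked bandwidth; the function name `gaussian_kde` and the tag positions are hypothetical, and nothing here reflects F-seq's own defaults or code.

```python
# Gaussian kernel density estimate over mapped tag positions; illustrative only.
from math import exp, pi, sqrt


def gaussian_kde(tag_positions, grid, bandwidth):
    """Evaluate the density estimate at every coordinate in grid."""
    norm = 1.0 / (len(tag_positions) * bandwidth * sqrt(2 * pi))
    return [
        norm * sum(exp(-0.5 * ((x - t) / bandwidth) ** 2) for t in tag_positions)
        for x in grid
    ]


# Hypothetical tags: a small cluster near position 100 and a lone tag at 300
tags = [100, 102, 105, 300]
grid = list(range(0, 401, 50))
density = gaussian_kde(tags, grid, bandwidth=10.0)
for x, d in zip(grid, density):
    print(x, round(d, 5))
```

Every tag contributes a smooth bump, so nearby tags reinforce each other and the signal does not depend on where histogram bin edges happen to fall.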
          
           - 
              GERP
            GERP identifies constrained elements in multiple alignments by quantifying substitution deficits.
            These deficits represent substitutions that would have occurred if the element were neutral DNA,
            but did not occur because the element has been under functional constraint. These deficits,
            or rejected substitutions, are a natural measure of constraint that reflects the strength of
            past purifying selection on the element. GERP estimates constraint for each alignment column;
            elements are identified as excess aggregations of constrained columns. A false-positive rate
            (which is user-settable) is calculated using 'shuffled' alignments in which the order of columns is randomized.
          
           - 
              GFS
            GFS is a program that maps peptide mass fingerprint data directly to raw genomic sequence,
            enabling rapid low-cost identification of proteins in genomes for which annotation is lacking.
            An experimentally obtained peptide mass fingerprint is entered into the program, which then scans
            a genome sequence of interest and outputs the most likely regions of the genome from which
            the mass fingerprint is derived.
          
           - 
              GOrilla
            GOrilla is a web-based application that identifies enriched GO terms in ranked lists of genes,
            without requiring the user to provide explicit target and background sets. It also employs a
            flexible threshold statistical approach to discover GO terms that are significantly enriched
            at the top of a ranked gene list. Building on a complete theoretical characterization of
            the underlying distribution, GOrilla computes an exact p-value for the observed enrichment,
            taking threshold multiple testing into account without the need for simulations. The output
            of the enrichment analysis is visualized as a hierarchical structure, providing a clear view
            of the relations between enriched GO terms.
          
           - 
              GOstats
            GOstats is a set of tools implemented in R/Bioconductor for interacting with GO and microarray data.
            It provides a variety of basic manipulation tools for graphs, hypothesis testing including
            hypergeometric tests, and visualization tools.
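
A hypergeometric test for term enrichment can be written directly from binomial coefficients. The sketch below is a generic, standard-library illustration and does not call GOstats; the function name, its parameters and the gene counts in the example are hypothetical.

```python
# One-sided hypergeometric enrichment test; illustrative only.
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """Probability of at least `hits` annotated genes among `sample` genes
    drawn without replacement from `population` genes, `annotated` of which
    carry the term."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 20000 genes, 200 with the term, a list of 100 genes, 8 hits
p_value = hypergeom_enrichment_p(20000, 200, 100, 8)
print(p_value)
```

In practice one such test is run per term, so the resulting p-values need a multiple-testing correction before they are interpreted.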
          
           - 
              GREAT
            GREAT assigns biological meaning to a set of non-coding genomic regions by analyzing the
            annotations of the nearby genes. Thus, it is particularly useful in studying cis functions
            of sets of non-coding genomic regions. Cis-regulatory regions can be identified by both
            experimental methods (e.g., ChIP-seq) and computational methods (e.g., comparative genomics).
          
           - 
              GSC (Genome Structure Correction)
            Assessing the significance of observations within large-scale genomic studies using
            randomly subsampled genomic regions is a difficult problem because there often exists a
            complex dependency structure between observations. GSC is a data subsampling approach
            based on a block stationary model for genomic features to alleviate the hidden dependencies.
            This model is motivated by earlier studies of DNA sequences, which show that there are global
            shifts in base composition, but that certain sequence characteristics are locally unchanging.
          
           - 
              HiveR
            The hive plot is a visualization method for drawing networks. Nodes are mapped to and
            positioned on radially distributed linear axes. Edges are drawn as curved links. Hive
            plots can give a quantitative understanding of important aspects of a network's structure.
            Hive plots can also manage the visual complexity arising from a large number of edges and
            expose both trends and outlier patterns in a network structure.  
          
           - 
              Java Treeview
            Java Treeview is an open source, cross-platform gene expression visualization tool
            and an interactive display of clustered gene expression data, similar to Eisen's treeview.
            It is also an extensible starting point for other gene expression visualization tools.  
          
           - 
              KING
            KING is a rapid algorithm for relationship inference using high-throughput genotype
            data typical of GWAS that allows for the presence of unknown population substructure.
            The relationship of any pair of individuals can be precisely inferred by robust
            estimation of their kinship coefficient, independent of sample composition or population
            structure (sample invariance). KING performs properly even under extreme population
            stratification, while algorithms assuming a homogeneous population give systematically
            biased results. KING performs relationship inference on millions of pairs of individuals
            on the order of minutes.
          
           - 
              lumi package
            The lumi package in R provides an integrated solution for Illumina microarray data
            analysis. It includes functions for Illumina BeadStudio (GenomeStudio) data input, quality
            control, BeadArray-specific variance stabilization, normalization and gene annotation at
            the probe level. It also includes functions for processing Illumina methylation microarrays,
            especially Illumina Infinium methylation microarrays.
          
           - 
              mfinder
            mfinder is a software tool for network motif detection. Network motifs are defined as
            basic interaction patterns that recur throughout biological networks, much more often
            than in random networks. In order to detect network motifs, mfinder implements two methods:
            a full enumeration of subgraphs and a sampling of subgraphs for estimation of subgraph
            concentrations. mfinder generates random networks based on the switching method,
            the stubs method and the "Go with the winners" algorithm.
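
The switching method for generating random networks can be sketched as repeated degree-preserving edge swaps. This is a generic illustration, not mfinder's code: the function name `switch_randomize`, the seed handling and the example network are hypothetical, and the input is assumed to be a directed graph without self-loops or duplicate edges.

```python
# Degree-preserving randomization by edge switching; illustrative only.
import random


def switch_randomize(edges, n_swaps, seed=0):
    """Try n_swaps random switches of the form a->b, c->d  =>  a->d, c->b."""
    rng = random.Random(seed)
    edges = list(edges)
    edge_set = set(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if a == d or c == b:
            continue  # the switch would create a self-loop
        if (a, d) in edge_set or (c, b) in edge_set:
            continue  # the switch would create a duplicate edge
        edge_set -= {(a, b), (c, d)}
        edge_set |= {(a, d), (c, b)}
        edges[i], edges[j] = (a, d), (c, b)
    return edges


# Hypothetical small directed network
original = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3), (2, 4)]
shuffled = switch_randomize(original, n_swaps=100)
print(shuffled)
```

Each accepted switch leaves the in-degree and out-degree of every node unchanged, which is why such networks are a common null model when asking whether a subgraph is over-represented.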
          
           - 
              Peppy
            Peppy is software that integrates several critical tasks of proteogenomic searching and proteogenomic
            mapping, such as full 6-frame translation and digestion of a genome, peptide/spectrum
            matching and quality assessment, and calculation of false discovery rates.
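
Six-frame translation, the first task listed above, means translating a DNA sequence in three reading frames on each strand. The sketch below shows the idea with the standard genetic code; it is not Peppy's implementation, the function names are hypothetical, and codons containing anything other than A, C, G or T are simply rendered as "X".

```python
# Six-frame translation with the standard genetic code; illustrative only.
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons from the start of seq; '*' marks a stop codon."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq):
    """Map frame labels (+1, +2, +3, -1, -2, -3) to protein strings."""
    seq = seq.upper()
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


frames = six_frame_translation("ATGGCCTAA")
for label, protein in frames.items():
    print(label, protein)
```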
          
           - 
              RuleFit3
            RuleFit3 is a predictive learning method and interpretational tool. It is based on
            general regression and classification models, which are constructed as linear combinations
            of simple rules derived from the data. Each rule consists of a conjunction of a small number
            of simple statements concerning the values of individual input variables.
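
The model form described above, a linear combination of rules where each rule is a conjunction of simple statements about input variables, can be made concrete with a toy example. Everything in it is invented for illustration: the variables `age` and `dose`, the two rules, the coefficients and the intercept. How RuleFit3 derives rules and fits coefficients from data is outside this sketch.

```python
# Toy rule-ensemble predictor; the rules and weights are made up for illustration.

def make_rule(*conditions):
    """A rule evaluates to 1.0 when every condition holds for a sample, else 0.0."""
    return lambda x: float(all(condition(x) for condition in conditions))


# Each rule is a conjunction of simple statements about individual input variables
rules = [
    make_rule(lambda x: x["age"] > 50, lambda x: x["dose"] <= 2.0),
    make_rule(lambda x: x["dose"] > 2.0),
]
coefficients = [1.5, -0.7]  # hypothetical weights, one per rule
intercept = 0.2             # hypothetical offset


def predict(x):
    """Intercept plus the weighted sum of the rules that fire for sample x."""
    return intercept + sum(a * rule(x) for a, rule in zip(coefficients, rules))


print(predict({"age": 60, "dose": 1.0}))
print(predict({"age": 40, "dose": 3.0}))
```

Because each rule is a readable condition with a single weight, the contribution of every rule to a prediction can be inspected directly, which is what makes this kind of model useful for interpretation.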
          
           - 
              WebGestalt
            WebGestalt is a "WEB-based GEne SeT AnaLysis Toolkit". It is designed for
            functional genomic, proteomic and large-scale genetic studies from which a large number
            of gene lists (e.g., differentially expressed gene sets, co-expressed gene sets, etc.)
            are continuously generated. WebGestalt incorporates information from different public
            resources and provides an easy way for biologists to make sense out of gene lists.
          
           
Wang K, Li M, Hakonarson H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 2010 Sep;38(16):e164. PMID: 20601685; PMCID: PMC2938201
Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010 Mar 15;26(6):841-2. PMID: 20110278; PMCID: PMC2832824
Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10(3):R25. PMID: 19261174; PMCID: PMC2690996
Krzywinski M, Schein J, Birol I, Connors J, Gascoyne R, Horsman D, Jones SJ, Marra MA. Circos: an information aesthetic for comparative genomics. Genome Res. 2009 Sep;19(9):1639-45. PMID: 19541911; PMCID: PMC2752132
de Hoon MJ, Imoto S, Nolan J, Miyano S. Open source clustering software. Bioinformatics. 2004 Jun 12;20(9):1453-4. PMID: 14871861
Huang da W, Sherman BT, Lempicki RA. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc. 2009;4(1):44-57. PMID: 19131956
Wernicke S, Rasche F. FANMOD: a tool for fast network motif detection. Bioinformatics. 2006 May 1;22(9):1152-3. PMID: 16455747
Boyle AP, Guinney J, Crawford GE, Furey TS. F-Seq: a feature density estimator for high-throughput sequence tags. Bioinformatics. 2008 Nov 1;24(21):2537-8. PMID: 18784119; PMCID: PMC2732284
Davydov EV, Goode DL, Sirota M, Cooper GM, Sidow A, Batzoglou S. Identifying a high fraction of the human genome to be under selective constraint using GERP++. PLoS Comput Biol. 2010 Dec 2;6(12):e1001025. PMID: 21152010; PMCID: PMC2996323
Giddings MC, Shah AA, Gesteland R, Moore B. Genome-based peptide fingerprint scanning. Proc Natl Acad Sci U S A. 2003 Jan 7;100(1):20-5. PMID: 12518051; PMCID: PMC140871
Eden E, Navon R, Steinfeld I, Lipson D, Yakhini Z. GOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists. BMC Bioinformatics. 2009 Feb 3;10:48. PMID: 19192299; PMCID: PMC2644678
Falcon S, Gentleman R. Using GOstats to test gene lists for GO term association. Bioinformatics. 2007 Jan 15;23(2):257-8. PMID: 17098774
McLean CY, Bristor D, Hiller M, Clarke SL, Schaar BT, Lowe CB, Wenger AM, Bejerano G. GREAT improves functional interpretation of cis-regulatory regions. Nat Biotechnol. 2010 May;28(5):495-501. PMID: 20436461
Bickel PJ, Boley N, Brown JB, Huang H, Zhang NR. Subsampling methods for genomic inference. Annals of Applied Statistics. 2010;4(4):1660-1697
Krzywinski M, Birol I, Jones SJ, Marra MA. Hive plots--rational approach to visualizing networks. Brief Bioinform. 2012 Sep;13(5):627-44. PMID: 22155641
Saldanha AJ. Java Treeview--extensible visualization of microarray data. Bioinformatics. 2004 Nov 22;20(17):3246-8. PMID: 15180930
Manichaikul A, Mychaleckyj JC, Rich SS, Daly K, Sale M, Chen WM. Robust relationship inference in genome-wide association studies. Bioinformatics. 2010 Nov 15;26(22):2867-73. PMID: 20926424; PMCID: PMC3025716
Du P, Kibbe WA, Lin SM. lumi: a pipeline for processing Illumina microarray. Bioinformatics. 2008 Jul 1;24(13):1547-8. PMID: 18467348
Kashtan N, Itzkovitz S, Milo R, Alon U. Efficient sampling algorithm for estimating subgraph concentrations and detecting network motifs. Bioinformatics. 2004 Jul 22;20(11):1746-58. PMID: 15001476
Zhang B, Kirov S, Snoddy J. WebGestalt: an integrated system for exploring gene sets in various biological contexts. Nucleic Acids Res. 2005 Jul 1;33(Web Server issue):W741-8. PMID: 15980575; PMCID: PMC1160236
