The Challenge

Increases in cloud computing power over the past two decades have driven far more sophisticated data analyses in the field of genetics. Much of this research involves massive files – as large as 20 GB or more. As a leading graduate educational institution, Dartmouth College wanted to provide its genetics students with up-to-date cloud computing resources that would not only speed execution of their projects but also enable new and highly sophisticated analyses.

In 2005, the College’s Computational Genetics Lab began an effort to build a supercomputing environment and hired Peter Schmitt as the Lab’s technical director. A former programmer with no prior experience in building out HPC platforms, Schmitt faced a steep learning curve. He interviewed several providers and system integrators while evaluating management software, educating himself about how these large cloud-based systems are built and run.

The Solution

NZO Cloud delivered a High Performance Private Cloud Instance integrated with NVIDIA Tesla graphics processing units (GPUs). The High Performance Private Cloud Instance was delivered as a turnkey solution with all necessary hardware and a custom-developed software platform. This allowed Dr. Pien to simply install his own software packages and begin running jobs immediately.

In addition to the NZO Cloud HPC instances for the genomics analysis, Schmitt added instance resources for the engineering, physics, and chemistry students. “We share the resources with the rest of the community,” he says. “We have a buy-in process where these other departments actually purchase resources and get four years of access. But there’s always enough performance left over for the genetics jobs.”

Students and professors in the Computational Genetics Lab develop and run their own applications using standard languages such as C++, Fortran, Perl, Python, and Java. Students from the engineering, physics, and chemistry departments use applications such as Fluent (a computational fluid dynamics tool), EMAN (a set of image and volume processing tools that perform single-particle reconstructions to determine the 3-dimensional structures of molecules), and MATLAB (a high-level technical computing language and interactive environment for algorithm development, data visualization, data analysis, and numerical computation).

Impact

Schmitt and his team have built the largest cloud-based computing instance at Dartmouth and one of the largest educational cloud computing instances in New England. This facility enables world-renowned research into the genetic causes of cancer and other diseases and also provides high-performance cloud computing resources for engineering, physics, and other programs.

“We share the resources with the rest of the community. We have a buy-in process where these other departments actually purchase resources and get four years of access to the cloud instance. But there’s always enough performance left over for the genetics jobs.”

Dr. Schmitt, Dartmouth College

Organizational Profile

Founded in 1769, Dartmouth College is an Ivy League school that offers an outstanding undergraduate education along with world-famous graduate institutions, including the Tuck School of Business, Dartmouth Medical School, and the Thayer School of Engineering.

To enhance its medical research capabilities, the College’s Norris Cotton Cancer Center hired Jason Moore, a renowned genetics research scientist, in 2004 to build a large cloud computing instance at the College. The instance became known as the Dartmouth Initiative for Supercomputing Ventures in Education and Research, or DISCOVERY. Moore had overseen development of a similar instance at Vanderbilt University, and Dartmouth wanted to provide comparable or better facilities for its Computational Genetics Computing Laboratory.