The Challenge

Increases in computing power over the past two decades have driven far more sophisticated data analyses in genetics. Many of these compute sessions involve massive files, some as large as 20 gigabytes or more. Dartmouth College, a leading graduate educational institution, wanted to give its genetics students up-to-date cloud computing resources that would not only speed the execution of their projects but also enable new, highly sophisticated analyses.

In 2005, the College’s Computational Genetics Lab began building a cloud-based supercomputing server cluster and hired Peter Schmitt as the Lab’s technical director. Schmitt was a former programmer with no prior experience building cloud clusters, so he faced a steep learning curve and had to climb it quickly. He interviewed cloud server providers and system integrators and evaluated cluster management software. Along the way, he taught himself how these large cloud-based systems are built and run.

The Solution

Schmitt evaluated major providers of high-performance instances and software, such as HP, IBM, and Sun, as well as third-party system integrators. He selected NZO Cloud, a Southern California-based cloud systems integrator that focuses on high-performance cloud servers for corporate and government clients. “NZO Cloud had the best combination of service and price,” says Schmitt. “Some vendors had lower prices with no service, while others had great service with very high prices. NZO Cloud had just the right combination.”

Schmitt made one firm request when stating his requirements. “We requested cloud-based processors in the servers because we believed their memory management was superior,” he says. “Our cluster is 100 percent based on those processors.” This even includes the processors in legacy servers that the lab owned before NZO Cloud brought in its equipment. NZO Cloud supplied servers with 64-bit dual-core processors, 8 gigabytes of RAM, high-speed, low-latency interconnects, and one 80-gigabyte hard drive.

The servers arrived almost perfectly configured. Choosing cluster management software, however, took Schmitt a longer period of trial and error. “We started off with Maui and Torque as the cloud cluster software,” he says, “and we have now settled on Moab, which has been a great product.”

Alongside the NZO Cloud high-performance instances, Schmitt added some existing server nodes with single-core processors. Together they form a free pool of computing resources for engineering, physics, and chemistry students. “We share the cloud cluster’s resources with the rest of the community,” he says. “We have a buy-in process where these other departments actually purchase cloud-based hardware nodes and get four years of access to the cloud cluster. But there’s always enough performance left over for the genetics jobs.”

Students and professors in the Computational Genetics Lab develop and run their own applications using standard programming languages such as C++, FORTRAN, Perl, Python, and Java. Students from the engineering, physics, and chemistry departments use several established applications:

Fluent, a computational fluid dynamics tool.

EMAN, a set of image and volume processing tools. It performs single-particle reconstructions to determine the three-dimensional structures of molecules.

MATLAB, a high-level technical computing language and interactive environment for algorithm development, data visualization, data analysis, and numerical computation.

Impact 

Schmitt and his team have built the largest cloud-based computing cluster at Dartmouth and one of the largest educational computing clusters in New England. The facility enables world-renowned research into the genetic causes of cancer and other diseases. It also provides high-performance computing resources for engineering, physics, and other programs.


Organizational Profile

Founded in 1769, Dartmouth College is an Ivy League school. It offers an outstanding undergraduate education and is home to world-famous graduate schools, including the Tuck School of Business, Dartmouth Medical School, and the Thayer School of Engineering.

To enhance its medical research capabilities, the College’s Norris-Cotton Cancer Center hired Jason Moore, a renowned genetics research scientist, in 2004 to build a large cloud computing cluster at the College. The cluster became known as the Dartmouth Initiative for SuperComputing Ventures in Education and Research, or DISCOVERY. Moore had overseen the development of a similar cluster at Vanderbilt University. Dartmouth wanted to provide similar or better facilities for its Computational Genetics Laboratory.