Holyoke Data Center

MIT is one of five partner universities (along with Harvard, UMass, Northeastern, and BU) that, together with the state government, make up the Massachusetts Green High-Performance Computing Center (MGHPCC), which is located on the Connecticut River in downtown Holyoke, MA. MIT's participation in MGHPCC is managed jointly by IS&T and the VP of Research; in the VPR's office, Chris Hill (EAPS) manages space allocation in MIT's section of the facility.

Facility information

MGHPCC is a modern, new-construction, LEED Platinum-certified, 90,000-square-foot data center facility, with 24-hour security and strictly controlled access. A limited number of personnel from each member institution have ID cards that allow access to the security desk, at which point they may pick up a set of keys allowing access to their institution's facilities. There is in addition a shared office space with a kitchen and a private conference room. A staging room is provided for unboxing and initial setup of equipment. Guests and service providers may work in the facility when accompanied by an authorized tenant representative.

Presently, approximately one quarter of the facility is unoccupied and un-built-out, but physically within the machine room and within the power and cooling capacity of the existing MGHPCC building. In addition, the parcel on which MGHPCC is located has enough land to build a second data center building adjacent to the existing facility, and there is sufficient electrical power to support a new building, should the need arise.

Network connections

MIT provides network connectivity for all institutions at MGHPCC (except UMass) via the regional optical network. The long-haul network consists of two paths, one that runs parallel to the Massachusetts Turnpike (the "short path") and one that runs parallel to I-95 through Chepachet, R.I., and New York City (the "long path"). The optical network uses DWDM to multiplex multiple 10-gigabit circuits over each path, and optical equipment at 300 Bent St. demultiplexes these circuits for delivery to each institution. Each institution's network traffic is backhauled separately from Holyoke -- only the physical layer is shared -- and terminates on that institution's own equipment. There is a shared "meet me" switch, operated by the facility managers, which institutions can use to interconnect at layer 2 or layer 3, but its use has been stymied by policy differences among the institutions.

CSAIL maintains its own network at MGHPCC, which is backhauled to Stata on 10-gigabit optical circuits via 300 Bent St. and building 24. This is a network-layer (layer 3) connection; TIG operates a 10-gigabit switch in MGHPCC (with out-of-band remote access and UPS power) to provide connectivity for CSAIL users in the facility. CSAIL users will not be on the same network as they currently use in Stata. The following networks are currently assigned:

Network           Usage
128.30.192.0/24   CSAIL client machines in R5-PC-C05
128.30.193.0/24   CSAIL client machines in R5-PC-C07
128.30.194.0/24   CSAIL client machines in R5-PB-C18
128.30.195.0/24   Vision cluster in R5-PB-C13/15/17/19
128.30.196.0/24   CSAIL client machines in R5-PC-C01
128.30.197.0/24   CSAIL client machines in R5-PC-C10
128.52.60.0/22    OpenStack and Mass Open Cloud infrastructure
128.52.64.0/18    OpenStack client VMs
172.17.0.0/16     TIG remote management network
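
As an illustration of how the assignments above can be used, the following minimal Python sketch maps an IP address to the usage listed for its network. The network list is copied from the table; the names MGHPCC_NETWORKS and describe_address are hypothetical and exist only for this example, and the sample addresses are arbitrary.

```python
import ipaddress

# Networks and usage strings copied from the table above.
MGHPCC_NETWORKS = {
    "128.30.192.0/24": "CSAIL client machines in R5-PC-C05",
    "128.30.193.0/24": "CSAIL client machines in R5-PC-C07",
    "128.30.194.0/24": "CSAIL client machines in R5-PB-C18",
    "128.30.195.0/24": "Vision cluster in R5-PB-C13/15/17/19",
    "128.30.196.0/24": "CSAIL client machines in R5-PC-C01",
    "128.30.197.0/24": "CSAIL client machines in R5-PC-C10",
    "128.52.60.0/22": "OpenStack and Mass Open Cloud infrastructure",
    "128.52.64.0/18": "OpenStack client VMs",
    "172.17.0.0/16": "TIG remote management network",
}


def describe_address(addr):
    """Return the usage of the most specific listed network containing addr.

    Returns None if the address is in none of the listed networks.
    (Hypothetical helper, not part of any TIG tooling.)
    """
    ip = ipaddress.ip_address(addr)
    best = None
    for cidr, usage in MGHPCC_NETWORKS.items():
        net = ipaddress.ip_network(cidr)
        # Prefer the longest matching prefix, as a router would.
        if ip in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, usage)
    return best[1] if best else None


if __name__ == "__main__":
    for sample in ("128.30.195.17", "128.52.61.5", "128.52.100.9", "18.0.0.1"):
        print(sample, "->", describe_address(sample))
```

The listed networks do not overlap, so the longest-prefix comparison is only a safeguard in case more specific subnets are added later.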

Note: All newly occupied cabinets in MGHPCC will require the purchase of a new Juniper EX4300 switch to provide connectivity to the equipment. How this will be paid for has not yet been decided.

Electrical power

MGHPCC has a 15-megawatt utility connection from Holyoke Gas & Electric, about 70% of which is sourced locally from hydroelectric facilities on the Connecticut River. Of that, 10 MW is dedicated to computing systems (as opposed to infrastructure services like cooling, lighting, and security). Approximately one third of the facility capacity has UPS backup; the remaining two thirds receive conditioned utility power. The UPS system consists of a flywheel energy storage system to provide short-term (several seconds) emergency power, coupled with a generator system for longer-term outages.

Note well: the MGHPCC electrical system lacks a redundant maintenance bypass. As a result, once a year there is a planned 24-hour full facility shutdown (including UPS power) for electrical-system maintenance. The MGHPCC partners have been discussing whether to make a further investment in the facility to provide an additional maintenance bypass to obviate the need for a full shutdown, but as of 2016-08-01 the universities have preferred to spend the facility budget on expanding capacity instead.

Each cabinet is fed by two 208-volt, 3-phase power distribution units, which are remotely controllable; each outlet can be individually monitored. All power outlets are IEC 320 C13 type, which requires a special cable; appropriate cables are usually available on site, but when ordering equipment, requesting the right cable will make installation easier and faster. If higher-power connectors are required (very unusual), the cabinet PDUs can be replaced. "Wall wart"-type power supplies generally cannot be used, because there are no NEMA-standard receptacles within the conditioned space. (Standard NEMA 5-20 (120-volt) outlets are available for maintenance outside the end of each "pod" of cabinets.)
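
When sizing equipment for a cabinet, it can help to estimate how much load a single PDU can carry. The sketch below applies the standard balanced three-phase formula (apparent power = sqrt(3) x line voltage x line current). Only the 208-volt figure comes from this page: the 30 A rating and the 80% continuous-load factor are placeholder assumptions, not MGHPCC specifications, so check the actual PDU rating with TIG before relying on any number. The function name is hypothetical.

```python
import math


def three_phase_capacity_va(line_voltage, rated_amps, derate=0.8):
    """Approximate usable apparent power (VA) of a balanced three-phase feed.

    Uses S = sqrt(3) * V_line * I_line, scaled by a continuous-load
    derating factor. VA equals watts only at unity power factor.
    """
    return math.sqrt(3) * line_voltage * rated_amps * derate


if __name__ == "__main__":
    # 208 V is from the text above; 30 A is an ASSUMED rating for illustration.
    per_pdu = three_phase_capacity_va(208, 30)
    print(f"Approximate usable capacity per PDU: {per_pdu / 1000:.1f} kVA")
```

This page does not say whether the two PDUs in a cabinet are meant as redundant feeds or as additive capacity, so the sketch deliberately stops at a per-PDU figure.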

Air handling

MGHPCC uses a "hot aisle" containment design with in-row cooling. Each pod consists of two rows of cabinets arranged back-to-back across a service aisle, which is enclosed at both ends and both above and below. Four (six?) cabinets in each pod are dedicated to cooling, with thermostatically controlled variable-speed fans pulling air out of the hot aisle into the room. The system is engineered to maintain an ambient air temperature of 70–75°F. Normally mounted equipment should be designed for front-to-back ventilation; if any equipment is to be mounted on the rear rails of the cabinet, it should either be passively cooled or be designed for back-to-front ventilation.

Shipping details

A loading dock is open to accept deliveries of equipment during regular business hours, and users are encouraged to have their equipment shipped directly there when possible. The shipping address is:

ATTN: Name of contact person (ask TIG)
MIT CSAIL
c/o Massachusetts Green High-Performance Computing Center
100 Bigelow St.
Holyoke, MA 01040
phone: 413-552-4900

When identifying the location of equipment for inventory purposes, the MIT building number for MGHPCC is OC40.

Remote access

"Remote hands" services in Holyoke are available from private contractors. In addition, three TIG sysadmins have access to the facility and can be on site in the case of an emergency, but this is effectively an all-day exercise because of the travel time. For this reason, servers that require day-to-day hands-on access should remain on campus. TIG also has an out-of-band console server to manage CSAIL infrastructure at the facility; however, this server is not available to end users, and you should plan on providing your own console access (using IPMI or similar remote-access management systems) over a local (unrouted) private network. TIG will provide a remote-console VLAN and a bastion host for client use.

CSAIL facilities at MGHPCC

In addition to the network switches and remote console mentioned above, TIG operates NTP, DNS, and DHCP servers for client use at MGHPCC, as well as a scratch NFS server. We currently have the following cabinets:

Cabinet     Use
R5-PB-C18   Network, infrastructure, and NFS server (UPS power)
R5-PB-C13   Vision cluster
R5-PB-C15   Vision cluster
R5-PB-C17   Vision cluster
R5-PB-C19   Vision cluster
R5-PC-C01   client machines
R5-PC-C05   client machines
R5-PC-C07   SLS file server and GPU cluster
R5-PC-C06   OpenStack xg-* nodes
R5-PC-C08   OpenStack os-1g-* nodes
R5-PC-C10   Sontag cluster

TIG will coordinate with Chris Hill to allocate additional racks for CSAIL users as needed.

Process for getting space in MGHPCC

CSAIL members interested in placing equipment at MGHPCC should send email to help@csail with the following information:
  1. What sort of equipment will be installed (size, function, power requirements)?
  2. What CSAIL or MIT services are required for operation (e.g., storage and network interfaces)?
  3. What are the scheduling constraints?
  4. Is the equipment moving from Stata or elsewhere on campus, or will it be delivered new-in-box directly to MGHPCC?
  5. In the latter case, how does it need to be set up and who will be responsible for doing so?

Once TIG receives the request, we will coordinate the allocation of space and other resources, and arrange for movers (if necessary) to deliver the equipment to MGHPCC once the space is ready to accommodate it.

IPMI (Remote Connection to Management Port on Machines In Holyoke)

For each machine sent to Holyoke, we configure IPMI (the Intelligent Platform Management Interface). Among other things, this gives users the ability to hard-reboot machines that are unresponsive.
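
As a rough illustration of what a hard reboot over IPMI can look like, the sketch below builds and runs an ipmitool command from Python. It assumes that ipmitool is installed on a host that can reach the machine's management port and that the BMC accepts IPMI v2.0 ("lanplus") sessions. The host name, user name, and function names are hypothetical placeholders, and the procedure TIG actually recommends may differ.

```python
import subprocess

# Chassis power subcommands accepted by this sketch.
ALLOWED_ACTIONS = {"status", "on", "off", "cycle", "reset"}


def ipmi_power_command(bmc_host, user, action="status"):
    """Build an ipmitool command line for a chassis power action.

    -I lanplus selects IPMI v2.0 over LAN; -E makes ipmitool read the
    password from the IPMI_PASSWORD environment variable, so it does not
    appear in the process list.
    """
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported power action: {action!r}")
    return ["ipmitool", "-I", "lanplus", "-H", bmc_host, "-U", user, "-E",
            "chassis", "power", action]


def run_ipmi_power(bmc_host, user, action="status"):
    """Run the command and return ipmitool's output (raises on failure)."""
    result = subprocess.run(ipmi_power_command(bmc_host, user, action),
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    # Hypothetical BMC address and account; nothing is executed here.
    print(" ".join(ipmi_power_command("mybox-ipmi.example", "admin", "cycle")))
```

Checking "status" first and reserving "cycle" for machines that are truly unresponsive avoids power-cycling a host that is merely slow.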

More information is available on the IPMI wiki page.

-- GarrettWollman - 14 Jan 2016
Topic revision: 31 May 2017, GarrettWollman
 
