Secured Data Environment

Computing in Secured Medium Risk Environments

TIG can provide support for creating secured data and compute clusters on user-purchased hardware, sufficient for ‘Medium Level Confidential Information’ such as deidentified medical or financial datasets.

High Level data should never be stored on CSAIL research systems as we do not have security staffing to provide appropriate audit and training.

Standard CSAIL Linux security practices

Minimum password complexity:

Per-host failed ssh connections:

Reactive port filtering blocks remote hosts by IP after 4 failed ssh authentication attempts within 20 minutes (these may be for one user or four different users). The block clears a minimum of 7 minutes after the last failed attempt.
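The blocking rule above can be sketched in a few lines of Python. This is only an illustration of the stated thresholds, not TIG's actual filtering mechanism: the class name `SshFailureTracker`, its methods, and the choice to keep failure history after a block clears are all assumptions made for the example.

```python
from collections import defaultdict, deque

MAX_FAILURES = 4             # failed attempts that trigger a block
WINDOW_SECONDS = 20 * 60     # failures are counted over 20 minutes
MIN_BLOCK_SECONDS = 7 * 60   # block lasts at least 7 minutes after the last failure


class SshFailureTracker:
    """Hypothetical per-host tracker of failed ssh attempts by remote IP."""

    def __init__(self):
        self._failures = defaultdict(deque)  # ip -> timestamps of recent failures
        self._blocked_until = {}             # ip -> earliest time the block may clear

    def record_failure(self, ip, now):
        """Record one failed attempt from `ip` at time `now` (seconds), for any user."""
        recent = self._failures[ip]
        recent.append(now)
        # Forget failures older than the 20-minute window.
        while now - recent[0] > WINDOW_SECONDS:
            recent.popleft()
        if len(recent) >= MAX_FAILURES:
            self._blocked_until[ip] = now + MIN_BLOCK_SECONDS

    def is_blocked(self, ip, now):
        """Return True while the block for `ip` has not yet cleared."""
        return now < self._blocked_until.get(ip, 0)
```

Because failures are keyed only by remote IP, four failures against four different user names count the same as four failures against one user.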

System wide failed password attempts:

To prevent distributed password guessing across a large number of connected systems, individual password authentication is centrally blocked using the following policy:

Linux security patches:

Any vendor updates marked as “security updates” are automatically applied daily and at reboot. Systems are not automatically rebooted for kernel patches, though the message of the day displayed on login is updated to indicate if there is a “reboot needed”. Optionally, a class of systems can be configured to reboot either as soon as kernel updates happen or at a given time of day if a kernel update has been applied.
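The three reboot behaviours described above (flag only, reboot immediately, reboot at a set time of day) can be expressed as a small decision function. This is a sketch under assumptions: the names `RebootPolicy` and `should_reboot` are hypothetical, and the scheduled case assumes the check runs once per minute.

```python
from datetime import datetime, time
from enum import Enum


class RebootPolicy(Enum):
    MANUAL = "manual"        # default: only flag "reboot needed" at login
    IMMEDIATE = "immediate"  # reboot as soon as a kernel update is applied
    SCHEDULED = "scheduled"  # reboot at a set time of day if an update is pending


def should_reboot(policy, kernel_update_pending, now, scheduled_at=None):
    """Decide whether a host should reboot now (assumed to be called every minute)."""
    if not kernel_update_pending:
        return False
    if policy is RebootPolicy.IMMEDIATE:
        return True
    if policy is RebootPolicy.SCHEDULED:
        if scheduled_at is None:
            raise ValueError("scheduled policy needs a time of day")
        return (now.hour, now.minute) == (scheduled_at.hour, scheduled_at.minute)
    return False  # MANUAL: never reboot automatically
```

For example, `should_reboot(RebootPolicy.SCHEDULED, True, datetime(2024, 1, 1, 3, 0), time(3, 0))` returns `True`, while the same call with `RebootPolicy.MANUAL` returns `False`.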

Physical access

CSAIL Physical access

Server rooms in Stata are prox card controlled. Cabinets are locked, but keying is common (all keys open all racks). After business hours there are two more locked, prox-carded doors, but during the day these doors are unlocked.

MGHPCC Physical access

It is strongly recommended that secured data be stored in MGHPCC.

CSAIL also has server space in the Massachusetts Green High Performance Computing Center (MGHPCC, https://www.mghpcc.org), which has extensive physical security and access controls. The details unfortunately require a facility access account to even see, but the short version is:

There’s a security desk which checks identification against a restrictive access list to check out the following keys:

TIG (or most of TIG) has access. The path to a server involves a security check, leaving ID at the desk, a general access locked door, a machine-room locked door, and a “pod” locked door.

Enhanced security practices for sensitive data

Requires dedicated hardware from research group for gateway system, storage, and computation.

Current recommended practice for sensitive but unregulated data sets such as deidentified medical records.

Data sets including regulated PII are not appropriate for storage on any CSAIL Research systems.

Management practice

Access granted only to the PI-specified group and The Infrastructure Group (TIG).

All system level configuration updates via version controlled TIG configuration management system.

All data encrypted at rest.

All systems that store or mount secured data to be on isolated private network.

Access to private network through dedicated gateway host.

Access to gateway requires 2 factor authentication (2FA)

Gateway does not store or mount secured data.

All systems log to central log-server with 90 day log retention policy (may be extended)

No system connected to the private network may mount shared storage from the public CSAIL network including AFS and NFS shares.
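As an illustration of the 90 day log retention rule listed above, the following sketch selects which logs have aged out. The function name `expired_logs`, the dictionary input, and the idea of keying logs by a single timestamp are assumptions for the example; the retention period is a parameter because the policy allows it to be extended.

```python
from datetime import datetime, timedelta

DEFAULT_RETENTION = timedelta(days=90)  # policy default; may be extended


def expired_logs(log_times, now, retention=DEFAULT_RETENTION):
    """Return the names of logs whose timestamp is older than the retention period.

    `log_times` maps a log name to the datetime of its newest entry.
    """
    cutoff = now - retention
    return sorted(name for name, stamp in log_times.items() if stamp < cutoff)
```

A log stamped exactly 90 days ago is still retained in this sketch; only strictly older logs are reported.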

Network isolation

For small clusters TIG may provide a private vlan on shared switching for network isolation.

As with all clusters, this is dependent on port availability, and groups may be required to purchase additional switching for larger clusters.

Optionally, a dedicated switch can be purchased for smaller clusters if physical isolation is desired.

Gateway system

At least one gateway system must be provided.

Only this class of host may have interfaces on both the private network and the public network.

The only network service provided on the public network will be ssh.

Ssh access to the gateway requires 2FA. The first factor may be either an ssh public key or a CSAIL Kerberos ticket; no password-based access will be allowed. The second factor may be DUO (https://duo.com/) or a Yubikey hardware token (https://www.yubico.com/).
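The factor rule above amounts to "one item from the first set and one from the second, with passwords never counting". A minimal sketch of that check follows; the method labels (`"publickey"`, `"kerberos"`, `"duo"`, `"yubikey"`) and the function name `gateway_login_allowed` are hypothetical names chosen for the example, not identifiers from any real ssh configuration.

```python
FIRST_FACTORS = {"publickey", "kerberos"}   # ssh public key or CSAIL Kerberos ticket
SECOND_FACTORS = {"duo", "yubikey"}         # DUO or Yubikey hardware token


def gateway_login_allowed(completed_methods):
    """Return True if the completed methods include one factor from each set.

    A password is in neither set, so it can never satisfy either factor.
    """
    completed = set(completed_methods)
    return bool(completed & FIRST_FACTORS) and bool(completed & SECOND_FACTORS)
```

Under this sketch, `gateway_login_allowed(["kerberos", "yubikey"])` is accepted, while `gateway_login_allowed(["password", "duo"])` and `gateway_login_allowed(["publickey"])` are both rejected.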

The gateway will also provide the following network services on the private network:

The gateway system will not act as a log-server.

The gateway will not access storage systems and thus is not appropriate as a compute node.

Storage subsystem

All storage will be encrypted at rest. There are two supported ways of doing this:

  1. Linux native LUKS whole disk encryption.
  2. A dedicated storage array which supports self encrypting disks.

The server hosting the storage may either be a dedicated system or a compute system.

It may not have any connection to public networks except through the gateway host.

It may share its storage using any desired protocol on the private network.

Compute Systems

Compute systems may not have any connection to public networks except through the gateway host.

They may mount (or host) storage subsystems as described above.
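The placement rules for gateway, storage, and compute hosts can be checked mechanically against a host inventory. The sketch below is one way to do that; the `Host` fields, role names, and the `check_host` function are assumptions invented for the example and do not describe TIG's configuration management system.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    role: str                          # "gateway", "storage", or "compute"
    on_public: bool                    # has an interface on the public network
    on_private: bool                   # has an interface on the private network
    mounts_secured_data: bool = False  # stores or mounts secured data
    public_mounts: list = field(default_factory=list)  # e.g. ["afs", "nfs"]


def check_host(host):
    """Return a list of policy problems for one host (empty if none found)."""
    problems = []
    if host.role in ("storage", "compute") and host.on_public:
        problems.append("storage and compute hosts may not connect to public networks")
    if host.role == "gateway" and host.mounts_secured_data:
        problems.append("gateway may not store or mount secured data")
    if host.on_private and host.public_mounts:
        problems.append("private-network hosts may not mount public CSAIL shares")
    return problems
```

A gateway with interfaces on both networks and no mounts yields an empty list, while a compute node with a public interface or an AFS mount is reported.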

Logging

The log-server may be inside or outside the private network as the gateway and other hosts will push their logs.

If inside it must be a dedicated physical system.

If outside, it may be a virtual machine on the CSAIL OpenStack cloud or a dedicated hardware system.

All log traffic will be encrypted over the wire.

Ssh access will require 2FA in the same way as the gateway host. Access will be no more permissive than the gateway host, but may be more restrictive provided TIG access is maintained.

Backup

TIG provides no support for secured backups at this time. Additional storage subsystems may be added and standard backup tools (such as rsync) applied to keep backups within the protected environment.

If off-site backups are required, data must be encrypted during the backup process and decryption keys may not be stored in the same location as the backups. This would require significant resources and could not be supported by TIG, so the current recommendation is to avoid anything requiring both off-site backup and secure data handling.

Data Ingress

Data must be encrypted in transit.

Data must not transit any CSAIL systems outside the secured network. This means no hooking up encrypted drives to your regular CSAIL workstation and scp’ing files.

Acceptable methods include https, sftp, scp, or another encrypted data transfer protocol initiated from the secure storage nodes, or connecting an encrypted drive to the storage node and using cp.

Data Egress

It is incumbent upon the PI responsible for secured data to see that all authorized users understand what data is “secure” and what results generated from computation on secured data may be legitimately moved out of the secured area for publication or other purposes.

Moving “de-secured” data out may simply be done using scp or sftp to the desired external location.

Moving secure data out of the secured area must be authorized by the PI, and is presumably bound by agreements with the original data provider.

Within whatever additional constraints that agreement may provide, the data must be encrypted in transit either using secure network protocols, encrypted disk technologies, or PGP file encryption.

In the case of encrypted disks which require sharing a passphrase, that passphrase may not be transmitted in clear text. It may be transmitted orally via telephone or encrypted internet messaging. It may also be sent in email using public key cryptography such as gnupg, pgp, or x509 certificates.

System commissioning

Any system entering the secured network will be reinstalled at first power on using automated installation services provided by the gateway host.

In the case of a new gateway, it will initially only be connected to the public network and installed from the TIG-provided installation server. Once it is verified correct, the private interface may be connected.

System decommissioning

When secured systems are removed from service, their unencrypted disks shall be immediately removed and scheduled either to be drilled through or shredded. Disk shredding may incur charges.

Encrypted disks may simply have their encryption keys deleted. This is certainly sufficient; however, applying the same policy to all disks is recommended to minimize human error unless there are extenuating circumstances.