Groups at CSAIL

What’s a “group”?

CSAIL’s computing infrastructure has several different things called “groups”, which can be confusing. They arose over time, and some of them reflect historical circumstances that don’t necessarily apply in the way you expect, especially if you have experience with computing facilities at other institutions. This guide explains what the different kinds of groups are and how they are used.

Fundamentally, for computing purposes, a group serves two functions: access control and delegation of administration. Different services have different access-control policies and take notice of different kinds of groups (sometimes internal to the service, most of the time external).

All of these different kinds of groups are “free”, except insofar as they are limited by prior claims on names and the potential for confusion. CSAIL users should therefore take advantage of them in whatever way fits the needs of their research. TIG will try to offer guidance on the most appropriate way to set up groups to meet your needs; if you find that things are not working the way you want, please ask for assistance.

History

Historically, groups were aligned with what in the Laboratory for Computer Science were called “research groups”: one or more faculty or principal investigators, an ongoing research activity and budget, research staff, administrative assistants, and students. Most LCS research groups were multi-PI and had their own computing and storage resources and system administrators. Most groups in the Artificial Intelligence Laboratory were single-PI and shared a common computing infrastructure, storage, and system administration group with the rest of the AI Lab. When the two labs merged in 2003, services that one lab had originally implemented generally retained whatever “native” access control model they had started with.

As a result of a security incident, it was decided that an authenticated, secure shared storage infrastructure was required, and the obvious answer at the time was to deploy AFS, which was already in use on the Athena system and therefore familiar to most faculty and students on campus. AFS had the additional benefit of allowing delegated access control so that system administrators would not be spending all their time editing a centralized /etc/group file: users could set access lists on their own.

Some researchers found the security mechanism of AFS too onerous for their computing needs, and insisted on keeping the old insecure NFS storage around. In order to support this usage without losing the benefits of delegated administration, TIG developed a mechanism to automatically populate a Unix-style groups database (originally /etc/group but now in an LDAP directory) with the memberships of a select subset of AFS groups, so the contents of the AFS protection database would largely be reflected in Unix (and therefore NFS) permissions after a propagation delay.

Meanwhile, the INQUIR account management system had its own notion of research groups, based on the LCS model. (While INQUIR existed as a PDP-10 program on the AI ITS machines, it had been abandoned with the machines themselves.) With many PIs no longer employing their own system administrators, there needed to be a way for users to sign up for accounts online, using the web, rather than having a sysadmin manually create one by running a terminal application on a central server. But we could not just allow anyone to sign up for an account, without the approval of someone responsible, since we knew that this would lead very quickly to abuse. Yet, PIs did not want to have to manually approve their new students’ accounts, especially when they were traveling and the account setup was urgently needed. So the existing INQUIR research group mechanism, which had already included an “administrator flag” for the group system administrators, was extended to use the same mechanism (and the same flag) to allow for delegated user account administration, including new user signup, account record editing, and (when user expiration was introduced) account renewals.

INQUIR groups

Used for:
Access control for administrative updates to INQUIR records; account signup; INQUIR record audit notifications
Visible in:
Controls write access to INQUIR records and generation of audit email for updates
Source of truth:
A table in the INQUIR database
Who can create?
TIG sysadmins
Who can modify?
TIG sysadmins; those designated as group administrators can add and remove other administrators
Special restrictions:
Names must be unique and meaningful to new users signing up; should abbreviate to something that is a valid AFS protection group

The original INQUIR database represented groups as a free-form text field. To allow for easy keyboard completion, this was converted into a controlled vocabulary, which also made it possible for users (primarily sysadmins, consultants, and administrative assistants) to belong to multiple groups, although they still can only have one supervisor. Because the set of historic LCS research groups was significantly smaller than the set of potential supervisors, INQUIR functions including account signup were implemented as “select a group first, then select a supervisor from a group-specific list”, rather than having users select a supervisor first and then inheriting the supervisor’s group membership. (This also makes it easier to deal with new users who have no supervisor, such as new faculty hires.)

Associated with each user’s group membership is a three-way status flag: it can be “administrator”, “administrator (notifications disabled)”, or “regular group member”. Members of a group who have either administrator status can approve new users in that group, change the status of other group members, and change most of the fields in individual user records. These abilities are primarily used for the annual account expiration cycle. Group members with “administrator” status receive an email notification for all changes to members of that group, and also receive account expiration reminders during the annual account renewal cycle.
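The three-way flag can be pictured as a small enumeration. The following is only an illustrative sketch of the rules in the paragraph above; the names MemberStatus, can_administer, and receives_notifications are hypothetical and do not come from the INQUIR code.

```python
from enum import Enum


class MemberStatus(Enum):
    """Hypothetical model of the per-membership status flag."""
    ADMIN = "administrator"
    ADMIN_QUIET = "administrator (notifications disabled)"
    MEMBER = "regular group member"


def can_administer(status: MemberStatus) -> bool:
    """Either administrator status may approve signups and edit records."""
    return status in (MemberStatus.ADMIN, MemberStatus.ADMIN_QUIET)


def receives_notifications(status: MemberStatus) -> bool:
    """Only plain "administrator" status gets change and expiration email."""
    return status is MemberStatus.ADMIN
```

The point of the middle value is that administrative rights and email notifications are separate: someone with “administrator (notifications disabled)” keeps the rights without the email.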

Filesystem groups

AFS group volumes

Used for:
Low-cost, moderate-volume bulk storage of research data; group web sites
Visible in:
/afs/csail.mit.edu/group; AFS mount points (fs command)
Source of truth:
AFS mount points in the group volume; AFS file server metadata
Who can create?
TIG system administrators
Who can modify?
Determined by the AFS access list on each directory
Special restrictions:
Must be a Portable Filename as defined by the POSIX standard, except that the . (dot) character is not allowed; TIG recommends all-lower-case and avoidance of underscores for ease of typing.
Relevant commands:
fs examine, fs listquota, fs lsmount

When a PI requests the creation of a new INQUIR group (see above), TIG creates and mounts an AFS volume such that the groups web server (see below) will automatically recognize the group and serve its content. In addition, an AFS protection group will be created with write access to the volume, which will either be self-administered or have a separate -admin group created to own it.

By convention, the AFS volume name is group._name_, which imposes some restrictions on the length of the name; it will be mounted at /afs/csail.mit.edu/group/_name_.
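As a rough illustration of the naming rules, the sketch below checks a proposed group name before it is requested. It is not a TIG tool. The 22-character limit on volume names is an assumption based on common OpenAFS behaviour, and the function name check_group_name is hypothetical.

```python
import re

VOLUME_PREFIX = "group."
# Assumption: AFS read/write volume names are limited to 22 characters.
MAX_VOLUME_NAME = 22

# POSIX Portable Filename Character Set without ".", and no leading hyphen.
_NAME_RE = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_-]*")


def check_group_name(name: str) -> list[str]:
    """Return a list of problems with a proposed name (empty if none)."""
    problems = []
    if not _NAME_RE.fullmatch(name):
        problems.append("not a dot-free portable filename")
    if len(VOLUME_PREFIX + name) > MAX_VOLUME_NAME:
        problems.append("volume name too long")
    if name != name.lower() or "_" in name:
        problems.append("TIG recommends lower case and no underscores")
    return problems
```

Under the assumed limit, the group. prefix leaves 16 characters for the name itself.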

Note that the AFS volume location database and AFS file servers are globally accessible and volume status information can be queried by unauthenticated remote users.

AFS protection groups

Used for:
AFS access controls, including filesystem access lists; a subset is propagated into the Unix group database
Visible in:
AFS protection database; AFS access lists (fs command)
Source of truth:
AFS protection database
Who can create?
User-scoped groups can be created by anyone; system-scoped groups can only be created by members of the group system:administrators.
Who can modify?
AFS protection groups can be created, modified, examined, listed, and deleted by the group’s owner, using the pts utility. In addition, AFS groups have a set of access flags that control whether group members, or any authenticated user, can add members or themself, remove members or themself, and view the group membership.
Special restrictions:
Must be unique among all existing protection group, user, and mail alias names; . (dot) characters and leading hyphens are not allowed; : (colon) characters are forbidden except where mandatory.
Relevant commands:
pts add, pts remove, pts examine, pts members; fs listacl, fs setacl

AFS protection groups are not to be confused with the AFS group volumes described above, although both “group” and “project” volumes will ordinarily correspond one-to-one with an AFS protection group. (The naming is the same, but the objects are of different types: one is a storage volume and the other is a principal for access control purposes.)

Note that the AFS protection database is globally accessible, and some information about users and groups may be queried by unauthenticated remote users.

CSAIL practice for AFS groups generally divides them into two classes: “self-administered” groups allow any group member to add or remove other group members, but the group is formally owned by system:administrators (that is, TIG sysadmins) so that it cannot be accidentally deleted; “regular” groups (which are actually the minority) are owned by a separate -admin group, which is itself self-administered. Self-administered groups have access flags S-Mar; regular groups will normally be S-M--.
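The access flag strings can be read position by position. The sketch below decodes them under an assumed reading of the OpenAFS pts documentation: the five positions are taken to cover examining the entry, listing owned groups, listing members, adding members, and removing members, with a hyphen meaning owner and administrators only, a lower-case letter meaning group members, and an upper-case letter meaning anyone. The function decode_flags is hypothetical and does not check that each letter is valid for its position.

```python
# Assumed meaning of the five flag positions, in order.
_FIELDS = (
    "examine entry",
    "list owned groups",
    "list members",
    "add members",
    "remove members",
)


def decode_flags(flags: str) -> dict[str, str]:
    """Map each flag position to who is allowed to perform that action."""
    if len(flags) != len(_FIELDS):
        raise ValueError("expected exactly five flag characters")
    who = {}
    for field, ch in zip(_FIELDS, flags):
        if ch == "-":
            who[field] = "owner and administrators"
        elif ch.islower():
            who[field] = "group members"
        else:
            who[field] = "anyone"
    return who
```

Read this way, S-Mar lets group members add and remove members, while S-M-- reserves both for the owner (the -admin group) and administrators.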

(To ensure uniqueness, an INQUIR user with relation type Namespace reservation should be created for each system-level AFS protection group, including the -admin groups if used.)

Autofs mount points

Used for:
Automatically mounting centrally managed NFS filesystems
Visible in:
/data and /archive; defined in /etc/auto.d/auto.data and /etc/auto.d/auto.archive
Source of truth:
Puppet servers; TIG-maintained configuration management repository
Who can create?
TIG sysadmins
Who can modify?
TIG sysadmins
Special restrictions:
TIG recommends avoiding capital letters and underscores

CSAIL NFS uses the Linux automounter, autofs, to provide for some flexibility in assignment of filesystems to NFS servers. CSAIL NFS paths start with /data or /archive, and the second component of the pathname is an arbitrary string that normally identifies the group, project, or principal investigator. What is mounted on that path may be an actual filesystem, but more commonly it will be a second-level automount map, which then in turn mounts the individual filesystems. (Some groups have more complex structures.) There is no technical significance to the name; it is merely an organizational convenience to group together filesystems belonging to the same PI or activity.

Unix/LDAP groups

Used for:
NFS and local access controls on CSAIL Ubuntu servers and workstations; some web applications
Visible in:
ldap.csail.mit.edu directory; Unix system databases; /afs/csail.mit.edu/service/inquir
Source of truth:
INQUIR cross-walked with the AFS protection database
Who can create?
TIG sysadmins
Who can modify?
Anyone who can modify the underlying AFS protection group
Special restrictions:
Same as for system-scope AFS protection groups
Relevant commands:
getent, groups, id, ldapsearch

In addition to the normal filesystem access control function, Unix groups are also used for system access controls, enforced for local logins, SSH logins, and sudo, each with its own requirements, logic, and configuration files.

At present, the process that crosswalks INQUIR with the AFS protection database and the process that synchronizes LDAP with the result are run by two separate (and unsynchronized) cron jobs. The crosswalk runs every 15 minutes, but the LDAP update (which is much slower) only runs every half hour, so there is a maximum propagation delay of 45 minutes. It is planned to fix this some day.
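The 45-minute figure is just the sum of the two intervals: a change made immediately after a crosswalk run waits up to 15 minutes for the next one, and if that result in turn just misses an LDAP update it waits up to another 30. A minimal sketch of the arithmetic, ignoring how long each job takes to run (the names are illustrative):

```python
CROSSWALK_INTERVAL_MIN = 15  # crosswalk of INQUIR with the AFS protection database
LDAP_SYNC_INTERVAL_MIN = 30  # LDAP update from the crosswalk result


def worst_case_delay(crosswalk: int, ldap_sync: int) -> int:
    """Upper bound in minutes when the two jobs are not synchronized."""
    return crosswalk + ldap_sync


delay = worst_case_delay(CROSSWALK_INTERVAL_MIN, LDAP_SYNC_INTERVAL_MIN)
```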

Be aware that NFS access control is performed entirely on the client, and clients are free to lie to the NFS server about the identity of the user; moreover, any superuser on any client can change their group IDs to arbitrary values.

Also note that the CSAIL LDAP directory is public and globally accessible, including all group memberships. The information in the directory is also globally readable via AFS in /afs/csail.mit.edu/service/inquir.

The selection of which AFS protection groups are reflected into Unix groups is gated on the existence of an INQUIR user with relation type Magic AFS group pseudo-user; the underlying AFS group must exist and have at least one member before the INQUIR entry is added.
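To check from a script whether a group has been propagated, the same information that getent group shows is available through Python's standard grp module. This is a minimal sketch; it assumes the machine resolves groups from the CSAIL LDAP directory through its normal system databases, and the function name unix_group_members is hypothetical.

```python
import grp
from typing import Optional


def unix_group_members(name: str) -> Optional[list[str]]:
    """Return the listed members of a Unix group, or None if it does not exist."""
    try:
        return sorted(grp.getgrnam(name).gr_mem)
    except KeyError:
        # No such group on this machine (not propagated, or not yet visible).
        return None
```

A result of None shortly after creating a group may only mean that the propagation delay described above has not yet elapsed. Users who have the group as their primary group are not necessarily listed in the member list.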

Web server groups

groups.csail.mit.edu

Used for:
Web sites
Visible in:
https://groups.csail.mit.edu/([[:alnum:]][-_[:alnum:]]*)/
Source of truth:
/afs/csail.mit.edu/group/$1/www/data
Who can create?
TIG system administrators
Who can modify?
Determined by the AFS access list on each directory
Special restrictions:
As defined for AFS group volumes (see above); should also be valid as a label in a DNS name and not require encoding in a URL.

The shared web server groups.csail.mit.edu was created with the classic LCS “research group” model in mind, as contrasted with smaller “projects” served from projects.csail.mit.edu. Today, the division between “groups” and “projects” in CSAIL web space is fairly arbitrary, but the intuition at least should be that a “group” has more people and is longer-lived than a “project”. The groups web server always serves content from the AFS directory /afs/csail.mit.edu/group/$1/www/data, where $1 is the first pathname component in the URL; this directory structure is created by TIG (and configured with read-only access for the web server) as a part of creating an AFS group volume (see above).
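The mapping from URL to AFS directory can be sketched as follows. This is an illustration of the rule described above, not the web server's actual configuration. The POSIX class [[:alnum:]] is approximated here with ASCII letters and digits, and the function name document_root is hypothetical.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# First pathname component: an alphanumeric, then alphanumerics, "-" or "_".
_GROUP_RE = re.compile(r"/([A-Za-z0-9][-_A-Za-z0-9]*)/")


def document_root(url: str) -> Optional[str]:
    """Return the AFS directory a groups URL is served from, or None."""
    parts = urlsplit(url)
    if parts.hostname != "groups.csail.mit.edu":
        return None
    match = _GROUP_RE.match(parts.path)
    if match is None:
        return None
    return f"/afs/csail.mit.edu/group/{match.group(1)}/www/data"
```

For a hypothetical group named vision, document_root("https://groups.csail.mit.edu/vision/index.html") gives /afs/csail.mit.edu/group/vision/www/data.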

CSAIL OpenID Connect

Used for:
Web site access control
Visible in:
groups claim returned by oidc.csail.mit.edu
Source of truth:
LDAP directory
Who can create?
See Unix/LDAP groups, above
Who can modify?
See Unix/LDAP groups, above
Special restrictions:
None

Any web application that uses the CSAIL OpenID Connect service for user authentication can request the groups scope. (This includes TIG-supported shared web servers such as groups, projects, and people, as well as applications like WebDNS.) If the authenticated user approves the release of this information, the application will receive a groups claim when querying the OIDC user information endpoint, which will consist of an array of group names to which the user belongs. There is no inherent meaning assigned to this information, and it is not a standard OpenID Connect scope or claim, but applications or access lists may be written to make use of it.

Note that the OIDC servers cache the results of the LDAP query, and therefore the information returned can be stale.
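An application that has already obtained the user information response might test membership along these lines. This is a minimal sketch: it assumes the response has been parsed from JSON into a dictionary, the function name user_in_group is hypothetical, and the group and user names in the usage example are made up.

```python
from typing import Any, Mapping


def user_in_group(userinfo: Mapping[str, Any], group: str) -> bool:
    """Check the non-standard groups claim for a given group name."""
    groups = userinfo.get("groups")
    if not isinstance(groups, list):
        # Claim absent: the scope was not requested or the user declined.
        return False
    return group in groups


# Hypothetical parsed response from the user information endpoint.
example = {"sub": "someuser", "groups": ["vision", "vision-admin"]}
allowed = user_in_group(example, "vision")
```

Treating a missing claim as “not a member” fails closed, which is usually the safer default for access control; because of the caching noted above, a recent membership change may not be reflected immediately.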