Overview

TIG currently offers Object Storage as part of our OpenStack IaaS platform. The service is provided by Ceph (the same software that powers the DreamHost DreamObjects platform). The object store can currently be accessed via Amazon S3 and OpenStack Swift compatible APIs.

The object store is not a typical POSIX filesystem. Rather than a typical folder structure with directories and subdirectories, it provides flat namespaces ("buckets") and files are treated as individual "objects" with their own associated metadata and unique identifier. It provides an easily accessible programming interface for storage, which suits the needs of various "cloud" application architectures. Overall, if you find Amazon S3 useful, you may also find Ceph useful.
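To make the "flat namespace" idea concrete, here is a minimal sketch in plain Python (no connection to Ceph involved). It assumes the common S3-style convention of putting slashes in object names and listing with a prefix and a delimiter; the function name and the sample keys are made up for illustration.

```python
def list_with_delimiter(keys, prefix="", delimiter="/"):
    """Emulate an S3-style listing over a flat collection of object keys.

    Returns (objects, common_prefixes): the keys that sit directly under
    `prefix`, and the distinct pseudo-directories one level below it.
    """
    objects, common_prefixes = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Everything up to the next delimiter looks like a "folder".
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return objects, sorted(common_prefixes)


# A bucket is a flat mapping of key -> object; slashes are ordinary characters.
bucket = {
    "README.txt": b"hello",
    "data/2015/jan.csv": b"...",
    "data/2015/feb.csv": b"...",
    "data/notes.txt": b"...",
}

top_objects, top_prefixes = list_with_delimiter(bucket)
data_objects, data_prefixes = list_with_delimiter(bucket, prefix="data/")
```

Nothing here creates a directory: "data/" only exists because some keys happen to start with it, which is why renaming a "folder" in an object store means rewriting every object under that prefix.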

What does S3 provide that Ceph does not?

S3 is a very large deployment and, provided you can pay for it, can offer far more storage space than TIG can. It is also globally distributed and gives you the option to store copies of your data in different geographical regions, making it more resilient to outages and natural disasters.

Since TIG isn't globally distributed, and our cluster is quite small in comparison, we can't really provide the same level of service as S3 in those areas. But, if you're more concerned with performance than geographically distributed data, Ceph might work out well for you -- it's located very close to our OpenStack cloud, and is backed by 10Gbps networking, meaning you'll get much better performance for applications inside CSAIL.

Usage

There are several ways to access the Ceph object store. All of them require authenticating with your OpenStack credentials, so make sure you have an account before starting.

To obtain the credentials required for non-web-UI access, download the appropriate RC files from the OpenStack web interface: open the Compute -> Access & Security -> API Access tab and use the buttons on the right to download the shell configs.

All API access happens through the REST endpoint at https://ceph.csail.mit.edu. Most programming libraries simply need the hostname "ceph.csail.mit.edu" and must be told to enable SSL (plain HTTP access will not work).
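As a rough illustration of how the REST endpoint is laid out, the sketch below builds the HTTPS URL for a single object on the Swift-compatible API. It assumes the base path "https://ceph.csail.mit.edu/swift/v1" (the same value that appears as the commented-out preauthurl in the Python bindings example on this page) and the usual Swift layout of base/container/object. The helper name is hypothetical, and a real request would still need a valid auth token.

```python
from urllib.parse import quote

# Assumed base path for the Swift-compatible API on this cluster.
SWIFT_BASE = "https://ceph.csail.mit.edu/swift/v1"


def object_url(container, name, base=SWIFT_BASE):
    """Return the URL of object `name` inside `container`.

    Container names may not contain "/", while object names may, so only
    the object name keeps its slashes when percent-encoding.
    """
    if "/" in container:
        raise ValueError("container names cannot contain '/'")
    return "{}/{}/{}".format(
        base.rstrip("/"),
        quote(container, safe=""),
        quote(name, safe="/"),
    )


url = object_url("my-container", "data/2015/jan report.csv")
```

The scheme is always https here, since plain http access to the endpoint does not work.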

Swift CLI

The Swift CLI requires your shell to be configured with the settings from your OpenStack RC file, obtained as described above. You can get the CLI program by installing the python-swiftclient Python module (also available as an Apt package on CSAILUbuntu systems, though that version is somewhat outdated). Once you've installed the client and configured your shell, you can test your connectivity to the cluster by running "swift stat". The documentation for the CLI is available here: http://docs.openstack.org/cli-reference/content/swiftclient_commands.html

Python Bindings

Official Swift Module

The official Swift API python bindings are installed along with the python-swiftclient package. You should also install the python-keystoneclient package to make authentication work properly. An example of a connection using the python bindings is below:

import swiftclient

connection = swiftclient.client.Connection(
    authurl        = "https://nimbus.csail.mit.edu:5001/v2.0",
    user           = "username",
    tenant_name    = "tenant name",
    key            = "an openstack password",
    # you can alternatively provide a token if you've already auth'd
    # preauthurl   = "https://ceph.csail.mit.edu/swift/v1",
    # preauthtoken = "a token uuid",
    auth_version   = '2'
    )

# get_container returns a (headers, objects) tuple for the named container
container = connection.get_container('container-name')

print(container)

Further documentation is available here: http://docs.openstack.org/developer/python-swiftclient/swiftclient.html
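Rather than hard-coding a password in a script, you may prefer to read the settings your sourced RC file already exports. The sketch below assumes the variable names used by a standard Keystone v2 RC file (OS_AUTH_URL, OS_USERNAME, OS_PASSWORD, OS_TENANT_NAME); check your own RC file, since the names can differ. The helper name is made up for this example.

```python
import os

# Assumed mapping from RC-file variables to swiftclient Connection arguments.
_RC_TO_KWARG = {
    "OS_AUTH_URL": "authurl",
    "OS_USERNAME": "user",
    "OS_PASSWORD": "key",
    "OS_TENANT_NAME": "tenant_name",
}


def swift_kwargs_from_env(env=None):
    """Build keyword arguments for swiftclient.client.Connection.

    `env` defaults to os.environ; pass a dict to test without a real shell.
    """
    env = os.environ if env is None else env
    missing = sorted(name for name in _RC_TO_KWARG if not env.get(name))
    if missing:
        raise KeyError("missing OpenStack settings: " + ", ".join(missing))
    kwargs = {kwarg: env[name] for name, kwarg in _RC_TO_KWARG.items()}
    kwargs["auth_version"] = "2"
    return kwargs


# Usage (requires python-swiftclient and a sourced RC file):
#   import swiftclient
#   connection = swiftclient.client.Connection(**swift_kwargs_from_env())
```

Failing early with the list of missing variables is usually easier to debug than an authentication error from the server, which is the typical symptom of forgetting to source the RC file.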

S3 API

If you'd like to access Ceph via the S3 interface, the recommended method is via the 'boto' AWS module. You'll need your AWS key and secret key from the EC2 RC file, obtained as described above:

import boto.s3.connection

connection = boto.connect_s3(
        aws_access_key_id     = "access key",
        aws_secret_access_key = "secret key",
        host                  = 'ceph.csail.mit.edu',
        calling_format        = boto.s3.connection.OrdinaryCallingFormat()
        )

# list every key in the bucket
bucket = connection.get_bucket('bucket-name')
for i in bucket.list():
    print(i)

Further documentation is at http://docs.openstack.org/developer/python-swiftclient/swiftclient.html and http://boto.readthedocs.org/en/latest/s3_tut.html

-- StephenJahl - 05 Jan 2015
Topic revision: 05 Dec 2017, JasonDorfman
 
