titan Simple serverless deployments for Data Science

Introduction

Welcome!

Titan is a serverless deployment and service orchestration engine for Data Science and Machine Learning projects.

Here you will find comprehensive information to start using Titan in order to easily deploy and scale AI/ML (Artificial Intelligence / Machine Learning) models.

Why Titan?

At Akoios we build technology to help data scientists in their day-to-day tasks.

Using Titan, our flagship product, data scientists can forget about the complexity of AI/ML DevOps workflows and focus on what is really important: data and models.

Titan automagically transforms models into services.

How does Titan work?

Titan seamlessly transforms all types of models into scalable, ready-to-use REST APIs with a single command.

Once the model has been transformed into a service, it can be easily integrated with any software of your choice: ERPs, CRMs, Websites…

This allows Data Scientists to easily iterate and integrate their models into the business processes of their companies.

Titan tools

Titan comes with a handy set of tools to facilitate data science workflows:

Apart from this, Titan comes with several features to boost the productivity in AI/ML projects:

  • Model versioning
  • Team management
  • Usage tracking
  • Monitoring of arbitrary model metrics

Installation

Windows

Open a PowerShell terminal (press Win+R and type powershell) and run:

iwr -useb install.akoios.com/beta/w | iex

The installer requires PowerShell 5+ and an installed .NET Framework. If your system does not satisfy these requirements, try the manual installation.

Alternative manual installation

  1. Download titan.exe.

  2. Open a PowerShell terminal: press Win+R and type powershell.

  3. Run the following command:

    move "${Env:USERPROFILE}\Downloads\titan.exe" "${Env:LOCALAPPDATA}\Microsoft\WindowsApps"
    
  4. Run titan help to get started.

Mac OSX / Linux

The quickest way to install titan is by running the following command:

curl -sf https://install.akoios.com/beta | sh

Alternatively, in case you do not have curl installed:

wget -qO - https://install.akoios.com/beta | sh

By default, Titan is installed to /usr/local/bin. To install to a different directory, set BINDIR. This can be useful in CI, where you may not have access to /usr/local/bin. Here’s an example that installs to the current directory:

curl -sf https://install.akoios.com/beta | BINDIR=. sh

You can also install Titan by downloading the binary for your platform from install.akoios.com.

After downloading it, run the following commands:

chmod +x titan
sudo mv titan /usr/local/bin

Verify the installation

You can verify the installation with:

titan help

Upgrades

Later, when you want to update titan to the latest version, simply use the following command:

titan upgrade

Troubleshooting

If you hit permission issues, you may need to run the following, as titan is installed to /usr/local/bin/titan by default.

sudo chown -R $(whoami) /usr/local/bin/

Preparing the models

Instrumentalization basics

Instrumentalization is what we call the changes needed in the code of your models so that Titan can deploy and scale them. These changes consist of specifying which cells in your Jupyter Notebook you want to transform into API endpoints.

To instrumentalize a model, prefix the cell with a single-line comment. The comment describes the HTTP method and resource, as in the following Python example:

# GET /hello/world
print("hello world")

The annotation above declares the cell contents as the code to execute when the kernel gateway receives an HTTP GET request for the path /hello/world.

It is possible to define as many arbitrary endpoints as desired in a single model.

Available HTTP methods are:

  • GET
  • POST
  • PUT
  • PATCH
  • DELETE

For other languages, the comment prefix may change, but the rest of the annotation remains the same.

Multiple cells may share the same annotation. Their content is concatenated to form a single code segment at runtime.
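
To make these rules concrete, here is a rough sketch of how annotated cells could be grouped into endpoints. It is only an illustration: the group_cells helper and the regular expression are assumptions made for this example, not Titan's actual implementation.

```python
import re
from collections import defaultdict

# Illustrative pattern: a comment prefix, an HTTP method, and a path.
ANNOTATION = re.compile(r"^#\s*(GET|POST|PUT|PATCH|DELETE)\s+(/\S*)\s*$")


def group_cells(cells):
    """Group cell sources by their (method, path) annotation.

    Cells that share an annotation are concatenated in notebook order.
    Cells without an annotation on the first line are ignored.
    """
    endpoints = defaultdict(list)
    for source in cells:
        first_line, _, rest = source.partition("\n")
        match = ANNOTATION.match(first_line)
        if match:
            endpoints[match.groups()].append(rest)
    return {key: "\n".join(parts) for key, parts in endpoints.items()}


cells = [
    "# GET /hello/world\ngreeting = 'hello'",
    "import json",  # no annotation, so not exposed
    "# GET /hello/world\nprint(greeting + ' world')",
    "# POST /prediction\nprint('predicted')",
]

endpoints = group_cells(cells)
for (method, path), code in endpoints.items():
    print(method, path)
```

The two cells annotated with GET /hello/world end up as a single code segment, while the unannotated cell is not exposed.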

Getting the request data

Before the gateway invokes an annotated cell, it sets the value of a global notebook variable named REQUEST to a JSON string containing information about the request. You may parse this string to access the request properties.

For example, in Python:

# GET /hello/world
import json
req = json.loads(REQUEST)
# do something with req

You may specify path parameters when registering an endpoint by prepending a : to a path segment. For example, a path with parameters firstName and lastName would be defined as the following in a Python comment:

# GET /hello/:firstName/:lastName

The REQUEST object currently contains the following properties:

  • body - The value of the body, see the Request Content-Type and Request Body Processing section below

  • args - An object with keys representing query parameter names and their associated values. A query parameter name may be specified multiple times in a valid URL, and so each value is a sequence (e.g., list, array) of strings from the original URL.

  • path - An object of key-value pairs representing path parameters and their values.

  • headers - An object of key-value pairs where a key is an HTTP header name and a value is the HTTP header value. If multiple values are specified for a header, the value will be an array.
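
As a minimal sketch of reading these properties, the snippet below builds a mock REQUEST and then extracts path and query parameters from it. The request values (names, query parameter lang) are hypothetical. In a notebook, the mock and the annotated code would live in separate cells, with the annotation on the first line of its cell.

```python
import json

# Mock REQUEST, shaped like the value the gateway would set for
# GET /hello/Ada/Lovelace?lang=en&lang=es (hypothetical request).
REQUEST = json.dumps({
    "body": "",
    "args": {"lang": ["en", "es"]},
    "path": {"firstName": "Ada", "lastName": "Lovelace"},
    "headers": {"Accept": "text/plain"},
})

# GET /hello/:firstName/:lastName
req = json.loads(REQUEST)
first_name = req["path"]["firstName"]
last_name = req["path"]["lastName"]
# Each query parameter maps to a list of strings; take the first value.
lang = req["args"].get("lang", ["en"])[0]
greeting = f"hello {first_name} {last_name} ({lang})"
print(greeting)
```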

Request Content-Type and Request Body Processing

If the HTTP request to the kernel gateway has a Content-Type header, the value of REQUEST.body may change. Below is the list of outcomes for various MIME types:

  • application/json - The REQUEST.body will be an object of key-value pairs representing the request body

  • multipart/form-data and application/x-www-form-urlencoded - The REQUEST.body will be an object of key-value pairs representing the parameters and their values. Files are currently not supported for multipart/form-data

  • text/plain - The REQUEST.body will be the string value of the body

  • All other types will be sent as strings
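
Because the body can be either an object or a string, a cell that accepts several content types may want to check which one it received. The following is a small sketch under that assumption. The parse_body helper is hypothetical and not part of Titan.

```python
import json


def parse_body(request_json):
    """Return (kind, body), where kind is "object" or "text"."""
    body = json.loads(request_json).get("body")
    if isinstance(body, dict):
        # application/json or form-encoded parameters
        return "object", body
    # text/plain and any other type arrive as a string
    return "text", "" if body is None else str(body)


json_request = json.dumps({
    "headers": {"Content-Type": "application/json"},
    "body": {"data": [1, 2, 3]},
})
text_request = json.dumps({
    "headers": {"Content-Type": "text/plain"},
    "body": "1,2,3",
})

print(parse_body(json_request))
print(parse_body(text_request))
```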

Setting the Response Body

The response from an annotated cell may be set in one of two ways:

  1. Writing to stdout in a notebook cell

  2. Emitting output in a notebook cell

The first method is preferred because it is explicit: a cell writes to stdout using the appropriate language statement or function (e.g. Python print, Scala println, R print, etc.). The kernel gateway collects all bytes from kernel stdout and returns the entire byte string verbatim as the response body.

The second approach is used if nothing appears on stdout. This method is dependent upon language semantics, kernel implementation, and library usage. The response body will be the content.data structure in the Jupyter execute_result message.

In both cases, the response defaults to status 200 OK and Content-Type: text/plain if cell execution completes without error. If an error occurs, the status is 500 Internal Server Error. If the HTTP request method is not supported at the given path, the status is 405 Method Not Allowed. If you wish to return a custom status or headers, use the ResponseInfo annotation described below.

In addition, it is possible to customize the response by annotating a separate cell with ResponseInfo and the status code, along with any other information such as the mimetype of the response, in a JSON format. This information will be used in the response to the client’s HTTP request.

# ResponseInfo PUT /route
# `status` is assumed to have been set by the PUT /route cell
print(json.dumps({
    "status" : status,
    "headers" : {
        "Content-Type" : "application/json"
    }
}))
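
One way to pair the two cells is sketched below. It assumes that both cells run in the same kernel and therefore share global variables, so the endpoint cell can choose the status that the ResponseInfo cell later reports. The payload and the status logic are hypothetical.

```python
import json

# Mock request for local testing (hypothetical payload).
REQUEST = json.dumps({"body": {"name": "model-a"}})

# PUT /route
body = json.loads(REQUEST)["body"]
# Pick the status here so the ResponseInfo cell below can reuse it.
status = 201 if "name" in body else 400
print(json.dumps({"received": body}))

# ResponseInfo PUT /route
response_info = {
    "status": status,
    "headers": {"Content-Type": "application/json"},
}
print(json.dumps(response_info))
```

In a notebook, the mock request, the PUT /route code, and the ResponseInfo code would each be a separate cell.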

Swagger Spec

The resource /_api/spec/swagger.json is automatically generated from the notebook used to define the HTTP API. The response is a simple Swagger spec that can be used with the Swagger Editor, Swagger UI, or any other Swagger-aware tool.

Currently, every response is listed as having a status of 200 OK.
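
As an illustration of consuming the spec, the sketch below lists the operations declared in a Swagger 2.0 style document. The sample_spec content is a hypothetical, trimmed-down stand-in. In practice the document would be downloaded from /_api/spec/swagger.json on your deployed service.

```python
import json

# Hypothetical, trimmed-down spec similar in shape to a Swagger 2.0 document.
sample_spec = json.loads("""
{
  "swagger": "2.0",
  "paths": {
    "/hello/world": {"get": {"responses": {"200": {"description": "Success"}}}},
    "/prediction": {"post": {"responses": {"200": {"description": "Success"}}}}
  }
}
""")


def list_operations(spec):
    """Return sorted (METHOD, path) pairs declared in a Swagger spec."""
    return sorted(
        (method.upper(), path)
        for path, operations in spec.get("paths", {}).items()
        for method in operations
    )


operations = list_operations(sample_spec)
for method, path in operations:
    print(method, path)
```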

Example

The following example shows how to prepare and expose a function in Titan.

It is recommended to prepare a mock request so that the endpoint can be tested locally, as shown below:

import json

# Mock request object for local API testing
headers = {
    'content-type': 'application/json'
}
# Input data for the function
body = {
    'data': [[2],[5],[10],[20],[25]]
}
REQUEST = json.dumps({ 'headers': headers, 'body': body })

Once the request is ready, the POST method can be defined:

# POST /prediction
body = json.loads(REQUEST)['body']
# body is a dict, so the input data is read by key
response = function_to_be_exposed(body['data'])
print(json.dumps(response))

Note that in this case the response is returned through Python's print function:

print(json.dumps(response))
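
For local testing, the pieces of this example can be run together as in the sketch below. The body of function_to_be_exposed is a hypothetical stand-in (it simply doubles each value). Replace it with your own model code.

```python
import json


def function_to_be_exposed(data):
    """Hypothetical stand-in for a model: doubles every input value."""
    return [[2 * x for x in row] for row in data]


# Mock request object for local API testing
REQUEST = json.dumps({
    "headers": {"content-type": "application/json"},
    "body": {"data": [[2], [5], [10], [20], [25]]},
})

# POST /prediction
body = json.loads(REQUEST)["body"]
response = function_to_be_exposed(body["data"])
print(json.dumps(response))
```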

Using Titan

Authentication

To start using Titan in the cloud, authenticate as a user by running:

titan login

Please use the credentials provided by our sales team.

Using Titan

Once your model has been instrumentalized, you can use Titan to deploy it and transform it into a ready-to-use REST API.

Using Titan is very easy: just run the following CLI command from the folder where your model is stored:

titan deploy

The CLI will give you the option to select any of the available *.ipynb models in the folder.

After that, you will be prompted to select a suitable running environment for your model. You can check the details of the available environments in the Environments section.

And that’s it! Titan will manage everything from here and build, upload and deploy the model in the cloud to make it available through the specified endpoints at:

Deployed at: https://services.customer_name.akoios.com/model_name

For your convenience, a Swagger UI is available at the aforementioned URL to:

  • Easily check and interact with the defined endpoints
  • View the deployed Notebook
  • Check the source code of the Notebook
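
Once deployed, the endpoints can be called from any HTTP client. The sketch below builds a request for a /prediction endpoint using only the Python standard library. The URL keeps the customer_name and model_name placeholders from the pattern shown above, and the /prediction path and payload shape are assumptions taken from the earlier example, so adjust them to your own service.

```python
import json
import urllib.request

# Placeholder URL following the pattern shown above; replace
# customer_name and model_name with your own values.
BASE_URL = "https://services.customer_name.akoios.com/model_name"


def build_prediction_request(data):
    """Build (but do not send) a POST request for the /prediction endpoint."""
    payload = json.dumps({"data": data}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + "/prediction",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_prediction_request([[2], [5], [10]])
print(request.get_method(), request.full_url)

# To actually call the service:
# with urllib.request.urlopen(request) as resp:
#     print(resp.read().decode("utf-8"))
```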

Logging info

You can access the logs of any running Titan service by running:

titan logs

Command-line help

You can get more information about how to use the command-line tool by running:

titan help
Usage:

  titan [<flags>] <command> [<args> ...]

Flags:

  -h, --help       Output usage information.
  -C, --chdir="."  Change working directory.
  -v, --verbose    Enable verbose log output.
      --version    Show application version.

Commands:

  help                 Show help for a command.
  config               Show current config manifest.
  deploy               Deploy a Jupyter Notebook in the cloud.
  docs                 Open documentation website in a browser.
  login                Authorize user.
  logs                 Inspect service logs.
  open                 Open service endpoint URL in a browser.
  services             Manage services.
  upgrade              Install the latest or specified version of Titan.
  version              Show version.

Examples:

  Deploy the project to the staging environment.
  $ titan

  Deploy the project to the production stage.
  $ titan deploy

  Open documentation in a browser.
  $ titan docs

  Open service endpoint URL in a browser.
  $ titan open

  Tail project logs.
  $ titan logs -f

  Show error or fatal level logs.
  $ titan logs 'error or fatal'

  Show help and examples for a command.
  $ titan help

  Upgrade titan, if a new version is available.
  $ titan upgrade

  Show current titan version.
  $ titan version

Dashboard

Titan Dashboard

Apart from the CLI, Titan provides a web dashboard designed to monitor, control and manage the models and deployments of a Titan installation.

The Titan dashboard is not yet publicly available.

Features

As mentioned above, Titan has been designed to boost productivity in AI/ML projects and to help Data Scientists in their day-to-day tasks. The Titan Web Dashboard provides information at three different levels:

General information

In the main view, general information about the current Titan installation is shown:

  • Number of currently deployed models
  • Number of currently active deployments
  • Actionable list of active deployments (start/stop/restart)

Model information

The model view provides relevant information regarding a model:

  • Arbitrary model metrics
  • Exposed cells
  • List of model versions
  • Jupyter Notebook viewer

Deployment information

The deployment view shows parameters regarding a particular model deployment:

  • Status
  • Uptime
  • Requests per hour
  • Environment
  • Processor
  • Hardware
  • Memory

Environments

Introduction

Titan uses a set of Docker image definitions. The following sections describe these images including their contents, relationships, and versioning strategy.

Base

Base is a small image supporting the options common across all core stacks. It is the basis for all other stacks.

  • Minimally-functional Jupyter Notebook server (e.g., no pandoc for saving notebooks as PDFs)
  • Miniconda Python 3.x in /opt/conda
  • No preinstalled scientific computing packages
  • Unprivileged user jovyan (uid=1000, configurable, see options) in group users (gid=100) with ownership over the /home/jovyan and /opt/conda paths
  • tini as the container entrypoint and a start-notebook.sh script as the default command
  • A start-singleuser.sh script useful for launching containers in JupyterHub
  • A start.sh script useful for running alternative commands in the container (e.g. ipython, jupyter kernelgateway, jupyter lab)
  • Options for a self-signed HTTPS certificate and passwordless sudo

Minimal

Minimal adds command line tools useful when working in Jupyter applications.

  • Includes everything in Base
  • Pandoc and TeX Live for notebook document conversion
  • git, emacs, jed, nano, tzdata, and unzip

Scipy

Scipy includes popular packages from the scientific Python ecosystem.

  • Includes everything in Minimal and its ancestor images
  • pandas, numexpr, matplotlib, scipy, seaborn, scikit-learn, scikit-image, sympy, cython, patsy, statsmodels, cloudpickle, dill, numba, bokeh, sqlalchemy, hdf5, vincent, beautifulsoup, protobuf, and xlrd packages
  • ipywidgets for interactive visualizations in Python notebooks
  • Facets for visualizing machine learning datasets

Datascience

Datascience includes libraries for data analysis from the Julia, Python, and R communities.

  • Includes everything in Scipy and its ancestor images
  • The Julia compiler and base environment
  • IJulia to support Julia code in Jupyter notebooks
  • HDF5, Gadfly, and RDatasets packages

Pyspark

Pyspark includes Python support for Apache Spark, optionally on Mesos.

  • Contains everything in Scipy and its ancestor images
  • Apache Spark with Hadoop binaries
  • Mesos client libraries