Storage Providers Overview

Plamen

Overview

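This document gives an overview of the different storage types used for the Version Control, Hybrid DAM, Shared Content Metadata, and Web Catalogs features in Connecter. It covers these storage providers:

  • Folder on the file system
  • Dropbox
  • Amazon S3
  • Google Drive
  • Google Object Storage
  • OneDrive
  • Azure Blob Storage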

Different storage types are designed to fit different needs, so there are significant differences in how they operate. Each storage has a storage model. Some storage providers have hierarchical models based on containers (folders) and files. Others have a flat model that allows users to mimic a hierarchical model.

Based on the needs that storage is designed to satisfy, we will define two types of storage:

  • End-user targeting cloud storage – designed to be used by end-users.
  • Application targeting cloud storage – designed to be used by applications.

End-user targeting cloud storage

Storages like Dropbox, OneDrive, and Google Drive are created to fit the needs of end-users. We'll call them end-user cloud storage. Their storage model is hierarchical, close to that of a file system, which uses folders as containers. They also provide applications that sync user data from the users' devices (desktop PCs, laptops, tablets, etc.).

Essentially they're designed so that people can upload, download and work with files in their day-to-day activities. They can operate in both home and work environments. These storage types can be used by applications also, but their main goal is to be used by people.

Application targeting cloud storage

Amazon S3, Google Object Storage, and Azure Blob Storage are designed to be used by applications. We'll call them application targeting cloud storage. End-users can't use them directly (there are workarounds, but at this point they are impractical). Applications use them for needs such as storing large files or storing images that web applications link to.

They're designed to store binary blobs (files) of data. This data is usually organized in containers called buckets. Each bucket can contain large amounts of data. Their data model is a flat structure: you create a bucket, and the bucket stores files. These files have names (also called keys) and may have a unique ID.

Using only names to reference these blobs would make it difficult to store them without name collisions. To mitigate this, a blob has a path (prefix + name) instead of only a name. This essentially mimics a hierarchical file system. The difference is that no folders are used as containers. Buckets are the only containers, and the only way to find a file is by its path or, if available, its unique ID.
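
To make the flat model concrete, here is a minimal sketch in Python of how a folder-like listing can be derived from nothing but keys. The bucket is modeled as a plain dictionary, and the function name list_children, the "/" delimiter, and the sample keys are all assumptions made for illustration; this is not how any particular provider or Connecter implements it.

```python
def list_children(keys, prefix="", delimiter="/"):
    """Return (folders, files) that sit directly under `prefix` in a flat key space."""
    folders, files = set(), set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if not rest:
            continue
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # There is more path after `head`, so `head` acts like a folder.
            folders.add(prefix + head + delimiter)
        else:
            # No delimiter left: this key is a "file" directly under the prefix.
            files.add(key)
    return sorted(folders), sorted(files)


# Hypothetical bucket contents: only keys exist, no real folders.
bucket = {
    "projects/chair/model.max": b"...",
    "projects/chair/textures/wood.png": b"...",
    "projects/readme.txt": b"...",
    "logo.png": b"...",
}

print(list_children(bucket))
print(list_children(bucket, "projects/"))
```

The "folders" in the result exist only because several keys share a prefix. If every key under a prefix is deleted, the folder disappears with them.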

Storages

Folder on the file system

Folders on the file system have a well-known hierarchical structure based on directories/folders and files. Each file has a location composed of the directory path and its filename, and files are retrieved by this path. Connecter can use the file system to store Version Control data.

The file system can be located on one or more devices: dedicated file servers, NAS, etc. The user must specify a root folder that Connecter will use as a repository to store the data.
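
As a rough illustration of a root folder acting as a repository, the sketch below resolves a relative location against a root and rejects locations that would escape it. The function name resolve_in_repository and the example paths are hypothetical, and PurePosixPath is used only to keep the example platform-independent; Connecter's actual path handling is not described here.

```python
from pathlib import PurePosixPath


def resolve_in_repository(root, relative_path):
    """Join a relative location onto the repository root, refusing to leave the root."""
    candidate = PurePosixPath(relative_path)
    if candidate.is_absolute() or ".." in candidate.parts:
        raise ValueError(f"{relative_path!r} is not a location inside the repository")
    return PurePosixPath(root) / candidate


# Hypothetical root folder on a NAS and a hypothetical file inside it.
print(resolve_in_repository("/mnt/nas/connecter", "versions/chair.max"))
```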

It's assumed that this file system is on-premises and is managed and maintained by the client. This storage isn't accessible on the internet, so web applications can't use it to access and distribute files. It can be made available, but doing so takes considerable effort, so we assume it isn't.

Dropbox

Dropbox is an end-user cloud application that started as a simple service that let users upload files from the file system on one PC to the Dropbox cloud and sync these files to other PCs. It was designed to closely mimic the file system's file and folder model. All files and folders are stored under a Root Dropbox Folder.

The Dropbox API allows files to be retrieved by their path, shared, used as links on web pages, and downloaded.

Amazon S3

Amazon S3 is an application targeting cloud storage. It uses the concepts of objects and buckets. A bucket is a container for objects. An object is a file and any metadata that describes that file. Each object has a key associated with it. The keys can be formed in a path-like way to mimic the hierarchical ordering of objects. The Amazon S3 documentation covers this in more detail.

Google

Google Drive

Google Drive organizes files in collections, describes files by types, and provides specific attributes for each file to facilitate file manipulation. The types of files are Blob, Folder, Shortcut, Third-party shortcut, and Google Workspace document. Everything in Google Drive is basically a file, even folders. All files have a unique ID associated with them.

One difference in Drive's API is that files don't have a location or path associated with them, so developers can't retrieve files by their path. Files are retrieved either by FileID or by searching with various parameters. To get a file from Drive by name, a search must be performed. The search accepts parameters like the FileName and the FolderID of the folder that contains the file.
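
The sketch below shows what such a search might look like in Python, assuming the Drive API v3 query syntax for the q parameter of files.list, where the file name is the name field and the containing folder is matched with in parents. The helper name build_drive_query and the sample values are hypothetical, and this only builds the query string; it does not call the API.

```python
def build_drive_query(file_name, folder_id):
    """Build a Drive v3 search query for a file name inside one folder."""

    def quote(value):
        # Drive query strings are single-quoted; backslashes and quotes are escaped.
        escaped = value.replace("\\", "\\\\").replace("'", "\\'")
        return f"'{escaped}'"

    return (
        f"name = {quote(file_name)} "
        f"and {quote(folder_id)} in parents "
        "and trashed = false"
    )


# Hypothetical file name and folder ID.
print(build_drive_query("chair.max", "abc123"))
```

Because names are not unique in Drive, a search like this can return more than one file, so the caller still has to decide which FileID to use.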

Object Storage

Object Storage is an application targeting cloud storage designed to store large numbers of files. It works the same way as Azure Blob Storage and Amazon S3.

Microsoft

OneDrive

OneDrive is end-user cloud storage similar to Dropbox. It's pre-installed with Windows and has native integration with it. It mimics the file system's structure with files and folders and allows for synchronizing files across devices and PCs. 

Each Microsoft account has 5 GB of free space. All files and folders are stored under a Root OneDrive Folder. Smaller teams will probably use OneDrive. Larger companies that will save significant amounts of data will probably use Azure Blob Storage.

Azure Blob Storage

Blob Storage is an application targeting cloud storage designed to store large numbers of files. It works the same way as Google's Object Storage and Amazon S3.

How applications store files

Let's see how applications can store files on a storage provider. This depends on the type of storage provider and how users can give access to applications for their data.

Storing data in end-user targeting cloud storage

There are three permission levels for applications that these storages provide: 

  • Application-specific folder.
  • All data.
  • Folder.

The different storage providers support different permission levels or have slightly different definitions of them. The main goal is to limit what an application can access depending on its needs. 

The application-specific folder and folder permissions allow an application to store data isolated from the other data that a user has and from other applications. This is what we want in Connecter, so we will concentrate on these application levels.

Dropbox

Dropbox supports the application-specific folder permissions level, and Connecter uses it. This means that Connecter will have its own folder in the user's Dropbox and will store all data there.

Google Drive

Google Drive defines application-specific folder permissions differently. These permissions give an application access to its own dedicated folder, but this folder isn't meant to hold large amounts of data. The application should store only configuration data in it.

Drive also supports the folder permissions level, which allows applications to access only specific folders that the user has created and selected. This is the permissions level that Connecter will use for Google Drive.

OneDrive

OneDrive also supports the application-specific folder permissions level, and Connecter stores its data there.

Storing data in an application targeting cloud storage

These storages allow the user to set up access to one or more buckets that an application can use. We can say that an application has bucket-level permissions.

Integrating with storage providers

For a web or desktop application or web service to store files on storage that a user or organization owns, it needs to be given access to it. There are several ways this can be achieved, and it varies with storage providers. In general, applications can either have their own application (or service) account or act on behalf of (also called impersonating) a user, i.e., using a user account.

A system administrator gives the account used by the application permissions to access the storage or a specific part of it (bucket, container, folder, etc.). After that, the application can access the storage using either the credentials provided for an application account or an access token obtained when a user account authorizes it. The goal of using an access token is to keep the user's actual credentials hidden from third-party applications.

End-user targeting storages tend to use the approach of impersonating the user. Application targeting storage may support more than one approach, depending on the specific use case. Some storage providers require an application or project to be created and validated/published. 

For end-user targeting storage types, when an application requires access to the user's files, information like the name, publisher, website, what the application can access, etc., will be shown. This way, the users can decide if they want to use the application. 

For some application targeting storage services, a system administrator can use this registration as an application account and give it permissions.

Integration with Google

Connecter uses workload identity federation to integrate with Google. 

Amazon S3

Amazon has its own AWS Identity and Access Management (IAM). An admin account can create users and roles and give them permissions by using policies. An AWS user can represent an application account or a user account for a person.
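
As an illustration of bucket-level permissions, the following Python sketch builds an IAM policy document that limits an application account to a single bucket. The bucket name and the selection of S3 actions are assumptions made for this example; the permissions Connecter actually requires are described in the Amazon S3 storage setup guide, not here.

```python
import json


def bucket_policy(bucket_name):
    """Build an IAM policy document scoped to one S3 bucket (illustrative action list)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket_name}",
            },
            {
                # Reading, writing, and deleting apply to the objects in the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
        ],
    }


# Hypothetical bucket name.
print(json.dumps(bucket_policy("example-connecter-bucket"), indent=2))
```

An admin would attach a policy like this to the AWS user or role that represents the application, so that the application cannot reach any other bucket in the account.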

Configuring storages

The user will need to configure each storage. Some storages are straightforward to set up, while others are trickier. Permissions are always an issue with cloud storage services.

Folder on the file system

The only thing needed here is a folder path, so it's easy to set up this one. Read about local storage setup.

Dropbox

Connecter uses an App-Specific Folder to store data. A user needs to authorize Connecter to access their Dropbox. Dropbox uses OAuth 2.0, so Connecter will use an access token and a refresh token. The setup is straightforward because it only requires the user to give Connecter access. Read about Dropbox storage setup.
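
The general OAuth 2.0 pattern of a short-lived access token paired with a refresh token can be sketched in Python as follows. The class name TokenStore, the refresh callable, and the 60-second leeway are hypothetical and stand in for a real call to the provider's token endpoint; this is not Connecter's actual implementation.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class TokenStore:
    access_token: str
    refresh_token: str
    expires_at: float  # Unix time at which the access token stops being valid

    def get_access_token(
        self,
        refresh: Callable[[str], Tuple[str, float]],
        now: Optional[float] = None,
        leeway: float = 60.0,
    ) -> str:
        """Return a usable access token, refreshing it shortly before it expires."""
        current = time.time() if now is None else now
        if current >= self.expires_at - leeway:
            # `refresh` is assumed to exchange the refresh token for a new
            # access token and its lifetime in seconds.
            self.access_token, lifetime = refresh(self.refresh_token)
            self.expires_at = current + lifetime
        return self.access_token


def fake_refresh(refresh_token):
    # Stand-in for a request to the provider's token endpoint.
    return "new-access-token", 3600.0


store = TokenStore("old-access-token", "refresh-token", expires_at=1000.0)
print(store.get_access_token(fake_refresh, now=500.0))
print(store.get_access_token(fake_refresh, now=990.0))
```

The user's password never appears in this flow; the application holds only tokens, which the user can revoke from their account.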

Amazon S3

Read about Amazon S3 storage setup.

Google Drive

For Google Drive, the users must create and select a folder that Connecter will use to store data. Hence, the setup has the additional step of selecting a folder once the user authenticates Connecter. Read about Google Drive storage setup.

Google Object Storage

Read about Google Object storage setup.

OneDrive

Connecter uses an App-Specific Folder to store data. OneDrive also uses OAuth 2.0, just like Dropbox. The setup is straightforward, too, because it only requires the user to give Connecter access. Read about OneDrive storage setup.

Azure Blob Storage

Read about Azure Blob Storage setup.

Performing changes to storage providers or deleting data

Changing or deleting storage is difficult to perform. These operations require coordination between the people managing the storage and the users that use Connecter. It's best to stop all running Connecter instances before making any changes, but this isn't always possible. Once the storage admin starts to work, users will lose access to the storage, so errors will appear in Connecter.

Changing a storage provider

The main steps to perform when changing storage are:

  1. Stop all running Connecter instances if possible.
  2. Disable access to the storage. How this is achieved depends on the provider used. If there are any running instances of Connecter, all operations will start to fail.
  3. Transfer data from the old storage to the new one. This is the tricky part: it's up to the sysadmins to perform and depends on the types of the source and target storage. Because we use the same schema, users don't have to rename any folders or buckets.
  4. Update or change the storage provider configuration in the team portal.
  5. Restart all running Connecter instances so they can pick up the new credentials.

Because of the differences in the storage models and in the retrieval options that their APIs provide, migrating between some pairs of storage providers is more complicated than between others.
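
Before updating the configuration after a transfer, it can help to confirm that everything arrived intact. The Python sketch below compares two stores by key and content hash. Both stores are modeled as plain dictionaries mapping a path to bytes, and the function name compare_stores is hypothetical; a real check would read from the providers' APIs instead.

```python
import hashlib


def compare_stores(source, target):
    """Return (missing, mismatched) keys when checking `target` against `source`."""
    missing, mismatched = [], []
    for key in sorted(source):
        if key not in target:
            missing.append(key)
        elif hashlib.sha256(source[key]).digest() != hashlib.sha256(target[key]).digest():
            mismatched.append(key)
    return missing, mismatched


# Hypothetical contents of the old and new storage after a partial transfer.
old_storage = {"a/model.max": b"v1", "a/wood.png": b"png", "readme.txt": b"hello"}
new_storage = {"a/model.max": b"v1", "a/wood.png": b"corrupted"}

print(compare_stores(old_storage, new_storage))
```

An empty result for both lists suggests the new storage holds the same data under the same paths as the old one.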

Deleting data from storage

Data can be deleted manually by the user or sysadmin. At this point, we don't support deleting data from the Teamwork web server, but we will add it in the future.

The main steps for manually deleting data are:

  1. Stop all running Connecter instances if possible.
  2. Disable access to the storage. How this is achieved depends on the provider used. If there are any running instances of Connecter, all operations will start to fail.
  3. Go to the teamwork portal to find out where the data is stored.
  4. Delete the data.
  5. Delete the storage provider from Connecter.