Tundra

Introduction

Tundra is the initial implementation of an object store for the dovecot IMAP server. This object store makes it possible to move local email files to remote object storage such as Amazon S3 and fetch them on demand when the IMAP or POP3 servers request them, transparently to both the clients and dovecot. The implementation is fairly simple and takes advantage of several known preconditions within dovecot.

Preconditions

The Tundra implementation relies on certain attributes, or preconditions, to function.

First, dovecot must be configured to use the sdbox storage format, so that each email is stored as a single immutable file in either the primary mail storage location or the alternate storage location.

Second, all the open() calls for email files in the dovecot source code must be readily and comprehensively identifiable. Fortunately, there appears to be a single piece of code, in src/lib-storage/index/dbox-common/dbox-file.c, where these files are opened for reading.

Third, the upper layers that receive a file descriptor from dbox-file.c must have minimal expectations of it. Specifically, they must do nothing more than seek and read the content, and must not rely on anything else, such as what an fstat() call might return. See the Seekability section for more details.

The final precondition is that dovecot always tries to open an email file from the primary storage location first, and only if that open fails does it attempt to open the same-named file from the alternate storage location. This ‘open order guarantee’ makes it easy to replace email files safely at any time: when moving a file to the alternate location, remove the primary only after the alternate file has been created; when moving it back, create the primary before removing the alternate.

Other important considerations

Seekability

Initially, our limited proof of concept (POC) suggested that dovecot could cope with a non-seekable file descriptor, which simplified the implementation considerably. Unfortunately, that turned out not to be the case.

It turns out that while the dovecot POP3 server only requires a non-seekable mail file descriptor, the IMAP server requires a seekable one: for reasons unknown, it seeks backwards after reading part of a message. This is unfortunate, as it means that amoProxy has to fully cache the remote object as a local file before passing the fd back to the client. This complicates amoProxy significantly, imposes a fairly serious performance cost on the local file system and increases the latency before the IMAP code receives the first byte of the message.

The net result is that amoProxy has to manage a local file system cache of recently fetched objects.

Dovecot code-base risk

Anyone who has examined the dovecot code-base knows that it is huge, brittle, very vulnerable to change and/or refactoring, poorly documented and thus largely inscrutable.

The intent is not to criticize dovecot; after all, it is a very good IMAP server, and such beasts are few and far between. It is also open source and free for anyone to use.

No, the intent is merely to recognize that adding code deep inside dovecot is risky. Unless you are prepared to invest in a long-term deep dive into the huge code-base, you may well end up with no clue what you are doing or why your code is failing. A look through the main dovecot mailing list suggests that very few people, if any, outside the original authors ever make code changes. The collective hive-mind has spoken.

This code-base risk is why this approach was chosen: it requires only minimal changes to the code base and no internal knowledge of dovecot's program structure.

Performance and Costs

This document does not discuss performance and cost implications. There are obviously many, but they are out of scope here. Please see the project and deployment documentation for details.

Implementation

The Tundra system takes advantage of the aforementioned preconditions and makes a significant effort to minimize dovecot code changes because of the code-base risk. Two Tundra components interact with dovecot: amoClient, the shim-library, and amoProxy. amoClient detects object store files and sends fetch requests to amoProxy.

Component Relationship Diagram

[Figure: Component Relationship]

Mail File Open Sequence

  1. The open() calls in dbox-file.c are replaced with calls to a shim-library
  2. If the shim-library determines an email file represents a remote object it sends a fetch request to an on-system amoProxy daemon
  3. amoProxy manages the fetch of the remote object and fd-passes the read fd of a pipe back to the shim-library
  4. The shim-library passes the read fd back to dovecot as the result of the replacement open() call
  5. amoProxy writes the incoming remote object to the write fd side of the pipe
  6. dovecot reads the read fd of the pipe as if it were a local mail file

The main advantage of this approach is that most of the intelligence lives in amoProxy rather than in modified dovecot code. The second advantage is that the change to dovecot is very small and can be applied to newer versions with little effort.

Perhaps most importantly, the final advantage is that when debugging a dovecot issue you can simply make the relevant mail files local again. You are then back to running a completely standard dovecot, with amoProxy eliminated from the debugging equation.

But how does the shim-library determine whether an email file represents a remote object or not?

Symlinks to the rescue

When it is determined that a local email file is suitable for remote storage, the file is first copied to the remote object store, a symlink with the remote URL is then created in the alternate storage location and finally the original local email file in the primary storage is deleted. This migration sequence relies on the ‘open order guarantee’ mentioned earlier.

With that in mind, all the shim-library has to do on each open request is detect whether the path is a suitable symlink. If it is, the shim-library sends a fetch request with the embedded URL to amoProxy and expects a readable pipe file descriptor in return. If not, it opens the file as usual and returns the resulting file descriptor to the caller.

If you’re wondering about embedding URLs in symlinks, wonder no more. Unix-like operating systems do not care what the target of a symlink contains. The kernel will try to resolve it as a path if you ask it to follow the link, but otherwise a symlink is nothing more than a container of bytes that can hold almost anything (any bytes except NUL, up to a system-dependent length limit). Importantly, on file systems such as ext4 a short symlink target is stored directly in the inode, so it takes up no additional data blocks.

If you want to create “URL-type” symlinks for yourself, try this sequence of shell commands:

$ ln -s 'http://www.yahoo.com' myurl

$ ls -l myurl 
lrwxr-xr-x  1 markd  staff  20  3 Mar 18:54 myurl -> http://www.yahoo.com

$ echo My URL link is $(readlink myurl)

$ curl "$(readlink myurl)"

Hopefully you can see that a symlink can contain anything we want it to contain.

Symbolic Link Format

The object store symlinks are not just raw URLs as shown above. Instead, they have a structure that carries the storage type alongside the URL and helps protect against mis-detecting ordinary symlinks as object store symlinks.

Here’s an example of a mailbox folder in which some messages have been relocated to remote objects on AWS S3 while others are still stored as local files.

$ ls -l u.*
lrwxr-xr-x  1 markd  staff   23 16 Jan 16:17 u.1 -> amo:s:s3://amob1/u1/u.1
lrwxr-xr-x  1 markd  staff   25 21 Jan 08:38 u.154 -> amo:s:s3://amob1/u1/u.154
-rw-r--r--  1 markd  staff  840 19 Jan 16:30 u.2
lrwxr-xr-x  1 markd  staff   23 21 Jan 08:10 u.3 -> amo:s:s3://amob1/u1/u.3
-rw-r--r--  1 markd  staff 4122 03 Feb 13:45 u.36

Symbolic Link Content semantics

The symbolic link content consists of three components separated by colons: the “magic pattern”, the “URL Type” and the “URL”, in that order. Because the URL itself may contain colons (as in s3://), only the first two colons act as separators. In the examples above, ‘amo’ is the magic pattern that identifies object store symbolic links.

The second component in the example is ‘s’, which indicates that the “URL Type” is an AWS S3 URL; the third component is the actual S3 URL.

Other “URL Types” are “/” for file system with the “URL” being a Unix path and “a” for Azure storage with the “URL” being an Azure blob store URL. Note that Azure fetching has not been implemented.

Future Considerations

It may be that we want to embed other attributes in the symbolic link, such as message size, compression algorithm, encryption key index and so on. If so, it would be wise to reserve some positional colon-separated fields ahead of the “URL” before the format is set in stone.