Pixaven Blog

Ephemeral Storage Solution with Node and Redis

Building an ephemeral storage solution with Node and Redis

As software engineers we want to automate most of the processes and tasks we have at hand. We aim to build robust solutions that (in theory) should require little to no maintenance effort and “just work”. Let’s see how we can apply those principles to automatic object expiration and explore different angles to approach the problem of ephemeral object storage.

Ephemerality vs. volatility

Ephemerality is often mistaken for volatility. In computer science, the former means that we are in control of the TTL of storage objects, database records or any other data format that we want to expire in the future. We explicitly set the TTL values on those objects and expect them to expire when the time comes. It guarantees the data is there when we need it as long as the TTL set on the data itself is in the future.

Volatility usually applies to data that can disappear at any time. Applications making use of volatile data are prepared for the scenario when the requested data is gone. A good example of this is a cache. It can be evicted by the operating system whenever it needs more resources or explicitly invalidated by the user. If the data in the cache is gone, the application simply fetches fresh data from the source.
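To make the distinction concrete, here is a minimal sketch in plain JavaScript. The names isAlive and readThrough are made up for this illustration and do not come from any library: the first models an ephemeral record that stays valid until the expiry we set ourselves, the second models a reader of volatile data that always has a fallback.

```javascript
// Ephemeral: the record carries an expiry that we set ourselves,
// and it is guaranteed to be usable until that moment.
function isAlive(record, nowMs = Date.now()) {
    return record.expiresAt > nowMs;
}

// Volatile: the cache may have dropped the entry at any time,
// so every read needs a fallback to the source of the data.
function readThrough(cache, key, loadFromSource) {
    if (cache.has(key)) {
        return cache.get(key);
    }

    const fresh = loadFromSource(key);
    cache.set(key, fresh);

    return fresh;
}
```

Here cache can be any Map-like object, and loadFromSource stands in for whatever fetches the original data.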

"Ephemerality (from Greek εφήμερος – ephemeros, literally 'lasting only one day') is the concept of things being transitory, existing only briefly. Typically the term ephemeral is used to describe objects found in nature, although it can describe a wide range of things, including human artifacts intentionally made to last for only a temporary period, in order to increase their perceived aesthetic value."
— Wikipedia

Naïve approach to automatic object removal

Now that we know what ephemerality is and how it differs from volatility, let’s have a quick look at a naïve approach that comes to mind when dealing with automatic object removal. Let’s assume we would like to remove directories last modified more than one hour ago. We could set up a system cron job to fire every minute or so and execute the following command:

find . -type d -path "./*" -mmin +60 -exec rm -rf {} \;

That will work for directories with a few thousand objects to remove. If in the future our storage expands to hundreds of thousands of files, rm -rf can fail with Error: Too many open files. Even after optimizing the command to pass 16 directories to each rm invocation (below), the call is still suboptimal, as find has to recursively traverse the directory tree and essentially touch every single object to check its eligibility for removal, even if the object was created just a few seconds ago.

find . -type d -mmin +60 -prune -print0 | xargs -n 16 -0 rm -rf
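The cost of polling is easier to see when the same check is written out in Node. The sketch below is a simplified, non-recursive stand-in for the find call (it only looks at the immediate subdirectories of a root), and findStaleDirs is a name invented for this example. Note that it has to stat every directory on every run, no matter how recently that directory was created.

```javascript
const fs = require("fs");
const path = require("path");

// Returns the immediate subdirectories of `root` that were last
// modified more than `maxAgeMs` milliseconds ago.
function findStaleDirs(root, maxAgeMs, nowMs = Date.now()) {
    const stale = [];

    for (const entry of fs.readdirSync(root, { withFileTypes: true })) {
        if (!entry.isDirectory()) {
            continue;
        }

        const dir = path.join(root, entry.name);

        // one stat call per directory, even for brand new ones
        const { mtimeMs } = fs.statSync(dir);

        if (nowMs - mtimeMs > maxAgeMs) {
            stale.push(dir);
        }
    }

    return stale;
}
```

Calling findStaleDirs("/srv/storage", 60 * 60 * 1000) from a timer would reproduce the cron job above, with the same drawback: the work grows with the total number of entries, not with the number of entries that are actually due.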

What about having something that tells us when to remove a given object instead of inefficiently asking all the entries “are you ready to be removed”?

Applying TTL to files with Redis

If you are unfamiliar with Redis, it is an amazing piece of technology written by Salvatore Sanfilippo, originally released under the BSD license and used widely across the industry. It is an in-memory key-value data store that (unlike Memcached, for example) supports multiple data types such as strings, lists, sets or hashes.

What is interesting for us is that Redis also supports automatic key expiration. Any key, of any type, can be scheduled for expiration by using either the expire or the expireat command. The former expects seconds (to live) as its argument while the latter takes an absolute Unix timestamp (seconds since January 1, 1970).
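The relationship between the two commands can be sketched in a few lines. The helper names below are invented for this illustration, and client is assumed to be an ioredis instance (or any object exposing expire and expireat with the same arguments).

```javascript
// Converts a relative TTL (what EXPIRE takes) into an absolute Unix
// timestamp in seconds (what EXPIREAT takes).
function ttlToUnixTimestamp(ttlSeconds, nowMs = Date.now()) {
    return Math.floor(nowMs / 1000) + ttlSeconds;
}

// Relative: "expire this key 3600 seconds from now".
function expireInOneHour(client, key) {
    return client.expire(key, 3600);
}

// Absolute: "expire this key at this exact second".
function expireAtSameMoment(client, key, nowMs = Date.now()) {
    return client.expireat(key, ttlToUnixTimestamp(3600, nowMs));
}
```

Both calls schedule the key for removal at the same moment. The absolute form is handy when the deadline is already known as a timestamp, for example when it comes from another system.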

Redis strings are binary safe, which means that we could theoretically use Redis to store binary data, such as a JPEG or an animated GIF file, and set a TTL on those keys. However, storing binary data (blobs) in a database is usually not the best idea. While there are storage layers built specifically for blobs, such as MongoDB GridFS, binary data is usually better stored on disk with only its location referenced in the database.

If we do not store our files in Redis, what do we need Redis for? We will use Redis in a very smart way by utilizing its Keyspace Notifications. It is a mechanism built into Redis that notifies subscribers when certain events in a keyspace occur.

"Keyspace notifications allow clients to subscribe to Pub/Sub channels in order to receive events affecting the Redis data set in some way. Events are delivered using the normal Pub/Sub layer of Redis, so clients implementing Pub/Sub are able to use this feature without modifications."
— Redis docs on Keyspace Notifications

Enabling Keyspace Notifications

Keyspace Notifications in Redis are disabled by default. To turn them on, we need to add one line to the Redis server config file or issue a CONFIG SET command in the CLI. Conveniently, Redis allows us to specify which types of events to send. Enabling all possible events to be broadcast by the Redis server would be a waste of CPU time.

Every character in the config string for notify-keyspace-events has a special meaning:

K     Keyspace events, published with __keyspace@<db>__ prefix.
E     Keyevent events, published with __keyevent@<db>__ prefix.
g     Generic commands (non-type specific) like DEL, EXPIRE, RENAME, ...
$     String commands
l     List commands
s     Set commands
h     Hash commands
z     Sorted set commands
x     Expired events (events generated every time a key expires)
e     Evicted events (events generated when a key is evicted for maxmemory)
A     Alias for g$lshzxe, so that the "AKE" string means all the events.

Since we are only interested in enabling Keyevent events (E) and Expired events (x) we set Ex (as per the table above) as the value of our new config entry, such as:

notify-keyspace-events "Ex"

Or, with Redis-CLI, the command would be as follows:

redis-cli config set notify-keyspace-events Ex
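To double check a flag string before putting it into the config, a small helper can spell it out. This is only a sketch: explainNotifyFlags is a made-up name, and the lookup covers just the flags listed in the table above (newer Redis versions may define more). It also encodes one rule from the Redis docs: unless K or E is present, no events are delivered at all.

```javascript
// Flags from the table above; anything else is reported as unknown.
const FLAG_MEANINGS = new Map([
    ["K", "keyspace events"],
    ["E", "keyevent events"],
    ["g", "generic commands"],
    ["$", "string commands"],
    ["l", "list commands"],
    ["s", "set commands"],
    ["h", "hash commands"],
    ["z", "sorted set commands"],
    ["x", "expired events"],
    ["e", "evicted events"],
    ["A", "alias for g$lshzxe"],
]);

function explainNotifyFlags(value) {
    const flags = [...value];

    return {
        meanings: flags
            .filter((flag) => FLAG_MEANINGS.has(flag))
            .map((flag) => FLAG_MEANINGS.get(flag)),
        unknown: flags.filter((flag) => !FLAG_MEANINGS.has(flag)),
        // without K or E Redis has no channel type to publish on
        hasChannelType: flags.includes("K") || flags.includes("E"),
    };
}

console.log(explainNotifyFlags("Ex"));
```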

Real world example

Armed with the knowledge about Keyspace Events, we can finally move over to the meaty part and get our hands dirty with some code. The natural choice of programming language for us at Pixaven is JavaScript, so we will build a small Node application to illustrate the proposed solution.

The moment a new file is stored on disk, we need to save the location of the file in Redis with a given TTL, say an hour. We also need to subscribe to the Keyspace Notifications in order to intercept events raised by the Redis server when keys expire. When the event is raised, we parse the key name and remove the file from the filesystem. The first part of the task is pretty straightforward:

const Redis = require("ioredis");

const client = new Redis({
    // redis client config
});

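// the file location doubles as the Redis key
const location = "/srv/storage/files/image.jpg";

// store the key with a TTL of 3600 seconds and an empty value
client.setex(location, 3600, "", (err) => {
    if (err) {
        console.error(err);
    }
});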

We have just saved a (dummy) location of a file in Redis, set its TTL to 3600 seconds and provided an empty string as the value of the newly created key. We could save additional data about the file in the value field (for example the file’s metadata for further usage), but the expired event carries only the key name, so we will not have access to the value when the event is raised.
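Since only the key name survives until the event is raised, anything we need at expiry time has to be encoded in the key itself. One possible convention, shown here purely as a sketch, is to prefix the key so that our own keys are easy to recognize later. The "file:" prefix and the helper names are assumptions made for this example; the code in this article uses the bare path as the key.

```javascript
// Hypothetical prefix that marks keys belonging to our storage system.
const KEY_PREFIX = "file:";

// Builds the Redis key for a stored file.
function toKey(filePath) {
    return KEY_PREFIX + filePath;
}

// Returns the file path for keys we created, or null for any other key.
function fromKey(key) {
    return key.startsWith(KEY_PREFIX) ? key.slice(KEY_PREFIX.length) : null;
}
```

A listener could then call fromKey on every expired key name and ignore the ones that come back as null.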

Redis will broadcast keyspace events on the standard Pub/Sub layer, so we have to subscribe to a specific channel in order to intercept those messages. In our case the channel names match the pattern "__key*__:*". In order to subscribe to a pattern we use the PSUBSCRIBE command:

const fs = require("fs-extra");
const gfs = require("graceful-fs");
const Redis = require("ioredis");

gfs.gracefulify(fs);

const subscriber = new Redis({
    // redis client config
});

subscriber.psubscribe("__key*__:*", (err) => {
    if (err) {
        return console.error(err);
    }
});

// for keyevent channels the message is the name of the expired key
subscriber.on("pmessage", (pattern, channel, message) => {
    // skip keys that are not absolute file paths
    if (!message.startsWith("/")) {
        return;
    }

    const filePath = message;

    // remove the expired file itself
    fs.remove(filePath, (err) => {
        if (err) {
            return console.error(err);
        }
    });
});

As you may have noticed, we instantiated a new Redis client just for Pub/Sub. That is because when a client issues a SUBSCRIBE or PSUBSCRIBE command, that connection is put into “subscriber” mode and at that point, only commands that modify the subscription set are valid. We have also used the fs-extra module for its syntactic sugar and patched it with graceful-fs to make it more resilient to errors.
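The channel argument of the pmessage handler is worth a closer look, as it tells us which database and which event a message belongs to. Below is a minimal sketch of a parser for it. The function name is made up for this example; the channel layout follows the __keyspace@<db>__ and __keyevent@<db>__ prefixes from the table earlier in the article.

```javascript
// Parses channel names such as "__keyevent@0__:expired".
// For keyevent channels `target` is the event name (the message is the key).
// For keyspace channels `target` is the key (the message is the event name).
function parseNotificationChannel(channel) {
    const match = /^__(keyspace|keyevent)@(\d+)__:(.+)$/.exec(channel);

    if (!match) {
        return null;
    }

    return {
        type: match[1],
        db: Number(match[2]),
        target: match[3],
    };
}
```

Inside the listener, a check along the lines of parseNotificationChannel(channel).target === "expired" would make sure that only expired events trigger a removal.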

Using a dedicated Redis DB

The code above is the bare minimum to get our solution off the ground, but it can be optimized a bit by using a dedicated Redis database. The subscriber from the code above will intercept the events generated every time any key expires, including those not related to our storage system (for example expiring client session information).

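A default Redis instance supports 16 databases and each database provides a distinct keyspace, independent from the others. We can use that to set aside a dedicated database just for the needs of our ephemeral storage solution. Keep in mind that Pub/Sub itself is not scoped to a database: a subscriber receives messages from every database, regardless of the one its connection has selected. The database number is, however, part of the channel name (__keyevent@<db>__), so we write our keys to the dedicated database and subscribe only to its expired channel. The if statement if (!message.startsWith("/")) then becomes obsolete, as we can be sure that we are dealing only with expired files.

Our client instantiation and subscription will become:

const client = new Redis({ db: 1 }); // file keys are written to DB 1
const subscriber = new Redis({
    // redis client config
});
// Pub/Sub ignores the selected DB, so the DB number goes into the channel name
subscriber.psubscribe("__keyevent@1__:expired").catch(console.error);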

and the Pub/Sub listener will look like so:

subscriber.on("pmessage", (pattern, channel, file) => {
    // the message is the key name, i.e. the path of the expired file
    fs.remove(file, (err) => {
        if (err) {
            return console.error(err);
        }
    });
});

Bonus round: Improving expire accuracy

Redis docs are pretty clear on the TTL accuracy and that is an impressive 0 to 1 milliseconds. However, the expired event is only generated when Redis actually deletes the key, which happens either when the key is accessed or when the background expiration cycle finds it. With tens of millions of keys that are not frequently targeted by any command, the delay between a TTL reaching zero and the event being raised can grow. We can reduce it by using the fast, non-blocking SCAN command to iterate over all the keys in our dataset. By issuing a simple GET command against the keys in the stream we force Redis to check their TTL and expire the ones that are due. However, if you do not need millisecond precision in your new ephemeral storage system, I would suggest sticking to the default TTL management.

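const scanStream = client.scanStream();

scanStream.on("data", (keys) => {
    // pause the scan while the current batch of keys is being touched
    scanStream.pause();

    // reading a key makes Redis check its TTL and expire it if it is due
    Promise.all(keys.map((key) => {
        return client.get(key);
    })).catch((e) => {
        console.error(e);
    }).then(() => {
        // resume even after a failed batch so the scan does not stall
        scanStream.resume();
    });
});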
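Why does touching a key help? Redis expires keys in two ways: passively, when a key is accessed and found to be past its TTL, and actively, through a background cycle that samples keys with a TTL. The toy model below illustrates only the passive path. It is not how Redis is implemented, and LazyExpiringStore is a name invented for this sketch, but it shows why an untouched key can outlive its TTL and why a read makes the expired event fire.

```javascript
const { EventEmitter } = require("events");

// Toy store in which a key is removed (and "expired" is emitted)
// only when somebody reads it after its TTL has passed.
class LazyExpiringStore extends EventEmitter {
    constructor() {
        super();
        this.entries = new Map();
    }

    set(key, value, ttlMs, nowMs = Date.now()) {
        this.entries.set(key, { value, expiresAt: nowMs + ttlMs });
    }

    get(key, nowMs = Date.now()) {
        const entry = this.entries.get(key);

        if (!entry) {
            return null;
        }

        if (entry.expiresAt <= nowMs) {
            // the read is what triggers the removal and the event
            this.entries.delete(key);
            this.emit("expired", key);
            return null;
        }

        return entry.value;
    }
}
```

In this model a key whose TTL ran out long ago still sits in entries until the first get call, which is the situation the SCAN and GET loop above is meant to shorten.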

Conclusion

In this tutorial, you learned how to enable and leverage Redis Keyspace Notifications for expiring files from the filesystem. Based on this knowledge you can now build robust event systems (not only for managing files) with fine-grained time control and resolution as low as one millisecond.

About the author

Przemek Matylla is the Founder and CEO of Pixaven, currently living in Berlin. You can follow him on Twitter or connect with him on LinkedIn.