I’m not looking to tackle this right away, but I want to support at-rest encryption for PliantDb. My requirements for such a system would be:
- A hard drive containing a PliantDb database would not contain enough information to decrypt it (assuming the administrator took care to store the keys in another location).
- Documents should be able to be encrypted with a specific key.
- Users should be granted access to be able to use each key.
- The server necessarily needs to have access to the decrypted data to enable the View functionality. Thus, it is OK to assume that if the machine is compromised while the server is running, the data is partially accessible. The focus is at-rest encryption.
I want to focus first on a high-level approach – it seems like we’ll want to support multiple algorithms in the long run, so specific algorithms can be picked later.
Key Storage
To ensure a leaked database (or a discarded hard drive) cannot be used on its own to decrypt the stored data, encryption keys should be provided externally. A common solution is vault. Others might have access to something like the parameter store in AWS. Because of this, it seems clear that a trait should abstract access to stored keys.
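As a rough sketch of what such a trait might look like: every name here (KeyStorage, KeyId, KeyStorageError, MemoryKeyStorage) is hypothetical and not part of PliantDb today. The in-memory backend only exists to show the shape of an implementation. A real backend would talk to vault, a parameter store, or similar.

```rust
use std::collections::HashMap;
use std::fmt;

/// Identifies a key without revealing it. Hypothetical type.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct KeyId(pub String);

/// Errors a key storage backend might report. Hypothetical type.
#[derive(Debug, PartialEq, Eq)]
pub enum KeyStorageError {
    /// No key is stored under the requested id.
    NotFound(KeyId),
    /// The backend (network, remote service, ...) failed.
    Backend(String),
}

impl fmt::Display for KeyStorageError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            KeyStorageError::NotFound(id) => write!(f, "key {} not found", id.0),
            KeyStorageError::Backend(message) => write!(f, "backend error: {}", message),
        }
    }
}

impl std::error::Error for KeyStorageError {}

/// Abstracts where key material lives, so the database never has to
/// keep keys on the same disk as the data it encrypts.
pub trait KeyStorage {
    fn get_key(&self, id: &KeyId) -> Result<Vec<u8>, KeyStorageError>;
    fn set_key(&mut self, id: KeyId, key: Vec<u8>) -> Result<(), KeyStorageError>;
    fn remove_key(&mut self, id: &KeyId) -> Result<(), KeyStorageError>;
}

/// In-memory backend, useful only for tests and for showing the trait's shape.
#[derive(Default)]
pub struct MemoryKeyStorage {
    keys: HashMap<KeyId, Vec<u8>>,
}

impl KeyStorage for MemoryKeyStorage {
    fn get_key(&self, id: &KeyId) -> Result<Vec<u8>, KeyStorageError> {
        self.keys
            .get(id)
            .cloned()
            .ok_or_else(|| KeyStorageError::NotFound(id.clone()))
    }

    fn set_key(&mut self, id: KeyId, key: Vec<u8>) -> Result<(), KeyStorageError> {
        self.keys.insert(id, key);
        Ok(())
    }

    fn remove_key(&mut self, id: &KeyId) -> Result<(), KeyStorageError> {
        self.keys
            .remove(id)
            .map(|_| ())
            .ok_or_else(|| KeyStorageError::NotFound(id.clone()))
    }
}
```

Keeping the trait this small means the choice of backend stays a deployment decision rather than something baked into the storage layer.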
However, doing this securely with a single-server model is tricky. If I simply installed vault on the server, the secrets would still be stored on the same disk. Additionally, hardware TPM support is lacking in most general-purpose hosting environments.
Thus, in line with some of my own planning, I would like to create a key storage system that works with any S3 compatible API. This would need to store per-server encrypted copies of the key, as it would be critical to be able to remove old keys for decommissioned drives/servers without breaking any current servers.
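To make the per-server copies concrete, here is a minimal sketch of one possible object layout. It assumes each server's copy of a key has already been encrypted ("wrapped") for that server by some mechanism outside this example, so the bucket only ever sees opaque bytes. The Bucket type is an in-memory stand-in for an S3-compatible API, and the servers/<server>/keys/<key> path scheme is my own assumption, not a settled design.

```rust
use std::collections::BTreeMap;

/// In-memory stand-in for an S3-compatible bucket: object path -> object bytes.
#[derive(Default)]
pub struct Bucket {
    objects: BTreeMap<String, Vec<u8>>,
}

/// Hypothetical path scheme. Grouping by server first means decommissioning
/// a server is a delete of everything under one prefix.
/// Assumes neither id contains a '/'.
fn object_path(server_id: &str, key_id: &str) -> String {
    format!("servers/{}/keys/{}", server_id, key_id)
}

impl Bucket {
    /// Stores one server's wrapped copy of a key. `wrapped` is opaque here;
    /// it is assumed to be encrypted so that only `server_id` can unwrap it.
    pub fn put_wrapped_key(&mut self, server_id: &str, key_id: &str, wrapped: Vec<u8>) {
        self.objects.insert(object_path(server_id, key_id), wrapped);
    }

    /// Fetches the wrapped copy of a key that belongs to one server.
    pub fn wrapped_key(&self, server_id: &str, key_id: &str) -> Option<&[u8]> {
        self.objects
            .get(&object_path(server_id, key_id))
            .map(Vec::as_slice)
    }

    /// Removes every wrapped key for a decommissioned server without touching
    /// the copies held for other servers. Returns how many objects were removed.
    pub fn decommission(&mut self, server_id: &str) -> usize {
        let prefix = format!("servers/{}/", server_id);
        let before = self.objects.len();
        self.objects.retain(|path, _| !path.starts_with(&prefix));
        before - self.objects.len()
    }
}
```

With this layout, retiring a drive or server never requires re-encrypting anything for the servers that remain.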
The last thought on this is that when PliantDb is operating as a cluster, it should be possible to store part of each key on each server, allowing the cluster to operate as a secure key storage. Each cluster node would need to store copies of the encrypted keys for each other node so that any two nodes could begin decrypting data.
Encrypting Data
My thought is to have a feature flag that enables encrypting documents. Document would gain a key_id field, and upon reading from storage, the server would check that the current user has access to that key before decrypting the contents and returning them to the caller. Upon saving a document, if a key_id was present, the document would be encrypted before being saved (again, after checking permission to access the key).
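The read and write paths described above could be sketched roughly as follows. Everything here is illustrative: the Document struct is a simplified stand-in for the real type, Server and its grant table are hypothetical, and toy_cipher is a plain XOR used only so the example runs without dependencies. It is NOT real encryption and would be replaced by whichever algorithm gets picked later.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, PartialEq, Eq)]
pub enum Error {
    PermissionDenied,
    UnknownKey,
}

/// Simplified stand-in for a stored document; `key_id` is the proposed field.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Document {
    pub id: u64,
    pub key_id: Option<String>,
    pub contents: Vec<u8>,
}

/// Hypothetical server state: key material plus (user, key_id) grants.
#[derive(Default)]
pub struct Server {
    keys: HashMap<String, Vec<u8>>,
    grants: HashSet<(String, String)>,
}

impl Server {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn add_key(&mut self, key_id: &str, key: Vec<u8>) {
        self.keys.insert(key_id.to_string(), key);
    }

    pub fn grant(&mut self, user: &str, key_id: &str) {
        self.grants.insert((user.to_string(), key_id.to_string()));
    }

    /// The permission check happens before any key material is handed out.
    fn key_for(&self, user: &str, key_id: &str) -> Result<&[u8], Error> {
        if !self.grants.contains(&(user.to_string(), key_id.to_string())) {
            return Err(Error::PermissionDenied);
        }
        self.keys.get(key_id).map(Vec::as_slice).ok_or(Error::UnknownKey)
    }

    /// Shared by both paths because the toy cipher is symmetric.
    fn transform(&self, user: &str, mut doc: Document) -> Result<Document, Error> {
        if let Some(key_id) = doc.key_id.clone() {
            let key = self.key_for(user, &key_id)?;
            doc.contents = toy_cipher(key, &doc.contents);
        }
        // Documents without a key_id pass through untouched.
        Ok(doc)
    }

    /// Save path: encrypt when a key_id is present, after checking access.
    pub fn prepare_for_storage(&self, user: &str, doc: Document) -> Result<Document, Error> {
        self.transform(user, doc)
    }

    /// Read path: check access to the key, then decrypt for the caller.
    pub fn read_from_storage(&self, user: &str, stored: Document) -> Result<Document, Error> {
        self.transform(user, stored)
    }
}

/// Placeholder only: repeating-key XOR, which offers no real security.
fn toy_cipher(key: &[u8], data: &[u8]) -> Vec<u8> {
    if key.is_empty() {
        return data.to_vec();
    }
    data.iter().zip(key.iter().cycle()).map(|(d, k)| d ^ k).collect()
}
```

The point of the sketch is the ordering: a user without a grant gets PermissionDenied before the key is ever looked up, on both the read and the write path.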
Internally, ViewEntry instances will also need to be encrypted and decrypted. However, it will have to be explicitly noted that the Key data will not be encrypted at rest. These bytes are used as the tree index within sled, and for sled to function correctly with ranges and iteration, this data cannot be encrypted. Whole-disk encryption would be the only solution to encrypt this data.
Thinking about how this could grow
Thinking about how you can build upon a new feature is one way to test how good an idea is. As such, here are some brainstormed ideas that either tie in to or support this idea:
- General purpose secret storage is something a lot of applications need. If we can provide reliable and secure at-rest encryption, PliantDb can act as a vault of sorts. This would enable access to server-side secrets with the existing permissions model. For Cosmic Verge, this could include OAuth secrets – e.g., allowing logging in with Twitch.
- Users can be granted permission to create their own keys, allowing user-independent data access.
- A setting could be added to allow encrypting all data by default. It would only affect newly written documents, although a task could be written to encrypt existing documents.
- I already want to add S3 compatible remote access for easy single-page-app hosting that loads resources from a CDN rather than off the local server. As part of that, I had imagined even allowing PliantDb to expose its own file-storage API that could use an S3 bucket transparently. In theory, some of this code can be shared for the S3 key storage idea.
Overall, I think this would be a pretty solid addition. Can anyone think of any concerns with this general approach? Ultimately, I’d love to use Custodian as a secure key storage mechanism on the server, but I’m not anticipating having a TPM available.