At-rest encryption

I’m not looking to tackle this right away, but I want to support at-rest encryption for PliantDb. The requirements for me on such a system would be:

  • A hard drive containing a PliantDb database would not contain enough information to decrypt it (assuming the administrator took care to store the keys in another location).
  • Documents should be able to be encrypted with a specific key.
  • Users should be able to be granted access to individual keys.
  • The server necessarily needs access to the decrypted data to enable the View functionality. Thus, it is OK to assume that if the machine is compromised while the server is running, the data is partially accessible. The focus is at-rest encryption.

I want to focus first on a high-level approach – it seems like we’ll want to support multiple algorithms in the long run, so specific algorithms can be picked later.

Key Storage

To ensure a leaked database (or a discarded hard drive) cannot be used on its own to decrypt the stored data, encryption keys should be provided externally. A common solution is HashiCorp Vault; others might have access to something like AWS's Parameter Store. Because of this, it seems clear that a trait should abstract access to stored keys.
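As a rough sketch of what that abstraction might look like (all names here are hypothetical, not PliantDb's actual API), the trait only needs store/retrieve/remove operations, and backends could then wrap Vault, Parameter Store, or an S3-compatible bucket:

```rust
use std::collections::HashMap;

/// Hypothetical trait abstracting external key storage. Names are
/// illustrative only; the real API would be designed later.
pub trait KeyStorage {
    /// Stores an encryption key under an identifier, e.g. "documents-v1".
    fn store_key(&mut self, key_id: &str, key: Vec<u8>) -> Result<(), String>;
    /// Retrieves a previously stored key, if it exists.
    fn key(&self, key_id: &str) -> Result<Option<Vec<u8>>, String>;
    /// Removes a key, e.g. when decommissioning a server.
    fn remove_key(&mut self, key_id: &str) -> Result<(), String>;
}

/// In-memory implementation for illustration; a real backend would
/// talk to Vault, AWS Parameter Store, or an S3-compatible API.
#[derive(Default)]
pub struct MemoryKeyStorage {
    keys: HashMap<String, Vec<u8>>,
}

impl KeyStorage for MemoryKeyStorage {
    fn store_key(&mut self, key_id: &str, key: Vec<u8>) -> Result<(), String> {
        self.keys.insert(key_id.to_string(), key);
        Ok(())
    }

    fn key(&self, key_id: &str) -> Result<Option<Vec<u8>>, String> {
        Ok(self.keys.get(key_id).cloned())
    }

    fn remove_key(&mut self, key_id: &str) -> Result<(), String> {
        self.keys.remove(key_id);
        Ok(())
    }
}
```

With a trait boundary like this, the server never needs to know where keys live, which is the property that keeps a leaked disk useless on its own.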

However, doing this securely with a single-server model is tricky. If I simply installed Vault on the server, the secrets would still be stored on the same disk. Additionally, hardware TPM support is lacking in most general-purpose hosting environments.

Thus, in line with some of my own planning, I would like to create a key storage system that works with any S3-compatible API. This would need to store per-server encrypted copies of each key, as it would be critical to be able to remove old keys for decommissioned drives/servers without breaking any current servers.

The last thought on this is that when PliantDb is operating as a cluster, it should be possible to store part of each key on each server, allowing the cluster to operate as a secure key storage. Each cluster node would need to store copies of the encrypted keys for each other node so that any two nodes could begin decrypting data.
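To make the splitting idea concrete, here is a minimal n-of-n split using XOR shares, where reconstruction requires every share. This is a sketch only: the "any two nodes" threshold described above would actually need a scheme like Shamir's secret sharing, and the xorshift generator below is a placeholder for a cryptographically secure RNG.

```rust
/// Splits `key` into `shares` XOR shares: n-1 random shares plus a final
/// share that is the key XORed with all of them. Every share is required
/// to reconstruct (n-of-n). Illustrative only; a 2-of-n threshold needs
/// Shamir's secret sharing, and shares must come from a CSPRNG.
fn split_key(key: &[u8], shares: usize, mut seed: u64) -> Vec<Vec<u8>> {
    let mut rand_byte = move || {
        // xorshift64: a stand-in for a cryptographic RNG.
        seed ^= seed << 13;
        seed ^= seed >> 7;
        seed ^= seed << 17;
        seed as u8
    };
    let mut result: Vec<Vec<u8>> = (1..shares)
        .map(|_| key.iter().map(|_| rand_byte()).collect())
        .collect();
    // The final share is the key XORed with every random share.
    let last: Vec<u8> = key
        .iter()
        .enumerate()
        .map(|(i, b)| result.iter().fold(*b, |acc, share| acc ^ share[i]))
        .collect();
    result.push(last);
    result
}

/// XORs all shares back together to recover the key.
fn reconstruct_key(shares: &[Vec<u8>]) -> Vec<u8> {
    let len = shares[0].len();
    (0..len)
        .map(|i| shares.iter().fold(0u8, |acc, share| acc ^ share[i]))
        .collect()
}
```

Each node holding one share means no single disk contains the key, which is exactly the at-rest property the cluster mode is after.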

Encrypting Data

My thought is to have a feature flag that enables encrypting documents. Document would gain a key_id field, and upon reading from storage, the server would check that the current user has access to that key before decrypting the contents and returning them to the caller. Upon saving a document, if a key_id is present, the document would be encrypted before being saved (again, after checking permission to access the key).
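The read/write flow could be sketched along these lines. Everything here is hypothetical (the field and function names are not PliantDb's), the permission check is a placeholder for the real permissions model, and the XOR "cipher" stands in for a real AEAD such as XChaCha20-Poly1305:

```rust
/// Sketch of the save/read flow described above. Illustrative only.
struct Document {
    contents: Vec<u8>,
    /// When set, contents are encrypted at rest with this key.
    key_id: Option<String>,
}

/// Placeholder permission check; the real server would consult its
/// permissions model before allowing key use.
fn check_key_access(user: &str, key_id: &str) -> Result<(), String> {
    if user == "admin" {
        Ok(())
    } else {
        Err(format!("{user} may not use key {key_id}"))
    }
}

/// Stand-in cipher: XOR with a repeating key. Symmetric, so the same
/// function both encrypts and decrypts. Never use this for real data.
fn xor_cipher(data: &[u8], key: &[u8]) -> Vec<u8> {
    data.iter().zip(key.iter().cycle()).map(|(d, k)| d ^ k).collect()
}

fn save_document(doc: &mut Document, user: &str, key: &[u8]) -> Result<(), String> {
    if let Some(key_id) = &doc.key_id {
        // Verify key access before encrypting and writing.
        check_key_access(user, key_id)?;
        doc.contents = xor_cipher(&doc.contents, key);
    }
    Ok(())
}

fn read_document(doc: &Document, user: &str, key: &[u8]) -> Result<Vec<u8>, String> {
    match &doc.key_id {
        Some(key_id) => {
            // Decrypt only after the permission check passes.
            check_key_access(user, key_id)?;
            Ok(xor_cipher(&doc.contents, key))
        }
        None => Ok(doc.contents.clone()),
    }
}
```

The important property is that the permission check happens on both paths, so a user without access to the key can neither write encrypted documents nor read decrypted ones.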

Internally, ViewEntry instances will also need to be encrypted and decrypted. However, it will have to be explicitly noted that the Key data will not be encrypted at rest. These bytes are used as the tree index within sled, and for sled to function correctly with ranges and iteration, this data cannot be encrypted. Whole-disk encryption would be the only solution to encrypt this data.

Thinking about how this could grow

Thinking about how you can build upon a new feature is one measurement that helps test how good an idea is. As such, here are some brainstormed ideas that either tie in with or support this one:

  • General purpose secret storage is something a lot of applications need. If we can provide reliable and secure at-rest encryption, PliantDb can act as a vault of sorts. This would enable access to server-side secrets with the existing permissions model. For Cosmic Verge, this could include OAuth secrets – e.g., allowing logging in with Twitch.
  • Users can be granted permission to create their own keys, allowing user-independent data access.
  • A setting could be added to allow encrypting all data by default. It would only affect newly written documents, although a task could be written to encrypt existing documents.
  • I already want to add S3-compatible remote access for easy single-page-app hosting that loads resources from a CDN rather than off the local server. As part of that, I had imagined even allowing PliantDb to expose its own file-storage API that could use an S3 bucket transparently. In theory, some of this code could be shared with the S3 key storage idea.

Overall, I think this would be a pretty solid addition. Can anyone think of any concerns with this general approach? Ultimately, I’d love to use Custodian as a secure key storage mechanism on the server, but I’m not anticipating having a TPM available.

I guess using the OS key store could still be better than nothing.

Currently, AWS (KMS, CloudHSM), Google Cloud (Cloud Key Management), and Azure (Dedicated HSM) actually all offer support, but I don’t know of anything outside of those.

We should still use Custodian because it implements a software fallback, which we should definitely use; it’s protected at rest by the OS user password.

I think that sounds like a great idea, especially the cluster idea is amazing :+1:.

External Key Storage > Cluster Key Storage > Local (at minimum protected by the OS key store).

I want to clear up something about Vault – it actually is set up exactly as I was describing: the vault is sealed on disk, and during startup, Vault must acquire the master key from an external source.

So, if we decided we’d rather integrate with Vault in the short term, that’s a viable approach too.

Phew. AWS’s current price for CloudHSM in US East (Ohio) is $1.45 an hour, or roughly $1k a month. At that price, I consider it not available. Edit: Oh, my! Azure’s is $4.85 an hour!!!

But I was more speaking of direct-attached, per-machine hardware TPMs, like the ones in our phones, such that they’d show up to Custodian as a native TPM. Even without the restrictive pricing, I’m not sure that those linked solutions offer that capability. It looks more like a centralized high-volume HSM.

But, I realized something – is there a way to store and retrieve a key with no user interaction from a TPM? If not, then using a TPM on the server isn’t a great idea.

The only concern I have about that is setting up an unattended boot scenario securely. Database servers need to be able to withstand an unexpected reboot without user intervention to allow the server to come back online. Is there actually a software-only way to allow that flow that doesn’t put all the secrets on the hard drive?

This was a misunderstanding on my part: I didn’t realize you were talking about a product; I thought you meant a credential vault that we would build ourselves.
Using HashiCorp Vault is a good idea for sure.

But the reason I was proposing to at least use the OS key store is to protect the secret key memory; I have no idea whether HashiCorp Vault supports asymmetric encryption without exposing the key memory, for example.

I’m not sure TPMs in the cloud will be a thing; HSMs are all we get for now. TPMs might be available on bare metal, though.
What’s wrong with remote HSMs, though? They might not be ideal for latency-sensitive data retrieval, but they could still be used to store the master key, for example.

TPMs don’t require user interaction on Linux, and neither does Windows; I haven’t tried macOS yet.

I’m not sure how cloud-init works exactly or where it is stored, but secrets could potentially be stored in there.
I can’t think of any other way, honestly, unless we use remote encryption like an HSM, which has the problems we already pointed out above.