When I started working with Kubernetes and Infrastructure as Code, I quickly found out that I needed a secrets management solution, but when I googled around there didn’t seem to be a solid consensus on a best-practice approach that could be universally applied to all situations. So, earlier this year I set a goal for myself: discover what application and infrastructure secrets management solutions exist, decide which one I thought was best, and develop a working mastery of it. While pursuing this goal, I came to the conclusion that HashiCorp Vault is overhyped and Mozilla SOPS with KMS and Git is massively underrated.

I think SOPS is underrated for two main reasons:

  1. The majority of people don’t have even a layman’s-terms understanding of what a Cloud KMS is or does.
  2. Those same people don’t realize that although in the past there was no safe way to store encrypted secrets in Git, cryptography has evolved since then, and safely storing encrypted secrets in Git is now possible thanks to Cloud Provider KMS solutions such as AWS KMS, Azure Key Vault, or GCP KMS.

Most of Vault’s hype is warranted. For decades there were no good secrets management solutions, and then along came Vault from the makers of Terraform: built-in secrets rotation, active maintenance over time, great docs, support, and a community. Vault was the only* solution that met my requirements for what the ideal secrets management solution looks like. I say Vault’s overhyped because I often see it recommended as the gold standard cure-all that should be applied to all secrets management scenarios.

Ideal Secrets Management Solution Requirements

  1. Works universally (any cloud and on-prem)
  2. Integrates nicely with any tech stack via REST API or platform independent CLI binaries. (Bonus if it has smooth integrations with Terraform, Ansible, Kubernetes, and CICD Pipelines)
  3. Future Proof
    1. Open Source/Free (No risk of disappearing service offerings or price hikes)
    2. Large Community or a history of being maintained over time (Don’t want abandonware, unless it could be timeless/feature complete abandonware like Unix Utilities)
    3. Scales well and offers High Availability
  4. Truly secure (I should be able to convince any security head that it’s bulletproof enough to pass a security audit.)
    1. Encryption at Rest
    2. Encryption in Transit
    3. Access should be revocable
    4. Vulnerabilities should be pre-researched and countermeasures should be applied.
  5. Support for Granular ACLs + Dev secret creation self-service options
    1. Devs should be able to manage dev secrets but not prod secrets.
    2. Ops should be able to manage dev and prod secrets.
    3. Project level isolation: Ops of project A, shouldn’t see project B’s prod secrets.
  6. Versioned secrets (Can assist with staging and automating deployments, rollbacks, and supporting technical debt scenarios where secrets and configuration are intertwined in config files and database connection strings.)

*Note: Mozilla SOPS also met my requirements, but I didn’t realize it at the time because I originally thought there was no safe way to do git encrypted secrets.

Security Challenges with storing secrets in a git repo

  • Many tools involve storing the decryption key in the user’s home directory or keyring, which leads to the encrypted data and the key being on the same machine.
  • In that scenario compromised decryption keys are a statistical inevitability (Vulnerabilities multiplied by clones of the repo multiplied by time)
  • It’s impossible to revoke a leaked decryption key. If you’re worried a decryption key could have been compromised, but the probability that it was compromised is low, revoking the key isn’t an option, due to git’s distributed history. Even if you could purge the history of the git server and re-encrypt all the secrets with new encryption keys, there would still exist a historic clone of the repo that could be decrypted with the old key.
  • If a compromise is suspected the only viable countermeasure is to rotate all the credentials, which is an expensive operation that management usually isn’t willing to back on a hunch.
  • Some of the git encryption tools are footgun solutions: Run command to decrypt secret, then forget to encrypt it before pushing it up to the repo.

Whenever I found a secrets management solution I noticed I could group it into one of 4 main categories:

  1. Specific to a single cloud provider (I dismissed these for failing requirements 1 and 3)
  2. Specific to a single tech stack (Ansible, Chef, Puppet, Salt, Jenkins) (I dismissed these for failing requirements 2 and 5)
  3. Encrypted Git Repo (I dismissed these for failing requirements 4 and 5)
  4. Roll your own Secrets Management Service (There were a few potentially viable options, but each introduced its own complexity so it made sense to focus on one. HashiCorp’s Vault was the clear winner given its number of features, documentation, big community, and track record for long term support and development.)

With my analysis complete, I spent a month of spare time working on a Vault Server for storing static secrets to help me gain a working mastery of Vault. I wanted it to be secure, easy to maintain, and easy to use. I did my best to achieve this by enabling TLS; adding Vault Configuration, Roles, Policies, and Kubernetes Infrastructure as Code for a highly available Vault/Consul Cluster to a git repo; using KMS auto unseal; writing good readme documentation; and enabling the versioned key-value store, LDAP authentication, the web GUI, and a third party desktop GUI called Cryptr by Adobe.

While learning Vault I noticed many drawbacks to its usage:

  1. Vault still needs a place to store its secrets. (Where does Vault store its Infrastructure as code secrets? HTTPS cert and IAM password for KMS Auto Unseal)
  2. Vault’s very expensive in more ways than one. (You have to pay for infrastructure and storage. It’s not simple enough that you could set it up from scratch, write a readme, and train a few people on how to use it within an hour. Using Infrastructure as Code and a premade readme in a git repo can help, but even then there’s still a lot to learn. Ops will need to spend time maintaining it with backups, upgrades, and monitoring. Devs need to spend time writing custom wrapper scripts to authenticate and pull the desired data.)
  3. Vault makes life harder for people who need to store secrets, so they’ll avoid using it, which hurts its goal of being a central secrets repo. (Devs need to learn several new commands to interface with Vault or rely on slow Vault GUIs. The majority of preexisting tools are designed to interface with files on a file system, so using a tool like vimdiff now requires the extra steps of logging in, fetching the secret, converting it to a file, and removing the file when done.)
  4. The default implementation has a security vulnerability that’s expensive to secure. (If someone gets root access to a Vault Server, they can get the master decryption key by doing a memory dump. Hosting Vault on Kubernetes or Cloud VMs leads to more opportunities to get root access. In order to fully mitigate the risk of root access, you’d need to provision machines with Intel Software Guard Extensions, and run your Vault Servers on those in SCONE Security Enclaves (containers running in encrypted RAM). Adding these will add more infrastructure and research costs. Twistlock, Aqua, or SysDig are alternative options for partially mitigating this risk.)

Given these drawbacks, I decided to dive deeper and research further. That research led me to Soluto’s Kamus, where I was introduced to 2 cool concepts: GitOps and zero-trust secrets encryption. That got me leaping through a rabbit hole of encryption techniques. At the end of the journey I came up with the following mental schema.

Abridged Evolution of Cryptography

1.) Symmetric Encryption Keys:

  • Long password is used for both encryption and decryption.

2.) Asymmetric Encryption Public-Private Key Pairs:

  • Public key encrypts data, private key decrypts data encrypted with the public key.

3.) HSMs (Hardware Security Modules):

  • Make it so the private key doesn’t get leaked.
  • HSMs are expensive.
  • HSMs are not user or automation friendly.

4.) Cloud KMSs (Key Management Services):

  • KMS is a trusted service that encrypts and decrypts data on behalf of clients. It basically allows a user or machine to encrypt and decrypt data using their identity instead of encryption/decryption keys. (A client authenticates against a KMS, which checks their identity against an ACL. If they have decryption rights, the client can send encrypted data in a request to the KMS, which will then decrypt the data on behalf of the client and send the decrypted data back to the client over a secure TLS tunnel.)
  • KMSs are cheap.
  • KMSs are exposed via REST API, which makes them user and automation friendly.
  • KMSs are extremely secure, they make it feasible to go a decade without leaking decryption keys.
    • The invention of the KMS encryption technique introduced 3 killer pieces of functionality:
      1. When responding to a known breach:
        Before KMS, it was decryption keys that got leaked: You can’t revoke a decryption key, which means you’d need to rotate several decryption keys, re-encrypt all data with the new keys, and try your best to purge old encrypted data. While doing all of this you’d need to fight with management to get approval to cause downtime to several production systems and minimize said downtime. Even if you did everything right, you might be unable to completely purge the old encrypted data, as in the case of git history and backups.
        After KMS, it’s identity credentials that get leaked: Identity credentials can be revoked, and once they’re revoked they’re worthless. The nightmare of re-encrypting data and purging old encrypted data goes away. You still need to rotate secrets (identity credentials rather than decryption keys), but the act of rotation becomes cheap enough that it can be automated and scheduled as a preventative measure.
      2. Management of encrypted data shifts from an impossible task involving distributed decryption keys, to a trivial task of managing a centralized ACL. It now becomes possible to easily revoke, edit, and assign granular access to encrypted data; and as a bonus since Cloud KMS, IAM, and SSO Identity Federations integrate together, you can leverage preexisting user identities.
      3. Crypto Anchoring techniques become possible:
        • Network ACLs can be applied to KMS to make it so data can only be decrypted in your environment.
        • KMS decryption rates can be monitored for a baseline, when an anomalous rate occurs, alerts and rate limiting can be triggered.
    • KMS’s decryption keys can be secured by an HSM.
    • Opportunities for decryption keys to get leaked are near zero because clients don’t interact directly with decryption keys.
    • Cloud Providers can afford to hire the best security professionals and implement expensive operational processes that are required to keep the backend systems as secure as possible, so backend key leakage opportunities are also near zero.
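
The identity-based flow described above can be sketched with the AWS CLI. This is a minimal sketch, not a definitive implementation: the function names kms_encrypt_file and kms_decrypt_file are hypothetical, and it assumes the AWS CLI is installed, credentials are already configured, and the caller’s IAM identity has been granted encrypt and decrypt rights on a symmetric KMS key. Note that the client only ever presents its own credentials and never touches the key itself.

```shell
# Hypothetical helpers; assume the AWS CLI is installed and credentials are configured.

# Encrypt a small file (KMS accepts at most 4 KiB of plaintext per request).
# $1 = KMS key ID or ARN, $2 = plaintext file, $3 = output file for the ciphertext
kms_encrypt_file() {
    aws kms encrypt \
        --key-id "$1" \
        --plaintext "fileb://$2" \
        --query CiphertextBlob \
        --output text | base64 --decode > "$3"
}

# Decrypt a ciphertext file and print the plaintext to stdout.
# For symmetric keys KMS reads the key ID from the ciphertext itself,
# so the caller only has to prove its identity.
# $1 = ciphertext file
kms_decrypt_file() {
    aws kms decrypt \
        --ciphertext-blob "fileb://$1" \
        --query Plaintext \
        --output text | base64 --decode
}
```

If the caller’s decrypt permission is later revoked, the same kms_decrypt_file call is refused with an access-denied error, while the ciphertext file itself never has to change.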

My new understanding of advanced encryption techniques led me to realize that KMS could be leveraged to prevent decryption keys from being leaked. That, plus the ability to revoke decryption rights without needing to make any changes to encrypted files, makes truly secure encrypted files in Git a reality. I revisited a few Git based encryption solutions I’d previously dismissed and discovered that Mozilla SOPS satisfied all of my criteria for an ideal secrets management solution. It also integrates well with CICD pipeline tools: There’s a SOPS Terraform Provider, Helm Secrets is just a wrapper for SOPS, and you can always fall back to:

Bash# sops --decrypt mysecret.yaml | kubectl apply -f -

(where kubectl could have been any CLI Utility that accepts standard input (-))

SOPS has none of the drawbacks of other Git based encryption solutions:

One of the footguns in other Git based encryption solutions was that someone could accidentally push a decrypted secret to the git repo. With SOPS when you want to edit a file, the file stays encrypted on disk, gets decrypted in RAM where you can edit it with vim, and when you save the edited file it gets re-encrypted before being written to disk. At the same time, it does offer the flexibility to quickly decrypt a few files so you can use a tool like vimdiff.
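
Because SOPS still lets you decrypt files to disk on demand, a guard in a pre-commit hook can catch a plaintext file before it gets committed. The following is a minimal sketch under stated assumptions: both function names are hypothetical, and the check relies on SOPS-encrypted YAML carrying a top-level sops: metadata key and values wrapped in ENC[...].

```shell
# Hypothetical helper: succeeds if the file looks like SOPS-encrypted YAML.
looks_sops_encrypted() {
    # SOPS adds a top-level "sops:" metadata block and wraps values in ENC[...].
    grep -q '^sops:' "$1" && grep -q 'ENC\[' "$1"
}

# Hypothetical check: fails if any of the given files is not encrypted.
# In a pre-commit hook the arguments would be the staged secret files.
check_secrets() {
    rc=0
    for file in "$@"; do
        if ! looks_sops_encrypted "$file"; then
            echo "refusing to commit unencrypted secret: $file" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

A hook could call check_secrets with the output of git diff --cached --name-only, filtered to whatever naming convention the repo uses for secret files (that convention is also an assumption here).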

SOPS has none of the drawbacks of Vault:

It doesn’t require infrastructure and it’s as cheap as KMS. You could easily set it up, train a few people, and write a readme file within an hour. Here’s an example of how easy it is to set up and use:

Bash# aws kms create-key --description "Mozilla SOPS" | grep Arn
"Arn": "arn:aws:kms:us-east-1:020522090443:key/4882a19d-5a98-40ae-a1ad-a60423afbddb",
Bash# cd $repo
Bash# vim .sops.yaml

(Create a file named .sops.yaml, with the following 2 lines of text)

creation_rules:
- kms: 'arn:aws:kms:us-east-1:020522090443:key/4882a19d-5a98-40ae-a1ad-a60423afbddb'

Bash# sops mysecret.yaml

This will open the vim editor so you can type what you want to store in the secret. This simple command is used to both create and edit files.

Bash# cat mysecret.yaml

Will show you an encrypted yaml

Bash# sops --decrypt mysecret.yaml

Will show you the decrypted yaml

SOPS will use your AWS credentials stored in ~/.aws to authenticate against KMS so you can encrypt and decrypt without a password. SOPS will also search upward through parent directories for a .sops.yaml file, so it’ll auto discover metadata about how it should encrypt and decrypt things. This has two important ramifications. First, a user doesn’t have to learn a ton of commands or flags. Second, an additional .sops.yaml file can be added to a subfolder representing a production environment or different project, and that .sops.yaml file could have a different encryption/decryption key. You could give different Cloud IAM users different rights to each KMS key to achieve granular access control. If you’re worried about someone deleting your AWS KMS key, you can configure SOPS so the data can be encrypted or decrypted by AWS, GCP, or Azure KMS solutions, so you can keep a secondary backup KMS that few people have access to.
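
The layered setup described above could look something like the sketch below. Every ARN, project name, and path in it is a made-up placeholder, and the REPO_DIR variable is an assumption used only to keep the example self-contained. Depending on the SOPS version, the config search may start from the working directory, so running sops from inside the prod folder is the safe assumption.

```shell
# Sketch only: every ARN, project, and path below is a made-up placeholder.
repo="${REPO_DIR:-$(mktemp -d)}"
mkdir -p "$repo/prod"

# Repo-wide default: a single AWS KMS key that devs and ops can both use.
cat > "$repo/.sops.yaml" <<'EOF'
creation_rules:
  - kms: 'arn:aws:kms:us-east-1:111122223333:key/11111111-2222-3333-4444-555555555555'
EOF

# Production override: a different AWS key that only ops identities can use,
# plus a GCP KMS key as a rarely used backup. SOPS encrypts the file's data
# key with every listed key, so either KMS alone can decrypt.
cat > "$repo/prod/.sops.yaml" <<'EOF'
creation_rules:
  - kms: 'arn:aws:kms:us-east-1:111122223333:key/66666666-7777-8888-9999-000000000000'
    gcp_kms: 'projects/example-project/locations/global/keyRings/sops/cryptoKeys/sops-backup'
EOF
```

Access control then lives entirely in the cloud IAM policies attached to each key: a dev identity with no rights on the production key simply cannot decrypt anything under prod.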

SOPS encourages workflow patterns that make life easier. Devs can store their secrets encrypted right next to, and in sync with, the version of the code that uses them. Secrets management suddenly gains all the benefits of git: auditable change management, peer reviews via pull requests, and meaningful diffs. Diffs of edits to secrets are meaningful because only the edited values change on an edit, rather than the entire file getting re-encrypted, which also makes merge conflicts less likely. Consistency and standardization always make automation and CICD Pipeline development easier, which makes the Ops folks happy. SOPS allows code, configuration, and secrets to be stored in a consistent location, which makes GitOps workflows easier to achieve.
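
For reviewers who have decryption rights, git can also be told to decrypt both sides of a diff before comparing them, using git’s standard textconv mechanism. A minimal sketch: the function name enable_sops_diff is hypothetical, the driver name sopsdiffer is an arbitrary label, and it assumes sops is on the PATH whenever git diff runs.

```shell
# Hypothetical helper: make "git diff" show decrypted values for *.yaml files.
# $1 = path to an existing git repository; assumes sops is on PATH at diff time.
enable_sops_diff() {
    # Route YAML files through a diff driver named "sopsdiffer".
    printf '%s\n' '*.yaml diff=sopsdiffer' >> "$1/.gitattributes"
    # Tell git to decrypt each side with sops before diffing it.
    git -C "$1" config diff.sopsdiffer.textconv "sops --decrypt"
}
```

The git config entry is local to each clone, so anyone without KMS decryption rights still only sees the encrypted diff.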

HashiCorp Vault will have trouble achieving its goal of being a centralized secret repo for your organization because users will find it hard to use, devops will find it troublesome to maintain, and management could find it expensive. SOPS, on the other hand, is pain-free to use, easy to learn, cheap to maintain, and supports workflow patterns that make life easier! These things together mean as long as someone can pitch it to the organization, there will be no barriers to adoption, which means an increased security posture for the entire organization is more likely to occur. This is why SOPS with KMS and Git is massively underrated.

I’d like to clarify that the purpose of this article isn’t to say Vault’s bad and you should use SOPS and KMS instead. I wrote this article for 3 reasons: One, I love to teach. Two, I wanted to point out some shortcomings of Vault. Three, KMS with SOPS is an amazing combo that’s massively underrated: no one seems to know about it, I never encountered a proper explanation of either during my research, and according to Google Trends there are not many searches of SOPS compared to Vault.

I’d like to end this article by saying that I wholeheartedly recommend everyone learn SOPS, KMS, and Vault. Why learn Vault if it’s hard and SOPS with KMS does the same thing with ease? Two reasons really: One, Vault is among the best in class when it comes to PKI and secrets rotation, both of which can be needed to satisfy many government and banking security compliance standards. Two, Vault gets easier to use every year: The community has accepted it as a clear winner and added Vault support into several products: Jenkins, cert-manager, and Kubernetes. Kubernetes, in particular, works nicely with Vault: a lot of the pain points have been abstracted and automated to the point where they work together smoothly. The Vault team also has a proven track record of being committed to making Vault easier to use over time by improving documentation, offering some IaC, and responding to the needs of the community: After the community made auto unseal solutions, backend storage migration solutions, and 3rd party web GUIs, Vault’s Developers decided to bake these functionalities into the open source version. Given this, it wouldn’t surprise me if in the future Vault’s Transit Secrets Engine (Vault’s KMS solution) was made to integrate smoothly with Mozilla SOPS.