Designing highly secure software is by no means a simple task, and it shouldn't be taken lightly. If you treat security as an afterthought, or as something you'll get to later, you may find yourself in the hot seat of a post-mortem trying to explain yourself. It's much better to enter a new project with a security-first mindset.
In this article, we will consider every aspect of the product lifecycle through a security lens:
- How developers access source code
- How code changes make it into the main branch
- How artifacts are built, stored, and retrieved
- How applications are deployed
- How to control inbound and outbound access to and from resources
- How applications get their config values, keys, and secrets
- How users are authenticated
Full disclosure: I’ve tried to make this article as platform-agnostic as possible, but it might be helpful to keep in mind that the bulk of my recent experience has been using GitHub for source control, GitHub Actions for CI/CD, and AWS for cloud infrastructure.
Source Code
Developer Access
At the organization level, it's easy enough to grant or revoke access on a per-developer and per-repository basis with most modern source control providers, such as GitHub, GitLab, and Azure DevOps. It's generally understood that using SSH keys to access source code repositories is far more secure than passing usernames and passwords over HTTPS. And as an added bonus, the developer doesn't have to keep entering their credentials over and over.
Branching Strategy
When you hear the term “branching strategy” you might think of GitFlow, GitHub Flow, trunk-based development, etc. And while that’s certainly an important thing to consider, I want to talk about a branch protection strategy. This allows an organization to put gates between the developer and the code that gets built and deployed.
I believe in keeping the main branch protected from any push or merge actions taken directly by the developer. Nothing should get into the main branch without going through a pull/merge request first. Rules can be configured such that one or more developers (other than the author) must perform a code review and grant approval before completing the merge. This can protect the product from malicious code being added by a disgruntled developer, or just outright bad code being added by an inebriated one.
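If you're on GitHub, these gates can be codified through the branch protection REST API. Here's a minimal sketch in Python; the org/repo names and the check names (build, unit-tests) are placeholders, and it assumes an admin token in the GITHUB_TOKEN environment variable. In practice you might manage this through the repository settings UI or Terraform instead.

```python
# Minimal sketch: enabling branch protection on main via GitHub's REST API.
# "my-org/my-repo" is a placeholder; assumes an admin token in GITHUB_TOKEN.
import os
import requests

resp = requests.put(
    "https://api.github.com/repos/my-org/my-repo/branches/main/protection",
    headers={
        "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
        "Accept": "application/vnd.github+json",
    },
    json={
        # Require at least one approving review (the author can't approve their own PR)
        "required_pull_request_reviews": {"required_approving_review_count": 1},
        # Require the listed status checks to pass on the latest commit
        "required_status_checks": {"strict": True, "contexts": ["build", "unit-tests"]},
        # Apply the rules to admins too; no one pushes to main directly
        "enforce_admins": True,
        "restrictions": None,
    },
)
resp.raise_for_status()
```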
Static Code Analysis
Another thing that branch protection provides is the ability to require automated checks to pass before allowing the merge to go through. A couple of obvious checks might be ensuring that the project builds successfully and that all unit tests pass. But with a security-first mindset, we should also try to prevent hardcoded keys and secrets from making it into the codebase. There are several static code analysis tools that offer this kind of protection, with the added benefit of flagging code smells and even spotting bugs. Check out tools like SonarQube and Coverity.
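Dedicated tools do this far better, but to make the idea concrete, here's a toy Python check of the sort a pipeline could run before a merge. The patterns are illustrative, not exhaustive.

```python
# Toy pre-merge check: fail the build if anything resembling an AWS access
# key ID or a hardcoded credential assignment appears in the given files.
# Real tools (SonarQube, gitleaks, etc.) are far more thorough; this just
# illustrates the idea of an automated gate.
import re
import sys
from pathlib import Path

PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),              # AWS access key ID format
    re.compile(r"(password|secret)\s*=\s*['\"]"),  # naive hardcoded credential
]

def scan(paths):
    findings = []
    for path in paths:
        lines = Path(path).read_text(errors="ignore").splitlines()
        for lineno, line in enumerate(lines, 1):
            if any(p.search(line) for p in PATTERNS):
                findings.append(f"{path}:{lineno}: possible secret")
    return findings

if __name__ == "__main__":
    hits = scan(sys.argv[1:])
    print("\n".join(hits))
    sys.exit(1 if hits else 0)  # non-zero exit fails the required check
```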
tfsec is another great tool that I've relied on for scanning my Terraform code for security vulnerabilities and deviations from best practices.
Deploy Pipeline
IAM Roles
Regardless of which tools you use for your deployment pipeline, you’re going to want to lock down which resources it can access. I’ve seen some projects that just give their deployment pipeline full administrative privileges. And I shouldn’t need to explain why that’s such a horrible idea!
The best way to go is to have dedicated IAM (Identity and Access Management) roles assumed by your deployer(s). The policies attached to these roles need to employ the principle of least privilege. A strategy I've used in multiple projects is to have a separate IAM role for each environment. And if these roles are managed in Terraform, it's smart to put them in a separate Terraform project. It would be a shame if a deployer tried to delete its own role!
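To sketch what this looks like in practice with boto3 (the account IDs and role names are placeholders), a deploy script assumes the role for its target environment and does all of its work under that role's policy:

```python
# Sketch: a deploy script assuming a dedicated, least-privilege role for the
# target environment via STS. Role ARNs are placeholders; the real ones would
# live in your pipeline's configuration, not in code.
import boto3

DEPLOYER_ROLES = {
    "staging": "arn:aws:iam::111111111111:role/staging-deployer",
    "production": "arn:aws:iam::222222222222:role/production-deployer",
}

def deploy_session(environment: str) -> boto3.Session:
    creds = boto3.client("sts").assume_role(
        RoleArn=DEPLOYER_ROLES[environment],
        RoleSessionName=f"deploy-{environment}",
        DurationSeconds=900,  # keep the temporary credentials short-lived
    )["Credentials"]
    return boto3.Session(
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )

# Everything the deploy does now runs under that role's least-privilege policy.
session = deploy_session("staging")
```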
Runtime
Here is where you’ll arguably find the largest attack surface, so it’s important to be extra diligent when it comes to this stuff.
Networking
In the context of the cloud, everything should be deployed inside a Virtual Private Cloud (VPC). This is what keeps your infrastructure’s network logically isolated from the rest of the cloud.
But within the VPC, you may well need a mix of public and private subnets. Essentially, a public subnet has a route to an internet gateway, whereas a private subnet does not.
Some resources may only need to talk to other resources within the VPC (think of a serverless function that reads from a database). Other resources may need to reach out of the VPC to talk to the internet (think of a container that needs to call a third party API). Other resources may need to be accessed from outside the VPC (think of a publicly accessible web server).
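To make the public/private distinction concrete, here's a boto3 sketch (the CIDR blocks are arbitrary examples). The only thing that makes the first subnet "public" is a route table pointing at an internet gateway:

```python
# Sketch: the difference between a public and a private subnet. A public
# subnet's route table has a route to an internet gateway; a private
# subnet's does not. CIDR blocks are arbitrary examples.
import boto3

ec2 = boto3.client("ec2")

vpc_id = ec2.create_vpc(CidrBlock="10.0.0.0/16")["Vpc"]["VpcId"]
public = ec2.create_subnet(VpcId=vpc_id, CidrBlock="10.0.1.0/24")["Subnet"]["SubnetId"]
private = ec2.create_subnet(VpcId=vpc_id, CidrBlock="10.0.2.0/24")["Subnet"]["SubnetId"]

# The internet gateway and its route are what make the first subnet "public".
igw_id = ec2.create_internet_gateway()["InternetGateway"]["InternetGatewayId"]
ec2.attach_internet_gateway(InternetGatewayId=igw_id, VpcId=vpc_id)

rt_id = ec2.create_route_table(VpcId=vpc_id)["RouteTable"]["RouteTableId"]
ec2.create_route(RouteTableId=rt_id, DestinationCidrBlock="0.0.0.0/0", GatewayId=igw_id)
ec2.associate_route_table(RouteTableId=rt_id, SubnetId=public)
# The private subnet keeps the VPC's default route table: local traffic only.
```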
Depending on how big the project is, chances are there are complex combinations of these scenarios across your resources. That’s where security groups come into play. A security group acts as a virtual firewall attached to your resources, with a set of rules governing ingress and egress (inbound and outbound traffic). Each rule consists of a protocol, a port range, and a source or destination, specified as a CIDR block (a range of IP addresses) or another security group. This allows you to fine-tune all network traffic to and from the resources the security group is attached to.
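As an illustration (with a placeholder VPC ID), here's what a single ingress rule looks like in boto3, allowing HTTPS only from inside the VPC:

```python
# Sketch: a security group that only lets HTTPS in from within the VPC.
# The VpcId and CIDR are placeholders.
import boto3

ec2 = boto3.client("ec2")

sg_id = ec2.create_security_group(
    GroupName="internal-api",
    Description="Allow HTTPS from within the VPC only",
    VpcId="vpc-0123456789abcdef0",  # placeholder
)["GroupId"]

ec2.authorize_security_group_ingress(
    GroupId=sg_id,
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 443,  # protocol + port range + IP range = one rule
        "ToPort": 443,
        "IpRanges": [{"CidrIp": "10.0.0.0/16", "Description": "VPC-internal traffic"}],
    }],
)
```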
Resource Permissioning
In addition to network security, you also want to manage resource permissions. Every resource should have “least privilege” IAM policies attached. Each policy should call out which other resources have access and what actions they can take.
For instance, a document database may need two policies. One that allows read and write actions from a serverless function, and another that limits the web server to read-only actions.
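Expressed as IAM policy documents, with DynamoDB standing in for the document database and placeholder table ARN and role names, that might look something like this:

```python
# Sketch: the two policies from the example above as IAM policy documents,
# attached inline to each consumer's role. The table ARN and role names are
# placeholders.
import json
import boto3

TABLE_ARN = "arn:aws:dynamodb:us-east-1:111111111111:table/orders"  # placeholder

read_write_policy = {  # for the serverless function's role
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["dynamodb:GetItem", "dynamodb:Query",
                   "dynamodb:PutItem", "dynamodb:UpdateItem"],
        "Resource": TABLE_ARN,
    }],
}

read_only_policy = {  # for the web server's role
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["dynamodb:GetItem", "dynamodb:Query"],
        "Resource": TABLE_ARN,
    }],
}

iam = boto3.client("iam")
iam.put_role_policy(RoleName="order-function", PolicyName="orders-read-write",
                    PolicyDocument=json.dumps(read_write_policy))
iam.put_role_policy(RoleName="web-server", PolicyName="orders-read-only",
                    PolicyDocument=json.dumps(read_only_policy))
```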
I know firsthand how tempting it is to just leave these things wide open and rely on your networking rules to keep you safe. But it’s really important to take the time to map out exactly which resources need what. You’ll thank me later.
Secrets
We talked earlier about how to use static code analysis to find and prevent hard-coded API keys and secrets from making it into the codebase. So, how should applications access these sensitive pieces of data?
Well, it depends on a number of things. For instance, a container may need to access them differently than a serverless function does. It’s also important to discern which values should or shouldn’t live in the environment. Perhaps certain sensitive data needs to be securely fetched, decrypted on the fly, and then immediately purged from memory after use.
I’ve gotten a lot of use out of AWS Systems Manager’s Parameter Store (SSM params). Param values can be stored as plain text, or as a SecureString for more sensitive values.
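Reading a SecureString back out is nearly a one-liner with boto3; the parameter name here is a placeholder:

```python
# Sketch: reading a SecureString parameter. WithDecryption=True has SSM
# decrypt the value (via KMS) before returning it. The name is a placeholder.
import boto3

ssm = boto3.client("ssm")
db_password = ssm.get_parameter(
    Name="/myapp/prod/db-password",  # placeholder
    WithDecryption=True,
)["Parameter"]["Value"]
```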
For even more control over how sensitive values are encrypted and which keys are used, there’s also AWS Secrets Manager. It allows you to use the default AWS-managed encryption key or bring your own. Using your own is safer; however, it comes with the added responsibility of rotating the keys regularly.
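The read side looks much the same (the secret ID is a placeholder); the choice of encryption key is made when the secret is created, not when it's read:

```python
# Sketch: the equivalent fetch from Secrets Manager. SecretId is a placeholder.
import boto3

secrets = boto3.client("secretsmanager")
api_key = secrets.get_secret_value(
    SecretId="myapp/prod/third-party-api-key"  # placeholder
)["SecretString"]
```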
User Auth
And finally, there’s the subject of user auth. To begin, let me quickly define the terms authentication and authorization, because there’s often confusion surrounding them.
Authentication is basically the user proving who they are, like showing a valid driver’s license. Once the user successfully authenticates, the system can then say, “Ok. I believe you are who you say you are.”
Authorization is the process of the system determining whether or not the user has permission to do what they’re trying to do. The system might say, “Ok. I believe you are who you say you are, but you don’t have permission to do what you’re trying to do.”
Much of the confusion between these two comes from the standard HTTP status codes. A 401 status code is the result of failed authentication; however, the spec defines it as “401 Unauthorized”. And the status code returned as a result of failed authorization happens to be “403 Forbidden”. Go figure.
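A tiny sketch of the mapping, with a hypothetical User type and permission check standing in for whatever your framework provides:

```python
# Sketch: failed authentication -> 401, failed authorization -> 403.
# The User type and permission check are hypothetical stand-ins.
from dataclasses import dataclass, field
from http import HTTPStatus

@dataclass
class User:
    name: str
    permissions: set[str] = field(default_factory=set)

def handle(user: User | None, action: str) -> HTTPStatus:
    if user is None:
        return HTTPStatus.UNAUTHORIZED  # 401: we don't know who you are
    if action not in user.permissions:
        return HTTPStatus.FORBIDDEN     # 403: we know who you are, but no
    return HTTPStatus.OK

assert handle(None, "read") == 401
assert handle(User("ana", {"read"}), "delete") == 403
assert handle(User("ana", {"read"}), "read") == 200
```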
So, how exactly can we secure our APIs? Well, there are several valid options, but the industry standard since 2012 has been OAuth 2.0 (Open Authorization). A token format commonly used with OAuth 2.0 is the JSON Web Token (JWT).
The great thing about JWTs is that they come with a lot of built-in niceties. One is scopes, which come in handy for the authorization process mentioned above. Another is a built-in expiration time (the exp claim). You can then implement a refresh token strategy that makes for an even smoother user experience.
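Here's a sketch of token validation using the PyJWT library; the key, audience, and the space-delimited scope claim convention are assumptions about your particular setup:

```python
# Sketch: validating a JWT with PyJWT. decode() verifies the signature and
# rejects expired tokens (the "exp" claim) automatically; the scope check
# below is the authorization step. Key, audience, and claims are placeholders.
import jwt  # pip install PyJWT

def authorize(token: str, public_key: str, required_scope: str) -> dict:
    claims = jwt.decode(
        token,
        public_key,
        algorithms=["RS256"],  # never let the token pick its own algorithm
        audience="https://api.example.com",
    )  # raises jwt.ExpiredSignatureError / jwt.InvalidTokenError on failure
    if required_scope not in claims.get("scope", "").split():
        raise PermissionError(f"missing scope: {required_scope}")
    return claims
```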
In Conclusion
So, there you have it; a quick look at software engineering through the lens of security. The surface area is so massive that you can’t approach it in a prescriptive way. You really have to tackle it by looking at every step of the product lifecycle with a security-first mindset.