webratz.de

Random notes about random cloud and photo stuff

Deep dive on CDK: Tradeoffs when developing constructs & libraries

2023-09-19

Last week I gave a talk at the AWS Community Day DACH 2023 in Munich. During the presentation I showed the code live on screen. Since there is no recording of the talk, I have added the examples to the slides.

When developing CDK constructs or libraries, there are often several ways to achieve similar outcomes. This talk dives deep into some of these decisions, compares the options, and gives the audience the information they need to pick the best option for themselves.

Examples of these decisions include project setup (projen or not), structuring constructs and their properties, and how to hook into CDK for custom checks.

The slides are below and the example code is on GitHub.

    / [pdf]

Meetup: Infrastructure as Code for a fast growing Engineering Organization

2023-03-01

Personio hosted its first “Platform & Pizzas” Meetup recently, where I presented my talk “Enabling teams in a fast-growing company to manage their own infrastructure - How we work with CDK in Personio”.

In the presentation I explain why we use CDK at Personio, why it's the right choice for us, and what modifications we made to make it useful in our context.

    / [pdf]

Privilege Escalation in Build Pipelines

2019-11-24

I recently gave a talk at O’Reilly Velocity Conference in Berlin. My initial plan was to also write a blog version of it, but I didn’t have enough time. Luckily there is a recording and the slides are available! Here’s the abstract:

CI/CD systems are usually tightly coupled, and the CD part brings with it a lot of administrative privileges combined with network access to production systems. We tend to believe that we only execute trusted software within those systems, but it quickly becomes clear that code from a huge variety of sources, none of it under your control, is loaded and executed there.

In the talk I will walk you through how to identify the most relevant issues along the steps of actual pipelines. You’ll take a deep dive into the confused deputy, a trusted third party that can be tricked into abusing its privileges, which will explain how directly associating code with access permissions on a public cloud provider can help to eliminate the need to trust the components in between.

The slides are on Speaker Deck.

You can watch the recording on YouTube.

    / [pdf]

AWS multi account infrastructure

Back in October 2016 I gave a talk at the Munich AWS User Group about managing AWS multi-account infrastructure at glomex. This post summarizes the talk and initially appeared on the glomex techblog.

You can find the slides on Speaker Deck.

Why we implemented an AWS multi account strategy

There are a lot of good reasons to take on the complexity of such a setup:

  • It’s recommended by AWS for security and billing concerns.
  • Concerning security, you can control access on a much more granular level. For example, you don’t have to design IAM rules so that one team can’t access another team’s resources (say, EC2 resources), because you “physically” separate those resources from each other. Billing-wise, you instantly get more insight into how much a team is spending or how much a certain product costs, without needing to build reports based on billing tags. You can still use those tags within the accounts to drill down further into how much a certain aspect of your infrastructure or product contributes to your AWS spending.
  • Mimic your organization’s hierarchy. Sometimes, different departments have different guidelines or budgets so they want to be “in control” and not share accounts in order to avoid interference.
  • Separation of concerns: Separate your staging environments from each other and, most importantly, from your production environment. This makes it much harder to interconnect services from different stages which results in better isolated systems and possibly more consistency. This also protects your production environment from hitting either AWS account limits or API rate limits. Nothing more embarrassing than an AutoScalingGroup that can’t scale up because a load test done by the QA team eats up all of your EC2 instance allowance, right? Or you can’t deploy because a rogue developer script triggers rate limiting for the AWS API in that account.

This separation also helps with decommissioning services. Sure, you should have all your infrastructure as code, but in some cases it still makes sense to shut down a complete account, including deleting all namespaces, and start fresh. In the end it all comes down to treating AWS accounts as just another volatile/non-static resource that can be “instantiated” as you wish.

In a nutshell, all of those measures also help to minimize the blast radius when things go wrong. Account limits, API limits, rogue scripts, security incidents – all of those can be limited to a smaller section of your whole cloud infrastructure by having a multi-account strategy.

The image above shows the rough glomex setup. Each of our teams gets a set of up to four AWS accounts which they use for their environments. The “Team Ops” accounts have some special IAM roles set up for accessing the other accounts, but are otherwise no different from the development team accounts.

Furthermore, we have one master billing account which also serves as the account that holds our IAM users. This means that none of our other accounts have any users configured (except for some special 3rd party tools which can’t deal with STS). We enforce two-factor authentication, and users must then switch to one of a set of preconfigured roles to access any of the other accounts. This way we can easily manage who gets access to which account. The users are kept directly within AWS with no active connection to our user management backend. Instead we sync users from our LDAP to IAM whenever we make changes to a user, so if our LDAP fails, we can still access AWS.

Finally, we have another special account which contains all of our CloudTrail logs. This account has very restricted access as it contains sensitive data.

For the management of our accounts, we have created an internal tool that handles the creation of VPCs, IAM federation, and the deployment of some basic resources. We are very much looking forward to the GA of AWS Organizations, since there is still a lot of manual work (such as the initial account registration on the AWS website) that we haven’t automated yet.
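To make the role switching a bit more concrete, here is a minimal sketch of what a trust policy on a role in one of the team accounts could look like. It is not our actual configuration: the account ID is a made-up placeholder, the helper name build_trust_policy is hypothetical, and enforcing MFA through the aws:MultiFactorAuthPresent condition key is just one common way to do it.

```python
import json


def build_trust_policy(users_account_id: str, require_mfa: bool = True) -> dict:
    """Build a trust policy for a role that principals from the central
    users account may assume (hypothetical helper for illustration)."""
    statement = {
        "Effect": "Allow",
        # Trust the account that holds the IAM users. Which users may
        # actually assume the role is then controlled by IAM policies
        # in that account.
        "Principal": {"AWS": f"arn:aws:iam::{users_account_id}:root"},
        "Action": "sts:AssumeRole",
    }
    if require_mfa:
        # Only allow the role switch if the caller signed in with MFA.
        statement["Condition"] = {
            "Bool": {"aws:MultiFactorAuthPresent": "true"}
        }
    return {"Version": "2012-10-17", "Statement": [statement]}


# "111111111111" is a placeholder, not a real account ID.
policy = build_trust_policy("111111111111")
print(json.dumps(policy, indent=2))
```

Attaching a policy like this to every role in the team accounts keeps the decision of who may switch into which account in a single place, the account that holds the users.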

Pitfalls we discovered

Along the way we found some pitfalls that sometimes were surprising, but eventually we found ways to work around all of them:

  • Tool support for cross-account access (STS) is not as good as expected. Since the various AWS SDKs usually handle authentication and authorization, we were sometimes surprised that some tools did not understand STS credentials:
    • S3cmd
    • Serverless framework (at least at the beginning of 2016; we have since abandoned it)
    • Kinesis agent (has been fixed)
  • AWS support for cross-account resource access is underwhelming:
    • API gateway is always public
    • VPC security groups cannot be used across accounts
    • S3 bucket permissions between more than 2 accounts are a mess
    • Complex trust relationships between accounts needed
  • Complex networking setup: We decided against peering all accounts, but we still peer with some of them
  • Hard to get a good overview over all accounts with AWS tools:
    • Billing works fairly well with Cost Explorer
    • Metrics are best used with a 3rd party tool, we use DataDog
    • Config Rules are too expensive
  • Costs do multiply
    • Config Rules
    • Support costs
    • VPN connections
  • User support and education is more demanding
    • User federation / STS hard to grasp for sporadic users
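For readers wondering what “understanding STS credentials” means in practice: temporary credentials consist of three parts instead of two, and a tool that ignores the session token cannot use them. The following sketch, which is an illustration and not one of our tools, maps the Credentials part of an AssumeRole response to the standard AWS environment variables. The function name credentials_to_env and all credential values are made up, and this only helps with tools that read AWS_SESSION_TOKEN from the environment.

```python
def credentials_to_env(assume_role_response: dict) -> dict:
    """Map the "Credentials" part of an STS AssumeRole response to the
    standard AWS environment variables (hypothetical helper)."""
    creds = assume_role_response["Credentials"]
    return {
        "AWS_ACCESS_KEY_ID": creds["AccessKeyId"],
        "AWS_SECRET_ACCESS_KEY": creds["SecretAccessKey"],
        # The session token is what distinguishes temporary credentials
        # from long-lived user keys. Tools that drop it will fail.
        "AWS_SESSION_TOKEN": creds["SessionToken"],
    }


# Fake response with placeholder values. In real use it could come from
# boto3's sts.assume_role(RoleArn=..., RoleSessionName=...).
fake_response = {
    "Credentials": {
        "AccessKeyId": "ASIAEXAMPLEKEYID",
        "SecretAccessKey": "example-secret",
        "SessionToken": "example-session-token",
    }
}

env = credentials_to_env(fake_response)
for name in sorted(env):
    print(f"export {name}={env[name]}")
```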

Tools we used

As there is surprisingly little tooling out there to support such setups, we have created some custom tools which we are in the process of open-sourcing:

  • LDAP – IAM sync
  • LDAP SSH key user management for EC2 instances
  • Account / environment detection for services as a safety net
  • Base setup tool
  • Custom deployment tools (CloudFormation, Lambda, CodeDeploy, API Gateway)
  • Account creation automation
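The account / environment detection from the list above can be sketched roughly like this. It is not the code of our tool, just the idea behind it: before a service or script does anything, it compares the account it is actually running in with the account it expects for its environment. The account IDs, the EXPECTED_ACCOUNTS mapping and the function name are made up for this example.

```python
class WrongAccountError(RuntimeError):
    """Raised when code runs in an account it was not meant for."""


# Hypothetical mapping of environment name to AWS account ID.
EXPECTED_ACCOUNTS = {
    "dev": "111111111111",
    "prod": "222222222222",
}


def assert_expected_account(environment, actual_account_id,
                            expected=EXPECTED_ACCOUNTS):
    """Abort early if the current account does not match the environment."""
    expected_id = expected.get(environment)
    if expected_id is None:
        raise WrongAccountError(f"unknown environment: {environment!r}")
    if actual_account_id != expected_id:
        raise WrongAccountError(
            f"environment {environment!r} expects account {expected_id}, "
            f"but the credentials belong to account {actual_account_id}"
        )


# In real use the account ID could come from
# boto3.client("sts").get_caller_identity()["Account"].
assert_expected_account("dev", "111111111111")  # passes silently

try:
    assert_expected_account("prod", "111111111111")
except WrongAccountError as err:
    print(f"refusing to continue: {err}")
```

Failing loudly on a mismatch is the whole point here: a deployment that stops with an error is much cheaper than one that runs against the wrong account.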

To sum it all up, we can say that from our experience the investment in this kind of account structure has already paid off. We had numerous occasions where somebody would have killed a production setup if it had been in the same account (deleting the wrong resources, eating up various AWS limits, etc). With AWS Organizations, it should also become much easier to get rid of the cumbersome part of initial account creation – for us, that would mean that we will have even more accounts in the future.

Slides

    / [pdf]