Random notes about random cloud and photo stuff

Deep dive on CDK: Tradeoffs when developing constructs & libraries


Last week I gave a talk at the AWS Community Day DACH 2023 in Munich. During the presentation I showed the code live on screen, but since there is no recording of the talk I have added the examples shown to the slides.

When developing CDK constructs or libraries, there are often several ways to achieve similar outcomes. This talk dives deep into some of these decisions and compares the options, giving the audience the information to pick the best option for themselves.

Examples of these decisions are project setup (projen or not), structuring constructs and their properties, or how to hook into CDK for custom checks.

The slides are below and the example code is at GitHub.


Privilege Escalation in Build Pipelines


I recently gave a talk at the O’Reilly Velocity Conference in Berlin. My initial plan was to also create a blog version of it, but I didn’t have enough time. Luckily there is a recording, and the slides are available! Here’s the abstract:

CI/CD systems are usually tightly coupled and, for the CD part, inherit a lot of administrative privileges combined with network access to production systems. We tend to believe that we only execute trusted software within those systems, but it quickly becomes clear that code from a huge variety of sources, none of it under your control, is loaded and executed there.

In the talk I walk you through how to identify the most relevant issues along the steps of actual pipelines. You’ll take a deep dive on the confused deputy, a trusted third party that can be tricked into abusing its privileges, and see how directly associating code with access permissions on a public cloud provider can eliminate the need to trust the components in between.

The slides are at Speakerdeck.

You can watch the recording on YouTube.


ECS / Docker NFS Issue hunt down


This post appeared initially on the Scout24 Engineering Blog

Who doesn’t know this situation: Sometime something weird is going on in your setup, but everything is so vague that you don’t know where to start. This is a story about a bug we experienced in a production setup and how we found out what was the root cause.

The problem

Imagine an ECS cluster with multiple tasks, each running with an NFS-backed (EFS in this case) persistent root volume. Launching new tasks works just fine, but as soon as you do a cluster upgrade, entire hosts seem unable to start tasks, or can only start a few and then break. Yet other hosts work just fine. So scaling your cluster up or down became a really scary thing that could no longer be done without manual intervention. That’s an unacceptable situation, but what to do?

Want to know how we handle the NFS mount? My colleague Philipp has a blog post on that.


Only some hosts are affected

During the upgrades we noticed that some hosts didn’t have any issues launching tasks. But what was the common pattern? Number of tasks, availability zone, something else? As many of the affected host instances were in eu-west-1c, the AZ theory was the first to verify. Things that could go wrong with EFS in an AZ are:

  • EFS mount target not there
  • wrong security group attached to the mount target
  • per-subnet routing in the VPC gone wrong
  • routing / subnet config on the instances
  • iptables on the instance misconfigured
  • Docker interface catching NFS traffic
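
The first two items can be probed quickly from an affected instance; here is a minimal sketch (the file system ID is a placeholder, and this only covers reachability — the security group, routing, and iptables details still need the AWS console or CLI):

```shell
#!/bin/sh
# Quick reachability probe for an EFS mount target from an instance.
# fs-12345678 is a placeholder ID; NFS listens on TCP port 2049.
EFSID="${1:-fs-12345678}"
HOST="$EFSID.efs.eu-west-1.amazonaws.com"

# 1. Does the per-AZ DNS name of the mount target resolve at all?
if getent hosts "$HOST" >/dev/null 2>&1; then
  echo "DNS ok: $HOST"
else
  echo "DNS failed: $HOST"
fi

# 2. Can we open a TCP connection to the NFS port? A timeout here hints
# at security groups, routing, or iptables silently dropping traffic.
if nc -z -w 5 "$HOST" 2049 2>/dev/null; then
  echo "port 2049 reachable"
else
  echo "port 2049 unreachable"
fi
```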

I checked all of them and everything was configured correctly. Still, when I logged on to a host and tried to mount one of the EFS volumes manually, it just hung forever.

Could it be something else network related that I did not think of? UDP vs TCP in NFS?

It’s tcpdump time!

tcpdump -v port not 443 -i any -s 65535 -w dump.pcap excludes the traffic from the SSM agents running on the ECS instance and should really catch all traffic on all interfaces. Start tcpdump, run the mount command manually, stop tcpdump, put the pcap file on S3, fetch it and analyse it locally with Wireshark. Digging through the whole dump I was unable to find a single packet to the EFS mount target. Did I do something wrong? But also after the second try there was not a single packet related to NFS.

The only possible conclusion here: the system does not even try to mount, but why?

Stracing things down

ps shows many other hanging NFS mounts triggered by ECS / Docker. But why are they hanging?

It’s strace time! Let’s see what is going on: strace -f mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport,tcp,intr $EFSID.efs.eu-west-1.amazonaws.com:/ /opt/debugmount/$EFSID &

We see that everything hangs in the mount syscall, and that for even longer than 10 minutes, which should already have hit every timeout. (In this case the last timeout should actually have triggered after two minutes (600/10*2).)
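
The two-minute figure comes from the mount options: timeo is given in tenths of a second (see nfs(5)), so timeo=600 means 60 seconds per attempt, and retrans=2 allows two retransmissions. A quick sanity check of that arithmetic:

```shell
# timeo is in deciseconds, retrans is the retry count (see nfs(5)).
timeo_ds=600
retrans=2
per_attempt=$((timeo_ds / 10))         # 60 seconds per attempt
rough_bound=$((per_attempt * retrans)) # the 600/10*2 = 120 s estimate above
echo "per attempt: ${per_attempt}s, rough bound: ${rough_bound}s"
```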

As I noticed before, there were many other hanging NFS mounts, so let’s kill them off and try again. Now the mount runs perfectly fine! Are we hitting an AWS service limit here? Is there some race condition where the mounts block each other? According to the AWS docs there is no such service limit, so let’s stick with the deadlock theory and try to build a test case.

function debugmount {
    mkdir -p /opt/debugmount/$1
    echo start $1
    mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport,tcp,intr $1.efs.eu-west-1.amazonaws.com:/ /opt/debugmount/$1
    echo finish $1
}

# this is a random list of about 15 EFS mounts
for fs in $MYFSLIST; do
    debugmount $fs
done
The test case above works perfectly fine. But maybe Docker mounts the volume in a different way, so the test case might not be accurate? Let’s have a look at what is happening within Docker: https://github.com/docker/docker-ce/blob/master/components/engine/volume/local/local_unix.go#L85 https://github.com/moby/moby/blob/master/pkg/mount/mount.go#L70

All of this looks quite sane. There is some fiddling with the addr option, but we also tested mounting by IP and had the same issue.

Stepping back to our initial problem: The issues only occur on a cluster update where all tasks are re-scheduled to other instances. What if that happens not sequentially but in parallel? Our simple test is just running in a loop, so one mount command runs after another.

mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport,tcp,intr $1.efs.eu-west-1.amazonaws.com:/ /opt/debugmount/$1 &

Now we background the mount command, so we no longer wait for it to complete but immediately start the next one. Here we go: after a few iterations things seem to block and mounting stops working. We now have a test case that we can use to verify whether we fixed the issue.
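
Assembled, the parallel reproducer looks roughly like this. The file system IDs and the DRY_RUN guard (which only prints the commands, so the sketch is safe to run anywhere) are placeholders for illustration; set DRY_RUN=0 on a real instance:

```shell
#!/bin/sh
# Parallel variant of the repro loop: background every mount, then wait.
# MYFSLIST and DRY_RUN=1 (print only) are placeholders for illustration.
MYFSLIST="${MYFSLIST:-fs-aaaa1111 fs-bbbb2222 fs-cccc3333}"
DRY_RUN="${DRY_RUN:-1}"

for fs in $MYFSLIST; do
  cmd="mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport,tcp,intr $fs.efs.eu-west-1.amazonaws.com:/ /opt/debugmount/$fs"
  if [ "$DRY_RUN" = "1" ]; then
    echo "would run: $cmd &"
  else
    mkdir -p "/opt/debugmount/$fs"
    $cmd &   # do not wait for completion, fire the next mount immediately
  fi
done
wait         # on an affected kernel, some of these mounts hang here
```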

And now?

We now know that the issue has to be in the kernel and we have a way to reproduce it, but I have no experience in tracing what is actually going on in the kernel here. So Google has to help out, and a search for nfs mount race condition deadlock rhel returns a bug report.

The issue described in the report looks very similar to what we experienced, and one of the comments mentions that this behaviour was introduced with NFS 4.1.

So let’s change our snippet to use NFS 4.0 and see if everything now works as expected:

mount -t nfs4 -o nfsvers=4.0,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport,tcp,intr $1.efs.eu-west-1.amazonaws.com:/ /opt/debugmount/$1 &

Awesome. Our issue, quite complex to debug, was easy to solve. We changed the mount options in our ECS cluster and everything is running smoothly now.

At the time of writing, the kernel of the current AWS ECS-optimized AMI does not seem to contain that bug fix.
