Sometimes it can be useful to get files out of a docker container. With kubernetes it is even more common to use an init container to provision another container, so being able to introspect an image filesystem can be important.

However, there is one case which can be an issue: you are on a server or machine without docker and need to check an image (yes, at midnight on the laptop of your child ;)). The most immediate solution is to install docker, but if you are on a bastion or deployment server, changing the server state may not be an option. Even in this case you are still in luck: all you depend on is a docker image which is not local anyway - since you don't have docker locally ;).

The tip here is to look at what a docker image actually is. If you never dug into the image format it can sound complicated, but a docker image is really just a manifest pointing to a sequence of archives of archives: each layer is a gzipped tar. In other words, once you have downloaded the archives you are done.
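To see that "sequence of archives" shape, the layer list can be previewed with jq alone. A minimal sketch over an inlined sample manifest (the name and digests are made up, but the `fsLayers`/`blobSum` fields are the ones the registry returns for a schema v1 manifest):

```shell
#! /bin/bash
# A (schema v1) manifest is plain JSON listing the layer digests.
manifest='{
  "name": "library/alpine",
  "tag": "latest",
  "fsLayers": [
    { "blobSum": "sha256:aaaa" },
    { "blobSum": "sha256:bbbb" }
  ]
}'
# Extract one digest per line, exactly as the script below does.
echo "$manifest" | jq -r '.fsLayers[].blobSum'
# prints:
# sha256:aaaa
# sha256:bbbb
```

Each `blobSum` is the address of one gzipped tar to download.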

Here is a bash script using curl, jq, gunzip and tar to recreate a filesystem:

#! /bin/bash

# 1. read from the options the image, version, folder filter and output folder
image="${1:?image required, e.g. library/alpine}"
version="${2:-latest}"
folder_filter="${3:-}"
target_dir="${4:-$PWD/output}"

# 2. configure the work directory
work_dir="$(mktemp -d)"

# 3.
cd "$work_dir"

# 4. get an API token
echo "Getting an API token"
token=$(curl --silent "https://auth.docker.io/token?service=registry.docker.io&scope=repository:$image:pull" | jq -r '.token')

# 5. download manifest to get layers
echo "Retrieving $image:$version layers list"
layers=$(curl --silent --request 'GET' --header "Authorization: Bearer $token" "https://registry-1.docker.io/v2/$image/manifests/$version" | jq -r '.fsLayers[].blobSum')

# 6. download and extract each layer
mkdir -p "layers/gz"
mkdir -p "layers/tar"
for i in $layers; do
  name="${i#sha256:}"
  out="layers/gz/$name"
  echo "Downloading layer $name"
  curl --silent --location --request 'GET' --header "Authorization: Bearer $token" "https://registry-1.docker.io/v2/$image/blobs/$i" > "$out"
  gunzip -c "$out" > "layers/tar/$name"
  rm "$out"
done

# 7. for each layer extract the actual files in the target directory
mkdir -p "$target_dir"
for i in layers/tar/*; do
  if tar -tf "$i" "$folder_filter" >/dev/null 2>&1; then
    echo "Extracting $i"
    tar -xf "$i" -C "$target_dir" "$folder_filter"
  else
    echo "No $folder_filter in $i, skipping"
  fi
done
rm -rf "layers"
echo "Created $target_dir"

This script can look long but it is actually quite simple:

  1. we read from the script options the image to download and a subfolder of the filesystem to include ("" would include the whole container filesystem, linux itself included),
  2. we configure the work directory since we will download files and expand archives,
  3. we go inside the work directory to avoid polluting undesired folders,
  4. we retrieve a token from the docker registry we use (the public one here),
  5. we call the manifest of the docker image we configured to get all its layers,
  6. we download all the layers found in the previous step and extract each of them - they are just plain gzip archives,
  7. each file extracted in the previous step is actually a tar containing the filesystem; we just check it contains the filtered folder and extract it into the output folder.
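Steps 6 and 7 can be tried in isolation, without any registry: build a fake layer locally (a gzipped tar, like a registry blob), gunzip it back, then use the same `tar -tf` filter check before extracting. The folder names here are made up for the demo:

```shell
#! /bin/bash
set -e
demo=$(mktemp -d)
cd "$demo"

# Build a fake layer containing an opt/app folder, gzipped like a blob.
mkdir -p fs/opt/app
echo "hello" > fs/opt/app/file.txt
tar -czf layer.gz -C fs opt

# Step 6: gunzip the blob back into a plain tar.
gunzip -c layer.gz > layer.tar

# Step 7: extract only when the filtered folder is present in the tar.
folder_filter="opt/app"
if tar -tf layer.tar "$folder_filter" >/dev/null 2>&1; then
  mkdir -p out
  tar -xf layer.tar -C out "$folder_filter"
fi
cat out/opt/app/file.txt
# prints: hello
```

The `tar -tf archive member` call exits non-zero when the member is missing, which is exactly what makes the `if` in the script skip unrelated layers.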

After all that processing we end up with the container filesystem in the output folder.

Of course it can be reworked and enhanced, but what matters is to see that from a docker manifest, with just ungzipping and untarring logic, you can access any file of a docker container. Side note: it is trivial to implement the exact same logic in java or any other language ;).

Now, if you run this script multiple times in a kubernetes init container, you can actually provision a container from a set of images used as a plain filesystem. The benefit is to only deliver docker images, not specific zips or repositories you need to combine to deploy your application.
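The merging effect can be simulated locally: extracting several archives into the same target folder combines their trees, which is what chaining the script over multiple images does in the init container. The two "images" and their contents below are invented for the demo:

```shell
#! /bin/bash
set -e
work=$(mktemp -d)
cd "$work"

# Simulate two images, each delivering part of the application tree.
mkdir -p img1/opt/app/bin img2/opt/app/conf
echo "binary" > img1/opt/app/bin/run
echo "config" > img2/opt/app/conf/app.properties
tar -cf image1.tar -C img1 opt
tar -cf image2.tar -C img2 opt

# Provision the shared folder from both archives, as two script runs would.
mkdir -p target
for archive in image1.tar image2.tar; do
  tar -xf "$archive" -C target
done

find target -type f | sort
# prints:
# target/opt/app/bin/run
# target/opt/app/conf/app.properties
```

In kubernetes the `target` folder would typically be an emptyDir volume shared between the init container and the main container.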

My conclusion on this small post will be simple: never accept that a tool you are using is a black box; anything it does must be understood, and you must validate it does what you expect before using it at scale or in production.

Stay curious!
