To monitor potential scalability issues in the various databases I work on, I wanted to test pgReplay, a tool that replays SQL statements from the SQL logs. It is still not working at the moment, but I learned a few things along the way that are worth sharing.
Context
Because I wanted my tests to work consistently on every machine, and given that pgReplay is written in C and depends on a few libraries, I decided to write a Dockerfile. This Dockerfile uses a Debian image as its base layer, so this post is mostly relevant to Debian problems.
Base image and debugging
I chose the Postgres 12 base image as the base layer because it's the same version of Postgres as the databases I'm monitoring. Then I started writing some commands, and some of them failed. That led to my first googling session, where I found out that if you deactivate `DOCKER_BUILDKIT`, you can watch the image being built and then inspect the intermediate container.
Walkthrough
Given the following Dockerfile:
FROM postgres:14.3
RUN "./my-not-existing-script"
If you execute `docker build .`, you get the following:
However, if you turn off BuildKit (`DOCKER_BUILDKIT=0 docker build .`), you get the following:
As you can see, the intermediate container ID is now displayed, and we can open a bash shell in it to debug our build! If we run `docker run -it 9bf1006c9d89 /bin/bash`, we can inspect the container and test our build commands before adding them to the Dockerfile:
Searching packages in APT
Another problem I faced was finding the right packages to install so that the versions matched and nothing was missing. It turns out that you can search directly in APT and not only on Stack Overflow!
To do so, simply use the `apt-cache search <keyword>` command!
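To also check the versions, a small wrapper around `apt-cache` can help. The sketch below is only an illustration: `search_packages` is a hypothetical helper name, and it assumes `apt-get update` has already been run, since Docker base images usually ship without package lists.

```shell
# Hypothetical helper: find packages whose NAME matches a keyword,
# then print the installed and candidate versions of each match.
# Assumes the package lists are present (run `apt-get update` first).
search_packages() {
    keyword="$1"
    # --names-only restricts the search to package names (not descriptions).
    # Each result line looks like "name - short description".
    apt-cache search --names-only "$keyword" | while read -r name rest; do
        apt-cache policy "$name"
    done
}
```

For example, `search_packages libpq` would print the version information of every package with `libpq` in its name.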
Searching files in APT packages
Even with my keyword searches, I couldn't find the right header file to compile the C library properly. Then I discovered another great tool, `apt-file`:
apt-file update
apt-file search <your-header-file/the file you're looking for>
And voilà!
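Note that `apt-file` is usually not installed by default, so it has to be installed before its index can be updated. A minimal sketch of the whole sequence, assuming it runs as root on a Debian-based image with network access (`which_package` is a hypothetical helper name):

```shell
# Hypothetical helper: list the packages that ship a file matching the given name.
# Assumes root privileges and network access on a Debian-based image.
which_package() {
    # Install apt-file and download its file index before searching.
    apt-get update && apt-get install -y apt-file && apt-file update || return 1
    apt-file search "$1"
}
```

For example, `which_package libpq-fe.h` looks for the libpq client header. That file name is only an illustration, not necessarily the header that was missing in my case.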
Running command as another user
The postgres base image is a bit specific: you're not allowed to run the Postgres server as root. The server has to be started by the `postgres` user. It seems to be a security feature.
However, for me, it meant that I needed to start the service from the Dockerfile as another user. It turned out that this is completely possible with the `su` command:
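A minimal sketch of the pattern, wrapped in a hypothetical `run_as` helper. It has to be called as root, which is the default user during a Docker build unless the image says otherwise:

```shell
# Hypothetical helper: run a command as another user via `su -c`.
# Usage: run_as USER COMMAND [ARGS...]
run_as() {
    user="$1"
    shift
    # su passes the string after -c to the target user's shell.
    su -c "$*" "$user"
}
```

In the postgres image, `run_as postgres whoami` should print `postgres`.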
Customizing Dockerfile of Postgres images
In the end, I needed to understand how the default Docker images of Postgres work and how to start the Postgres server in my build container so that I could run the tests contained in pgReplay.
Here are the commands I used:
RUN su -c "/usr/lib/postgresql/12/bin/pg_ctl initdb" postgres
RUN su -c "/usr/lib/postgresql/12/bin/pg_ctl start" postgres
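One caveat: each `RUN` instruction executes in its own temporary container, so a server started by one `RUN` is no longer running when the next `RUN` starts. If the tests need a live server, the start, the tests and the stop have to happen within a single `RUN`. Below is a sketch of a script that could be called from one `RUN`. It assumes the base image sets `PGDATA` and that the binaries live under the path used above; `with_postgres` is a hypothetical helper name.

```shell
# Assumption: PG_BIN matches the Postgres major version of the base image.
PG_BIN="${PG_BIN:-/usr/lib/postgresql/12/bin}"

# Hypothetical helper: initialise and start Postgres as the postgres user,
# run the given command, then stop the server and return the command's status.
with_postgres() {
    su -c "$PG_BIN/pg_ctl initdb" postgres || return 1
    # -w makes pg_ctl wait until the server accepts connections.
    su -c "$PG_BIN/pg_ctl -w start" postgres || return 1
    "$@"
    status=$?
    su -c "$PG_BIN/pg_ctl -w stop" postgres
    return "$status"
}
```

It would be used as `with_postgres <your test command>`, where the test command depends on how the project you are building runs its tests.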
Conclusion
In the end, I'm still having issues with the library itself, but the dockerization went okay. I learned a bunch along the way and hope this gives you additional weapons to tackle similar problems :)