# Why do you need an init process inside your Docker container (PID 1)

When you run your application inside a Docker container, it usually runs as process identifier (PID) 1. This PID is special in the Unix world: it belongs to the very first process that the kernel starts, so it plays a special role in the system.

## Why is PID 1 so special?

* Signals do not trigger their default action for PID 1, which means your process will not implicitly terminate on `SIGINT` or `SIGTERM` unless it installs a handler for them. For any other process, the default action of `SIGINT` and `SIGTERM` is to terminate it.
    
* Any orphaned process is adopted by PID 1.
    

Let's discuss these more in-depth, especially in the Docker world.

### Signals

Do you know what actually happens behind the scenes when you run `docker stop`?

The main process (PID 1) inside the container receives `SIGTERM` and, after a grace period, `SIGKILL`.

By default, Docker waits 10 seconds after `SIGTERM` before killing it with `SIGKILL`.

Have you ever had a container that took a long time to stop? Did it take almost exactly 10 seconds? That means your application doesn't handle signals explicitly!

Here is a simple command that spins up a Node.js container that runs forever:

```bash
docker run \
  -d \
  --rm \
  --name node-app \
  node:alpine \
  node -e "setInterval(() => {}, 1000);"
```

Now stop the container and measure the time:

```bash
time docker stop node-app
```

On my system, it took 10.644 seconds to stop the container.

```text
real    0m10.644s
user    0m0.030s
sys     0m0.062s
```

This means the container couldn't be stopped gracefully; it had to be killed.

This is because your Node.js application runs as PID 1, so the kernel does not apply the default action for `SIGTERM`, which would be to terminate the process.
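One way to catch this situation early is a startup check. The following is a minimal sketch, not part of the original example: `checkPid1Signals` is a hypothetical helper that warns when the process runs as PID 1 and has no `SIGTERM` listener registered.

```javascript
// Hypothetical startup check: warn when running as PID 1 without a SIGTERM listener.
function checkPid1Signals(
  pid = process.pid,
  listeners = process.listenerCount('SIGTERM')
) {
  // As PID 1, SIGTERM without a handler has no effect, so `docker stop`
  // has to wait for the timeout and then send SIGKILL.
  const atRisk = pid === 1 && listeners === 0;
  if (atRisk) {
    console.warn('Running as PID 1 without a SIGTERM handler.');
  }
  return atRisk;
}

// Call this after your own signal handlers (if any) have been registered.
checkPid1Signals();
```

Outside a container the check is normally silent, because the application is not PID 1 there.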

Now let's handle the `SIGTERM` signal in the Node.js application:

```bash
docker run \
  -d \
  --rm \
  --name node-app \
  node:alpine \
  node -e "
      const interval = setInterval(() => {}, 1000); 
      process.on('SIGTERM', () => clearInterval(interval));
  "
```

When `SIGTERM` is received, the interval is cleared, so the process exits because there is nothing left to do.

Stop the container again:

```bash
time docker stop node-app
```

On my system, it took about half a second this time.

```text
real    0m0.613s
user    0m0.000s
sys     0m0.060s
```

This means the Node.js application has been gracefully stopped instead of being killed after the 10-second timeout.
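In a real application you usually want to react to `SIGINT` as well and run the cleanup only once. Here is a minimal sketch under those assumptions; `onShutdown` is a hypothetical helper name, not a Node.js API.

```javascript
// Hypothetical helper: run `cleanup` once when the process is asked to stop.
function onShutdown(cleanup, signals = ['SIGTERM', 'SIGINT']) {
  let called = false;
  const handler = (signal) => {
    if (called) return; // ignore repeated signals
    called = true;
    cleanup(signal);
  };
  for (const signal of signals) {
    // Registering a listener replaces Node.js's default handling of the signal.
    process.on(signal, handler);
  }
  return handler;
}

// Usage, mirroring the example above:
//   const interval = setInterval(() => {}, 1000);
//   onShutdown(() => clearInterval(interval));
```

The cleanup callback should release whatever keeps the event loop alive (timers, servers, connections), so the process can exit on its own well within the grace period.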

### Orphaned processes

When a process dies, all of its children become orphans and are adopted by PID 1. The more interesting question is what happens when such a child finishes its execution or dies for whatever reason.

Let's run a command that showcases this. The following command launches an Ubuntu container and runs `sh -c "sleep 10 & exec sleep 1000"`. The shell starts `sleep 10` in the background and then replaces itself with `sleep 1000` via `exec`. PID 1 is therefore `sh` at first, but `sleep 1000` takes its place.

```bash
docker run -d --rm --name orph ubuntu sh -c "sleep 10 & exec sleep 1000"
```

Right after running the command, print out the processes in the container:

```bash
docker exec orph ps -eaf
```

If you run it within 10 seconds, you should see output similar to this:

```text
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 16:34 ?        00:00:00 sleep 1000
root         7     1  0 16:34 ?        00:00:00 sleep 10
```

As you can see, `sh` doesn't exist because `sleep 1000` replaced it.

Now wait until `sleep 10` finishes, and print out the processes again:

```bash
docker exec orph ps -eaf
```

This time you should see a different output:

```text
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 16:34 ?        00:00:00 sleep 1000
root         7     1  0 16:34 ?        00:00:00 [sleep] <defunct>
```

As you can see, `sleep 10` finished and became **defunct**. Terminated processes are supposed to be cleaned up (reaped) by their parent process. Here, the shell that started `sleep 10` was replaced by `sleep 1000`, and the current PID 1 doesn't reap the process because it doesn't know about it: `sleep` never waits for child processes. Terminated processes that have not been reaped are called **zombie processes**.

In normal circumstances, you should never see zombie processes in your process list. PID 1 should take care of removing zombie processes from the process table.
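Spotting `<defunct>` entries by eye works for a demo, but the check can also be scripted. The following is a minimal sketch that assumes the `ps -eaf` column layout shown above; `findZombies` is a hypothetical helper, not part of any library.

```javascript
// Hypothetical helper: return the PID and parent PID of every row
// that `ps -eaf` marks as <defunct>.
function findZombies(psOutput) {
  return psOutput
    .split('\n')
    .slice(1) // skip the header row
    .filter((line) => line.includes('<defunct>'))
    .map((line) => {
      // Columns: UID PID PPID C STIME TTY TIME CMD
      const [, pid, ppid] = line.trim().split(/\s+/);
      return { pid: Number(pid), ppid: Number(ppid) };
    });
}

const sample = `UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 16:34 ?        00:00:00 sleep 1000
root         7     1  0 16:34 ?        00:00:00 [sleep] <defunct>`;

console.log(findZombies(sample)); // [ { pid: 7, ppid: 1 } ]
```

The `ppid` field tells you which process is failing to reap its children; in the example above that is PID 1.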

## Using init

When you boot a Unix-based operating system, PID 1 is an init process. It takes care of reaping zombie processes throughout your system's uptime.

Since orphaned processes are always adopted by PID 1 (the init process), it can easily take care of those zombie processes.

In a Docker container, an init process should also forward signals to your application.

Let's rewrite the previous command a little bit:

```bash
docker run -d --rm --name orph ubuntu sh -c "sh -c 'sleep 10 & exec sleep 1' & exec sleep 1000"
```

This command will produce the following process list:

```text
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 21:42 ?        00:00:00 sleep 1000
root         7     1  0 21:42 ?        00:00:00 sleep 1
root         8     7  0 21:42 ?        00:00:00 sleep 10
```

`sleep 1000` becomes PID 1, `sleep 1` is a child of `sleep 1000` and `sleep 10` is a child of `sleep 1`.

`sleep 1` finishes after one second, and then you will see the following process list:

```text
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 21:42 ?        00:00:00 sleep 1000
root         7     1  0 21:42 ?        00:00:00 [sleep] <defunct>
root         8     1  0 21:42 ?        00:00:00 sleep 10
```

As you can see, `sleep 1` terminated and became a zombie process, so `sleep 10` was orphaned and adopted by `sleep 1000`.

After 10 seconds, when `sleep 10` finishes, it becomes a zombie process as well:

```text
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 18:59 ?        00:00:00 sleep 1000
root         7     1  0 18:59 ?        00:00:00 [sleep] <defunct>
root         8     1  0 18:59 ?        00:00:00 [sleep] <defunct>
```

This time, use the `--init` flag. It starts [Tini](https://github.com/krallin/tini), a lightweight init system, in the container as PID 1.

```bash
docker run --init -d --rm --name orph ubuntu sh -c "sh -c 'sleep 10 & exec sleep 1' & exec sleep 1000"
```

After about a second, you will see that `sleep 1` has become a zombie process and `sleep 10` has been adopted by PID 1:

```text
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 21:45 ?        00:00:00 /sbin/docker-init -- sh -c sh -c 'sleep 10 & exec sleep 1' & exec sleep 1000
root         7     1  0 21:45 ?        00:00:00 sleep 1000
root         8     7  0 21:45 ?        00:00:00 [sleep] <defunct>
root         9     1  0 21:45 ?        00:00:00 sleep 10
```

As you can see, `sleep 1` is a zombie process and stays as-is until `sleep 1000` finishes. This is because its parent, `sleep 1000`, doesn't take care of cleaning up child processes.

After 10 seconds, `sleep 10` terminates and disappears from the process list:

```text
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 21:45 ?        00:00:00 /sbin/docker-init -- sh -c sh -c 'sleep 10 & exec sleep 1' & exec sleep 1000
root         7     1  0 21:45 ?        00:00:00 sleep 1000
root         8     7  0 21:45 ?        00:00:00 [sleep] <defunct>
```

This shows that `tini` reaps the zombie processes it adopts.

Now, to show that `tini` forwards signals, let's stop the Docker container:

```bash
docker stop orph
```

The container stopped immediately instead of waiting 10 seconds: `tini` forwarded `SIGTERM` to `sleep 1000`, which is not PID 1, so the default action terminated it. This demonstrates signal forwarding as well.
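To make the forwarding idea concrete, here is a minimal sketch of a wrapper that starts the real application as a child process and relays signals to it. It only illustrates the concept and is not how Tini is implemented: `forwardSignals` and `runWithForwarding` are hypothetical names, the exit code handling is simplified, and the wrapper does not reap adopted zombie processes.

```javascript
const { spawn } = require('node:child_process');

// Hypothetical helper: relay the listed signals from this process to `child`.
function forwardSignals(child, signals = ['SIGTERM', 'SIGINT']) {
  for (const signal of signals) {
    process.on(signal, () => child.kill(signal));
  }
}

// Start the real application as a child and relay signals to it.
// The child is not PID 1, so the default action (termination) applies to it.
function runWithForwarding(command, args) {
  const child = spawn(command, args, { stdio: 'inherit' });
  forwardSignals(child);
  // Exit with the child's exit code, or 1 if it was killed by a signal.
  child.on('exit', (code) => process.exit(code ?? 1));
  return child;
}

// Usage: runWithForwarding('sleep', ['1000']);
```

Because the wrapper installs its own handlers, it receives signals even when it runs as PID 1. For anything beyond a demo, `--init` is the simpler choice.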

## Why is this important at all?

Zombie processes hold on to their process ID until they are reaped. Your operating system has a finite number of process IDs, so zombies can fill up the process table, and that's when the chaos begins: no new processes can be started.
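On Linux, the upper bound for process IDs can be read from `/proc/sys/kernel/pid_max`. The following is a minimal sketch that assumes a Linux host or container; `readPidMax` is a hypothetical helper and returns `null` where that file is not available.

```javascript
const fs = require('node:fs');

// Hypothetical helper: read the kernel's PID limit.
// Returns null if the file cannot be read (for example on non-Linux hosts).
function readPidMax(path = '/proc/sys/kernel/pid_max') {
  try {
    return Number.parseInt(fs.readFileSync(path, 'utf8').trim(), 10);
  } catch {
    return null;
  }
}

console.log('pid_max:', readPidMax());
```

Every zombie counts against this limit until it is reaped.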

Zombie processes aren't an issue if your application never starts new processes: without child processes, it's impossible to have zombies. I still recommend using `tini`, though, because it's available in Docker by default and easy to switch on, both for individual Docker commands (`--init`) and for Docker Compose services (`init: true`).

## Example Node.js project

A repository is available at [GitHub/david-szabo97/node-docker-init-or-not-to-init](https://github.com/david-szabo97/node-docker-init-or-not-to-init), which showcases zombie processes in a single command.
