Entering a running container • Architecture, networking, and storage

What docker exec really does#

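docker exec does not "enter" an existing process. Instead:

Within the target container's set of namespaces,
it starts a brand-new process
that inherits the container's isolation boundaries (PID, mount, network, and so on).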

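The new process is not a child of the container's main process (PID 1), nor of any other process in the container. Its parent is on the host side, in the shim/runtime process tree, outside the container's PID namespace. From inside the container its parent PID therefore shows up as 0.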
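The parentage can be checked from /proc. The following is a minimal sketch, assuming a Linux host with /proc mounted; `ppid_of` is a made-up helper name, and `<container>` is a placeholder as elsewhere on this page.

```shell
# Print the parent PID of a process as seen from the current PID namespace.
# ppid_of is a hypothetical helper name, not part of docker or util-linux.
ppid_of() {
    awk '/^PPid:/ { print $2 }' "/proc/$1/status"
}

# On the host: show the parent of the current shell.
ppid_of "$$"

# Inside a container (commented out because it needs a running container):
#   docker exec <container> grep PPid /proc/self/status
# A parent that lives outside the container's PID namespace is reported as 0.
```

Running the commented command against a live container would be expected to print a PPid of 0 for the exec'd grep, while processes started by the container's own PID 1 show a non-zero parent.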

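How it works under the hood#

From the CLI down to the kernel, the flow is roughly:

docker exec sends the exec request to the docker daemon
the daemon locates the container's shim through containerd
the shim calls runc (or another OCI runtime) to create the new process
for the new process, the runtime:
- calls setns(2) to join the container's mnt, pid, net, uts, ipc, user, cgroup, and other namespaces
- applies cgroups (adds the new process to the container's cgroup)
- applies capabilities and seccomp
- switches the working directory and user
- calls execve(2) to run the command the user specified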
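The effect of these steps is visible from /proc once the process is running. As a rough sketch (Linux only; `proc_confinement` is a hypothetical helper name), the function below prints the capability sets, seccomp mode, and cgroup membership of a given PID.

```shell
# Summarize the confinement of a process from /proc (Linux only).
# proc_confinement is a hypothetical helper name used for illustration.
proc_confinement() {
    pid=$1
    # Capability sets, seccomp mode and no_new_privs flag, where the kernel exposes them.
    grep -E '^(CapEff|CapBnd|Seccomp|NoNewPrivs):' "/proc/$pid/status"
    # cgroup membership, one line per hierarchy (a single "0::" line on cgroup v2).
    sed 's/^/cgroup: /' "/proc/$pid/cgroup"
}

# Show the values for the current shell.
proc_confinement "$$"

# Inside a container, the same fields can be compared between PID 1 and an exec'd shell:
#   docker exec <container> sh -c 'grep -E "^(Cap|Seccomp)" /proc/1/status /proc/$$/status'
```

With a default docker exec (no extra privilege flags), the exec'd shell would be expected to report the same capability and seccomp values as PID 1, since the runtime applies the container's settings before execve(2).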

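The key call is setns(2). For most namespace types it moves the calling thread itself into the target namespace immediately. The PID namespace is the notable exception: the caller's own PID namespace never changes, and only the children it forks afterwards are placed in the target namespace and get PIDs there.

So after the runtime joins the PID namespace with setns(2), it has to fork and exec once more so that the process that actually runs the user's command appears inside the container.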
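The kernel exposes this split through two links under /proc/<pid>/ns: `pid` is the PID namespace the process itself lives in, and `pid_for_children` (Linux 4.12 and newer) is the one its future children will be created in. A small sketch, with hypothetical helper names, for reading both:

```shell
# Print the PID namespace a process lives in, and the one its future children will get.
# Both helper names are hypothetical. pid_for_children needs Linux 4.12 or newer.
pidns_of() {
    readlink "/proc/$1/ns/pid"
}
pidns_for_children_of() {
    readlink "/proc/$1/ns/pid_for_children"
}

# For an ordinary process the two links are identical.
echo "self:     $(pidns_of "$$")"
echo "children: $(pidns_for_children_of "$$")"
```

For a process that has called setns(2) on a PID namespace but has not yet forked, the first link stays the same and only the second one points at the target namespace, which is why the extra fork is needed.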

A setns(2) example (simplified)#

The following simplified C program shows how to use setns(2) to join the namespaces of a given container process:

#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

/* Join the namespace referred to by path; nstype 0 accepts any namespace type. */
int enter(const char *path, int nstype)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror(path); return -1; }
    if (setns(fd, nstype) < 0) { perror("setns"); close(fd); return -1; }
    close(fd);
    return 0;
}

int main(int argc, char **argv)
{
    if (argc < 2) { fprintf(stderr, "usage: %s <pid>\n", argv[0]); exit(1); }
    char p[256];
    pid_t target = atoi(argv[1]);

    /* Order matters: mnt goes last so that /proc/<pid>/ns/* still resolves
       on the host. A user namespace, if joined, usually goes first
       (it is not joined in this simplified example). */
    const char *names[] = {"uts", "ipc", "net", "pid", "mnt"};

    for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
        snprintf(p, sizeof(p), "/proc/%d/ns/%s", (int)target, names[i]);
        if (enter(p, 0) < 0) exit(1);   /* stop instead of running half-entered */
    }

    /* After joining the PID namespace, fork so that the child gets a PID inside it */
    pid_t pid = fork();
    if (pid < 0) { perror("fork"); exit(1); }
    if (pid == 0) {
        char *args[] = {"/bin/sh", NULL};
        execv(args[0], args);
        perror("execv");
        _exit(1);
    }
    waitpid(pid, NULL, 0);
    return 0;
}

Simulating docker exec by hand with nsenter#

You can verify the same thing with nsenter, without writing any C:

# Get the host PID of the container's main process
HPID=$(docker inspect -f '{{.State.Pid}}' <container>)

# Enter all of the container's namespaces and run sh
sudo nsenter -t "$HPID" -a /bin/sh

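At this point the shell sees the same hostname, /proc, and network interfaces as processes inside the container, which is close to docker exec -it <c> sh. It is not identical: nsenter only switches namespaces, so the shell is not added to the container's cgroup and does not receive the container's capability set or seccomp profile.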
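nsenter can also join only some of the namespaces, which is handy for running host tools against a container whose image lacks them. The sketch below builds such a command line; `nsenter_argv` is a hypothetical helper, while the single-letter flags are standard util-linux nsenter options.

```shell
# Build an nsenter command line that joins only the named namespaces.
# nsenter_argv is a hypothetical helper; the flags are util-linux nsenter options.
nsenter_argv() {
    pid=$1; shift
    cmd="nsenter -t $pid"
    for ns in "$@"; do
        case "$ns" in
            mnt)  cmd="$cmd -m" ;;
            uts)  cmd="$cmd -u" ;;
            ipc)  cmd="$cmd -i" ;;
            net)  cmd="$cmd -n" ;;
            pid)  cmd="$cmd -p" ;;
            user) cmd="$cmd -U" ;;
            *)    echo "unknown namespace: $ns" >&2; return 1 ;;
        esac
    done
    echo "$cmd"
}

# Prints: nsenter -t 12345 -n -u
nsenter_argv 12345 net uts

# Example use (needs root and a running container):
#   sudo $(nsenter_argv "$HPID" net) ip addr
```

Joining only the net namespace keeps the host's filesystem, so the host's own ip binary runs while seeing the container's interfaces.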

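When to use which#

docker CLI available and the container running normally: use docker exec directly
daemon misbehaving but the container process still alive: nsenter can still get in
no daemon at all, only an OCI bundle: use runc exec or your own setns program

When you are debugging why something looks different inside a container than on the host, compare the inode numbers of /proc/self/ns/* on the host and in the container. That quickly tells you which layer you are standing in.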
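The comparison can be scripted. This is a minimal sketch, assuming a Linux /proc; `ns_compare` is a hypothetical helper name, and reading the links of another user's process generally requires root.

```shell
# Compare the namespaces of two processes by their /proc/<pid>/ns/* link targets.
# ns_compare is a hypothetical helper; each link reads like "net:[4026531840]".
ns_compare() {
    for ns in cgroup ipc mnt net pid user uts; do
        # Skip namespace types the kernel does not expose or we may not read.
        a=$(readlink "/proc/$1/ns/$ns") || continue
        b=$(readlink "/proc/$2/ns/$ns") || continue
        if [ "$a" = "$b" ]; then
            echo "$ns same $a"
        else
            echo "$ns different $a $b"
        fi
    done
}

# A process compared with itself shares every namespace.
ns_compare "$$" "$$"

# In a root shell on the host, compare this shell with a container's main process:
#   ns_compare "$$" "$HPID"
```

Every line marked "different" names a namespace in which the two processes are isolated from each other. A container started with host networking, for example, would be expected to show "net same".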
