I'm brooding over the interactions between namespaces and
capabilities. Occasionally, I stumble over wording like the following from user_namespaces(7):
Holding CAP_SYS_ADMIN within the user namespace that owns a
process's mount namespace allows that process to ...
I do understand the following:
Every non-user namespace N is owned by a user namespace U, which is determined by the process creating N being in U at that time.
Capabilities are a property of a process (more accurately: thread) or a file. For this discussion, I think it's good enough to think about processes for now.
For every type of namespace, every process is in exactly one namespace of that type.
Only PID namespaces and user namespaces form hierarchies. My understanding of the wording in the docs is that even if a process P is in a namespace A, which in turn is a child of namespace B, one would still not say that P is in B, because P is in A and it is in only one namespace of that type. In other words: the parent relationship between namespaces must not be confused with set inclusion.
Now, the wording
Holding a capability within the user namespace U that owns a
process's mount namespace M allows that process P to ...
tells me to go from a process P to its mount namespace M (/proc/P/ns/mnt), figure out its owning user namespace U (ioctl_ns(2)), and then verify whether the process holds a capability in U.
It's the last part I don't get: P is not necessarily in U, so how can it hold a capability there? Is there a Process × User namespace ↦ Capabilities mapping? Also, U is associated with a UID, but capabilities are not a property of user IDs.
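The first two steps of that lookup (process → mount namespace → owning user namespace) can be sketched in Python. This is only a rough sketch: the constant `NS_GET_USERNS` is assumed to be `_IO(0xb7, 0x1)` as encoded on x86 and ARM (other architectures encode `_IO` differently), and the helper name `owning_userns_inode` is made up for illustration.

```python
import fcntl
import os

# Assumption: NS_GET_USERNS is _IO(0xb7, 0x1) from <linux/nsfs.h>,
# encoded as on x86/ARM. See ioctl_ns(2).
NS_GET_USERNS = 0xB701


def owning_userns_inode(pid="self", ns_type="mnt"):
    """Return the inode number of the user namespace that owns
    /proc/<pid>/ns/<ns_type>, or None if it cannot be determined
    (no such file, not Linux, or the ioctl is refused)."""
    try:
        ns_fd = os.open(f"/proc/{pid}/ns/{ns_type}", os.O_RDONLY)
    except OSError:
        return None
    try:
        # On success the ioctl returns a new file descriptor that
        # refers to the owning user namespace.
        userns_fd = fcntl.ioctl(ns_fd, NS_GET_USERNS)
    except OSError:
        return None
    finally:
        os.close(ns_fd)
    try:
        # The inode number is the ID shown in e.g. "user:[4026531837]".
        return os.fstat(userns_fd).st_ino
    finally:
        os.close(userns_fd)


if __name__ == "__main__":
    print("user namespace owning my mount namespace:", owning_userns_inode())
```

The returned inode number can be compared with the number in the output of `readlink /proc/P/ns/user` to see whether P is itself a member of U.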
1 Answer
Actually, the answer was right under my nose, in user_namespaces(7); I just seem to have scrolled past the relevant section, which I'll quote below:
1. A process has a capability inside a user namespace if it is a member
of that namespace and it has the capability in its effective capa‐
bility set. A process can gain capabilities in its effective capa‐
bility set in various ways. For example, it may execute a set-user-
ID program or an executable with associated file capabilities. In
addition, a process may gain capabilities via the effect of
clone(2), unshare(2), or setns(2), as already described.
2. If a process has a capability in a user namespace, then it has that
capability in all child (and further removed descendant) namespaces
as well.
3. When a user namespace is created, the kernel records the effective
user ID of the creating process as being the "owner" of the name‐
space. A process that resides in the parent of the user namespace
and whose effective user ID matches the owner of the namespace has
all capabilities in the namespace. By virtue of the previous rule,
this means that the process has all capabilities in all further re‐
moved descendant user namespaces as well. The NS_GET_OWNER_UID
ioctl(2) operation can be used to discover the user ID of the owner
of the namespace; see ioctl_ns(2).
So there actually is a ternary relation of Process × Namespace × Capability. My understanding is as follows:
1. In the user namespace it is a member of, a process has exactly the capabilities that are in its effective capability set. No surprise here.
2. A capability held in a user namespace is also held in all descendant user namespaces. Also no surprise.
3. If a process P is a member of user namespace U, and U has a child user namespace U', and the eUID of P is the owner UID of U', then P has all capabilities in U'.
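To make the ternary relation concrete, here is a toy model of the three rules as a single predicate. It is a sketch of my reading of the man page text, not kernel code: the classes `UserNS` and `Process` and the function `has_capability` are invented for illustration, and UIDs are assumed to be expressed as seen from the initial user namespace.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UserNS:
    name: str
    owner_uid: int                      # eUID of the creator, recorded at creation
    parent: Optional["UserNS"] = None   # None for the initial user namespace


@dataclass(frozen=True)
class Process:
    userns: UserNS                      # the one user namespace the process is in
    euid: int
    effective: frozenset = frozenset()  # effective capability set


def has_capability(proc, target, cap):
    """Does proc hold cap with respect to user namespace target?"""
    ns = target
    while ns is not None:
        if ns is proc.userns:
            # Rule 1 (and rule 2 if we walked up from a descendant):
            # the effective set decides.
            return cap in proc.effective
        if ns.parent is proc.userns and ns.owner_uid == proc.euid:
            # Rule 3: proc lives in the parent and owns this namespace,
            # so it has all capabilities here and (rule 2) below.
            return True
        ns = ns.parent
    # target is not proc's namespace or a descendant of it.
    return False


if __name__ == "__main__":
    init = UserNS("init", owner_uid=0)
    u1 = UserNS("U'", owner_uid=1000, parent=init)
    sleeper = Process(userns=init, euid=1000)  # empty effective set
    print(has_capability(sleeper, init, "CAP_KILL"))
    print(has_capability(sleeper, u1, "CAP_KILL"))
```

In this model a process with an empty effective set still holds every capability with respect to a child namespace it owns, which is why the answer depends on the target namespace and not only on the process.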
Unfortunately, I'm not sure whether I understood 3 correctly, because I fail to observe it with the following experiment:
$ id -u
1000
$ echo $$
4083
$ readlink /proc/4083/ns/user
user:[4026531837]
$ sleep 10001 &
[1] 4101
$ readlink /proc/4101/ns/user
user:[4026531837]
$ ps -p 4101 -o pid,euid,comm
PID EUID COMMAND
4101 1000 sleep
Now sleep resides in user namespace 4026531837 and has eUID 1000.
$ unshare -r
# echo $$
4111
# readlink /proc/4111/ns/user
user:[4026532574]
This user namespace with id 4026532574 has parent user namespace 4026531837 and UID 1000, seen from the outside (see below). So it should fulfil the criteria mentioned above. But still, I do not see extended capabilities for the sleep process:
# grep Cap /proc/4101/status
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000003fffffffff
CapAmb: 0000000000000000
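The hexadecimal values in that output are bit masks, one bit per capability number. A small helper can make them easier to read; this is only a sketch, the function names are made up, and the name table deliberately lists just a few capability numbers from `<linux/capability.h>`.

```python
# A few capability numbers from <linux/capability.h>; not a complete table.
CAP_NAMES = {0: "CAP_CHOWN", 5: "CAP_KILL", 21: "CAP_SYS_ADMIN"}


def decode_cap_mask(hex_mask):
    """Return the sorted list of capability numbers set in a hex mask."""
    value = int(hex_mask, 16)
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


def parse_cap_lines(status_text):
    """Map 'CapEff' etc. to the capability numbers set in each mask,
    given the text of /proc/<pid>/status (or the grep output above)."""
    result = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.startswith("Cap"):
            result[key.strip()] = decode_cap_mask(value.strip())
    return result


if __name__ == "__main__":
    sample = "CapEff: 0000000000000000\nCapBnd: 0000003fffffffff\n"
    for name, bits in parse_cap_lines(sample).items():
        known = [CAP_NAMES[b] for b in bits if b in CAP_NAMES]
        print(name, len(bits), "bits set; known names:", known)
```

For the sleep process this confirms that the effective, permitted, inheritable and ambient sets are empty, while the bounding set has capabilities 0 through 37 set.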
Maybe I would have to mount a new /proc, but I do not know how to do this without hiding the sleep process...
side note
From various scraps of code and man pages, I've put together the rather ad-hoc nsrel to investigate namespace hierarchies. For the example above, when run in the initial namespace, it yields
$ nsrel 4111 user
ID TYPE PARENT USERNS UID
4026532574 User 4026531837 4026531837 1000
4026531837 User <oos> <oos> 0
which shows that process 4111 is in user namespace 4026532574, which has parent namespace 4026531837 and belongs to UID 1000.
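For readers without such a tool, the parent and the owner UID of a process's user namespace can be queried with the ioctl operations from ioctl_ns(2). The following is a rough sketch and not the nsrel tool itself: the constants are assumed to be `_IO(0xb7, 0x2)` and `_IO(0xb7, 0x4)` as encoded on x86 and ARM, and `userns_info` is a made-up helper name.

```python
import fcntl
import os
import struct

# Assumptions: values of _IO(0xb7, 0x2) and _IO(0xb7, 0x4) from
# <linux/nsfs.h>, encoded as on x86/ARM. See ioctl_ns(2).
NS_GET_PARENT = 0xB702
NS_GET_OWNER_UID = 0xB704


def userns_info(pid="self"):
    """Return inode, parent inode and owner UID of the user namespace of
    pid, or None if /proc/<pid>/ns/user cannot be opened. Fields that
    cannot be queried are left as None."""
    try:
        fd = os.open(f"/proc/{pid}/ns/user", os.O_RDONLY)
    except OSError:
        return None
    try:
        info = {"inode": os.fstat(fd).st_ino, "parent": None, "owner_uid": None}
        try:
            buf = bytearray(4)  # the kernel writes a 32-bit uid_t here
            fcntl.ioctl(fd, NS_GET_OWNER_UID, buf)
            info["owner_uid"] = struct.unpack("I", buf)[0]
        except OSError:
            pass
        try:
            parent_fd = fcntl.ioctl(fd, NS_GET_PARENT)
        except OSError:
            # Fails e.g. with EPERM when the parent is out of scope,
            # as for the initial user namespace.
            pass
        else:
            try:
                info["parent"] = os.fstat(parent_fd).st_ino
            finally:
                os.close(parent_fd)
        return info
    finally:
        os.close(fd)


if __name__ == "__main__":
    print(userns_info())
```

The owner UID is reported as seen from the caller's user namespace, so the same namespace can show different values depending on where the query runs.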
-
I think you understood it all right; it's just that /proc/<P>/status always shows only the capabilities granted to P in the user namespace it is a member of. There is no automatic adjustment of what is shown if you query from a different namespace. Nevertheless, your process P does hold all the capabilities when operating on U'. You can see it if you turn that sleep into a kill of a process belonging to U'. In fact, such a kill would kill even a process belonging to a hierarchically nested U'' owned by euid 1001 of U, regardless of what /proc/<P>/status states about capabilities.
– LL3, commented Aug 7, 2020 at 12:33