In the recent PR on adding ProcMount, we introduced a regression when
pods are privileged. This shows up in 18.06 docker with kubeadm in the
kube-proxy container.
The kube-proxy container is privilged, but we end up setting the
`/proc/sys` to Read-Only which causes failures when running kube-proxy
as a pod. This shows up as a failure when using sysctl to set various
network things.
Change-Id: Ic61c4c9c961843a4e064e783fab0b54350762a8d
The --docker-disable-shared-pid flag has been deprecated since 1.10 and
has been superceded by ShareProcessNamespace in the pod API, which is
scheduled for beta in 1.12.
With CRI, kubelet no longer sets up networking for the pods. The
dockershim package is the rightful owner and the only user of the
newtork package. This change moves the package into dockershim to make
the distinction obvious, and untangles the codebase.
The`network/dns`is kept in the original package since it is only used by
kubelet.
This also incorporates the version string into the package name so
that incompatibile versions will fail to connect.
Arbitrary choices:
- The proto3 package name is runtime.v1alpha2. The proto compiler
normally translates this to a go package of "runtime_v1alpha2", but
I renamed it to "v1alpha2" for consistency with existing packages.
- kubelet/apis/cri is used as "internalapi". I left it alone and put the
public "runtimeapi" in kubelet/apis/cri/runtime.
Having the field set in modifyCommonNamespaceOptions is misleading,
since for the actual container it is later unconditionally overwritten
to point to the sandbox container.
So let's move its setting to modifyHostOptionsForSandbox (renamed from
modifyHostNetworkOptionForSandbox as it's not about network only), since
that reflects what actually happens in practice.
This commit is purely a refactor, it doesn't change any behavior.
Automatic merge from submit-queue (batch tested with PRs 49651, 49707, 49662, 47019, 49747)
Add support for `no_new_privs` via AllowPrivilegeEscalation
**What this PR does / why we need it**:
Implements kubernetes/community#639
Fixes#38417
Adds `AllowPrivilegeEscalation` and `DefaultAllowPrivilegeEscalation` to `PodSecurityPolicy`.
Adds `AllowPrivilegeEscalation` to container `SecurityContext`.
Adds the proposed behavior to `kuberuntime`, `dockershim`, and `rkt`. Adds a bunch of unit tests to ensure the desired default behavior and that when `DefaultAllowPrivilegeEscalation` is explicitly set.
Tests pass locally with docker and rkt runtimes. There are also a few integration tests with a `setuid` binary for sanity.
**Release note**:
```release-note
Adds AllowPrivilegeEscalation to control whether a process can gain more privileges than it's parent process
```
This change moves the code specific to docker to kubelet/dockertools,
while leaving the common utility functions at its current package
(pkg/securitycontext).
When we deprecate dockertools in the future, the code will be moved to
pkg/kubelet/dockershim instead.