Automatic merge from submit-queue
swap FIRSTSEEN/LASTSEEN columns in `kubectl get event -w`
```release-note
Show LASTSEEN, the sorting key, as the first column in `kubectl get event` output
```
Not having LASTSEEN as the first column can confuse users into thinking
that events are not delivered in order.
Fixes#27060
Automatic merge from submit-queue
Kubelet Volume Manager Wait For Attach Detach Controller and Backoff on Error
* Closes https://github.com/kubernetes/kubernetes/issues/27483
* Modified Attach/Detach controller to report `Node.Status.AttachedVolumes` on successful attach (unique volume name along with device path).
* Modified Kubelet Volume Manager wait for Attach/Detach controller to report success before proceeding with attach.
* Closes https://github.com/kubernetes/kubernetes/issues/27492
* Implemented an exponential backoff mechanism for for volume manager and attach/detach controller to prevent operations (attach/detach/mount/unmount/wait for controller attach/etc) from executing back to back unchecked.
* Closes https://github.com/kubernetes/kubernetes/issues/26679
* Modified volume `Attacher.WaitForAttach()` methods to uses the device path reported by the Attach/Detach controller in `Node.Status.AttachedVolumes` instead of calling out to cloud providers.
Modify attach/detach controller to keep track of volumes to report
attached in Node VolumeToAttach status.
Modify kubelet volume manager to wait for volume to show up in Node
VolumeToAttach status.
Implement exponential backoff for errors in volume manager and attach
detach controller
Automatic merge from submit-queue
rkt: Map kubelet's `--stage1-image` flag to rkt's `--stage1-name` flag.
This enables rkt to use cached stage1 image instead of unpacking the stage1 image every time for every pod.
After this change, users need to preload the stage1 images in order to enable rkt to find the stage1 image with the name specified by this flag.
Also, the cloud config is modified to pre-load the stage1 images.
cc @kubernetes/sig-rktnetes @kubernetes/sig-node
Automatic merge from submit-queue
clarify kubectl recursive flag description
Clarify the description of the recursive flag in `kubectl` so that it's more intuitive to the user
This should make it into v1.3 as the rest of the recursive feature PR's will be available in 1.3
Automatic merge from submit-queue
kubectl describe node is allocatable aware
`kubectl describe node` will render node.status.allocatable if present.
in addition, it will report allocated resources relative to node.status.allocatable if present instead of capacity.
old code was confusing if you setup system-reserved and kube-reserved as allocated resource percentages were relative to node capacity and not schedulable amount of resources.
this is a small but valuable usability improvement, so i think it would be good to make 1.3 milestone.
/cc @kubernetes/sig-node @kubernetes/rh-cluster-infra @kubernetes/kubectl @davidopp
Automatic merge from submit-queue
httplog: Increase stack size
The previous size, of 2KB, in practice always was filled mostly by
http server-releated stuff well above the panic itself, and truncated
before anything of real value was printed in some cases.
This increases the stack size so that panics are printed in full (well, except for really large ones).
cc @lavalamp
This enables rkt to use cached stage1 image instead of unpacking the
stage1 image every time for every pod.
After this change, users need to preload the stage1 images in order to
enable rkt to find the stage1 image with the name specified by this flag.
Hot attach of disk to a scsi controller will work only if the
controller type is lsilogic-sas or paravirtual.This patch filters
the existing controller for these types, if it doesn't find one it
creates a new scsi controller.
Automatic merge from submit-queue
Allow disabling of dynamic provisioning
Allow administrators to opt-out of dynamic provisioning. Provisioning is still on by default, which is the current behavior.
Per a conversation with @jsafrane, a boolean toggle was added and plumbed through into the controller. Deliberate disabling will simply return nil from `provisionClaim` whereas a misconfigured provisioner will continue on and generate error events for the PVC.
@kubernetes/rh-storage @saad-ali @thockin @abhgupta
Automatic merge from submit-queue
Fill PV.Status.Message with deleter/recycler errors.
Instead of empty `Message` `kubectl describe pv` now shows:
```
Name: nfs
Labels: <none>
Status: Failed
Claim: default/nfs
Reclaim Policy: Recycle
Access Modes: RWX
Capacity: 1Mi
Message: Recycler failed: Pod was active on the node longer than specified deadline
Source:
Type: NFS (an NFS mount that lasts the lifetime of a pod)
Server: 10.999.999.999
Path: /
ReadOnly: false
```
This is actually a regression since 1.2
@kubernetes/sig-storage
Automatic merge from submit-queue
Allow emitting PersistentVolume events.
Similarly to Nodes, PersistentVolumes are not in any namespace and we should
not block events on them. Currently, these events are rejected with
`Event "nfs.145841cf9c8cfaf0" is invalid: involvedObject.namespace: Invalid value: "": does not match involvedObject`
Automatic merge from submit-queue
add unit and integration tests for rbac authorizer
This PR adds lots of tests for the RBAC authorizer.
The plan over the next couple days is to add a lot more test cases.
Updates #23396
cc @erictune
Automatic merge from submit-queue
Remove EncodeToStream(..., []unversioned.GroupVersion)
Was not being used. Is a signature change and is necessary for post 1.3 work on Templates and other objects that nest objects.
Extracted from #26044
Automatic merge from submit-queue
AWS volumes: Use /dev/xvdXX names with EC2
We are using HVM style names, which cannot be paravirtual style names.
See
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/device_naming.html
This also fixes problems introduced when moving volume mounting to KCM.
Fix#27534
Automatic merge from submit-queue
Logging for OutOfDisk when file system info is not available
#26566
1. Adding logs for file system info being not available.
2. Reporting outOfDisk when file system info is not available.
Automatic merge from submit-queue
ObjectMeta, ListMeta, and TypeMeta should implement their interfaces
Make unversioned.ListMeta implement List. Update all the *List types so they implement GetListMeta.
This helps avoid using reflection to get list information.
Remove all unnecessary boilerplate, move the interfaces to the right
places, and add a test that verifies that objects implement one, the
other, but never both.
@ncdc @lavalamp this supercedes #26964 with the boilerplate removed. Added tests
Automatic merge from submit-queue
Fix bug in isLikelyNotMountPoint function
In nsenter_mount.go/isLikelyNotMountPoint function, the returned output
from findmnt command misses the last letter. Modify the code to use
String.contains instead of string matching. fixes#26421fixes#25056fixes#22911
Automatic merge from submit-queue
Filter seccomp profile path from malicious .. and /
Without this patch with `localhost/<some-releative-path>` as seccomp profile one can load any file on the host, e.g. `localhost/../../../../dev/mem` which is not healthy for the kubelet.
/cc @jfrazelle
Unit tests depend on https://github.com/kubernetes/kubernetes/pull/26710.
Automatic merge from submit-queue
kubelet/kubenet: split hostport handling into separate module
This pulls the hostport functionality of kubenet out into a separate module so that it can be more easily tested and potentially used from other code (maybe CNI, maybe downstream consumers like OpenShift, etc). Couldn't find a mock iptables so I wrote one, but I didn't look very hard.
@freehan @thockin @bprashanth
Automatic merge from submit-queue
Revert revert of downward api node defaults
Reverts the revert of https://github.com/kubernetes/kubernetes/pull/27439Fixes#27062
@dchen1107 - who at Google can help debug why this caused issues with GKE infrastructure but not GCE merge queue?
/cc @wojtek-t @piosz @fgrzadkowski @eparis @pmorie
Automatic merge from submit-queue
Remove an empty line being output when exposing annotations and labels via downward api volume
The issue is that formatMap function (for annotations and labels) in pkg/fieldpath/fieldpath.go appends a "\n" after each key value pair which is correct for all pairs except the last pair because then a complete string is returned with a "\n" in the end. It is inconsistent with other strings (metadata.name, namespace and resources) being returned as they dont have "\n" in the end. These returned strings are processed by sortLines function in pkg/volume/downwardapi/downwardapi.go and the function finally appends "\n" to each string, but incorrectly outputs an empty line if there is an already "\n" in the end with the input string. To illustrate:
The sortLines works as follows: lets say the input string is : "a\nb\nc\n".
1. It splits them as "a", "b", "c", "" (note empty string in the end).
2. it sort them: "", "a", b", "c"
3. And then it appends "\n" again to each string: "\n", "a\n" ,"b\n", "c\n"
So we can see that it is erroneously creating an empty string in the beginning when the input string to sortLines has "\n" in the end. As I said above, it is not an issue with metadata.name, namespace and resources as their input strings are without \n" in the end.
So now, the output in the downward api volume, (using the example in http://kubernetes.io/docs/user-guide/downward-api/):
```
# cat /etc/annotations
zone="us-est-coast"
cluster="test-cluster1"
rack="rack-22"
```
After this patch, the output will be correct and without the erroneous empty line in the beginning.
I could think other ways to solve this but I found the way in this patch with minimal code changes.
@kubernetes/rh-cluster-infra
Automatic merge from submit-queue
refuse to create a firewall rule with no target tag
fixes#25145
This modification in gce.firewallObject() will return error when trying
to create or update firewall rule if no node tag can be found. Also add
unit test for this modification.
We had a long-lasting bug which prevented creation of volumes in
non-master zones, because the cloudprovider in the volume label
admission controller is not initialized with the multizone setting
(issue #27656).
This implements a simple workaround: if the volume is created with the
failure-domain zone label, we look for the volume in that zone. This is
more efficient, avoids introducing a new semantic, and allows users (and
the dynamic provisioner) to create volumes in non-master zones.
Fixes#27657
Long term we plan on integrating this into the scheduler, but in the
short term we use the volume name to place it onto a zone.
We hash the volume name so we don't bias to the first few zones.
If the volume name "looks like" a PetSet volume name (ending with
-<number>) then we use the number as an offset. In that case we hash
the base name.
Fixes#27256
Automatic merge from submit-queue
pkg/client/leaderelection: log err when retrieving endpoint
The leader election code currently suppresses errors when trying to retrieve an endpoint. This can lead to difficult to debug situations.
In the case of a mis-configured controller-manager or scheduler - where they fail to contact an apiserver - this currently leads to no log output in the default case, or `failed to renew lease foo/bar` in `--v=4`, which isn't very actionable.
Automatic merge from submit-queue
fix updatePod() of RS and RC controllers
Fix updatePod of replication controller manager and replica set controller to handle pod label updates that match no RC or RS.
Fix#27405
If the mount operation exceeds the timeout, it will return an error and the
pod worker will retry in the next sync (10s or less). Compared with the
original value (i.e., 10 minutes), this frees the pod worker sooner to process
pod updates, if there are any.
This commit adds a new volume manager in kubelet that synchronizes
volume mount/unmount (and attach/detach, if attach/detach controller
is not enabled).
This eliminates the race conditions between the pod creation loop
and the orphaned volumes loops. It also removes the unmount/detach
from the `syncPod()` path so volume clean up never blocks the
`syncPod` loop.
Similarly to Nodes, PersistentVolumes are not in any namespace and we should
not block events on them. Currently, these events are rejected with
'Event "nfs.145841cf9c8cfaf0" is invalid: involvedObject.namespace: Invalid value: "": does not match involvedObject'
Automatic merge from submit-queue
kubelet/rkt - treat pod container as the infra - only network stats
As no "container name" annotation was being applied to the pod as a whole, the rkt pod container didn't have a container name label. This means that in stat/summary it came up as a nameless container that belonged to the pod.
this was problematic as it caused double counting of container stats.
this adds a container name annotation to the pod level which will be overridden during label creation by annotations of the same name at the container level for the containers themselves.
stats/summary will do the right thing as it will treat it the same as the infra container, just get network stats from it.
Suppress #26759
cc @kubernetes/sig-node @kubernetes/rktnetes-maintainers
Automatic merge from submit-queue
Rbac api group make subject apiversion optional
This fixes the verification for the "apiVerion" field in the RBAC subject and makes it optional. This field isn't used and currently won't pass validation if it's filled.
```yml
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1alpha1
metadata:
name: admins
subject:
- kind: User
name: admin-user
# apiVersion: "entering anything here will fail validation"
roleRef:
kind: ClusterRole
name: admin
apiVersion: rbac.authorization.k8s.io/v1alpha1
```
Automatic merge from submit-queue
Quota uses old object provided via admission
fixes https://github.com/kubernetes/kubernetes/issues/26178
@sdminonne - fixes a bug in services not intercepting updates.
/cc @liggitt
In nsenter_mount.go/isLikelyNotMountPoint function, the returned output
from findmnt command misses the last letter. Modify the code to make sure
that output has the full target path. fix#26421#25056#22911
- replaces probeVolume with scsiHostRescan to scan hot attached disks
- fixes substring match of UUID returned from AttachDisk
- changes DetachDisk to take volumePath argument instead of diskID
- fixes delayed failure at mount rather than attach disk
- removes cloning of virtual disk in AttachDisk
The previous size, of 2KB, in practice always was filled completely by
http server-releated stuff well above the panic itself, and truncated
before anything of real value was printed.
This increases the stack size so that panics are printed in full.
This sets AttachOptions.CommandName dynamically depending on the corba Command
hierarchy. If the root command is named e.g. "oc" (for the OpenShift cli) this
will result in "oc attach" instead of the static "kubectl attach" before this
patch.
Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1341450
Automatic merge from submit-queue
Let kubelet log the DeletionTimestamp if it's not nil in update
This helps to debug if it's the kubelet to blame when a pod is not deleted.
Example output:
```
SyncLoop (UPDATE, "api"): "redis-master_default(c6782276-2dd4-11e6-b874-64510650ab1c):DeletionTimestamp=2016-06-08T23:58:12Z"
```
ref #26290
cc @Random-Liu
Automatic merge from submit-queue
Update reason_cache.go, Get method operate lru cache not threadsafe
The reason_cache wrapped lru cache , lru cache modies linked list even for a get, should use WLock for both read and write
Automatic merge from submit-queue
Fix docker api version in kubelet
There are two variables `dockerv110APIVersion` and `dockerV110APIVersion` with
the same purpose, but different values. Remove the incorrect one and fix usage
in the file.
/cc @dchen1107 @Random-Liu
Automatic merge from submit-queue
processor listener: fix locking in pop()
Currently the lock in processorListener is used to guard pendingNotifications. But in pop, it also locks around on select chan. This will block the goroutine with lock acquired.
This PR changes the lock to guard the correct section only.
Automatic merge from submit-queue
pkg/kubectl: add resource printers for rbac api group
This PR adds the necessary kubectl printers for the rbac api group which we overlooked in previous PRs.
cc @erictune
Automatic merge from submit-queue
ResourceQuota BestEffort scope aligned with Pod level QoS
This aligns quota with the changes in kubelet and CLI.
So if quota allows 10 `BestEffort` pods, it will now track properly with what the user sees with changes in 1.3.
```
apiVersion: v1
kind: ResourceQuota
metadata:
name: best-effort
spec:
hard:
pods: "10"
scopes:
- BestEffort
```
/cc @vishh @kubernetes/rh-cluster-infra
Automatic merge from submit-queue
AWS: cache instances during service reload to avoid rate limiting on restart
Fixes#25610 by reducing redundant calls to DescribeInstances()
```release-note
* The AWS cloudprovider will cache results from DescribeInstances() if the set of nodes hasn't changed
```
Also move int/stringSlicesEqual from servicecontroller.go to pkg/util/slice
Automatic merge from submit-queue
Extract interface for master endpoints reconciler.
Make the master endpoints reconciler an interface so its implementation can be overridden, if
desired.
xref #20975#26574
cc @kubernetes/sig-api-machinery @lavalamp @smarterclayton @pmorie @DirectXMan12 @wojtek-t @kubernetes/rh-cluster-infra
OpenShift needs to be able to use a discovery client against a different
prefix. Make LegacyPrefix optional and parameterizable to the client. No
change to existing interfaces.
Automatic merge from submit-queue
fix recursive & non-recursive kubectl get of generic output format
This PR fixes the issues with `kubectl get` in https://github.com/kubernetes/kubernetes/issues/26466
Changes made:
- fix printing when using the generic output format in both non-recursive & recurvise settings to ensure that errors are being shown
- add tests to check printing generic output in a **non-recursive** setting with non-existent pods
- clean up the **recursive** `kubectl get` tests
/cc @janetkuo
Automatic merge from submit-queue
Sets IgnoreUnknown=1 in CNI_ARGS
```release-note
release-note-none
```
K8 uses CNI_ARGS to pass pod namespace, name and infra container
id to the CNI network plugin. CNI logic will throw an error
if these args are not known to it, unless the user specifies
IgnoreUnknown as part of CNI_ARGS. This PR sets IgnoreUnknown=1
to prevent the CNI logic from erroring and blocking pod setup.
https://github.com/appc/cni/pull/158https://github.com/appc/cni/issues/126
Automatic merge from submit-queue
Listing pods only once when getting pods for RS in deployment
Fixes#26834
1. Avoid ranging over RSes and then `List` pods of each RS. Instead, `List` pods of the deployment once, and then filter pods of each RS.
2. Avoid using clientset to `List` pods in deployment controller. Use podStore instead. (TODO in some functions because the unit tests don't have podStore.)
@kubernetes/deployment
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/.github/PULL_REQUEST_TEMPLATE.md?pixel)]()
Since appc requires gid to be non-empty today (https://github.com/appc/spec/issues/623),
we have to error out when gid is empty instead of using the root gid.
Make unversioned.ListMeta implement List. Update all the *List types so they implement GetListMeta.
This helps avoid using reflection to get list information.
Remove all unnecessary boilerplate, move the interfaces to the right
places, and add a test that verifies that objects implement one, the
other, but never both.
Automatic merge from submit-queue
rkt: Replace 'journalctl' with rkt's GetLogs() API.
This replaced the `journactl` shell out with rkt's GetLogs() API.
Fixes#26997
To make this fully work, we need rkt to have this patch #https://github.com/coreos/rkt/pull/2763
cc @kubernetes/sig-node @euank @alban @iaguis @jonboulle
Automatic merge from submit-queue
AWS: support mixed plaintext/encrypted ports in ELBs via service.beta.kubernetes.io/aws-load-balancer-ssl-ports annotation
Fixes#26268
Implements the second SSL ELB annotation, per #24978
`service.beta.kubernetes.io/aws-load-balancer-ssl-ports=*` (comma-separated list of port numbers or e.g. `https`)
If not specified, all ports are secure (SSL or HTTPS).
Automatic merge from submit-queue
rkt: Do not run rkt pod inside a pre-created netns when network plugin is no-op
This fixed a panic where the returned pod network status is nil. (Fix#26540)
Also this makes lkvm stage1 able to run inside a user defined network, where the network name needs to be 'rkt.kubernetes.io'. A temporal solution to solve the network issue for lkvm stage1.
Besides, I fixed minor issues such as passing the wrong pod UID when cleaning up the netns file.
/cc @euank @pskrzyns @jellonek @kubernetes/sig-node
I tested with no networkplugin locally, works fine.
As a reminder, we need to document this in the release.https://github.com/kubernetes/kubernetes/issues/26201
This fixed a panic where the returned pod network status is nil.
Also this makes lkvm stage1 able to run inside a user defined
network, where the network name needs to be 'rkt.kubernetes.io'.
Also fixed minor issues such as passing the wrong pod UID, ignoring
logging errors.
Automatic merge from submit-queue
rkt: Fix incomplete selinux context string when the option is partial.
Fix "EmptyDir" e2e tests failures caused by #https://github.com/kubernetes/kubernetes/pull/24901
As mentioned in https://github.com/kubernetes/kubernetes/pull/24901#discussion_r61372312
We should apply the selinux context of the rkt data directory (/var/lib/rkt) when users do not specify all the selinux options.
Due to my fault, the change was missed during rebase, thus caused the regression.
After applying this PR, the e2e tests passed.
```
$ go run hack/e2e.go -v -test --test_args="--ginkgo.dryRun=false --ginkgo.focus=EmptyDir"
...
Ran 19 of 313 Specs in 199.319 seconds
SUCCESS! -- 19 Passed | 0 Failed | 0 Pending | 294 Skipped PASS
```
BTW, the test is removed because the `--no-overlay=true` flag will only be there on non-coreos distro.
cc @euank @kubernetes/sig-node
Automatic merge from submit-queue
LBaaS v2 Support for Openstack Cloud Provider Plugin
Resolves#19774.
This work is based on Gophercloud support for LBaaS v2 currently in review (this will have to merge first):
https://github.com/rackspace/gophercloud/pull/575
These changes includes the addition of a new loadbalancer configuration option: **LBVersion**. If this configuration attribute is missing or anything other than "v2", lbaas v1 implementation will be used.
Automatic merge from submit-queue
GCE attach tests
Add basic tests for GCE attacher.
Looking at the code, it would deserve some refactoring as suggested in #25888, so mounting is not tested at all.
Automatic merge from submit-queue
Add specific error type for "operation already exists" error.
PersistentVolume controller needs to know why scheduling a new operation has failed - if the operation was already running or some other error happened.
Automatic merge from submit-queue
AWS: kubectl get service should print hostnames for LB services
Fixes#21526
Also test wide outputs. We only guarantee the first IP to be fully printed
if multiple ingresses are present. For AWS, which has no ingress IPs, but
only hostnames, the ELB hostname will be truncated, unless -o=wide is
specified.
Automatic merge from submit-queue
Fix NetworkPolicy validation bug
Fix bugs in NetworkPolicy resource (new in v1.3) validation.
Please add this to the v1.3 milestone.
Automatic merge from submit-queue
Preserve query strings in HTTP probes instead of escaping them
Fixes a problem reported on Slack by devth.
```release-note
* Allow the use of query strings and URI fragments in HTTP probes
```
This might also preserve fragments, for those crazy enough to pass them.
I am using url.Parse() on the path in order to get path/query/fragment
and also deliberately avoiding the addition of more fields to the API.
Automatic merge from submit-queue
Stop 'kubectl drain' deleting pods with local storage.
Kubectl drain will not continue if there are pods with local storage unless
forced with --delete-local-data.
Fixes#23972
Fixes#21526
Also test wide outputs. We only guarantee the first IP to be fully printed
if multiple ingresses are present. For AWS, which has no ingress IPs, but
only hostnames, the ELB hostname will be truncated, unless -o=wide is
specified.
Automatic merge from submit-queue
Resource quantity must support leading and trailing whitespace in JSON for back-compat
For backwards compatibility reasons, we must continue to support leading or trailing whitespace on Quantity values when deserialized from JSON. We must also support numbers serialized into yaml (`cpu: 1`) and JSON (`"cpu": 1`)
Fixes#26898
Automatic merge from submit-queue
Custom sort function for InitContainersStatuses
Order in init containers matters. Statues shoudln't be sorted by name.
Automatic merge from submit-queue
Move quota usage testing for loadbalancers into unit tests
Fixes https://github.com/kubernetes/kubernetes/issues/26319
* moved testing for node port and load balancer usage in quota to unit tests
* remove node port and node port -> loadbalancer service testing out of e2e
* covered already in replenishment_controller_test scenario
Given the time it takes to even allocate a load balancer, it seems better to test that outside of this test case to avoid unnecessary flakes.
/cc @bprashanth
There are two variables `dockerv110APIVersion` and `dockerV110APIVersion` with
the same purpose, but different values. Remove the incorrect one and fix usage
in the file.
Fixes#26268
Implements the second SSL ELB annotation, per #24978
service.beta.kubernetes.io/aws-load-balancer-ssl-ports=* (or e.g. https)
If not specified, all ports are secure (SSL or HTTPS).
Automatic merge from submit-queue
correction on rbd volume object and defaults
- add `omitempty` to `RBDPool RadosUser Keyring SecretRef ReadOnly`
- move defaults from `pkg/volume/rbd/rbd.go` to `pkg/api/v1/defaults.go`
addressing #18885
Double slashes are not allowed in annotation keys. Moreover, using the 63
characters of the name component in an annotation key will shorted the space
for the container name.
NewCacher is a wrapper of NewCacherFromConfig. NewCacher understands
how to create a key func from scopeStrategy. However, it is not the
responsibility of cacher. So we should remove this function, and
construct the config in its caller, which should understand scopeStrategy.
Automatic merge from submit-queue
rkt: Wrap exec errors as utilexec.ExitError
This is needed by the exec prober to distinguish error types and exit
codes correctly. Without this, the exec prober used for liveness probes
doesn't identify errors correctly and restarts aren't triggered. Fixes#26456
An alternative, and preferable solution would be to use utilexec
everywhere, but that change is much more involved and should come at a
later date. Unfortunately, until that change is made, writing tests for
this is quite difficult.
cc @yifan-gu @sjpotter
Automatic merge from submit-queue
volume controller: Convert PersistentVolumes from Kubernetes 1.2
In Kubernetes 1.2 we used template PersistentVolume for provisioning. When a claim for dynamic volume was detected, Kubernetes did:
- create template PV for the claim with dummy pointer to storage asset
- allocate storage asset such as AWS EBS
- fill real pointer to the created storage asset to the template PV
In refactored volume provisioner, Kubernetes allocates the storage asset first and then creates a Kubernetes PV instance already with the correct pointer to the storage asset.
To support seamles upgrade from 1.2 to 1.3 we need to remove these unprovisioned template PVs. The new controller does not use them, it will see PVC for dynamic provisioning and create real PV instead.
See https://github.com/pmorie/pv-haxxz/pull/3 for pseudocode.
Automatic merge from submit-queue
Fix GCE attacher/detacher to ignore return value of failed calls.
The plugin should ignore any return value if err is set. Found when writing unit tests in #26615 - my dummy `DiskIsAttached` returned `false, errors.New('fake error')` and the volume was **not** detached although the log message `"Error checking if PD (%q) is already attached to current node (%q). Will continue and try detach anyway."` suggested otherwise
@saad-ali, PTAL
@kubernetes/sig-storage
Automatic merge from submit-queue
Wait for all volumes/claims to get synced in unit test.
Controller.HasSynced() returns true when all initial claims/volumes were sent
to appropriate goroutines, not when the goroutine has actually processed them.
Fixes#26712
In Kubernetes 1.2 we used template PersistentVolume for provisioning. When a
claim for dynamic volume was detected, Kubernetes did:
- create template PV for the claim with dummy pointer to storage asset
- allocate storage asset such as AWS EBS
- fill real pointer to the created storage asset to the template PV
In refactored volume provisioner, Kubernetes allocates the storage asset first
and then creates a Kubernetes PV instance already with the correct pointer
to the storage asset.
To support seamles upgrade from 1.2 to 1.3 we need to remove these
unprovisioned template PVs. The new controller does not use them, it will see
PVC for dynamic provisioning and create real PV instead.
Automatic merge from submit-queue
Fix typo and linewrap comments in PV controller
Fix some typos and linewrap long comments that I found while going over this code investigating something.
Automatic merge from submit-queue
Add timeout for image pulling
Fix#26300.
With this PR, if image pulling makes no progress for *1 minute*, the operation will be cancelled. Docker reports progress for every 512kB block (See [here](3d13fddd2b/pkg/progress/progressreader.go (L32))), *512kB/min* means the throughput is *<= 8.5kB/s*, which should be kind of abnormal?
It's a little hard to write unit test for this, so I just manually tested it. If I set the `defaultImagePullingStuckTimeout` to 0s, and `defaultImagePullingProgressReportInterval` to 1s, image pulling will be cancelled.
```
E0601 18:48:29.026003 46185 kube_docker_client.go:274] Cancel pulling image "nginx:latest" because of no progress for 0, latest progress: "89732b811e7f: Pulling fs layer "
E0601 18:48:29.026308 46185 manager.go:2110] container start failed: ErrImagePull: net/http: request canceled
```
/cc @kubernetes/sig-node
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/.github/PULL_REQUEST_TEMPLATE.md?pixel)]()
Controller.HasSynced() returns true when all initial claims/volumes were sent
to appropriate goroutines, not when the goroutine has actually processed them.
Automatic merge from submit-queue
Don't allow deps with no discernible license
This updates the few deps we had with no LICENSE file to current versions that do have that file. It also disallows new deps without obvious licenses.
Automatic merge from submit-queue
Attach/Detach Controller Kubelet Changes
This PR contains changes to enable attach/detach controller proposed in #20262.
Specifically it:
* Introduces a new `enable-controller-attach-detach` kubelet flag to enable control by attach/detach controller. Default enabled.
* Removes all references `SafeToDetach` annotation from controller.
* Adds the new `VolumesInUse` field to the Node Status API object.
* Modifies the controller to use `VolumesInUse` instead of `SafeToDetach` annotation to gate detachment.
* Modifies kubelet to set `VolumesInUse` before Mount and after Unmount.
* There is a bug in the `node-problem-detector` binary that causes `VolumesInUse` to get reset to nil every 30 seconds. Issue https://github.com/kubernetes/node-problem-detector/issues/9#issuecomment-221770924 opened to fix that.
* There is a bug here in the mount/unmount code that prevents resetting `VolumeInUse in some cases, this will be fixed by mount/unmount refactor.
* Have controller process detaches before attaches so that volumes referenced by pods that are rescheduled to a different node are detached first.
* Fix misc bugs in controller.
* Modify GCE attacher to: remove retries, remove mutex, and not fail if volume is already attached or already detached.
Fixes#14642, #19953
```release-note
Kubernetes v1.3 introduces a new Attach/Detach Controller. This controller manages attaching and detaching volumes on-behalf of nodes that have the "volumes.kubernetes.io/controller-managed-attach-detach" annotation.
A kubelet flag, "enable-controller-attach-detach" (default true), controls whether a node sets the "controller-managed-attach-detach" or not.
```
Automatic merge from submit-queue
Fill controller caches on startup
The controller needs to fill its caches before it starts binding/recycling/ deleting or provisioning volumes and claims. This was done using blocking initial 'xxx added' from going through syncClaim/syncVolume. However, when the caches were full, the controller waited for the next sync period to do actual binding/recycling etc.
In this patch, the controller fills its caches directly from etcd and then processes initial 'xxx added' events to reconcile the world and bind/recycle/ delete/provision stuff, resulting in faster binding after startup.
Fixes#25967 (properly)
This PR contains Kubelet changes to enable attach/detach controller control.
* It introduces a new "enable-controller-attach-detach" kubelet flag to
enable control by controller. Default enabled.
* It removes all references "SafeToDetach" annoation from controller.
* It adds the new VolumesInUse field to the Node Status API object.
* It modifies the controller to use VolumesInUse instead of SafeToDetach
annotation to gate detachment.
* There is a bug in node-problem-detector that causes VolumesInUse to
get reset every 30 seconds. Issue https://github.com/kubernetes/node-problem-detector/issues/9
opened to fix that.
Automatic merge from submit-queue
rkt: Get logs via syslog identifier
This change works around https://github.com/coreos/rkt/issues/2630
Without this change, logs cannot reliably be collected for containers
with short lifetimes.
With this change, logs cannot be collected on rkt versions v1.6.0 and
before.
I'd like to also bump the required rkt version, but I don't want to do that until there's a released version that can be pointed to (so the next rkt release).
I haven't added tests (which were missing) because this code will be removed if/when logs are retrieved via the API. I have run E2E tests with this merged in and verified the tests which previously failed no longer fail.
cc @yifan-gu
Automatic merge from submit-queue
read gluster log to surface glusterfs plugin errors properly in describe events
glusterfs.go does not properly expose errors as all mount errors go to a log file, I propose we read the log file to expose the errors without asking the users to 'go look at this log'
This PR does the following:
1. adds a gluster option for log-level=ERROR to remove all noise from log file
2. change log file name and path based on PV + Pod name - so specific per PV and Pod
3. create a utility to read the last two lines of the log file when failure occurs
old behavior:
```
13s 13s 1 {kubelet 127.0.0.1} Warning FailedMount Unable to mount volumes for pod "bb-gluster-pod2_default(34b18c6b-070d-11e6-8e95-52540092b5fb)": glusterfs: mount failed: Mount failed: exit status 1
Mounting arguments: 192.168.234.147:myVol2 /var/lib/kubelet/pods/34b18c6b-070d-11e6-8e95-52540092b5fb/volumes/kubernetes.io~glusterfs/pv-gluster glusterfs [log-file=/var/lib/kubelet/plugins/kubernetes.io/glusterfs/pv-gluster/glusterfs.log]
Output: Mount failed. Please check the log file for more details.
```
improved behavior: (updated after suggestions from community)
```
34m 34m 1 {kubelet 127.0.0.1} Warning FailedMount Unable to mount volumes for pod "bb-multi-pod1_default(e7d7f790-0d4b-11e6-a275-52540092b5fb)": glusterfs: mount failed: Mount failed: exit status 1
Mounting arguments: 192.168.123.222:myVol2 /var/lib/kubelet/pods/e7d7f790-0d4b-11e6-a275-52540092b5fb/volumes/kubernetes.io~glusterfs/pv-gluster2 glusterfs [log-level=ERROR log-file=/var/lib/kubelet/plugins/kubernetes.io/glusterfs/pv-gluster2/bb-multi-pod1-glusterfs.log]
Output: Mount failed. Please check the log file for more details.
the following error information was pulled from the log to help resolve this issue:
[2016-04-28 14:21:29.109697] E [socket.c:2332:socket_connect_finish] 0-glusterfs: connection to 192.168.123.222:24007 failed (Connection timed out)
[2016-04-28 14:21:29.109767] E [glusterfsd-mgmt.c:1819:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 192.168.123.222 (Transport endpoint is not connected)
```
also this PR is alternate approach to : #24624