311 Commits (3215e8535ae8afcf850fbaac9df7ff9abe42f9e0)

Author SHA1 Message Date
Hongchao Deng 84c07b0bbf etcd3/store: userUpdate error should be returned 2016-05-03 14:42:29 +08:00
Clayton Coleman fdb110c859 Fix the rest of the code 2016-04-29 17:12:10 -04:00
k8s-merge-robot 11298d02e0 Merge pull request #24455 from hongchaodeng/fl
Automatic merge from submit-queue

Provide flags to use etcd3 backed storage

ref: #24405

What's in this PR?
- Add a new flag "storage-backend" to choose "etcd2" or "etcd3". By default (i.e. empty), it's "etcd2".
- Move the etcd config code into a standalone package that creates the etcd2 or etcd3 storage backend based on user input.
2016-04-29 08:49:04 -07:00
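
The flag behavior described in PR #24455 above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: only the flag name `storage-backend` and the values `etcd2` and `etcd3` come from the description, while `resolveBackend` and the flag set name are hypothetical.

```go
package main

import (
	"flag"
	"fmt"
)

// resolveBackend is a hypothetical helper that maps the flag value to a
// backend name. An empty value falls back to "etcd2", as the PR describes.
func resolveBackend(v string) (string, error) {
	switch v {
	case "", "etcd2":
		return "etcd2", nil
	case "etcd3":
		return "etcd3", nil
	default:
		return "", fmt.Errorf("unknown storage backend %q", v)
	}
}

func main() {
	// A private FlagSet keeps the sketch independent of the process flags.
	fs := flag.NewFlagSet("apiserver-sketch", flag.ContinueOnError)
	backend := fs.String("storage-backend", "", "etcd2 or etcd3 (empty means etcd2)")

	if err := fs.Parse([]string{"--storage-backend=etcd3"}); err != nil {
		fmt.Println("parse error:", err)
		return
	}
	name, err := resolveBackend(*backend)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("using backend:", name)
}
```

Rejecting unknown values is a design choice of this sketch; the PR text does not say how invalid input is handled.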
k8s-merge-robot d0b887e4e0 Merge pull request #24595 from zhouhaibing089/httpserverclose
Automatic merge from submit-queue

Uncomment the code that was commented out because of #19254

Fix https://github.com/kubernetes/kubernetes/issues/24546.

@lavalamp
2016-04-28 01:41:16 -07:00
Hongchao Deng c0071a1595 add flags to enable etcd3 2016-04-28 09:48:16 +08:00
k8s-merge-robot 28bc4b32c2 Merge pull request #24532 from rsc/master
Automatic merge from submit-queue

apiserver latency reductions

Combined effect of these two commits on the latency observed by the 1000-node kubemark benchmark:

```
name                               old ms/op  new ms/op   delta
LIST_nodes_p50                      127 ±16%    121 ± 9%   -4.58%  (p=0.000 n=29+27)
LIST_nodes_p90                      326 ±12%    266 ±12%  -18.48%  (p=0.000 n=29+27)
LIST_nodes_p99                      453 ±11%    400 ±14%  -11.79%  (p=0.000 n=29+28)
LIST_replicationcontrollers_p50    29.4 ±49%   26.2 ±54%     ~     (p=0.085 n=30+29)
LIST_replicationcontrollers_p90    83.0 ±78%   68.6 ±59%  -17.33%  (p=0.013 n=30+28)
LIST_replicationcontrollers_p99     216 ±43%    177 ±49%  -17.68%  (p=0.000 n=29+29)
DELETE_pods_p50                    24.5 ±14%   24.3 ±13%     ~     (p=0.562 n=30+29)
DELETE_pods_p90                    30.7 ± 1%   30.7 ± 1%   -0.30%  (p=0.011 n=29+29)
DELETE_pods_p99                    77.2 ±34%   54.2 ±23%  -29.76%  (p=0.000 n=30+27)
PUT_replicationcontrollers_p50     5.86 ±26%   5.94 ±32%     ~     (p=0.734 n=29+29)
PUT_replicationcontrollers_p90     15.8 ± 7%   15.5 ± 6%   -2.06%  (p=0.010 n=29+29)
PUT_replicationcontrollers_p99     57.8 ±35%   39.5 ±55%  -31.60%  (p=0.000 n=29+29)
PUT_nodes_p50                      14.9 ± 2%   14.8 ± 2%   -0.68%  (p=0.012 n=30+27)
PUT_nodes_p90                      16.5 ± 1%   16.3 ± 2%   -0.90%  (p=0.000 n=27+28)
PUT_nodes_p99                      57.9 ±47%   41.3 ±35%  -28.61%  (p=0.000 n=30+28)
POST_replicationcontrollers_p50    6.35 ±29%   6.34 ±20%     ~     (p=0.944 n=30+28)
POST_replicationcontrollers_p90    15.4 ± 5%   15.0 ± 5%   -2.18%  (p=0.001 n=29+29)
POST_replicationcontrollers_p99    52.2 ±71%   32.9 ±46%  -36.99%  (p=0.000 n=29+27)
POST_pods_p50                      8.99 ±13%   8.95 ±16%     ~     (p=0.903 n=30+29)
POST_pods_p90                      16.2 ± 4%   16.1 ± 4%     ~     (p=0.287 n=29+29)
POST_pods_p99                      30.9 ±21%   26.4 ±12%  -14.73%  (p=0.000 n=28+28)
POST_bindings_p50                  9.34 ±12%   8.92 ±15%   -4.54%  (p=0.013 n=30+28)
POST_bindings_p90                  16.6 ± 1%   16.5 ± 3%   -0.73%  (p=0.017 n=28+29)
POST_bindings_p99                  23.5 ± 9%   21.1 ± 4%  -10.09%  (p=0.000 n=27+28)
PUT_pods_p50                       10.8 ±11%   10.2 ± 5%   -5.47%  (p=0.000 n=30+27)
PUT_pods_p90                       16.1 ± 1%   16.0 ± 1%   -0.64%  (p=0.000 n=29+28)
PUT_pods_p99                       23.4 ± 9%   20.9 ± 9%  -10.93%  (p=0.000 n=28+27)
DELETE_replicationcontrollers_p50  2.42 ±16%   2.50 ±13%     ~     (p=0.054 n=29+28)
DELETE_replicationcontrollers_p90  11.5 ±12%   11.8 ±13%     ~     (p=0.141 n=30+28)
DELETE_replicationcontrollers_p99  19.5 ±21%   19.1 ±21%     ~     (p=0.397 n=29+29)
GET_nodes_p50                      0.77 ±10%   0.76 ±10%     ~     (p=0.317 n=28+28)
GET_nodes_p90                      1.20 ±16%   1.14 ±24%   -4.66%  (p=0.036 n=28+29)
GET_nodes_p99                      11.4 ±48%    7.5 ±46%  -34.28%  (p=0.000 n=28+29)
GET_replicationcontrollers_p50     0.74 ±17%   0.73 ±17%     ~     (p=0.222 n=30+28)
GET_replicationcontrollers_p90     1.04 ±25%   1.01 ±27%     ~     (p=0.231 n=30+29)
GET_replicationcontrollers_p99     12.1 ±81%  10.0 ±145%     ~     (p=0.063 n=28+29)
GET_pods_p50                       0.78 ±12%   0.77 ±10%     ~     (p=0.178 n=30+28)
GET_pods_p90                       1.06 ±19%   1.02 ±19%     ~     (p=0.120 n=29+28)
GET_pods_p99                       3.92 ±43%   2.45 ±38%  -37.55%  (p=0.000 n=27+25)
LIST_services_p50                  0.20 ±13%   0.20 ±16%     ~     (p=0.854 n=28+29)
LIST_services_p90                  0.28 ±15%   0.27 ±14%     ~     (p=0.219 n=29+28)
LIST_services_p99                  0.49 ±20%   0.47 ±24%     ~     (p=0.140 n=29+29)
LIST_endpoints_p50                 0.19 ±14%   0.19 ±15%     ~     (p=0.709 n=29+29)
LIST_endpoints_p90                 0.26 ±16%   0.26 ±13%     ~     (p=0.274 n=29+28)
LIST_endpoints_p99                 0.46 ±24%   0.44 ±21%     ~     (p=0.111 n=29+29)
LIST_horizontalpodautoscalers_p50  0.16 ±15%   0.15 ±13%     ~     (p=0.253 n=30+27)
LIST_horizontalpodautoscalers_p90  0.22 ±24%   0.21 ±16%     ~     (p=0.152 n=30+28)
LIST_horizontalpodautoscalers_p99  0.31 ±33%   0.31 ±38%     ~     (p=0.817 n=28+29)
LIST_daemonsets_p50                0.16 ±20%   0.15 ±11%     ~     (p=0.135 n=30+27)
LIST_daemonsets_p90                0.22 ±18%   0.21 ±25%     ~     (p=0.135 n=29+28)
LIST_daemonsets_p99                0.29 ±28%   0.29 ±32%     ~     (p=0.606 n=28+28)
LIST_jobs_p50                      0.16 ±16%   0.15 ±12%     ~     (p=0.375 n=29+28)
LIST_jobs_p90                      0.22 ±18%   0.21 ±16%     ~     (p=0.090 n=29+26)
LIST_jobs_p99                      0.31 ±28%   0.28 ±35%  -10.29%  (p=0.005 n=29+27)
LIST_deployments_p50               0.15 ±16%   0.15 ±13%     ~     (p=0.565 n=29+28)
LIST_deployments_p90               0.22 ±22%   0.21 ±19%     ~     (p=0.107 n=30+28)
LIST_deployments_p99               0.31 ±27%   0.29 ±34%     ~     (p=0.068 n=29+28)
LIST_namespaces_p50                0.21 ±25%   0.21 ±26%     ~     (p=0.768 n=29+27)
LIST_namespaces_p90                0.28 ±29%   0.26 ±25%     ~     (p=0.101 n=30+28)
LIST_namespaces_p99                0.30 ±48%   0.29 ±42%     ~     (p=0.339 n=30+29)
LIST_replicasets_p50               0.15 ±18%   0.15 ±16%     ~     (p=0.612 n=30+28)
LIST_replicasets_p90               0.22 ±19%   0.21 ±18%   -5.13%  (p=0.011 n=28+27)
LIST_replicasets_p99               0.31 ±39%   0.28 ±29%     ~     (p=0.066 n=29+28)
LIST_persistentvolumes_p50         0.16 ±23%   0.15 ±21%     ~     (p=0.124 n=30+29)
LIST_persistentvolumes_p90         0.21 ±23%   0.20 ±23%     ~     (p=0.092 n=30+25)
LIST_persistentvolumes_p99         0.21 ±24%   0.20 ±23%     ~     (p=0.053 n=30+25)
LIST_resourcequotas_p50            0.16 ±12%   0.16 ±13%     ~     (p=0.175 n=27+28)
LIST_resourcequotas_p90            0.20 ±22%   0.20 ±24%     ~     (p=0.388 n=30+28)
LIST_resourcequotas_p99            0.22 ±24%   0.22 ±23%     ~     (p=0.575 n=30+28)
LIST_persistentvolumeclaims_p50    0.15 ±21%   0.15 ±29%     ~     (p=0.079 n=30+28)
LIST_persistentvolumeclaims_p90    0.19 ±26%   0.18 ±34%     ~     (p=0.446 n=29+29)
LIST_persistentvolumeclaims_p99    0.19 ±26%   0.18 ±34%     ~     (p=0.446 n=29+29)
LIST_pods_p50                      68.0 ±16%   56.3 ± 9%  -17.19%  (p=0.000 n=29+28)
LIST_pods_p90                       119 ±19%     93 ± 8%  -21.88%  (p=0.000 n=28+28)
LIST_pods_p99                       230 ±18%    202 ±14%  -12.13%  (p=0.000 n=27+28)
```
2016-04-27 08:32:18 -07:00
zhouhaibing089 bf1a3f99c0 Uncomment the code that was commented out because of #19254 2016-04-25 23:21:31 +08:00
Hongchao Deng b0f4517e65 etcd3/watcher: cancelling context shouldn't return error 2016-04-22 12:23:04 +08:00
Russ Cox 6a19e46ed6 pkg/storage: cache timers
A previous change here replaced time.After with an explicit
timer that can be stopped, to avoid filling up the active timer list
with timers that are no longer needed. But an even better fix is to
reuse the timers across calls, to avoid filling the allocated heap
with work for the garbage collector. On top of that, try a quick
non-blocking send to avoid the timer entirely.

For the e2e 1000-node kubemark test, basically everything gets faster,
some things significantly so. The 90th and 99th percentile for LIST nodes
in particular are the worst case that has caused SLO/SLA problems
in the past, and this reduces 99th percentile by 10%.

name                               old ms/op  new ms/op   delta
LIST_nodes_p50                      127 ±16%    124 ±13%     ~     (p=0.136 n=29+29)
LIST_nodes_p90                      326 ±12%    278 ±15%  -14.85%  (p=0.000 n=29+29)
LIST_nodes_p99                      453 ±11%    405 ±19%  -10.70%  (p=0.000 n=29+28)
LIST_replicationcontrollers_p50    29.4 ±49%   26.6 ±43%     ~     (p=0.176 n=30+29)
LIST_replicationcontrollers_p90    83.0 ±78%   68.7 ±63%  -17.30%  (p=0.020 n=30+29)
LIST_replicationcontrollers_p99     216 ±43%    173 ±41%  -19.53%  (p=0.000 n=29+28)
DELETE_pods_p50                    24.5 ±14%   24.3 ±17%     ~     (p=0.562 n=30+28)
DELETE_pods_p90                    30.7 ± 1%   30.6 ± 0%   -0.44%  (p=0.000 n=29+28)
DELETE_pods_p99                    77.2 ±34%   56.3 ±27%  -26.99%  (p=0.000 n=30+28)
PUT_replicationcontrollers_p50     5.86 ±26%   5.83 ±36%     ~     (p=1.000 n=29+28)
PUT_replicationcontrollers_p90     15.8 ± 7%   15.9 ± 6%     ~     (p=0.936 n=29+28)
PUT_replicationcontrollers_p99     57.8 ±35%   56.7 ±41%     ~     (p=0.725 n=29+28)
PUT_nodes_p50                      14.9 ± 2%   14.9 ± 1%   -0.55%  (p=0.020 n=30+28)
PUT_nodes_p90                      16.5 ± 1%   16.4 ± 2%   -0.60%  (p=0.040 n=27+28)
PUT_nodes_p99                      57.9 ±47%   44.6 ±42%  -23.02%  (p=0.000 n=30+29)
POST_replicationcontrollers_p50    6.35 ±29%   6.33 ±23%     ~     (p=0.957 n=30+28)
POST_replicationcontrollers_p90    15.4 ± 5%   15.2 ± 6%   -1.14%  (p=0.034 n=29+28)
POST_replicationcontrollers_p99    52.2 ±71%   53.4 ±52%     ~     (p=0.720 n=29+27)
POST_pods_p50                      8.99 ±13%   9.33 ±13%   +3.79%  (p=0.023 n=30+29)
POST_pods_p90                      16.2 ± 4%   16.3 ± 4%     ~     (p=0.113 n=29+29)
POST_pods_p99                      30.9 ±21%   28.4 ±23%   -8.26%  (p=0.001 n=28+29)
POST_bindings_p50                  9.34 ±12%   8.98 ±17%     ~     (p=0.083 n=30+29)
POST_bindings_p90                  16.6 ± 1%   16.5 ± 2%   -0.76%  (p=0.000 n=28+26)
POST_bindings_p99                  23.5 ± 9%   21.4 ± 5%   -8.98%  (p=0.000 n=27+27)
PUT_pods_p50                       10.8 ±11%   10.3 ± 5%   -4.67%  (p=0.000 n=30+28)
PUT_pods_p90                       16.1 ± 1%   16.0 ± 1%   -0.55%  (p=0.003 n=29+29)
PUT_pods_p99                       23.4 ± 9%   21.6 ±14%   -8.03%  (p=0.000 n=28+28)
DELETE_replicationcontrollers_p50  2.42 ±16%   2.50 ±13%     ~     (p=0.072 n=29+29)
DELETE_replicationcontrollers_p90  11.5 ±12%   11.7 ±10%     ~     (p=0.190 n=30+28)
DELETE_replicationcontrollers_p99  19.5 ±21%   19.0 ±22%     ~     (p=0.298 n=29+28)
GET_nodes_p90                      1.20 ±16%   1.18 ±19%     ~     (p=0.626 n=28+29)
GET_nodes_p99                      11.4 ±48%    8.3 ±40%  -27.31%  (p=0.000 n=28+28)
GET_replicationcontrollers_p90     1.04 ±25%   1.03 ±21%     ~     (p=0.682 n=30+29)
GET_replicationcontrollers_p99     12.1 ±81%  10.0 ±123%     ~     (p=0.135 n=28+28)
GET_pods_p90                       1.06 ±19%   1.08 ±21%     ~     (p=0.597 n=29+29)
GET_pods_p99                       3.92 ±43%   2.81 ±39%  -28.39%  (p=0.000 n=27+28)
LIST_pods_p50                      68.0 ±16%   65.3 ±13%     ~     (p=0.066 n=29+29)
LIST_pods_p90                       119 ±19%    115 ±12%     ~     (p=0.091 n=28+27)
LIST_pods_p99                       230 ±18%    226 ±21%     ~     (p=0.251 n=27+28)
2016-04-21 15:53:47 -04:00
k8s-merge-robot 85de6acadc Merge pull request #23208 from deads2k/fix-version-override
Automatic merge from submit-queue

make storage enablement, serialization, and location orthogonal

This allows a caller (command-line, config, code) to specify multiple separate pieces of config information regarding storage and have them properly composed at runtime.  The information provided is exposed through interfaces to allow alternate implementations, which allows us to change the expression of the config moving forward.  I also fixed up the types to be correct as I moved through.

The same options still exist, but they're composed slightly differently
 1. specify target etcd servers per Group or per GroupResource
 1. specify storage GroupVersions per Groups or per GroupResource
 1. specify etcd prefixes per GroupVersion or per GroupResource
 1. specify that multiple GroupResources share the same location in etcd
 1. enable GroupResources by GroupVersion or by GroupResource whitelist or GroupResource blacklist

The `storage.Interface` is built per GroupResource by:
 1. find the set of possible storage GroupResource based on the priority list of cohabitators
 1. choose a GroupResource from the set by looking at which Groups have the resource enabled
 1. find the target etcd server, etcd prefix, and storage encoding based on the GroupResource

The API server can have its resources separately enabled, but for now I've kept them linked.

@liggitt I think we need this (or something like it) to be able to go from config to these interfaces.  Given another round of refactoring, we may be able to reshape these to be more forward driving.

@smarterclayton this is important for rebasing and for a seamless 1.2 to 1.3 migration for us.
2016-04-21 08:24:29 -07:00
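
The first two steps of the per-GroupResource resolution described in PR #23208 above can be sketched as follows. This is an assumption-laden illustration: the type `GroupResource` mirrors the term used in the description, but `chooseStorageResource`, its parameters, and the example groups are hypothetical and not taken from the PR.

```go
package main

import "fmt"

// GroupResource identifies a resource within an API group.
type GroupResource struct {
	Group    string
	Resource string
}

// chooseStorageResource walks the cohabitators in priority order and returns
// the first one whose group is enabled. With no cohabitators, or with none
// enabled, it falls back to the requested resource (an assumption of this
// sketch).
func chooseStorageResource(requested GroupResource, cohabitators []GroupResource, enabledGroups map[string]bool) GroupResource {
	candidates := cohabitators
	if len(candidates) == 0 {
		candidates = []GroupResource{requested}
	}
	for _, c := range candidates {
		if enabledGroups[c.Group] {
			return c
		}
	}
	return requested
}

func main() {
	// Hypothetical example: one resource served from two groups that share
	// a single storage location.
	cohabitators := []GroupResource{
		{Group: "autoscaling", Resource: "horizontalpodautoscalers"},
		{Group: "extensions", Resource: "horizontalpodautoscalers"},
	}
	requested := GroupResource{Group: "extensions", Resource: "horizontalpodautoscalers"}
	enabled := map[string]bool{"autoscaling": true, "extensions": true}

	fmt.Println(chooseStorageResource(requested, cohabitators, enabled))
}
```

The third step (finding the etcd server, prefix, and encoding) would then be a lookup keyed on the chosen GroupResource; it is left out here because the description does not say how those lookups are structured.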
k8s-merge-robot 0a5d57a383 Merge pull request #24079 from hongchaodeng/comp
Automatic merge from submit-queue

etcd3 store: provide compactor util

What's this PR?
- Provides a util to compact keys in etcd.

Reason:
We want to keep the most recent 10 minutes of event history. That should be more than enough for slow watchers. The window is time-based rather than count-based, so it can tolerate event bursts too. We do not want to keep more, since the current storage API cannot yet take advantage of multi-version keys. We might keep a longer history in the future.
2016-04-21 05:19:54 -07:00
deads2k 6670b73b18 make storage enablement, serialization, and location orthogonal 2016-04-21 08:18:55 -04:00
Hongchao Deng 2bc022aad4 watcher test: print more info for debugging 2016-04-21 06:56:50 +08:00
Hongchao Deng 46214c60bb etcd3/store: support TTL in Create, Update 2016-04-19 07:35:59 +08:00
Hongchao Deng e18b4e67be etcd3/store: watcher implementation 2016-04-18 21:41:53 +08:00
k8s-merge-robot 5f3f06f0b1 Merge pull request #24022 from hongchaodeng/dep
Automatic merge from submit-queue

Bump up etcd dependency to fix data race

ref: https://github.com/kubernetes/kubernetes/pull/23694

What this PR does
- Bumping up the godep of etcd to fix data race in etcd watcher. Without this change, watcher PR builds will fail in race detection.
- Small changes to fix builds after upgrade
2016-04-17 12:01:32 -07:00
k8s-merge-robot 2bf52175f9 Merge pull request #23923 from hongchaodeng/exp
Automatic merge from submit-queue

Decouple etcd node.expiration logic from DeletionTimestamp

ref: https://github.com/kubernetes/kubernetes/issues/23902
2016-04-17 04:12:26 -07:00
k8s-merge-robot a275a045d1 Merge pull request #23914 from sky-uk/make-etcd-cache-size-configurable
Automatic merge from submit-queue

Make etcd cache size configurable

Instead of the prior 50K limit, allow users to specify a more sensible size for their cluster.

I'm not sure what a sensible default is here. I'm still experimenting on my own clusters. 50 gives me a 270MB max footprint. 50K caused my apiserver to run out of memory as it exceeded 2GB. I believe that number is far too large for most people's use cases.

There are some other fundamental issues that I'm not addressing here:
- Old etcd items are cached and potentially never removed (it stores using modifiedIndex, and doesn't remove the old object when it gets updated)
- Cache isn't LRU, so there's no guarantee the cache remains hot. This makes its performance difficult to predict. More of an issue with a smaller cache size.
- 1.2 etcd entries seem to have a larger memory footprint (I never had an issue in 1.1, even though this cache existed there). I suspect that's due to image lists on the node status.

This is provided as a fix for #23323
2016-04-17 00:06:31 -07:00
Hongchao Deng 9f43a110d9 file rename; refactor 2016-04-16 01:51:29 +08:00
Andy Goldstein 049e63d253 Honor starting resourceVersion in watch cache
Compare the requested resourceVersion to each event's resourceVersion to ensure events that occurred
in the past are not sent to the client.
2016-04-14 09:37:22 -04:00
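
The comparison described in the "Honor starting resourceVersion in watch cache" commit above can be illustrated with a small filter. This is a sketch under assumptions: the `event` type and `eventsAfter` are hypothetical, resource versions are modeled as plain integers, and treating an event at exactly the requested version as already seen is a choice of this sketch rather than something the commit message states.

```go
package main

import "fmt"

// event is a hypothetical, simplified watch event.
type event struct {
	ResourceVersion uint64
	Key             string
}

// eventsAfter returns only the events newer than the requested
// resourceVersion, so that events from the past are not sent to the client.
func eventsAfter(events []event, requested uint64) []event {
	var out []event
	for _, e := range events {
		if e.ResourceVersion > requested {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	cached := []event{
		{ResourceVersion: 5, Key: "pods/a"},
		{ResourceVersion: 7, Key: "pods/b"},
		{ResourceVersion: 9, Key: "pods/c"},
	}
	for _, e := range eventsAfter(cached, 7) {
		fmt.Println(e.ResourceVersion, e.Key)
	}
}
```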
Hongchao Deng b9745999c9 Decouple etcd node.expiration logic from DeletionTimestamp 2016-04-13 15:11:53 -07:00
Daniel Smith 4c539bf082 Merge pull request #23490 from wojtek-t/remove_set_from_storage_interface
Remove Set() from storage.Interface.
2016-04-13 14:22:05 -07:00
Jordan Liggitt ada60236f7 Make watch cache behave like uncached watch 2016-04-12 10:14:07 -04:00
James Ravn 5bb0595260 Make deserialization cache size configurable
Instead of the default 50K entries, allow users to specify more sensible
sizes for their cluster.
2016-04-12 13:42:27 +01:00
Hongchao Deng ab9ac70e56 etcd3 store: provide compactor util 2016-04-09 11:01:27 -07:00
Hongchao Deng 71b46f3f57 fix build 2016-04-07 19:22:28 -07:00
Wojciech Tyczynski 53f433f019 Remove Set() from storage.Interface. 2016-04-04 17:54:18 +02:00
k8s-merge-robot f5c93c8ddc Merge pull request #23472 from wojtek-t/fix_object_meta_for
Automatic merge from submit-queue

Switch from api.ObjectMetaFor to meta.Accessor in most places

Fix #23278

@smarterclayton @lavalamp
2016-04-02 02:33:40 -07:00
Wojciech Tyczynski 2699be2e7e Switch api.ObjectMetaFor to meta.Accessor 2016-03-31 17:52:31 +02:00
Hongchao Deng 00ddf0671d etcd (v3) store: implements KV methods of storage.Interface
This implements Get(), Create(), Delete(), GetToList(),
List(), GuaranteedUpdate().
2016-03-30 10:20:39 -07:00
Chao Xu 31b425b3a1 add delete precondition 2016-03-25 11:21:39 -07:00
k8s-merge-robot 4e4ad61260 Merge pull request #23366 from goltermann/vet
Auto commit by PR queue bot
2016-03-24 21:50:56 -07:00
k8s-merge-robot 2777cd7e75 Merge pull request #23295 from hongchaodeng/error
Auto commit by PR queue bot
2016-03-23 02:27:36 -07:00
k8s-merge-robot 4af38b52b9 Merge pull request #22736 from resouer/fix-util-dev
Auto commit by PR queue bot
2016-03-22 19:54:58 -07:00
goltermann 34d4eaea08 Fixing several (but not all) go vet errors. Most are around string formatting, or unreachable code. 2016-03-22 17:26:50 -07:00
Hongchao Deng 189ce6e397 storage: add custom storage error 2016-03-22 08:19:16 -07:00
k8s-merge-robot 2bb6f74bf9 Merge pull request #23099 from shawnps/patch-12
Auto commit by PR queue bot
2016-03-21 09:19:21 -07:00
harry 26dad2c428 Update generated docs 2016-03-21 15:36:24 +08:00
harry b6924a322a Refactor cache into util sub pkg 2016-03-21 14:50:57 +08:00
k8s-merge-robot 08c706a8ab Merge pull request #23194 from hongchaodeng/dep
Auto commit by PR queue bot
2016-03-19 06:35:17 -07:00
Hongchao Deng 0a1ff0bb0b fix EtcdTestServer 2016-03-18 23:39:48 -07:00
Russ Cox e4b369e1d7 storage: clean up timer in cacheWatcher.add
In the e2e benchmarks, this timer is a significant source of garbage
and stale timers. Because the timer is not stopped after its use
in the select, it stays in the timer heap until it eventually fires
(5 seconds later). Under load, a lot of 5-second timers can pile up
before any start going away. The timer heap being large makes timer
operations take longer; the operations are O(log N) but N is still big.

The way to fix this in current versions of Go is to stop the underlying
timer explicitly, which this CL does for this one case.

There are many other places in the code that use the same idiom,
but those do not show up on profiles of the e2e server.
I am investigating changes for Go 1.7's runtime that would make
the old code behave like this new code transparently, so I don't
think it's worth updating any uses of the idiom that are not in
hot spots found with profiling.

Measuring 'LIST nodes' latency in milliseconds during e2e test
shows the benefit of this change.

Using Go 1.4.2:

BEFORE  p50: 148±7   p90: 328±19  p99: 513±29  n: 10
AFTER   p50: 151±8   p90: 339±19  p99: 479±20  n: 9

Using Go 1.6.0:

BEFORE  p50: 141±9   p90: 383±32  p99: 604±44  n: 11
AFTER   p50: 140±14  p90: 360±31  p99: 483±39  n: 10
2016-03-18 15:58:34 -04:00
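
The idiom change described in the "storage: clean up timer in cacheWatcher.add" commit above can be sketched like this. The function names, the string event type, and `addTimeout` are hypothetical; the actual `cacheWatcher.add` is not reproduced here.

```go
package main

import (
	"fmt"
	"time"
)

// addTimeout is an illustrative value; the commit mentions 5-second timers.
const addTimeout = 20 * time.Millisecond

// addWithAfter shows the old idiom: the timer created by time.After is never
// stopped, so, as the commit describes, it lingers until it fires even when
// the send wins the select.
func addWithAfter(ch chan<- string, ev string, d time.Duration) bool {
	select {
	case ch <- ev:
		return true
	case <-time.After(d):
		return false
	}
}

// addWithStop shows the fix: create the timer explicitly and stop it as soon
// as the select returns, so it leaves the timer heap right away.
func addWithStop(ch chan<- string, ev string, d time.Duration) bool {
	t := time.NewTimer(d)
	defer t.Stop()
	select {
	case ch <- ev:
		return true
	case <-t.C:
		return false
	}
}

func main() {
	ch := make(chan string, 1)
	fmt.Println(addWithStop(ch, "first", addTimeout))  // the channel has room
	fmt.Println(addWithStop(ch, "second", addTimeout)) // the channel is full
	fmt.Println(addWithAfter(ch, "third", addTimeout)) // same result, old idiom
}
```

Both functions behave the same from the caller's point of view; the difference is only in how long the unused timer stays alive.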
Shawn Smith 0ea3e43f1c use Fatalf 2016-03-17 15:18:04 +09:00
deads2k ab03317d96 support CIDRs in NO_PROXY 2016-03-16 16:22:54 -04:00
Jordan Liggitt a1c2267f20 Decrease parallelism in deletecollection test, lengthen test etcd certs 2016-03-12 18:30:12 -05:00
AdoHe 7228b9b987 restore ability to run against secured etcd 2016-03-11 11:21:16 -05:00
Fabio Yeon d25449d58e Merge pull request #21310 from wojtek-t/require_versioner
Require versioner in etcdHelper to be non-null.
2016-02-26 15:44:59 -08:00
k8s-merge-robot 90d1276507 Merge pull request #21223 from hongchaodeng/fix
Auto commit by PR queue bot
2016-02-22 07:41:45 -08:00
Daniel Smith 3fb020b28d Fix a locking bug in the cacher. 2016-02-19 17:45:02 -08:00
k8s-merge-robot 17325ef6ef Merge pull request #20501 from piosz/hpa-ga
Auto commit by PR queue bot
2016-02-18 06:52:39 -08:00
Wojciech Tyczynski 20d704aa06 Fix cacher_test unit test 2016-02-18 10:25:04 +01:00
Wojciech Tyczynski 35a3394a0a Require versioner in etcdHelper to be non-null. 2016-02-16 16:22:43 +01:00
Daniel Smith 74400c33ae changes for cross-group moves 2016-02-15 21:39:00 +01:00
Hongchao Deng 0d89cfd6e5 remove unnecessary error check 2016-02-13 22:25:46 -08:00
k8s-merge-robot f43b849e54 Merge pull request #20770 from liggitt/ugorji-var-reset
Auto commit by PR queue bot
2016-02-13 14:59:38 -08:00
Jordan Liggitt dd5d98d80a regen codecs 2016-02-13 09:15:39 -05:00
Wojciech Tyczynski 2e97793840 Don't store no-op updates in etcd. 2016-02-12 09:23:28 +01:00
k8s-merge-robot b3367b90d6 Merge pull request #20892 from lavalamp/fix-us-log
Auto commit by PR queue bot
2016-02-10 06:43:42 -08:00
Daniel Smith 4e85d42f99 fix logging every microsecond when etcd goes down 2016-02-09 00:12:19 -08:00
Jordan Liggitt 545f6be573 Regenerate types.go 2016-02-08 17:19:15 -05:00
Jan Chaloupka 4389b3f0d6 Rewrite util.* -> wait.* wherever reasonable 2016-02-07 12:02:20 +01:00
Wojciech Tyczynski d3639aff73 Fix deadlock in watch cache 2016-02-05 08:31:55 +01:00
Wojciech Tyczynski f23034d5da Reset more metrics before scalability tests 2016-02-04 11:47:13 +01:00
Jeff Lowdermilk caa9433234 Merge pull request #20433 from lavalamp/fix-bad-rv
Add timeout, fix potential startup hang
2016-02-02 17:27:23 -08:00
k8s-merge-robot c5260c8c71 Merge pull request #20145 from mqliang/quorum-read
Auto commit by PR queue bot
2016-02-02 05:50:41 -08:00
Daniel Smith 26683fda29 add timeout to cacher 2016-02-01 15:36:15 -08:00
harry 1032067ff9 Replace runtime reference by pkg 2016-02-01 21:06:44 +08:00
mqliang b0e06c14e5 add a knob to enable quorum read 2016-01-30 20:32:12 +08:00
Chao Xu ebcff4b5e4 fix the namespaceScoped of cachers 2016-01-28 16:24:54 -08:00
Filip Grzadkowski 2200aacbad Add tracing information in etcd helper.
Ref #19036
2016-01-26 17:18:15 +01:00
Brendan Burns b8b074f41f Disable closing of the httptest Server due to https://github.com/golang/go/issues/12262 2016-01-25 15:31:48 -08:00
k8s-merge-robot d3b869ae14 Merge pull request #17922 from smarterclayton/split_codec
Auto commit by PR queue bot
2016-01-25 06:30:39 -08:00
Daniel Smith c0ffbd58db Remove TestWatchEtcdError
We decided to remove this test, as there's no way to get an upper bound
on its running time. Etcd restart behavior should be tested in
integration or e2e tests.
2016-01-22 16:43:44 -08:00
Clayton Coleman 4a6935b31f Remaining codec change refactors 2016-01-22 13:27:27 -05:00
Clayton Coleman 33085c0cf2 Update tests to handle codec changes 2016-01-22 13:27:26 -05:00
Mike Danese 558d0cc65d Merge pull request #19617 from hongchaodeng/watch_cache
small refactor on watch cache
2016-01-14 17:04:13 -08:00
Hongchao Deng 821a196373 small refactor on watch cache 2016-01-13 14:00:59 -08:00
Hongchao Deng c5cebf44ce typo fix in watch_cache.go 2016-01-13 11:23:34 -08:00
David Oppenheimer 8ac484793d Comment out calls to httptest.Server.Close() to work around
https://github.com/golang/go/issues/12262 . See #19254 for
more details. This change should be reverted when we upgrade
to Go 1.6.
2016-01-11 23:02:11 -08:00
Wojciech Tyczynski c8ad31161b Workaround races while closing etcd server. 2016-01-03 08:48:39 +01:00
Wojciech Tyczynski 60fc2bc09e Fix cacher_test flake 2015-12-31 07:53:41 +01:00
Wojciech Tyczynski 65696989b2 Extend logging for debugging 18928 2015-12-30 20:09:05 +01:00
Wojciech Tyczynski 9ada897057 Fix master_test flake 2015-12-30 13:49:43 +01:00
Wojciech Tyczynski 05b60a30cf Fix flakes in cacher_test 2015-12-28 15:28:07 +01:00
k8s-merge-robot 451f4bdbc2 Merge pull request #19130 from wojtek-t/fix_stop_cacher
Auto commit by PR queue bot
2015-12-28 03:45:34 -08:00
Wojciech Tyczynski ec70eb16f3 Graceful termination in Cacher 2015-12-28 10:54:21 +01:00
Wojciech Tyczynski 0b7dce9505 Fix a bug in etcd_watcher_test 2015-12-28 10:43:33 +01:00
Mike Danese bee582293c Merge pull request #19010 from wojtek-t/debug_watcher_test
Fix etcdWatcher test
2015-12-22 14:07:41 -08:00
Wojciech Tyczynski 9f35eebb81 Fix timeout in etcdwatcher tests 2015-12-22 16:56:54 +01:00
Wojciech Tyczynski 8cd50dd005 Fix etcdWatcher test 2015-12-22 15:40:52 +01:00
Wojciech Tyczynski 41c7835039 Fix race in watch tests 2015-12-22 13:21:02 +01:00
Wojciech Tyczynski 2b8854ba28 Close watchers in tests 2015-12-22 09:54:05 +01:00
Wojciech Tyczynski 6297232112 Fix race in EtcdWatcher 2015-12-21 19:24:37 +01:00
Wojciech Tyczynski 1f24297e7a Merge pull request #18921 from timothysc/etcd_datarace
Fix data race on cancel variable in etcd code
2015-12-19 08:09:45 +01:00
Timothy St. Clair e3311aa93a Fix data race on cancel variable in etcd code 2015-12-18 18:37:45 -06:00
Timothy St. Clair c13e4c8c2f re-disable TestWatchEtcdError due to flake 2015-12-18 17:57:46 -06:00
Wojciech Tyczynski 96b5ca0cc5 Fix test issue in Go 1.5 2015-12-18 14:17:52 +01:00
Wojciech Tyczynski d1e039b646 Merge pull request #18635 from timothysc/etcd_client_post_cleanup
Update to use latest etcd client library
2015-12-18 14:14:21 +01:00
Eric Tune 1965fe1824 Rerun hack/update-codecgen.sh 2015-12-17 13:51:33 -08:00
Eric Tune 1752cf22d4 Merge pull request #17940 from soltysh/job_deadline
Added ActiveDeadlineSeconds to jobs
2015-12-17 13:11:13 -08:00
Maciej Szulik 327c104460 Added ActiveDeadlineSeconds to jobs, allowing failing a job after
exceeding allowed time.
2015-12-17 15:26:42 +01:00
deads2k 9fda7f1812 update StatusDetails to handle Groups 2015-12-17 09:14:12 -05:00
Brendan Burns 2efcccf981 Add a server side export facility 2015-12-16 15:01:13 -08:00
Timothy St. Clair c505a5d49d Updating kubernetes proper to use latest etcd client library 2015-12-16 15:56:35 -06:00
k8s-merge-robot e309583ff1 Merge pull request #18473 from smarterclayton/change_runtime_object
Auto commit by PR queue bot
2015-12-16 04:24:22 -08:00
Clayton Coleman 8f203a28f1 Change runtime.Object signature 2015-12-15 13:36:25 -05:00
deads2k 6e33403abf update CodecFor for GroupVersion 2015-12-15 10:56:00 -05:00
k8s-merge-robot 1f0e46abb8 Merge pull request #16237 from ZJU-SEL/fix-util
Auto commit by PR queue bot
2015-12-14 18:41:14 -08:00
Wojciech Tyczynski 960808bf08 Switch to versioned ListOptions in client. 2015-12-14 14:26:09 +01:00
Harry Zhang 8fe92c69d2 Update highwatermark 2015-12-14 16:47:47 +08:00
harry zhang 5405a5d98d Move atomic_value into folder
Change pkg to atomic
2015-12-14 05:50:29 +00:00
Jeff Lowdermilk 9c49cdaa6e Merge pull request #18276 from thockin/airplane_validation_pt6
Validation cleanup parts 5 & 6 together
2015-12-11 13:34:37 -08:00
k8s-merge-robot d3243b8778 Merge pull request #18383 from timothysc/tools_removal
Auto commit by PR queue bot
2015-12-11 07:17:14 -08:00
Tim Hockin 7fb8f60735 Shorten names for better reading 2015-12-10 11:48:19 -08:00
Tim Hockin 87a35047dd Move FieldPath and errors to a sub-package
This makes the naming and reading a lot simpler.
2015-12-10 11:48:16 -08:00
Timothy St. Clair a428246960 Abstract the error handling for the storage layer to eliminate the
direct etcd dependency.
2015-12-10 08:06:19 -06:00
Timothy St. Clair 413d8d18fe Further storage isolation and removal of the tools interface. 2015-12-09 11:04:14 -06:00
Wojciech Tyczynski 0cefb43707 Enable listing from memory 2015-12-09 16:24:14 +01:00
Wojciech Tyczynski 0369805308 Merge pull request #18207 from wojtek-t/string_resource_version
Change resourceVersion to string in storage.Interface
2015-12-09 15:00:54 +01:00
Wojciech Tyczynski a915b8b29a Merge pull request #18080 from wojtek-t/list_options_in_listwatch
Pass ListOptions to List in ListWatch.
2015-12-09 14:27:51 +01:00
deads2k 2ee3dfe415 update testapi to eliminate redundant fields 2015-12-07 15:54:26 -05:00
Wojciech Tyczynski b0fcb5adef Pass ListOptions to List in ListWatch. 2015-12-07 11:53:53 +01:00
Wojciech Tyczynski 793da62c7f Change resourceVersion to string in storage.Interface 2015-12-07 09:22:59 +01:00
Tim Hockin e6df0b1a24 Convert validation to use FieldPath
Before this change we have a mish-mash of ways to pass field names around for
error generation.  Sometimes string fieldnames, sometimes .Prefix(), sometimes
neither, often wrong names or not indexed when it should be.

Instead of that mess, this is part one of a couple of commits that will make it
more strongly typed and hopefully encourage correct behavior.  At least you
will have to think about field names, which is better than nothing.

It turned out to be really hard to do this incrementally.
2015-12-03 08:19:44 -08:00
k8s-merge-robot 55f5e48047 Merge pull request #16628 from caesarxuchao/change-error-type
Auto commit by PR queue bot
2015-11-30 17:21:52 -08:00
k8s-merge-robot 158fecd474 Merge pull request #17776 from wojtek-t/move_etcd_errors_from_tools
Auto commit by PR queue bot
2015-11-27 11:57:18 -08:00
k8s-merge-robot fc694ea787 Merge pull request #16725 from wojtek-t/update_ugorji
Auto commit by PR queue bot
2015-11-25 22:14:20 -08:00
Chao Xu a4700707b3 change the "too old resource version" error from InternalError to 410 Gone. 2015-11-25 10:27:03 -08:00
deads2k ed95a6d77f update scheme to use GroupVersion 2015-11-25 12:15:48 -05:00
k8s-merge-robot 3bd23b185b Merge pull request #17730 from wojtek-t/use_unversioned_list_options_in_client
Auto commit by PR queue bot
2015-11-25 09:10:19 -08:00
Wojciech Tyczynski 65c381bfdb Hide internal etcd errors. 2015-11-25 14:47:08 +01:00
k8s-merge-robot e95e3dec42 Merge pull request #17414 from timstclair/apiserver
Auto commit by PR queue bot
2015-11-25 05:28:07 -08:00
Wojciech Tyczynski 58062bc347 Regenerate files 2015-11-25 12:34:05 +01:00
k8s-merge-robot 431c67710b Merge pull request #17247 from thockin/airplane_validation_pt3
Auto commit by PR queue bot
2015-11-24 18:37:09 -08:00
Wojciech Tyczynski b6ef62af24 Use unversioned.ListOptions in clients. 2015-11-24 16:52:09 +01:00
k8s-merge-robot 06ef4b0a83 Merge pull request #17156 from feihujiang/moveListFunctionsFromRuntimeToMetaPackage
Auto commit by PR queue bot
2015-11-23 14:13:25 -08:00
Tim St. Clair 20ead45af9 Move etcd_util.go to separate package 2015-11-23 11:32:50 -08:00
Tim Hockin ceee678b29 Rename validation 'New' funcs 2015-11-23 10:01:43 -08:00
Tim Hockin 48b49a5cae s/ValidationErrorList/ErrorList/ 2015-11-22 20:13:20 -08:00
Tim Hockin 0ff66da346 Move fielderrors into validation 2015-11-22 20:12:20 -08:00
feihujiang ad79fa6e84 Move list functions from runtime to meta package 2015-11-20 09:20:55 +08:00
Timothy St. Clair 02851dd1b7 Removal of fakeClient and shift to storage.Interface for
all registry tests.
2015-11-19 10:34:30 -06:00
Wojciech Tyczynski a5a8717539 Pass versioner to cacher. 2015-11-13 08:35:28 +01:00
Wojciech Tyczynski 3df5d1dbc3 Move storage-related dirs under pkg/storage. 2015-11-12 19:49:32 +01:00
Daniel Smith 45a1ec73bb Lengthen delay 2015-11-06 13:03:58 -08:00
Wojciech Tyczynski b6a775ca50 Terminate watcher if it is full 2015-11-06 13:40:21 +01:00
k8s-merge-robot 6b7115067d Merge pull request #16807 from smarterclayton/server_backpressure_on_etcd_down
Auto commit by PR queue bot
2015-11-05 21:19:30 -08:00
Timothy St. Clair f6f2f41ab3 Removal of fakeClient from registry/generic/etcd/etcd_test.go in lieu of
NewEtcdTestClientServer
2015-11-05 08:28:58 -06:00
Clayton Coleman 3da15535b6 Provide backpressure to clients when etcd goes down
When etcd is down today we don't specifically handle the error involved,
which means clients get a generic 500 error. This commit adds a formal
error type internally for both WatchExpired and EtcdUnreachable, and
then converts them to api/errors before returning to the client. It also
upgrades the client to retry on any 429 or 5xx error that has a
Retry-After header, instead of just 429.

In combination, this allows the apiserver to exert backpressure on
controllers that are hotlooping.  Picked 2 seconds by default, but we
could potentially ramp that up even further in a future iteration.
2015-11-04 16:05:12 -05:00
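
The client-side rule from the backpressure commit above (retry on any 429 or 5xx response that carries a Retry-After header) might be expressed as in the sketch below. `retryAfter` and `fakeResponse` are hypothetical helpers, not the Kubernetes client code, and the sketch only handles the delay-in-seconds form of the header.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"time"
)

// retryAfter reports how long to wait before retrying, and whether the
// response qualifies for a retry at all: status 429 or 5xx plus a
// Retry-After header holding a non-negative number of seconds.
func retryAfter(resp *http.Response) (time.Duration, bool) {
	code := resp.StatusCode
	retriable := code == http.StatusTooManyRequests || (code >= 500 && code <= 599)
	if !retriable {
		return 0, false
	}
	secs, err := strconv.Atoi(resp.Header.Get("Retry-After"))
	if err != nil || secs < 0 {
		return 0, false
	}
	return time.Duration(secs) * time.Second, true
}

// fakeResponse builds a minimal response for demonstration purposes.
func fakeResponse(code int, retryAfterSeconds string) *http.Response {
	h := http.Header{}
	if retryAfterSeconds != "" {
		h.Set("Retry-After", retryAfterSeconds)
	}
	return &http.Response{StatusCode: code, Header: h}
}

func main() {
	for _, resp := range []*http.Response{
		fakeResponse(503, "2"),
		fakeResponse(500, ""),
		fakeResponse(404, "2"),
	} {
		delay, ok := retryAfter(resp)
		fmt.Println(resp.StatusCode, ok, delay)
	}
}
```

Requiring the header even for 5xx responses keeps the client from retrying errors the server did not mark as transient, which matches the commit's wording.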
k8s-merge-robot ac63290ff5 Merge pull request #16781 from timothysc/fix-testing-deps
Auto commit by PR queue bot
2015-11-04 01:58:34 -08:00