Commit Graph

264 Commits (30eb1aa7c5a20245a29168d23df275a91230ffef)

Author SHA1 Message Date
Zihong Zheng 6004452bed Auto-updated BUILD files 2018-02-27 11:18:11 -08:00
Zihong Zheng f6eed81f21 [kube-proxy] Mass service/endpoint info functions rename and comments 2018-02-27 11:14:02 -08:00
Zihong Zheng 95cde4fb98 [kube-proxy] Harden change tracker and proxiers for unmatched IP versions 2018-02-27 11:14:02 -08:00
Zihong Zheng dfbec1a63a [kube-proxy] Move ipv6 related funcs to utils pkg 2018-02-27 11:12:45 -08:00
Zihong Zheng b485f7b5b4 [kube-proxy] Move Service/EndpointInfo common codes to change tracker 2018-02-27 11:05:59 -08:00
Kubernetes Submit Queue 42378eab40
Merge pull request #58052 from m1093782566/nodeip-config
Automatic merge from submit-queue (batch tested with PRs 60430, 60115, 58052, 60355, 60116). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Make nodeport ip configurable

**What this PR does / why we need it**:

By default, kube-proxy accepts everything from NodePort without any filter. It can be a problem for nodes which has both public and private NICs, and people only want to provide a service in private network and avoid exposing any internal service on the public IPs.

This PR makes nodeport ip configurable.

**Which issue(s) this PR fixes**:
Closes: #21070

**Special notes for your reviewer**:

Design proposal see: https://github.com/kubernetes/community/pull/1547

Issue in feature repo: https://github.com/kubernetes/features/issues/539

**Release note**:

```release-note
Make NodePort IP addresses configurable
```
2018-02-27 09:38:44 -08:00
Kubernetes Submit Queue 05425f0826
Merge pull request #60256 from danwinship/review-iptables-stuff
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

add me to iptables/kube-proxy reviewers

kube-proxy needs reviewers!
2018-02-26 07:50:58 -08:00
m1093782566 9bb4807e25 update bazel 2018-02-26 23:48:48 +08:00
m1093782566 ddfa04e8f4 iptables part implementation 2018-02-26 23:48:47 +08:00
Kubernetes Submit Queue c11ae9d21e
Merge pull request #60306 from danwinship/proxier-connstate-new
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Only run connection-rejecting rules on new connections

Kube-proxy has two iptables chains full of rules to reject incoming connections to services that don't have any endpoints. Currently these rules get tested against all incoming packets, but that's unnecessary; if a connection to a given service has already been established, then we can't have been rejecting connections to that service. By only checking the first packet in each new connection, we can get rid of a lot of unnecessary checks on incoming traffic.

Fixes #56842

**Release note**:
```release-note
Additional changes to iptables kube-proxy backend to improve performance on clusters with very large numbers of services.
```
2018-02-24 16:19:56 -08:00
Kubernetes Submit Queue c1a73ea685
Merge pull request #59286 from prameshj/udp-conntrack
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Delete stale UDP conntrack entries that use hostPort

**What this PR does / why we need it**:
This PR introduces a change to delete stale conntrack entries for UDP connections, specifically for udp connections that use hostPort. When the pod listening on that udp port get updated/restarted(and gets a new ip address), these entries need to be flushed so that ongoing udp connections can recover once the pod is back and the new iptables rules have been installed. 
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #59033

**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```
2018-02-23 19:54:08 -08:00
Kubernetes Submit Queue e6c2a5de10
Merge pull request #57461 from danwinship/proxier-no-dummy-nat-rules
Automatic merge from submit-queue (batch tested with PRs 55637, 57461, 60268, 60290, 60210). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Don't create no-op iptables rules for services with no endpoints

Currently for all services we create `-t nat -A KUBE-SERVICES` rules that match the destination IPs (ClusterIP, ExternalIP, NodePort IPs, etc) and then jump to the appropriate `KUBE-SVC-XXXXXX` chain. But if the service has no endpoints then the `KUBE-SVC-XXXXXX` chain will be empty and so nothing happens except that we wasted time (a) forcing iptables-restore to parse the match rules, and (b) forcing the kernel to test matches that aren't going to have any effect.

This PR gets rid of the match rules in this case. Which is to say, it changes things so that every incoming service packet is matched *either* by nat rules to rewrite it *or* by filter rules to ICMP reject it, but not both. (Actually, that's not quite true: there are no filter rules to reject Ingress-addressed packets, and I *think* that's a bug?)

I also got rid of some comments that seemed redundant.

The patch is mostly reindentation, so best viewed with `diff -w`.

Partial fix for #56842 / Related to #56164 (which it conflicts with but I'll fix that after one or the other merges).

**Release note**:
```release-note
Removed some redundant rules created by the iptables proxier, to improve performance on systems with very many services.
```
2018-02-23 09:49:38 -08:00
Dan Winship 225941679e Only run connection-rejecting rules on new connections 2018-02-23 08:50:58 -05:00
Pavithra Ramesh 098a4467fe Remove conntrack entry on udp rule add.
Moved conntrack util outside of proxy pkg
Added warning message if conntrack binary is not found
Addressed review comments.
ran gofmt
2018-02-22 23:34:42 -08:00
Kubernetes Submit Queue f0ca996274
Merge pull request #56164 from danwinship/proxier-chain-split
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Split KUBE-SERVICES chain to re-shrink the INPUT chain

**What this PR does / why we need it**:
#43972 added an iptables rule "`-A INPUT -j KUBE-SERVICES`" to make NodePort ICMP rejection work. (Previously the KUBE-SERVICES chain was only run from OUTPUT, not INPUT.) #44547 extended that patch for ExternalIP rejection as well.

However, the KUBE-SERVICES chain may potentially have a very large number of ICMP reject rules for plain ClusterIP services (the ones that get run from OUTPUT), and it seems that for some reason the kernel is much more sensitive to the length of the INPUT chain than it is to the length of the OUTPUT chain. So a node that worked fine with kube 1.6 (when KUBE-SERVICES was only run from OUTPUT) might fall over with kube 1.7 (with KUBE-SERVICES being run from both INPUT and OUTPUT).

(Specifically, a node with about 5000 ClusterIP reject rules that ran fine with OpenShift 3.6 [kube 1.6] slowed almost to a complete halt with OpenShift 3.7 [kube 1.7].)

This PR fixes things by splitting out the "new" part of KUBE-SERVICES (NodePort and ExternalIP reject rules) into a separate KUBE-EXTERNAL-SERVICES chain run from INPUT, and moves KUBE-SERVICES back to being only run from OUTPUT. (So, yes, this assumes that you don't have 5000 NodePort/ExternalIP services, but, if you do, there's not much we can do, since those rules *have* to be run on the INPUT side.)

Oh, and I left in the code to clean up the "`-A INPUT -j KUBE-SERVICES`" rule even though we don't generate it any more, so it gets fixed on upgrade.

**Release note**:
```release-note
Reorganized iptables rules to fix a performance regression on clusters with thousands of services.
```

@kubernetes/sig-network-bugs @kubernetes/rh-networking
2018-02-22 18:52:53 -08:00
Dan Winship fc03cfe7a8 add me to iptables/kube-proxy reviewers 2018-02-22 17:36:57 -05:00
Jeff Grafton ef56a8d6bb Autogenerated: hack/update-bazel.sh 2018-02-16 13:43:01 -08:00
Dan Winship 07ead7d8e2 Don't create no-op iptables rules for services with no endpoints 2018-02-13 07:52:47 -05:00
Kubernetes Submit Queue 9438e14d39
Merge pull request #52528 from m1093782566/refactor-proxy
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Refactor kube-proxy service/endpoints update so that can be consumed among different proxiers

**What this PR does / why we need it**:

There are huge duplication among different proxiers.  For example, the service/endpoints list/watch part in iptables, ipvs and windows kernel mode(to be get in soon).

I think the more places this is replicated the harder it becomes to keep correct. We may need to refactor it and let different proxiers consume the same code.

**Which issue this PR fixes**: 

fixes #52464

**Special notes for your reviewer**:

* This refactor reduces **500** Lines in iptables proxy, so it will reduce **500*N**(number of proxiers) lines in total. People no need to care the service/endpoints update logic any more and can be more focus on proxy logic.

* I would like to do the following things in follow-ups:

1. rsync it to ipvs proxier

2. rsync it to winkernel proxier

**Release note**:

```release-note
Refactor kube-proxy service/endpoints update so that can be consumed among different proxiers
```
2018-02-12 23:29:50 -08:00
m1093782566 b7dbaab96a update bazel BUILD 2018-02-09 17:26:22 +08:00
m1093782566 f3512cbbb9 iptables proxier part changes 2018-02-09 17:20:51 +08:00
Dan Winship 780d5954e0 Split out a KUBE-EXTERNAL-SERVICES chain so we don't have to run KUBE-SERVICES from INPUT 2018-02-07 10:20:52 -05:00
Kubernetes Submit Queue 283d35a481
Merge pull request #57336 from danwinship/proxier-simplification
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Abstract some duplicated code in the iptables proxier

Reorganizes the iptables proxier code so we only have the list of "-A FOO -j KUBE-BAR" rules in one place rather than duplicating the same list in multiple places. Split out from #56164 for ease of review/merging.

**Release note**:
```release-note
NONE
```
2018-02-06 15:54:07 -08:00
Kubernetes Submit Queue 8fb3e3f5b0
Merge pull request #57942 from m1093782566/localhost-masq
Automatic merge from submit-queue (batch tested with PRs 58300, 58530, 57942, 58543). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Fix nodeport localhost martian source error

**What this PR does / why we need it**:

kube-proxy NodePort access via localhost, with externalTrafficPolicy=Local will trigger martian source error.

This PR fixes nodeport localhost martian source error.

**Which issue(s) this PR fixes**:
Fixes #57922

**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```
2018-01-19 20:00:36 -08:00
Kubernetes Submit Queue 3256546a79
Merge pull request #56948 from MrHohn/esipp-remove-feature-gate
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Remove ExternalTrafficLocalOnly from kube_feature gate

*What this PR does / why we need it**:
This PR is for v1.10.

External Source IP Preservation (ESIPP) had been promoted to GA since 1.7. Following the proposal on https://github.com/kubernetes/kubernetes/issues/46404#issuecomment-303939180, we should be able to remove it from feature gate now.

Added release note to announce this.

Also ref the previous attempt: https://github.com/kubernetes/kubernetes/pull/45857.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #56645

**Special notes for your reviewer**:

**Release note**:

```release-note
"ExternalTrafficLocalOnly" has been removed from feature gate. It has been a GA feature since v1.7.
```
2018-01-19 00:35:01 -08:00
m1093782566 b015f1f567 add ut for localhost nodeport 2018-01-15 11:05:21 +08:00
m1093782566 60bde9fbe2 fix nodeport localhost martian source error 2018-01-15 11:05:18 +08:00
Jeff Grafton efee0704c6 Autogenerate BUILD files 2017-12-23 13:12:11 -08:00
Dan Winship 25e5c40acb Abstract some duplicated code in the iptables proxier 2017-12-18 10:18:54 -05:00
Zihong Zheng 9ab98d9f69 Remove ExternalTrafficLocalOnly from kube_feature gate 2017-12-07 21:25:11 -08:00
Kubernetes Submit Queue 02ca5cac01
Merge pull request #53555 from leblancd/v6_del_endpoint_proxier
Automatic merge from submit-queue (batch tested with PRs 55988, 53555, 55858). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Add IPv6 and negative UT test cases for proxier's deleteEndpointConnections

This change adds IPv6 and negative UT test cases for the proxier's deleteEndpointConnections.

Changes include:
- Add IPv6 UT test cases to TestDeleteEndpointConnections.
- Add negative UT test case to TestDeleteEndpointConnections for
  handling case where no connections need clearing (benign error).
- Add negative UT test case to test unexpected error.
- Reorganize UT in TestDeleteEndpointConnections so that the fake
  command executor's command and scripted responses are generated on
  the fly based on the test case table (rather than using a fixed
  set of commands/responses that will need to be updated every time
  test cases are added/deleted).
- Create the proxier service map in real time, based on the test case
  table (rather than using a fixed service map that will need to be updated
  every time test cases are added/deleted).

fixes #53554



**What this PR does / why we need it**:
This change adds IPv6 and negative UT test cases for the proxier's
deleteEndpointConnections.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #53554

**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```
2017-11-18 20:31:23 -08:00
Kubernetes Submit Queue 5e178936a1
Merge pull request #53780 from m1093782566/validate-ipvs
Automatic merge from submit-queue (batch tested with PRs 53780, 55663, 55321, 52421, 55659). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Validate kube-proxy options

**What this PR does / why we need it**:

Validate ipvs proxy options

**Which issue this PR fixes** : fixes #53852

**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```
2017-11-15 09:30:24 -08:00
Kubernetes Submit Queue 2f622b2a28
Merge pull request #52569 from tmjd/add-proxy-forward-rules
Automatic merge from submit-queue (batch tested with PRs 55009, 55532, 55601, 52569, 55533). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Kube-proxy adds forward rules to ensure NodePorts work

**What this PR does / why we need it**:
Updates kube-proxy to set up proper forwarding so that NodePorts work with docker 1.13 without depending on iptables FORWARD being changed manually/externally.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #39823

**Special notes for your reviewer**:
@thockin I used option number 2 that I mentioned in the #39823 issue, please let me know what you think about this change.  If you are happy with the change then I can try to add tests but may need a little direction about what and where to add them.

**Release note**:

```release-note
Add iptables rules to allow Pod traffic even when default iptables policy is to reject.
```
2017-11-14 00:09:57 -08:00
Kubernetes Submit Queue cae7240cf9
Merge pull request #55601 from m1093782566/getlocalips
Automatic merge from submit-queue (batch tested with PRs 55009, 55532, 55601, 52569, 55533). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Fix ipvs/proxy getLocalIPs inconsistency with iptables/proxy

**What this PR does / why we need it**:

* Fix ipvs/proxy `getLocalIPs()` inconsistency with iptables/proxy

* validate the ip address before pkg/proxy/util IPPart() return ip string.

**Which issue(s) this PR fixes** :
Fixes #55612

**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```
2017-11-14 00:09:52 -08:00
m1093782566 42832e7666 fix ipvs proxier getLocalIPs() error 2017-11-13 17:55:53 +08:00
Kubernetes Submit Queue d6cabaf706
Merge pull request #55568 from m1093782566/unsortlist
Automatic merge from submit-queue (batch tested with PRs 53580, 55568). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Replace sets.List() with sets.UnsortedList() in pkg/proxy

**What this PR does / why we need it**:

Replace sets.List() with sets.UnsortedList() in pkg/proxy - sets.List() will sort the result array, we don't need sorted array in pkg/proxy. Using sets.UnsortedList() can reduce the unnecessary overhead spending.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #

**Special notes for your reviewer**:

@wojtek-t wdyt ^_^

**Release note**:

```release-note
NONE
```

/sig network
2017-11-12 21:07:37 -08:00
m1093782566 83ada5c7bf replace sets.List() with sets.UnsortedList() 2017-11-13 10:20:54 +08:00
Zihong Zheng f7ed9cf09a [kube-proxy] Fix session affinity with local endpoints traffic 2017-11-10 18:42:07 -08:00
Dr. Stefan Schimanski bec617f3cc Update generated files 2017-11-09 12:14:08 +01:00
Dr. Stefan Schimanski 012b085ac8 pkg/apis/core: mechanical import fixes in dependencies 2017-11-09 12:14:08 +01:00
m1093782566 c7071ed09a try ipset in ipvs proxy mode 2017-11-07 17:34:27 +08:00
m1093782566 28000f925f fix IPV6 judgement bug and add UTs 2017-10-31 10:02:07 +08:00
Erik Stidham 535634f547 Review updates 2017-10-30 13:44:43 -05:00
Phil Cameron 965cf128b6 Remove iptables log on restore failure
Don't log the set of rules at v2 in kube-proxy on error.
The rules are displayed at v5 before the restore is attempted.

In a large cluster the report can generate up to 100000 lines.
A partial report is only helpful if the problem is displayed
in the partial report.
2017-10-27 09:14:35 -04:00
m1093782566 dab9b84b67 add proxy metrics in app level 2017-10-16 21:10:51 +08:00
Jeff Grafton aee5f457db update BUILD files 2017-10-15 18:18:13 -07:00
Dane LeBlanc 799341f2dc Add IPv6 and negative UT test cases for proxier's deleteEndpointConnections
This change adds IPv6 and negative UT test cases for the proxier's
deleteEndpointConnections.

Changes include:
- Add IPv6 UT test cases to TestDeleteEndpointConnections.
- Add negative UT test case to TestDeleteEndpointConnections for
  handling case where no connections need clearing (benign error).
- Add negative UT test case to test unexpected error.
- Reorganize UT in TestDeleteEndpointConnections so that the fake
  command executor's command and scripted responses are generated on
  the fly based on the test case table (rather than using a fixed
  set of commands/responses that will need to be updated every time
  test cases are added/deleted).
- Create the proxier service map in real time, based on the test case
  table (rather than using a fixed service map that will need to be updated
  every time test cases are added/deleted).

fixes #53554
2017-10-12 20:07:19 -04:00
m1093782566 d96409178b consume endpoints IPPart function in util 2017-10-11 09:51:58 +08:00
Kubernetes Submit Queue a0c93de03d Merge pull request #52028 from leblancd/v6_conntrack
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Add required family flag for conntrack IPv6 operation

This change causes kube-proxy to supply the required "-f ipv6"
family flag whenever the conntrack utility is executed and the
associated service is using IPv6.

This change is required for IPv6-only operation.

Note that unit test coverage for the 2-line changes in
pkg/proxy/iptables/proxier.go and /pkg/proxy/ipvs/proxier.go will need
to be added after support for IPv6 service addresses is added to these
files. For pkg/proxy/iptables/proxier.go, this coverage will be added
either with PR #48551.

fixes #52027



**What this PR does / why we need it**:
Kube-proxy is currently not supplying the required "-f ipv6" family flag whenever it
calls the conntrack utility and the associated service is using an IPv6 service IP address.
This means that for IPv6-only operation, conntrack is not properly cleaning up
stale UDP connections, and this may be effecting ip6tables operation.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes # 52027

**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```
2017-10-04 17:11:36 -07:00
Dane LeBlanc 5fbc9e45cc Add IPv6 support to iptables proxier
The following changes are proposed for the iptables proxier:

* There are three places where a string specifying IP:port is parsed
  using something like this:

      if index := strings.Index(e.endpoint, ":"); index != -1 {

  This will fail for IPv6 since V6 addresses contain colons. Also,
  the V6 address is expected to be surrounded by square brackets
  (i.e. []:). Fix this by replacing call to Index with
  call to LastIndex() and stripping out square brackets.
* The String() method for the localPort struct should put square brackets
  around IPv6 addresses.
* The logging in the merge() method for proxyServiceMap should put brackets
  around IPv6 addresses.
* There are several places where filterRules destination is hardcoded to
  /32. This should be a /128 for IPv6 case.
* Add IPv6 unit test cases

fixes #48550
2017-09-16 09:16:12 -04:00