Instead of copying the map, as OnServicesUpdate() used to do (behavior that
was carried over into buildServiceMap() to preserve semantics while creating
test cases), start with a new empty map and do deletion checking later.
The API docs say:
// ServiceTypeExternalName means a service consists of only a reference to
// an external name that kubedns or equivalent will return as a CNAME
// record, with no exposing or proxying of any pods involved.
which implies that ExternalName services should be ignored for proxy
purposes.
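For illustration, a minimal sketch of the rebuild-then-diff pattern this describes. The types and helper names here (serviceName, serviceSpec, serviceInfo) are hypothetical stand-ins, not the PR's actual code:
```go
package proxy

type serviceName string

type serviceSpec struct {
	Type      string // e.g. "ClusterIP", "ExternalName"
	ClusterIP string
}

type serviceInfo struct {
	clusterIP string
}

// buildServiceMap starts from a fresh empty map rather than mutating a
// copy of the old one; deletions are detected by diffing afterwards.
func buildServiceMap(specs map[serviceName]serviceSpec, oldMap map[serviceName]*serviceInfo) (newMap map[serviceName]*serviceInfo, deleted []serviceName) {
	newMap = make(map[serviceName]*serviceInfo, len(specs))
	for name, spec := range specs {
		// ExternalName services resolve to a CNAME in DNS; nothing is
		// exposed or proxied, so the proxy ignores them.
		if spec.Type == "ExternalName" {
			continue
		}
		newMap[name] = &serviceInfo{clusterIP: spec.ClusterIP}
	}
	// Deletion checking happens after the rebuild: anything present in
	// the old map but missing from the new one was removed.
	for name := range oldMap {
		if _, ok := newMap[name]; !ok {
			deleted = append(deleted, name)
		}
	}
	return newMap, deleted
}
```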
Automatic merge from submit-queue (batch tested with PRs 35884, 37305, 37369, 37429, 35679)
fix misleading warning message regarding kube-proxy nodeIP initializa…
The current warning message implies that the operator should restart kube-proxy with some flag related to node IP which can be very misleading.
Automatic merge from submit-queue
Cover port_allocator_test with more conditions
The test cases of port_allocator_test should cover more conditions, such as `rangeAllocator.used.Bit`.
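As a self-contained illustration of the bookkeeping those extra cases exercise: the allocator tracks used ports as bits in a big.Int (the `rangeAllocator.used.Bit` mentioned above), and Bit reports exactly what SetBit recorded. The offsets below are arbitrary examples:
```go
package main

import (
	"fmt"
	"math/big"
)

func main() {
	var used big.Int
	used.SetBit(&used, 3, 1) // mark offset 3 (e.g. basePort+3) as allocated
	fmt.Println(used.Bit(3)) // 1: allocated
	fmt.Println(used.Bit(4)) // 0: free
	used.SetBit(&used, 3, 0) // release it again
	fmt.Println(used.Bit(3)) // 0
}
```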
Automatic merge from submit-queue
Bug fix. Incoming UDP packets do not reach newly deployed services
**What this PR does / why we need it**:
Incoming UDP packets do not reach newly deployed services when an old connection's state in conntrack is not cleared. When a packet arrives, it will not go through the NAT table again, because it is not "the first" packet. This PR fixes the issue.
**Which issue this PR fixes**
Fixes #31983
xref https://github.com/docker/docker/issues/8795
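A hedged sketch of the remediation described above, shelling out to the conntrack(8) tool to delete stale UDP entries so the next packet is treated as "the first" and traverses the NAT table again. The flags are real conntrack options; the wrapper function is illustrative, not the PR's exact code:
```go
package proxy

import (
	"fmt"
	"os/exec"
)

// clearUDPConntrackForIP deletes conntrack entries whose original
// destination is the given service IP, restricted to UDP flows.
func clearUDPConntrackForIP(ip string) error {
	// conntrack -D deletes matching entries; --orig-dst filters by the
	// original destination address, -p udp restricts to UDP.
	out, err := exec.Command("conntrack", "-D", "--orig-dst", ip, "-p", "udp").CombinedOutput()
	if err != nil {
		return fmt.Errorf("error deleting conntrack entries for %s: %v (%s)", ip, err, out)
	}
	return nil
}
```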
Automatic merge from submit-queue
Curating Owners: pkg/proxy
cc @thockin
In an effort to expand the existing pool of reviewers and establish a
two-tiered review process (first someone lgtms and then someone
experienced in the project approves), we are adding new reviewers to
existing owners files.
If You Care About the Process:
------------------------------
We did this by algorithmically figuring out who’s contributed code to
the project and in what directories. Unfortunately, that doesn’t work
well: people who have made mechanical code changes (e.g. changing the
copyright header across all directories) end up as reviewers in lots of
places.
Instead of using pure commit data, we generated an excessively large
list of reviewers and pruned based on all time commit data, recent
commit data and review data (number of PRs commented on).
At this point we have a decent list of reviewers, but it needs one last
pass for fine tuning.
Also, see https://github.com/kubernetes/contrib/issues/1389.
TLDR:
-----
As an owner of a sig/directory and a leader of the project, here’s what
we need from you:
1. Use PR https://github.com/kubernetes/kubernetes/pull/35715 as an example.
2. The pull-request is made editable; please edit the `OWNERS` file to
remove the names of people who shouldn't be reviewing code in the
future from the **reviewers** section. You probably do NOT need to modify
the **approvers** section. Names are sorted by relevance, using some
secret statistics.
3. Notify me if you want some OWNERS file to be removed. Being an
approver or reviewer of a parent directory makes you a reviewer/approver
of the subdirectories too, so not all OWNERS files may be necessary.
4. Please use ALIAS if you want to use the same list of people over and
over again (don't hesitate to ask me for help, or use the pull-request
above as an example).
Automatic merge from submit-queue
Change stickyMaxAge from seconds to minutes, fixes issue #35677
**What this PR does / why we need it**: Increases the service sessionAffinity time from 180 seconds to 180 minutes for proxy mode iptables; the 180-second value was a bug introduced in a refactor.
**Which issue this PR fixes**: fixes #35677
**Special notes for your reviewer**:
**Release note**:
``` release-note
Fixed wrong service sessionAffinity stickiness time in proxy mode iptables: 180 minutes instead of 180 seconds.
```
Since there is no test for the sessionAffinity feature at the moment, I wanted to create one, but I don't know how.
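A hedged sketch of where the unit mix-up bites: the iptables `recent` module takes its window in seconds, so a sticky max-age intended as 180 minutes must be rendered as 10800, not 180. The constant and function names below are illustrative, not the exact kube-proxy code:
```go
package proxy

import "strconv"

const stickyMaxAgeMinutes = 180

// affinityArgs builds the "recent" match arguments for a session
// affinity rule; --seconds must receive the window in seconds.
func affinityArgs(ruleName string) []string {
	return []string{
		"-m", "recent", "--name", ruleName,
		"--rcheck", "--reap",
		"--seconds", strconv.Itoa(stickyMaxAgeMinutes * 60), // 10800s == 180min
	}
}
```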
Automatic merge from submit-queue
Add Windows support to kube-proxy
<!-- Thanks for sending a pull request! Here are some tips for you:
1. If this is your first time, read our contributor guidelines https://github.com/kubernetes/kubernetes/blob/master/CONTRIBUTING.md and developer guide https://github.com/kubernetes/kubernetes/blob/master/docs/devel/development.md
2. If you want *faster* PR reviews, read how: https://github.com/kubernetes/kubernetes/blob/master/docs/devel/faster_reviews.md
3. Follow the instructions for writing a release note: https://github.com/kubernetes/kubernetes/blob/master/docs/devel/pull-requests.md#release-notes
-->
**What this PR does / why we need it**:
This is the first stab at supporting kube-proxy (userspace mode) on Windows
**Which issue this PR fixes** :
fixes #30278
**Special notes for your reviewer**:
The MVP uses `netsh portproxy` to redirect traffic from `ServiceIP:ServicePort` to a `LocalIP:LocalPort`.
For the next version we are expecting to have guidance from Microsoft Container Networking team.
**Limitations**:
Current implementation does not support DNS queries over UDP as `netsh portproxy` currently only supports TCP. We are working with Microsoft to remediate this.
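For illustration, a sketch of the MVP mechanism: forward `ServiceIP:ServicePort` to `LocalIP:LocalPort` with `netsh interface portproxy`. The netsh syntax is real; the Go wrapper and its names are hypothetical. As noted above, portproxy is TCP-only:
```go
package winuserspace

import (
	"fmt"
	"os/exec"
)

// addPortProxy creates a v4-to-v4 TCP forwarding rule via netsh.
func addPortProxy(serviceIP string, servicePort int, localIP string, localPort int) error {
	args := []string{
		"interface", "portproxy", "add", "v4tov4",
		fmt.Sprintf("listenaddress=%s", serviceIP),
		fmt.Sprintf("listenport=%d", servicePort),
		fmt.Sprintf("connectaddress=%s", localIP),
		fmt.Sprintf("connectport=%d", localPort),
	}
	out, err := exec.Command("netsh", args...).CombinedOutput()
	if err != nil {
		return fmt.Errorf("netsh portproxy failed: %v (%s)", err, out)
	}
	return nil
}
```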
cc: @brendandburns @dcbw
**Release note**:
<!-- Steps to write your release note:
1. Use the release-note-* labels to set the release note state (if you have access)
2. Enter your extended release note in the below block; leaving it blank means using the PR title as the release note. If no release note is required, just write `NONE`.
-->
```release-note
```
Automatic merge from submit-queue
Add kubelet --network-plugin-mtu flag for MTU selection
* Add network-plugin-mtu option which lets us pass down an MTU to a network provider (currently processed by kubenet)
* Add a test, and thus make sysctl testable (a sketch of that seam follows)
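A minimal sketch, with illustrative names, of the kind of seam that makes sysctl testable: hide reads and writes behind a small interface so tests can substitute a fake instead of touching /proc:
```go
package kubenet

// Sysctl abstracts sysctl access so tests can inject a fake.
type Sysctl interface {
	Get(name string) (string, error)
	Set(name, value string) error
}

// fakeSysctl is an in-memory implementation for tests.
type fakeSysctl struct{ values map[string]string }

func (f *fakeSysctl) Get(name string) (string, error) { return f.values[name], nil }
func (f *fakeSysctl) Set(name, value string) error    { f.values[name] = value; return nil }
```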
Automatic merge from submit-queue
Remove br_netfilter warning in kube-proxy
Many distros have this module linked in, generating a spurious error.
Fixes #23385
bridge-nf-call-iptables appears to only be relevant when the containers are
attached to a Linux bridge, which is usually the case with default Kubernetes
setups, docker, and flannel. That ensures that the container traffic is
actually subject to the iptables rules since it traverses a Linux bridge
and bridged traffic is only subject to iptables when bridge-nf-call-iptables=1.
But with other networking solutions (like openshift-sdn) that don't use Linux
bridges, bridge-nf-call-iptables may not be relevant, because iptables is
invoked at other points not involving a Linux bridge.
The decision to set bridge-nf-call-iptables should be influenced by networking
plugins, so push the responsibility out to them. If no network plugin is
specified, fall back to the existing bridge-nf-call-iptables=1 behavior.
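A sketch of that fallback, assuming the standard sysctl path (the helper and its error handling are illustrative):
```go
package kubelet

import (
	"os"
	"strings"
)

const bridgeNFCallIptables = "/proc/sys/net/bridge/bridge-nf-call-iptables"

// ensureBridgeNFCallIptables sets the sysctl to 1 so bridged container
// traffic traverses iptables, matching the legacy default behavior.
func ensureBridgeNFCallIptables() error {
	data, err := os.ReadFile(bridgeNFCallIptables)
	if err != nil {
		// The bridge module may not be loaded; plugins that do not use
		// Linux bridges do not need this sysctl at all.
		return nil
	}
	if strings.TrimSpace(string(data)) == "1" {
		return nil
	}
	return os.WriteFile(bridgeNFCallIptables, []byte("1"), 0644)
}
```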
This allows us to use the MARK-MASQ chain as a subroutine, rather than encoding
the mark in many places. Having a KUBE-POSTROUTING chain means we can flush
and rebuild it atomically. This makes follow-on work to change the mark
significantly easier.
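For illustration, the chain layout this describes, written as iptables rule args (the 0x4000 mark and chain names follow kube-proxy conventions, but this is a sketch, not the exact code). KUBE-MARK-MASQ is the reusable subroutine; the single KUBE-POSTROUTING rule translates the mark into MASQUERADE and can be flushed and rebuilt atomically:
```go
package proxy

var (
	// Any chain can jump here to tag a packet for masquerading.
	markMasqRule = []string{
		"-A", "KUBE-MARK-MASQ",
		"-j", "MARK", "--set-xmark", "0x4000/0x4000",
	}
	// One place translates the mark into an actual MASQUERADE.
	postroutingRule = []string{
		"-A", "KUBE-POSTROUTING",
		"-m", "mark", "--mark", "0x4000/0x4000",
		"-j", "MASQUERADE",
	}
)
```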
We were trying to be clever and respect TCP's notion of half-open sockets, but
it causes leaks when we can't unblock io.Copy(). This fixes those leaks and
seems to follow most expectations. I think we were just being too clever.
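A sketch of the simpler behavior described here: instead of half-closing one direction and waiting (which can leave io.Copy blocked forever), close both connections as soon as either direction finishes, unblocking the other copy goroutine. The function name is illustrative:
```go
package proxy

import (
	"io"
	"net"
)

// proxyConns shuttles bytes in both directions and tears everything
// down as soon as either direction finishes (or fails).
func proxyConns(client, backend net.Conn) {
	done := make(chan struct{}, 2)
	go func() { io.Copy(client, backend); done <- struct{}{} }()
	go func() { io.Copy(backend, client); done <- struct{}{} }()
	<-done // one direction finished...
	client.Close()
	backend.Close() // ...so close both, unblocking the other io.Copy
	<-done
}
```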
Refactor `pkg/proxy/config`’s ServiceConfigHandler.OnUpdate and
EndpointsConfigHandler.OnUpdate to different method names as they have
different signatures.
This will let the new proxy
(https://github.com/GoogleCloudPlatform/kubernetes/issues/3760)
implement both interfaces.
Since we won’t need a separate loadbalancer structure (load balancing
is handled in the proxy rules), we will simply handle both event types
from the same object.
Moves the userspace code in proxy to a sub-package and adds the
ProxyProvider interface.
This is in preparation for landing an implementation of
https://github.com/GoogleCloudPlatform/kubernetes/issues/3760, which
will mostly be in another sub package for iptables.
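A sketch of the seam this creates: userspace and the future iptables proxier both satisfy a small ProxyProvider interface, so the rest of kube-proxy is indifferent to the implementation. The method set and stand-in Service type here are illustrative:
```go
package proxy

// Service is a stand-in for the real API type.
type Service struct{ Name string }

// ProxyProvider is the interface a proxier implementation satisfies.
type ProxyProvider interface {
	// OnUpdate reconciles the proxy with the current set of services.
	OnUpdate(services []Service)
	// SyncLoop runs periodic work and does not return.
	SyncLoop()
}
```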
Not doing so breaks e2e tests and people who may be using them,
even though we will eventually want to stop supporting this now
that we have better alternatives for typical use cases (NodePort).
A service with a NodePort set will listen on that port, on every node.
This is both handy for some load balancers (AWS ELB) and for people
that want to expose a service without using a load balancer.
Moves proxySocket out of proxier.go to new proxysocket.go in proxy
package in order to start separating proxy logic and implementation and
make proxier more manageable to review.
Standardize how our fakes are used so that a test case can use a
simpler mechanism for providing large, complex data sets, as well
as represent queries over time.
Instead of endpoints being a flat list, it is now a list of "subsets"
where each is a struct of {Addresses, Ports}. To generate the list of
endpoints you need to take union of the Cartesian products of the
subsets. This is compact in the vast majority of cases, yet still
represents named ports and corner cases (e.g. each pod has a different
port number).
This also stores subsets in a deterministic order (sorted by hash) to
avoid spurious updates and comparison problems.
This is a fully compatible change - old objects and clients will
keep working as long as they don't need the new functionality.
This is the prep for multi-port Services, which will add API to produce
endpoints in this new structure.
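A self-contained sketch of the subsets model: each subset is {Addresses, Ports}, and the flat endpoint list is the union of each subset's Cartesian product. The types are simplified stand-ins for the real API objects:
```go
package main

import "fmt"

type subset struct {
	Addresses []string
	Ports     []int
}

// flatten expands subsets into the full endpoint list.
func flatten(subsets []subset) []string {
	var eps []string
	for _, ss := range subsets {
		for _, addr := range ss.Addresses { // Cartesian product per subset
			for _, port := range ss.Ports {
				eps = append(eps, fmt.Sprintf("%s:%d", addr, port))
			}
		}
	}
	return eps
}

func main() {
	// Two pods share ports 80 and 443; one straggler has its own port.
	fmt.Println(flatten([]subset{
		{Addresses: []string{"10.0.0.1", "10.0.0.2"}, Ports: []int{80, 443}},
		{Addresses: []string{"10.0.0.3"}, Ports: []int{8080}},
	}))
	// [10.0.0.1:80 10.0.0.1:443 10.0.0.2:80 10.0.0.2:443 10.0.0.3:8080]
}
```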
All the distros that use this have been updated,
or have PRs out to update them, or owners
have been asked to fix RPMs.
Removing this prevents further use of this model.
Remove now dead code: EtcdClientOrDie
Remove now dead pkg/proxy/config/etcd.go.
Remove unused imports.
gofmt -s from 1.4 does not like
for _ = range BLAH
it wants
for range BLAH
But gofmt from 1.3 dies:
./pkg/proxy/config/config.go:265:6: expected operand, found 'range'
./pkg/proxy/config/config.go:268:3: expected '{', found 'EOF'
So instead, rewrite the code to make them both happy
As far as I know, nobody uses it. It was replaced by PublicIPs. If I were
being very polite I would leave it in internal, but since I am 99.99% sure
nobody uses it, I am cutting it. Let's argue about it.
It was an ABA problem where the proxy loop might see its own service as
"existing" when it had been destroyed and recreated (as in an update).
To prove this I added a counter of running ProxyLoop goroutines and check that
in tests. If I undo my main change, the tests fail. This makes the
proxier_test significantly slower (3 seconds vs 0.5 seconds). Sorry.
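A sketch of the test instrumentation described here: count live ProxyLoop goroutines with an atomic counter so a test can assert that a destroy-and-recreate cycle does not leave an extra loop running. Names are illustrative:
```go
package proxy

import "sync/atomic"

var runningProxyLoops int32 // read by tests via atomic.LoadInt32

func proxyLoop(stop <-chan struct{}) {
	atomic.AddInt32(&runningProxyLoops, 1)
	defer atomic.AddInt32(&runningProxyLoops, -1)
	<-stop // stand-in for the real accept/serve loop
}
```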
Split SourceAPI into two subobjects.
Parallel structure for endpoints, services will allow
changing to use generic code in pkg/client/cache/reflector.go.
Rename some funcs to be more like pkg/client/cache.
After this, DNS is resolvable from the host, if the DNS server is targeted
explicitly. This does NOT add the cluster DNS to the host's resolv.conf. That
is a larger problem, with distro-specific tie-ins and circular deps.
- Added process to clean up stale session affinity records
- Automatically set cloud-provided load balancer for sticky session if the service requires it. Note: this only works on GCE right now.
- Changed sessionAffinityMap to a map of pointers instead of structs to improve performance
- Commented out cookie and protocol from sessionAffinityDetail to avoid confusion, as they are not yet implemented.
Don't log an error when Accept failed because the interface (portal)
was just removed.
Don't pass around a pointer to a serviceInfo since another thread
deletes those. Instead, just check if service name is still in the
service map.
Delete the locking on the serviceInfo object since it is only used
by the "main" proxier thread.
A watch of the API can return an api.Status rather than the watched
object type. This code didn't handle that.
Tested with services e2e test (in conjunction with other PR).
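A sketch of the handling this implies: type-switch on the event object instead of blindly asserting the watched type. The types here are stand-ins for the real api.Status and watched object:
```go
package config

import "fmt"

type Object interface{}

type Status struct{ Message string }

type Service struct{ Name string }

// handleEvent tolerates a Status arriving on the watch stream.
func handleEvent(obj Object) {
	switch o := obj.(type) {
	case *Service:
		fmt.Printf("sync service %s\n", o.Name)
	case *Status:
		// Not the watched type: surface it instead of panicking on a
		// failed type assertion.
		fmt.Printf("watch returned status: %s\n", o.Message)
	default:
		fmt.Printf("unexpected object %T\n", obj)
	}
}
```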
The iptables args list needs to include all fields as they are eventually spit
out by iptables-save. This is because some systems do not support the
'iptables -C' arg, and so fall back on parsing iptables-save output. If this
does not match, it will not pass the check. For example: adding the /32 on
the destination IP arg is not strictly required, but causes this list to not
match the final iptables-save output. This is fragile and I hope one day we
can stop supporting such old iptables versions.
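For illustration, a hedged sketch of a "fully specified" rule: every field iptables-save would print must appear in the args, in the same form, or the parse-based fallback check will miss the rule. The chain name and values are hypothetical:
```go
package proxy

import (
	"fmt"
	"strconv"
)

// portalRuleArgs spells out every field as iptables-save prints it.
func portalRuleArgs(destIP string, port int, svcChain string) []string {
	return []string{
		"-d", fmt.Sprintf("%s/32", destIP), // /32 is redundant for iptables, but iptables-save prints it
		"-p", "tcp",
		"-m", "tcp", // implied by -p tcp, yet emitted by iptables-save
		"--dport", strconv.Itoa(port),
		"-j", svcChain,
	}
}
```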
This allows the proxier to portal Public IPs even if the
createExternalLoadBalancer flag is not set.
This also fixes what appears to be a bug in the createExternalLoadBalancer path
wherein multiple PublicIPs would get truncated.
Allows us to define different watch versioning regimes in the future
as well as to encode information with the resource version.
This changes /watch/resources?resourceVersion=3 to start the watch at
4 instead of 3, which means clients can read a resource version and
then send it back to the server. Clients should no longer do math on
resource versions.
Move a lot of common error logging into better buckets:
glog.Errorf() - Always an error
glog.Warningf() - Something unexpected, but probably not an error
glog.V(0) - Generally useful for this to ALWAYS be visible
to an operator
* Programmer errors
* Logging extra info about a panic
* CLI argument handling
glog.V(1) - A reasonable default log level if you don't want
verbosity
* Information about config (listening on X, watching Y)
* Errors that repeat frequently that relate to conditions
that can be corrected (pod detected as unhealthy)
glog.V(2) - Useful steady state information about the service
* Logging HTTP requests and their exit code
* System state changing (killing pod)
* Controller state change events (starting pods)
* Scheduler log messages
glog.V(3) - Extended information about changes
* More info about system state changes
glog.V(4) - Debug level verbosity (for now)
* Logging in particularly thorny parts of code where
you may want to come back later and check it
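A small illustration of those buckets in use (the messages are made-up examples, not from the real code). Running with -v=2 shows the V(1) and V(2) lines; V(3) and V(4) stay hidden:
```go
package main

import (
	"flag"

	"github.com/golang/glog"
)

func main() {
	flag.Parse() // glog registers -v, -logtostderr, etc. on the default FlagSet
	glog.Errorf("always an error: %v", "failed to bind port")
	glog.Warningf("unexpected, but probably not an error")
	glog.V(1).Infof("listening on %s", ":10249")       // config info
	glog.V(2).Infof("killing pod %s", "kube-dns-1234") // state change
	glog.V(3).Infof("extended detail about the state change")
	glog.V(4).Infof("debug-level detail")
}
```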
* Make Codec separate from Scheme
* Move EncodeOrDie off Scheme to take a Codec
* Make Copy work without a Codec
* Create a "latest" package that imports all versions and
sets global defaults for "most recent encoding"
* v1beta1 is the current "latest", v1beta2 exists
* Kill DefaultCodec, replace it with "latest.Codec"
* This updates the client and etcd to store the latest known version
* EmbeddedObject is per schema and per package now
* Move runtime.DefaultScheme to api.Scheme
* Split out WatchEvent since it's not an API object today, treat it
like a special object in api
* Kill DefaultResourceVersioner, instead place it on "latest" (as the
package that understands all packages)
* Move objDiff to runtime.ObjectDiff
The bug was that the proxier was passed by value in the method declaration.
This caused a copy of the Proxier to be created when the method was invoked.
The copy, being shallow, shared the same map instance with the original,
but the mutexes were distinct instances.
As a result, different mutexes were locked before operating on the same map
instance, which didn't make sense.
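A distilled version of the bug (field names are illustrative): with a value receiver, every call operates on a shallow copy, so the map (a reference type) is shared but each copy gets its own mutex, and the locking protects nothing. The fix is a pointer receiver:
```go
package proxy

import "sync"

type Proxier struct {
	mu         sync.Mutex
	serviceMap map[string]int
}

// Buggy: value receiver copies mu; concurrent callers lock different mutexes.
func (p Proxier) setBuggy(name string, port int) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.serviceMap[name] = port // same underlying map, effectively unsynchronized
}

// Fixed: pointer receiver; all callers share the one mutex.
func (p *Proxier) set(name string, port int) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.serviceMap[name] = port
}
```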
This change includes minor refactoring and cleanup of the proxy
package including the following items:
* Rename source files with misspelling of round robin
* Remove unnecessary and redundant comments
* Update comments for clarity
* Add locking when updating the round-robin index
* Improve method receiver names
* Rename the LoadBalance method to NextEndpoint to add clarity
No changes in behaviour have been introduced.
Splits endpoint and service configuration into their own objects. Also makes
the endpoint and service configuration tests correct - there was a race condition
previously that meant tests were passing but not checking correct code.
1) imported glog to third_party (previous commit)
2) add support for third_party/update.sh to update just one pkg
3) search-and-replace:
s/log.Printf/glog.Infof/
s/log.Print/glog.Info/
s/log.Fatalf/glog.Fatalf/
s/log.Fatal/glog.Fatal/
4) convert glog.Info.*, err into glog.Error*
Adds some util interfaces to logging and calls them from each cmd, which
will set the default log output to write to glog. Pass glog-wrapped
Loggers to etcd for logging.
Log files will go to /tmp - we should probably follow this up with a
default log dir for each cmd.
The glog lib is sort of weak in that it only flushes every 30 seconds, so
we spin up our own flushing goroutine.
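A sketch of that flushing goroutine, assuming the standard glog API (the function name and 5-second period are illustrative):
```go
package util

import (
	"time"

	"github.com/golang/glog"
)

// InitLogs forces glog to flush more often than its built-in 30s cadence.
func InitLogs() {
	go func() {
		for range time.Tick(5 * time.Second) {
			glog.Flush()
		}
	}()
}
```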