Normally the named pipe would buffer up to 64k, but in some cases when a
soft limit is reached, they will start only buffering up to 4k.
In either case, we should not deadlock.
This commit changes the pipe-bootstrap command to first buffer all of
stdin into the process, before trying to write it to the named pipe.
This allows the process memory to act as the buffer, instead of the
named pipe.
Also changed the order of operations in `makeBootstrapPipe`. The new
test added in this PR showed that simply buffering in the process memory
was not enough to fix the issue. We also need to ensure that the
`pipe-bootstrap` process is started before we try to write to its
stdin. Otherwise the write will still block.
Also set stdout/stderr on the subprocess, so that any errors are visible
to the user.
* Save exposed HTTP or GRPC ports to the agent's store
* Add those the health checks API so we can retrieve them from the API
* Change redirect-traffic command to also exclude those ports from inbound traffic redirection when expose.checks is set to true.
The only thing that needed fixing up pertained to this section of the 1.18.x release notes:
> grpc_stats: the default value for stats_for_all_methods is switched from true to false, in order to avoid possible memory exhaustion due to an untrusted downstream sending a large number of unique method names. The previous default value was deprecated in version 1.14.0. This only changes the behavior when the value is not set. The previous behavior can be used by setting the value to true. This behavior change by be overridden by setting runtime feature envoy.deprecated_features.grpc_stats_filter_enable_stats_for_all_methods_by_default.
For now to maintain status-quo I'm explicitly setting `stats_for_all_methods=true` in all versions to avoid relying upon the default.
Additionally the naming of the emitted metrics for these gRPC requests changed slightly so the integration test assertions for `case-grpc` needed adjusting.
This adds support for the Incremental xDS protocol when using xDS v3. This is best reviewed commit-by-commit and will not be squashed when merged.
Union of all commit messages follows to give an overarching summary:
xds: exclusively support incremental xDS when using xDS v3
Attempts to use SoTW via v3 will fail, much like attempts to use incremental via v2 will fail.
Work around a strange older envoy behavior involving empty CDS responses over incremental xDS.
xds: various cleanups and refactors that don't strictly concern the addition of incremental xDS support
Dissolve the connectionInfo struct in favor of per-connection ResourceGenerators instead.
Do a better job of ensuring the xds code uses a well configured logger that accurately describes the connected client.
xds: pull out checkStreamACLs method in advance of a later commit
xds: rewrite SoTW xDS protocol tests to use protobufs rather than hand-rolled json strings
In the test we very lightly reuse some of the more boring protobuf construction helper code that is also technically under test. The important thing of the protocol tests is testing the protocol. The actual inputs and outputs are largely already handled by the xds golden output tests now so these protocol tests don't have to do double-duty.
This also updates the SoTW protocol test to exclusively use xDS v2 which is the only variant of SoTW that will be supported in Consul 1.10.
xds: default xds.Server.AuthCheckFrequency at use-time instead of construction-time
* Use proxy outbound port from TransparentProxyConfig if provided
* If -proxy-id is provided to the redirect-traffic command, exclude any listener ports
from inbound traffic redirection. This includes envoy_prometheus_bind_addr,
envoy_stats_bind_addr, and the ListenerPort from the Expose configuration.
* Allow users to provide additional inbound and outbound ports, outbound CIDRs
and additional user IDs to be excluded from traffic redirection.
This affects both the traffic-redirect command and the iptables SDK package.
* Add new consul connect redirect-traffic command for applying traffic redirection rules when Transparent Proxy is enabled.
* Add new iptables package for applying traffic redirection rules with iptables.
Allows setting -prometheus-backend-port to configure the cluster
envoy_prometheus_bind_addr points to.
Allows setting -prometheus-scrape-path to configure which path
envoy_prometheus_bind_addr exposes metrics on.
-prometheus-backend-port is used by the consul-k8s metrics merging feature, to
configure envoy_prometheus_bind_addr to point to the merged metrics
endpoint that combines Envoy and service metrics so that one set of
annotations on a Pod can scrape metrics from the service and it's Envoy
sidecar.
-prometheus-scrape-path is used to allow configurability of the path
where prometheus metrics are exposed on envoy_prometheus_bind_addr.
Note that this does NOT upgrade to xDS v3. That will come in a future PR.
Additionally:
- Ignored staticcheck warnings about how github.com/golang/protobuf is deprecated.
- Shuffled some agent/xds imports in advance of a later xDS v3 upgrade.
- Remove support for envoy 1.13.x but don't add in 1.17.x yet. We have to wait until the xDS v3 support is added in a follow-up PR.
Fixes#8425
This allows setting ForceWithoutCrossSigning when reconfiguring the CA
for any provider, in order to forcibly move to a new root in cases where
the old provider isn't reachable or able to cross-sign for whatever
reason.
Add a skip condition to all tests slower than 100ms.
This change was made using `gotestsum tool slowest` with data from the
last 3 CI runs of master.
See https://github.com/gotestyourself/gotestsum#finding-and-skipping-slow-tests
With this change:
```
$ time go test -count=1 -short ./agent
ok github.com/hashicorp/consul/agent 0.743s
real 0m4.791s
$ time go test -count=1 -short ./agent/consul
ok github.com/hashicorp/consul/agent/consul 4.229s
real 0m8.769s
```
This PR updates the tags that we generate for Envoy stats.
Several of these come with breaking changes, since we can't keep two stats prefixes for a filter.
- Upgrade the ConfigEntry.ListAll RPC to be kind-aware so that older
copies of consul will not see new config entries it doesn't understand
replicate down.
- Add shim conversion code so that the old API/CLI method of interacting
with intentions will continue to work so long as none of these are
edited via config entry endpoints. Almost all of the read-only APIs will
continue to function indefinitely.
- Add new APIs that operate on individual intentions without IDs so that
the UI doesn't need to implement CAS operations.
- Add a new serf feature flag indicating support for
intentions-as-config-entries.
- The old line-item intentions way of interacting with the state store
will transparently flip between the legacy memdb table and the config
entry representations so that readers will never see a hiccup during
migration where the results are incomplete. It uses a piece of system
metadata to control the flip.
- The primary datacenter will begin migrating intentions into config
entries on startup once all servers in the datacenter are on a version
of Consul with the intentions-as-config-entries feature flag. When it is
complete the old state store representations will be cleared. We also
record a piece of system metadata indicating this has occurred. We use
this metadata to skip ALL of this code the next time the leader starts
up.
- The secondary datacenters continue to run the old intentions
replicator until all servers in the secondary DC and primary DC support
intentions-as-config-entries (via serf flag). Once this condition it met
the old intentions replicator ceases.
- The secondary datacenters replicate the new config entries as they are
migrated in the primary. When they detect that the primary has zeroed
it's old state store table it waits until all config entries up to that
point are replicated and then zeroes its own copy of the old state store
table. We also record a piece of system metadata indicating this has
occurred. We use this metadata to skip ALL of this code the next time
the leader starts up.
When consul is restarted and an envoy that had already sent
DiscoveryRequests to the previous consul process sends a request to the
new process it doesn't respect the setting and never populates
DiscoveryRequest.Node for the life of the new consul process due to this
bug: https://github.com/envoyproxy/envoy/issues/9682Fixes#8430
Now that it is no longer used, we can remove this unnecessary field. This is a pre-step in cleanup up RuntimeConfig->Consul.Config, which is a pre-step to adding a gRPCHandler component to Server for streaming.
Removing this field also allows us to remove one of the return values from logging.Setup.
Related changes:
- hard-fail the xDS connection attempt if the envoy version is known to be too old to be supported
- remove the RouterMatchSafeRegex proxy feature since all supported envoy versions have it
- stop using --max-obj-name-len (due to: envoyproxy/envoy#11740)
Previously, the envoy bootstrap config would blindly copy the self_admin
cluster into the list of static clusters when configuring either
ReadyBindAddr, PrometheusBindAddr, or StatsBindAddr.
Since ingress gateways always configure the ReadyBindAddr property,
users ran into this case much more often than previously.
Highlights:
- add new endpoint to query for intentions by exact match
- using this endpoint from the CLI instead of the dump+filter approach
- enforcing that OSS can only read/write intentions with a SourceNS or
DestinationNS field of "default".
- preexisting OSS intentions with now-invalid namespace fields will
delete those intentions on initial election or for wildcard namespaces
an attempt will be made to downgrade them to "default" unless one
exists.
- also allow the '-namespace' CLI arg on all of the intention subcommands
- update lots of docs
There are a couple of things in here.
First, just like auto encrypt, any Cluster.AutoConfig RPC will implicitly use the less secure RPC mechanism.
This drastically modifies how the Consul Agent starts up and moves most of the responsibilities (other than signal handling) from the cli command and into the Agent.
Three of the checks are temporarily disabled to limit the size of the
diff, and allow us to enable all the other checks in CI.
In a follow up we can fix the issues reported by the other checks one
at a time, and enable them.
* Standardize support for Tagged and BindAddresses in Ingress Gateways
This updates the TaggedAddresses and BindAddresses behavior for Ingress
to match Mesh/Terminating gateways. The `consul connect envoy` command
now also allows passing an address without a port for tagged/bind
addresses.
* Update command/connect/envoy/envoy.go
Co-authored-by: Freddy <freddygv@users.noreply.github.com>
* PR comments
* Check to see if address is an actual IP address
* Update agent/xds/listeners.go
Co-authored-by: Freddy <freddygv@users.noreply.github.com>
* fix whitespace
Co-authored-by: Chris Piraino <cpiraino@hashicorp.com>
Co-authored-by: Freddy <freddygv@users.noreply.github.com>
Some of these problems are minor (unused vars), but others are real bugs (ignored errors).
Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>
* Add support for ingress-gateway in CLI command
- Supports -register command
- Creates a static Envoy listener that exposes only the /ready API so
that we can register a TCP healthcheck against the ingress gateway
itself
- Updates ServiceAddressValue.String() to be more in line with Value()
The envoy version was updated after the PR which added this test was opened, and
merged before the test was merged, so it ended up with the wrong version.
The api client should never rever to HTTP if the user explicitly
requested TLS. This change broke some tests because the tests always use
an non-TLS http server, but some tests explicitly enable TLS.
The new grpcAddress function contains all of the logic to translate the
command line options into the values used in the template.
The new type has two advantages.
1. It introduces a logical grouping of values in the BootstrapTplArgs
struct which is exceptionally large. This grouping makes the struct
easier to understand because each set of nested values can be seen
as a single entity.
2. It gives us a reasonable return value for this new function.
This is like a Möbius strip of code due to the fact that low-level components (serf/memberlist) are connected to high-level components (the catalog and mesh-gateways) in a twisty maze of references which make it hard to dive into. With that in mind here's a high level summary of what you'll find in the patch:
There are several distinct chunks of code that are affected:
* new flags and config options for the server
* retry join WAN is slightly different
* retry join code is shared to discover primary mesh gateways from secondary datacenters
* because retry join logic runs in the *agent* and the results of that
operation for primary mesh gateways are needed in the *server* there are
some methods like `RefreshPrimaryGatewayFallbackAddresses` that must occur
at multiple layers of abstraction just to pass the data down to the right
layer.
* new cache type `FederationStateListMeshGatewaysName` for use in `proxycfg/xds` layers
* the function signature for RPC dialing picked up a new required field (the
node name of the destination)
* several new RPCs for manipulating a FederationState object:
`FederationState:{Apply,Get,List,ListMeshGateways}`
* 3 read-only internal APIs for debugging use to invoke those RPCs from curl
* raft and fsm changes to persist these FederationStates
* replication for FederationStates as they are canonically stored in the
Primary and replicated to the Secondaries.
* a special derivative of anti-entropy that runs in secondaries to snapshot
their local mesh gateway `CheckServiceNodes` and sync them into their upstream
FederationState in the primary (this works in conjunction with the
replication to distribute addresses for all mesh gateways in all DCs to all
other DCs)
* a "gateway locator" convenience object to make use of this data to choose
the addresses of gateways to use for any given RPC or gossip operation to a
remote DC. This gets data from the "retry join" logic in the agent and also
directly calls into the FSM.
* RPC (`:8300`) on the server sniffs the first byte of a new connection to
determine if it's actually doing native TLS. If so it checks the ALPN header
for protocol determination (just like how the existing system uses the
type-byte marker).
* 2 new kinds of protocols are exclusively decoded via this native TLS
mechanism: one for ferrying "packet" operations (udp-like) from the gossip
layer and one for "stream" operations (tcp-like). The packet operations
re-use sockets (using length-prefixing) to cut down on TLS re-negotiation
overhead.
* the server instances specially wrap the `memberlist.NetTransport` when running
with gateway federation enabled (in a `wanfed.Transport`). The general gist is
that if it tries to dial a node in the SAME datacenter (deduced by looking
at the suffix of the node name) there is no change. If dialing a DIFFERENT
datacenter it is wrapped up in a TLS+ALPN blob and sent through some mesh
gateways to eventually end up in a server's :8300 port.
* a new flag when launching a mesh gateway via `consul connect envoy` to
indicate that the servers are to be exposed. This sets a special service
meta when registering the gateway into the catalog.
* `proxycfg/xds` notice this metadata blob to activate additional watches for
the FederationState objects as well as the location of all of the consul
servers in that datacenter.
* `xds:` if the extra metadata is in place additional clusters are defined in a
DC to bulk sink all traffic to another DC's gateways. For the current
datacenter we listen on a wildcard name (`server.<dc>.consul`) that load
balances all servers as well as one mini-cluster per node
(`<node>.server.<dc>.consul`)
* the `consul tls cert create` command got a new flag (`-node`) to help create
an additional SAN in certs that can be used with this flavor of federation.
* add 1.12.2
* add envoy 1.13.0
* Introduce -envoy-version to get 1.10.0 passing.
* update old version and fix consul-exec case
* add envoy_version and fix check
* Update Envoy CLI tests to account for the 1.13 compatibility changes.
Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>
* Expose Envoy /stats for statsd agents; Add testcases
* Remove merge conflict leftover
* Add support for prefix instead of path; Fix docstring to mirror these changes
* Add new config field to docs; Add testcases to check that /stats/prometheus is exposed as well
* Parametrize matchType (prefix or path) and value
* Update website/source/docs/connect/proxies/envoy.md
Co-Authored-By: Paul Banks <banks@banksco.de>
Co-authored-by: Paul Banks <banks@banksco.de>
Currently when using the built-in CA provider for Connect, root certificates are valid for 10 years, however secondary DCs get intermediates that are valid for only 1 year. There is no mechanism currently short of rotating the root in the primary that will cause the secondary DCs to renew their intermediates.
This PR adds a check that renews the cert if it is half way through its validity period.
In order to be able to test these changes, a new configuration option was added: IntermediateCertTTL which is set extremely low in the tests.
* Use consts for well known tagged adress keys
* Add ipv4 and ipv6 tagged addresses for node lan and wan
* Add ipv4 and ipv6 tagged addresses for service lan and wan
* Use IPv4 and IPv6 address in DNS
Since generated envoy clusters all are named using (mostly) SNI syntax
we can have envoy read the various fields out of that structure and emit
it as stats labels to the various telemetry backends.
I changed the delimiter for the 'customization hash' from ':' to '~'
because ':' is always reencoded by envoy as '_' when generating metrics
keys.
* Ensure we MapWalk the proxy config in the NodeService and ServiceNode structs
This gets rid of some json encoder errors in the catalog endpoints
* Allow passing explicit bind addresses to envoy
* Move map walking to the ConnectProxyConfig struct
Any place where this struct gets JSON encoded will benefit as opposed to having to implement it everywhere.
* Fail when a non-empty address is provided and not bindable
* camel case
* Update command/connect/envoy/envoy.go
Co-Authored-By: Paul Banks <banks@banksco.de>
* connect: allow overriding envoy listener bind_address
* Update agent/xds/config.go
Co-Authored-By: Kyle Havlovitz <kylehav@gmail.com>
* connect: allow overriding envoy listener bind_port
* envoy: support unix sockets for grpc in bootstrap
Add AgentSocket BootstrapTplArgs which if set overrides the AgentAddress
and AgentPort to generate a bootstrap which points Envoy to a unix
socket file instead of an ip:port.
* Add a test for passing the consul addr as a unix socket
* Fix config formatting for envoy bootstrap tests
* Fix listeners test cases for bind addr/port
* Update website/source/docs/connect/proxies/envoy.md
* Make exec test assert Envoy version - it was not rebuilding before and so often ran against wrong version. This makes 1.10 fail consistenty.
* Switch Envoy exec to use a named pipe rather than FD magic since Envoy 1.10 doesn't support that.
* Refactor to use an internal shim command for piping the bootstrap through.
* Fmt. So sad that vscode golang fails so often these days.
* go mod tidy
* revert go mod tidy changes
* Revert "ignore consul-exec tests until fixed (#5986)"
This reverts commit 683262a686.
* Review cleanups
* Add integration test for central config; fix central config WIP
* Add integration test for central config; fix central config WIP
* Set proxy protocol correctly and begin adding upstream support
* Add upstreams to service config cache key and start new notify watcher if they change.
This doesn't update the tests to pass though.
* Fix some merging logic get things working manually with a hack (TODO fix properly)
* Simplification to not allow enabling sidecars centrally - it makes no sense without upstreams anyway
* Test compile again and obvious ones pass. Lots of failures locally not debugged yet but may be flakes. Pushing up to see what CI does
* Fix up service manageer and API test failures
* Remove the enable command since it no longer makes much sense without being able to turn on sidecar proxies centrally
* Remove version.go hack - will make integration test fail until release
* Remove unused code from commands and upstream merge
* Re-bump version to 1.5.0
* Add HTTP endpoints for config entry management
* Finish implementing decoding in the HTTP Config entry apply endpoint
* Add CAS operation to the config entry apply endpoint
Also use this for the bootstrapping and move the config entry decoding function into the structs package.
* First pass at the API client for the config entries
* Fixup some of the ConfigEntry APIs
Return a singular response object instead of a list for the ConfigEntry.Get RPC. This gets plumbed through the HTTP API as well.
Dont return QueryMeta in the JSON response for the config entry listing HTTP API. Instead just return a list of config entries.
* Minor API client fixes
* Attempt at some ConfigEntry api client tests
These don’t currently work due to weak typing in JSON
* Get some of the api client tests passing
* Implement reflectwalk magic to correct JSON encoding a ProxyConfigEntry
Also added a test for the HTTP endpoint that exposes the problem. However, since the test doesn’t actually do the JSON encode/decode its still failing.
* Move MapWalk magic into a binary marshaller instead of JSON.
* Add a MapWalk test
* Get rid of unused func
* Get rid of unused imports
* Fixup some tests now that the decoding from msgpack coerces things into json compat types
* Stub out most of the central config cli
Fully implement the config read command.
* Basic config delete command implementation
* Implement config write command
* Implement config list subcommand
Not entirely sure about the output here. Its basically the read output indented with a line specifying the kind/name of each type which is also duplicated in the indented output.
* Update command usage
* Update some help usage formatting
* Add the connect enable helper cli command
* Update list command output
* Rename the config entry API client methods.
* Use renamed apis
* Implement config write tests
Stub the others with the noTabs tests.
* Change list output format
Now just simply output 1 line per named config
* Add config read tests
* Add invalid args write test.
* Add config delete tests
* Add config list tests
* Add connect enable tests
* Update some CLI commands to use CAS ops
This also modifies the HTTP API for a write op to return a boolean indicating whether the value was written or not.
* Fix up the HTTP API CAS tests as I realized they weren’t testing what they should.
* Update config entry rpc tests to properly test CAS
* Fix up a few more tests
* Fix some tests that using ConfigEntries.Apply
* Update config_write_test.go
* Get rid of unused import
* Add support for HTTP proxy listeners
* Add customizable bootstrap configuration options
* Debug logging for xDS AuthZ
* Add Envoy Integration test suite with basic test coverage
* Add envoy command tests to cover new cases
* Add tracing integration test
* Add gRPC support WIP
* Merged changes from master Docker. get CI integration to work with same Dockerfile now
* Make docker build optional for integration
* Enable integration tests again!
* http2 and grpc integration tests and fixes
* Fix up command config tests
* Store all container logs as artifacts in circle on fail
* Add retries to outer part of stats measurements as we keep missing them in CI
* Only dump logs on failing cases
* Fix typos from code review
* Review tidying and make tests pass again
* Add debug logs to exec test.
* Fix legit test failure caused by upstream rename in envoy config
* Attempt to reduce cases of bad TLS handshake in CI integration tests
* bring up the right service
* Add prometheus integration test
* Add test for denied AuthZ both HTTP and TCP
* Try ANSI term for Circle