20134 Commits (00c85757f7f356dc92ec97f5a3ca6a859337afba)

Author SHA1 Message Date
cskh 1339c79f8d
consul-container test: no splitting and on single runner (#17394) 2023-05-17 14:57:12 -04:00
Kyle Havlovitz 2904d0a431
Pull virtual IPs for filter chains from discovery chains (#17375) 2023-05-17 11:18:39 -07:00
R.B. Boyer 21c6e0e8e6
fix two typos (#17389) 2023-05-17 08:50:26 -07:00
R.B. Boyer 2f5256ec7a
test: slight refactoring ahead of peering testing improvements (#17387) 2023-05-16 14:57:24 -05:00
John Landa 8f6b9fe177
Add ACLs Enabled field to consul agent startup status message (#17086)
* Add ACLs Enabled field to consul agent startup status message

* Add changelog

* Update startup messages to include default ACL policy configuration

* Correct import groupings
2023-05-16 13:47:02 -05:00
Connor 0789661ce5
Rename hcp-metrics-collector to consul-telemetry-collector (#17327)
* Rename hcp-metrics-collector to consul-telemetry-collector

* Fix docs

* Fix doc comment

---------

Co-authored-by: Ashvitha Sridharan <ashvitha.sridharan@hashicorp.com>
2023-05-16 14:36:05 -04:00
R.B. Boyer 06481bf03a
test: fix oss/ent drift in gateway container tests (#17365) 2023-05-16 11:49:27 -05:00
Dan Bond 8dee353492
agent: don't write server metadata in dev mode (#17383)
Signed-off-by: Dan Bond <danbond@protonmail.com>
2023-05-16 02:50:27 -07:00
cskh 59db5e1a2a
integ-test CI: retry if fail to install packages (#17359) 2023-05-15 14:53:07 -04:00
wangxinyi7 70ed184c2b
counterpart of the ent in oss (#17367) 2023-05-15 10:49:43 -07:00
Dan Stough be7d2a4d84
fix(connect envoy): set initial_fetch_timeout to wait for initial xDS… (#17317)
* fix(connect envoy): set initial_fetch_timeout to wait for initial xDS indefinitely

---------

Co-authored-by: Kiril Angov <kiril.angov@gmail.com>
2023-05-15 10:45:16 -04:00
Semir Patel abeccb4c76
Support update resource with change in GroupVersion (#17330) 2023-05-15 09:42:01 -05:00
Matt Keeler d37572bd44
Add a Node health controller (#17214)
This will aggregate all HealthStatus objects owned by the Node and update the status of the Node with an overall health.
2023-05-15 09:55:03 -04:00
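The aggregation described in the Node health controller commit can be pictured roughly as follows. This is only a sketch: the `Health` type, its ordering, and the "worst status wins" rule are assumptions made for illustration and are not taken from the Consul source.

```go
package main

import "fmt"

// Health is a hypothetical status type; larger values are worse.
type Health int

const (
	Passing Health = iota
	Warning
	Critical
)

// String makes the status readable when printed.
func (h Health) String() string {
	switch h {
	case Passing:
		return "passing"
	case Warning:
		return "warning"
	default:
		return "critical"
	}
}

// aggregate returns the worst status among the given checks.
// Treating a node with no checks as passing is an assumption.
func aggregate(statuses []Health) Health {
	overall := Passing
	for _, s := range statuses {
		if s > overall {
			overall = s
		}
	}
	return overall
}

func main() {
	// Statuses of the health checks owned by one node (made-up data).
	checks := []Health{Passing, Warning, Passing}
	fmt.Println("overall node health:", aggregate(checks))
}
```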
cskh 17f06b8808
upgrade test: fix on-the-fly-image build and downsize runner (#17331) 2023-05-15 09:33:05 -04:00
Dan Upton 0a38fc1a2a
resource: handle `ErrWatchClosed` in `WatchList` endpoint (#17289) 2023-05-15 12:35:10 +01:00
Dan Upton 879b775459
docs: initial documentation for the new State Store (#17315) 2023-05-15 12:34:36 +01:00
Dan Bond 95f462d5f1
agent: prevent very old servers re-joining a cluster with stale data (#17171)
* agent: configure server lastseen timestamp

Signed-off-by: Dan Bond <danbond@protonmail.com>

* use correct config

Signed-off-by: Dan Bond <danbond@protonmail.com>

* add comments

Signed-off-by: Dan Bond <danbond@protonmail.com>

* use default age in test golden data

Signed-off-by: Dan Bond <danbond@protonmail.com>

* add changelog

Signed-off-by: Dan Bond <danbond@protonmail.com>

* fix runtime test

Signed-off-by: Dan Bond <danbond@protonmail.com>

* agent: add server_metadata

Signed-off-by: Dan Bond <danbond@protonmail.com>

* update comments

Signed-off-by: Dan Bond <danbond@protonmail.com>

* correctly check if metadata file does not exist

Signed-off-by: Dan Bond <danbond@protonmail.com>

* follow instructions for adding new config

Signed-off-by: Dan Bond <danbond@protonmail.com>

* add comments

Signed-off-by: Dan Bond <danbond@protonmail.com>

* update comments

Signed-off-by: Dan Bond <danbond@protonmail.com>

* Update agent/agent.go

Co-authored-by: Dan Upton <daniel@floppy.co>

* agent/config: add validation for duration with min

Signed-off-by: Dan Bond <danbond@protonmail.com>

* docs: add new server_rejoin_age_max config definition

Signed-off-by: Dan Bond <danbond@protonmail.com>

* agent: add unit test for checking server last seen

Signed-off-by: Dan Bond <danbond@protonmail.com>

* agent: log continually for 60s before erroring

Signed-off-by: Dan Bond <danbond@protonmail.com>

* pr comments

Signed-off-by: Dan Bond <danbond@protonmail.com>

* remove unneeded todo

* agent: fix error message

Signed-off-by: Dan Bond <danbond@protonmail.com>

---------

Signed-off-by: Dan Bond <danbond@protonmail.com>
Co-authored-by: Dan Upton <daniel@floppy.co>
2023-05-15 04:05:47 -07:00
Jeremy Jacobson f334fccb4f
[release/1.15.3] Add cloud stanza documentation (#17311)
* [CC-4856] Add cloud stanza documentation

* Add environment variables to cloud descriptions
2023-05-15 12:52:57 +02:00
Krastin Krastev d90e7d8126
docs: update names in references to renamed tutorials (#17261)
* docs: update names for tutorial references

* docs: update more names for tutorial references
2023-05-15 10:59:30 +03:00
Hans Hasselberg b6097a99b8
Add new fields to HCP bootstrap config request and push state request
To support linking cluster, HCP needs to know the datacenter and if ACLs are enabled. Otherwise hosted Consul Core UI won't work properly.
2023-05-12 21:01:56 -06:00
Jeff Boruszak 8dce0ba504
docs: connect-service-upstreams annotation fixes (#17312)
* corrections

* fixes

* Update website/content/docs/k8s/annotations-and-labels.mdx

Co-authored-by: Jared Kirschner <85913323+jkirschner-hashicorp@users.noreply.github.com>

* Update website/content/docs/k8s/annotations-and-labels.mdx

Co-authored-by: Jared Kirschner <85913323+jkirschner-hashicorp@users.noreply.github.com>

* Update website/content/docs/k8s/annotations-and-labels.mdx

Co-authored-by: Jared Kirschner <85913323+jkirschner-hashicorp@users.noreply.github.com>

* Update website/content/docs/k8s/annotations-and-labels.mdx

Co-authored-by: Jared Kirschner <85913323+jkirschner-hashicorp@users.noreply.github.com>

* Update website/content/docs/k8s/annotations-and-labels.mdx

Co-authored-by: Jared Kirschner <85913323+jkirschner-hashicorp@users.noreply.github.com>

* Update website/content/docs/k8s/annotations-and-labels.mdx

Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com>

* Update website/content/docs/k8s/annotations-and-labels.mdx

Co-authored-by: Jared Kirschner <85913323+jkirschner-hashicorp@users.noreply.github.com>

* Update website/content/docs/k8s/annotations-and-labels.mdx

Co-authored-by: Jared Kirschner <85913323+jkirschner-hashicorp@users.noreply.github.com>

* Switching order of labeled/unlabeled

---------

Co-authored-by: Jared Kirschner <85913323+jkirschner-hashicorp@users.noreply.github.com>
Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com>
2023-05-12 22:07:29 +00:00
Eric Haberkorn 8bb16567cd
sidecar-proxy refactor (#17328) 2023-05-12 16:49:42 -04:00
cskh 2edfda998a
consul-container: mitigate the drift from ent repo (#17323) 2023-05-12 13:03:30 -04:00
Chris Thain b9102c295d
Add Network Filter Support for Envoy Extensions (#17325) 2023-05-12 09:52:50 -07:00
Matt Keeler 456156ebec
Add type validations for the catalog resources (#17211)
Also adding some common resource validation error types to the internal/resource package.
2023-05-12 09:24:55 -04:00
Kyle Havlovitz 81d8332524
Attach service virtual IP info to compiled discovery chain (#17295)
* Add v1/internal/service-virtual-ip for manually setting service VIPs

* Attach service virtual IP info to compiled discovery chain

* Separate auto-assigned and manual VIPs in response
2023-05-12 02:28:16 +00:00
Kyle Havlovitz bd0eb07ed3
Add /v1/internal/service-virtual-ip for manually setting service VIPs (#17294) 2023-05-12 00:38:52 +00:00
cskh c61e994fc0
Container test: fix container test slow image build (#17316)
Container integ test: fix container test slow image build
2023-05-11 22:49:49 +00:00
Tu Nguyen 30eee13cb9
Update consul-k8s install command so it is valid (#17310) 2023-05-11 11:55:23 -07:00
R.B. Boyer cd80ea18ff
grpc: ensure grpc resolver correctly uses lan/wan addresses on servers (#17270)
The grpc resolver implementation is fed from changes to the
router.Router. Within the router there is a map of various areas storing
the addressing information for servers in those areas. All map entries
are of the WAN variety except a single special entry for the LAN.

Addresses in the LAN "area" are local addresses intended
for use when making a client-to-server or server-to-server request.

The client agent correctly updates this LAN area when receiving lan serf
events, so by extension the grpc resolver works fine in that scenario.

The server agent only initially populates a single entry in the LAN area
(for itself) on startup, and then never mutates that area map again.
For normal RPCs a different structure is used for LAN routing.

Additionally when selecting a server to contact in the local datacenter
it will randomly select addresses from either the LAN or WAN addressed
entries in the map.

Unfortunately this means that the grpc resolver stack as it exists on
server agents is either broken or only accidentally functions by having
servers dial each other over the WAN-accessible address. If the operator
disables the serf wan port completely, this incidental functioning
would likely break.

This PR enforces that local requests for servers (both for stale reads
or leader forwarded requests) exclusively use the LAN "area" information
and also fixes it so that servers keep that area up to date in the
router.

A test for the grpc resolver logic was added, as well as a higher level
full-stack test to ensure the externally perceived bug does not return.
2023-05-11 11:08:57 -05:00
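The routing rule described in the grpc resolver commit above can be illustrated with a small model. The types `router` and `server`, the `lanArea` key, and the method `localServerAddrs` are hypothetical stand-ins and do not mirror the real `router.Router` API. The point is only that requests for the local datacenter read from the LAN entry and never from the WAN entries.

```go
package main

import (
	"fmt"
	"sort"
)

type areaID string

// lanArea is the single special map entry holding LAN addresses (hypothetical key).
const lanArea areaID = "lan"

type server struct {
	name       string
	datacenter string
	addr       string
}

// router is a simplified stand-in: every area other than lanArea holds WAN addresses.
type router struct {
	areas map[areaID][]server
}

// localServerAddrs returns addresses for servers in dc, using only the LAN area.
func (r *router) localServerAddrs(dc string) []string {
	var addrs []string
	for _, s := range r.areas[lanArea] {
		if s.datacenter == dc {
			addrs = append(addrs, s.addr)
		}
	}
	sort.Strings(addrs) // deterministic order for the example
	return addrs
}

func main() {
	r := &router{areas: map[areaID][]server{
		lanArea: {
			{"s1", "dc1", "10.0.0.1:8300"},
			{"s2", "dc1", "10.0.0.2:8300"},
		},
		"wan": {
			{"s1.dc1", "dc1", "203.0.113.1:8300"},
			{"s3.dc2", "dc2", "203.0.113.3:8300"},
		},
	}}
	fmt.Println(r.localServerAddrs("dc1"))
}
```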
R.B. Boyer 0ee95df4e0
proto: clear out old ratelimit.tmp files before making new ones (#17292) 2023-05-11 10:36:41 -05:00
John Murret e9986e3774
ci:upload test results to datadog (#17206)
* WIP

* ci:upload test results to datadog

* fix use of envvar in expression

* getting correct permission in reusable-unit.yml

* getting correct permission in reusable-unit.yml

* fixing DATADOG_API_KEY envvar expression

* pass datadog-api-key

* removing type from datadog-api-key
2023-05-10 14:49:18 -06:00
Dan Upton 5030101cdb
resource: add missing validation to the `List` and `WatchList` endpoints (#17213) 2023-05-10 10:38:48 +01:00
Dan Upton 6c24a66f73
resource: optionally compare timestamps in `EqualStatus` (#17275) 2023-05-10 10:37:54 +01:00
Derek Menteer 5ecab506a6
Fix ent bug caused by #17241. (#17278)
Fix ent bug caused by #17241

All tests passed in OSS, but not ENT. This is a patch to resolve
the problem for both.
2023-05-09 16:36:29 -05:00
cskh 48f7d99305
snapshot: some improvements to the snapshot process (#17236)
* snapshot: some improvements to the snapshot process

Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com>
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
2023-05-09 15:28:52 -04:00
Semir Patel 40eefaba18
Reaper controller for cascading deletes of owner resources (#17256) 2023-05-09 13:57:40 -05:00
Freddy 0f23def80c
Post a PR comment if the backport runner fails (#17197) 2023-05-09 12:28:34 -06:00
Freddy 7c3e9cd862
Hash namespace+proxy ID when creating socket path (#17204)
UNIX domain socket paths are limited to 104-108 characters, depending on
the OS. This limit was quite easy to exceed when testing the feature on
Kubernetes, due to how proxy IDs encode the Pod ID, e.g.:
metrics-collector-59467bcb9b-fkkzl-hcp-metrics-collector-sidecar-proxy

To ensure we stay under that character limit this commit makes a
couple changes:
- Use a b64 encoded SHA1 hash of the namespace + proxy ID to create a
  short and deterministic socket file name.
- Add validation to proxy registrations and proxy-defaults to enforce a
  limit on the socket directory length.
2023-05-09 12:20:26 -06:00
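The two changes in the socket path commit above could look roughly like the following sketch. The helper names, the `/` separator between namespace and proxy ID, the URL-safe unpadded base64 variant, and the exact length check are assumptions for illustration. The commit message only says that a base64-encoded SHA1 hash of namespace + proxy ID is used and that the socket directory length is validated.

```go
package main

import (
	"crypto/sha1"
	"encoding/base64"
	"fmt"
	"path/filepath"
)

const (
	// maxSocketPathLen is the smaller of the limits mentioned in the commit message.
	maxSocketPathLen = 104
	// hashedNameLen is the length of an unpadded base64 encoding of a 20-byte SHA1 digest.
	hashedNameLen = 27
)

// socketFileName derives a short, deterministic file name from namespace and proxy ID.
// URL-safe encoding is an assumption, chosen so the name never contains '/'.
func socketFileName(namespace, proxyID string) string {
	sum := sha1.Sum([]byte(namespace + "/" + proxyID))
	return base64.RawURLEncoding.EncodeToString(sum[:])
}

// validateSocketDir rejects directories that would push the full path to the limit.
// The +1 accounts for the path separator; staying strictly below the limit leaves
// room for a terminating NUL byte (an assumption about how the limit is counted).
func validateSocketDir(dir string) error {
	if len(dir)+1+hashedNameLen >= maxSocketPathLen {
		return fmt.Errorf("socket directory %q is too long (%d bytes)", dir, len(dir))
	}
	return nil
}

func main() {
	dir := "/tmp/consul/sockets" // hypothetical directory
	id := "metrics-collector-59467bcb9b-fkkzl-hcp-metrics-collector-sidecar-proxy"

	if err := validateSocketDir(dir); err != nil {
		fmt.Println("invalid:", err)
		return
	}
	fmt.Println(filepath.Join(dir, socketFileName("default", id)))
}
```

Because the file name has a fixed length regardless of how long the proxy ID is, only the directory length needs to be bounded.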
Dan Upton d53a1d4a27
resource: add helpers for more efficiently comparing IDs etc (#17224) 2023-05-09 19:02:24 +01:00
Derek Menteer 4f6da20fe5
Fix multiple issues related to proxycfg health queries. (#17241)
Fix multiple issues related to proxycfg health queries.

1. The datacenter was not being provided to a proxycfg query, which resulted in
bypassing agentless query optimizations and using the normal API instead.

2. The health rpc endpoint would return a zero index when insufficient ACLs were
detected. This would result in the agent cache performing an infinite loop of
queries in rapid succession without backoff.
2023-05-09 12:37:58 -05:00
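The second issue in the proxycfg commit above follows from how blocking queries treat the index: a request with a minimum index of zero does not wait for changes, so a client that feeds a returned zero index back into its next request ends up polling in a tight loop. The sketch below models that with two hypothetical helpers. It is not the actual fix from the commit, whose details are not shown here.

```go
package main

import "fmt"

// blocks reports whether a query with this minimum index would wait for changes.
// A minimum index of zero returns immediately.
func blocks(minIndex uint64) bool {
	return minIndex > 0
}

// safeIndex is a hypothetical guard that never lets a response carry index zero.
func safeIndex(idx uint64) uint64 {
	if idx == 0 {
		return 1
	}
	return idx
}

func main() {
	var returned uint64 // zero index, as in the ACL-filtered response described above

	fmt.Println("next request blocks without guard:", blocks(returned))
	fmt.Println("next request blocks with guard:   ", blocks(safeIndex(returned)))
}
```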
Dan Upton 972998203e
controller: deduplicate items in queue (#17168) 2023-05-09 18:14:20 +01:00
Dan Bond 5f079eb05b
Revert "ci: remove test splitting for compatibility tests (#17166)" (#17262)
This reverts commit 861a8151d5.
2023-05-09 10:44:31 -06:00
Dan Upton 6e1bc57469
Controller Runtime 2023-05-09 15:25:55 +01:00
Dan Stough 5e4b736b70
chore(ci): fix backport assistant branch creation race (#17249) 2023-05-08 20:30:45 +00:00
John Murret 861a8151d5
ci: remove test splitting for compatibility tests (#17166)
* remove test splitting from compatibility-integration-tests

* enable on push

* remove ipv6 loopback fix

* re-add ipv6 loopback fix

* remove test splitting from upgrade-integration-tests

* remove test splitting from upgrade-integration-tests

* put test splitting back in for upgrade tests

* upgrade-integration tests-one runner no retries
2023-05-08 20:26:16 +00:00
Matt Keeler 34915670f2
Register new catalog & mesh protobuf types with the resource registry (#17225) 2023-05-08 15:36:35 -04:00
Derek Menteer 50ef6a697e
Fix issue with peer stream node cleanup. (#17235)
Fix issue with peer stream node cleanup.

This commit encompasses a few problems that are closely related due to their
proximity in the code.

1. The peerstream utilizes node IDs in several locations to determine which
nodes / services / checks should be cleaned up or created. While VM deployments
with agents will likely always have a node ID, agentless uses synthetic nodes
and does not populate the field. This means that for consul-k8s deployments, all
services were likely bundled together into the same synthetic node in some code
paths (but not all), resulting in strange behavior. The Node.Node field should
be used instead as a unique identifier, as it should always be populated.

2. The peerstream cleanup process for unused nodes uses an incorrect query for
node deregistration. This query is NOT namespace aware and results in the node
(and corresponding services) being deregistered prematurely whenever it has zero
default-namespace services and 1+ non-default-namespace services registered on
it. This issue is tricky to find due to the incorrect logic mentioned in #1,
combined with the fact that the affected services must be co-located on the same
node as the currently deregistering service for this to be encountered.

3. The stream tracker did not understand differences between services in
different namespaces and could therefore report incorrect numbers. It was
updated to utilize the full service name to avoid conflicts and return proper
results.
2023-05-08 13:13:25 -05:00
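Problems 2 and 3 in the peer stream commit above both come down to ignoring the namespace. The sketch below contrasts a default-namespace-only check with a namespace-aware one, and shows a tracker key built from the full service name. All types and function names are hypothetical and only model the behavior the commit message describes.

```go
package main

import "fmt"

// service is a simplified catalog entry, keyed by node name rather than node ID.
type service struct {
	Node      string
	Namespace string
	Name      string
}

// nodeInUseDefaultOnly reproduces the described bug: it only sees default-namespace services.
func nodeInUseDefaultOnly(catalog []service, node string) bool {
	for _, s := range catalog {
		if s.Node == node && s.Namespace == "default" {
			return true
		}
	}
	return false
}

// nodeInUse is the namespace-aware check: any service on the node keeps it registered.
func nodeInUse(catalog []service, node string) bool {
	for _, s := range catalog {
		if s.Node == node {
			return true
		}
	}
	return false
}

// trackerKey uses the full service name so equal names in different namespaces do not collide.
func trackerKey(s service) string {
	return s.Namespace + "/" + s.Name
}

func main() {
	catalog := []service{{Node: "node-1", Namespace: "team-a", Name: "api"}}

	fmt.Println("default-only check keeps node:", nodeInUseDefaultOnly(catalog, "node-1"))
	fmt.Println("namespace-aware check keeps node:", nodeInUse(catalog, "node-1"))
	fmt.Println("tracker key:", trackerKey(catalog[0]))
}
```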
John Murret 6fa104409e
security: update go version to 1.20.4 (#17240)
* update go version to 1.20.3

* add changelog

* rename changelog file to remove underscore

* update to use 1.20.4

* update change log entry to reflect 1.20.4
2023-05-08 11:57:11 -06:00
Jared Kirschner f908ad82d0
docs: correct misspelling (#17229) 2023-05-08 13:30:48 -04:00