### Description
<!-- Please describe why you're making this change, in plain English.
-->
- Currently the jwt-auth filter doesn't take into account the service
identity when validating jwt-auth, it only takes into account the path
and jwt provider during validation. This causes issues when multiple
source intentions restrict access to an endpoint with different JWT
providers.
- To fix these issues, rather than use the JWT auth filter for
validation, we use it in metadata mode and allow it to forward the
successful validated JWT token payload to the RBAC filter which will
make the decisions.
This PR ensures requests with and without JWT tokens successfully go
through the jwt-authn filter. The filter however only forwards the data
for successful/valid tokens. On the RBAC filter level, we check the
payload for claims and token issuer + existing rbac rules.
### Testing & Reproduction steps
<!--
* In the case of bugs, describe how to replicate
* If any manual tests were done, document the steps and the conditions
to replicate
* Call out any important/ relevant unit tests, e2e tests or integration
tests you have added or are adding
-->
- This test covers a multi level jwt requirements (requirements at top
level and permissions level). It also assumes you have envoy running,
you have a redis and a sidecar proxy service registered, and have a way
to generate jwks with jwt. I mostly use:
https://www.scottbrady91.com/tools/jwt for this.
- first write your proxy defaults
```
Kind = "proxy-defaults"
name = "global"
config {
protocol = "http"
}
```
- Create two providers
```
Kind = "jwt-provider"
Name = "auth0"
Issuer = "https://ronald.local"
JSONWebKeySet = {
Local = {
JWKS = "eyJrZXlzIjog....."
}
}
```
```
Kind = "jwt-provider"
Name = "okta"
Issuer = "https://ronald.local"
JSONWebKeySet = {
Local = {
JWKS = "eyJrZXlzIjogW3...."
}
}
```
- add a service intention
```
Kind = "service-intentions"
Name = "redis"
JWT = {
Providers = [
{
Name = "okta"
},
]
}
Sources = [
{
Name = "*"
Permissions = [{
Action = "allow"
HTTP = {
PathPrefix = "/workspace"
}
JWT = {
Providers = [
{
Name = "okta"
VerifyClaims = [
{
Path = ["aud"]
Value = "my_client_app"
},
{
Path = ["sub"]
Value = "5be86359073c434bad2da3932222dabe"
}
]
},
]
}
},
{
Action = "allow"
HTTP = {
PathPrefix = "/"
}
JWT = {
Providers = [
{
Name = "auth0"
},
]
}
}]
}
]
```
- generate 3 jwt tokens: 1 from auth0 jwks, 1 from okta jwks with
different claims than `/workspace` expects and 1 with correct claims
- connect to your envoy (change service and address as needed) to view
logs and potential errors. You can add: `-- --log-level debug` to see
what data is being forwarded
```
consul connect envoy -sidecar-for redis1 -grpc-addr 127.0.0.1:8502
```
- Make the following requests:
```
curl -s -H "Authorization: Bearer $Auth0_TOKEN" --insecure --cert leaf.cert --key leaf.key --cacert connect-ca.pem https://localhost:20000/workspace -v
RBAC filter denied
curl -s -H "Authorization: Bearer $Okta_TOKEN_with_wrong_claims" --insecure --cert leaf.cert --key leaf.key --cacert connect-ca.pem https://localhost:20000/workspace -v
RBAC filter denied
curl -s -H "Authorization: Bearer $Okta_TOKEN_with_correct_claims" --insecure --cert leaf.cert --key leaf.key --cacert connect-ca.pem https://localhost:20000/workspace -v
Successful request
```
### TODO
* [x] Update test coverage
* [ ] update integration tests (follow-up PR)
* [x] appropriate backport labels added
* Add header filter to api-gateway xDS golden test
* Stop adding all header filters to virtual host when generating xDS for api-gateway
* Regenerate xDS golden file for api-gateway w/ header filter
When UpstreamEnvoyExtender was introduced, some code was left duplicated
between it and BasicEnvoyExtender. One path in that code panics when a
TProxy listener patch is attempted due to no upstream data in
RuntimeConfig matching the local service (which would only happen in
rare cases).
Instead, we can remove the special handling of upstream VIPs from
BasicEnvoyExtender entirely, greatly simplifying the listener filter
patch code and avoiding the panic. UpstreamEnvoyExtender, which needs
this code to function, is modified to ensure a panic does not occur.
This also fixes a second regression in which the Lua extension was not
applied to TProxy outbound listeners.
* add upstream service targeting to property override extension
* Also add baseline goldens for service specific property override extension.
* Refactor the extension framework to put more logic into the templates.
* fix up the golden tests
* Support Listener in Property Override
Add support for patching `Listener` resources via the builtin
`property-override` extension.
Refactor existing listener patch code in `BasicEnvoyExtender` to
simplify addition of resource support.
* Support ClusterLoadAssignment in Property Override
Add support for patching `ClusterLoadAssignment` resources via the
builtin `property-override` extension.
`property-override` is an extension that allows for arbitrarily
patching Envoy resources based on resource matching filters. Patch
operations resemble a subset of the JSON Patch spec with minor
differences to facilitate patching pre-defined (protobuf) schemas.
See Envoy Extension product documentation for more details.
Co-authored-by: Eric Haberkorn <eric.haberkorn@hashicorp.com>
Co-authored-by: Kyle Havlovitz <kyle@hashicorp.com>
To avoid unintended tampering with remote downstreams via service
config, refactor BasicEnvoyExtender and RuntimeConfig to disallow
typical Envoy extensions from being applied to non-local proxies.
Continue to allow this behavior for AWS Lambda and the read-only
Validate builtin extensions.
Addresses CVE-2023-2816.
UNIX domain socket paths are limited to 104-108 characters, depending on
the OS. This limit was quite easy to exceed when testing the feature on
Kubernetes, due to how proxy IDs encode the Pod ID eg:
metrics-collector-59467bcb9b-fkkzl-hcp-metrics-collector-sidecar-proxy
To ensure we stay under that character limit this commit makes a
couple changes:
- Use a b64 encoded SHA1 hash of the namespace + proxy ID to create a
short and deterministic socket file name.
- Add validation to proxy registrations and proxy-defaults to enforce a
limit on the socket directory length.
* Add MaxEjectionPercent to config entry
* Add BaseEjectionTime to config entry
* Add MaxEjectionPercent and BaseEjectionTime to protobufs
* Add MaxEjectionPercent and BaseEjectionTime to api
* Fix integration test breakage
* Verify MaxEjectionPercent and BaseEjectionTime in integration test upstream confings
* Website docs for MaxEjectionPercent and BaseEjection time
* Add `make docs` to browse docs at http://localhost:3000
* Changelog entry
* so that is the difference between consul-docker and dev-docker
* blah
* update proto funcs
* update proto
---------
Co-authored-by: Maliz <maliheh.monshizadeh@hashicorp.com>
This implements permissive mTLS , which allows toggling services into "permissive" mTLS mode.
Permissive mTLS mode allows incoming "non Consul-mTLS" traffic to be forward unmodified to the application.
* Update service-defaults and proxy-defaults config entries with a MutualTLSMode field
* Update the mesh config entry with an AllowEnablingPermissiveMutualTLS field and implement the necessary validation. AllowEnablingPermissiveMutualTLS must be true to allow changing to MutualTLSMode=permissive, but this does not require that all proxy-defaults and service-defaults are currently in strict mode.
* Update xDS listener config to add a "permissive filter chain" when MutualTLSMode=permissive for a particular service. The permissive filter chain matches incoming traffic by the destination port. If the destination port matches the service port from the catalog, then no mTLS is required and the traffic sent is forwarded unmodified to the application.
This commit swaps the partition field to the local partition for
discovery chains targeting peers. Prior to this change, peer upstreams
would always use a value of default regardless of which partition they
exist in. This caused several issues in xds / proxycfg because of id
mismatches.
Some prior fixes were made to deal with one-off id mismatches that this
PR also cleans up, since they are no longer needed.
Co-authored-by: Ashvitha Sridharan <ashvitha.sridharan@hashicorp.com>
Co-authored-by: Freddy <freddygv@users.noreply.github.com>
Add a new envoy flag: "envoy_hcp_metrics_bind_socket_dir", a directory
where a unix socket will be created with the name
`<namespace>_<proxy_id>.sock` to forward Envoy metrics.
If set, this will configure:
- In bootstrap configuration a local stats_sink and static cluster.
These will forward metrics to a loopback listener sent over xDS.
- A dynamic listener listening at the socket path that the previously
defined static cluster is sending metrics to.
- A dynamic cluster that will forward traffic received at this listener
to the hcp-metrics-collector service.
Reasons for having a static cluster pointing at a dynamic listener:
- We want to secure the metrics stream using TLS, but the stats sink can
only be defined in bootstrap config. With dynamic listeners/clusters
we can use the proxy's leaf certificate issued by the Connect CA,
which isn't available at bootstrap time.
- We want to intelligently route to the HCP collector. Configuring its
addreess at bootstrap time limits our flexibility routing-wise. More
on this below.
Reasons for defining the collector as an upstream in `proxycfg`:
- The HCP collector will be deployed as a mesh service.
- Certificate management is taken care of, as mentioned above.
- Service discovery and routing logic is automatically taken care of,
meaning that no code changes are required in the xds package.
- Custom routing rules can be added for the collector using discovery
chain config entries. Initially the collector is expected to be
deployed to each admin partition, but in the future could be deployed
centrally in the default partition. These config entries could even be
managed by HCP itself.
* Leverage ServiceResolver ConnectTimeout for route timeouts to make TerminatingGateway upstream timeouts configurable
* Regenerate golden files
* Add RequestTimeout field
* Add changelog entry
* Include secret type when building resources from config snapshot
* First pass at generating envoy secrets from api-gateway snapshot
* Update comments for xDS update order
* Add secret type + corresponding golden files to existing tests
* Initialize test helpers for testing api-gateway resource generation
* Generate golden files for new api-gateway xDS resource test
* Support ADS for TLS certificates on api-gateway
* Configure TLS on api-gateway listeners
* Inline TLS cert code
* update tests
* Add SNI support so we can have multiple certificates
* Remove commented out section from helper
* regen deep-copy
* Add tcp tls test
---------
Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>
* Simple API Gateway e2e test for tcp routes
* Drop DNSSans since we don't front the Gateway with a leaf cert
* WIP listener tests for api-gateway
* Return early if no routes
* Add back in leaf cert to testing
* Fix merge conflicts
* Re-add kind to setup
* Fix iteration over listener upstreams
* New tcp listener test
* Add tests for API Gateway with TCP and HTTP routes
* Move zero-route check back
* Drop generateIngressDNSSANs
* Check for chains not routes
---------
Co-authored-by: Andrew Stucki <andrew.stucki@hashicorp.com>
* Add Tproxy support to Envoy Extensions (this is needed for service to service validation)
* Add validation for Envoy configuration for an upstream service
* Use both /config_dump and /cluster to validate Envoy configuration
This is because of a bug in Envoy where the EndpointsConfigDump does not
include a cluster_name, making it impossible to match an endpoint to
verify it exists.
This removes endpoints support for builtin extensions since only the
validate plugin was using it, and it is no longer used. It also removes
test cases for endpoint validation. Endpoints validation now only occurs
in the top level test from config_dump and clusters json files.
Co-authored-by: Eric <eric@haberkorn.co>
* updated builtin extension to parse region directly from ARN
- added a unit test
- added some comments/light refactoring
* updated golden files with proper ARNs
- ARNs need to be right format now that they are being processed
* updated tests and integration tests
- removed 'region' from all EnvoyExtension arguments
- added properly formatted ARN which includes the same region found in the removed "Region" field: 'us-east-1'
* extensions: refactor PluginConfiguration into a more generic type
ExtensionConfiguration
Also:
* adds endpoints configuration to lambda golden tests
* uses string constant for builtin/aws/lambda
Co-authored-by: Eric <eric@haberkorn.co>
* feat(ingress-gateway): support outlier detection of upstream service for ingress gateway
* changelog
Co-authored-by: Eric Haberkorn <erichaberkorn@gmail.com>
* Fix mesh gateway proxy-defaults not affecting upstreams.
* Clarify distinction with upstream settings
Top-level mesh gateway mode in proxy-defaults and service-defaults gets
merged into NodeService.Proxy.MeshGateway, and only gets merged with
the mode attached to an an upstream in proxycfg/xds.
* Fix mgw mode usage for peered upstreams
There were a couple issues with how mgw mode was being handled for
peered upstreams.
For starters, mesh gateway mode from proxy-defaults
and the top-level of service-defaults gets stored in
NodeService.Proxy.MeshGateway, but the upstream watch for peered data
was only considering the mesh gateway config attached in
NodeService.Proxy.Upstreams[i]. This means that applying a mesh gateway
mode via global proxy-defaults or service-defaults on the downstream
would not have an effect.
Separately, transparent proxy watches for peered upstreams didn't
consider mesh gateway mode at all.
This commit addresses the first issue by ensuring that we overlay the
upstream config for peered upstreams as we do for non-peered. The second
issue is addressed by re-using setupWatchesForPeeredUpstream when
handling transparent proxy updates.
Note that for transparent proxies we do not yet support mesh gateway
mode per upstream, so the NodeService.Proxy.MeshGateway mode is used.
* Fix upstream mesh gateway mode handling in xds
This commit ensures that when determining the mesh gateway mode for
peered upstreams we consider the NodeService.Proxy.MeshGateway config as
a baseline.
In absense of this change, setting a mesh gateway mode via
proxy-defaults or the top-level of service-defaults will not have an
effect for peered upstreams.
* Merge service/proxy defaults in cfg resolver
Previously the mesh gateway mode for connect proxies would be
merged at three points:
1. On servers, in ComputeResolvedServiceConfig.
2. On clients, in MergeServiceConfig.
3. On clients, in proxycfg/xds.
The first merge returns a ServiceConfigResponse where there is a
top-level MeshGateway config from proxy/service-defaults, along with
per-upstream config.
The second merge combines per-upstream config specified at the service
instance with per-upstream config specified centrally.
The third merge combines the NodeService.Proxy.MeshGateway
config containing proxy/service-defaults data with the per-upstream
mode. This third merge is easy to miss, which led to peered upstreams
not considering the mesh gateway mode from proxy-defaults.
This commit removes the third merge, and ensures that all mesh gateway
config is available at the upstream. This way proxycfg/xds do not need
to do additional overlays.
* Ensure that proxy-defaults is considered in wc
Upstream defaults become a synthetic Upstream definition under a
wildcard key "*". Now that proxycfg/xds expect Upstream definitions to
have the final MeshGateway values, this commit ensures that values from
proxy-defaults/service-defaults are the default for this synthetic
upstream.
* Add changelog.
Co-authored-by: freddygv <freddy@hashicorp.com>
* Regenerate golden files.
* Backport from ENT: "Avoid race"
Original commit: 5006c8c858b0e332be95271ef9ba35122453315b
Original author: freddygv
* Backport from ENT: "chore: fix flake peerstream test"
Original commit: b74097e7135eca48cc289798c5739f9ef72c0cc8
Original author: DanStough
* ingress-gateways: don't log error when registering gateway
Previously, when an ingress gateway was registered without a
corresponding ingress gateway config entry, an error was logged
because the watch on the config entry returned a nil result.
This is expected so don't log an error.