* Add a feature flag to enable the new manager
This PR creates a copy of the legacy manager and uses it by default.
It is a companion PR to #9349. With this PR, users can enable the new
discovery manager and provide us with any feedback / side effects that
the new behaviour might have.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
We are re-enabling HTTP/2. There have been a few bugfixes upstream
in Go, and we have also enabled ReadIdleTimeout.
Fix #7588
Fix #9068
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
We have been Puppet users for 10 years and we are users of
https://github.com/camptocamp/prometheus-puppetdb-sd
However, that file_sd implementation contains business logic and
assumptions around e.g. the modules which you are using.
This pull request adds a simple PuppetDB service discovery, which will
enable more use cases than the upstream sd.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
This change sets the scheme to https when a rule specified by Ingress
matches a wildcard DNS entry in the ingress TLS hosts
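
A minimal sketch of the kind of matching described above, assuming a `*`
in a TLS host entry stands for exactly one DNS label. The function names
`matchesHostPattern` and `schemeFor` are illustrative and not necessarily
what this change uses:

```go
package main

import (
	"fmt"
	"strings"
)

// matchesHostPattern reports whether host matches pattern, where a "*"
// label in pattern stands for exactly one DNS label of host.
func matchesHostPattern(pattern, host string) bool {
	if pattern == host {
		return true
	}
	p := strings.Split(pattern, ".")
	h := strings.Split(host, ".")
	if len(p) != len(h) {
		return false
	}
	for i := range p {
		if p[i] != "*" && p[i] != h[i] {
			return false
		}
	}
	return true
}

// schemeFor returns "https" when host is covered by any TLS host entry,
// and "http" otherwise.
func schemeFor(host string, tlsHosts []string) string {
	for _, t := range tlsHosts {
		if matchesHostPattern(t, host) {
			return "https"
		}
	}
	return "http"
}

func main() {
	tls := []string{"*.example.com"}
	fmt.Println(schemeFor("app.example.com", tls)) // https
	fmt.Println(schemeFor("a.b.example.com", tls)) // http: the wildcard covers one label only
}
```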
Signed-off-by: Philip Gough <philip.p.gough@gmail.com>
* PromQL: Fix start and end keywords masking label and metric names
This commit fixes an issue with the "at modifier" that introduced two
new keywords: `start` and `end`. In grouping options and in metric
names, these keywords took precedence over metric or label names, so
that those metrics and labels could no longer be referenced.
Signed-off-by: Clayton Peters <clayton.peters@man.com>
* Add in additional tests for metrics and/or labels called start/end.
Signed-off-by: Clayton Peters <clayton.peters@man.com>
* *: Cut 2.29.0-rc.0
Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>
* VERSION: bump to 2.29.0-rc.0
Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>
* Remove experimental wording on size-based retention
Followup of #9004
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* Fix PR reference in changelog
Signed-off-by: George Brighton <george@gebn.co.uk>
* Describe EC2 availability zone IDs at most once per refresh (#9142)
Signed-off-by: George Brighton <george@gebn.co.uk>
* Describe EC2 availability zones at most once per SD load
Closes #9142.
Signed-off-by: George Brighton <george@gebn.co.uk>
* Incorporate feedback
Signed-off-by: George Brighton <george@gebn.co.uk>
* Integrate feedback
Signed-off-by: George Brighton <george@gebn.co.uk>
* Add a compatibility note for macOS users.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* *: Cut v2.29.0-rc.1
Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>
* Fix `kuma_sd` targetgroup reporting (#9157)
* Bundle all xDS targets into a single group
Signed-off-by: austin ce <austin.cawley@gmail.com>
* *: cut v2.29.0-rc.2
Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>
* Rename links
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* bump codemirror-promql to 0.17.0
Signed-off-by: Augustin Husson <husson.augustin@gmail.com>
* *: cut v2.29.0
Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>
* tsdb: align atomically accessed int64 (#9192)
This prevents a panic in 32-bit archs:
https://pkg.go.dev/sync/atomic#pkg-note-BUG
Fixed #9190
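
For illustration, a small sketch of the usual remedy, assuming a
hypothetical struct `counters` (not a type from the tsdb package). The
sync/atomic notes state that the first word of an allocated struct can
be relied upon to be 64-bit aligned, so the atomically accessed int64 is
placed first:

```go
package main

import (
	"fmt"
	"sync/atomic"
	"unsafe"
)

// counters is a hypothetical struct. The int64 that is accessed
// atomically comes first, so its offset within the struct is 0.
type counters struct {
	samples int64 // accessed with sync/atomic; keep as the first field
	ready   bool
	name    string
}

// addSamples atomically adds n and returns the new value.
func (c *counters) addSamples(n int64) int64 {
	return atomic.AddInt64(&c.samples, n)
}

// samplesOffset returns the byte offset of samples inside counters.
func samplesOffset() uintptr {
	var c counters
	return unsafe.Offsetof(c.samples)
}

func main() {
	c := &counters{name: "head"}
	c.addSamples(2)
	c.addSamples(3)
	fmt.Println(atomic.LoadInt64(&c.samples), samplesOffset()) // 5 0
}
```

On Go 1.19 and later, the `atomic.Int64` type is aligned automatically
and avoids the manual field ordering.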
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* Release 2.29.1 (#9193)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
Co-authored-by: Clayton Peters <clayton.peters@man.com>
Co-authored-by: Frederic Branczyk <fbranczyk@gmail.com>
Co-authored-by: George Brighton <george@gebn.co.uk>
Co-authored-by: Austin Cawley-Edwards <austin.cawley@gmail.com>
Co-authored-by: Levi Harrison <git@leviharrison.dev>
Co-authored-by: Augustin Husson <husson.augustin@gmail.com>
* optimize Linode SD by polling for event changes during refresh
Most accounts are fairly "static", in the sense that they're not cycling
through instances constantly. So rather than do a full refresh every
interval and potentially make several behind-the-scenes paginated API
calls, this will now poll the `/account/events/` endpoint every minute
with a list of events that we care about. If a matching event is found,
we then do a full refresh.
Co-authored-by: William Smith <wsmith@linode.com>
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
Signed-off-by: William Smith <wsmith@linode.com>
* Fix: Use json.Unmarshal() instead of json.Decoder
See https://ahmet.im/blog/golang-json-decoder-pitfalls/
json.Decoder is for JSON streams, not single JSON objects / bodies.
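
A short sketch of the pitfall, using a made-up `target` type: the
decoder stops after the first JSON value and silently ignores trailing
data, while json.Unmarshal rejects the same input.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// target is a hypothetical payload type used only for this example.
type target struct {
	Name string `json:"name"`
}

// decodeWithDecoder parses only the first JSON value in body and ignores
// whatever follows it.
func decodeWithDecoder(body string) (target, error) {
	var v target
	err := json.NewDecoder(strings.NewReader(body)).Decode(&v)
	return v, err
}

// decodeWithUnmarshal requires body to be exactly one JSON value.
func decodeWithUnmarshal(body string) (target, error) {
	var v target
	err := json.Unmarshal([]byte(body), &v)
	return v, err
}

func main() {
	body := `{"name":"a"} trailing garbage`

	_, err := decodeWithDecoder(body)
	fmt.Println("Decoder error:", err) // no error is reported

	_, err = decodeWithUnmarshal(body)
	fmt.Println("Unmarshal error:", err) // the trailing data is rejected
}
```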
Signed-off-by: Julius Volz <julius.volz@gmail.com>
* Revert modifications to targetgroup parsing
Signed-off-by: Julius Volz <julius.volz@gmail.com>
prometheus_sd_discovered_targets is wrongly calculated when there are
multiple SD configurations in place. One discovery manager can have
multiple groups coming from multiple service discoveries.
When multiple service discovery configs are used, we do not compute the
metric correctly, and instead just set the metric to one of the service
discoveries.
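
The difference can be sketched as follows. The `provider` type and both
functions are hypothetical simplifications, not the discovery manager's
actual data structures:

```go
package main

import "fmt"

// provider is a hypothetical record: one service discovery instance
// feeding a scrape config, with the number of items it discovered.
type provider struct {
	Config string
	Count  int
}

// lastWins mirrors the described bug: every provider overwrites the
// value stored for its config, so only one provider is reflected.
func lastWins(ps []provider) map[string]int {
	m := map[string]int{}
	for _, p := range ps {
		m[p.Config] = p.Count
	}
	return m
}

// summed accumulates the counts of all providers of the same config.
func summed(ps []provider) map[string]int {
	m := map[string]int{}
	for _, p := range ps {
		m[p.Config] += p.Count
	}
	return m
}

func main() {
	ps := []provider{{Config: "job", Count: 3}, {Config: "job", Count: 5}}
	fmt.Println(lastWins(ps)["job"], summed(ps)["job"]) // 5 8
}
```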
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
This makes it clear that the dockerswarm package covers not only Docker
Swarm but also plain Docker.
I have picked moby as it is the upstream name: https://mobyproject.org/
There is no user-facing change, except in the case of a bad
configuration. Previously, a user who would have a bad docker sd config
would see an error like:
> field xx not found in type dockerswarm.plain
Now that error would be turned into:
> field xx not found in type moby.plain
While not perfect, it should at least no longer cause confusion between
docker and dockerswarm.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
Prometheus can read secrets from files. This adds that ability to the
Scaleway service discovery.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
This PR introduces support for follow_redirects, so that users can
disable following HTTP redirects.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
The label `__meta_digitalocean_image` exposes the `slug` of the image, and
the `slug` is only present for public images.
To refer to a user-generated image (`snapshot` or `custom`) we can use
the image's display name.
See: https://developers.digitalocean.com/documentation/v2/#images
Signed-off-by: Matteo Valentini <matteo.valentini@nethesis.it>
Last change in 4efca5a introduced a problem where NewDiscovery would
just return a nil value, which is not handled well and didn't allow for
fixing configuration issues at runtime without a reload.
Signed-off-by: Alfred Krohmer <alfred.krohmer@logmein.com>
This also caches credentials that are obtained e.g. via IRSA on AWS EKS.
Previously, every refresh cycle would request the credentials again.
Signed-off-by: Alfred Krohmer <alfred.krohmer@logmein.com>
A label selector can be
"set-based" (https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#set-based-requirement),
but such a selector makes Prometheus fail at startup with an error like
"unexpected error: parsing YAML file ...: invalid selector: 'foo in (bar,baz)';
can't understand 'baz)'".
This is caused by the `fields.ParseSelector(string)` function, which
simply splits an expression as a CSV list, so a comma inside the
parentheses confuses the parser and leads to the error.
Use `labels.Parse(string)` instead, which uses a proper lexer to parse a
selector expression.
Closes #8284.
Signed-off-by: Alexey Shumkin <Alex.Crezoff@gmail.com>
* Testify: move to require
Moving testify to require to fail tests early in case of errors.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* More moves
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* Refactor test assertions
This pull request gets rid of assert.True where possible to use
fine-grained assertions.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
Java timestamps are causing issues when unmarshalling with a 32-bit
Prometheus build. It appears that we do not use those fields, so let's remove
them.
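
A sketch of the failure mode, assuming the timestamps are epoch
milliseconds and using int32 to imitate what Go's `int` is on a 32-bit
build. The struct and field names are made up for this example:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// narrow uses int32 to imitate the width of int on a 32-bit build.
type narrow struct {
	Timestamp int32 `json:"timestamp"`
}

// withoutField omits the unused field; encoding/json ignores JSON keys
// that have no matching struct field.
type withoutField struct {
	Name string `json:"name"`
}

// decodeNarrow fails when the timestamp does not fit into 32 bits.
func decodeNarrow(data string) error {
	var v narrow
	return json.Unmarshal([]byte(data), &v)
}

// decodeWithoutField never looks at the timestamp at all.
func decodeWithoutField(data string) (string, error) {
	var v withoutField
	err := json.Unmarshal([]byte(data), &v)
	return v.Name, err
}

func main() {
	data := `{"name":"app","timestamp":1600000000000}`
	fmt.Println("narrow field fails:", decodeNarrow(data) != nil)
	name, err := decodeWithoutField(data)
	fmt.Println(name, err)
}
```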
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* Fix Hetzner Robot SD trying to decode response when a non 2xx HTTP code was returned
Signed-off-by: Lukas Kämmerling <lukas.kaemmerling@hetzner-cloud.de>
This also fixes a bug in query_log_file, which now is relative to the config file like all other paths.
Signed-off-by: Andy Bursavich <abursavich@gmail.com>
* discovery: check for nil triton_sd_config
Note: this was discovered thanks to the added test.
The test is pretty low-level but also effective.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* Implement go leak test for promql
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* Implement go leak test for Consul SD
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* Implement go leak test in discovery manager
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
We were missing testing on the behaviour of the configuration
unmarshalling.
This PR adds a refresh command that can be used to test that we
use the correct refresh function.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
Since we depend on Go 1.14 now, we can use T.Cleanup:
https://golang.org/pkg/testing/#T.Cleanup
This provides a nicer approach to shut down the test server.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* OpenStack SD: Add availability config option, to choose endpoint type
In some environments Prometheus must query OpenStack via an alternative
endpoint type (gophercloud calls this `availability`).
This commit implements this option.
Co-Authored-By: Dennis Kuhn <d.kuhn@syseleven.de>
Signed-off-by: Steffen Neubauer <s.neubauer@syseleven.de>
Using ?wait=10m gives results as fast as usual when data is changing
but performs far fewer requests when services do not change.
On large infrastructures, this considerably reduces the number of
queries per second on Consul servers while keeping the results just as
fresh.
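
As an illustration of what such a request looks like on the wire, here
is a sketch that builds a Consul blocking query URL by hand. Consul's
HTTP API takes the last seen index in `index` and the maximum blocking
time in `wait`; the helper `blockingQueryURL` and the address are
assumptions for the example, and real code would more likely go through
the Consul client library:

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// blockingQueryURL adds the blocking query parameters to base. The
// server answers as soon as the data changes past index, or once the
// wait time has elapsed.
func blockingQueryURL(base string, index uint64, wait string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	q := u.Query()
	q.Set("index", strconv.FormatUint(index, 10))
	q.Set("wait", wait)
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	s, err := blockingQueryURL("http://localhost:8500/v1/catalog/services", 42, "10m")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s) // http://localhost:8500/v1/catalog/services?index=42&wait=10m
}
```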
Signed-off-by: Pierre Souchay <p.souchay@criteo.com>
Previously, tests stopped reading results prematurely: they stopped once
`max` items had been received from the channel, rather than once `max`
unique target groups had been received.
This caused flaky tests where the same target group was received
multiple times, as Kubernetes informers may emit the same event multiple
times.
Before this patch, running this test repeatedly eventually failed. After
this patch I have run the test many thousands of times without failure.
```bash
go test -run TestEndpointsDiscoveryNamespaces -count 1000 -test.v
```
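
The idea can be sketched like this, with a simplified `group` type and a
hypothetical `collectUnique` helper rather than the test utilities of
the Kubernetes SD package. Groups are keyed by source, so a repeated
event replaces the earlier entry instead of counting twice:

```go
package main

import (
	"fmt"
	"time"
)

// group is a hypothetical, simplified target group identified by Source.
type group struct {
	Source string
}

// collectUnique reads from ch until it has seen max distinct sources,
// the channel is closed, or the timeout expires.
func collectUnique(ch <-chan []group, max int, timeout time.Duration) map[string]group {
	seen := make(map[string]group)
	deadline := time.After(timeout)
	for len(seen) < max {
		select {
		case gs, ok := <-ch:
			if !ok {
				return seen
			}
			for _, g := range gs {
				seen[g.Source] = g
			}
		case <-deadline:
			return seen
		}
	}
	return seen
}

func main() {
	ch := make(chan []group, 3)
	ch <- []group{{Source: "a"}}
	ch <- []group{{Source: "a"}} // duplicate event for the same group
	ch <- []group{{Source: "b"}}
	close(ch)
	fmt.Println(len(collectUnique(ch, 2, time.Second))) // 2
}
```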
Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>
Added optional configuration item role, defaults to 'container' (backwards-compatible).
Setting role to 'cn' will discover compute nodes instead.
Human-friendly compute node hostname discovery depends on cmon 1.7.0:
c1a2aeca36
Adjust testcases to use discovery config per case as two different types are now supported.
Updated documentation:
* new role setting
* clarify what the name 'container' covers as triton uses different names in different locations
Signed-off-by: jzinkweg <jzinkweg@gmail.com>
Add extra meta labels which will be useful when
Prometheus discovers hypervisors.
Signed-off-by: pzqu <pzqu@qq.com>
Co-authored-by: pzqu <pzqu@example.com>
We can assume that not all target groups are nil in normal scenarios,
so we can create targets[poolKey] outside the loop.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
The Kubernetes client records workqueue duration and latency metrics as
seconds so there's no need to convert the values from microseconds to
seconds anymore.
The cache metrics (prometheus_sd_kubernetes_cache_*) are removed because
they aren't used anymore by the client though still exposed by its API.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* adding additional unit tests for getDataCenter() in consul
Signed-off-by: Jean-Baptiste Le Duigou <jb.leduigou@gmail.com>
* Consul Tests : update comments to start with uppercase and end with a period
Signed-off-by: Jean-Baptiste Le Duigou <jb.leduigou@gmail.com>
* Consul Test : using table-driven tests
Signed-off-by: Jean-Baptiste Le Duigou <jb.leduigou@gmail.com>
* Consul Test : cleaner syntax
Signed-off-by: Jean-Baptiste Le Duigou <jb.leduigou@gmail.com>
* Consul Test : even cleaner syntax
Signed-off-by: Jean-Baptiste Le Duigou <jb.leduigou@gmail.com>
* Consul Test : update comments
Signed-off-by: Jean-Baptiste Le Duigou <jb.leduigou@gmail.com>
* Fixing naming convention by removing underscore in function name
Signed-off-by: Jean-Baptiste Le Duigou <jb.leduigou@gmail.com>
* Removing duplicated test case for getDatacenter()
Signed-off-by: Jean-Baptiste Le Duigou <jb.leduigou@gmail.com>
* adding unit test for target group
Signed-off-by: Jean-Baptiste Le Duigou <jb.leduigou@gmail.com>
* Improve unit tests for target group
Signed-off-by: Jean-Baptiste Le Duigou <jb.leduigou@gmail.com>
* Fix imports
Signed-off-by: Jean-Baptiste Le Duigou <jb.leduigou@gmail.com>
* Improve test by asserting on whole Target Group object
Signed-off-by: Jean-Baptiste Le Duigou <jb.leduigou@gmail.com>
- Use testutil.ToFloat64 to collect testing metrics
- Declare ServiceDiscoveryConfig directly instead of calling Unmarshal on a piece of YAML
Signed-off-by: Nevill <nevill.dutt@gmail.com>
* Update go.mod dependencies before release
Signed-off-by: Julius Volz <julius.volz@gmail.com>
* Add issue for showing query warnings in promtool
Signed-off-by: Julius Volz <julius.volz@gmail.com>
* Revert json-iterator back to 1.1.6
It produced errors when marshaling Point values with special float
values.
Signed-off-by: Julius Volz <julius.volz@gmail.com>
* Fix expected step values in promtool tests after client_golang update
Signed-off-by: Julius Volz <julius.volz@gmail.com>
* Update generated protobuf code after proto dep updates
Signed-off-by: Julius Volz <julius.volz@gmail.com>
With the next release of client_golang, Summaries will not have
objectives by default. To not lose the objectives we have right now,
explicitly state the current default objectives.
Signed-off-by: beorn7 <beorn@grafana.com>
From the documentation:
> The default HTTP client's Transport may not
> reuse HTTP/1.x "keep-alive" TCP connections if the Body is
> not read to completion and closed.
This effectively enables keep-alive for the fixed requests.
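
A sketch of the pattern, with hypothetical helper names. The fake
transport and the tracked body exist only so that the example runs
without network access and can show that the body was read to EOF and
closed:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

// drainAndClose reads body to completion and closes it, which is what
// the transport needs before it can reuse the connection.
func drainAndClose(body io.ReadCloser) error {
	_, err := io.Copy(io.Discard, body)
	if cerr := body.Close(); err == nil {
		err = cerr
	}
	return err
}

// statusOf only cares about the status code but still drains the body.
func statusOf(client *http.Client, url string) (int, error) {
	resp, err := client.Get(url)
	if err != nil {
		return 0, err
	}
	if err := drainAndClose(resp.Body); err != nil {
		return 0, err
	}
	return resp.StatusCode, nil
}

// trackedBody is a test double that records how it was used.
type trackedBody struct {
	io.Reader
	sawEOF, closed bool
}

func (b *trackedBody) Read(p []byte) (int, error) {
	n, err := b.Reader.Read(p)
	if err == io.EOF {
		b.sawEOF = true
	}
	return n, err
}

func (b *trackedBody) Close() error { b.closed = true; return nil }

// fakeTransport answers every request with a 200 and the given body.
type fakeTransport struct{ body *trackedBody }

func (f fakeTransport) RoundTrip(*http.Request) (*http.Response, error) {
	return &http.Response{StatusCode: http.StatusOK, Body: f.body}, nil
}

// demo reports the status code and whether the body was drained and closed.
func demo() (int, bool, error) {
	body := &trackedBody{Reader: strings.NewReader("payload the caller ignores")}
	client := &http.Client{Transport: fakeTransport{body: body}}
	code, err := statusOf(client, "http://example.invalid/")
	return code, body.sawEOF && body.closed, err
}

func main() {
	fmt.Println(demo()) // 200 true <nil>
}
```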
Signed-off-by: Romain Baugue <romain.baugue@elwinar.com>
Add extra meta labels which will be useful when
Prometheus discovers instances from all projects.
Signed-off-by: Kien Nguyen <kiennt2609@gmail.com>
i) Uses the more idiomatic Wrap and Wrapf methods for creating nested errors.
ii) Fixes some incorrect usages of fmt.Errorf where the error messages don't have any formatting directives.
iii) Does away with the use of fmt package for errors in favour of pkg/errors
Signed-off-by: tariqibrahim <tariq181290@gmail.com>
* discovery: factorize for SD based on refresh
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* discovery: use common metrics for refresh
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
See,
$ codespell -S './vendor/*,./.git*,./web/ui/static/vendor*' --ignore-words-list="uint,dur,ue,iff,te,wan"
Signed-off-by: Mario Trangoni <mjtrangoni@gmail.com>
i) Increased the size of the Service Discovery Readme title
ii) Changed `TargetGroups` to "target groups" as it has been relocated and renamed to another package.
Signed-off-by: tariqibrahim <tariq181290@gmail.com>
Although these are only spelling mistakes, they might affect
readability.
Co-Authored-By: Kim Bao Long <longkb@vn.fujitsu.com>
Signed-off-by: Nguyen Hai Truong <truongnh@vn.fujitsu.com>
* discovery/kubernetes: fix support for password_file
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Create and pass custom RoundTripper to Kubernetes client
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Use inline HTTPClientConfig
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
https://github.com/miekg/dns/pull/815 goes into the detail, but more or
less the existing solution was no longer supported and needed to be
rewritten to support the new versions of the library. miekg additionally
claims this is more correct in the ticket.
Signed-off-by: Erik Hollensbe <github@hollensbe.org>
* *: use latest release of staticcheck
It also fixes a couple of things in the code flagged by the additional
checks.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Use official release of staticcheck
Also run 'go list' before staticcheck to avoid failures when downloading packages.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* vendor update
* discovery/gce: oauth2.NoContext is deprecated, replace with context.Background()
Signed-off-by: Erik Hollensbe <github@hollensbe.org>
* discovery: send empty group on blank SD config
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Update comments
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Add another comment
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* add logic to check if an azure VM is deallocated or not
* update documentation with the new azure power state label
Signed-off-by: tariqibrahim <tariq.ibrahim@microsoft.com>
* Adding private_dns_name to the list of ec2 labels which can be used in node naming for dynamic environments
Signed-off-by: Serghei Anicheev <serghei@rentalcover.com>
* discovery/azure: fail hard when client_id/client_secret is empty
Signed-off-by: mengnan <supernan1994@gmail.com>
* discovery/azure: fail hard when authentication parameters are missing
Signed-off-by: mengnan <supernan1994@gmail.com>
* add unit test
Signed-off-by: mengnan <supernan1994@gmail.com>
* add unit test
Signed-off-by: mengnan <supernan1994@gmail.com>
* format code
Signed-off-by: mengnan <supernan1994@gmail.com>
Fixes #4855 - ServicePort was wrongly used to construct an address to endpoints
defined in portMappings. This was changed to HostPort. Support for obtaining
auto-generated host ports was also added.
Signed-off-by: Timo Beckers <timo@incline.eu>
Currently Prometheus requests show up with a UA of Go-http-client/1.1
which isn't super helpful. Though the X-Prometheus-Remote-* headers
exist they need to be explicitly configured when logging the request in
order to be able to deduce this is a request originating from
Prometheus. By setting the header we remove this ambiguity and make
default server logs just a bit more useful.
This also updates a few other places to consistently capitalize the 'P'
in the user agent, as well as ensure we set a UA to begin with.
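
One common way to set the header on every request is a wrapping
RoundTripper. The sketch below is not the code of this change; the type
names and the version string are placeholders, and the recording
transport stands in for a real server:

```go
package main

import (
	"fmt"
	"net/http"
)

// userAgentRoundTripper sets a fixed User-Agent on every outgoing request.
type userAgentRoundTripper struct {
	agent string
	next  http.RoundTripper
}

func (rt userAgentRoundTripper) RoundTrip(req *http.Request) (*http.Response, error) {
	// A RoundTripper must not modify the caller's request, so use a clone.
	req = req.Clone(req.Context())
	req.Header.Set("User-Agent", rt.agent)
	return rt.next.RoundTrip(req)
}

// recorder is a stand-in transport that remembers the User-Agent it saw.
type recorder struct{ seen string }

func (r *recorder) RoundTrip(req *http.Request) (*http.Response, error) {
	r.seen = req.Header.Get("User-Agent")
	return &http.Response{StatusCode: http.StatusOK, Body: http.NoBody}, nil
}

// seenUserAgent sends one request through the wrapper and returns the
// header value that reached the underlying transport.
func seenUserAgent(agent string) (string, error) {
	rec := &recorder{}
	client := &http.Client{Transport: userAgentRoundTripper{agent: agent, next: rec}}
	resp, err := client.Get("http://example.invalid/")
	if err != nil {
		return "", err
	}
	resp.Body.Close()
	return rec.seen, nil
}

func main() {
	ua, err := seenUserAgent("Prometheus/0.0.0-example")
	fmt.Println(ua, err) // Prometheus/0.0.0-example <nil>
}
```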
Signed-off-by: Daniele Sluijters <daenney@users.noreply.github.com>
Set __meta_ec2_platform label with the instance platform string. Set to 'windows' on Windows servers and absent otherwise.
Signed-off-by: Silvio Gissi <silvio@gissilabs.com>
By default, OpenStack SD only queries for instances
from specified project. To discover instances from other
projects, users have to add more openstack_sd_configs for
each project.
This patch adds `all_tenants` <bool> options to
openstack_sd_configs. For example:
    - job_name: 'openstack_all_instances'
      openstack_sd_configs:
        - role: instance
          region: RegionOne
          identity_endpoint: http://<identity_server>/identity/v3
          username: <username>
          password: <super_secret_password>
          domain_name: Default
          all_tenants: true
Co-authored-by: Kien Nguyen <kiennt2609@gmail.com>
Signed-off-by: dmatosl <danielmatos.lima@gmail.com>
* *: move to go 1.11
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Reduce number of places where we specify the Go version
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
Additionally, add triton groups metadata to the discovery response
and correct a documentation error regarding the triton server id
metadata.
Signed-off-by: Richard Kiene <richard.kiene@joyent.com>
Commit 1c89984 introduced the ability to expose the owner of the instance.
However, this breaks Prometheus if there is no OwnerID in the reservation (e.g. if you are using a private EC2 API, as introduced by #4333).
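
The guard amounts to a nil check before dereferencing. A sketch with a
trimmed-down, hypothetical `reservation` type in which optional fields
are pointers; the label name is used for illustration:

```go
package main

import "fmt"

// reservation is a hypothetical stand-in for an EC2 reservation in which
// optional fields are pointers and may be nil.
type reservation struct {
	OwnerID *string
}

// ownerLabels sets the owner label only when OwnerID is present, instead
// of dereferencing a nil pointer.
func ownerLabels(r reservation) map[string]string {
	labels := map[string]string{}
	if r.OwnerID != nil {
		labels["__meta_ec2_owner_id"] = *r.OwnerID
	}
	return labels
}

func main() {
	id := "123456789012"
	fmt.Println(ownerLabels(reservation{OwnerID: &id}))
	fmt.Println(len(ownerLabels(reservation{}))) // 0: no owner, no label, no panic
}
```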
Signed-off-by: Jannick Fahlbusch <git@jf-projects.de>