The scrape manager's receiver channel now just saves the target sets,
and a separate background runner updates the scrape loops every 5 seconds.
This way the scrape manager doesn't block the receiving channel
while it does the long background reloading of the scrape loops.
Active and dropped targets are now saved in each scrape pool instead of
the scrape manager. This is mainly to avoid races when getting the
targets via the web API.
Reloading the scrape loops now happens in parallel, which reaches the
final desired state faster and also speeds up Prometheus's shutdown.
Also updated some func signatures in the web package for consistency.
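A minimal sketch of the decoupled update path described above (type and method names are illustrative, not the actual Prometheus identifiers):

```go
import (
	"sync"
	"time"
)

// targetGroups stands in for the discovery output (map of job name to groups).
type targetGroups map[string][]string

type scrapeManager struct {
	mtx        sync.Mutex
	targetSets targetGroups
	dirty      bool
}

// Run only stores incoming target sets, so the receiving channel is never
// blocked by a slow reload of the scrape loops.
func (m *scrapeManager) Run(tsets <-chan targetGroups) {
	go m.reloader()
	for ts := range tsets {
		m.mtx.Lock()
		m.targetSets = ts
		m.dirty = true
		m.mtx.Unlock()
	}
}

// reloader applies pending updates every 5 seconds.
func (m *scrapeManager) reloader() {
	ticker := time.NewTicker(5 * time.Second)
	defer ticker.Stop()
	for range ticker.C {
		m.mtx.Lock()
		ts, dirty := m.targetSets, m.dirty
		m.dirty = false
		m.mtx.Unlock()
		if dirty {
			m.reload(ts)
		}
	}
}

// reload syncs every scrape pool in parallel to reach the desired state faster.
func (m *scrapeManager) reload(ts targetGroups) {
	var wg sync.WaitGroup
	for name, groups := range ts {
		wg.Add(1)
		go func(name string, groups []string) {
			defer wg.Done()
			_, _ = name, groups // sync the corresponding scrape pool here
		}(name, groups)
	}
	wg.Wait()
}
```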
Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
* web: fix asset paths for Windows platforms
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* web: add tests
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Limit the number of samples remote read can return.
- Return 413 Entity Too Large.
- The limit can be set by a flag. 0 means no limit.
- Include the limit in the error message.
- Set the default limit to 50M samples (* 16 bytes = 800MB).
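A rough sketch of the limit check described in the bullets above (identifiers are illustrative, not the actual remote read code):

```go
import (
	"fmt"
	"net/http"
)

// Default of 50M samples; at ~16 bytes per sample that is roughly 800MB.
const defaultRemoteReadSampleLimit = 5e7

// checkSampleLimit returns an error when the limit is exceeded; 0 disables it.
func checkSampleLimit(numSamples, limit int) error {
	if limit > 0 && numSamples > limit {
		return fmt.Errorf("exceeded sample limit (%d) while serving remote read: would return %d samples", limit, numSamples)
	}
	return nil
}

// In the handler, a failed check maps to HTTP 413.
func writeLimitError(w http.ResponseWriter, err error) {
	http.Error(w, err.Error(), http.StatusRequestEntityTooLarge)
}
```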
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
When Prometheus 2.0 came out, the storage querier interface was consolidated
into a single Select() method. This makes it impossible for the
implementer of the querier to know whether it is being called for metadata
or for actual data. The workaround has been to check whether the SelectParams
are nil, and the federation call always passes nil. This has 2 negative
consequences: (1) remote implementations interpret this as a metadata
call, which makes the federation endpoint return nothing; (2) storage
implementations don't get the same information passed
down to them as far as SelectParams goes.
This diff simply adds SelectParams to the Select() call in the
federation handler.
Mitigation for #4057
Signed-off-by: Thomas Jackson <jacksontj.89@gmail.com>
Looking at https://tech.townsourced.com/post/embedding-static-files-in-go/ (which was mentioned in the issue), vfsgen has all the needed features.
In particular:
- Reproducible builds (no issue with timestamping).
- Well maintained and relatively popular.
- Integration with go generate.
- Self-contained (no external dependency).
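For reference, a minimal vfsgen generator invoked via go generate looks roughly like this (directory, package, and file names are illustrative rather than the exact Prometheus setup):

```go
//go:build ignore

package main

import (
	"log"
	"net/http"

	"github.com/shurcooL/vfsgen"
)

func main() {
	// Embed everything under ./static into a generated Go file; this program
	// would be run from a go:generate directive in the ui package.
	err := vfsgen.Generate(http.Dir("static"), vfsgen.Options{
		PackageName:  "ui",
		BuildTags:    "!dev",
		VariableName: "Assets",
		Filename:     "assets_vfsdata.go",
	})
	if err != nil {
		log.Fatal(err)
	}
}
```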
* [WIP] Replace go-bindata by vfsgen
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Add license + remove doc.go
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Generate templates assets
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Use new templates assets
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* split static assets
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Idempotent make assets
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Update vendor/
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* vendor vfsgendev
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Update README.md
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Simplify assets generation
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Fix README.md
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Use generate helper program instead of vfsgen
This avoids installing vfsgendev in the target environment.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Remove unused vfsgen package
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Fix Makefile
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* vendoring shurcooL/vfsgen
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Fix go generate command
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Sync web/ui/assets_vfsdata.go
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
There are many more (mostly finalizers like Close/Stop/etc.), but most of
the others seemed like one couldn't do much about them anyway.
Signed-off-by: Julius Volz <julius.volz@gmail.com>
* adding information about the health and errors for Rules
adding Health() and LastError() to the Rule interface. This will allow
us to easily surface information about rules.
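A sketch of the interface additions (simplified; the real Rule interface has more methods):

```go
// RuleHealth describes the health state of a rule.
type RuleHealth string

const (
	HealthUnknown RuleHealth = "unknown"
	HealthGood    RuleHealth = "ok"
	HealthBad     RuleHealth = "err"
)

type Rule interface {
	Name() string
	// Health reports the current health of the rule.
	Health() RuleHealth
	// LastError returns the error encountered during the last evaluation, if any.
	LastError() error
}
```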
Signed-off-by: noqcks <benny@noqcks.io>
* updating rules.html with fields for Rule errors and health state
Signed-off-by: noqcks <benny@noqcks.io>
* fix code comment grammar & access Rule health/error info using a mutex
Signed-off-by: noqcks <benny@noqcks.io>
* s/Errors/Error/ in rules.html to remain consistent with targets.html
Signed-off-by: noqcks <benny@noqcks.io>
* adding periods to code comments in reporting/alerting
Signed-off-by: noqcks <benny@noqcks.io>
* putting health/error below mutex in struct field
Signed-off-by: noqcks <benny@noqcks.io>
It was added 5 years ago by Matt and I'm not sure anyone ever used
it after the public release (since we have /debug/pprof/heap as well).
It also lacked error checking and allowed people to write to disk over HTTP.
Signed-off-by: Julius Volz <julius.volz@gmail.com>
* Allow for BufferedSeriesIterator instances to be created without an underlying iterator, to simplify their usage.
Signed-off-by: Alin Sinpalean <alin.sinpalean@gmail.com>
* Add Start/End to SelectParams
* Make remote read use the new selectParams for start/end
This commit continues sending the start/end time of the remote read
query as the overarching PromQL time, while the specific range of data that
the query is interested in receiving a response for is now part of the
ReadHints (upstream discussion in #4226).
* Remove unused vendored code
The genproto.sh script was updated, but the code wasn't regenerated.
This simply removes the vendored deps that are no longer part of the
codegen output.
Signed-off-by: Thomas Jackson <jacksontj.89@gmail.com>
This adds a per-target cache of scraped metadata. The metadata is only
available for the lifecycle of the attached target. An API endpoint allows
selecting metadata by metric name and by a label selection of targets.
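A rough sketch of such a per-target cache keyed by metric family name (types are illustrative, not the actual scrape package):

```go
import "sync"

type MetricMetadata struct {
	Metric string
	Type   string // counter, gauge, histogram, summary, untyped
	Help   string
	Unit   string
}

// metadataCache lives and dies with its target.
type metadataCache struct {
	mtx  sync.RWMutex
	data map[string]MetricMetadata
}

func (c *metadataCache) set(md MetricMetadata) {
	c.mtx.Lock()
	defer c.mtx.Unlock()
	if c.data == nil {
		c.data = map[string]MetricMetadata{}
	}
	c.data[md.Metric] = md
}

// get is what the API endpoint would call after selecting targets by label.
func (c *metadataCache) get(metric string) (MetricMetadata, bool) {
	c.mtx.RLock()
	defer c.mtx.RUnlock()
	md, ok := c.data[metric]
	return md, ok
}
```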
Signed-off-by: Fabian Reinartz <freinartz@google.com>
Displaying all the dropped targets on the service-discovery page hurts
the Prometheus server as well as the browser when thousands of dropped
targets exist. This change limits this number to 1,000 and displays the
number of active/total targets per scrape configuration.
Add warning when more than 100 targets are dropped
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Move range logic to 'eval'
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Make aggregate range aware
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* PromQL is statically typed, so don't eval to find the type.
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Extend rangewrapper to multiple exprs
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Start making function evaluation ranged
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Make instant queries a special case of range queries
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Eliminate evalString
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Evaluate range vector functions one series at a time
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Make unary operators range aware
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Make binops range aware
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Pass time to range-aware functions.
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Make simple _over_time functions range aware
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Reduce allocs when working with matrix selectors
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Add basic benchmark for range evaluation
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Reuse objects for function arguments
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Do dropmetricname and allocating output vector only once.
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Add range-aware support for range vector functions with params
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Optimise holt_winters, cut cpu and allocs by ~25%
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Make rate&friends range aware
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Make more functions range aware. Document calling convention.
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Make date functions range aware
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Make simple math functions range aware
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Convert more functions to be range aware
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Make more functions range aware
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Specialcase timestamp() with vector selector arg for range awareness
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Remove transition code for functions
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Remove the rest of the engine transition code
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Remove more obsolete code
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Remove the last uses of the eval* functions
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Remove engine finalizers to prevent corruption
The finalizers set by matrixSelector were being called
just before the value they were returning to the pool
was provided to the caller. Thus a concurrent query
could corrupt the data that the user had just been returned.
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Add new benchmark suite for range functions
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Migrate existing benchmarks to new system
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Expand promql benchmarks
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Simplify test by removing unused range code
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* When testing instant queries, check range queries too.
To protect against subsequent steps in a range query being
affected by the previous steps, add a test that evaluates
an instant query that we know works, again as a range query,
with the timestamp we care about not being the first step.
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Reuse ring for matrix iters. Put query results back in pool.
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Reuse buffer when iterating over matrix selectors
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Unary minus should remove metric name
Cut down benchmarks for faster runs.
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Reduce repetition in benchmark test cases
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Work series by series when doing normal vectorSelectors
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Optimise benchmark setup, cuts time by 60%
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Have rangeWrapper use an evalNodeHelper to cache across steps
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Use evalNodeHelper with functions
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Cache dropMetricName within a node evaluation.
This saves both the calculations and allocs done by dropMetricName
across steps.
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Reuse input vectors in rangewrapper
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Reuse the point slices in the matrixes input/output by rangeWrapper
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Make benchmark setup faster using AddFast
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Simplify benchmark code.
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Add caching in VectorBinop
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Use xor to have one-level resultMetric hash key
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Add more benchmarks
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Call Query.Close in apiv1
This allows point slices allocated for the response data
to be reused by later queries, saving allocations.
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Optimise histogram_quantile
It's now 5-10% faster with 97% less garbage generated for 1k steps
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Make the input collection in rangeVector linear rather than quadratic
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Optimise label_replace, for 1k steps 15x fewer allocs and 3x faster
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Optimise label_join, 1.8x faster and 11x less memory for 1k steps
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Expand benchmarks, cleanup comments, simplify numSteps logic.
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Address Fabian's comments
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Comments from Alin.
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Address jrv's comments
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Remove dead code
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Address Simon's comments.
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Rename populateIterators, pre-init some sizes
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Handle case where function has non-matrix args first
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Split rangeWrapper out to rangeEval function, improve comments
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Cleanup and make things more consistent
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Make EvalNodeHelper public
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Fabian's comments.
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
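Taken together, the commits above amount to one evaluation pattern: walk the steps once, appending one point per step to each output series and reusing buffers, instead of running an independent instant query per step. A much-simplified sketch of that shape (not the actual engine code):

```go
type point struct {
	t int64
	v float64
}

type series struct {
	labels string
	points []point
}

// rangeEval calls evalStep once per step and groups the results by series,
// pre-sizing each point slice once instead of allocating per step.
func rangeEval(start, end, interval int64, evalStep func(t int64) map[string]float64) []series {
	numSteps := int((end-start)/interval) + 1
	out := map[string]*series{}
	for t := start; t <= end; t += interval {
		for lbls, v := range evalStep(t) {
			s, ok := out[lbls]
			if !ok {
				s = &series{labels: lbls, points: make([]point, 0, numSteps)}
				out[lbls] = s
			}
			s.points = append(s.points, point{t: t, v: v})
		}
	}
	res := make([]series, 0, len(out))
	for _, s := range out {
		res = append(res, *s)
	}
	return res
}
```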
Fix race by properly locking access to scrape pools. Use a separate mutex for information needed by the UI so that the UI isn't blocked while targets are being updated.
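A sketch of the two-mutex layout this describes (illustrative names, not the actual scrape package):

```go
import "sync"

type scrapePool struct{}
type Target struct{}

type scrapeManager struct {
	mtxScrape  sync.Mutex // guards scrapePools and their (slow) reloads
	mtxTargets sync.Mutex // guards the target lists read by the UI

	scrapePools   map[string]*scrapePool
	activeTargets []*Target
}

// TargetsActive is called by the web UI; it only takes mtxTargets, so it is
// not blocked while scrape pools are being updated under mtxScrape.
func (m *scrapeManager) TargetsActive() []*Target {
	m.mtxTargets.Lock()
	defer m.mtxTargets.Unlock()
	return m.activeTargets
}
```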
* web: replace deprecated InstrumentHandler()
This change replaces the deprecated InstrumentHandler function with the
equivalent functions from the promhttp package.
The following metrics are removed:
* http_request_duration_microseconds (Summary).
* http_request_size_bytes (Summary).
* http_requests_total (Counter).
And the following metrics are added instead:
* prometheus_http_request_duration_seconds (Histogram).
* prometheus_http_response_size_bytes (Histogram).
* promhttp_metric_handler_requests_in_flight (Gauge).
* promhttp_metric_handler_requests_total (Counter).
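Roughly, the new instrumentation wiring looks like this (a sketch; exact bucket values and helper names are illustrative, and the metrics still need to be registered with prometheus.MustRegister):

```go
import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	requestDuration = prometheus.NewHistogramVec(
		prometheus.HistogramOpts{
			Name:    "prometheus_http_request_duration_seconds",
			Help:    "Histogram of latencies for HTTP requests.",
			Buckets: []float64{.1, .2, .4, 1, 3, 8, 20, 60, 120},
		},
		[]string{"handler"},
	)
	responseSize = prometheus.NewHistogramVec(
		prometheus.HistogramOpts{
			Name:    "prometheus_http_response_size_bytes",
			Help:    "Histogram of response sizes for HTTP requests.",
			Buckets: prometheus.ExponentialBuckets(100, 10, 8),
		},
		[]string{"handler"},
	)
)

// instrument replaces the old InstrumentHandler call for a named handler.
func instrument(handlerName string, h http.Handler) http.Handler {
	labels := prometheus.Labels{"handler": handlerName}
	return promhttp.InstrumentHandlerDuration(
		requestDuration.MustCurryWith(labels),
		promhttp.InstrumentHandlerResponseSize(responseSize.MustCurryWith(labels), h),
	)
}
```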
* Update github.com/prometheus/common/route package
* web: refactor using the new prometheus/common/route package
After removing the checkbox in #3913, the only remaining element that
looked like it was the new Show Annotations checkbox on the Alerts page,
which in turn didn't look like the Enable query history checkbox on the
graph page. So:
1. This takes the Enable query history button as canonical.
2. Updates the show annotations button code to match it.
3. Simplifies the JS for the checkbox.
The new Service Discovery page uses the CSS/JS from the Targets page but
uses it slightly differently. This makes the job header on the
Service Discovery page match the Targets page for a more consistent look and feel.
* Added only healthy to Targets
This adds a "Only heathly" button to supplement the "Only unhealthy"
button. The two are mutually exclusive.
I've also added a red/green text color to the buttons.
Arguably this could be a toggle instead if folks think this is
worthwhile... Happy to modify it.
* Moved functions above init
* Simplified code and made it prettier
* Appeased Codacy
* Made buttons square
* Fix JS error: cannot read source of undefined
When the page was refreshed with queries on the page,
the updateTypeaheadMetricsSet function was called before
the typeahead had been initialized.
* Fix: updates URL when query submits
When queries were submitted by pressing enter, the URL did not update
to reflect the change. Not sure why, but this was only the case when
the queries were non-simple, meaning when either labels were specified
or other promql functions were used.
* Rebase master and make assets
This is a very minor UX change. The current "No Alert rules" table row
has the `alert_header` class attached. This changes the cursor
and some other styling, which makes sense for the populated table but less
sense for the unpopulated one, so remove the class in the latter case.
This adds a parameter to the storage selection interface which allows
query engine(s) to pass information about the operations surrounding a
data selection.
This can for example be used by remote storage backends to infer the
correct downsampling aggregates that need to be provided.
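A simplified sketch of the selection interface with the new parameter (field and type names approximate the storage package of that time and should be treated as illustrative):

```go
// SelectParams describes the operations surrounding a data selection.
type SelectParams struct {
	Step int64  // query resolution step in milliseconds, 0 for instant queries
	Func string // surrounding function or aggregation, e.g. "rate" or "avg_over_time"
}

type LabelMatcher struct{ Name, Value string }

type SeriesSet interface {
	Next() bool
}

type Querier interface {
	// Select returns series matching the matchers. A remote storage backend can
	// inspect params to pick an appropriate downsampling aggregate.
	Select(params *SelectParams, matchers ...*LabelMatcher) (SeriesSet, error)
}
```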
When you have no alerting rules defined, you get a screen sharing this
information in the web UI. If no rules are defined, however, you instead see
an empty white screen. This adds a "No rules defined" `else` clause and
a `Rules` header to the page.
* Do not autoselect the first item in the dropdown
* Historical queries only show in dropdown when toggled on
* Move shared behavior to queryHistory.isEnabled function
* Do not auto submit selected history queries
net.Listener converts 0.0.0.0 to :: which fails for hosts where IPv6 is
disabled. This change uses the original listen address parameter instead
of grpcl.Addr().String().
Federation makes use of dedupedSeriesSet to merge SeriesSets for every
query into one output stream. If many match[] arguments are provided,
many dedupedSeriesSet objects will get chained. This has the downside of
causing a potential O(n*k) running time, where n is the number of series
and k the number of match[] arguments.
Meanwhile, the storage package provides a mergeSeriesSet that
accomplishes the same with an O(n*log(k)) running time by making use of
a binary heap. Let's just get rid of dedupedSeriesSet and change all
existing callers to use mergeSeriesSet.
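To illustrate the heap-based merge, here is a stand-alone sketch that merges sorted string slices standing in for series sets (not the actual mergeSeriesSet code):

```go
import "container/heap"

// cursor walks one sorted input.
type cursor struct {
	items []string
	pos   int
}

func (c *cursor) current() string { return c.items[c.pos] }
func (c *cursor) next() bool      { c.pos++; return c.pos < len(c.items) }

// cursorHeap keeps the cursor with the smallest current item on top, so each
// output element costs O(log(k)) instead of O(k) comparisons.
type cursorHeap []*cursor

func (h cursorHeap) Len() int            { return len(h) }
func (h cursorHeap) Less(i, j int) bool  { return h[i].current() < h[j].current() }
func (h cursorHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *cursorHeap) Push(x interface{}) { *h = append(*h, x.(*cursor)) }
func (h *cursorHeap) Pop() interface{} {
	old := *h
	x := old[len(old)-1]
	*h = old[:len(old)-1]
	return x
}

// mergeSorted merges k sorted inputs into one deduplicated sorted output.
func mergeSorted(inputs ...[]string) []string {
	h := &cursorHeap{}
	for _, in := range inputs {
		if len(in) > 0 {
			*h = append(*h, &cursor{items: in})
		}
	}
	heap.Init(h)

	var out []string
	for h.Len() > 0 {
		c := (*h)[0]
		if item := c.current(); len(out) == 0 || out[len(out)-1] != item {
			out = append(out, item)
		}
		if c.next() {
			heap.Fix(h, 0)
		} else {
			heap.Pop(h)
		}
	}
	return out
}
```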
When there is an empty result set, the Prometheus server replies with
{"status":"success","data":{"resultType":"vector","result":null}}
That "null" reply was not handled correctly by the graphing library.
This commit handles that case and shows "no data" in the UI console view
instead of throwing an error in the browser JavaScript console.
Fixes #3515
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
API consumers should be able to get insight into the query run times.
The UI currently measures total roundtrip times. This PR allows for more
fine grained metrics to be exposed.
* adds new timer for total execution time (queue + eval)
* expose new timer, queue timer, and eval timer in stats field of the
range query response:
```json
{
  "status": "success",
  "data": {
    "resultType": "matrix",
    "result": [],
    "stats": {
      "execQueueTimeNs": 4683,
      "execTotalTimeNs": 2086587,
      "totalEvalTimeNs": 2077851
    }
  }
}
```
* stats field is optional, only set when query parameter `stats` is not
empty
Try it via
```sh
curl 'http://localhost:9090/api/v1/query_range?query=up&start=1486480279&end=1486483879&step=14000&stats=true'
```
Review feedback
* moved query stats json generation to query_stats.go
* use seconds for all query timers
* expose all timers available
* Changed ExecTotalTime string representation from Exec queue total time to Exec total time
This PR fixes #3072 by providing POST endpoints for `query` and `query_range`.
POST requests must be made with the `Content-Type: application/x-www-form-urlencoded` header.
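A sketch of why a single handler can serve both methods: http.Request.FormValue reads URL query parameters on GET and the urlencoded body on POST (handler and parameter names are illustrative):

```go
import "net/http"

func queryHandler(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodGet && r.Method != http.MethodPost {
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		return
	}
	// FormValue covers both ?query=... and a urlencoded POST body.
	expr := r.FormValue("query")
	ts := r.FormValue("time")
	_, _ = expr, ts
	// ... evaluate the expression and write the JSON response ...
}
```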
* Add UI warning for time drift >30 seconds
* Yellow time drift warning & better warning message
* Set warning threshold to 30 sec
* Include changed assets
* Re-add contexts to storage.Storage.Querier()
These are needed when replacing the storage by a multi-tenant
implementation where the tenant is stored in the context.
The 1.x query interfaces already had contexts, but they got lost in 2.x.
* Convert promql.Engine to use native contexts
No matter how we refactor docs, `/docs/` will stay the prefix, so there's no long-term risk in changing this.
Once we version docs, we should probably try and keep link & version in sync.
Whenever a route prefix is applied, the router prepends the prefix to
the URL path on the request. For most handlers, this is not an issue
because the request's path is only used for routing and is not actually
needed by the handler itself. However, Prometheus delegates the handling
of the /debug/* endpoints to the http.DefaultServeMux, which has its own
routing logic that depends on url.Path. As a result, whenever a
prefix is applied, the prefixed URL is passed to the DefaultServeMux,
which has no awareness of the prefix and returns a 404.
This change fixes the issue by creating a new serveDebug handler which
routes /debug/* requests to the appropriate net/http/pprof handler,
and by removing the net/http/pprof import in cmd/prometheus since it is no
longer necessary.
Fixes #2183.
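The handler is roughly shaped like this (a sketch; the real code obtains the subpath from the router rather than taking it as an argument):

```go
import (
	"net/http"
	"net/http/pprof"
	"strings"
)

// serveDebug dispatches a /debug request to net/http/pprof. The router supplies
// subpath (the portion after "/debug/"), already free of any external route prefix.
func serveDebug(w http.ResponseWriter, req *http.Request, subpath string) {
	if !strings.HasPrefix(subpath, "pprof/") {
		http.NotFound(w, req)
		return
	}
	name := strings.TrimPrefix(subpath, "pprof/")
	switch name {
	case "cmdline":
		pprof.Cmdline(w, req)
	case "profile":
		pprof.Profile(w, req)
	case "symbol":
		pprof.Symbol(w, req)
	case "trace":
		pprof.Trace(w, req)
	default:
		// Rewrite the path so pprof.Index can resolve named profiles
		// (heap, goroutine, block, ...) without the external prefix.
		req.URL.Path = "/debug/pprof/" + name
		pprof.Index(w, req)
	}
}
```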
This PR adds the `/status/config` endpoint which exposes the currently
loaded Prometheus config. This is the same config that is displayed on
`/config` in the UI in YAML format. The response payload looks like
this:
```
{
  "status": "success",
  "data": {
    "yaml": <CONFIG>
  }
}
```
Issue #3046 is triggered by html/template changes in go1.9.
See https://tip.golang.org/pkg/html/template. Quote:
// To ease migration to Go 1.9 and beyond, "html" and "urlquery" will
// continue to be allowed as the last command in a pipeline. However, if the
// pipeline occurs in an unquoted attribute value context, "html" is
// disallowed. Avoid using "html" and "urlquery" entirely in new templates.
The commit also includes a trivial whitespace fix.
To cover the cases where stale markers may not be available,
we need to infer the interval and mark series stale based on that.
As we're lacking stale markers this is less accurate, however
it should be good enough for these cases.
We need 4 intervals: say we had data at t=0 and t=10
coming in via federation. The next data point should be at t=20, however it
could take up to t=30 for it to actually be ingested, t=40 for it to be
scraped via federation, and t=50 for it to be ingested.
We then add 10% on top of that for slack, as we do elsewhere.
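As plain arithmetic, that works out to the following (an illustrative helper, not the actual code):

```go
import "time"

// stalenessDelta returns how long to wait before marking a series stale when
// stale markers are unavailable: 4 intervals plus 10% slack.
func stalenessDelta(interval time.Duration) time.Duration {
	return time.Duration(float64(4*interval) * 1.1)
}

// Example: a 10s interval yields a 44s window.
```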
* Use request.Context() instead of a global map of contexts.
* Add some basic opentracing instrumentation on the query path.
* Remove tracehandler endpoint.
This is needed for federating non-instance level metrics, so they don't
end up with the instance label of the prometheus target.
Also sort external labels, so label output order is consistent.
* Fixed int64 overflow for timestamps in v1/api parseDuration and parseTime
This led to unexpected results for a wrong query such as "(...)&start=148966367200.372&end=1489667272.372".
That query is wrong because of `start > end`, but an internal int64 overflow actually caused start to become something around MinInt64 (a huge negative value), so it passed validation.
BTW: not sure if a negative timestamp even makes sense... but model.Earliest is actually MinInt64; can someone explain to me why?
Signed-off-by: Bartek Plotka <bwplotka@gmail.com>
* Added missing trailing periods on comments.
Signed-off-by: Bartek Plotka <bwplotka@gmail.com>
* Moved to only `<` and `>`. Removed the equals check.
Signed-off-by: Bartek Plotka <bwplotka@gmail.com>
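A sketch of an overflow-aware parse along the lines described above: split the float into whole seconds and a fraction and reject out-of-range values, instead of multiplying the whole value into int64 nanoseconds (illustrative, not the exact api/v1 code):

```go
import (
	"fmt"
	"math"
	"strconv"
	"time"
)

func parseTime(s string) (time.Time, error) {
	f, err := strconv.ParseFloat(s, 64)
	if err != nil {
		return time.Time{}, fmt.Errorf("cannot parse %q to a valid timestamp: %v", s, err)
	}
	sec, frac := math.Modf(f)
	// Reject values whose seconds component cannot be represented as int64.
	if sec >= math.MaxInt64 || sec <= math.MinInt64 {
		return time.Time{}, fmt.Errorf("timestamp %q out of range", s)
	}
	return time.Unix(int64(sec), int64(frac*float64(time.Second))).UTC(), nil
}
```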
Expose buildQueryUrl, refactor dispatch to use it
buildQueryUrl will allow users to execute queries over the range of an
existing graph. This will be helpful to select data series they wish to
annotate the graph with, for example.
The fuzzy library didn't try to find a "best match", but settled on the
first fuzzy match that exists. This patch includes a modified version of
the fuzzy library, which recursively retries on the rest of the search
string to find a better match and, if found, returns that one.
Another small modification is that if a pattern fully matches, it
skips the lookup entirely and returns the highest score possible for
that match.
For some of the queries, the fuzzy lookup was not filtering properly.
The problem is due to the "replace" being made on the query itself: it
accidentally removes only the first underscore. This patch changes it so
that it removes all of the whitespace, letting the fuzzy algorithm do
its magic, which also fixes this problem.
Originally, the underscores were replaced by a space for this specific
reason: to let the user type a space and have the lookup treat it as a
word break.
Fixes #2380
retrieval.Target contains a mutex. It was copied in the Targets()
call. This potentially can wreak a lot of havoc.
It might even have caused the issues reported as #2266 and #2262.
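The shape of the fix, sketched with simplified types: hand out pointers so the mutex inside Target is never copied:

```go
import "sync"

type Target struct {
	mtx    sync.RWMutex
	labels map[string]string
}

type targetManager struct {
	mtx     sync.RWMutex
	targets []*Target
}

// Targets returns the targets without copying the structs (and their mutexes).
func (tm *targetManager) Targets() []*Target {
	tm.mtx.RLock()
	defer tm.mtx.RUnlock()
	out := make([]*Target, len(tm.targets))
	copy(out, tm.targets)
	return out
}
```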