prometheus

Commit Graph

Author	SHA1	Message	Date
Chris Marchbanks	87f1dad16d	throttle resends of alerts to 1 minute by default (#4538 ) Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	6 years ago
Krasi Georgiev	12fe204ea6	move runtime debug funcs in own package (#4494 ) To make local debugging with `go run` easier, moved all files into a dedicated package `runtime`. This allows running prometheus just by using `go run main.go` instead of passing many files like `go run main.go limits_default.go ...` Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>	6 years ago
Simon Pasquier	08c2f50382	Merge pull request #4418 from simonpasquier/log-vm-limits prometheus: log virtual memory limits	6 years ago
Frederic Branczyk	b0b3e3dd74	promql: Remove old and unused alerting/recording syntax Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>	6 years ago
Dave Henderson	73a08f0045	promtool - Adding --step flag to 'query range' subcommand (#4454 ) Signed-off-by: Dave Henderson <dhenderson@gmail.com>	6 years ago
Julius Volz	90521a65f8	Remove error return value from NotifyFunc() (#4459 ) It's always nil and we also forgot to check it. Signed-off-by: Julius Volz <julius.volz@gmail.com>	6 years ago
Ganesh Vernekar	f1db699dff	Persist alert 'for' state across restarts (#4061 ) Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>	6 years ago
Simon Pasquier	a94450c288	Fix build for openbsd Signed-off-by: Simon Pasquier <spasquie@redhat.com>	6 years ago
Simon Pasquier	141c188ae6	Enforce conversion for freebsd Signed-off-by: Simon Pasquier <spasquie@redhat.com>	6 years ago
Simon Pasquier	208d21a393	Add comment and print units Signed-off-by: Simon Pasquier <spasquie@redhat.com>	6 years ago
Simon Pasquier	ba22b10113	prometheus: log virtual memory limits Signed-off-by: Simon Pasquier <spasquie@redhat.com>	6 years ago
Daisy T	a3376e8f36	add query labels command to promtool (#4346 ) Signed-off-by: Daisy T <daisyts@gmx.com>	6 years ago
Julius Volz	95dfb1b1dd	Add missing import to promtool, fix build (#4395 ) Sorry, I used GitHub's web-based merge-conflict-resolution editor on https://github.com/prometheus/prometheus/pull/4308 and it didn't show me test errors afterwards, but maybe they didn't run again or I should have waited or something. Signed-off-by: Julius Volz <julius.volz@gmail.com>	6 years ago
Shubheksha	125da3b812	promtool: add command for querying series (#4308 ) Signed-off-by: Shubheksha Jalan <jshubheksha@gmail.com>	6 years ago
Julius Volz	03aa3a3de8	main: Improve / clean up error messages (#4286 ) Signed-off-by: Julius Volz <julius.volz@gmail.com>	6 years ago
Chih-Hung Yeh	912d19fb85	Add 3 commands in `promtool` for getting debug information from prometheus server (#4247 ) `debug all` - all information `debug metrics` - metrics information `debug pprof` - profiling information the final result is compressed in a `tar.gz` file Signed-off-by: chyeh <chyeh.taiwan@gmail.com>	6 years ago
Brian Brazil	68e8b80ffe	Reorder startup and shutdown to prevent panics. (#4321 ) Start rule manager only after tsdb and config is loaded. Stop rule manager before tsdb to avoid writing to closed storage. Wait for any in-progress reloads to complete before shutting down rule manager, so that rule manager doesn't get updated after being shut down. Remove incorrect comment around shutting down query engine. Log when config reload is completed. Fixes #4133 Fixes #4262 Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>	6 years ago
Michael Khalil	78e0784d04	return error exit status in prometheus cli (#4296 ) Signed-off-by: mikeykhalil <mikeyfkhalil@gmail.com>	7 years ago
Tom Wilkie	8acad5f3cd	make it compile Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	7 years ago
Tom Wilkie	e51d6c4b6c	Make remote flush deadline a command line param. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	7 years ago
Sneha Inguva	c1a851074b	promtool: add query instant and query range commands (#4085 ) * promtool: add QueryInstant and QueryRange cmds * promtool: add more query functions * promtool: finished query Instant * promtool: add range query * promtool: add query command and address arguments * vendor client and api	7 years ago
Mario Trangoni	464e747f1e	fix some comments typos (#4059 )	7 years ago
Sneha Inguva	7be846754a	main: actor functionality comments	7 years ago
Marek Siarkowicz	bb86c3f62b	Report internal runtime information on status page (#3921 ) Add information about tsdb, wal and config reload	7 years ago
James Turnbull	ba5273a0ab	Minor edits to help text (#3990 )	7 years ago
Simon Pasquier	e1fd96db25	cmd: fix help text (#3989 )	7 years ago
ferhat elmas	ffa673f7d8	General simplifications (#3887 ) Another try as in #1516	7 years ago
Bartek Plotka	93a63ac5fd	api: Added v1/status/flags endpoint. (#3864 ) Endpoint URL: /api/v1/status/flags Example Output: ```json { "status": "success", "data": { "alertmanager.notification-queue-capacity": "10000", "alertmanager.timeout": "10s", "completion-bash": "false", "completion-script-bash": "false", "completion-script-zsh": "false", "config.file": "my_cool_prometheus.yaml", "help": "false", "help-long": "false", "help-man": "false", "log.level": "info", "query.lookback-delta": "5m", "query.max-concurrency": "20", "query.timeout": "2m", "storage.tsdb.max-block-duration": "36h", "storage.tsdb.min-block-duration": "2h", "storage.tsdb.no-lockfile": "false", "storage.tsdb.path": "data/", "storage.tsdb.retention": "15d", "version": "false", "web.console.libraries": "console_libraries", "web.console.templates": "consoles", "web.enable-admin-api": "false", "web.enable-lifecycle": "false", "web.external-url": "", "web.listen-address": "0.0.0.0:9090", "web.max-connections": "512", "web.read-timeout": "5m", "web.route-prefix": "/", "web.user-assets": "" } } ``` Signed-off-by: Bartek Plotka <bwplotka@gmail.com>	7 years ago
Fabian Reinartz	7ccd4b39b8	*: implement query params This adds a parameter to the storage selection interface which allows query engine(s) to pass information about the operations surrounding a data selection. This can for example be used by remote storage backends to infer the correct downsampling aggregates that need to be provided.	7 years ago
Conor Broderick	5169ccf258	Merge pull request #3724 from simonpasquier/fix-bad-data-error Don't reset FiredAt for inactive alerts	7 years ago
Krasi Georgiev	b75428ec19	rename package retrieve to scrape no functional changes just renaming retrieval to scrape	7 years ago
Krasi Georgiev	7858745c04	rename structs for consistency	7 years ago
Krasi Georgiev	acc4197098	remove discovery race for the context field	7 years ago
Julien Pivotto	8b20cb1e8d	last config success time gauge: use SetToCurrentTime() (#3750 ) Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	7 years ago
Simon Pasquier	81c0ab69e0	Don't reset FiredAt for inactive alerts Otherwise AlertManager receives resolved alerts where StartsAt is zero which fails the validation.	7 years ago
Krasi Georgiev	719c579f7b	refactor main execution reloadReady handling, update some comments	7 years ago
Krasi Georgiev	0eafaf32d3	set the correct config reloading execution for scraper and notifier	7 years ago
Krasi Georgiev	97f0461e29	refactor the config reloading execution	7 years ago
Krasi Georgiev	5260c650ec	use the config hash for the map lookup	7 years ago
Krasi Georgiev	8369826808	comment to rethink the map reference for the notifier discovery	7 years ago
Krasi Georgiev	d12e6f29fc	discovery manager ApplyConfig now takes a direct ServiceDiscoveryConfig so that it can be used for the notify manager reimplement the service discovery for the notify manager Signed-off-by: Krasi Georgiev <krasi.root@gmail.com>	7 years ago
Shubheksha Jalan	0471e64ad1	Use shared types from the `common` repo (#3674 ) * refactor: use shared types from common repo, remove util/config * vendor: add common/config * fix nit	7 years ago
Goutham Veeramachaneni	35a6ffbaf3	Merge pull request #3587 from krasi-georgiev/web-test-error-check handle web_test webhandler errors.	7 years ago
Shubheksha Jalan	ec94df49d4	Refactor SD configuration to remove `config` dependency (#3629 ) * refactor: move targetGroup struct and CheckOverflow() to their own package * refactor: move auth and security related structs to a utility package, fix import error in utility package * refactor: Azure SD, remove SD struct from config * refactor: DNS SD, remove SD struct from config into dns package * refactor: ec2 SD, move SD struct from config into the ec2 package * refactor: file SD, move SD struct from config to file discovery package * refactor: gce, move SD struct from config to gce discovery package * refactor: move HTTPClientConfig and URL into util/config, fix import error in httputil * refactor: consul, move SD struct from config into consul discovery package * refactor: marathon, move SD struct from config into marathon discovery package * refactor: triton, move SD struct from config to triton discovery package, fix test * refactor: zookeeper, move SD structs from config to zookeeper discovery package * refactor: openstack, remove SD struct from config, move into openstack discovery package * refactor: kubernetes, move SD struct from config into kubernetes discovery package * refactor: notifier, use targetgroup package instead of config * refactor: tests for file, marathon, triton SD - use targetgroup package instead of config.TargetGroup * refactor: retrieval, use targetgroup package instead of config.TargetGroup * refactor: storage, use config util package * refactor: discovery manager, use targetgroup package instead of config.TargetGroup * refactor: use HTTPClient and TLS config from configUtil instead of config * refactor: tests, use targetgroup package instead of config.TargetGroup * refactor: fix targetgroup.Group pointers that were removed by mistake * refactor: openstack, kubernetes: drop prefixes * refactor: remove import aliases forced due to vscode bug * refactor: move main SD struct out of config into discovery/config * refactor: rename configUtil to config_util * refactor: rename yamlUtil to yaml_config * refactor: kubernetes, remove prefixes * refactor: move the TargetGroup package to discovery/ * refactor: fix order of imports	7 years ago
Brian Brazil	ecc24b554d	Hide block duration flags. (#3618 ) Users are starting to use these mistakenly thinking they'll help with issues, and thus causing some confusion. Thus hide them and make it clear that they're only there for testing reasons.	7 years ago
Krasi Georgiev	c94fa731aa	bypass the proxy for the tests	7 years ago
Krasi Georgiev	ad66476c4f	fix flaky main.go test and simplify a bit	7 years ago
Fabian Reinartz	2881d73ed8	Merge pull request #3362 from krasi-georgiev/discovery-refactoring Decouple the discovery and refactor the retrieval package	7 years ago
Goutham Veeramachaneni	9c9f96b2c0	Merge pull request #3529 from krasi-georgiev/main-integration-test main.go integration test for Startup interrupting.	7 years ago
Krasi Georgiev	587dec9eb9	rebased and resolved conflicts with the new Discovery GUI page Signed-off-by: Krasi Georgiev <krasi.root@gmail.com>	7 years ago
Krasi Georgiev	1ec76d1950	rearrange the contexts variables and logic split the groupsMerge function to set and get other small nits	7 years ago
Krasi Georgiev	6ff1d5c51e	add the scrape manager config reloader handle errors with invalid scrape config	7 years ago
Krasi Georgiev	b0d4f6ee08	resolved merge conflict in main.go	7 years ago
Krasi Georgiev	c5cb0d2910	simplify naming and API.	7 years ago
Krasi Georgiev	9c61f0e8a0	scrape pool doesn't rely on context as Stop() needs to be blocking to prevent Scrape loops trying to write to a closed TSDB storage.	7 years ago
Krasi Georgiev	e405e2f1ea	refactored discovery	7 years ago
pasquier-s	2440696961	Log file descriptor limits at startup (#3567 ) Fixes #3564	7 years ago
Alberto Cortés	29da2fb9cd	testutil: update to go1.9 testing.Helper	7 years ago
Alberto Cortés	8f6a9f7833	config: simplify tests by using testutil.NotOk (#3289 ) Also include filename in all LoadFile errors Also add message to testutil.NotOk so we can identify failing tests when using table driven tests.	7 years ago
Krasi Georgiev	740662644e	write to temp dir and remove it at the end. Signed-off-by: Krasi Georgiev <krasi.root@gmail.com>	7 years ago
Brian Brazil	b97f4cf48c	Add metrics for rule group interval and last duration.	7 years ago
Krasi Georgiev	2c2a962da3	main.go integration test for Startup interrupting.	7 years ago
Goutham Veeramachaneni	823b7f90b3	Use the globbed files and not the files in cfg Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	7 years ago
Fabian Reinartz	62461379b7	rules: decouple notifier packages The dependency on the notifier packages caused a transitive dependency on discovery and with that all client libraries our service discovery uses.	7 years ago
Fabian Reinartz	4d964a0a0d	rules: make glob expansion a concern of main	7 years ago
Fabian Reinartz	bd9f7460eb	rules: remove config package dependency	7 years ago
Fabian Reinartz	2d0e3746ac	rules: remove dependency on promql.Engine	7 years ago
Krasi Georgiev	e2f4850fea	Refactor main.go with oklog/pkg/group actors pattern	7 years ago
Thibault Chataigner	fc4406201e	Tsdb StartTime : Use a simpler way to compute StartTime	7 years ago
Julius Volz	099df0c5f0	Migrate "golang.org/x/net/context" -> "context" (#3333 ) In some places, where ctxhttp or gRPC are concerned, we still need to use the old contexts.	7 years ago
Julius Volz	9d43176ab3	Remove unused printVersion variable (#3335 ) Kingpin now automatically does this via --version.	7 years ago
Julius Volz	82c5b98496	Capitalize Prometheus in startup message (#3332 ) Hey, branding :)	7 years ago
Thibault Chataigner	bf4a279a91	Remote storage reads based on oldest timestamp in primary storage (#3129 ) Currently all read queries are simply pushed to remote read clients. This is fine, except for remote storage for which it is inefficient and makes queries slower even if remote read is unnecessary. So we need instead to compare the oldest timestamp in primary/local storage with the query range lower boundary. If the oldest timestamp is older than the mint parameter, then there is no need for remote read. This is an optional behavior per remote read client. Signed-off-by: Thibault Chataigner <t.chataigner@criteo.com>	7 years ago
Julius Volz	5f715f5733	Fix typo in flag description (#3302 )	7 years ago
Tobias Schmidt	3589f2f1d4	Merge pull request #3285 from jlevesy/use-testutils-in-cmd-subpackage Use testutil assertion helpers in cmd package	7 years ago
Julien Levesy	d7b4fa8d78	use testutil assertions in the cmd/prometheus package	7 years ago
Mathieu Pasquet	38afa507bb	Provide better errors messages in commandline Instead of only printing the help message, which is not always helpful. For example, when upgrading from prometheus v1, the retention time value format has changed and now only accepts one unit (e.g. "15d") where it previously allowed more complex strings (e.g. "360h0m0s"). This commit provides the error message as an explanation for the parsing failure.	7 years ago
Marc Sluiter	6a633eece1	Added go-conntrack for monitoring http connections (#3241 ) Added metrics for in- and outgoing traffic with go-conntrack.	7 years ago
Fabian Reinartz	2d0b8e8b94	Merge branch 'master' into dev-2.0	7 years ago
Paul Gier	08af129b4d	cmd/prometheus: don't allow quotes at beginning or end of url This prevents accidental copy/paste error where the web.external-url or alertmanager.url params could have an extra set of quotes. See also: https://github.com/prometheus/prometheus/issues/1229	7 years ago
Paul Gier	f79b55d057	cmd/prometheus: remove govalidator for url validation The usage of govalidator is redundant with the call to url.Parse for url validation. Removing it has the following benefits: - The explicit error message is displayed instead of just a generic valid/invalid message - Slightly smaller code with one fewer external dependency - Speed improvement by removing duplicate call to url.Parse (inside govalidator.IsURL()) - Resolves issue #2717 The only potential drawback of removing govalidator is that certain URLs will be considered valid which were previously invalid. For example: - URLs with hostnames that start and/or end with an underscore (http://_example.com_) - URLs with hostnames that contain some special characters (http://foo&*bar.org) These are valid URIs according to RFC 3986 and valid domain names per RFC 2181, however they are not valid hostnames per RFC 952.	7 years ago
Fabian Reinartz	7b02bfee0a	web: start web handler while TSDB is starting up	7 years ago
Goutham Veeramachaneni	f5aed810f9	logging: Port to common/promlog Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	7 years ago
Fabian Reinartz	d21f149745	*: migrate to go-kit/log	7 years ago
Fabian Reinartz	c70379e1c7	Merge branch 'dev-2.0' of github.com:prometheus/prometheus into dev-2.0	7 years ago
Fabian Reinartz	fffe51fb03	Add mutex and block profiling via envvar	7 years ago
Ben Kochie	59aca4138b	Fix staticcheck issues.	7 years ago
Matt Bostock	64973f5c65	cmd/prometheus: Fix capitalisation in log line (#3123 ) Change 'Ready' to 'ready'.	7 years ago
Mark Adams	77c816b309	Fix pprof endpoints when -web.route-prefix or -web.external-url is used (#3054 ) Whenever a route prefix is applied, the router prepends the prefix to the URL path on the request. For most handlers, this is not an issue because the request's path is only used for routing and is not actually needed by the handler itself. However, Prometheus delegates the handling of the /debug/* endpoints to the http.DefaultServeMux which has its own routing logic that depends on the url.Path. As a result, whenever a prefix is applied, the prefixed URL is passed to the DefaultServeMux which has no awareness of the prefix and returns a 404. This change fixes the issue by creating a new serveDebug handler which routes /debug/* requests to the appropriate net/http/pprof handler and removing the net/http/pprof import in cmd/prometheus since it is no longer necessary. Fixes #2183.	7 years ago
Callum Styan	8912f81ffe	check if file_sd files exist in checkConfig	7 years ago
Fabian Reinartz	25f3e1c424	Merge branch 'master' into mergemaster	7 years ago
KalivarapuReshma	686050d816	Change -config.file to --config.file in Readme and error message	7 years ago
emluque	ff54c5c11a	2831 Add Healthy and Ready endpoints	7 years ago
Fabian Reinartz	4d3d8ee229	Merge pull request #2850 from tomwilkie/dev-2.0-remote Remote APIs for v2	7 years ago
Julius Volz	cc50aa2c6b	main: Consistently end flag descriptions with periods. (#2977 )	7 years ago
Tom Wilkie	2dda5775e3	Initial port of remote storage to v2.	7 years ago
Fabian Reinartz	32226e30f5	Guard reload and quit endpoints by flag	7 years ago
Fabian Reinartz	45ac064669	web: disable Admin APIs by default	7 years ago
Fabian Reinartz	ccf9e62972	*: add admin grpc API	7 years ago
Fabian Reinartz	be32afd6df	cmd/prometheus: add back tsdb.no-lockfile flag	8 years ago
Goutham Veeramachaneni	f9202c6511	Move from .yaml to .yml in update rules Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	8 years ago
Goutham Veeramachaneni	e3701077c3	Move promtool to kingpin Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	8 years ago
Fabian Reinartz	867b8d108f	cmd/prometheus: cleanup	8 years ago
Fabian Reinartz	34ab7a885a	cmd/prometheus: switch to kingpin	8 years ago
Goutham Veeramachaneni	592cb00c2f	Remove version from RuleGroups Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	8 years ago
Goutham Veeramachaneni	37e7b69f56	Merge remote-tracking branch 'upstream/dev-2.0' into rulegroups Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	8 years ago
Goutham Veeramachaneni	67dc73fd59	Flag changes for 2.0 Fixes: prometheus/prometheus#2087 Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	8 years ago
Goutham Veeramachaneni	d407bd150c	Consolidate the duration params in CLI * All CLI params moved to model.Duration Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	8 years ago
Goutham Veeramachaneni	6b70a4d850	Incorporate PR feedback * Move fingerprint to Hash() * Move away from tsdb.MultiError * 0777 -> 0666 for files * checkOverflow of extra fields Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	8 years ago
Goutham Veeramachaneni	6c1617fd13	Simplify usage string Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	8 years ago
Goutham Veeramachaneni	507790a357	Rework logging to use explicitly passed logger Mostly cleaned up the global logger use. Still some uses in discovery package. Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	8 years ago
Goutham Veeramachaneni	dc69645e92	Move back to go-yaml Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	8 years ago
Goutham Veeramachaneni	8abb91f656	Move CLI commander to cobra Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	8 years ago
Goutham Veeramachaneni	1c08743721	Update check-rules to new format. Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	8 years ago
Goutham Veeramachaneni	cea1e99f78	Add update-rules command to promtool Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	8 years ago
Fabian Reinartz	669075c6b9	Merge branch 'master' into dev-2.0	8 years ago
Chris Goller	42de0ae013	Use log.Logger interface for all discovery services	8 years ago
Conor Broderick	6766123f93	Replace regex with Secret type and remarshal config to hide secrets (#2775 )	8 years ago
Fabian Reinartz	4c31061251	Merge branch 'master' into dev-2.0	8 years ago
Fabian Reinartz	d289dc55c3	storage: update TSDB	8 years ago
Shashank Varanasi	dea60bb553	Fix malformed uname string (#2727 ) * Fix malformed uname string * Make fix better * Reformat code for simplicity	8 years ago
Fabian Reinartz	06c2b76cd4	Merge branch 'master' into uptsdb	8 years ago
Shashank Varanasi	61235fd851	Print system information (uname) at Prometheus startup (#2709 ) * Print uname on prom startup * Make uname file linux-only * Add missing license headers Add missing license headers * Print OS when uname is not available * Print only OS name when uname not available * Remove extra space, fix cmd/prometheus/main.go license header * Add fix for int8 and uint8 systems * Better formatting for build tags in cmd/prometheus/uname files * Remove newline	8 years ago
Frederic Branczyk	c50a3eccce	prometheus: default max-block-duration to 10% of retention	8 years ago
Michal Witkowski	4177c35eba	Fixup sighup for P2 TSDB init #2699	8 years ago
Fabian Reinartz	9b175d48cb	Add flag to disable TSDB lock file	8 years ago
Fabian Reinartz	73b8ff0ddc	Merge branch 'master' into dev-2.0	8 years ago
Matt Layher	283756c503	Initial commit of 'promtool check-metrics', promlint package (#2605 )	8 years ago
Fabian Reinartz	757cba7c31	cmd/prometheus: Undo GOGC adjustment	8 years ago
beorn7	f20b84e816	flags: Improve doc strings for checkpoint flags	8 years ago
Fabian Reinartz	8ffc851147	Merge branch 'master' into dev-2.0	8 years ago
Julius Volz	589061919a	Merge pull request #2465 from Gouthamve/alert-metrics-2429 Better Metrics For Alerts	8 years ago
Goutham Veeramachaneni	f27ce34a13	Use Registerer to Register All Metrics * Made Metric a Gauge so that it can be registered.	8 years ago
Goutham Veeramachaneni	0d0c9d5440	Move Registerer to Config Struct in Notifier	8 years ago
Björn Rabenstein	29f05680a2	Merge pull request #2528 from prometheus/beorn7/storage2 main.go: Set GOGC to 40 by default	8 years ago
Björn Rabenstein	e63d079b59	Merge pull request #2527 from prometheus/beorn7/storage storage: Evict chunks and calculate persistence pressure...	8 years ago
Julius Volz	b5b0e00923	Merge pull request #2499 from prometheus/remote-read Remote Read	8 years ago
beorn7	434ab2a6a3	storage: Evict chunks and calculate persistence pressure based on target heap size This is a fairly easy attempt to dynamically evict chunks based on the heap size. A target heap size has to be set as a command line flag, so that users can essentially say "utilize 4GiB of RAM, and please don't OOM". The -storage.local.max-chunks-to-persist and -storage.local.memory-chunks flags are deprecated by this change. Backwards compatibility is provided by ignoring -storage.local.max-chunks-to-persist and using -storage.local.memory-chunks to set the new -storage.local.target-heap-size to a reasonable (and conservative) value (both with a warning). This also makes the metrics instrumentation more consistent (in naming and implementation) and cleans up a few quirks in the tests. Answers to anticipated comments: There is a chance that Go 1.9 will allow programs better control over the Go memory management. I don't expect those changes to be in contradiction with the approach here, but I do expect them to complement them and allow them to be more precise and controlled. In any case, once those Go changes are available, this code has to be revisited. One might be tempted to let the user specify an estimated value for the RSS usage, and then internally set a target heap size of a certain fraction of that. (In my experience, 2/3 is a fairly safe bet.) However, investigations have shown that RSS size and its relation to the heap size is really really complicated. It depends on so many factors that I wouldn't even start listing them in a commit description. It depends on many circumstances and not least on the risk trade-off of each individual user between RAM utilization and probability of OOMing during a RAM usage peak. To not add even more to the confusion, we need to stick to the well-defined number we also use in the targeting here, the sum of the sizes of heap objects.	8 years ago
beorn7	96a303b348	storage: Use staleness delta as head chunk timeout Currently, if a series stops to exist, its head chunk will be kept open for an hour. That prevents it from being persisted. Which prevents it from being evicted. Which prevents the series from being archived. Most of the time, once no sample has been added to a series within the staleness limit, we can be pretty confident that this series will not receive samples anymore. The whole chain as described above can be started after 5m instead of 1h. In the relaxed case, this doesn't change a lot as the head chunk timeout is only checked during series maintenance, and usually, a series is only maintained every six hours. However, there is the typical scenario where a large service is deployed, the deploy turns out to be bad, and then it is deployed again within minutes, and quite quickly the number of time series has tripled. That's the point where the Prometheus server is stressed and switches (rightfully) into rushed mode. In that mode, time series are processed as quickly as possible, but all of that is in vain if all of those recently ended time series cannot be persisted yet for another hour. In that scenario, this change will help most, and it's exactly the scenario where help is most desperately needed.	8 years ago
beorn7	04ccf84559	main.go: Set GOGC to 40 by default Rationale: The default value for GOGC is 100, i.e. a garbage collection is initiated once as much heap space has been allocated as was in use after the last GC was done. This ratio doesn't make a lot of sense in Prometheus, as typically about 60% of the heap is allocated for long-lived memory chunks (most of which are around for many hours if not days). Thus, short-lived heap objects are accumulated for quite some time until they finally match the large amount of memory used by bulk memory chunks and a gigantic GC cycle is invoked. With GOGC=40, we are essentially reinstating "normal" GC behavior by acknowledging that about 60% of the heap is used for long-term bulk storage. The median Prometheus production server at SoundCloud runs a GC cycle every 90 seconds. With GOGC=40, a GC cycle is run every 35 seconds (which is still not very often). However, the effective RAM usage is now reduced by about 30%. If settings are updated to utilize more RAM, the time between GC cycles goes up again (as the heap size is larger with more long-lived memory chunks, but the frequency of creating short-lived heap objects does not change). On a quite busy large Prometheus server, the timing changed from one GC run every 20s to one GC run every 12s. In the former case (just changing GOGC, leave everything else as it is), the CPU usage increases by about 10% (on a mid-size reference server from 8.1 to 8.9). If settings are adjusted, the CPU consumption increases more drastically (from 8 cores to 13 cores on a large reference server), despite GCs happening more rarely, presumably because a 50% larger set of memory chunks is managed now. Having more memory chunks is good in many regards, and most servers are running out of memory long before they run out of CPU cycles, so the tradeoff is overwhelmingly positive in most cases. Power users can still set the GOGC environment variable as usual, as the implementation in this commit honors an explicitly set variable.	8 years ago
Julius Volz	8fda83ea12	Make rules only read local data	8 years ago
Julius Volz	406b65d0dc	Rename remote.Storage to remote.Writer	8 years ago
Julius Volz	02395a224d	[WIP] Remote Read	8 years ago
Fabian Reinartz	b586781283	*: update tsdb vendoring and add retention flag	8 years ago
Goutham Veeramachaneni	f35816613e	Refactored Notifier to use Registerer * Brought metrics back into Notifier Notifier still implements a Collector. Check if that is needed.	8 years ago
Fabian Reinartz	9304179ef7	Merge branch 'master' into dev-2.0	8 years ago
Fabian Reinartz	4397b4d508	*: pass Prometheus registry into storage	8 years ago
Julius Volz	beb3c4b389	Remove legacy remote storage implementations This removes legacy support for specific remote storage systems in favor of only offering the generic remote write protocol. An example bridge application that translates from the generic protocol to each of those legacy backends is still provided at: documentation/examples/remote_storage/remote_storage_bridge See also https://github.com/prometheus/prometheus/issues/10 The next step in the plan is to re-add support for multiple remote storages.	8 years ago
Fabian Reinartz	ea3ba338dd	main: add flags for new storage	8 years ago
Fabian Reinartz	5772f1a7ba	retrieval/storage: adapt to new interface This simplifies the interface to two add methods for appends with labels or faster reference numbers.	8 years ago

356 Commits (18d45e564b2be25b168465e356713bf9d795f7db)