prometheus

Commit Graph

Author	SHA1	Message	Date
Fabian Reinartz	285bc07030	Switch append refs to string	8 years ago
Shashank Varanasi	dea60bb553	Fix malformed uname string (#2727) * Fix malformed uname string * Make fix better * Reformat code for simplicity	8 years ago
Fabian Reinartz	c8438cfc81	Add mutex profiling to benchmark	8 years ago
Shashank Varanasi	61235fd851	Print system information (uname) at Prometheus startup (#2709) * Print uname on prom startup * Make uname file linux-only * Add missing license headers Add missing license headers * Print OS when uname is not available * Print only OS name when uname not available * Remove extra space, fix cmd/prometheus/main.go license header * Add fix for int8 and uint8 systems * Better formatting for build tags in cmd/prometheus/uname files * Remove newline	8 years ago
Frederic Branczyk	c50a3eccce	prometheus: default max-block-duration to 10% of retention	8 years ago
Michal Witkowski	4177c35eba	Fixup sighup for P2 TSDB init #2699	8 years ago
Fabian Reinartz	9b175d48cb	Add flag to disable TSDB lock file	8 years ago
Matt Layher	283756c503	Initial commit of 'promtool check-metrics', promlint package (#2605)	8 years ago
Fabian Reinartz	778103b450	Add licence file and headers	8 years ago
Fabian Reinartz	757cba7c31	cmd/prometheus: Undo GOGC adjustment	8 years ago
beorn7	f20b84e816	flags: Improve doc strings for checkpoint flags	8 years ago
Fabian Reinartz	10c7c9acbe	Adjust import names to new repository organisation	8 years ago
Goutham Veeramachaneni	f27ce34a13	Use Registerer to Register All Metrics * Made Metric a Gauge so that it can be registered.	8 years ago
Goutham Veeramachaneni	0d0c9d5440	Move Registerer to Config Struct in Notifier	8 years ago
beorn7	434ab2a6a3	storage: Evict chunks and calculate persistence pressure based on target heap size This is a fairly easy attempt to dynamically evict chunks based on the heap size. A target heap size has to be set as a command line flag, so that users can essentially say "utilize 4GiB of RAM, and please don't OOM". The -storage.local.max-chunks-to-persist and -storage.local.memory-chunks flags are deprecated by this change. Backwards compatibility is provided by ignoring -storage.local.max-chunks-to-persist and using -storage.local.memory-chunks to set the new -storage.local.target-heap-size to a reasonable (and conservative) value (both with a warning). This also makes the metrics instrumentation more consistent (in naming and implementation) and cleans up a few quirks in the tests. Answers to anticipated comments: There is a chance that Go 1.9 will allow programs better control over the Go memory management. I don't expect those changes to be in contradiction with the approach here, but I do expect them to complement them and allow them to be more precise and controlled. In any case, once those Go changes are available, this code has to be revisited. One might be tempted to let the user specify an estimated value for the RSS usage, and then internally set a target heap size of a certain fraction of that. (In my experience, 2/3 is a fairly safe bet.) However, investigations have shown that RSS size and its relation to the heap size is really really complicated. It depends on so many factors that I wouldn't even start listing them in a commit description. It depends on many circumstances and not least on the risk trade-off of each individual user between RAM utilization and probability of OOMing during a RAM usage peak. To not add even more to the confusion, we need to stick to the well-defined number we also use in the targeting here, the sum of the sizes of heap objects.	8 years ago
beorn7	96a303b348	storage: Use staleness delta as head chunk timeout Currently, if a series ceases to exist, its head chunk will be kept open for an hour. That prevents it from being persisted. Which prevents it from being evicted. Which prevents the series from being archived. Most of the time, once no sample has been added to a series within the staleness limit, we can be pretty confident that this series will not receive samples anymore. The whole chain as described above can be started after 5m instead of 1h. In the relaxed case, this doesn't change a lot as the head chunk timeout is only checked during series maintenance, and usually, a series is only maintained every six hours. However, there is the typical scenario where a large service is deployed, the deploy turns out to be bad, and then it is deployed again within minutes, and quite quickly the number of time series has tripled. That's the point where the Prometheus server is stressed and switches (rightfully) into rushed mode. In that mode, time series are processed as quickly as possible, but all of that is in vain if all of those recently ended time series cannot be persisted yet for another hour. In that scenario, this change will help most, and it's exactly the scenario where help is most desperately needed.	8 years ago
beorn7	04ccf84559	main.go: Set GOGC to 40 by default Rationale: The default value for GOGC is 100, i.e. a garbage collection is initiated once as much heap space has been allocated as was in use after the last GC was done. This ratio doesn't make a lot of sense in Prometheus, as typically about 60% of the heap is allocated for long-lived memory chunks (most of which are around for many hours if not days). Thus, short-lived heap objects are accumulated for quite some time until they finally match the large amount of memory used by bulk memory chunks and a gigantic GC cycle is invoked. With GOGC=40, we are essentially reinstating "normal" GC behavior by acknowledging that about 60% of the heap is used for long-term bulk storage. The median Prometheus production server at SoundCloud runs a GC cycle every 90 seconds. With GOGC=40, a GC cycle is run every 35 seconds (which is still not very often). However, the effective RAM usage is now reduced by about 30%. If settings are updated to utilize more RAM, the time between GC cycles goes up again (as the heap size is larger with more long-lived memory chunks, but the frequency of creating short-lived heap objects does not change). On a quite busy large Prometheus server, the timing changed from one GC run every 20s to one GC run every 12s. In the former case (just changing GOGC, leave everything else as it is), the CPU usage increases by about 10% (on a mid-size reference server from 8.1 to 8.9). If settings are adjusted, the CPU consumption increases more drastically (from 8 cores to 13 cores on a large reference server), despite GCs happening more rarely, presumably because a 50% larger set of memory chunks is managed now. Having more memory chunks is good in many regards, and most servers are running out of memory long before they run out of CPU cycles, so the tradeoff is overwhelmingly positive in most cases. Power users can still set the GOGC environment variable as usual, as the implementation in this commit honors an explicitly set variable.	8 years ago
Julius Volz	8fda83ea12	Make rules only read local data	8 years ago
Julius Volz	406b65d0dc	Rename remote.Storage to remote.Writer	8 years ago
Julius Volz	02395a224d	[WIP] Remote Read	8 years ago
Goutham Veeramachaneni	761e4768f3	Lint and Vet Fixes	8 years ago
Fabian Reinartz	b586781283	*: update tsdb vendoring and add retention flag	8 years ago
Fabian Reinartz	87805fb83f	Remove Partitioned* code	8 years ago
Goutham Veeramachaneni	f35816613e	Refactored Notifier to use Registerer * Brought metrics back into Notifier Notifier still implements a Collector. Check if that is needed.	8 years ago
Fabian Reinartz	cc0a7c8279	Create alloc and inuse space heap profile	8 years ago
Fabian Reinartz	ffb24a98f4	Add missing unlock, run debug endpoint in benchmark	8 years ago
Fabian Reinartz	4397b4d508	*: pass Prometheus registry into storage	8 years ago
Fabian Reinartz	db5c88ea9a	Misc compaction fixes	8 years ago
Fabian Reinartz	b281e4e39b	Accept prometheus.Registerer in constructor	8 years ago
Fabian Reinartz	a3b47c4929	Create default logger for DB	8 years ago
Fabian Reinartz	9c7a88223e	Add full encode/decode WAL cycle test	8 years ago
Julius Volz	beb3c4b389	Remove legacy remote storage implementations This removes legacy support for specific remote storage systems in favor of only offering the generic remote write protocol. An example bridge application that translates from the generic protocol to each of those legacy backends is still provided at: documentation/examples/remote_storage/remote_storage_bridge See also https://github.com/prometheus/prometheus/issues/10 The next step in the plan is to re-add support for multiple remote storages.	8 years ago
Fabian Reinartz	79944a5912	Break out WAL into segment files	8 years ago
Fabian Reinartz	9c76624df2	Add initial retention cutoff	8 years ago
Fabian Reinartz	ea3ba338dd	main: add flags for new storage	8 years ago
Fabian Reinartz	012cf4ef25	Count writer references on head blocks	8 years ago
Fabian Reinartz	5772f1a7ba	retrieval/storage: adapt to new interface This simplifies the interface to two add methods for appends with labels or faster reference numbers.	8 years ago
Fabian Reinartz	5a1c8eaa0e	Fix missing appends after reference lookups	8 years ago
Fabian Reinartz	30efe4a58c	Support writing to multiple head blocks This is an initial (and hacky) first pass on allowing appending to multiple blocks simultaneously to avoid dropping samples right after cutting a new head block. It's also required for cases like the PGW, where a scrape may contain varying timestamps.	8 years ago
Fabian Reinartz	c20cc44b06	Add docs, write sequence number to meta.json	8 years ago
Fabian Reinartz	035976b275	retrieval: handle not found error correctly	8 years ago
Fabian Reinartz	5fb01d41aa	Use new Prometheus text format parser	8 years ago
Bartek Plotka	579e33f19a	Fixed style issues.	8 years ago
Bartek Plotka	d7febe97fa	Fixed regression in -alertmanager.url flag. Basic auth was ignored. - Included basic auth parsing while parsing to AlertmanagerConfig - Added test case Signed-off-by: Bartek Plotka <bwplotka@gmail.com>	8 years ago
Fabian Reinartz	ad9bc62e4c	storage: extend appender and adapt it	8 years ago
Fabian Reinartz	fde69dab49	Use buffer pool for head appenders	8 years ago
Fabian Reinartz	a317f252b9	Expose series references to clients This exposes a reference number of a series represented by a label set to clients. Subsequent samples can be directly added via the reference rather than repeatedly passing in the full labels. This drastically speeds up the append process. The appender chain uses different sections of the reference number for assignment to child appenders and invalidating reference numbers as necessary. Clients can either pass out reference numbers themselves or have their own optimized lookup, i.e. by directly associating unparsed metric descriptors strings with reference numbers.	8 years ago
Fabian Reinartz	80affd98a8	Add barrier to benchmark writer This adds a barrier to avoid issues with unfair goroutine scheduling that causes some fake scrapers to run away from the other ones.	8 years ago
Fabian Reinartz	c32a94d409	Unexport HeadBlock, export Block interface	8 years ago
Fabian Reinartz	d86e8a63c7	Report correct number of appended samples	8 years ago
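
Note on 04ccf84559 (GOGC default): the message says the default is lowered to 40 unless the GOGC environment variable is set explicitly. The following is only a minimal sketch of that idea, not the code from the commit. The helper name chooseGCPercent and the constant defaultGCPercent are hypothetical; debug.SetGCPercent and os.Getenv are standard library calls. A later entry in this list (757cba7c31) undoes the adjustment.

```go
package main

import (
	"fmt"
	"os"
	"runtime/debug"
)

// defaultGCPercent is the value argued for in the commit message.
// The constant name is hypothetical.
const defaultGCPercent = 40

// chooseGCPercent decides whether to override the runtime's GC percent.
// gogcEnv is the raw value of the GOGC environment variable. When it is
// non-empty, the Go runtime has already applied it, so we leave it alone.
func chooseGCPercent(gogcEnv string) (percent int, override bool) {
	if gogcEnv != "" {
		return 0, false
	}
	return defaultGCPercent, true
}

func main() {
	if p, ok := chooseGCPercent(os.Getenv("GOGC")); ok {
		// SetGCPercent returns the previous setting.
		old := debug.SetGCPercent(p)
		fmt.Printf("GOGC not set: GC percent changed from %d to %d\n", old, p)
	} else {
		fmt.Println("GOGC set explicitly: leaving the runtime setting unchanged")
	}
}
```

Keeping the decision in a small pure function makes the "honor an explicitly set variable" rule easy to check without touching the process environment.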

316 Commits (750e438ebb11f0d46e3a8c40ffed24ac3c1fa760)