Commit Graph

21 Commits (6bf33d0db963229654edc960e6b09172e1e45e33)

Author SHA1 Message Date
Ashesh Vidyut 5e7693bb1b
NET-4519 Collecting journald logs in "consul debug" bundle (#18797) (#18884)
NET-4519 Collecting journald logs in "consul debug" bundle (#18797)

* debug since

* fix docs

* chagelog added

* fix go mod

* debug test fix

* fix test

* tabs test fix

* Update .changelog/18797.txt



---------

Co-authored-by: Ganesh S <ganesh.seetharaman@hashicorp.com>
2023-09-19 17:37:11 +05:30
hc-github-team-consul-core 0575758840
Backport of [NET-4107][Supportability] Log Level set to TRACE and duration set to 5m for consul-debug into release/1.16.x (#17728)
* backport of commit 9d72a262f3
* backport of commit 9b9bb8d206
* backport of commit ba44809297

---------

Co-authored-by: Ashesh Vidyut <ashesh.vidyut@hashicorp.com>
2023-06-20 13:48:33 -07:00
Ronald 4c070c38e4
Copyright headers for command folder (#16705)
* copyright headers for agent folder

* Ignore test data files

* fix proto files and remove headers in agent/uiserver folder

* ignore deep-copy files

* copyright headers for agent folder

* Copyright headers for command folder

* fix merge conflicts
2023-03-28 15:12:30 -04:00
Chris S. Kim a518893685
Fix various flaky tests (#16396) 2023-02-23 14:52:18 -05:00
Chris S. Kim 0e176dd6aa
Allow consul debug on non-ACL consul servers (#15155) 2022-10-27 09:25:18 -04:00
Daniel Nephin 53ae4b3e2c debug: update CLI docs
To clarify how trace is captured.

Also remove the minimum seconds check, because that is already done in prepare()
2022-02-15 18:16:12 -05:00
Daniel Nephin cc2c005fad debug: limit the size of the trace
We've noticed that a trace that is captured over the full duration is
too large to open on most machines. A trace.out captured over just the
interval period (30s by default) should be a more than enough time to
capture trace data.
2022-02-15 14:15:34 -05:00
Daniel Nephin 797ee061e4 debug: use human readable dates for filenames
The unix timestamps that were used make the debug data a little bit more
difficult to consume. By using human readable dates we can easily see
when the profile data was collected.

This commit also improves the test coverage. Two test cases are removed
and the assertions from those cases are moved to TestDebugCommand.

Now TestDebugCommand is able to validate the contents of all files. This
change reduces the test runtime of the command/debug package by almost
50%. It also makes much more strict assertions about the contents by
using gotest.tools/v3/fs.
2021-08-18 13:06:57 -04:00
Daniel Nephin 2f8d0e12cf debug: small cleanup
Use the new WriteJsonFile function to write index.json
Remove .String() from time.local() since that is done by %s
Remove an unused field.
2021-08-18 12:30:59 -04:00
Daniel Nephin 4359e38114 debug: restore cancel on SigInt
Some previous changes broke interrupting the debug on SigInterupt. This change restores
the original behaviour by passing a context to requests.

Since a new API client function was required to pass the context, I had
it also return an io.ReadCloser, so that output can be streamed to files
instead of fully buffering in process memory.
2021-08-18 12:29:34 -04:00
Daniel Nephin bbf6a94c9a debug: rename cluster target to members
The API is called members. Using the same name as the API should help describe the contents
of the file.
2021-08-18 12:29:34 -04:00
Daniel Nephin 251026e374 debug: remove unused 2021-08-18 12:29:33 -04:00
Daniel Nephin beea1c2218 http: emit indented JSON in the metrics stream endpoint
To remove the need to decode and re-encode in the CLI
2021-07-26 17:53:33 -04:00
Daniel Nephin c3149ec0fd debug: use the new metrics stream in debug command 2021-07-26 17:53:32 -04:00
Dhia Ayachi 005ad9e46d
generate a single debug file for a long duration capture (#10279)
* debug: remove the CLI check for debug_enabled

The API allows collecting profiles even debug_enabled=false as long as
ACLs are enabled. Remove this check from the CLI so that users do not
need to set debug_enabled=true for no reason.

Also:
- fix the API client to return errors on non-200 status codes for debug
  endpoints
- improve the failure messages when pprof data can not be collected

Co-Authored-By: Dhia Ayachi <dhia@hashicorp.com>

* remove parallel test runs

parallel runs create a race condition that fail the debug tests

* snapshot the timestamp at the beginning of the capture

- timestamp used to create the capture sub folder is snapshot only at the beginning of the capture and reused for subsequent captures
- capture append to the file if it already exist

* Revert "snapshot the timestamp at the beginning of the capture"

This reverts commit c2d03346

* Refactor captureDynamic to extract capture logic for each item in a different func

* snapshot the timestamp at the beginning of the capture

- timestamp used to create the capture sub folder is snapshot only at the beginning of the capture and reused for subsequent captures
- capture append to the file if it already exist

* Revert "snapshot the timestamp at the beginning of the capture"

This reverts commit c2d03346

* Refactor captureDynamic to extract capture logic for each item in a different func

* extract wait group outside the go routine to avoid a race condition

* capture pprof in a separate go routine

* perform a single capture for pprof data for the whole duration

* add missing vendor dependency

* add a change log and fix documentation to reflect the change

* create function for timestamp dir creation and simplify error handling

* use error groups and ticker to simplify interval capture loop

* Logs, profile and traces are captured for the full duration. Metrics, Heap and Go routines are captured every interval

* refactor Logs capture routine and add log capture specific test

* improve error reporting when log test fail

* change test duration to 1s

* make time parsing in log line more robust

* refactor log time format in a const

* test on log line empty the earliest possible and return

Co-authored-by: Freddy <freddygv@users.noreply.github.com>

* rename function to captureShortLived

* more specific changelog

Co-authored-by: Paul Banks <banks@banksco.de>

* update documentation to reflect current implementation

* add test for behavior when invalid param is passed to the command

* fix argument line in test

* a more detailed description of the new behaviour

Co-authored-by: Paul Banks <banks@banksco.de>

* print success right after the capture is done

* remove an unnecessary error check

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>

* upgraded github.com/google/pprof v0.0.0-20181206194817-3ea8567a2e57 => v0.0.0-20210601050228-01bbb1931b22

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
Co-authored-by: Freddy <freddygv@users.noreply.github.com>
Co-authored-by: Paul Banks <banks@banksco.de>
2021-06-07 13:00:51 -04:00
Dhia Ayachi 4c7f5f31c7
debug: remove the CLI check for debug_enabled (#10273)
* debug: remove the CLI check for debug_enabled

The API allows collecting profiles even debug_enabled=false as long as
ACLs are enabled. Remove this check from the CLI so that users do not
need to set debug_enabled=true for no reason.

Also:
- fix the API client to return errors on non-200 status codes for debug
  endpoints
- improve the failure messages when pprof data can not be collected

Co-Authored-By: Dhia Ayachi <dhia@hashicorp.com>

* remove parallel test runs

parallel runs create a race condition that fail the debug tests

* Add changelog

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
2021-05-27 09:41:53 -04:00
rerorero 86c8e48dd9 fix: incorrect struct tag and WaitGroup usage (#6649)
* remove duplicated json tag

* fix: incorrect wait group usage
2019-10-18 13:59:29 -04:00
Christian Muehlhaeuser 7753b97cc7 Simplified code in various places (#6176)
All these changes should have no side-effects or change behavior:

- Use bytes.Buffer's String() instead of a conversion
- Use time.Since and time.Until where fitting
- Drop unnecessary returns and assignment
2019-07-20 09:37:19 -04:00
Boris Popovschi b4eca8fcd7 Fixed gziping function for debug archive (#5184) 2019-01-03 10:39:58 -05:00
R.B. Boyer 57dd160f40 command/debug: make better use of atomic operations to write out the debug snapshots to disk 2018-11-02 13:13:49 -05:00
Jack Pearkes 8c684db488 New command: consul debug (#4754)
* agent/debug: add package for debugging, host info

* api: add v1/agent/host endpoint

* agent: add v1/agent/host endpoint

* command/debug: implementation of static capture

* command/debug: tests and only configured targets

* agent/debug: add basic test for host metrics

* command/debug: add methods for dynamic data capture

* api: add debug/pprof endpoints

* command/debug: add pprof

* command/debug: timing, wg, logs to disk

* vendor: add gopsutil/disk

* command/debug: add a usage section

* website: add docs for consul debug

* agent/host: require operator:read

* api/host: improve docs and no retry timing

* command/debug: fail on extra arguments

* command/debug: fixup file permissions to 0644

* command/debug: remove server flags

* command/debug: improve clarity of usage section

* api/debug: add Trace for profiling, fix profile

* command/debug: capture profile and trace at the same time

* command/debug: add index document

* command/debug: use "clusters" in place of members

* command/debug: remove address in output

* command/debug: improve comment on metrics sleep

* command/debug: clarify usage

* agent: always register pprof handlers and protect

This will allow us to avoid a restart of a target agent
for profiling by always registering the pprof handlers.

Given this is a potentially sensitive path, it is protected
with an operator:read ACL and enable debug being
set to true on the target agent. enable_debug still requires
a restart.

If ACLs are disabled, enable_debug is sufficient.

* command/debug: use trace.out instead of .prof

More in line with golang docs.

* agent: fix comment wording

* agent: wrap table driven tests in t.run()
2018-10-19 08:41:03 -07:00