Merge pull request #48204 from shyamjvs/logdump-only-n-nodes

Automatic merge from submit-queue

Allow log-dumping only N randomly-chosen nodes in the cluster

This should save a significant amount of time (~3-4 hours) in our 5000-node cluster scale tests, where we currently copy logs from every node to the Jenkins worker and then upload all of them to GCS, even though we only need a sample.
It should also prevent the "No space left on device" error that the Jenkins container hit while dumping logs in runs 12-13 of gce-enormous-cluster.
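The behavior is gated by the `LOGDUMP_ONLY_N_RANDOM_NODES` environment variable added below. A hedged invocation sketch (the script path, argument, and node count are illustrative assumptions, not taken from this PR):

```bash
# Sketch only: dump logs from 500 randomly-chosen nodes instead of all of them.
# LOGDUMP_ONLY_N_RANDOM_NODES comes from this PR; the entrypoint and its
# report-dir argument may differ in your checkout.
LOGDUMP_ONLY_N_RANDOM_NODES=500 ./cluster/log-dump/log-dump.sh "${REPORT_DIR}"
```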

The long-term fix will be to enable [logexporter](https://github.com/kubernetes/test-infra/tree/master/logexporter) for our tests.

cc @kubernetes/sig-scalability-misc @kubernetes/test-infra-maintainers @gmarek @fejta
Kubernetes Submit Queue 2017-06-29 04:23:58 -07:00 committed by GitHub
commit 7018479968
1 changed file with 12 additions and 1 deletion


@@ -224,8 +224,19 @@ function dump_nodes() {
     return
   fi
+  nodes_selected_for_logs=()
+  if [[ -n "${LOGDUMP_ONLY_N_RANDOM_NODES:-}" ]]; then
+    # We randomly choose 'LOGDUMP_ONLY_N_RANDOM_NODES' many nodes for fetching logs.
+    for index in `shuf -i 0-$(( ${#node_names[*]} - 1 )) -n ${LOGDUMP_ONLY_N_RANDOM_NODES}`
+    do
+      nodes_selected_for_logs+=("${node_names[$index]}")
+    done
+  else
+    nodes_selected_for_logs=( "${node_names[@]}" )
+  fi
   proc=${max_scp_processes}
-  for node_name in "${node_names[@]}"; do
+  for node_name in "${nodes_selected_for_logs[@]}"; do
     node_dir="${report_dir}/${node_name}"
     mkdir -p "${node_dir}"
     # Save logs in the background. This speeds up things when there are
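
For readers skimming the hunk above, a self-contained sketch of the same `shuf`-based sampling (the node names and count here are made up for illustration; this is not the actual helper from the dump script):

```bash
#!/usr/bin/env bash
# Illustrative only: pick N random entries from an array, mirroring the
# LOGDUMP_ONLY_N_RANDOM_NODES logic added above. Node names are made up.
node_names=(node-1 node-2 node-3 node-4 node-5)
LOGDUMP_ONLY_N_RANDOM_NODES=2

nodes_selected_for_logs=()
if [[ -n "${LOGDUMP_ONLY_N_RANDOM_NODES:-}" ]]; then
  # shuf prints N distinct indices in [0, len-1]; each one picks a node.
  for index in $(shuf -i 0-$(( ${#node_names[@]} - 1 )) -n "${LOGDUMP_ONLY_N_RANDOM_NODES}"); do
    nodes_selected_for_logs+=("${node_names[$index]}")
  done
else
  nodes_selected_for_logs=( "${node_names[@]}" )
fi

printf 'selected: %s\n' "${nodes_selected_for_logs[@]}"
```

When the variable is unset, the selection falls through to all nodes, so existing jobs keep their current behavior.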