This can happen when another node in the cluster, such as a client, is unable to communicate with the leader server and sees it as failed. When that happens, the leader's failing status eventually gets propagated to the other servers in the cluster, which can result in RPCs returning a “No cluster leader” error.
That error is misleading and unhelpful for determining the root cause of the issue, as the problem is not Raft stability but rather a client -> server networking issue. Therefore this commit adds a new error that is returned in that case, to differentiate between the two cases.
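To illustrate why a separate error helps, here is a minimal, self-contained sketch of how a caller might tell the two cases apart. It does not use the real Consul packages: the sentinel errors are local stand-ins for `structs.ErrNoLeader` and `structs.ErrLeaderNotTracked`, the message text of the second one is a placeholder, and `lookupLeader`, `diagnose`, and the `tracked` map are hypothetical names introduced only for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// Local stand-ins for the sentinel errors discussed in this change.
// The second message is a placeholder, not the real error text.
var (
	ErrNoLeader         = errors.New("No cluster leader")
	ErrLeaderNotTracked = errors.New("leader not tracked (placeholder message)")
)

// lookupLeader is a hypothetical helper that mimics the decision described
// above. raftLeader is the address Raft reports as leader ("" if none), and
// tracked is the set of servers currently considered healthy.
func lookupLeader(raftLeader string, tracked map[string]bool) error {
	if raftLeader == "" {
		// Raft itself has no leader: a genuine stability problem.
		return ErrNoLeader
	}
	if !tracked[raftLeader] {
		// Raft has a leader, but it was dropped from health tracking.
		return ErrLeaderNotTracked
	}
	return nil
}

// diagnose maps each error to the area an operator would likely investigate.
func diagnose(err error) string {
	switch {
	case err == nil:
		return "leader reachable"
	case errors.Is(err, ErrNoLeader):
		return "investigate Raft stability"
	case errors.Is(err, ErrLeaderNotTracked):
		return "investigate networking to the leader"
	default:
		return "unknown error"
	}
}

func main() {
	tracked := map[string]bool{"10.0.0.1:8300": true}

	fmt.Println(diagnose(lookupLeader("", tracked)))
	fmt.Println(diagnose(lookupLeader("10.0.0.2:8300", tracked)))
	fmt.Println(diagnose(lookupLeader("10.0.0.1:8300", tracked)))
}
```

With a single shared error, the first two calls in `main` would be indistinguishable to the caller; with distinct sentinel errors, each one points at a different area to investigate.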
	// if server is nil this indicates that while we have a Raft leader
	return false, server
	// something has caused that node to be considered unhealthy which
	// cascades into its removal from the serverLookup struct. In this case
	// we should not report no cluster leader but instead report a different
	// error so as not to confuse our users as to what the root cause of
	// an issue might be.
	if server == nil {
		s.logger.Warn("Raft has a leader but other tracking of the node would indicate that the node is unhealthy or does not exist. The network may be misconfigured.", "leader", leader)
		return false, nil, structs.ErrLeaderNotTracked
	}
	return false, server, nil
}
}
// forwardDC is used to forward an RPC call to a remote DC, or fail if no servers