Server 2016 introduced three new “node states”, and the one that people are confused about is Quarantine. Quarantined nodes are just that: their roles are drained to other nodes in the cluster, and they are not allowed to rejoin the cluster for two hours. The idea is to stop failing nodes from bouncing up and down (so-called flapping), joining and leaving the cluster over and over again and causing performance issues.
Here are the things you should know about Quarantine:
- The default of 2 hours (7200 seconds) is held in the cluster's QuarantineDuration property and can be viewed and changed with PowerShell
- The event log will show Error 1676: "The node will be quarantined until <specific time>"
- To force-clear a quarantined node, use the Start-ClusterNode PowerShell cmdlet with the -ClearQuarantine switch
However, in a recent call to tech support, Microsoft told me not to use it because the node could take as long as FOUR hours to come back online after such a command
- If your whole cluster is flapping, the cluster will only allow a maximum of 25% of the nodes to be quarantined. The exception to this rule is a 2- or 3-node cluster; in that case the cluster will allow one node to be quarantined
- The number of times a node can have problems before it is quarantined is held in the cluster's QuarantineThreshold property, which can also be read with PowerShell
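
The settings from the list above can be sketched in PowerShell roughly as follows. This is a minimal sketch, not a tested runbook: it assumes the FailoverClusters module is available (it comes with the Failover Clustering feature) and that you run it from a cluster node with admin rights. The node name NODE1 and the 3600-second value are placeholders for illustration only.

```powershell
# Assumption: the FailoverClusters module is installed on this machine
Import-Module FailoverClusters

# View the quarantine duration in seconds (the 2-hour default is 7200)
(Get-Cluster).QuarantineDuration

# Change the quarantine duration; 3600 (1 hour) is just an example value
(Get-Cluster).QuarantineDuration = 3600

# View how many failures a node may have before it is quarantined
(Get-Cluster).QuarantineThreshold

# Force-clear quarantine on a node; "NODE1" is a placeholder node name
Start-ClusterNode -Name "NODE1" -ClearQuarantine
```

Keep the tech support warning above in mind before running the last command on a production cluster.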
For more details on Server 2016 node states see THIS Microsoft article. You might also find THIS Microsoft article on PowerShell commands relating to Clusters to be useful.