Two protocols are proposed here as steps towards making the stack failure tolerant. Currently, a failure of the root switch or of a stack link can break broadcast and unicast in the stack.
1. Probing to discover a physical link or valve failure:
This works by periodically sending probe packets over all physical stack links. Missing, e.g., 3 consecutive probes implies that the link or the peer is down. With multiple stack links between two switches, all of those links being down means the peer is down.
This can be implemented by extending the current LLDP beacon. By including dp_id and port_id in the probe, one can compare the received info with the config file to determine whether miscabling has happened.
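To make the probing idea concrete, here is a minimal sketch of the bookkeeping on the receiving side. All names (`StackLink`, `peer_is_down`, `PROBE_INTERVAL`, `MAX_MISSED`) and the 1-second interval are assumptions for illustration, not existing code; the sketch expresses "3 consecutive missed probes" as three probe intervals without a probe, and it leaves out how the probe is actually carried in LLDP.

```python
from dataclasses import dataclass

PROBE_INTERVAL = 1.0  # seconds between probes (assumed value)
MAX_MISSED = 3        # consecutive missed probes before a link is declared down


@dataclass
class StackLink:
    """One physical stack link, with the peer expected from the config file."""
    local_port: int
    expected_dp_id: int
    expected_port_id: int
    last_seen: float = 0.0
    miscabled: bool = False

    def on_probe(self, dp_id, port_id, now):
        """Record a received probe and compare it with the configured peer."""
        self.last_seen = now
        self.miscabled = (dp_id, port_id) != (self.expected_dp_id,
                                              self.expected_port_id)

    def is_up(self, now):
        """Up while fewer than MAX_MISSED probe intervals have passed silently."""
        return now - self.last_seen < MAX_MISSED * PROBE_INTERVAL


def peer_is_down(links, now):
    """With several stack links to one peer, all links down means peer down."""
    return all(not link.is_up(now) for link in links)
```

A link that reports `miscabled` is still physically up; whether it should be used for forwarding is a separate policy decision that this sketch does not make.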
2. State propagation:
Two kinds of state are important for broadcast and unicast calculation after a failure: the mapping between a MAC and a switch, and link state.
When calculating unicast rules, we need a host-to-switch mapping. There are two possible ways to get it: tagging (QinQ), or overloading the destination MAC (that is, replacing the broadcast MAC with a unique MAC of the origin switch; switches will have rules to match these packets and broadcast them to non-stack ports as normal).
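As an illustration of the destination-MAC option, the sketch below encodes the origin switch into a marker MAC and lets a receiving switch learn the host-to-switch mapping from it. The encoding is purely hypothetical: it assumes a locally administered unicast prefix (`0e`) and a dp_id that fits in 40 bits, and the helper names `origin_mac`, `decode_origin` and `learn` are made up for this example.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"
PREFIX_OCTET = 0x0E  # locally administered, unicast first octet (hypothetical choice)


def origin_mac(dp_id):
    """Encode a dp_id (assumed to fit in 40 bits) as a per-switch marker MAC."""
    if not 0 <= dp_id < (1 << 40):
        raise ValueError("dp_id does not fit in 40 bits")
    value = (PREFIX_OCTET << 40) | dp_id
    return ":".join(f"{(value >> shift) & 0xFF:02x}"
                    for shift in range(40, -8, -8))


def decode_origin(mac):
    """Return the origin dp_id for a marker MAC, or None for any other MAC."""
    value = int(mac.replace(":", ""), 16)
    if value >> 40 != PREFIX_OCTET:
        return None
    return value & ((1 << 40) - 1)


def learn(host_to_switch, src_mac, dst_mac):
    """Record which switch a host sits behind when a marked broadcast arrives."""
    origin = decode_origin(dst_mac)
    if origin is not None:
        host_to_switch[src_mac] = origin
    return origin
```

The origin switch would rewrite `BROADCAST` to `origin_mac(dp_id)` before sending the packet over stack links; packets that still carry the plain broadcast MAC are simply ignored by `learn`.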
For link state, some kind of gossip protocol can be used.
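One possible shape for the gossip, sketched under assumptions: each switch keeps a table of link entries with a version counter, periodically sends the table to its stack neighbours, and merges what it receives by keeping the newer version. The table layout and the name `merge_link_state` are hypothetical; how versions are assigned and how the table is transported are left open.

```python
def merge_link_state(local, received):
    """Merge a neighbour's link-state table into ours.

    Both tables map a link key, e.g. (dp_a, port_a, dp_b, port_b), to a
    (version, is_up) tuple. The entry with the higher version wins.
    Returns the keys that changed, so that the caller knows when broadcast
    and unicast rules need to be recalculated and what to gossip onwards.
    """
    changed = []
    for link, (version, is_up) in received.items():
        if link not in local or local[link][0] < version:
            local[link] = (version, is_up)
            changed.append(link)
    return changed
```

Because stale or duplicate updates produce an empty `changed` list, a switch can stop forwarding them, which keeps the gossip from circulating forever.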
I have implemented part of this, but would really like to hear your comments to make it usable.