Two protocols are proposed below as steps towards making the stack failure
tolerant. Currently, a failure of the root switch or of a stack link can
break broadcast and unicast in the stack.
*1. Probing to discover a physical link or a valve failure:*
Probe packets are sent periodically via all physical stack links. Missing
e.g. 3 consecutive probes implies that the link or the peer is down. With
multiple stack links between two switches, the peer is considered down only
when all of those links are down.
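
A minimal sketch of the miss-counting logic described above, in Python.
The class names (`StackLink`, `Peer`), the method names and the threshold
constant are assumptions made for illustration, not existing code; the
threshold of 3 is just the "e.g. 3" from the text.

```python
from dataclasses import dataclass, field

MAX_MISSED_PROBES = 3  # assumed threshold, taken from the "e.g. 3" above


@dataclass
class StackLink:
    """Probe bookkeeping for one physical stack link (hypothetical)."""
    missed: int = 0

    def probe_received(self):
        # Any received probe resets the miss counter.
        self.missed = 0

    def probe_interval_elapsed(self):
        # Called once per probe interval in which no probe arrived.
        self.missed += 1

    @property
    def is_up(self):
        return self.missed < MAX_MISSED_PROBES


@dataclass
class Peer:
    """All stack links towards one neighbouring switch (hypothetical)."""
    links: dict = field(default_factory=dict)  # local port_id -> StackLink

    @property
    def is_up(self):
        # The peer is considered down only when every link to it is down.
        return any(link.is_up for link in self.links.values())
```

With two links to a peer, three missed probes on one link mark only that
link down; the peer is marked down once the second link also misses three.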
This can be implemented by extending the current LLDP beacon. By including
dp_id and port_id in the probe, the receiver can compare the received info
with the config file to determine whether miscabling has happened.
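
The miscabling check could look roughly like the following Python sketch.
The `EXPECTED_STACK_LINKS` table stands in for whatever the config file
provides, and `check_cabling` is a hypothetical helper; only dp_id and
port_id come from the proposal itself.

```python
# Expected cabling as it might be derived from the config file (assumed
# layout): (local dp_id, local port_id) -> (peer dp_id, peer port_id).
EXPECTED_STACK_LINKS = {
    (1, 10): (2, 20),
    (2, 20): (1, 10),
}


def check_cabling(local_dp_id, local_port_id, probe_dp_id, probe_port_id,
                  expected=EXPECTED_STACK_LINKS):
    """Return None if the probe matches the config, else a description."""
    want = expected.get((local_dp_id, local_port_id))
    if want is None:
        return "probe received on a port not configured as a stack port"
    got = (probe_dp_id, probe_port_id)
    if got != want:
        return f"miscabled: expected peer {want}, probe came from {got}"
    return None
```

For example, `check_cabling(1, 10, 2, 20)` returns None, while a probe
from dp_id 3 arriving on the same port is reported as miscabled.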
*2. State propagation.*
Two pieces of state are important for recalculating broadcast and unicast
after a failure: the mapping between a MAC and a switch, and link state.
Calculating unicast rules requires host-to-switch mappings. There are two
possible ways to get them: tagging (QinQ), or overloading the destination
MAC (that is, replacing the broadcast MAC with a unique MAC of the origin
switch; switches will have rules to match it and broadcast the packets to
non-stack ports as normal).
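
To make the destination-MAC option concrete, here is a small Python model
of the idea. In practice this would be done by flow rules on the switches
rather than in controller code; the per-switch MAC values and the function
names are made up for illustration.

```python
BROADCAST_MAC = "ff:ff:ff:ff:ff:ff"

# Hypothetical unique marker MAC per switch (locally administered),
# keyed by dp_id.
SWITCH_MACS = {1: "0e:00:00:00:00:01", 2: "0e:00:00:00:00:02"}
MAC_TO_SWITCH = {mac: dp_id for dp_id, mac in SWITCH_MACS.items()}


def to_stack(origin_dp_id, eth_src, eth_dst):
    """Origin switch: mark a broadcast before it leaves via stack ports."""
    if eth_dst == BROADCAST_MAC:
        eth_dst = SWITCH_MACS[origin_dp_id]
    return eth_src, eth_dst


def from_stack(eth_src, eth_dst, host_to_switch):
    """Receiving switch: learn host-to-switch, then restore the broadcast."""
    origin = MAC_TO_SWITCH.get(eth_dst)
    if origin is not None:
        host_to_switch[eth_src] = origin  # the host sits behind the origin
        eth_dst = BROADCAST_MAC           # deliver to non-stack ports as normal
    return eth_src, eth_dst
```

A broadcast from a host on switch 1 crosses the stack with switch 1's
marker MAC as destination; the receiver records that the host's MAC lives
on switch 1 and floods the frame locally with the broadcast MAC restored.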
For link state, some kind of gossip can be used.
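
One possible shape for the gossip, sketched in Python: each switch keeps a
table of link states with sequence numbers and merges whatever its
neighbours send, with the newer entry winning. The table layout and the
`merge_link_state` helper are assumptions for illustration; the proposal
does not fix a particular gossip scheme.

```python
def merge_link_state(local, received):
    """Merge a neighbour's link-state table into ours.

    Both tables map a link key to (sequence_number, is_up). For each link
    the entry with the higher sequence number wins. Returns True if the
    local table changed, i.e. the update is worth gossiping onwards.
    """
    changed = False
    for link, (seq, is_up) in received.items():
        if link not in local or local[link][0] < seq:
            local[link] = (seq, is_up)
            changed = True
    return changed
```

A switch would forward its table to its neighbours only when the merge
returns True, so updates stop circulating once every switch has seen them.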
I have implemented part of this, but would really like to hear your
comments to make it usable.