For building staged tree models at scale, use Total Variation divergence with Ward.D2 hierarchical clustering: it matches the accuracy of slower methods such as Backward Hill Climbing while running significantly faster.
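For reference, the two ingredients named above have standard definitions. In this sketch, $p$ and $q$ are the conditional distributions at two vertices of the tree over a shared outcome set $\mathcal{X}$, $C_i$ denotes a cluster of vertices, and $n_i = |C_i|$. These symbols are introduced here for illustration. Ward.D2 applies the Lance-Williams update to squared dissimilarities when merging $C_i$ and $C_j$:

```latex
\mathrm{TV}(p, q) = \frac{1}{2} \sum_{x \in \mathcal{X}} \lvert p(x) - q(x) \rvert

d^2(C_k, C_i \cup C_j) = \frac{(n_i + n_k)\, d^2(C_k, C_i) + (n_j + n_k)\, d^2(C_k, C_j) - n_k\, d^2(C_i, C_j)}{n_i + n_j + n_k}
```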
This paper presents a new method for building staged tree models, a class of probabilistic graphical models that can represent context-specific conditional independences in data. The approach applies hierarchical clustering to the vertices' conditional probability distributions and compares different distance metrics and clustering strategies.
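The clustering step described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each vertex (situation) at one depth of the tree has already been summarized by an estimated conditional probability vector over the same outcomes, and that the number of stages `k` is fixed in advance. How the paper actually chooses the number of stages is not stated here. The function names `total_variation` and `ward_d2_stages` are hypothetical. Vertices that end up with the same label are treated as belonging to the same stage.

```python
from itertools import combinations


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def ward_d2_stages(dists, k):
    """Hypothetical helper: group conditional distributions into k stages.

    Agglomerative clustering on total-variation dissimilarities, using the
    Ward.D2 update (the Lance-Williams formula applied to squared
    dissimilarities). Returns one stage label per input distribution.
    """
    n = len(dists)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of distributions")
    # Squared TV dissimilarities between active clusters, keyed by id pairs.
    d2 = {
        frozenset((i, j)): total_variation(dists[i], dists[j]) ** 2
        for i, j in combinations(range(n), 2)
    }
    members = {i: [i] for i in range(n)}  # cluster id -> original indices
    next_id = n
    while len(members) > k:
        # Merge the pair of active clusters with the smallest dissimilarity.
        pair = min(d2, key=d2.get)
        i, j = sorted(pair)
        ni, nj = len(members[i]), len(members[j])
        new = next_id
        next_id += 1
        for m in members:
            if m in (i, j):
                continue
            nk = len(members[m])
            dik = d2.pop(frozenset((i, m)))
            djk = d2.pop(frozenset((j, m)))
            # Ward.D2 update on squared dissimilarities.
            d2[frozenset((new, m))] = (
                (ni + nk) * dik + (nj + nk) * djk - nk * d2[pair]
            ) / (ni + nj + nk)
        del d2[pair]
        members[new] = members.pop(i) + members.pop(j)
    # Number stages in order of their smallest original index.
    labels = [0] * n
    for stage, idx in enumerate(sorted(members.values(), key=min)):
        for x in idx:
            labels[x] = stage
    return labels


# Four situations with binary outcomes: two "mostly 0" and two "mostly 1".
dists = [[0.9, 0.1], [0.85, 0.15], [0.2, 0.8], [0.25, 0.75]]
print(ward_d2_stages(dists, 2))  # [0, 0, 1, 1]
```

Swapping `total_variation` for another dissimilarity function is one way to compare distance metrics under the same clustering strategy. In R, a comparable step would be `hclust` on a precomputed TV dissimilarity matrix with `method = "ward.D2"`.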