A transformer-based neural network architecture that learns language representations by predicting masked words in text, improving on the original BERT model.
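As a rough illustration of the masked-word objective (a simplified sketch, not the model's actual implementation — real masked language modeling as in BERT also sometimes keeps or randomizes the selected tokens rather than always masking them), the masking step can be written in plain Python; `mask_tokens` and the `[MASK]` symbol here are illustrative names:

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Randomly replace roughly mask_prob of the tokens with a mask symbol.

    Returns (masked_tokens, targets), where targets maps each masked
    position back to its original token -- these are the words the
    model is trained to predict from the surrounding context.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    masked = list(tokens)
    targets = {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            targets[i] = tok      # remember the hidden word
            masked[i] = MASK      # hide it from the model
    return masked, targets

tokens = "the quick brown fox jumps over the lazy dog".split()
masked, targets = mask_tokens(tokens, mask_prob=0.3, seed=42)
print(masked)   # sentence with some words replaced by [MASK]
print(targets)  # position -> original word the model must recover
```

During training, the model sees only `masked` and is scored on how well it reconstructs the entries of `targets`.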