ThinkLLM: Rethinking Language Model Scaling under Transferable Hypersphere Optimization