I’m working on a multi-task project using Transformers. What’s the best practice for managing multiple heads in a single model?

If I want one Transformer model to do multiple tasks, what’s the right way to design and organize the separate output layers (heads) for those tasks?
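
To make the question concrete, here is a minimal sketch of one common layout, not the only valid one: a single shared encoder plus one small output layer per task, stored in an `nn.ModuleDict` and selected by task name. It uses plain PyTorch. The class name `MultiTaskTransformer`, the task names `sentiment` and `topic`, the layer sizes, and the mean pooling are all assumptions made for illustration, and positional encodings are left out to keep it short.

```python
import torch
from torch import nn


class MultiTaskTransformer(nn.Module):
    """Hypothetical example: shared Transformer encoder, one linear head per task."""

    def __init__(self, vocab_size, d_model, task_num_labels):
        super().__init__()
        # Shared trunk: every task reuses these parameters.
        # (Positional encodings are omitted here for brevity.)
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        # ModuleDict registers each head's parameters with the model
        # and lets forward() look a head up by task name.
        self.heads = nn.ModuleDict(
            {task: nn.Linear(d_model, n) for task, n in task_num_labels.items()}
        )

    def forward(self, input_ids, task):
        hidden = self.encoder(self.embed(input_ids))  # (batch, seq, d_model)
        pooled = hidden.mean(dim=1)                   # mean-pool over tokens (an assumption)
        return self.heads[task](pooled)               # (batch, num_labels for this task)


# Illustrative task names and label counts, not taken from any real dataset.
model = MultiTaskTransformer(
    vocab_size=100, d_model=32, task_num_labels={"sentiment": 2, "topic": 5}
)
input_ids = torch.randint(0, 100, (3, 10))  # batch of 3 sequences, 10 tokens each
sentiment_logits = model(input_ids, task="sentiment")
topic_logits = model(input_ids, task="topic")
```

With this layout, adding a task means adding one entry to `task_num_labels`. The training loop then decides how to mix batches from different tasks and how to weight their losses.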

Hmm… Just the resources for now…