  • arXiv : [2106.09785] Efficient Self-supervised Vision Transformers for Representation Learning
  • github : https://github.com/microsoft/esvit

  • Related works

  • Contribution

  • Monolithic ViT / Multi-stage ViT

  • Pre-training task

  • Experimental results

  • Appendix