Tag: vision-transformers

Blog Post·2024-06-19·7 min read

Contrastive SSL: SimCLR, MoCo, and DINO

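Contrastive learning teaches a model that two views of the same image should be close in representation space, while views of different images should be far apart. The methods differ in how they enforce this. SimCLR contrasts each view against the other images in a large batch, MoCo draws negatives from a queue encoded by a slowly updated momentum encoder, and DINO drops explicit negatives in favor of self-distillation between a student and a momentum teacher. Those choices shape what the model ends up learning.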

ssl contrastive-learning dino vision-transformers
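A minimal sketch of the "close vs. far apart" objective as an InfoNCE-style loss in plain Python. Assumptions: embeddings are lists of floats, similarity is cosine, the temperature defaults to 0.1, and each anchor's negatives are the other images' second views. SimCLR's NT-Xent loss also uses same-view embeddings as negatives and MoCo pulls negatives from a queue, so treat this as the shared core of the idea rather than any one paper's implementation. The function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor: Vector, positive: Vector, negatives: List[Vector],
             temperature: float = 0.1) -> float:
    """Cross-entropy of picking the positive out of [positive] + negatives."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with the max subtracted for numerical stability.
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denom - logits[0]


def batch_loss(view1: List[Vector], view2: List[Vector],
               temperature: float = 0.1) -> float:
    """Average loss where view2[i] is the positive for view1[i]
    and every other view2[k] serves as a negative (hypothetical helper)."""
    total = 0.0
    for i, anchor in enumerate(view1):
        negatives = [v for k, v in enumerate(view2) if k != i]
        total += info_nce(anchor, view2[i], negatives, temperature)
    return total / len(view1)


aligned = batch_loss([[1.0, 0.0], [0.0, 1.0]], [[0.9, 0.1], [0.1, 0.9]])
swapped = batch_loss([[1.0, 0.0], [0.0, 1.0]], [[0.1, 0.9], [0.9, 0.1]])
```

When the second views agree with their anchors, `aligned` is close to zero; swapping them makes each anchor's most similar candidate a negative, so `swapped` is much larger. The temperature controls how sharply the loss focuses on the hardest negatives.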

Blog Post·2024-06-19·4 min read

I-JEPA: Self-Supervised Vision at Scale

I-JEPA applies the JEPA (Joint-Embedding Predictive Architecture) idea to images: from a single context region, predict the representations of several target patches, without relying on hand-crafted view augmentations. The learned representations transfer to semantic tasks better than those from pixel-reconstruction methods.

ssl jepa i-jepa vision-transformers
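The context/target split can be sketched on a patch grid. This is an illustrative sketch, not the reference implementation: the 14x14 grid, the four target blocks, and the scale and aspect-ratio ranges are assumptions, and the helper names are hypothetical. In the full method the context patches go through a context encoder, a predictor outputs an embedding for each target position, and the regression targets come from an exponential-moving-average (EMA) copy of the encoder; only the masking and a squared-L2 regression loss are shown here.

```python
import math
import random
from typing import List, Set, Tuple

Patch = Tuple[int, int]  # (row, col) on the patch grid


def sample_block(grid: int, scale: Tuple[float, float],
                 aspect: Tuple[float, float], rng: random.Random) -> Set[Patch]:
    """Sample a rectangular block covering roughly `scale` of the grid's area."""
    area = rng.uniform(*scale) * grid * grid
    ratio = rng.uniform(*aspect)  # height / width
    h = max(1, min(grid, round(math.sqrt(area * ratio))))
    w = max(1, min(grid, round(math.sqrt(area / ratio))))
    top = rng.randint(0, grid - h)
    left = rng.randint(0, grid - w)
    return {(r, c) for r in range(top, top + h) for c in range(left, left + w)}


def sample_context_and_targets(grid: int = 14, num_targets: int = 4,
                               seed: int = 0) -> Tuple[Set[Patch], List[Set[Patch]]]:
    """Small target blocks plus one large context block (assumed ranges)."""
    rng = random.Random(seed)
    targets = [sample_block(grid, (0.15, 0.2), (0.75, 1.5), rng)
               for _ in range(num_targets)]
    context = sample_block(grid, (0.85, 1.0), (1.0, 1.0), rng)
    # Drop target patches from the context so the prediction is non-trivial.
    for t in targets:
        context -= t
    return context, targets


def l2_loss(predicted: List[List[float]], target: List[List[float]]) -> float:
    """Squared L2 distance per target patch, averaged over patches."""
    total = 0.0
    for p, t in zip(predicted, target):
        total += sum((a - b) ** 2 for a, b in zip(p, t))
    return total / len(predicted)


context, targets = sample_context_and_targets()
```

Because the loss compares embeddings rather than pixels, the model is not rewarded for reproducing texture details it cannot infer from the context, which is one plausible reason the representations lean toward semantics.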

Blog Post·2024-06-19·5 min read

Masked Autoencoders: Learning by Filling in the Blanks

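Mask 75% of an image's patches and train a model to reconstruct the missing pixels. The result is a rich visual representation. The high masking ratio is what makes the recipe work: neighboring pixels are so redundant that a lightly masked image can be filled in by local interpolation, while reconstructing most of the image forces the model to learn its structure.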

ssl mae masked-autoencoders vision-transformers
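The masking step and the loss can be sketched in a few lines of plain Python. Assumptions: 196 patches (a 14x14 grid), each patch represented as a list of pixel values, and hypothetical function names. In the MAE recipe only the visible 25% of patches go through the encoder, and the reconstruction error is averaged over the masked patches only.

```python
import random
from typing import List, Tuple


def random_masking(num_patches: int, mask_ratio: float = 0.75,
                   seed: int = 0) -> Tuple[List[int], List[int]]:
    """Split patch indices into (visible, masked), uniformly at random."""
    rng = random.Random(seed)
    order = list(range(num_patches))
    rng.shuffle(order)
    num_keep = int(num_patches * (1 - mask_ratio))
    return sorted(order[:num_keep]), sorted(order[num_keep:])


def masked_mse(pred: List[List[float]], target: List[List[float]],
               masked: List[int]) -> float:
    """Per-pixel mean squared error, averaged over masked patches only."""
    errors = []
    for i in masked:
        sq = sum((p - t) ** 2 for p, t in zip(pred[i], target[i]))
        errors.append(sq / len(target[i]))
    return sum(errors) / len(errors)


visible, masked = random_masking(196)
```

Scoring only the masked patches means the model earns nothing for copying patches it was already shown; all of the training signal comes from inferring what it could not see.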