V-JEPA: Predicting the Future in Representation Space
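V-JEPA extends JEPA (Joint-Embedding Predictive Architecture) to video: given a set of context frames, the model predicts the representations of future or masked frames rather than the frames themselves. There is no pixel reconstruction and no contrastive loss. The only objective is abstract prediction across time, measured in representation space.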
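To make "prediction in representation space" concrete, here is a minimal toy sketch in plain Python. It is not the V-JEPA implementation, and everything in it is an assumption made for illustration: the encoders and predictor are small linear maps, the context is mean-pooled, the distance is L1, and the target encoder is simply a copy of the context encoder. All function and variable names are hypothetical.

```python
import random

def linear(weights, x):
    """Apply a toy linear map: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def mean_vector(vectors):
    """Average a list of equal-length vectors element-wise."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def representation_loss(frames, target_idx, enc_w, tgt_w, pred_w):
    """Mean L1 distance between predicted and actual target-frame representations."""
    context = [f for i, f in enumerate(frames) if i not in target_idx]
    # Encode only the visible (context) frames and pool them into one summary.
    summary = mean_vector([linear(enc_w, f) for f in context])
    loss = 0.0
    for i in target_idx:
        # The predictor guesses the representation of hidden frame i.
        # (A real predictor would also be told which frame to predict; omitted here.)
        predicted = linear(pred_w, summary)
        # The target comes from a separate encoder and is treated as a constant.
        target = linear(tgt_w, frames[i])
        # The comparison happens between representations; pixels are never reconstructed.
        loss += sum(abs(p - t) for p, t in zip(predicted, target)) / len(target)
    return loss / len(target_idx)

def random_matrix(rows, cols, rng):
    """Build a rows x cols matrix of uniform random weights (assumed initialisation)."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
frame_dim, rep_dim = 8, 4  # toy sizes: 8 "pixels" per frame, 4-dim representations
frames = [[rng.uniform(0, 1) for _ in range(frame_dim)] for _ in range(6)]
enc_w = random_matrix(rep_dim, frame_dim, rng)
tgt_w = [row[:] for row in enc_w]  # assumption: target encoder starts as a copy
pred_w = random_matrix(rep_dim, rep_dim, rng)

# Hide the last two frames and predict their representations from the first four.
loss = representation_loss(frames, target_idx={4, 5}, enc_w=enc_w, tgt_w=tgt_w, pred_w=pred_w)
print(f"loss = {loss:.4f}")
```

Note that the loss compares two `rep_dim`-sized vectors and never touches `frame_dim`-sized pixel data, and that it involves no negative pairs. In this sketch, those two properties stand in for "no pixel reconstruction, no contrastive loss".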