Summary
Giving talks on AI embedding at MIT, Harvard and Flatiron
It was great being in Boston and New York. I visited CSAIL at MIT, the Kempner AI Institute at Harvard and finally the Flatiron Institute in NYC (Neuroscience Dept., headed by Prof. Simoncelli).
At each place I gave a talk on our recent work on the representation and embedding of foundation models; see the abstract below.
I had great conversations with excellent and thoughtful people. Thanks to Tamar Rott-Shaham (MIT), Yonatan Belinkov (Harvard) and Guy Ohayon (Flatiron) for their hospitality; they were very kind and helpful in organizing the visits.
Abstract of the talk:
How to Encode World Knowledge?
Foundation models are a key platform that implicitly encodes world knowledge. In this talk we first focus on vision-language models, such as CLIP, and investigate their geometric behavior and the logic behind their high-dimensional feature encoding. For instance, we find that as an image or text becomes rarer and more distinct, it is encoded further from the center of the embedding. We explain why the InfoNCE loss leads to this behavior. We also find empirically that each modality can be well modeled statistically by a multivariate Gaussian distribution. We then prove this finding formally, showing that InfoNCE asymptotically induces a Gaussian distribution. Finally, we connect two major types of image foundation models, encoders and generators, through a universal normal embedding hypothesis. We demonstrate a surprising consequence of this hypothesis: the “noise” in generative diffusion models contains semantic information that can be accessed by linear probing, for either classification or editing. This talk is based on six recent papers from our group, published at ICML 2025, ICLR 2026 and CVPR 2026.
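For readers who have not met InfoNCE before, here is a minimal sketch of the standard symmetric, CLIP-style form of the loss mentioned in the abstract. It is the generic textbook objective, not the analysis from the papers. The function names, the toy 2-D embeddings and the temperature of 0.07 (CLIP's initial value) are illustrative assumptions. The idea: each image in a batch should be most similar to its own caption, and every other caption in the batch acts as a negative (and vice versa).

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean norm (CLIP compares normalized embeddings)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_entropy_diag(logits: List[List[float]]) -> float:
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max before exponentiating, for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def info_nce(image_emb, text_emb, temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss over a batch of paired embeddings (illustrative sketch).

    Pair i (image_emb[i], text_emb[i]) is the positive; every other
    pairing in the batch serves as a negative.
    """
    imgs = [l2_normalize(v) for v in image_emb]
    txts = [l2_normalize(v) for v in text_emb]
    # logits[i][j] = cosine similarity of image i and text j, divided by the temperature
    logits = [[sum(a * b for a, b in zip(u, w)) / temperature for w in txts] for u in imgs]
    transposed = [list(col) for col in zip(*logits)]
    # Average the image-to-text and text-to-image directions
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(transposed))


pairs = [[1.0, 0.0], [0.0, 1.0]]
print(info_nce(pairs, pairs))        # near 0: each image matches its own caption
print(info_nce(pairs, pairs[::-1]))  # about 14.29: captions swapped
```

With perfectly matched pairs the loss is close to zero, while swapping the captions pushes it to roughly 1/temperature. Real training uses large batches of high-dimensional embeddings, which is the regime where the geometric and Gaussian behavior described above emerges.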
Some pics








