Beautiful illustrations  
To me, &#x27;playing&#x27; is just the free, self-motivated version of &#x27;exploration&#x27;.<p>One thought on your nicely illustrated &quot;key observation [is] that neural networks tend to place features along directions&quot;: my guess is that the neural net was TOLD to behave that way by the choice of training objective, e.g. a cosine loss?
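To make that guess concrete, here is a minimal sketch of why a cosine-based objective only ever "sees" direction. The function name `cosine_loss` and the example vectors are made up for illustration, and nothing here says which loss the article's model was actually trained with.

```python
import numpy as np

def cosine_loss(a, b):
    """1 - cosine similarity: 0 when a and b point the same way, 2 when opposite."""
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return 1.0 - cos

# Illustrative vectors only, not taken from any real model.
a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 1.0, 0.5])

base = cosine_loss(a, b)
# Rescaling either vector leaves the loss unchanged: length carries no signal.
scaled = cosine_loss(10.0 * a, 0.1 * b)
print(np.isclose(base, scaled))
```

If the objective ignores vector length like this, the optimizer has nothing to adjust except direction, which would be consistent with features ending up organised along directions.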

&gt; Because you somehow need a giant training set which describes images in natural language, no?<p>That&#x27;s definitely one way - they train a text encoder together with an image encoder on a labelled set of images. WL &amp; 3b1b made a nice video on it: <a href="https:&#x2F;&#x2F;www.youtube.com&#x2F;watch?v=iv-5mZ_9CPY" rel="nofollow">https:&#x2F;&#x2F;www.youtube.com&#x2F;watch?v=iv-5mZ_9CPY</a>
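For anyone who wants to see what "train a text encoder together with an image encoder" can look like, here is a rough sketch of a symmetric contrastive loss over a batch of (image, caption) pairs. It assumes a CLIP-style objective, which is my assumption and not something stated above; the encoders are replaced by random vectors, and `contrastive_loss` and `temperature=0.07` are illustrative choices.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Row i of image_emb and row i of text_emb are assumed to be a matching pair."""
    img = l2_normalize(image_emb)
    txt = l2_normalize(text_emb)
    logits = img @ txt.T / temperature  # (batch, batch) similarity matrix
    idx = np.arange(len(logits))
    # Each image should pick out its own caption, and each caption its own image.
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_i2t + loss_t2i)

# Stand-ins for encoder outputs: 4 pairs, 8-dimensional embeddings.
rng = np.random.default_rng(0)
text = rng.normal(size=(4, 8))
aligned = contrastive_loss(text + 0.01 * rng.normal(size=(4, 8)), text)
shuffled = contrastive_loss(np.roll(text, 1, axis=0), text)
print(aligned < shuffled)
```

The only supervision in this setup is which caption belongs to which image, so the loss is low when matching pairs land close together and high when they are mismatched.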

Very nice visualizations, thanks for that!<p>One thing I still struggle with in my head is how these vision embeddings can then be used to give LLMs eyes.<p>Because you somehow need a giant training set which describes images in natural language, no? Is that actually how it works, or is there some smart trick so you don&#x27;t need to pay labellers a bunch of money to look at pictures and describe them?