OpenAI rang in the new year with a major announcement of two revolutionary new pieces of research: 1) DALL-E, which can generate images from text, and 2) CLIP, which provides a zero-shot image classification approach that does not require training a model for the task at hand. This article focuses on CLIP, specifically on how the Vector robot can classify the objects it sees, as long as it is given an input list of possible text sequences that describe the expected objects.
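To make the idea of classifying against a list of text sequences concrete, here is a minimal sketch of the scoring step only. It assumes the image and each candidate text have already been turned into embedding vectors by CLIP's encoders, which are not shown. The function names, the toy three-dimensional vectors, and the scale factor of 100 are illustrative assumptions, not values taken from CLIP or from the Vector SDK.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(image_embedding, text_embeddings, labels, scale=100.0):
    """Pick the label whose text embedding is closest to the image embedding.

    `scale` is an assumed temperature that sharpens the softmax.
    """
    similarities = [cosine_similarity(image_embedding, t) for t in text_embeddings]
    probabilities = softmax([scale * s for s in similarities])
    best = max(range(len(labels)), key=lambda i: probabilities[i])
    return labels[best], probabilities


# Hypothetical candidate descriptions and made-up embeddings for illustration.
labels = ["a photo of a cup", "a photo of a ball", "a photo of a dog"]
text_embeddings = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.1], [0.0, 0.2, 0.9]]
image_embedding = [0.8, 0.2, 0.1]

best_label, probabilities = classify(image_embedding, text_embeddings, labels)
print(best_label)
```

No training happens anywhere in this step: changing what the robot can recognize only means changing the list of candidate descriptions.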

First, what is the big deal about CLIP? One of the major challenges in deep learning is the need for labelled datasets to train a model.


