Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning

The Google AI team crawled the web for images with well-defined HTML alt-text, which let them assemble a large dataset of image/text pairs for automatic image captioning. They use the MS-COCO dataset as their baseline.
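To make the idea of harvesting image/alt-text pairs concrete, here is a minimal sketch using only Python's standard library. It is not the authors' crawler: the class name `AltTextExtractor`, the helper `extract_pairs`, and the rule "keep an image only if both `src` and a non-empty `alt` are present" are assumptions made for illustration.

```python
from html.parser import HTMLParser


class AltTextExtractor(HTMLParser):
    """Collects (src, alt) pairs from <img> tags with non-empty alt-text.

    Hypothetical helper for illustration; not the pipeline from the paper.
    """

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)
        src = attr.get("src")
        # A bare `alt` attribute has the value None, so fall back to "".
        alt = (attr.get("alt") or "").strip()
        if src and alt:
            self.pairs.append((src, alt))


def extract_pairs(html):
    """Return a list of (image URL, alt-text) candidates found in `html`."""
    parser = AltTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.pairs


if __name__ == "__main__":
    page = '<p><img src="a.jpg" alt="a dog on a beach"><img src="b.jpg" alt=""></p>'
    print(extract_pairs(page))
```

A real crawl would need many more checks before a pair could be called "well-defined"; this sketch only shows where the raw candidates come from.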

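They focused on automatically cleaning up the raw alt-text, using several NLP techniques to turn it into better captions. As the title indicates, this includes hypernymization, that is, replacing specific terms with more general ones.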
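The "hypernymed" part of the title refers to replacing specific terms with more general ones. The toy sketch below illustrates that idea with a hand-written lookup table. It is an assumption-laden illustration, not the authors' method: the `HYPERNYMS` table, its entries, and the `hypernymize` function are all hypothetical, and a real system would need entity recognition rather than plain string matching.

```python
import re

# Hypothetical lookup table, for illustration only: specific phrase -> general term.
HYPERNYMS = {
    "harrison ford": "actor",
    "los angeles": "city",
    "golden retriever": "dog",
}


def hypernymize(caption, table=HYPERNYMS):
    """Replace each known specific phrase in `caption` with its general term.

    Longer phrases are tried first so that a multi-word entry is not
    shadowed by a shorter entry contained in it.
    """
    result = caption
    for phrase in sorted(table, key=len, reverse=True):
        pattern = r"\b" + re.escape(phrase) + r"\b"
        result = re.sub(pattern, table[phrase], result, flags=re.IGNORECASE)
    return result


if __name__ == "__main__":
    print(hypernymize("Harrison Ford walking a golden retriever in Los Angeles"))
    # prints "actor walking a dog in city"
```

The output is deliberately rough ("in city" has no article), which shows why simple substitution alone is not enough to produce a fluent caption.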

They validated their work by training captioning models on the new dataset and comparing their performance with the MS-COCO baseline.