Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning¶
Google AI team crawled the web looking for images with well-defined HTML alt-text. This way, they were able to assemble a large dataset of image/text pairs for automatic image captioning. Their baseline is the MS-COCO dataset.
They focused on an automatic contextualization of the alt-text, where they improved captions by leveraging different NLP techniques.
They validated their work by training models using the dataset and comparing performance with the baseline.