Introduction
For a variety of recognition applications, the most accurate results are now obtained by adapting large foundation models pre-trained on massive curated or raw data [1].
As a foundation model, CLIP could power numerous ethnicity classification solutions, so it is imperative that we explore the inherent bias in its zero-shot performance.
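For context, CLIP's zero-shot classification scores an image embedding against text-prompt embeddings by cosine similarity. The sketch below illustrates that scoring mechanism only; the random vectors are placeholders standing in for CLIP's actual image and text encoders, and the logit scale of 100 is CLIP's published temperature, used here as an assumption.

```python
import numpy as np

def zero_shot_classify(image_emb, text_embs, labels):
    """Rank class prompts for an image by cosine similarity,
    mirroring CLIP-style zero-shot classification."""
    # L2-normalize so the dot product equals cosine similarity.
    image_emb = image_emb / np.linalg.norm(image_emb)
    text_embs = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = text_embs @ image_emb            # one similarity per class prompt
    probs = np.exp(logits * 100)              # CLIP's logit scale (~100), assumed here
    probs /= probs.sum()                      # softmax over class prompts
    return labels[int(np.argmax(logits))], probs

# Placeholder embeddings in place of real CLIP encoders.
rng = np.random.default_rng(0)
labels = ["cat", "dog"]
text_embs = rng.normal(size=(2, 512))                   # fake prompt embeddings
image_emb = text_embs[0] + 0.1 * rng.normal(size=512)   # image near the "cat" prompt
pred, probs = zero_shot_classify(image_emb, text_embs, labels)
```

Because the image embedding is constructed close to the first prompt embedding, the similarity ranking recovers that class; bias audits of the kind discussed above examine how these similarity scores shift across demographic prompts.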
References
- [1] Jia et al., Visual Prompt Tuning, 2022.