dc.contributor.author: Mohammadi, Mahsa
dc.contributor.author: Eftekhari, Mahdi
dc.contributor.author: Hassani, Amirhossein
dc.date.accessioned: 2023-11-21T14:55:25Z
dc.date.available: 2023-11-21T14:55:25Z
dc.date.created: 2023-11-10T12:19:42Z
dc.date.issued: 2023
dc.identifier.citation: IEEE Access. 2023, 11, 123209-123222. (en_US)
dc.identifier.issn: 2169-3536
dc.identifier.uri: https://hdl.handle.net/11250/3103915
dc.description.abstract: Cross-modal representation learning aims to learn a shared representation space where data from multiple modalities can be effectively compared, fused, and understood. This paper investigates how increased diversity in the similarity score matrix enhances the performance of CLIP (Contrastive Language-Image Pretraining), a multi-modal learning model that connects images and text within a joint embedding space. Two transformation approaches, sine and sigmoid (including two versions), are incorporated into the CLIP model to amplify larger values and diminish smaller values within the similarity matrix (logits). Hardware limitations are addressed by using a more compact text encoder (DistilBERT) and a pre-trained ResNet50 image encoder. The proposed adaptations are evaluated on image classification and image/text retrieval tasks, using 10 benchmark datasets such as Food101, Flickr30k, and COCO. The adapted models are compared to the base CLIP model using Accuracy, mean per-class accuracy, and Recall@k metrics. The results demonstrate improvements over the CLIP baseline in Accuracy (up to 5.32% for the PatchCamelyon dataset), mean per-class accuracy (up to 14.48% for the FGVCAircraft dataset), and retrieval precision (an increase of up to 45.20% in Recall@1 for the COCO dataset). (en_US)
dc.language.iso: eng (en_US)
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International (*)
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/deed.no (*)
dc.title: Image-Text Connection: Exploring the Expansion of the Diversity Within Joint Feature Space Similarity Score (en_US)
dc.title.alternative: Image-Text Connection: Exploring the Expansion of the Diversity Within Joint Feature Space Similarity Score (en_US)
dc.type: Peer reviewed (en_US)
dc.type: Journal article (en_US)
dc.description.version: publishedVersion (en_US)
dc.rights.holder: © 2023 The Authors. (en_US)
dc.source.pagenumber: 123209-123222 (en_US)
dc.source.volume: 11 (en_US)
dc.source.journal: IEEE Access (en_US)
dc.identifier.doi: 10.1109/ACCESS.2023.3327339
dc.identifier.cristin: 2195034
dc.relation.project: NILU: 120132 (en_US)
dc.relation.project: NILU: 121128 (en_US)
dc.relation.project: EC/H2020: 101037648 (en_US)
dc.relation.project: EEA and Norway Grants: 2019/35/J/HS6/03992 (en_US)
cristin.ispublished: true
cristin.fulltext: original
cristin.qualitycode: 1
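
The abstract reports retrieval results as Recall@k. As a reminder of what that metric measures, the following is a minimal sketch and not the authors' evaluation code. It assumes the common image-text retrieval convention that a query counts as a hit when at least one relevant candidate appears among its k highest-scoring candidates. The function name `recall_at_k`, the toy similarity matrix, and the `relevant` sets are all hypothetical.

```python
def recall_at_k(similarity, relevant, k):
    """Fraction of queries with at least one relevant item among the top-k scores.

    similarity: list of rows, one per query, each holding a score per candidate.
    relevant:   list of sets, one per query, holding the relevant candidate indices.
    """
    hits = 0
    for scores, rel in zip(similarity, relevant):
        # Rank candidate indices by descending similarity score.
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        # Count the query as a hit if any relevant index is in the top k.
        if rel & set(ranked[:k]):
            hits += 1
    return hits / len(similarity)


# Toy (made-up) similarity matrix: rows are queries, columns are candidates.
similarity = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.5, 0.6, 0.7],
]
# Assumed ground truth: query i matches candidate i.
relevant = [{0}, {1}, {2}]

print(recall_at_k(similarity, relevant, 1))
print(recall_at_k(similarity, relevant, 2))
```

In this toy case the second query ranks its relevant candidate second, so it is missed at k = 1 and recovered at k = 2.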

Except where otherwise noted, this item is licensed under Attribution-NonCommercial-NoDerivatives 4.0 International