SemEval 2025 Shared Task 7:
Multilingual and Crosslingual Fact-Checked Claim Retrieval

About the Task

The SemEval-2025 shared task 7 on Multilingual and Crosslingual Fact-Checked Claim Retrieval addresses the critical challenge of efficiently identifying previously fact-checked claims across multiple languages — a task that can be time-consuming for professional fact-checkers even within a single language and becomes much more difficult to perform manually when the claim and the fact-check may be in different languages. Given the global spread of disinformation narratives, the range of languages that one would need to cover not to miss existing fact-checks is vast.

Participants in the SemEval-2025 shared task 7 will develop systems to retrieve relevant fact-checked claims for given social media posts across multiple languages, addressing the challenge of efficiently identifying previously fact-checked claims in a multilingual and cross-lingual context – supporting fact-checkers and researchers in their efforts to curb the global spread of misinformation.

The Shared Task utilizes a dataset that is derived from and builds upon the original MultiClaim dataset, with modifications and enhancements made specifically for this task. For additional details about the dataset, one may also refer to the dataset paper.

How to participate

Please register through our CodaBench competition:
[LINK TO THE REGISTRATION]

  1. Go to the ‘My Submission’ tab to register for the competition.
  2. After registration, wait for approval and you’ll receive the dataset link for the development phase.
  3. Decide if you want to participate as an individual or as an organization.
  4. To participate as an organization:
    • Click on your icon in the top right corner.
    • Select “Create Organizations”.
    • Enter your desired Team Name as the Organization name in the Registration Form.
  5. Choose your task: monolingual or crosslingual. You may choose to submit to one task only.
  6. When submitting:
    • Choose “Submit As” and select your name or organization from the dropdown list.
    • Specify your submission name in the Fact Sheet (this will appear on the leaderboard).
    • Select the correct task(s) from the dropdown list. By default, both tasks are selected.
  7. Check the leaderboard to see your submissions. If submitting as an organization, your Team Name will appear in the “Participant” column.
  8. For more detailed guidelines, refer to the [GUIDELINES] on how to participate.

Sample Data

[LINK TO THE SAMPLE DATA AT GITHUB]

In this task, you are given social media posts (SMP) and a collection of fact-checks (FC). The goal is to find the most relevant fact-checks for each social media post.

The trial data (sampled from MultiClaim) contains 50 SMP-FC pairs in total: 10 eng-eng pairs and 40 pairs covering two other languages, with 20 examples per language (10 monolingual, e.g., kor-kor, and 10 crosslingual, e.g., kor-eng).

Various retrieval setups are possible. For example, for each SMP, the FC search pool can be limited to the target language, to a different language, or to a mix of languages.

In this task, we aim to evaluate both monolingual and crosslingual setups. Common metrics include mean reciprocal rank (MRR) and success@K.
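As a concrete reference, the two metrics mentioned above can be sketched as follows. This is an illustrative implementation, not the official evaluation script; the function names and input format (one ranked list of FC ids per post, plus the set of relevant FC ids per post) are assumptions.

```python
def mean_reciprocal_rank(ranked_ids, relevant_ids):
    """Average of 1/rank of the first relevant fact-check per post.

    ranked_ids:   list of ranked FC-id lists, one per post
    relevant_ids: list of sets of relevant FC ids, one per post
    """
    total = 0.0
    for ranking, relevant in zip(ranked_ids, relevant_ids):
        for rank, fc_id in enumerate(ranking, start=1):
            if fc_id in relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(ranked_ids)


def success_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of posts with at least one relevant FC in the top k."""
    hits = sum(
        1
        for ranking, relevant in zip(ranked_ids, relevant_ids)
        if any(fc_id in relevant for fc_id in ranking[:k])
    )
    return hits / len(ranked_ids)
```

For instance, if the relevant fact-check for a post is ranked second, that post contributes 0.5 to MRR and counts as a success for any K ≥ 2.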

The sample data consists of three csv files:

1) trial_fact_check_post_mapping.csv (50 pairs)

This file contains the mapping between fact-checks and social media posts. It has three fields:

  • fact_check_id: the id of the fact-check in trial_fact_checks.csv
  • post_id: the id of the post in trial_posts.csv
  • pair_lang: the language information about this mapped pair. For example, spa-eng stands for an SMP in Spanish and an FC in English.
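The pair_lang field makes it easy to split the data into monolingual and crosslingual subsets. A minimal sketch using the standard library, with made-up sample rows and the column names assumed from the description above:

```python
import csv
import io

# Fabricated rows for illustration only; the real file is
# trial_fact_check_post_mapping.csv.
sample = """fact_check_id,post_id,pair_lang
0,0,eng-eng
1,1,spa-eng
2,2,kor-kor
"""

crosslingual = []
with io.StringIO(sample) as f:
    for row in csv.DictReader(f):
        post_lang, fc_lang = row["pair_lang"].split("-")
        if post_lang != fc_lang:  # e.g. spa-eng: SMP in Spanish, FC in English
            crosslingual.append((int(row["fact_check_id"]), int(row["post_id"])))
```

Here only the spa-eng pair ends up in the crosslingual subset; the eng-eng and kor-kor pairs are monolingual.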

2) trial_fact_checks.csv (50 fact-checks)

This file contains all fact-checks. It has four fields:

  • fact_check_id
  • claim – The translated text (see below) of the fact-check claim; the original text is also included.
  • instances – Instances of the fact-check: a list of timestamps and URLs.
  • title – The translated text (see below) of the fact-check title.

3) trial_posts.csv (47 social media posts)

This file contains all social media posts. It has five fields:

  • post_id
  • instances – Instances of the post: a list of timestamps and the social media platforms they appeared on.
  • ocr – A list of translated texts (see below) of the OCR transcripts extracted from the images attached to the post.
  • verdicts – A list of verdicts attached by Meta (e.g., False information).
  • text – The translated text (see below) of the text written by the user.

Loading the Data

Along with the CSV files, we provide a Python script, load.py, to load the data.
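If you prefer not to use load.py, joining the three files with the standard library is straightforward. A hedged sketch, assuming only the column names described above (the exact deserialization of the list- and tuple-valued fields is handled by load.py):

```python
import csv


def load_pairs(mapping_path, fact_checks_path, posts_path):
    """Return a list of (post_row, fact_check_row) dict pairs."""
    with open(fact_checks_path, newline="", encoding="utf-8") as f:
        fcs = {row["fact_check_id"]: row for row in csv.DictReader(f)}
    with open(posts_path, newline="", encoding="utf-8") as f:
        posts = {row["post_id"]: row for row in csv.DictReader(f)}
    pairs = []
    with open(mapping_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            pairs.append((posts[row["post_id"]], fcs[row["fact_check_id"]]))
    return pairs
```

Each returned pair gives direct access to the post fields (text, ocr, …) and the fact-check fields (claim, title, …) for that mapping row.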

What is a translated text?

A tuple consisting of the original text, its translation to English, and the detected languages. For example, the sample below contains an original Croatian text, its English translation, and finally the predicted language composition (hbs = Serbo-Croatian):

( ‘“…bolnice su pune ? tišina, muk…upravo sada, bolnica Rebro..tragično smešno’, ‘“…hospitals are full? silence, silence… right now, Rebro hospital… tragically funny’, [(‘hbs’, 1.0)] )
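If such tuples are stored in the CSV files as Python-literal strings (an assumption; check load.py for the actual deserialization), ast.literal_eval can recover them safely:

```python
import ast

# A raw field value as it might appear in one CSV cell (illustrative).
raw = "('Original text.', 'Translated text.', [('hbs', 1.0)])"

# literal_eval parses Python literals without executing arbitrary code.
original, english, languages = ast.literal_eval(raw)
```

After parsing, `english` holds the English translation and `languages` the list of (language code, proportion) pairs.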

API used for ocr and translation

We use the following services to obtain transcripts and translations:

  • Google Vision API. We use Google Vision API to extract text from images attached to the post. The API also returns a list of languages found in each image with their percentage.
  • Google Translate API. We use the Google Translate API to translate all the texts into English. The API also returns the most probable language.

Organizers

  • Qiwei Peng – University of Copenhagen
  • Michal Gregor – Kempelen Institute of Intelligent Technologies
  • Ivan Srba – Kempelen Institute of Intelligent Technologies
  • Simon Ostermann – German Research Center for Artificial Intelligence
  • Marián Šimko – Kempelen Institute of Intelligent Technologies
  • Jaroslav Kopčan – Kempelen Institute of Intelligent Technologies
  • Juraj Podroužek – Kempelen Institute of Intelligent Technologies
  • Matúš Mesarčík – Kempelen Institute of Intelligent Technologies
  • Róbert Móro – Kempelen Institute of Intelligent Technologies
  • Anders Søgaard – University of Copenhagen

If you have any questions, please contact us at semeval@disai.eu.