SemEval 2025 Shared Task 7:
Multilingual and Crosslingual Fact-Checked Claim Retrieval

About the Task

The SemEval-2025 shared task 7 on Multilingual and Crosslingual Fact-Checked Claim Retrieval addresses the critical challenge of efficiently identifying previously fact-checked claims across multiple languages — a task that can be time-consuming for professional fact-checkers even within a single language and becomes much more difficult to perform manually when the claim and the fact-check may be in different languages. Given the global spread of disinformation narratives, the range of languages that one would need to cover not to miss existing fact-checks is vast.

Participants in the SemEval-2025 shared task 7 will develop systems to retrieve relevant fact-checked claims for given social media posts across multiple languages, addressing the challenge of efficiently identifying previously fact-checked claims in a multilingual and cross-lingual context – supporting fact-checkers and researchers in their efforts to curb the global spread of misinformation.

The Shared Task utilizes a dataset that is derived from and builds upon the original MultiClaim dataset, with modifications and enhancements made specifically for this task. For additional details about the dataset, one may also refer to the dataset paper.

How to participate

Please register through our CodaBench competition:
[LINK TO THE REGISTRATION]

  1. Go to the ‘My Submission’ tab to register for the competition.
  2. After registration, wait for approval and you’ll receive the dataset link for the development phase.
  3. Decide if you want to participate as an individual or as an organization.
  4. To participate as an organization:
    • Click on your icon in the top right corner.
    • Select “Create Organizations”.
    • Enter your desired Team Name as the Organization name in the Registration Form.
  5. Choose your task: monolingual or crosslingual. You may choose to submit to one task only.
  6. When submitting:
    • Choose “Submit As” and select your name or organization from the dropdown list.
    • Specify your submission name in the Fact Sheet (this will appear on the leaderboard).
    • Select the correct task(s) from the dropdown list. By default, both tasks are selected.
  7. Check the leaderboard to see your submissions. If submitting as an organization, your Team Name will appear in the “Participant” column.
  8. For more detailed guidelines, refer to the [GUIDELINES] on how to participate.

Sample Data

[LINK TO THE SAMPLE DATA AT GITHUB]

In this task, you are given social media posts (SMP) and a collection of fact-checks (FC). The goal is to find the most relevant fact-checks for each social media post.

The trial data (sampled from MultiClaim) contains 50 SMP-FC pairs in total: 10 eng-eng pairs and 40 pairs covering two other languages, with 20 examples per language (10 monolingual, e.g., kor-kor, and 10 crosslingual, e.g., kor-eng).

Various retrieval setups are possible. For example, for each SMP, the FC search pool can be limited to the target language, to a different language, or to a mix of languages.

In this task, we aim to evaluate both monolingual and crosslingual setups. Common metrics include mean reciprocal rank (MRR) and success@K.
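As a concrete reference, the two metrics mentioned above can be sketched as follows. This is an illustrative implementation, not the official evaluation script; the function names and input format (one ranked list of FC ids per post, plus the set of relevant FC ids per post) are assumptions.

```python
def mean_reciprocal_rank(ranked_ids, relevant_ids):
    """Average of 1/rank of the first relevant fact-check per post.

    ranked_ids:   list of ranked FC-id lists, one per post
    relevant_ids: list of sets of relevant FC ids, one per post
    """
    total = 0.0
    for ranking, relevant in zip(ranked_ids, relevant_ids):
        for rank, fc_id in enumerate(ranking, start=1):
            if fc_id in relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(ranked_ids)


def success_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of posts with at least one relevant FC in the top k."""
    hits = sum(
        1
        for ranking, relevant in zip(ranked_ids, relevant_ids)
        if any(fc_id in relevant for fc_id in ranking[:k])
    )
    return hits / len(ranked_ids)
```

For instance, if the relevant fact-check for a post is ranked second, that post contributes 0.5 to MRR and counts as a success for any K ≥ 2.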

The sample data consists of three csv files:

1) trial_fact_check_post_mapping.csv (50 pairs)

This file contains the mapping between fact-checks and social media posts. It has three fields:

  • fact_check_id: the id of the fact-check in trial_fact_checks.csv
  • post_id: the id of the post in trial_posts.csv
  • pair_lang: the language information about this mapped pair. For example, spa-eng stands for an SMP in Spanish and an FC in English.
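The pair_lang field makes it easy to split the data into monolingual and crosslingual subsets. A minimal sketch using the standard library, with made-up sample rows and the column names assumed from the description above:

```python
import csv
import io

# Fabricated rows for illustration only; the real file is
# trial_fact_check_post_mapping.csv.
sample = """fact_check_id,post_id,pair_lang
0,0,eng-eng
1,1,spa-eng
2,2,kor-kor
"""

crosslingual = []
with io.StringIO(sample) as f:
    for row in csv.DictReader(f):
        post_lang, fc_lang = row["pair_lang"].split("-")
        if post_lang != fc_lang:  # e.g. spa-eng: SMP in Spanish, FC in English
            crosslingual.append((int(row["fact_check_id"]), int(row["post_id"])))
```

Here only the spa-eng pair ends up in the crosslingual subset; the eng-eng and kor-kor pairs are monolingual.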

2) trial_fact_checks.csv (50 fact-checks)

This file contains all fact-checks. It has four fields:

  • fact_check_id
  • claim – The translated text (see below) of the fact-check claim; the original text is also included.
  • instances – Instances of the fact-check: a list of timestamps and URLs.
  • title – The translated text (see below) of the fact-check title.

3) trial_posts.csv (47 social media posts)

This file contains all social media posts. It has five fields:

  • post_id
  • instances – Instances of the post: a list of timestamps and the social media platforms they appeared on.
  • ocr – A list of translated texts (see below) of the OCR transcripts extracted from the images attached to the post.
  • verdicts – A list of verdicts attached by Meta (e.g., False information).
  • text – The translated text (see below) of the text written by the user.

Loading the Data

Along with the CSV files, we provide a Python script, load.py, to load the data.
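If you prefer not to use load.py, joining the three files with the standard library is straightforward. A hedged sketch, assuming only the column names described above (the exact deserialization of the list- and tuple-valued fields is handled by load.py):

```python
import csv


def load_pairs(mapping_path, fact_checks_path, posts_path):
    """Return a list of (post_row, fact_check_row) dict pairs."""
    with open(fact_checks_path, newline="", encoding="utf-8") as f:
        fcs = {row["fact_check_id"]: row for row in csv.DictReader(f)}
    with open(posts_path, newline="", encoding="utf-8") as f:
        posts = {row["post_id"]: row for row in csv.DictReader(f)}
    pairs = []
    with open(mapping_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            pairs.append((posts[row["post_id"]], fcs[row["fact_check_id"]]))
    return pairs
```

Each returned pair gives direct access to the post fields (text, ocr, …) and the fact-check fields (claim, title, …) for that mapping row.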

What is a translated text?

A tuple consisting of the original text, its translation to English, and the detected languages. For example, the sample below contains an original Croatian text, its English translation, and finally the predicted language composition (hbs = Serbo-Croatian):

( ‘“…bolnice su pune ? tišina, muk…upravo sada, bolnica Rebro..tragično smešno’, ‘“…hospitals are full? silence, silence… right now, Rebro hospital… tragically funny’, [(‘hbs’, 1.0)] )
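If such tuples are stored in the CSV files as Python-literal strings (an assumption; check load.py for the actual deserialization), ast.literal_eval can recover them safely:

```python
import ast

# A raw field value as it might appear in one CSV cell (illustrative).
raw = "('Original text.', 'Translated text.', [('hbs', 1.0)])"

# literal_eval parses Python literals without executing arbitrary code.
original, english, languages = ast.literal_eval(raw)
```

After parsing, `english` holds the English translation and `languages` the list of (language code, proportion) pairs.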

API used for ocr and translation

We use the following services to obtain transcripts and translations:

  • Google Vision API. We use Google Vision API to extract text from images attached to the post. The API also returns a list of languages found in each image with their percentage.
  • Google Translate API. We use the Google Translate API to translate all the texts into English. The API also returns the most probable language.

Organizers

  • Qiwei Peng – University of Copenhagen
  • Michal Gregor – Kempelen Institute of Intelligent Technologies
  • Ivan Srba – Kempelen Institute of Intelligent Technologies
  • Simon Ostermann – German Research Center for Artificial Intelligence
  • Marián Šimko – Kempelen Institute of Intelligent Technologies
  • Jaroslav Kopčan – Kempelen Institute of Intelligent Technologies
  • Juraj Podroužek – Kempelen Institute of Intelligent Technologies
  • Matúš Mesarčík – Kempelen Institute of Intelligent Technologies
  • Róbert Móro – Kempelen Institute of Intelligent Technologies
  • Anders Søgaard – University of Copenhagen

If you have any questions, please contact us at semeval@disai.eu.