Data

Touché24-Image-Retrieval-and-Generation-for-Arguments
About 9000 images crawled for controversial topics. doi events publications
Touché24-ValueEval
About 3000 texts in 9 languages annotated for 19 human values (Schwartz' system) and value attainment as part of the ValuesML project. doi events publications
Webis-Follow-Up-Questions-24
About 19,000 simulated continuations of conversational search sessions with 3883 human judgments. doi publications

Touché23-Image-Retrieval-for-Arguments
About 56,000 images crawled for controversial topics. doi events publications
Webis-Topic-Ontologies
About 9,000,000 argument units from 45 corpora with topic labels. doi publications
Touché23-ValueEval
About 8900 arguments annotated for 54 human value aspects (based on Schwartz values), extending the Webis-ArgValues-22. doi events publications
Webis-Generated-Game-Art-23
Report and 110 images generated by an expert when designing a game. doi publications
Webis-Nudged-Questions-23
About 8600 questions in response to 30 spoken information snippets in 18 variants with different nudging. doi publications

Webis-ArgValues-22
About 5300 arguments annotated for 54 human value aspects (based on Schwartz values). doi publications
Webis-Web-Archive-Quality-22
Reproduced web pages from the Webis-Web-Archive-17. doi publications
Touché22-Image-Retrieval-for-Arguments
About 24,000 images crawled for controversial topics. doi events publications

SCAI-QReCC-21
About 14,000 conversational search sessions. doi events publications
Webis-Exhibition-Questions-21
About 850 English and German questions on a virtual exhibition from 63 participants. doi publications
Webis-ArgImages-21
About 1000 images judged for relevance to 20 controversial topics. doi publications
Webis-Conversational-Query-Reformulations-21
About 2700 conversational queries and reformulations from 4 search domains. doi publications
Webis-SCSmeta-21
Meta-information annotations for the 1044 dialog turns in the Spoken Conversational Search dataset. doi publications
Webis-WebSeg-20-Algorithm-Segmentations
About 250,000 algorithmic segmentations for the web pages in the Webis-WebSeg-20 (itself based on the Webis-Web-Archive-17). doi publications

Webis-EditorialSum-20
About 1300 curated extractive summaries for 266 editorials from 3 news portals. doi publications
TweetsCOV19
About 41,000,000 tweets from October 2019 to August 2022 on COVID-19. publications
Webis-WebSeg-20
About 42,000 human segmentations of the web pages in the Webis-Web-Archive-17. doi publications
Webis-Voice-based-and-Conversational-Argument-Search-20
Responses from about 500 participants on their expectations for conversational argument search. doi publications

args.me corpus
About 390,000 arguments from online debate portals. doi events publications
Webis-Web-Errors-19
Annotations for the 10,000 web pages of the Webis-Web-Archive-17, marking pages that are mostly advertisement, cut off, or still loading, or that show pornographic content, pop-ups, CAPTCHAs, or error messages. doi publications

PAN-SemEval-Hyperpartisan-News-Detection-19
About 750,000 news articles, half of them from hyperpartisan news publishers, and 645 articles with manual hyperpartisanship annotations. doi events publications
BuzzFeed-Webis Fake News Corpus 16
About 1600 news articles of 9 publishers from September 2016 fact-checked and categorized by media bias. doi publications

Webis-Web-Archive-17
About 10,000 web page archives from mid-2017 including screenshots and annotations of web archive quality. doi publications
Webis-Mnemonics-17
About 1000 human-selected sentences for password generation and memorization. doi publications
Webis-Simple-Sentences-17
About 470,000,000 sentences from the ClueWeb12 crawl, randomly sampled to mirror the sentence complexity of password mnemonics. doi publications

An Open Testbed for Author Name Disambiguation Evaluation
About 32,000 publications from about 3000 different authors with 1000 different names. doi publications
Webis-Editorials-16
About 14,000 argumentative units from 300 editorials annotated for argumentative role. doi publications