Standalone Neural Ranking Model (SNRM)

SNRM is a neural framework for end-to-end document retrieval. It learns sparse representations for both queries and documents and builds an inverted index from the learned representations. At query time, it retrieves documents directly from the whole collection using the learned inverted index, instead of re-ranking a candidate set produced by a separate first-stage retriever. An open-source implementation of SNRM is available here.
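The retrieval step can be illustrated with a minimal sketch. It assumes that each query and document has already been encoded as a sparse vector, stored as a dict from latent dimension to non-negative weight. It also assumes, as a simplification, that the matching score is a dot product. The hand-written vectors and the names `build_inverted_index` and `retrieve` are hypothetical stand-ins for the learned encoder and the released code.

```python
from collections import defaultdict


def build_inverted_index(doc_reprs):
    """Map each latent dimension to a posting list of (doc_id, weight).

    Only nonzero weights are indexed, so sparse representations give short
    posting lists, just like term-based inverted indexes.
    """
    index = defaultdict(list)
    for doc_id, vec in doc_reprs.items():
        for dim, weight in vec.items():
            if weight > 0:
                index[dim].append((doc_id, weight))
    return index


def retrieve(query_repr, index, k=10):
    """Score documents by dot product with the query's sparse vector.

    Only posting lists for the query's nonzero dimensions are visited, so
    documents sharing no active dimension with the query are never scored.
    """
    scores = defaultdict(float)
    for dim, q_weight in query_repr.items():
        if q_weight <= 0:
            continue
        for doc_id, d_weight in index.get(dim, []):
            scores[doc_id] += q_weight * d_weight
    # Highest score first, truncated to the top k documents.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


# Hypothetical sparse representations standing in for encoder outputs.
docs = {"d1": {3: 0.8, 17: 0.2}, "d2": {17: 0.5, 42: 1.1}, "d3": {5: 0.4}}
query = {17: 1.0, 42: 0.5}
print(retrieve(query, build_inverted_index(docs)))
```

In this toy example, `d3` is never touched because it shares no active dimension with the query. That is the efficiency argument for learning sparse rather than dense representations.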

  1. From Neural Re-Ranking to Neural Ranking: Learning a Sparse Representation for Inverted Indexing. Hamed Zamani, Mostafa Dehghani, W. Bruce Croft, Erik Learned-Miller, and Jaap Kamps. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, 2018 (CIKM ’18). [Preprint] [Code]

ISTAS: An In Situ Dataset for Target Apps Selection

ISTAS is an in situ dataset for target apps selection as part of a unified mobile search system (see also the UniMobile dataset below). Compared with UniMobile, ISTAS contains more realistic queries together with contextual information captured from mobile sensors and from logs of background processes. ISTAS includes over 5000 queries. This dataset is the result of a joint effort by researchers from the Università della Svizzera italiana (USI), Lugano, Switzerland and the University of Massachusetts, Amherst, MA, USA. To download the ISTAS dataset, please visit here. Citation:

  1. In Situ and Context-Aware Target Apps Selection for Unified Mobile Search. Mohammad Aliannejadi, Hamed Zamani, Fabio Crestani, and W. Bruce Croft. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, 2018 (CIKM ’18). [Preprint]

UniMobile: A Collection of Cross-App Mobile Search Queries

As a first step towards a unified search framework for mobile devices, the task of Target Apps Selection was defined. To train and evaluate models for this task, a dataset of over 5000 queries was built through crowdsourcing. This dataset is the result of a joint effort by researchers from the Università della Svizzera italiana (USI), Lugano, Switzerland and the University of Massachusetts, Amherst, MA, USA. To download the UniMobile dataset, please visit here. Citation:

  1. Target Apps Selection: Towards a Unified Search Framework for Mobile Devices. Mohammad Aliannejadi, Hamed Zamani, Fabio Crestani, and W. Bruce Croft. In Proceedings of the 41st International ACM SIGIR Conference on Research and Development in Information Retrieval, 2018 (SIGIR ’18). [Preprint] [Data]

Citation Worthiness Dataset

Does this sentence need a citation? To train and evaluate models that address this question, we construct a citation worthiness dataset from the articles of the ACL Anthology Reference Corpus (ACL ARC). We use the SEPIC corpus, which provides sentence-level segmentation of 10,921 articles from ACL ARC 1.0, covering papers up to February 2007. The sentence splitter and chunker of Apache OpenNLP 1.5.3, the Stanford tokenizer and POS tagger, and the MaltParser were used. More information is provided in the following paper, and the data can be downloaded from here.

  1. Citation Worthiness of Sentences in Scientific Reports. Hamed Bonab, Hamed Zamani, Erik Learned-Miller, and James Allan. In Proceedings of the 41st International ACM SIGIR Conference on Research and Development in Information Retrieval, 2018 (SIGIR ’18). [Preprint] [Data]

Million Playlist Dataset

The ACM Recommender Systems Challenge 2018 focuses on a novel task in recommender systems and information retrieval: Automatic Playlist Continuation. RecSys Challenge 2018 is organized by Spotify, the University of Massachusetts Amherst, and Johannes Kepler University Linz. For this challenge, Spotify has released a dataset of one million playlists created by Spotify users. Please visit http://www.recsyschallenge.com/2018/ for more information. Citation:

  1. RecSys Challenge 2018: Automatic Music Playlist Continuation. Ching-Wei Chen, Paul Lamere, Markus Schedl, and Hamed Zamani. In Proceedings of the 12th ACM Conference on Recommender Systems, 2018 (RecSys ’18).
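To make the continuation task concrete, here is a minimal sketch of a popularity baseline. Given the tracks already in a playlist, it recommends the most frequent tracks in the collection that the playlist does not yet contain. This is a common simple baseline, not the challenge's official method or a participant's system. Tracks are represented as plain string IDs; the real dataset's file format is not described here. The function name `popularity_continuation` is hypothetical.

```python
from collections import Counter


def popularity_continuation(playlists, seed_tracks, n=5):
    """Recommend the n most frequent collection tracks not already in the seed.

    playlists: iterable of playlists, each a list of track IDs (hypothetical format).
    seed_tracks: tracks already present in the playlist to be continued.
    """
    counts = Counter(track for playlist in playlists for track in playlist)
    seed = set(seed_tracks)
    # most_common() orders by count; ties keep first-seen order.
    ranked = [track for track, _ in counts.most_common() if track not in seed]
    return ranked[:n]


# Toy collection of four playlists.
playlists = [["a", "b", "c"], ["a", "c", "d"], ["a", "e"], ["c", "d"]]
print(popularity_continuation(playlists, seed_tracks=["a", "b"], n=2))
```

A baseline like this ignores the seed tracks except for filtering them out. Stronger approaches would exploit co-occurrence between the seed and candidate tracks.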

Tweet Rating Dataset

This dataset contains users' tweets about items from four popular and diverse web applications: IMDb (movies), YouTube (video clips), Pandora (music), and Goodreads (books). It contains ~500K tweets from ~20K users about ~230K items (movies, music, etc.). The dataset is freely available for research purposes and can be downloaded from here. Citation:

  1. Adaptive User Engagement Evaluation via Multi-task Learning. Hamed Zamani, Pooya Moradi, and Azadeh Shakery. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2015 (SIGIR ’15).

PAL: Preference/Ranking Aggregation Library

This Ruby library contains a few simple rank aggregation methods that we used in the ACM RecSys Challenge 2014. The package is open source and can be found here. Citation:

  1. Regression and Learning to Rank Aggregation for User Engagement Evaluation. Hamed Zamani, Azadeh Shakery, and Pooya Moradi. In Proceedings of the 2014 Recommender Systems Challenge, 2014 (RecSysChallenge ’14).
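As an illustration of what a simple rank aggregation method does, here is a minimal Python sketch of Borda count, a classic positional method. It is shown only as a representative technique and is not taken from PAL. PAL's actual Ruby API and its set of methods are not described here, and `borda_aggregate` is a hypothetical name.

```python
from collections import defaultdict


def borda_aggregate(rankings):
    """Combine several ranked lists with Borda count.

    An item at position i (0-based) in a list of length n receives n - i points;
    items absent from a list receive 0 points from that list. Items are then
    sorted by total points, with ties broken by item ID for determinism.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, item in enumerate(ranking):
            scores[item] += n - position
    return sorted(scores, key=lambda item: (-scores[item], item))


# Three rankers disagree on the order of x, y, z.
rankings = [["x", "y", "z"], ["y", "x", "z"], ["y", "z", "x"]]
print(borda_aggregate(rankings))
```

Here `y` collects 8 points, `x` 6, and `z` 4, so the consensus ranking puts `y` first even though one ranker preferred `x`.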

Wikipedia English-Persian Parallel Corpus

This parallel corpus is automatically extracted from English and Persian Wikipedia articles. We evaluate the corpus extensively and show that its quality is high compared with existing English-Persian parallel corpora. The dataset is freely available for research purposes. To download the parallel corpus, please visit here. Citation:

  1. Sentence Alignment Using Local and Global Information. Hamed Zamani, Heshaam Faili, and Azadeh Shakery. Computer Speech & Language, 2016 (CSL).