Description
Given an implicit reference to a scientific paper, i.e., a social media post (tweet) that mentions a research publication without providing a URL, this method retrieves the mentioned paper from a pool of candidate papers. It was initially developed for CORD19, a corpus of academic papers about COVID-19 and related coronavirus research. However, it can be used with any corpus of publications that has sufficient metadata (e.g., titles and abstracts).
The method takes an input claim or sentence from the user, computes its BM25 similarity to the title and abstract of each publication in the corpus, and returns a ranked list of matching publications.
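The ranking step can be illustrated with a minimal Okapi BM25 sketch. It assumes that each publication is represented by its title and abstract concatenated, that a simple lowercase regex tokenizer is used, and that the common defaults k1 = 1.5 and b = 0.75 apply. The names `tokenize`, `BM25`, `rank_papers`, the dictionary keys, and the toy corpus are all hypothetical; the method's notebook may use different preprocessing or a library implementation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed preprocessing: lowercase, keep alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25:
    """Okapi BM25 over a list of tokenized documents."""

    def __init__(self, corpus_tokens, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.doc_tf = [Counter(doc) for doc in corpus_tokens]
        self.doc_len = [len(doc) for doc in corpus_tokens]
        self.avgdl = sum(self.doc_len) / len(corpus_tokens)
        n = len(corpus_tokens)
        # Document frequency: number of documents containing each term.
        df = Counter(term for doc in corpus_tokens for term in set(doc))
        # Non-negative IDF variant: log(1 + (N - df + 0.5) / (df + 0.5)).
        self.idf = {t: math.log(1 + (n - d + 0.5) / (d + 0.5)) for t, d in df.items()}

    def score(self, query_tokens, i):
        """BM25 score of document i for the given query tokens."""
        tf, dl = self.doc_tf[i], self.doc_len[i]
        total = 0.0
        for term in query_tokens:
            f = tf.get(term, 0)
            if f == 0:
                continue
            # Length normalization relative to the average document length.
            norm = self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / (f + norm)
        return total


def rank_papers(claim, papers, k=5):
    """Return the cord_uids of the k papers scoring highest against the claim."""
    docs = [tokenize(p["title"] + " " + p["abstract"]) for p in papers]
    bm25 = BM25(docs)
    query = tokenize(claim)
    scores = [bm25.score(query, i) for i in range(len(papers))]
    order = sorted(range(len(papers)), key=lambda i: scores[i], reverse=True)
    return [papers[i]["cord_uid"] for i in order[:k]]


# Toy corpus with hypothetical identifiers, titles, and abstracts.
papers = [
    {"cord_uid": "a1", "title": "Ivermectin inhibits SARS-CoV-2 in vitro",
     "abstract": "A single dose of ivermectin reduced viral RNA in cell culture."},
    {"cord_uid": "b2", "title": "Vaccine effectiveness against the Delta variant",
     "abstract": "Two doses of vaccine were effective against B.1.617.2."},
    {"cord_uid": "c3", "title": "Mask wearing and transmission",
     "abstract": "Community masking reduces transmission."},
]
top = rank_papers(
    "a single dose of Ivermectin could stop the coronavirus growing in cell culture",
    papers, k=2)
print(top)  # ['a1', 'b2']
```

This sketch scores every document for every query, which is linear in the corpus size; for a corpus the size of CORD19, an inverted index or an existing BM25 library would be the more practical choice.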
Use Cases
- To find which publication is possibly mentioned in a claim/statement.
- To find publications that are topically similar to a claim/statement.
Input Data
The input data consists of social media posts having the following fields:
- post_id : unique post ID in the collection
- tweet_text : text of the post (tweet)
Example Input:
- post_id: 12345678901
- tweet_text: Published in the journal Antiviral Research, the study from Monash University showed that a single dose of Ivermectin could stop the coronavirus growing in cell culture, effectively eradicating all genetic material of the virus within two days.
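The input posts could be loaded as sketched below. The file layout is an assumption: the source only names the fields, so a tab-separated file with a `post_id` / `tweet_text` header row is assumed here, and `read_posts` and `SAMPLE_TSV` are hypothetical names.

```python
import csv
import io

# Hypothetical sample in an assumed tab-separated layout with a header row.
SAMPLE_TSV = (
    "post_id\ttweet_text\n"
    "12345678901\tPublished in the journal Antiviral Research, the study from Monash "
    "University showed that a single dose of Ivermectin could stop the coronavirus "
    "growing in cell culture.\n"
)


def read_posts(handle):
    """Yield (post_id, tweet_text) pairs from a TSV file-like object."""
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        # csv yields strings; post_id is left as a string rather than
        # converted to a number, so long IDs stay exact.
        yield row["post_id"], row["tweet_text"].strip()


posts = list(read_posts(io.StringIO(SAMPLE_TSV)))
print(posts[0][0])  # 12345678901
```

Using a tab delimiter keeps the commas inside tweet texts from being treated as field separators.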
Output Data
This section shows an example of a publication matched to the input above. Each output record has the following fields:
- post_id: unique post ID in the collection
- tweet_text: text of the post (tweet)
- cord_uid: identifier of the matching publication
- bm25_topk: top-k matching publications based on the BM25 similarity score
- in_topx: float value indicating the rank of the matching publication in the top-k list
Example Output:
- post_id: 12345678901
- tweet_text: Published in the journal Antiviral Research, the study from Monash University showed that a single dose of Ivermectin could stop the coronavirus growing in cell culture, effectively eradicating all genetic material of the virus within two days.
- cord_uid: htlvpvz5 (Effectiveness of Covid-19 Vaccines against the B.1.617.2 (Delta) Variant)
- bm25_topk: ['htlvpvz5', 'h7hj64q5', 'rwgqkow3', 'dbgtslc8', 'am11yqbf']
- in_topx: 1.0
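An output record like the one above could be assembled as follows. This is a sketch under two assumptions not confirmed by the source: that cord_uid is the known matching publication for the post, and that in_topx is the reciprocal rank (1 / rank) of that publication within bm25_topk, or 0.0 if it is absent. The source only says in_topx is a float tied to the rank; this reading agrees with the example value of 1.0 for rank 1. `build_output` is a hypothetical name.

```python
def build_output(post_id, tweet_text, cord_uid, ranked_uids, k=5):
    """Assemble one output record with the fields listed above.

    Assumption: in_topx is the reciprocal rank (1 / rank) of cord_uid
    within the top-k list, or 0.0 when it does not appear there.
    """
    topk = ranked_uids[:k]
    in_topx = 1.0 / (topk.index(cord_uid) + 1) if cord_uid in topk else 0.0
    return {
        "post_id": post_id,
        "tweet_text": tweet_text,
        "cord_uid": cord_uid,
        "bm25_topk": topk,
        "in_topx": in_topx,
    }


ranked = ["htlvpvz5", "h7hj64q5", "rwgqkow3", "dbgtslc8", "am11yqbf"]
record = build_output("12345678901", "Published in the journal Antiviral Research",
                      "htlvpvz5", ranked)
print(record["in_topx"])  # 1.0
```

Under this reading, averaging in_topx over all posts gives the mean reciprocal rank at k.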
Hardware Requirements
The method runs on a small virtual machine from a cloud computing provider (2 x86 CPU cores, 4 GB RAM, 40 GB HDD).
Environment Setup
The method is implemented in Python. The required libraries are listed in requirements.txt and can be installed via pip:
pip install -r requirements.txt
How to Use
Please follow the instructions in the notebook.
Technical Details
Published in the journal Antiviral Research, the study from Monash University showed that a single dose of Ivermectin could stop the coronavirus growing in cell culture – effectively eradicating all genetic material of the virus within two days.
Peer-reviewed in the New England Journal of Medicine regarding Delta (B.1.617.2):
- Pfizer is ~90% effective
- AstraZeneca is ~70% effective.
This falls in line with vaccine efficacy of other variants. Yes, the vaccines ARE indeed effective against Delta.
Contact Details
For questions or feedback, contact Yavuz Selim Kartal via YavuzSelim.Kartal@gesis.org.