PSB 2016 Social Media Mining Shared Task Workshop

Introduction

  This workshop is a platform for teams to apply their best NLP techniques to
  social media data, specifically to detecting and extracting mentions of
  adverse drug reactions. The workshop complements the Social Media Mining for
  Public Health Monitoring and Surveillance session. Teams or individuals can
  participate in one or more of the proposed tasks, each posing distinct
  challenges.

Problem Background

  Adverse drug reactions (ADRs), defined
  as accidental injuries resulting from correct medical drug use, present a
  serious and costly health problem contributing to 5.3% of all hospital
  admissions each year [1]. The process of detection, assessment,
  understanding, and prevention of these events is called pharmacovigilance
  [2]. To facilitate pharmacovigilance efforts,
  governments worldwide have diverse surveillance programs. One example, in the
  U.S., is MedWatch [3]; it enables both patients and
  providers to manually submit ADR information. However, these programs are
  chronically underutilized. A systematic review encompassing 12 countries
  estimated an 85-94% under-reporting rate [3] of ADRs in local, regional, and
  national level reporting systems. To improve detection rates, researchers
  have begun turning to alternative sources of healthcare data, such as social
  media. Recent studies suggest that 26% of adult internet users discussed
  personal health issues online, with 42% of them discussing current conditions
  on social media and 30% reportedly changing their behavior as a result [4,
  5]. Recent work has focused on automatic classification of ADR-assertive
  user posts [6, 7, 8, 9] and the automatic extraction of ADR mentions from
  posts [10, 11, 12, 13]. However, prior to our recent pilot studies [8, 12],
  public availability of data was scarce, and a direct comparison of the
  approaches was not possible. The release of a gold standard and the proposed
  task should therefore foster advances on this topic.

  The task is divided into three subtasks: (i) automatic classification of
  ADR-assertive user posts, (ii) automatic extraction of ADR mentions from user
  posts, and (iii) normalization of ADR mentions into UMLS (Unified Medical
  Language System) concept IDs. The task will take advantage of a large
  expert-annotated data set from Twitter that has already been made
  publicly available. The task is designed to capitalize on the interest in
  social media mining and appeal to a diverse set of researchers working on
  distinct topics such as natural language processing, biomedical informatics,
  and machine learning. The task presents a number of interesting challenges
  including the noisy nature of the data, the informal language of the user
  posts, misspellings, and data imbalance.

Tasks

Task 1: Binary Classification of ADRs

  The first proposed sub-task focuses on automatic classification of
  ADR-assertive user posts.
  This task will utilize the binary annotations in the data. Participants will
  be provided with a training/development set, containing the annotations.
  Evaluation will be performed on a blind set not released prior to the
  evaluation deadline. Systems will be evaluated on their ability to
  automatically classify ADR-containing posts.

Data

  The training data consists of 7,574 instances (~70% of the original corpus)
  containing binary annotations. The evaluation set consists of 3,284 instances
  with an ADR to nonADR ratio similar to that of the training set. For each
  tweet, the publicly available data set contains: (i) the user ID, (ii) the
  tweet ID, and (iii) the binary annotation indicating the presence or absence
  of ADRs, as shown below. The evaluation data will contain the same
  information, but without the classes. Participating teams should submit their
  results in the same format as the training set (shown below).

  User ID              Tweet ID      Class
  349294537367236611   149749939     0
  354256195432882177   54516759      0
  352456944537178112   1267743056    1

  Details about the download script and the data are available at: task 1 data

Task 2: ADR Extraction

  This sub-task is a
  Named Entity Recognition (NER) task, and the aim is to automatically extract
  the ADR mentions reported in user posts. This includes identifying the text
  span of the reported ADRs. Participants may use advanced machine learning
  systems to extract the mentions and correctly distinguish ADRs from similar
  non-ADR mentions.

Data

  The data for this sub-task includes 2000+ tweets that are fully annotated for
  mentions of ADRs and indications (reasons to use the drug). This set contains
  a subset of the tweets from sub-task 1 that were tagged as hasADR, plus a
  random set of 800 nonADR tweets. The nonADR subset was annotated for mentions
  of indications so that participants can develop techniques to deal with this
  confusion class. The annotations are stored in a text file that contains the
  following details for each annotation: tweet ID, start offset, end offset,
  semantic type (ADR/Indication), UMLS ID, annotated text span, and the related
  drug.

  Participating teams must submit their results on the test set in the same
  format as the training set. The data is available at: task 2 data

Task 3: Normalization of ADR Mentions

  This is a concept
  normalization task. Given an ADR mention in natural language (colloquial or
  other), participant systems are required to identify the UMLS concept ID for
  the mention.

Data

  Training data will consist of a set of ADR mentions and their corresponding
  human-assigned UMLS concept unique identifiers (CUIs), as shown below.
  Submissions should follow an identical format.

  Schizophrenia           c0036341
  tension in my nerves    c0027769
  shaking                 c0040822

  Systems will be evaluated based on the closeness of their predictions to the
  gold standard. A system prediction will be considered correct if the
  predicted CUI is identical to, is a synonym of, or has an is-a relationship
  to the gold standard concept. The data for this task can be found at:
  task 3 data

Evaluations

  Specific evaluation details for each task will be posted here soon.

Registration

  To register, send an
  email to Abeed Sarker (abeed.sarker@asu.edu)
  with the following information:

  · Name of your team;
  · Names of team members and their affiliations.

  We will send you a confirmation message once the registration is completed.

Timeline

  May 15, 2015: release of training data
  August 15, 2015: release of evaluation data
  August 20, 2015: deadline for submissions
  September 1, 2015: release of results and ranks
  October 1, 2015: system descriptions due

Task Organizers

  Dr. Graciela Gonzalez (ggonzal@asu.edu), Arizona State University
  Dr. Abeed Sarker (abeed.sarker@asu.edu), Arizona State University
  Azadeh Nikfarjam (anikfarj@asu.edu), Arizona State University

  Queries to:

  Please upload your file using the following link:

  Name your files as: TeamName_AssignedTeamNumber_TaskNumber
  Example: DiegoLab_21_1

References

  [1] C. Kongkaew, P. R. Noyce, and D. M. Ashcroft, Hospital
  admissions associated with adverse drug reactions: a systematic review of
  prospective observational studies, Ann. Pharmacother.,
  vol. 42, no. 7, pp. 1017-1025, 2008.

  [2] World Health Organization, The importance of pharmacovigilance, World
  Health Organization, 2002.

  [3] Office of the Commissioner, MedWatch: The FDA Safety Information and
  Adverse Event Reporting Program. [Online]. Available:
  http://www.fda.gov/Safety/MedWatch/default.htm. [Accessed: 28-Sep-2014].

  [4] J. Parker, Y. Wei, A. Yates, O. Frieder, and N. Goharian, A framework for
  detecting public health trends with Twitter, in: Proceedings of the 2013
  IEEE/ACM International Conference on Advances in Social Networks Analysis and
  Mining, 2013, pp. 556-563.

  [5] Twenty six percent of online adults discuss health information online;
  privacy cited as the biggest barrier to entry | Business Wire. [Online].
  Available:
  http://www.businesswire.com/news/home/20121120005872/en/Twenty-percent-online-adultsdiscuss-healthinformation#.UvQ4M4WmWGQ.
  [Accessed: 07-Feb-2014].

  [6] K. Jiang, Y. Zheng, Mining Twitter Data for Potential Drug Effects,
  Advanced Data Mining and Applications 8346 (2013) 434-443.

  [7] J. Bian, U. Topaloglu, F. Yu, Towards large-scale Twitter mining for
  drug-related adverse events, in: Proceedings of the 2012 international
  workshop on Smart health and wellbeing, 2012, pp. 25-32.

  [8] R. Ginn, P. Pimpalkhute, A. Nikfarjam, A. Patki, K. O'Connor, A. Sarker,
  K. Smith, G. Gonzalez, Mining Twitter for Adverse Drug Reaction Mentions: A
  Corpus and Classification Benchmark, in: Proceedings of the Fourth Workshop
  on Building and Evaluating Resources for Health and Biomedical Text
  Processing, 2014.

  [9] A. Patki, A. Sarker, P. Pimpalkhute, A. Nikfarjam, R. Ginn, K. O'Connor,
  K. Smith, G. Gonzalez, Mining Adverse Drug Reaction Signals from Social
  Media: Going Beyond Extraction, in: Proceedings of BioLinkSig 2014, 2014.

  [10] R. Leaman, L. Wojtulewicz, R. Sullivan, A. Skariah, J. Yang, G.
  Gonzalez, Towards Internet-Age Pharmacovigilance: Extracting Adverse Drug
  Reactions from User Posts to Health-Related Social Networks, in: Proceedings
  of the 2010 Workshop on Biomedical Natural Language Processing, 2010, pp.
  117-125.

  [11] A. Nikfarjam, G. Gonzalez, Pattern Mining for Extraction of Mentions of
  Adverse Drug Reactions from User Comments, in: Proceedings of the American
  Medical Informatics Association (AMIA) Annual Symposium, 2011, pp. 1019-1026.

  [12] K. O'Connor, A. Nikfarjam, R. Ginn, P. Pimpalkhute, A. Sarker, K. Smith,
  and G. Gonzalez, Pharmacovigilance on Twitter? Mining Tweets for Adverse Drug
  Reactions, in: American Medical Informatics Association (AMIA) Annual
  Symposium, 2014.

  [13] A. Yates, N. Goharian, ADRTrace: detecting expected and unexpected
  adverse drug reactions from user reviews on social media sites, in:
  Proceedings of the 35th European conference on Advances in Information
  Retrieval, 2013, pp. 816-819.

  DIEGO LAB 2015. Email: Competition Organisers.