Behrouz Bokharaeian

NegDrugDDI.zip (1.04 MB)
NegDrugBank2013.zip (1.15 MB)
NegDDI_DrugBank2013.zip (1.15 MB)
NegDDI_MedLine.zip (213.22 KB)

Behrouz Bokharaeian is a PhD candidate in the Department of Software Engineering and Artificial Intelligence in the Computer Science Faculty of the Universidad Complutense de Madrid. He received his bachelor's degree in software engineering from Sharif University of Technology, Tehran, Iran. He also holds two master's degrees, in software engineering and biomedical informatics, from the Polytechnic of Tehran, Iran.
Behrouz's main research interests lie in machine learning, natural language processing and bioinformatics.

His latest CV in PDF format can be downloaded from here.

Extracting Drug-Drug interactions using negation

A drug-drug interaction (DDI) occurs when one drug affects the level or activity of another drug, for instance by changing its concentration. Such an interaction can decrease the drug's effectiveness or alter its side effects, and may even cause health problems for patients.

There are a great many DDI databases, which makes it difficult for health care experts to keep up to date with everything published on drug-drug interactions. Tools for automatically extracting DDIs from biomedical resources are therefore essential for improving and updating drug knowledge and databases.

The DrugDDI corpus was developed for the Workshop on Drug-Drug Interaction Extraction that took place in 2011 in Huelva, Spain. The DrugDDI corpus contains 579 documents extracted from the DrugBank database.

To study the effectiveness of negation for this task, the annotation of the DrugDDI corpus has been extended with the scope of negation. This corpus can be downloaded from the attachment.


• Bokharaeian, B.; Diaz, A; Chitsaz, H.R., ‘Enhancing Extraction of Drug-Drug Interaction from Literature Using Neutral Candidates, Negation, and Clause Dependency’, PLOS One 11(10): e0163480, 2016.

• Bokharaeian, B.; Diaz, A., ‘Extraction of Drug-Drug Interaction from Literature through Detecting Linguistic-based Negation and Clause Dependency’, Journal of AI and Data Mining, Volume 4, Issue 2, Summer and Autumn 2016, pages 203-212.

• Bokharaeian, B.; Diaz, A., ‘Automatic extraction of SNP-trait associations from text through detecting linguistic-based negation’, 4th Iranian Joint Congress on Fuzzy and Intelligent Systems (CFIS), IEEE, Zahedan, Iran, 2015.

• Bokharaeian, B.; Diaz, A., ‘Automatic extraction of drug-drug interaction from literature through detecting clause dependency and linguistic-based negation’, First Signal Processing and Intelligent Systems Conference (SPIS), Tehran, Iran, 2015, pages 25-30.

• Bokharaeian, B.; Diaz, A.; Neves, M.; Francisco, V., ‘Exploring Negation Annotations in the DrugDDI Corpus’, Fourth Workshop on Building and Evaluating Resources for Health and Biomedical Text Processing, May 2014.

• A. Kadir, R.; Bokharaeian, B., ‘Overview of Biomedical Relations Extraction using Hybrid Rule-based Approaches’, Proceedings of the 2013 4th International Conference on Future Information Technology (ICFIT 2013), Melaka, Malaysia, October 2013.

• Bokharaeian, B.; Diaz, A., ‘NIL_UCM: Extracting Drug-Drug interactions from text through combination of sequence and tree kernels’, Proceedings of the 7th International Workshop on Semantic Evaluation (SemEval 2013), June 2013.

• Bokharaeian, B.; Diaz, A.; Ballesteros, M., ‘Extracting Drug-Drug interaction from text using negation features’, 29th Conference of the Spanish Society for Natural Language Processing, September 2013.

• Safa, S.; Bokharaeian, B., ‘Brain MR Segmentation through Fuzzy Expectation Maximization and Histogram Based K-Means’, Proceedings of 4th IEEE conference on computer science and information technology ICCTSIT 2011, Vol. 3, Chengdu, China, June 2011, pp. 603-608.

• Safa, S., Bokharaeian, B., ‘New methods in Brain MR Segmentation with Fuzzy EM Algorithm’, International Journal of Information and Education Technology, Vol. 1, No. 4, October 2011, pp. 280-285.

• Bokharaeian, B., Shafiee, H., Alai, H., ‘Enhancing Agents’ Decision Making Process in Multi agent Models of stock Market Using Fuzzy Open CLA’, Proceedings of IEEE 2011 International Joint Conference on Economics Business and Marketing Management and Financial Theory and Engineering (EBMM 2011), Shanghai, China, March 2011, pp. 607-611.

• Shafiee, H.M., Bokharaeian, B. and Alaei, H. (2011), ‘How to calibrate the features of a distributed artificial financial market’, Journal of International Business and Entrepreneurship Development, Vol. 5, No. 4, pp.315–338.

• Bokharaeian, B., Shafiee, H., ‘Extending the Capability of Neural Network Based Models of Stock Market Using Open Cellular Learning Automata’, Third Annual GSM-FEP conference on economics and management (UPM), Kuala Lumpur, December 2010.

• Bokharaeian, B., Towhidkhah, F., ‘Effectiveness of computer-based working memory training for children with mild mental retardation’, Computers in Biology and Medicine, submitted.

• Sharifinejad, M., Bokharaeian, B., ‘Text Mining for Automated GIS HealthCare Data Collection: A case study in national information management system for science and Technology in health sector’, First International conference on geographic information systems, Tehran, Iran, Nov 2009.

• Bokharaeian, B., Janaseir, M., 'Therapeutic capabilities of video games in improving the cognitive skills of children', Second Iranian conference on electronic kid, Tehran Iran, Dec 2009.

• Bokharaeian, B., Mazhari, M., Zehtab, H., ‘Tailoring of RUP methodology based on chaos theory for health care organization: a case study’, Third congress on health management, November 2009.

• Bokharaeian, B., Zehtab, H., Sharifinejad, M., ‘Applications of Text mining in medical e-learning: A case study on Master of medical informatics’, Third Conference on E-learning in biomedical education, February 2010.

• Jahanseir, M., Bokharaeian, B., ‘General platforms of e-lab: a case study on master of medical informatics’, Third conference on e-learning in biomedical education, Tehran, Feb 2010.

SNPPhenA: A corpus for extracting ranked associations of SNP and phenotypes from literature

SNPPhenA.zip (585.38 KB)

The SNPPhenA corpus

The SNPPhenA corpus consists of medical and biological texts annotated for SNP-phenotype associations, negation, modality markers and the degree of confidence of associations. It is intended to support the development and comparison of systems that extract associations and estimate their degree of confidence and strength. The corpus is publicly available for research purposes.

The annotation guidelines: pdf 
Annotation principles are also discussed in the following paper:

Corpus download

Genome-wide association abstracts were collected using information provided by the GoPubMed search engine. GoPubMed is a web server that allows users to explore PubMed search results with the Gene Ontology. The DTD for the XML files containing the annotations: DTD

Abstracts of the SNPPhenA corpus: xml v1.0

The full corpus in XML and BRAT formats is available in one file: zip

An online association extraction system that utilizes the SNPPhenA corpus is available here.

Inter-annotator agreement analysis

To evaluate the quality of the corpus and the reliability of the annotations, the inter-annotator agreement was measured for two tasks: classifying candidate sentences into positive, negative and neutral classes, and determining the confidence level of the association. Two annotators tagged the corpus independently. When their tags disagreed, a third annotator was asked to decide on the correct one. For the task of classifying the type of association, the inter-annotator agreement was 86%, meaning that the two annotators agreed in 86% of cases. Additionally, we computed Cohen's Kappa coefficient for the two annotators, which takes into account the amount of agreement that could be expected to occur by chance. For the type of association task, the Kappa value was 0.79. For the task of annotating the confidence level of the association, the Kappa value was 0.80.

The results show that annotating the confidence level of an association is a more difficult task than simply classifying candidate sentences into positive, negative and neutral classes.
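The two agreement measures described in this section can be illustrated with a minimal Python sketch. The label lists below are hypothetical toy data, not the SNPPhenA annotations, and the function names are chosen only for this illustration. Raw agreement is the fraction of items on which both annotators chose the same label; Cohen's Kappa corrects it for the agreement expected by chance, kappa = (p_o - p_e) / (1 - p_e).

```python
from collections import Counter


def observed_agreement(labels_a, labels_b):
    """Fraction of items on which the two annotators assigned the same label."""
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohen_kappa(labels_a, labels_b):
    """Cohen's Kappa for two annotators: (p_o - p_e) / (1 - p_e).

    p_e is the chance agreement implied by each annotator's label frequencies.
    Undefined (division by zero) when p_e equals 1.
    """
    n = len(labels_a)
    p_o = observed_agreement(labels_a, labels_b)
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    labels = set(labels_a) | set(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in labels) / (n * n)
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy labels for ten candidate sentences (NOT corpus data).
annotator_1 = ["positive", "positive", "negative", "neutral", "positive",
               "neutral", "negative", "positive", "neutral", "positive"]
annotator_2 = ["positive", "positive", "negative", "neutral", "neutral",
               "neutral", "negative", "positive", "positive", "positive"]

print("raw agreement:", observed_agreement(annotator_1, annotator_2))
print("Cohen's kappa:", round(cohen_kappa(annotator_1, annotator_2), 3))
```

On this toy data the annotators agree on 8 of 10 items, but the Kappa value is lower than the raw agreement because part of that agreement would be expected by chance alone.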

Corpus statistics

In the table below, some detailed statistics of the linguistic and nonlinguistic properties of the corpus, in terms of test and training parts, are presented.

Item                                    | Train | Test | Total
Abstracts                               | 270   | 90   | 360
Key sentences                           | 362   | 121  | 483
SNPs                                    | 691   | 244  | 935
Phenotypes                              | 496   | 158  | 654
SNP-phenotype association candidates    | 935   | 365  | 1300
Neutral candidates                      | 142   | 166  | 308
Positive candidates                     | 702   | 170  | 872
Negative candidates                     | 91    | 29   | 120
Strong degree of confidence candidates  | 213   | 20   | 233
Medium degree of confidence candidates  | 92    | 32   | 124
Weak degree of confidence candidates    | 390   | 125  | 515
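
As a small worked example, the class distribution of the association candidates in each split can be derived from the neutral, positive and negative counts in the table above. The sketch below only re-uses those published counts; the dictionary layout and the function name are assumptions made for this illustration.

```python
# Candidate counts per class, copied from the corpus statistics table.
counts = {
    "train": {"neutral": 142, "positive": 702, "negative": 91},
    "test": {"neutral": 166, "positive": 170, "negative": 29},
}


def class_shares(split_counts):
    """Return the share of each class among the candidates of one split."""
    total = sum(split_counts.values())
    return {label: n / total for label, n in split_counts.items()}


for split, split_counts in counts.items():
    total = sum(split_counts.values())
    shares = class_shares(split_counts)
    summary = ", ".join(f"{label}: {share:.1%}" for label, share in shares.items())
    print(f"{split} ({total} candidates): {summary}")
```

The per-class counts sum to the candidate totals given in the table (935 for training, 365 for test), and the resulting shares show how the proportion of neutral candidates differs between the two splits.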