Text Analytics Toolkit

JULIELab NLP Toolsuite

Most of the Text Analytics Toolkit components can be freely downloaded from the FSU Jena website (http://www.julielab.de/Resources/Software/Tools.html) and are licensed under the terms of the Common Public License. The components are available as PEAR packages, the UIMA standard format for packaging and automatically deploying components. These PEAR packages contain compiled classes, the source code, and, where necessary, example models.

Coreference Resolver

The coreference resolver identifies different mentions of the same entity, e.g., "The Thy-1 gene promoter", "it" and "this promoter" in "The Thy-1 gene promoter resembles a "housekeeping" promoter in that it is located within a methylation-free island...Using transgenic mice, we show that this promoter does not confer any tissue specificity and...". It is an important component of a text-mining system. This demo was developed using a large-scale coreference corpus, MedCo, which consists of 1,999 MEDLINE abstracts from the GENIA data set. Our coreference resolution system achieved a high precision of 85.2% with a reasonable recall of 65.3%, giving an F-measure of 73.9%. http://nlp.i2r.a-star.edu.sg/demo_biocoref.html
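As a quick check on the figures above, the reported F-measure can be reproduced from the precision and recall. The sketch below assumes the balanced F-measure (the harmonic mean of precision and recall); the page does not say which variant was used, but the reported numbers are consistent with it. The function name `f_measure` and its `beta` parameter are illustrative only and are not part of the demo.

```python
def f_measure(precision, recall, beta=1.0):
    """Return the F-beta score; beta=1 gives the balanced F-measure."""
    if precision == 0 and recall == 0:
        return 0.0  # avoid division by zero when nothing was found
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as reported for the coreference resolver.
score = f_measure(0.852, 0.653)
print(f"F-measure: {score:.1%}")  # prints "F-measure: 73.9%"
```

Because the harmonic mean is pulled towards the lower of the two values, the result sits closer to the recall than a simple average (75.3%) would.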

Gene Regulation Text Retriever

This text retriever is designed to retrieve E. coli gene regulation (GR) information. By incorporating named entity recognition (NER) into the retrieval process, it performs considerably better than a traditional document retrieval engine on queries that target five entity types: Transcription Regulator, Gene, RNA, Protein and Cell Component, e.g., "What [TRANSCRIPTION REGULATORS] are involved in the transcription controlled by RNA polymerase sigma S factor (RpoS) upon entry into stationary phase?". Sixty sample queries are provided, covering 4 core GR categories and 9 important GR events in E. coli functional systems. The system also lets users go directly to the retrieved abstracts that contain the entities of interest, instead of browsing hundreds of retrieved documents.

NeMine

This REST web service for gene/protein name recognition has been developed within the BOOTStrep project. http://text0.mib.man.ac.uk/~sasaki/bootstrep/nemine.html & http://www.ebi.ac.uk/tc-test/textmining/medevi/

Yutaka Sasaki, Yoshimasa Tsuruoka, John McNaught, and Sophia Ananiadou, How to make the most of NE dictionaries in statistical NER, BMC Bioinformatics, 9(Suppl 11):S5, 2008.

Norm

This REST web service for gene/protein name normalization has been developed within the BOOTStrep project. http://text0.mib.man.ac.uk/~sasaki/bootstrep/norm.html

Yoshimasa Tsuruoka, John McNaught, and Sophia Ananiadou, Normalizing biomedical terms by minimizing ambiguity and variability, BMC Bioinformatics, 9(Suppl 3):S2, 2008.

MLdic

This REST web service retrieves protein names similar to a given protein name. This has been developed within the BOOTStrep project. http://text0.mib.man.ac.uk/~sasaki/bootstrep/mldic.html

Yoshimasa Tsuruoka, John McNaught, Jun'ichi Tsujii, and Sophia Ananiadou, Learning string similarity measures for gene/protein name dictionary look-up using logistic regression, Bioinformatics, 23(20):2768-2774, 2007.

SeMine

This REST web service automatically annotates semantic role labels relevant to gene regulation events. This has been developed within the BOOTStrep project. http://text0.mib.man.ac.uk/~sasaki/bootstrep/semine.html

GREMine

This web service automatically extracts gene regulation events from biological literature. This has been developed within the BOOTStrep project. http://text0.mib.man.ac.uk/~sasaki/bootstrep/gremine.html

MedEvi

MedEvi has been developed as part of the BOOTStrep project. The search engine identifies sentences in Medline abstracts that contain the query terms. All sentences are sorted, prioritized and aligned according to the query terms. MedEvi deals with multi-term queries by imposing positional restrictions on the occurrences of the query terms, based on the observation that terms whose semantic relation is explicitly stated in the text are usually found close to each other. http://www.ebi.ac.uk/Rebholz-srv/MedEvi/

Kim, J.J., Pezik, P., and Rebholz-Schuhmann, D. MedEvi: Retrieving textual evidence of relations between biomedical concepts from Medline. Bioinformatics 24(11):1410-1412, 2008.
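The idea of a positional restriction on multi-term queries in MedEvi can be illustrated with a small sketch. This is not MedEvi's actual implementation: the function names (`term_positions`, `min_span`, `search`), the `max_span` parameter, the single-word query terms, and the simple regular-expression tokenizer are all assumptions made for illustration. The sketch keeps only sentences in which every query term occurs within a limited token window, and ranks the tightest co-occurrences first.

```python
import re
from itertools import product


def term_positions(tokens, term):
    """Return the indices at which a single-word term occurs in the token list."""
    term = term.lower()
    return [i for i, tok in enumerate(tokens) if tok == term]


def min_span(tokens, terms):
    """Smallest token distance covering one occurrence of every term, or None."""
    positions = [term_positions(tokens, t) for t in terms]
    if any(not p for p in positions):
        return None  # at least one query term is missing from the sentence
    return min(max(combo) - min(combo) for combo in product(*positions))


def search(sentences, terms, max_span=10):
    """Return sentences containing all terms within max_span tokens, tightest first."""
    hits = []
    for sentence in sentences:
        tokens = re.findall(r"\w+", sentence.lower())
        span = min_span(tokens, terms)
        if span is not None and span <= max_span:
            hits.append((span, sentence))
    hits.sort(key=lambda hit: hit[0])
    return [sentence for _, sentence in hits]


sentences = [
    "RpoS regulates the expression of katE in stationary phase.",
    "RpoS is a sigma factor and many unrelated words follow here "
    "before any mention of the other gene named katE.",
    "The katE gene is unrelated here.",
]
for hit in search(sentences, ["RpoS", "katE"], max_span=10):
    print(hit)
```

With `max_span=10`, only the first sentence is returned: the second contains both terms but too far apart, and the third lacks one of them. Enumerating every combination of positions is fine for short sentences; a real engine would use an index and a more efficient window algorithm.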

MeshUp

This solution is a prototype that delivers phenotypic annotations for documents. The user submits a document and receives the appropriate MeSH terms for it. MeSH annotations can be exploited to predict the disease relevance of a gene, a protein or even a gene regulatory event. http://wwwdev.ebi.ac.uk/tc-test/textmining/MeshUp/

Trieschnigg, D., P. Pezik, V. Lee, F. de Jong, W. Kraaij, D. Rebholz-Schuhmann. "MeSH Up: effective MeSH text classification for improved document retrieval." Bioinformatics. (2009):1412-8. Epub 2009 Apr 17.

PaperMaker (makes use of the BioLexicon)

This prototype analyses a scientific document and delivers feedback on its content. PaperMaker compares the terminology used in the document against publicly available bioinformatics reference data resources and gives the user feedback on the use of terms, acronyms and their definitions, the selection of preferred terms over synonyms, and the annotation of the document with additional information (GO or MeSH terms). PaperMaker incorporates solutions that make use of resources from the BOOTStrep project; for example, the BioLexicon is used in different parts of the program. http://www.ebi.ac.uk/Rebholz-svr/PaperMaker