Text Analytics Toolkit
JULIELab NLP Toolsuite
Most of the Text Analytics Toolkit components can be freely downloaded from the FSU Jena website (http://www.julielab.de/Resources/Software/Tools.html) and are licensed under the terms of the Common Public License. The components are available as PEAR packages, the UIMA standard format for packaging and automatically deploying components. These PEAR packages contain compiled classes, the source code and, where necessary, example models.
Coreference Resolver
The coreference resolver identifies different mentions of the same entity, e.g., "The Thy-1 gene promoter", "it" and "this promoter" in "The Thy-1 gene promoter resembles a 'housekeeping' promoter in that it is located within a methylation-free island...Using transgenic mice, we show that this promoter does not confer any tissue specificity and...". Coreference resolution is an important component of a text-mining system. This demo was developed using a large-scale coreference corpus, MedCo, which consists of 1,999 Medline abstracts from the GENIA data set. Our coreference resolution system achieved a high precision of 85.2% with a reasonable recall of 65.3%, obtaining an F-measure of 73.9%.
http://nlp.i2r.a-star.edu.sg/demo_biocoref.html
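The reported F-measure can be checked against the precision and recall figures given above. The following is a minimal sketch, assuming the figure is the balanced F-measure (F1, the harmonic mean of precision and recall); the source does not state which variant was used, and the function name `f_measure` is only illustrative.

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall.

    With beta == 1 this is the balanced F1 score. Assumption: the
    figure reported for the coreference resolver is F1.
    """
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as reported for the coreference resolver (in percent).
f1 = f_measure(85.2, 65.3)
print(round(f1, 1))  # prints 73.9
```

Under this assumption the computed value agrees with the reported 73.9%.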
Gene Regulation Text Retriever
This text retriever is designed to retrieve E. coli Gene Regulation (GR) information. By incorporating named entity recognition (NER) in the retrieval process, it performs much better than a traditional document retrieval engine, especially on queries for five entity types: Transcription Regulator, Gene, RNA, Protein and Cell Component, e.g., "What [TRANSCRIPTION REGULATORS] are involved in the transcription controlled by RNA polymerase sigma S factor (RpoS) upon entry into stationary phase?". Sixty sample queries are provided, covering 4 core GR categories and 9 important GR events in E. coli functional systems. The system also helps users go directly to the retrieved abstracts that contain the relevant entities of interest, instead of browsing hundreds of retrieved documents.
NeMine
This REST web service for gene/protein name recognition has been developed within the BOOTStrep project.
http://text0.mib.man.ac.uk/~sasaki/bootstrep/nemine.html &
http://www.ebi.ac.uk/tc-test/textmining/medevi/
Yutaka Sasaki, Yoshimasa Tsuruoka, John McNaught, and Sophia Ananiadou. 2008. How to make the most of NE dictionaries in statistical NER. BMC Bioinformatics, 9(Suppl 11):S5.
Norm
This REST web service for gene/protein name normalization has been developed within the BOOTStrep project.
http://text0.mib.man.ac.uk/~sasaki/bootstrep/norm.html
Yoshimasa Tsuruoka, John McNaught, and Sophia Ananiadou. 2008. Normalizing biomedical terms by minimizing ambiguity and variability. BMC Bioinformatics, 9(Suppl 3):S2.
MLdic
This REST web service retrieves protein names similar to a given protein name. It has been developed within the BOOTStrep project.
http://text0.mib.man.ac.uk/~sasaki/bootstrep/mldic.html
Yoshimasa Tsuruoka, John McNaught, Jun'ichi Tsujii, and Sophia Ananiadou. 2007. Learning string similarity measures for gene/protein name dictionary look-up using logistic regression. Bioinformatics, 23(20):2768-2774.
SeMine
This REST web service automatically annotates text with semantic role labels relevant to gene regulation events. It has been developed within the BOOTStrep project.
http://text0.mib.man.ac.uk/~sasaki/bootstrep/semine.html
GREMine
This web service automatically extracts gene regulation events from biological literature. It has been developed within the BOOTStrep project.
http://text0.mib.man.ac.uk/~sasaki/bootstrep/gremine.html
MedEvi
MedEvi has been developed as part of the BOOTStrep project. The search engine identifies sentences in Medline abstracts that contain the query terms. All sentences are sorted, prioritized and aligned according to the query terms.
MedEvi deals effectively with multi-term queries by imposing positional restrictions on the occurrences of the query terms, based on the observation that terms whose semantic relation is stated explicitly in the text tend to occur close to each other.
http://www.ebi.ac.uk/Rebholz-srv/MedEvi/
Kim, J.J., Pezik, P., and Rebholz-Schuhmann, D. (2008). MedEvi: Retrieving textual evidence of relations between biomedical concepts from Medline. Bioinformatics 24(11):1410-1412.
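The idea of a positional restriction on multi-term queries, as described for MedEvi above, can be illustrated with a small sketch. This is not MedEvi's actual algorithm or API: the function names, the token-window threshold `max_span`, the restriction to single-word query terms, and the toy sentences are all assumptions made for illustration. A sentence is kept only if every query term occurs within a window of at most `max_span` tokens, and tighter matches are ranked first.

```python
import re


def tokenize(sentence):
    """Lowercase word tokens; a deliberately simple, hypothetical tokenizer."""
    return re.findall(r"\w+", sentence.lower())


def min_span(tokens, terms):
    """Size of the smallest token window containing every term, or None."""
    needed = {t.lower() for t in terms}
    last_seen = {}
    best = None
    for i, tok in enumerate(tokens):
        if tok in needed:
            last_seen[tok] = i
            if len(last_seen) == len(needed):
                # Smallest window ending at i starts at the oldest last-seen term.
                span = i - min(last_seen.values()) + 1
                if best is None or span < best:
                    best = span
    return best


def search(sentences, terms, max_span=10):
    """Return sentences where all terms co-occur within max_span tokens,
    ordered from the tightest match to the loosest."""
    hits = []
    for s in sentences:
        span = min_span(tokenize(s), terms)
        if span is not None and span <= max_span:
            hits.append((span, s))
    hits.sort(key=lambda h: h[0])
    return [s for _, s in hits]


# Invented toy sentences, not taken from Medline.
sentences = [
    "RpoS regulates the expression of osmY during stationary phase.",
    "RpoS was studied in many labs; much later, unrelated work on a very "
    "different organism mentioned osmY.",
    "The gene osmY is induced by osmotic stress.",
]
results = search(sentences, ["RpoS", "osmY"], max_span=8)
print(results)
```

With these toy inputs only the first sentence is returned: the second contains both terms but too far apart, and the third lacks one of them.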
MeshUp
This prototype solution adds phenotypic annotations to documents. The user submits a document and receives the appropriate MeSH terms for it.
MeSH annotations can be exploited to predict the disease relevance of a gene, protein or even a gene regulatory event.
http://wwwdev.ebi.ac.uk/tc-test/textmining/MeshUp/
Trieschnigg, D., P. Pezik, V. Lee, F. de Jong, W. Kraaij, D. Rebholz-Schuhmann. "MeSH Up: effective MeSH text classification for improved document retrieval." Bioinformatics. (2009):1412-8. Epub 2009 Apr 17.
PaperMaker (makes use of the BioLexicon)
This prototype analyses a scientific document and delivers feedback on the content.
PaperMaker compares the use of terminology in the document against publicly available bioinformatics reference data resources. It gives the user feedback on the use of terms, acronyms and their definitions, on the selection of preferred terms over synonyms, and on the annotation of the document with additional information (GO or MeSH terms).
PaperMaker incorporates solutions that make use of resources from the BOOTStrep project; for example, the BioLexicon is used in different parts of the program.
http://www.ebi.ac.uk/Rebholz-svr/PaperMaker