Literature and learning from the past

Scientific knowledge has been accumulated over hundreds or even thousands of years. By 'accumulation', we mean that scientific discoveries have been recorded and published, mostly through scientific literature. By accessing this accumulated knowledge, the development of new knowledge can be made much more efficient.

As technology progresses, however, the pace at which knowledge expands is rapidly increasing, and it is becoming almost impossible for human researchers to keep up with the literature even in their own areas of expertise.

Structured databases

As accumulated human knowledge grows, instant access to the necessary pieces of knowledge becomes more and more important. Many structured databases have been developed to make such access efficient. Life science in particular is an area rich in public databases, e.g., Entrez Gene, UniProt, and PubChem.

To fully benefit from these databases, however, relevant entities across multiple databases need to be interlinked.

Linked Data

Linked Data (LD) is emerging as a new way of publishing data. LD enables relevant pieces of data across multiple databases to be linked to each other through a standard protocol. It may be said that, while the development of structured databases (mostly relational databases) increased the amount of data stored, the technology of LD and the Semantic Web is significantly improving the linkage between those pieces of data.
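The idea of linking entries across databases can be sketched in a few lines. This is only a minimal illustration, assuming that each statement is stored as a (subject, predicate, object) triple of URIs. All URIs, the vocabulary terms and the helper names `linked_from` and `reachable` are hypothetical and do not refer to any real database.

```python
# Hypothetical URIs standing in for entries in three separate databases.
GENE = "http://example.org/gene/G1"
PROTEIN = "http://example.org/protein/P1"
COMPOUND = "http://example.org/compound/C1"

# Each statement is a (subject, predicate, object) triple.
# The predicates are made-up vocabulary terms, for illustration only.
triples = [
    (GENE, "http://example.org/vocab/encodes", PROTEIN),
    (PROTEIN, "http://example.org/vocab/bindsTo", COMPOUND),
]


def linked_from(subject, triples):
    """Return every (predicate, object) pair stated about `subject`."""
    return [(p, o) for s, p, o in triples if s == subject]


def reachable(start, triples):
    """Follow links transitively and return every URI reachable from `start`."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for _, obj in linked_from(node, triples):
            if obj not in seen:
                seen.add(obj)
                stack.append(obj)
    return seen
```

Starting from the gene entry, `reachable(GENE, triples)` follows the links into the other two hypothetical databases, which is the kind of cross-database navigation that LD is meant to make routine.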

Compared to knowledge represented in scientific literature, however, the pieces of knowledge in structured databases or linked data often lack their context, e.g., the experimental environment.

Linked Annotation

Often, literature is the only place where the context of an individual piece of data can be found, which is why database curators want to record references to relevant pieces of literature, e.g., PMIDs, for individual data entries. Still, many databases lack such references, or have largely incomplete ones. Finding the context of database entries in the literature and linking the two is thus an important task that raises the utility of databases. From the perspective of literature, this task is called annotation (plus normalization or grounding); it improves the accessibility of the content of literature and increases the opportunities for mining across literature and structured databases.

Google Maps vs Linked Annotation

Linked Annotation is conceptually similar to Google Maps, which links various entities and structures to 2-dimensional unstructured data (a map).

Linked Annotation, in turn, links various entities and their structures to 1-dimensional unstructured data (text).
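Just as a map pin is a pair of coordinates plus a description, an annotation on text can be thought of as a character span plus a link to a database entry. The following is a minimal sketch under that assumption. The `Annotation` class, the helper functions, the example sentence and the URIs are all hypothetical and are not the format of any particular annotation system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    begin: int   # offset of the first character of the span
    end: int     # offset one past the last character of the span
    target: str  # URI of the database entry the span is grounded to


# A made-up sentence standing in for a piece of literature.
text = "Protein P1 binds compound C1."


def annotate(text, mention, target):
    """Anchor `target` to the first occurrence of `mention` in `text`."""
    begin = text.index(mention)
    return Annotation(begin, begin + len(mention), target)


def covered_text(text, ann):
    """Return the stretch of text that an annotation points at."""
    return text[ann.begin:ann.end]


# Hypothetical database URIs, for illustration only.
annotations = [
    annotate(text, "P1", "http://example.org/protein/P1"),
    annotate(text, "C1", "http://example.org/compound/C1"),
]
```

Because each annotation carries only offsets and a link, the text itself stays untouched, and annotations from different sources can be laid over the same text in the way layers are laid over a map.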


We regard Google Maps as one of the most successful public-sourcing annotation systems: users can easily create geographical annotations and share them with anyone else.

PubAnnotation is an annotation repository developed to implement a Google Maps-like system for public-sourcing and publishing annotations to literature.


BLAH is organized to develop linked literature annotation as a community effort. Over the last decades, the BioNLP community has made substantial progress in producing various annotations of the biomedical literature. Now it is time to put more effort into improving the accessibility of these invaluable resources. Through the linked annotation effort, we believe that the accessibility, and also the productivity, of annotation may be significantly improved.