Syllabus and materials for the “How to publish in a format that enhances literature-based discovery?” course
Date: 31/07 - 04/08/2017
Place: University of California, San Diego
Info: Registration and Housing
FORCE11, a leading organization in scholarly communication, is organizing a Summer Institute this year (https://www.force11.org/fsci). It will take place at the University of California, San Diego, from July 30 through August 4. There will be practical courses on the newest methodologies in academic communication, open access, semantic technologies, data management, alternative metrics, library management, peer-review practices, and dozens of other topics. Anyone involved in communicating scientific results is invited to join this amazing event!
My name is Roman Gurinovich, and I’m an architect at sci.AI. I will be presenting the course “How to publish in a format that enhances literature-based discovery?”. There you’ll learn how to transform biomedical papers into a machine-readable format. We’ll also investigate how millions of such papers can then be used to make literature-based discoveries.
The key idea of the course: using semantic technology, journals will become the driving force of data-intensive, global-scale research.
Below you can find the course description, syllabus, lecture notes (to be added later), and additional materials.
Level: Intermediate. Participants should be aware of the scientific editorial process and the concept of the Semantic Web.
Intended audience: Members of editorial and innovation teams working for biomedical publishers, text-mining specialists, experts involved in annotating research results in biomedicine, librarians, and other researchers affected by semantic technology.
A published paper can have a much wider influence if it is prepared in a machine-readable format. The objective of this course is to give participants the practical expertise they need to enhance biomedical papers with a semantic layer, including detailed tagging of specific terms, such as chemical elements and proteins.
The course will explore how publishers of the future will enable literature-based discovery (LBD) with the help of the sci.AI system. Participants will learn to use the structured format within the publishing process and to link to the global knowledge network to enable enhanced levels of discovery.
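As a rough picture of what a "semantic layer" over a paper can look like, here is a minimal sketch of dictionary-based term tagging. It is not the sci.AI Engine's method, which this syllabus does not describe. The lexicon, the `Annotation` record, the `tag_terms` function, and the example sentence are all made up for illustration, and the two identifiers should be checked against ChEBI and UniProt before being relied on.

```python
import re
from typing import NamedTuple


class Annotation(NamedTuple):
    start: int        # character offset where the mention begins
    end: int          # character offset just past the mention
    text: str         # surface form as it appears in the paper
    kind: str         # coarse entity type
    identifier: str   # database identifier the mention is linked to


# Toy lexicon (assumption): lowercase surface form -> (entity type, identifier).
LEXICON = {
    "aspirin": ("chemical", "CHEBI:15365"),
    "p53": ("protein", "UniProt:P04637"),
}


def tag_terms(text, lexicon=LEXICON):
    """Return an annotation for every whole-word, case-insensitive lexicon match."""
    # Longest names first, so longer terms win over terms they contain.
    names = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(name) for name in names) + r")\b",
        re.IGNORECASE,
    )
    annotations = []
    for match in pattern.finditer(text):
        kind, identifier = lexicon[match.group(1).lower()]
        annotations.append(
            Annotation(match.start(), match.end(), match.group(1), kind, identifier)
        )
    return annotations


# Made-up example sentence, used only to exercise the tagger.
sentence = "We measured the effect of aspirin on p53 expression."
for annotation in tag_terms(sentence):
    print(annotation)
```

Keeping character offsets alongside each identifier means the annotations stay separate from the text, so a human reviewer can accept or reject each one without touching the manuscript itself.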
The course will cover the following practices:
- Automatically adding a semantic layer to individual publications.
- Validation of the semantic layer by authors and submission to a journal.
- Improving the peer-review process through semantic preprints.
- Generating publications based on the Resource Description Framework (RDF) and incorporating them into the editorial process.
- Visualizing hints and interactions in a paper.
- Increasing the visibility of a research paper and linking it to the global knowledge graph.
This course will cover both Open Access (OA) papers and publications behind a paywall.
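To illustrate the RDF-based practice listed above, the sketch below serializes "this paper mentions this entity" statements as N-Triples lines using only the standard library. The paper IRI and the `mentions` predicate are hypothetical placeholders under `example.org`, and the helper names are my own. The ChEBI and UniProt IRI patterns follow the forms those resources commonly use, to the best of my knowledge, and should be verified before real use.

```python
# Hypothetical IRIs: the paper and the "mentions" predicate are placeholders.
PAPER = "http://example.org/paper/123"
MENTIONS = "http://example.org/vocab#mentions"

# Assumed mapping from "PREFIX:local" identifiers to IRIs.
IRI_PATTERNS = {
    "CHEBI": "http://purl.obolibrary.org/obo/CHEBI_{}",
    "UniProt": "http://purl.uniprot.org/uniprot/{}",
}


def identifier_to_iri(identifier):
    """Turn an identifier such as 'CHEBI:15365' into an IRI."""
    prefix, local = identifier.split(":", 1)
    return IRI_PATTERNS[prefix].format(local)


def ntriple(subject, predicate, obj):
    """Format one statement whose three terms are all IRIs as an N-Triples line."""
    return f"<{subject}> <{predicate}> <{obj}> ."


def paper_to_ntriples(paper_iri, identifiers):
    """Emit one 'paper mentions entity' line per identifier."""
    return [ntriple(paper_iri, MENTIONS, identifier_to_iri(i)) for i in identifiers]


lines = paper_to_ntriples(PAPER, ["CHEBI:15365", "UniProt:P04637"])
print("\n".join(lines))
```

Because every entity is named by a shared IRI rather than a free-text string, statements from different papers about the same protein or chemical line up automatically, which is what makes linking into a larger knowledge graph possible.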
Course Learning Objectives:
Participants of this course will learn how to:
- Semanticize a research paper with metadata on organisms, proteins, drugs, chemical elements, etc.
- Label biomedical objects automatically via the sci.AI Engine.
- Validate machine-performed labeling via the sci.AI UI; this can be done by authors during paper submission or by editorial team members for already published papers.
- Export the semanticized paper in JATS or RDFa (this format is under consideration); this creates a standalone, machine-readable description of the particular research.
- Organize the peer-review process in a sci.AI preprint (under consideration).
- Publish the paper in an enhanced HTML format; this includes detailed descriptions of biomedical objects from Wikipedia, UniProt, MeSH, ChEBI, etc.
- Add this machine-readable research to the sci.AI Registry.
- Explain how machine-readable papers can be retrieved with semantic search and relationship queries.
- Describe how algorithms search for connections within machine-readable texts as compared to plain texts.
- Identify and construct (depending on individual background in biomedicine) candidate discoveries.
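The last three objectives can be pictured with a toy version of the classic "ABC" pattern from literature-based discovery: if one paper links A to B and another links B to C, but no paper links A to C directly, then A and C form a candidate connection worth checking. The sketch below is a simplification under stated assumptions. The statements are hand-written and loosely modeled on Swanson's well-known fish oil and Raynaud's disease example, and the function name and data layout are my own, not part of sci.AI.

```python
from collections import defaultdict

# Toy statements (assumption): (paper id, subject, relation, object).
STATEMENTS = [
    ("paper-1", "fish oil", "reduces", "blood viscosity"),
    ("paper-2", "blood viscosity", "is elevated in", "Raynaud's disease"),
    ("paper-3", "fish oil", "contains", "eicosapentaenoic acid"),
]


def candidate_links(statements):
    """Return (a, b, c) chains where a-b and b-c are stated but a-c is not."""
    outgoing = defaultdict(set)
    for _paper, subject, _relation, obj in statements:
        outgoing[subject].add(obj)

    candidates = []
    for a in sorted(outgoing):
        for b in sorted(outgoing[a]):
            # .get avoids inserting new keys while we read the mapping
            for c in sorted(outgoing.get(b, ())):
                if c != a and c not in outgoing[a]:
                    candidates.append((a, b, c))
    return candidates


for a, b, c in candidate_links(STATEMENTS):
    print(f"candidate: {a} -> {c} (via {b})")
```

With plain text, the same search would need to guess that "blood viscosity" in two papers means the same thing; with labeled entities, the join on B is an exact match on identifiers. A real system would also rank candidates, since most such chains are not meaningful.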
The following topics will be covered:
- Semantic labeling of biomedical objects in the text.
- Validating machine results.
- Publishing extended JATS and HTML papers.
- Semantic Web, science, and searching for discoveries in a linked set of data.
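For the JATS topic, one plausible way to carry labels inside the article XML is to wrap each tagged mention in a `<named-content>` element, which JATS provides for marking up terms of a particular kind. The sketch below builds such a paragraph with the standard library. How sci.AI actually encodes its labels in "extended JATS" is not stated here, so the `content-type` values, the helper names, and the example sentence are assumptions, and the spans are assumed not to overlap.

```python
import xml.etree.ElementTree as ET


def to_jats_paragraph(text, spans):
    """Build a <p> element, wrapping each (start, end, content_type) span
    in <named-content>. Spans are assumed not to overlap."""
    paragraph = ET.Element("p")
    cursor = 0
    last = None  # most recent child; plain text after it belongs in its tail
    for start, end, content_type in sorted(spans):
        plain = text[cursor:start]
        if last is None:
            paragraph.text = (paragraph.text or "") + plain
        else:
            last.tail = (last.tail or "") + plain
        last = ET.SubElement(paragraph, "named-content", {"content-type": content_type})
        last.text = text[start:end]
        cursor = end
    rest = text[cursor:]
    if last is None:
        paragraph.text = rest
    else:
        last.tail = rest
    return paragraph


def span_of(text, term, content_type):
    """Locate the first occurrence of term and return its (start, end, type) span."""
    start = text.index(term)
    return (start, start + len(term), content_type)


# Made-up example sentence, used only to exercise the function.
sentence = "We measured the effect of aspirin on p53 expression."
spans = [span_of(sentence, "aspirin", "chemical"), span_of(sentence, "p53", "protein")]
print(ET.tostring(to_jats_paragraph(sentence, spans), encoding="unicode"))
```

The same span data could drive the enhanced HTML version as well, with each mention rendered as a link or tooltip instead of a JATS element.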
For a topic introduction and overview, we will most likely start with an example of a discovery replicated from machine-readable literature. Then, attendees will learn how to make this possible by preparing research communication in a machine-readable format. Practical exercises will make up most of the course.
The schedule is approximate and may be adjusted until August.
Day 1 – Monday:
Consider an example of connected research.
Overview of Semantic Web, Information Retrieval, and Information Extraction basics.
Practice. Paper writing, semanticizing, result validation, and journal submission.
Day 2 – Tuesday:
Practice. Peer-review interaction via the semantic preprint.
Practice. Publishing the JATS version.
Practice. Publishing the extended HTML version.
Practice. Adding the paper to the Registry of machine-readable papers; querying the paper and related research.
Course Materials and Supplies:
A laptop will provide you with a platform to practice paper semanticization. If you have a favorite published paper, ideally in the neuroscience domain and involving proteins and chemical processes, please send me a link in advance at firstname.lastname@example.org.
The course sits at the intersection of computer science, academic publishing, and drug discovery. I don’t know the answers to all possible questions, and my understanding of the subject evolves daily. I’m a computer scientist and look at biomedical publication processes from a formal-logic point of view. Deeper knowledge of a particular field among group members will be very helpful in highlighting the details of our work.
The sci.AI platform is in intensive R&D mode. This syllabus may be adjusted right up to the start of the course in order to provide you with the most up-to-date information possible.
Further Reading and Learning
Biochemistry, Molecular Biology, Cell Biology
- Biochemistry for Dummies (For Dummies). John T. Moore
- Lehninger Principles of Biochemistry. Albert L. Lehninger, David L. Nelson, Michael M. Cox
Information Extraction and Retrieval
- Introduction to Information Retrieval. Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze. (2008)
- Graph-based Natural Language Processing And Information Retrieval. Rada Mihalcea, Dragomir Radev
- The next Web. Tim Berners-Lee
- Semantic Web Technologies video course by Dr. Harald Sack (Very comprehensive, a bit long, but still the best resource so far on the topics of Semantic Web, RDF, SPARQL, Ontologies (including OWL), and applications of Semantic Technology.)
- What is RDF and what is it good for?
- List of generic Web ontologies