5R has had a significant impact in improving our efficiency
We are investing in new sources of data and faster validation
We need tools to make sense of data & make better and faster decisions
Finding a drug target can be formulated as a hybrid recommendation problem • Scientists need to parse large amounts of information and make a ranking prediction • Different formats, data models, locat…
Multiple objective optimization
Traditional recsys approaches
We assemble a large scale knowledge graph from public and AZ internal data
KG pipeline on
Pipeline - series of notebooks
Pipeline stages
Node dictionary
Mappings table
Edge assertions
Keep evidence & context for each assertion
Focus on NLP
Use natural language processing to extract precise information at scale
NLP Termite on Spark
Syntax parsing increases precision of entity recognition
Relationships from the literature reduce sparsity of the biological KG
Language models lead to improvements in recall and precision
Learned sentence representation can be used for downstream tasks
Graph embedding pipeline
Approximate nearest neighbor search
Lessons learned
Acknowledgements
Description:
Discover how to build a knowledge graph using Spark and NLP to recommend novel drugs to scientists. Learn about AstraZeneca's "5R" framework and its impact on improving efficiency in drug discovery. Explore the challenges of parsing large amounts of information from various formats and data models, and how to formulate drug target finding as a hybrid recommendation problem. Delve into the process of assembling a large-scale knowledge graph from public and internal data, focusing on NLP techniques to extract precise information at scale. Gain insights into graph embedding pipelines, approximate nearest neighbor search, and valuable lessons learned in the field of drug discovery and recommendation systems.
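The outline names three knowledge-graph building blocks (a node dictionary, a mappings table, and edge assertions that keep their evidence and context). The talk's actual schema is not given here, so the following is only a minimal sketch in plain Python of how those pieces could fit together. Every identifier, source name, and record in it is a hypothetical placeholder, and a real pipeline would run on Spark DataFrames rather than in-memory dictionaries.

```python
from dataclasses import dataclass

# Hypothetical node dictionary: canonical ID -> node type and preferred name.
NODES = {
    "GENE:1": {"type": "gene", "name": "gene X"},
    "DISEASE:1": {"type": "disease", "name": "disease Y"},
}

# Hypothetical mappings table: (source, source-specific ID) -> canonical ID.
# Two sources may use different IDs for the same entity.
MAPPINGS = {
    ("source_a", "A-001"): "GENE:1",
    ("source_b", "B-042"): "GENE:1",
    ("source_b", "B-777"): "DISEASE:1",
}


@dataclass(frozen=True)
class Assertion:
    """One edge of the graph, stored together with its provenance."""
    subject: str    # canonical ID of the subject node
    predicate: str  # relationship type
    obj: str        # canonical ID of the object node
    source: str     # where the assertion came from
    evidence: str   # supporting text or context kept with the edge


def resolve(source, raw_id):
    """Map a source-specific ID to a canonical node ID, or None if unmapped."""
    return MAPPINGS.get((source, raw_id))


def build_assertions(raw_edges):
    """Turn raw (source, subj, pred, obj, evidence) rows into canonical assertions.

    Rows whose subject or object cannot be mapped are skipped.
    """
    assertions = []
    for source, subj, pred, obj, evidence in raw_edges:
        s, o = resolve(source, subj), resolve(source, obj)
        if s is None or o is None:
            continue
        assertions.append(Assertion(s, pred, o, source, evidence))
    return assertions


# Placeholder input rows; the second one has an unmapped object ID.
RAW_EDGES = [
    ("source_b", "B-042", "associated_with", "B-777", "placeholder sentence"),
    ("source_b", "B-042", "associated_with", "B-999", "placeholder sentence"),
]
ASSERTIONS = build_assertions(RAW_EDGES)
```

Keeping the source and evidence on each assertion, rather than collapsing duplicate edges, means a ranked recommendation can later be traced back to the records that support it.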
Building a Knowledge Graph with Spark and NLP for Novel Drug Recommendations