Intro

Drug discovery is hard

AstraZeneca introduced the "5R" framework

5R has had a significant impact in improving our efficiency

We are investing in new sources of data and faster validation

We need tools to make sense of data & make better and faster decisions

Finding a drug target can be formulated as a hybrid recommendation problem • Scientists need to parse large amounts of information and make a ranking prediction • Different formats, data models, locat…

Multiple objective optimization

Traditional recsys approaches

We assemble a large scale knowledge graph from public and AZ internal data

KG pipeline on

Pipeline - series of notebooks

Pipeline stages

Node dictionary

Mappings table

Edge assertions

Keep evidence & context for each assertion

Focus on NLP

Use natural language processing to extract precise information at scale

NLP Termite on Spark

Syntax parsing increases precision of entity recognition

Relationships from the literature reduce sparsity of the biological KG

Language models lead to improvements in recall and precision

Learned sentence representations can be used for downstream tasks

Graph embedding pipeline

Approximate nearest neighbor search

Lessons learned

Acknowledgements

Description:

Discover how to build a knowledge graph using Spark and NLP to recommend novel drugs to scientists. Learn about AstraZeneca's "5R" framework and its impact on improving efficiency in drug discovery. Explore the challenges of parsing large amounts of information from various formats and data models, and how to formulate drug target finding as a hybrid recommendation problem. Delve into the process of assembling a large-scale knowledge graph from public and internal data, focusing on NLP techniques to extract precise information at scale. Gain insights into graph embedding pipelines, approximate nearest neighbor search, and valuable lessons learned in the field of drug discovery and recommendation systems.

Building a Knowledge Graph with Spark and NLP for Novel Drug Recommendations

Databricks

#Computer Science #Artificial Intelligence #Knowledge Graphs #Data Science #Bioinformatics #Machine Learning #Big Data #Apache Spark #Science #Life Science #Drug Discovery #Recommendation Systems #Mathematics #Graph Theory #Graph Embeddings