We hold bi-weekly talks on Fridays from PM to 5 PM CET for and by researchers and practitioners designing and implementing data systems. The objective is to establish a new forum for the Dutch Data …
Description:
Explore a 21-minute research talk on GitSchemas, a groundbreaking corpus of database schema information extracted from SQL scripts in public code repositories. The collection contains over 150,000 schemas, 1 million tables, and nearly 600,000 foreign key relationships, and it addresses the limitations of existing tabular data collections. Discover how this schema information can be used to benchmark and improve data management tasks such as foreign key detection and constraint prediction. The presentation is delivered by Till Döhmen, a PhD student at RWTH Aachen University and guest researcher at the UvA Intelligent Data Engineering Lab. It offers insights into how database systems are used in practice and into table representation learning approaches for semantic annotation, data imputation, and automated error detection.
Database Schemas in the Wild - Learning from a Large Corpus of Relational Database Schemas