The Data Provenance Initiative: A Large Scale Audit of Dataset Licensing & Attribution in AI
Description:
Explore the Data Provenance Initiative, a large-scale effort to audit and trace more than 1,800 text datasets used in AI training. Learn about the legal and ethical concerns surrounding dataset licensing and attribution in the AI industry. Discover the tools and standards developed to trace dataset lineage, from sources and creators to license conditions and subsequent use. Examine the landscape analysis, which reveals stark differences in composition and focus areas between commercially open and closed datasets. Gain insights from speakers Anthony Chen, an engineer at Google DeepMind, and Shayne Longpre, a PhD candidate at MIT, as they present their findings and discuss the implications for data transparency and understanding in AI development. Delve into the challenges of dataset monopolization in areas such as low-resource languages, creative tasks, and synthetic training data.

USC Information Sciences Institute