Generative AI + Law (GenLaw) ’24

Workshop date: 27 July 2024

Paper submission deadline (CFP): 11 June 2024

AI Policy Social: 25 July 2024 [RSVP]
(Co-hosted with UK AISI + Stanford HAI)

We are very excited to announce the second Workshop on Generative AI and Law (GenLaw ’24)! Please join us in Vienna, Austria at ICML ’24, where we’ll bring together experts in privacy, ML, policy, and law to discuss the intellectual property (IP) and privacy challenges that generative AI raises, with a special focus on UK and EU issues.

Read our report from last year, an explainer on training dataset curation, a piece on the copyright issues generative AI raises, and a piece on memorization and copyright.

Progress in generative AI depends not only on better model architectures, but also on terabytes of scraped Flickr images, Wikipedia pages, Stack Overflow answers, and websites. In the process, generative models ingest vast quantities of intellectual property (IP), which they can memorize and regurgitate verbatim. Several recently filed lawsuits relate such memorization to copyright infringement. These lawsuits will lead to policies and legal rulings that define our ability, as ML researchers and practitioners, to acquire training data, as well as our responsibilities towards data owners and curators.
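To make the memorization concern concrete, here is a minimal sketch of the kind of extraction test used in memorization studies: prompt a model with a prefix drawn from a document suspected to be in its training data, and check whether greedy decoding reproduces the rest of the passage verbatim. The model name and text snippet below are placeholders, not claims about any particular system.

```python
# Minimal memorization check (illustrative): does the model complete a
# suspected training snippet verbatim? Requires the Hugging Face
# `transformers` library; MODEL_NAME and SNIPPET are placeholders.
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_NAME = "gpt2"  # placeholder: substitute the model under study
SNIPPET = (
    "We hold these truths to be self-evident, that all men are created "
    "equal, that they are endowed by their Creator with certain "
    "unalienable Rights, that among these are Life, Liberty and the "
    "pursuit of Happiness."
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# Split the snippet into a prompt prefix and the continuation a
# memorizing model would reproduce.
ids = tokenizer(SNIPPET, return_tensors="pt").input_ids[0]
half = len(ids) // 2
prefix, expected = ids[:half], ids[half:]

# Greedy decoding: memorized text tends to surface without sampling noise.
generated = model.generate(
    prefix.unsqueeze(0),
    max_new_tokens=len(expected),
    do_sample=False,
    pad_token_id=tokenizer.eos_token_id,
)[0][half:]

# Fraction of continuation tokens reproduced exactly.
n = min(len(generated), len(expected))
overlap = (generated[:n] == expected[:n]).float().mean().item() if n else 0.0
print(f"Verbatim token overlap with suspected training text: {overlap:.0%}")
```

Memorization studies run tests like this at scale over many candidate snippets; consistently high overlap is evidence that a passage was memorized rather than composed.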

AI researchers will increasingly operate in a legal environment that is keenly interested in their work, one that may require future research into model architectures that conform to legal requirements. Understanding the law and contributing to its development will enable us to create safer, better, and more practically useful models.

About the Workshop

We’re excited to share a series of tutorials from renowned experts in both ML and law, as well as panel discussions in which researchers from both disciplines can engage in semi-moderated conversation.

Our workshop will begin to build a comprehensive and precise synthesis of the legal issues at play. Beyond IP, the workshop will also address privacy and liability for dangerous, discriminatory, or misleading and manipulative outputs. It will take place on 27 July 2024.

Speakers

  • Kyle Lo

    Lead Scientist, Allen Institute for AI

    [website]

  • Gabriele Mazzini

    Architect and lead author, AI Act

    Fellow, MIT

    [website]

  • Martin Senftleben

    Professor of Intellectual Property Law and Director, Institute for Information Law (IViR), University of Amsterdam

    Of Counsel, Bird & Bird, The Hague, The Netherlands

    [website]

  • Julia Powles

    Associate Professor, University of Western Australia

    [website]

  • Connor Dunlop

    European Public Policy Lead, Ada Lovelace Institute

    [website]

  • Sabrina Küspert

    Policy Officer, European AI Office of the European Commission

  • Katja Filippova

    Research Scientist, Google DeepMind

    [website]

  • Kimberly Mai

    Principal Technology Adviser, UK ICO

  • Herbie Bradley

    Research Scientist, UK AI Safety Institute

    [website]

  • Sabrina Ross

    AI and Privacy Policy Director, Meta

    [website]

  • Paul Ohm

    Professor of Law, Georgetown University Law Center

    [website]

Schedule

  • Opening remarks from GenLaw lead organizers, Katherine Lee and A. Feder Cooper

  • Kyle Lo

    Abstract: Information about the pretraining corpora used to train the current best-performing language models is seldom discussed: commercial models rarely detail their data, and even open models are often released without the accompanying training data or recipes to reproduce them. As a result, it is challenging to conduct and advance scientific research on language modeling, such as understanding how training data shapes model capabilities, risks, and limitations. In this talk, I’ll shed light on many of these seldom-discussed data curation practices, sharing challenges faced from the perspective of a language model developer. I’ll draw upon real scenarios encountered during our work curating and releasing Dolma, a three-trillion-token English corpus built from a diverse mixture of web content, scientific papers, code, public-domain books, social media, and encyclopedic materials. Finally, I’ll present some of my concerns about, and perceived obstacles to, open data amid a rapidly changing language model data landscape.

    Bio: Kyle Lo is a research scientist at the Allen Institute for AI in Seattle, working on natural language processing, machine learning, and human-AI interaction, with an emphasis on language model development and adapting language models to specialized (scientific) texts. He is a tech lead on the OLMo project, focusing on dataset curation. His work on language model adaptation, scientific text processing, long-document summarization, and AI-powered reading assistance has won paper awards at top conferences including ACL, CHI, EMNLP, and EACL, and his work on open language models and datasets has been featured in Nature, Science, TechCrunch, MIT Tech Review, GeekWire, and other outlets. In 2020, Kyle was a co-lead for a White House OSTP initiative to curate and release the largest collection of COVID-19 research to date in support of computational use cases like automated literature review. Kyle graduated with a degree in Statistics from the University of Washington. He enjoys board games, boba tea, D&D, and relaxing with his cat Belphegor.

  • Bio: Gabriele Mazzini is the architect and lead author of the proposal on the Artificial Intelligence Act (AI Act) at the European Commission, where he has focused on the legal and policy questions raised by new technologies since August 2017. Between 2009 and 2017, Gabriele studied and worked in the United States. He was Associate General Counsel at the Millennium Villages Project, an international development initiative across several sub-Saharan countries founded and directed by Dr. Jeffrey Sachs, the Columbia University economist and senior United Nations advisor, and he worked with early-stage start-ups in the fields of emergency communications and smart energy solutions. As an EU official, he previously served in the European Parliament and at the EU Court of Justice. He holds an LLM from Harvard Law School, a PhD in Italian and Comparative Criminal Law from the University of Pavia, and a Law Degree from the Catholic University in Milan. He is a Connection Science Fellow at the Massachusetts Institute of Technology (Media Lab) and is qualified to practice law in Italy and New York.

  • Martin Senftleben

    Abstract: Copyright law grants exclusive rights that may preclude the use of human literary and artistic works for GenAI training. Seeking to reconcile the societal interest in the development of high-quality AI systems with copyright protection, lawmakers around the globe have taken different approaches: from the general exemption of text and data mining to fair use solutions and specific exceptions to copyright. The EU has adopted a particularly complex approach. EU copyright law – now flanked by provisions in the AI Act – obliges AI trainers to observe rights reservations that exclude text and data mining. AI trainers must also provide lists of literary and artistic works used for AI development. Surveying this complex legal framework, the talk will discuss the potential impact of EU regulations on commercial and nonprofit AI training projects inside and outside the EU. It will also shed light on infringement risks arising from memorization and address the potential marginalization of EU work repertoires in training datasets (and the biases evolving from this marginalization). The discussion will lead to the insight that current licensing initiatives may come too late to dispel concerns about copyright claims relating to large-scale use of copyrighted material years ago – in the early days of GenAI development.

    Bio: Martin Senftleben is Professor of Intellectual Property Law and Director, Institute for Information Law (IViR), University of Amsterdam. His activities focus on the reconciliation of private intellectual property rights with competing public interests of a social, cultural or economic nature. Current research topics include generative AI systems and author remuneration; open science and digital autonomy of researchers; platform and digital ecosystem regulation; copyright data improvement and content recommender systems; quality journalism and the economic viability of public interest media; behavioural advertising and consumer empowerment; the development of sustainable intellectual property policy.

    Professor Senftleben is a member of the Benelux Council for Intellectual Property and a former member of the Copyright Advisory Committee of the Dutch State. He provided advice to WIPO in copyright, trademark and unfair competition projects. For the European Commission, he prepared studies on data access and reuse in research contexts. He is a member of the Trademark Law Institute (TLI), the European Copyright Society (ECS) and the Executive Committee of the Association littéraire et artistique internationale (ALAI). As a visiting professor, he was invited to the National University of Singapore, the Engelberg Center at NYU Law School, the Oxford Intellectual Property Research Centre, Tel Aviv University and the Intellectual Property Research Institute of Xiamen University. His numerous publications include Copyright, Limitations and the Three-Step Test (2004), European Trade Mark Law (with Annette Kur, 2017), The Copyright/Trademark Interface (2020) and Generative AI and Author Remuneration (2023). As a guest lecturer, he provides courses at the Munich Intellectual Property Law Center (MIPLC), the Jagiellonian University Krakow and the University of Catania.

  • Panelists: Julia Powles, Kyle Lo, Martin Senftleben

    Moderator: A. Feder Cooper

    Julia Powles is the Director of the UWA Tech & Policy Lab and Associate Professor of Law and Technology at the University of Western Australia. An expert in privacy, intellectual property, internet governance, and the law and politics of data, automation and artificial intelligence, Julia serves on Australian federal and state committees on generative AI in education, AI and copyright, privacy and responsible information sharing, responsible AI, and robotics.

  • Bio: Connor Dunlop is based in Brussels and is responsible for leading and delivering Ada’s EU strategy on the governance and regulation of AI in Europe. Prior to joining Ada, Connor worked in public affairs, where he led his team’s work on the EU’s AI Act and AI Liability Directive. He also has experience in a range of analytical and research roles, including with the UN Refugee Agency and The Hague Centre for Strategic Studies.

  • Bio: Sabrina Küspert is a Policy Officer at the newly established European AI Office of the European Commission, where she focuses on setting up the governance system for general-purpose AI. This work is part of implementing the EU AI Act, the world’s first comprehensive legal framework on AI, and of building a global approach to trustworthy AI. As a Seconded Policy Expert, she was part of the team negotiating the AI Act on the side of the European Commission. Previously, she was a Fellow and Expert on AI at the European tech think tank Stiftung Neue Verantwortung, in parallel with a stay at the University of Oxford. Her research explored the role of Germany and Europe in trustworthy AI worldwide, including through regulation, international cooperation, and innovation policy, with publications on the AI value chain and on the unreliability, misuse, and systemic risks of general-purpose AI. As a strategy consultant with the Boston Consulting Group (BCG), she helped establish its global Responsible AI practice.

  • "Adversarial Perturbations Cannot Reliably Protect Artists From Generative AI" by Hönig, Robert; Rando, Javier; Carlini, Nicholas; Tramer, Florian

    "Training Foundation Models as Data Compression: On Information, Model Weights and Copyright Law" by Franceschelli, Giorgio; Cevenini, Claudia; Musolesi, Mirco

    "Machine Unlearning Fails to Remove Data Poisoning Attacks" by Pawelczyk, Martin; Sekhari, Ayush; Di, Jimmy Z; Lu, Yiwei; Kamath, Gautam; Neel, Seth

    "Ignore Safety Directions. Violate the CFAA?" by Siva Kumar, Ram Shankar; Albert, Kendra; Penney, Jonathon

    "Ordering Model Deletion" by Wilf-Townsend, Daniel

    "Fantastic Copyrighted Beasts and How (Not) to Generate Them" by He, Luxi; Huang, Yangsibo; Shi, Weijia; Xie, Tinghao; Liu, Haotian; Wang, Yue; Zettlemoyer, Luke; Zhang, Chiyuan; Chen, Danqi; Henderson, Peter

  • Abstract: Machine unlearning (MU) – an umbrella term for a variety of methods that all remove some information from an already-trained model – is gaining traction as a potential solution to many challenges posed by large deep neural networks. Most MU methods rely on the strong assumption that the “forget set” D_f – the training examples whose influence we want to remove from a trained model – is given. When MU is used to deal with undesirable outputs, a proposed strategy is to identify the forget set using state-of-the-art training data attribution (TDA) methods, which are designed to find the training examples “responsible” for a prediction. This talk will give a short overview of common MU strategies (one of which is sketched below) and share recent results showing that the combination of TDA + MU might not live up to its high expectations.

    Matthew Jagielski is a research scientist at Google DeepMind, working on Andreas Terzis's team. He works on security, privacy, and memorization in machine learning systems. This includes directions like privacy auditing, memorization in generative models, data poisoning, and model stealing. Matthew received his PhD from Northeastern University, where he was fortunate to be advised by Alina Oprea and Cristina Nita-Rotaru, as a member of the Network and Distributed Systems Security Lab (NDS2).

    Katja Filippova is a research scientist at Google DeepMind (Berlin) whose recent research focuses on understanding (large) language models, with a particular emphasis on practical model explainability and model trustworthiness. Apart from that, Katja is interested in interdisciplinary questions at the intersection of cognitive science, ethics and machine learning. She holds a Ph.D. from the Technical University of Darmstadt. 
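For readers unfamiliar with the methods this talk surveys, below is a minimal sketch of one common machine-unlearning strategy: gradient ascent on the forget set D_f. Everything here (the model, data loader, learning rate, and step count) is an illustrative assumption, not the speakers’ method.

```python
import torch

def unlearn_by_gradient_ascent(model, forget_loader, lr=1e-5, steps=1):
    """Take gradient-*ascent* steps on the forget set D_f, pushing the
    model away from examples whose influence should be removed.
    Assumes `model(inputs)` returns class logits."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    model.train()
    for _ in range(steps):
        for inputs, targets in forget_loader:  # batches drawn from D_f
            optimizer.zero_grad()
            # Minimizing the negated loss ascends the original loss,
            # degrading the model's fit to the forget examples.
            loss = -loss_fn(model(inputs), targets)
            loss.backward()
            optimizer.step()
    return model
```

In practice, ascent steps like these are typically combined with retain-set fine-tuning or other regularization so that unlearning the forget set does not destroy the model’s general capabilities.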

  • Kimberly Mai

    Abstract: This talk aims to demystify data protection and how it fits into the UK’s wider regulatory framework for AI. It also introduces the ICO’s current consultation series on how aspects of data protection law should apply to the development and use of generative AI models, and calls for contributions from the machine learning community.

    Bio: Kimberly Mai is a Principal Technology Adviser at the Information Commissioner’s Office (ICO), the UK’s information rights regulator, and a PhD researcher at University College London. She currently works in the ICO’s AI Compliance team, a new specialist team which focuses on identifying novel data protection and privacy challenges. She also provides technical advice to help improve compliance in the AI market.

  • Herbie Bradley

    Abstract: Many of the outstanding problems in AI governance today depend on AI researchers and engineers providing methodologies, tools, and solutions that policymakers can use. This talk will cover some open technical problems highly relevant to AI governance and policy, and offer suggestions for how technical AI researchers can work more closely with governments, drawing upon my experience at the UK AI Safety Institute.

    Bio: Herbie Bradley is a Research Scientist at the UK AI Safety Institute, working on research to support AI governance and evaluations for advanced AI systems. Herbie is also a PhD student at the University of Cambridge and, prior to joining the UK’s Frontier AI Taskforce, spent several years studying the behaviour of large language models and their implications for AI governance in collaboration with several AI start-ups and non-profit research groups, including EleutherAI.

  • Panelists: Sabrina Ross, Matthew Jagielski, Herbie Bradley, Niloofar Mireshghallah

    Moderator: Paul Ohm

    Sabrina Ross is an AI and Privacy Policy Director at Meta. Prior to joining Meta, she served as Uber's Global Head of Policy for Marketplace, building out ethical AI audits, designing responsible innovation efforts into product development, and engaging with policymakers. Prior to her career in policy, Sabrina was a Privacy Legal Director at Uber and Nauto, and practiced privacy law at Apple as well as two global law firms – Sidley Austin and Ropes & Gray.

    Niloofar Mireshghallah is a post-doctoral scholar at the Paul G. Allen School of Computer Science & Engineering at the University of Washington. She received her PhD from the CSE department at UC San Diego in 2023. Her research interests are trustworthy machine learning and natural language processing. She is a recipient of the 2020 National Center for Women & IT (NCWIT) Collegiate Award for her work on privacy-preserving inference, a finalist for the Qualcomm Innovation Fellowship in 2021, and a recipient of the 2022 Rising Star in Adversarial ML award.

    Paul Ohm is a Professor of Law at the Georgetown University Law Center in Washington, D.C. In his research, service, and teaching, Professor Ohm builds bridges between computer science and law, utilizing his training and experience as a lawyer, policymaker, computer programmer, and network systems administrator. His research focuses on information privacy, computer crime law, surveillance, technology and the law, and artificial intelligence and the law. Professor Ohm has published landmark articles about the failure of anonymization, the Fourth Amendment and new technology, and broadband privacy. His work has defined fields of scholarly inquiry and influenced policymakers around the world.

    Katherine Lee is a senior research scientist at Google DeepMind and co-founder of The GenLaw Center. In her past work, she built language models and was a lead author and contributor on the landmark T5 paper. More recently, her work has provided essential empirical evidence and measurement for grounding discussions of concerns that language models infringe copyright, and of how language models can respect an individual’s right to privacy and control of their data. She has also proposed methods for reducing memorization. Her work has received recognition at ACL, USENIX, ICLR, AAAI, and ICML.

AI Policy Panelists

  • Gabriele Mazzini

    Architect and lead author, AI Act

    Fellow, MIT

    [website]

  • Connor Dunlop

    European Public Policy Lead, Ada Lovelace Institute

    [website]

  • Sabrina Ross

    AI and Privacy Policy Director, Meta

    [website]

  • David Bau

    Assistant Professor of Computer Science, Northeastern Khoury College

    [website]

Organizers

  • Katherine Lee

    Senior Research Scientist at Google DeepMind

    [website]

  • A. Feder Cooper

    Postdoctoral Researcher at Microsoft Research

    Affiliate Researcher at Stanford HAI, CRFM and RegLab

    Incoming Assistant Professor of Computer Science at Yale University

    [website]

  • Niloofar Mireshghallah

    Post-Doctoral Researcher at University of Washington, Paul G. Allen School of Computer Science & Engineering

    [website]

  • Lydia Belkadi

    Doctoral Researcher in Privacy-Preserving Biometrics at the KU Leuven Centre for IT & IP Law

    [website]

  • James Grimmelmann

    Professor of Digital and Information Law at Cornell Law School and Cornell Tech

    [website]

  • Matthew Jagielski

    Research Scientist at Google DeepMind

    [website]

  • Milad Nasr

    Research Scientist at Google DeepMind

    [website]

Advisors

  • Pamela Samuelson

    Distinguished Professor of Law and Information at University of California, Berkeley

  • Colin Raffel

    Associate Professor and Associate Research Director at University of Toronto and Vector Institute

  • Andres Guadamuz

    Reader in Intellectual Property Law at University of Sussex

    Editor in Chief at Journal of World Intellectual Property

  • Brittany Smith

    UK Policy and Partnerships Lead at OpenAI

  • Herbie Bradley

    Research Scientist at UK AI Safety Institute

  • Hoda Heidari

    K&L Gates Career Development Assistant Professor in Ethics and Computational Technologies at Carnegie Mellon University

  • Michèle Finck

    Professor of Law and Artificial Intelligence at University of Tübingen

    Co-director at the CZS Institute for Artificial Intelligence and Law

GenLaw is grateful for support from the following sponsors and partners: