Datasets are critical to AI and machine learning, and they are becoming a key driver of the economy. Collection of sensitive data is increasing rapidly, covering almost every aspect of people’s lives. In its current form, this data collection puts both individuals and businesses at risk. I hope that 2020 will be the year when we build the foundation for a responsible data economy.
Today, users have almost no control over how the data they generate are used. All kinds of data are shared and sold, including fine-grained locations, medical prescriptions, gene sequences, and DMV registrations. This activity often puts personal privacy and sometimes even national security at risk. As individuals become more aware of these issues, they are losing trust in the services they use.
At the same time, businesses and researchers face numerous challenges in taking advantage of data. First, large-scale data breaches continue to plague businesses. Second, with Europe’s General Data Protection Regulation, the California Consumer Privacy Act, and similar laws, it is becoming more difficult and expensive for businesses to comply with privacy regulations. Third, valuable data are siloed, impeding technical progress. For example, easier use of medical data across institutions for machine learning could lead to improvements in healthcare for everyone.
Changing this broken system into a responsible data economy requires creating new technologies, regulations, and business models. These should aim to provide trustworthy protection and control to data owners (both individuals and businesses) through secure computation, the ability to audit, and machine learning that maintains data privacy. Secure computation can be provided by secure hardware (such as Intel SGX and Keystone Enclave) and cryptographic techniques. Those computations can be made auditable by tying encrypted storage and computation to a distributed ledger.
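To make the auditing idea concrete, here is a minimal sketch of a tamper-evident log in which each entry commits to the hash of the one before it. It is a single-machine stand-in for a distributed ledger, not a description of any particular system; the names `append_entry` and `verify_log` and the payload fields are hypothetical, chosen only for illustration.

```python
import hashlib
import json

# Placeholder "previous hash" for the first entry in the log (an assumption).
GENESIS = "0" * 64


def _entry_hash(prev_hash, payload):
    """Hash the previous entry's hash together with this entry's payload."""
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_entry(log, payload):
    """Append a record of a computation, chained to the prior entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {
        "prev": prev_hash,
        "payload": payload,
        "hash": _entry_hash(prev_hash, payload),
    }
    log.append(entry)
    return entry


def verify_log(log):
    """Return True only if no entry has been altered, removed, or reordered."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _entry_hash(entry["prev"], entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


# Hypothetical usage: record who ran which computation on which dataset.
log = []
append_entry(log, {"actor": "hospital_a", "action": "train", "dataset": "ds-1"})
append_entry(log, {"actor": "lab_b", "action": "query", "dataset": "ds-1"})
print(verify_log(log))
```

Because every hash depends on all earlier entries, changing a past record invalidates the chain from that point on. A real ledger would add replication and consensus across parties, which this sketch omits.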
Greater challenges remain on the machine learning side. In 2020, we can expand on current efforts in differentially private data analytics and machine learning, building scalable systems for practical deployment with large, heterogeneous datasets. Further research and deployment of federated learning also will be important for certain use cases. Finally, advances in robust learning from limited and noisy data could help enable a long tail of ML use cases without compromising privacy.
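As a small illustration of differentially private analytics, the sketch below answers a counting query with the Laplace mechanism, a standard building block in this area. It is a toy under stated assumptions, not a production system: the function names, the sample records, and the choice of epsilon are all hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Draw from Laplace(0, scale).

    The difference of two independent exponential variables with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of records matching `predicate`.

    Adding or removing one record changes a count by at most 1, so the
    query has sensitivity 1 and the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical usage: how many patients are over 60?
ages = [34, 71, 65, 48, 80, 59, 62]
noisy = private_count(ages, lambda age: age > 60, epsilon=0.5)
print(noisy)
```

A smaller epsilon means more noise and stronger privacy. The scalability challenge mentioned above comes partly from tracking how this privacy budget is spent across many queries and large, heterogeneous datasets.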
We are building parts of this vision at Oasis Labs, but there is much more to be done. I hope that this year technologists, businesses, regulators, and the AI community will join us in building the foundation for a truly responsible data economy.
Dawn Song is chief executive and co-founder of Oasis Labs and a professor of computer science and electrical engineering at UC Berkeley.