Data Science Friday Webinar


Friday, May 11, 2018, 12:00pm to 1:00pm


Countway Library Ware Room 505

Data Science Friday at Countway Library Presents: The BD2K Guide to the Fundamentals of Data Science Series Lunch and Learn!

The webinars provide essential training at an introductory level and consist of presentations from experts across the country covering the basics of data management, representation, computation, statistical inference, data modeling, and other topics relevant to “big data” in biomedicine. This webinar series is a collaboration between the TCC, the NIH Office of the Associate Director for Data Science, and the BD2K Centers Coordination Center (BD2KCCC).

Join us in the Countway Library Ware Room for Data Science Friday. These Lunch and Learn sessions will provide a platform for those interested to view the webinar and engage in a discussion afterwards. And don't forget to bring your lunch!


Friday, May 11: Biomedicine and the Foundations of Data


Speaker: Michael Mahoney, University of California, Berkeley

Michael W. Mahoney is at the University of California at Berkeley in the Department of Statistics and at the International Computer Science Institute (ICSI). He works on algorithmic and statistical aspects of modern large-scale data analysis. Much of his recent research has focused on large-scale machine learning, including randomized matrix algorithms and randomized numerical linear algebra, geometric network analysis tools for structure extraction in large informatics graphs, scalable implicit regularization methods, and applications in genetics, astronomy, medical imaging, social network analysis, and internet data analysis. He received his PhD from Yale University with a dissertation in computational statistical mechanics, and he has worked and taught at Yale University in the mathematics department, at Yahoo Research, and at Stanford University in the mathematics department. Among other things, he is on the national advisory committee of the Statistical and Applied Mathematical Sciences Institute (SAMSI), he was on the National Research Council's Committee on the Analysis of Massive Data, he co-organized the Simons Institute's Fall 2013 program on the Theoretical Foundations of Big Data Analysis, and he runs the biennial MMDS Workshops on Algorithms for Modern Massive Data Sets. He is currently running the NSF/TRIPODS-funded FODA (Foundations of Data Analysis) Institute at UC Berkeley.

Lecture Abstract

Recent technological advances have permitted the generation of enormous quantities of data in a wide range of application domains, from the social sciences and social media to electronic and traditional commerce to the physical and biomedical sciences. This has in turn generated interest in foundational issues. Examples of such issues are to understand what is common and what is distinct between data in each of these areas and methods applied to data from each of these areas, to address theoretical questions underlying machine learning and data analysis tools, and to ask what it even means to provide a foundation for an area as diverse as what is currently called data science. Dr. Mahoney will address some of these questions, including how biomedicine may fit within this area. He will provide a "test case" example of how work on foundational topics has been applied to biomedical problems: the development of algorithmically and statistically principled, interpretable low-rank matrix decompositions and how they can be implemented and applied to terabytes of data to solve very practical genetics and medical imaging problems. He will then conclude by describing some challenges and opportunities.


Attend: Optional Registration