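Title: Optimal Distributed Subsampling for Big Data Analysis
Time: Friday, June 6, 2025, 13:30-14:30
Venue: Conference Room 523, Weizhen Building (惟真樓), Renmin Street Campus
Speaker: Ai Mingyao (艾明要)
Host: School of Mathematics and Statistics
Abstract: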
Subsampling methods are effective techniques for reducing the computational burden of big data analysis while maintaining the efficiency of statistical inference. In this talk, we will review subsampling techniques for a range of models, from linear models to generalized linear models and estimating equations. If the data volume is so large that the nonuniform subsampling probabilities cannot be calculated all at once, subsampling with replacement is infeasible. This problem is solved by a new method of subsampling without replacement, called Poisson subsampling. To handle the situation in which the full data are stored in different blocks or at multiple locations, a distributed subsampling framework is developed, in which statistics are computed simultaneously on smaller partitions of the full data. Finally, the proposed strategies are illustrated and evaluated through numerical experiments on both simulated and real data sets.
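As a rough illustration of the two ideas named in the abstract, the following Python sketch applies Poisson subsampling block by block and then combines the per-block summaries. It is not the speaker's method: the target (a simple mean), the importance score `abs(x) + 1`, and all function names are assumptions made for this example. Each record is kept independently with its own probability, so the blocks can be processed separately and only a few summary numbers per block need to be combined.

```python
import random


def block_score_total(block):
    """Pass 1 on one block: sum of the (hypothetical) importance scores."""
    return sum(abs(x) + 1.0 for x in block)


def block_poisson_summary(block, score_total, expected_size, rng):
    """Pass 2 on one block: Poisson-subsample it and return weighted sums.

    Each record is kept independently with probability
    p = min(1, expected_size * score / score_total), so no record
    depends on the draw made for any other record.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    kept = 0
    for x in block:
        p = min(1.0, expected_size * (abs(x) + 1.0) / score_total)
        if rng.random() < p:
            weighted_sum += x / p    # inverse-probability weighting
            weight_total += 1.0 / p
            kept += 1
    return weighted_sum, weight_total, kept


def distributed_poisson_mean(blocks, expected_size, seed=0):
    """Combine per-block summaries into one weighted estimate of the mean.

    Returns the estimate and the realized subsample size, which is random
    under Poisson subsampling. Assumes at least one record is kept.
    """
    rng = random.Random(seed)
    # Only one number per block is needed to normalize the probabilities.
    score_total = sum(block_score_total(b) for b in blocks)
    summaries = [
        block_poisson_summary(b, score_total, expected_size, rng)
        for b in blocks
    ]
    weighted_sum = sum(s[0] for s in summaries)
    weight_total = sum(s[1] for s in summaries)
    kept = sum(s[2] for s in summaries)
    return weighted_sum / weight_total, kept
```

In this sketch the blocks are processed in a loop on one machine. In a truly distributed setting each block would be summarized where it is stored, and only the three returned numbers would be sent to a central node.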
About the speaker:

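Ai Mingyao is a Second-Grade Professor (二級教授) at the School of Mathematical Sciences, Peking University, and a Boya Distinguished Professor for Textbook Development at Peking University. Professor Ai is a member of the National Steering Committee for Graduate Education in the Professional Degree of Applied Statistics and head of its training group, vice president of the Chinese Association for Applied Statistics (中國現場統計研究會), secretary-general of the 11th Council of the Chinese Society of Probability and Statistics, and an executive council member of the Chinese Statistical Society (中國統計學會). Professor Ai serves on the editorial boards of four major international SCI journals (Stat Sinica, JSPI, SPL, and Stat), of the Chinese core journals Journal of Systems Science and Mathematical Sciences (《系統科學與數學》), Journal of Applied Statistics and Management (《數理統計與管理》), and Advances in Mathematics (China) (《數學進展》), and of the Science Press book series Statistics and Data Science (《統計與數據科學叢書》). Professor Ai's teaching and research focus on big data sampling theory and algorithms, design and analysis of experiments, computer simulation and modeling, and applied statistics, with more than eighty papers published in leading Chinese and international journals, including AOS, JASA, Biometrika, and Science China (《中國科學》). As principal investigator, Professor Ai has led one Key Program project of the National Natural Science Foundation of China (2.52 million yuan), one international cooperative research project (2 million yuan), one Key Program sub-project, and five General Program projects, and has taken part in completing two projects under the Ministry of Science and Technology's National Key R&D Program. Professor Ai has twice been named an Outstanding Doctoral Dissertation Advisor of Peking University, and has received the First Prize of the Peking University Outstanding Teaching Achievement Award and the Second Prize of the Beijing Higher Education Outstanding Teaching Achievement Award.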