On Distributed Larger-Than-Memory Subset Selection With Pairwise Submodular Functions
arXiv:2402.16442v2 Announce Type: replace
Abstract: Modern datasets span billions of samples, making training on all available data infeasible. Selecting a high-quality subset helps reduce training costs and improve model quality. Submodularity, a discrete analogue of convexity, is commonly …