PCA-Guided Quantile Sampling: Preserving Data Structure in Large-Scale Subsampling
arXiv:2506.18249v2 Abstract: We introduce Principal Component Analysis-guided Quantile Sampling (PCA-QS), a novel sampling framework designed to preserve both the statistical and geometric structure of large-scale datasets. Unlike conventional PCA, which reduces dimensionality at the cost of interpretability, PCA-QS retains the original feature space while using leading principal components solely to guide a quantile-based stratification scheme. This principled design ensures that sampling remains representative without distorting the underlying data semantics. We […]
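
To make the idea in the abstract concrete, here is a minimal sketch of one way principal-component scores could guide a quantile-based stratification while the sample itself stays in the original feature space. It is an illustration under assumptions, not the authors' algorithm: the function name `pca_quantile_sample`, its parameters (`n_components`, `n_bins`, `frac`, `seed`), the equal-frequency binning, and the proportional allocation per stratum are all hypothetical choices, since the abstract does not specify these details.

```python
import numpy as np


def pca_quantile_sample(X, n_components=1, n_bins=10, frac=0.1, seed=0):
    """Return row indices of a stratified subsample of X (hypothetical helper).

    Strata are quantile bins of the leading principal-component scores;
    the returned indices select rows of X with all original features intact.
    """
    rng = np.random.default_rng(seed)

    # Center the data and obtain principal directions via SVD
    # (rows of Vt are the principal axes, ordered by explained variance).
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T  # shape (n_samples, n_components)

    # Assign each row a stratum id by combining its quantile bin on each
    # retained component (assumed scheme: equal-frequency bins).
    strata = np.zeros(X.shape[0], dtype=np.int64)
    for j in range(n_components):
        inner_q = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
        edges = np.quantile(scores[:, j], inner_q)
        bins = np.searchsorted(edges, scores[:, j], side="right")  # 0..n_bins-1
        strata = strata * n_bins + bins

    # Draw roughly `frac` of each non-empty stratum, at least one row each
    # (assumed allocation rule: proportional to stratum size).
    chosen = []
    for s in np.unique(strata):
        members = np.flatnonzero(strata == s)
        k = max(1, int(round(frac * members.size)))
        chosen.append(rng.choice(members, size=k, replace=False))
    return np.sort(np.concatenate(chosen))


# Usage: the PCs only pick the rows; the subsample keeps every original column.
X_demo = np.random.default_rng(42).normal(size=(1000, 5))
idx_demo = pca_quantile_sample(X_demo, n_components=1, n_bins=10, frac=0.1)
X_sub = X_demo[idx_demo]
```

Because the principal components are used only to define strata, `X_sub` contains untransformed rows of `X_demo`, which is the property the abstract emphasizes when it contrasts PCA-QS with conventional PCA.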