Topic: Surprise sampling: an optimal subsampling design
Abstract: Sampling for surprise is a working principle of efficient sampling that serves, among other purposes, to reduce computational workload. A sample is deemed surprising if its pilot prediction error or its absolute score is large; such a sample is drawn with higher sampling probability because it generally carries more information than a non-surprising one. Sampling schemes of this kind are particularly useful for imbalanced data. Following this principle, we propose a sampling design called surprise sampling, which adapts to the specific form of a variety of objectives. The estimation procedure remains valid even if the model is misspecified and/or the pilot estimator is inconsistent. Surprise sampling includes as a special case the local case-control sampling of Fithian and Hastie (2014), which achieves high efficiency through a clever adjustment that applies only to the logistic model. Under the same model specification, the proposed estimator performs no worse than that of Fithian and Hastie (2014). We present theoretical justification for the claimed advantages and for the optimality of both the estimation and the sampling design. Numerical studies are reported, and their results support the theory.
Speaker: Yu Wen (郁文), Associate Professor
Time: 2018-11-14 14:00
Venue: Room 302, 竞慧东楼 (Jinghui East Building)
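
For readers unfamiliar with the special case named in the abstract, the following is a minimal sketch of local case-control sampling as described by Fithian and Hastie (2014): each observation is kept with probability equal to the absolute pilot prediction error, a logistic regression is fit on the kept observations, and the pilot estimate is added back as the adjustment. It is not the surprise sampling design of the talk, whose details the abstract does not give. The function names (`fit_logistic`, `local_case_control`, `simulate`), the Newton solver, and the simulated data are all illustrative assumptions.

```python
import numpy as np


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, y, n_iter=50, ridge=1e-8):
    """Plain Newton-Raphson logistic regression (illustrative helper)."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ theta)
        grad = X.T @ (y - p)
        # Small ridge term keeps the Hessian invertible; an assumption
        # made for numerical safety, not part of the method itself.
        hess = (X * (p * (1.0 - p))[:, None]).T @ X + ridge * np.eye(X.shape[1])
        step = np.linalg.solve(hess, grad)
        theta = theta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    return theta


def local_case_control(X, y, theta_pilot, rng):
    """Subsample by 'surprise' and return the adjusted estimate.

    Acceptance probability is |y - p_pilot(x)|: observations that the
    pilot model predicts badly are kept more often.
    """
    p_pilot = sigmoid(X @ theta_pilot)
    accept_prob = np.abs(y - p_pilot)
    keep = rng.random(len(y)) < accept_prob
    theta_sub = fit_logistic(X[keep], y[keep])
    # Adjustment specific to the logistic model: add the pilot back.
    return theta_sub + theta_pilot, keep


def simulate(n, theta_true, rng):
    """Hypothetical imbalanced data: intercept column plus one covariate."""
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    y = (rng.random(n) < sigmoid(X @ theta_true)).astype(float)
    return X, y
```

A typical use would fit the pilot on a small uniform subsample, then call `local_case_control` on the full data. With a strongly negative intercept (rare positives), the kept subsample is usually a small fraction of the data, which is where the computational saving comes from.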