pvclass: An R Package for p Values for Classification

Niki Zumbrunnen; Lutz Dümbgen

doi:10.18637/jss.v078.i04

Journal of Statistical Software (Jun 2017)

Journal volume & issue: Vol. 78, no. 1
pp. 1 – 19

Abstract

Let (X, Y) be a random variable consisting of an observed feature vector X and an unobserved class label Y ∈ {1, 2, ..., L} with unknown joint distribution. In addition, let D be a training data set consisting of n completely observed independent copies of (X, Y). Instead of providing point predictors (classifiers) for Y, we compute for each b ∈ {1, 2, ..., L} a p value π_b(X, D) for the null hypothesis that Y = b, treating Y temporarily as a fixed parameter, i.e., we construct a prediction region for Y with a certain confidence. The advantages of this approach over more traditional ones are reviewed briefly. In principle, any reasonable classifier can be modified to yield nonparametric p values. We describe the R package pvclass which computes nonparametric p values for the potential class memberships of new observations as well as cross-validated p values for the training data. Additionally, it provides graphical displays and quantitative analyses of the p values.
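The idea of a nonparametric p value π_b(X, D) can be illustrated with a minimal Python sketch. It is not the pvclass implementation: the function names class_p_values and prediction_region, the toy data, and the choice of test statistic (Euclidean distance to the class center) are assumptions made for brevity. For each candidate label b, the new observation is tentatively added to class b, and the p value is the fraction of that enlarged group whose statistic is at least as extreme as that of the new observation.

```python
import math

def class_p_values(x_new, X, y, labels):
    """Return {b: p value for the null hypothesis 'the label of x_new is b'}."""
    p_values = {}
    for b in labels:
        # Tentatively assign x_new to class b: under the null hypothesis,
        # x_new and the class-b training points are exchangeable.
        group = [x for x, lab in zip(X, y) if lab == b] + [x_new]
        dim = len(x_new)
        center = [sum(p[j] for p in group) / len(group) for j in range(dim)]
        # Assumed test statistic: distance to the class-b center
        # (large values speak against membership in class b).
        scores = [math.dist(p, center) for p in group]
        t_new = scores[-1]
        # Fraction of the group with a statistic at least as extreme.
        p_values[b] = sum(s >= t_new for s in scores) / len(group)
    return p_values

def prediction_region(p_values, alpha):
    """Labels that are not rejected at level alpha."""
    return {b for b, p in p_values.items() if p > alpha}

# Toy training data: two well-separated classes in one dimension.
X = [[0.0], [0.2], [0.4], [0.1], [0.3], [5.0], [5.2], [5.4], [5.1], [5.3]]
y = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2]

pv = class_p_values([0.2], X, y, labels=[1, 2])
region = prediction_region(pv, alpha=0.2)
print(pv, region)
```

With n_b training observations in class b, the smallest attainable p value in this sketch is 1/(n_b + 1), so a label can only be excluded from the prediction region if alpha is at least that large.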

Published in Journal of Statistical Software

ISSN: 1548-7660 (Online)
Publisher: Foundation for Open Access Statistics
Country of publisher: United States
LCC subjects: Social Sciences: Statistics
Website: http://www.jstatsoft.org/
