[2] viXra:2504.0171 [pdf] submitted on 2025-04-27 09:46:44
Authors: M. Marinescu, L. Martino, G. Villacres, S. G. Arcidiacono, O. Barquero
Comments: 23 Pages.
Feature selection remains a highly relevant and actively researched topic across signal processing, statistics, and machine learning. It has gained new relevance recently, especially because of renewed interest in the so-called Shapley values. However, beyond the Shapley values, many possibilities exist to measure (explicitly or implicitly) the importance of a variable for a specific task. Given a measure of importance, we can obtain a ranking of the input features (involved, e.g., in a regression or classification problem), as provided by an algorithm and/or expert system. Consequently, it is also necessary to evaluate the obtained rankings, for instance to identify the most effective ranking method or to aggregate all results into an average ranking, akin to an ensemble average of expert opinions. In this work, we provide an exhaustive review of several scoring functions and techniques designed for evaluating ranking methods with or without an available ground truth. Moreover, the work contains some novel elements, such as the use of other well-known indices, for instance the Gini coefficient and effective sample size (ESS) measures. It is important to remark that the paper incorporates insights from a variety of sources across diverse scientific disciplines, including computational statistics, quantitative economics, and machine learning. Finally, we test the described schemes in a controlled experiment on feature selection, in order to compare different ranking methods and to assess their performance and robustness.
Category: Statistics
[1] viXra:2504.0119 [pdf] submitted on 2025-04-17 20:19:43
Authors: Bamba Gueye, Laure Gouba
Comments: 22 Pages. 20 Figures
In this paper, we discuss the role of statistics in simple linear regression, multiple linear regression, and logistic regression. The algorithms for these models are implemented in Python.
Category: Statistics