[4] viXra:2103.0173 [pdf] submitted on 2021-03-27 02:04:39
Authors: Toshihiro Umehara
Comments: 8 Pages.
Data processing and data cleaning are essential steps before applying statistical or machine learning procedures. R provides a flexible way to process data using vectors, and R packages provide further approaches, such as manipulating data with SQL or with chained functions. I present yet another way to process data, row by row, using a data-manipulation-oriented script language, DataSailr. This article introduces the datasailr package and shows the potential benefits of using a domain-specific language for data processing.
Category: Statistics
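The row-by-row style this abstract describes can be illustrated generically in Python. This is only a sketch of the processing style, not datasailr's actual syntax; the column names and grouping rule are invented for the example.

```python
# Row-by-row processing: each record is visited and updated in turn,
# in contrast to R's vectorized, whole-column idiom.
rows = [
    {"name": "Ann", "age": 34},
    {"name": "Bob", "age": 25},
]
for row in rows:
    row["group"] = "senior" if row["age"] >= 30 else "junior"
```

A DSL for this style lets the per-row logic be written once and applied uniformly, which is the convenience the abstract argues for.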
[3] viXra:2103.0079 [pdf] submitted on 2021-03-12 01:15:46
Authors: Stephen P. Smith
Comments: 8 Pages.
The scale-invariant prior is revisited, both for a single variance parameter and for a variance-covariance matrix. These results are generalized to develop different scale-invariant priors in which a probability measure is assigned through a sum of variance components representing partitions of the total variance, or through a sum of variance-covariance matrices representing partitions of a total variance-covariance matrix.
Category: Statistics
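As a standard point of reference (not the paper's generalization): the usual scale-invariant prior for a single variance $\sigma^{2}$, and its common matrix analogue for a $p \times p$ variance-covariance matrix $\Sigma$, are

```latex
p(\sigma^{2}) \propto \frac{1}{\sigma^{2}}, \qquad
p(\Sigma) \propto |\Sigma|^{-(p+1)/2}.
```

The first is invariant under rescaling $\sigma^{2} \to c\,\sigma^{2}$; the abstract's contribution is to extend such priors to sums of variance components that partition a total variance.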
[2] viXra:2103.0018 [pdf] submitted on 2021-03-03 14:39:39
Authors: Joh. J. Sauren
Comments: 3 Pages.
In this communication, a short and straightforward algorithm, written in Octave (version 6.1.0 (2020-11-26)) / Matlab (version '9.9.0.1538559 (R2020b) Update 3'), is proposed for the brute-force computation of the principal constants $d_{2}$ and $d_{3}$ used to calculate control limits for various types of variables control charts encountered in statistical process control (SPC).
Category: Statistics
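The brute-force idea can be sketched in Python (this is not the authors' Octave code). It relies only on the standard definitions: for subgroups of $n$ iid standard normal observations with range $R$, $d_{2} = E[R]$ and $d_{3} = \mathrm{SD}(R)$.

```python
import numpy as np

def d2_d3(n, reps=200_000, seed=0):
    """Monte Carlo (brute-force) estimates of the SPC constants:
    d2 = expected range of n iid N(0,1) draws, d3 = SD of that range."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((reps, n))
    r = x.max(axis=1) - x.min(axis=1)  # range of each simulated subgroup
    return r.mean(), r.std()

d2, d3 = d2_d3(5)  # published SPC tables give d2 ~ 2.326, d3 ~ 0.864 for n = 5
```

Increasing `reps` tightens the estimates at the usual $O(1/\sqrt{\text{reps}})$ Monte Carlo rate.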
[1] viXra:2103.0008 [pdf] submitted on 2021-03-02 17:19:51
Authors: L. Martino, V. Elvira, J. Lopez-Santiago, G. Camps-Valls
Comments: 41 Pages.
In many inference problems, the evaluation of complex and costly models is required. In this context, Bayesian methods have become very popular in several fields in recent years, for parameter inversion, model selection, and uncertainty quantification. Bayesian inference requires the approximation of complicated integrals involving (often costly) posterior distributions. Generally, this approximation is obtained by means of Monte Carlo (MC) methods. To reduce the computational cost of these techniques, surrogate models (also called emulators) are often employed. Another alternative is the so-called Approximate Bayesian Computation (ABC) scheme. ABC does not require evaluation of the costly model, only the ability to simulate artificial data according to that model; moreover, ABC requires the choice of a suitable distance between real and artificial data. In this work, we introduce a novel approach in which the expensive model is evaluated only at a set of well-chosen samples. The selection of these nodes is based on the so-called compressed Monte Carlo (CMC) scheme. We provide theoretical results supporting the novel algorithms and give empirical evidence of the performance of the proposed method in several numerical experiments, two of which are real-world applications in astronomy and satellite remote sensing.
Category: Statistics
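The ABC idea the abstract summarizes (simulate artificial data from the model, keep parameter draws whose simulated data are close to the observations) can be sketched as a rejection sampler. This is a generic illustration, not the authors' CMC method; the toy Gaussian model, uniform prior, and mean-based distance are invented for the example.

```python
import numpy as np

def abc_rejection(observed, simulate, prior_draw, distance, eps,
                  n_draws=10_000, seed=0):
    """Minimal ABC rejection sampler: accept a parameter draw when the
    data it simulates lie within eps of the observations."""
    rng = np.random.default_rng(seed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_draw(rng)
        fake = simulate(theta, rng)
        if distance(observed, fake) < eps:
            accepted.append(theta)
    return np.array(accepted)

# Toy example: infer the mean of a Gaussian with known unit variance.
rng = np.random.default_rng(1)
obs = rng.normal(2.0, 1.0, size=100)
post = abc_rejection(
    observed=obs,
    simulate=lambda th, r: r.normal(th, 1.0, size=100),
    prior_draw=lambda r: r.uniform(-5.0, 5.0),
    distance=lambda a, b: abs(a.mean() - b.mean()),  # summary-statistic distance
    eps=0.1,
)
```

The accepted draws approximate the posterior; shrinking `eps` improves the approximation at the cost of a lower acceptance rate, which is exactly the trade-off that motivates smarter node-selection schemes such as the one this paper proposes.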