Previous months:
2010 - 1003(10) - 1004(7) - 1005(4) - 1006(1) - 1007(2) - 1008(4) - 1010(1) - 1011(1)
2011 - 1105(2) - 1107(1) - 1111(1) - 1112(1)
2012 - 1203(1) - 1204(2) - 1205(1) - 1208(1) - 1210(1) - 1211(6) - 1212(1)
2013 - 1301(1) - 1304(3) - 1306(1) - 1307(1) - 1310(2)
2014 - 1402(1) - 1403(3) - 1404(2) - 1405(2) - 1407(1) - 1409(4) - 1410(4) - 1411(13) - 1412(4)
2015 - 1503(1) - 1505(2) - 1506(2) - 1507(3) - 1508(3) - 1509(1) - 1511(2) - 1512(6)
2016 - 1601(6) - 1602(3) - 1603(4) - 1604(2) - 1605(1) - 1607(5) - 1608(1) - 1609(4) - 1610(1) - 1611(1) - 1612(2)
2017 - 1701(4) - 1702(3) - 1703(5) - 1704(11) - 1705(12) - 1706(8) - 1707(2) - 1708(2) - 1709(1) - 1710(3) - 1711(5) - 1712(6)
2018 - 1801(5) - 1802(3) - 1803(4) - 1804(4) - 1805(3) - 1806(5) - 1807(2) - 1808(1) - 1809(3) - 1810(5) - 1811(4) - 1812(2)
2019 - 1901(3) - 1903(1) - 1904(2) - 1905(4) - 1906(1) - 1907(2) - 1908(1) - 1909(1) - 1910(2) - 1911(3) - 1912(1)
2020 - 2001(3) - 2002(1) - 2003(1) - 2004(3) - 2005(2) - 2006(2) - 2007(1) - 2008(3) - 2009(2) - 2010(2) - 2011(2) - 2012(12)
2021 - 2101(3) - 2102(3) - 2103(4) - 2104(1) - 2106(2) - 2107(2) - 2109(1) - 2110(2) - 2111(3) - 2112(3)
2022 - 2201(1) - 2202(2) - 2204(2) - 2207(1) - 2209(2) - 2212(1)
2023 - 2301(1) - 2302(1) - 2303(1) - 2304(1) - 2305(1) - 2306(1) - 2307(1) - 2308(1) - 2309(1) - 2310(2) - 2311(1) - 2312(2)
2024 - 2402(2) - 2404(2) - 2406(3) - 2407(2) - 2408(1) - 2411(2)
Any replacements are listed farther down
[346] viXra:2411.0080 [pdf] submitted on 2024-11-11 20:56:07
Authors: Andrea Berdondini
Comments: 6 Pages.
The fundamental problem of causal inference concerns the impossibility of attributing a causal link to a correlation; in other words, correlation does not prove causality. This problem can be understood from two points of view: experimental and statistical. The experimental approach tells us that this problem arises from the impossibility of simultaneously observing an event both in the presence and in the absence of a hypothesis. The statistical approach, on the other hand, suggests that this problem stems from the error of treating tested hypotheses as independent of each other. Modern statistics tends to place greater emphasis on the statistical approach because, compared to the experimental point of view, it also shows us a way to solve the problem. Indeed, when many hypotheses are tested, a composite hypothesis is constructed that tends to cover the entire solution space. Consequently, the composite hypothesis can be fitted to any data set, generating a random correlation. Furthermore, the probability that the correlation is random is equal to the probability of obtaining the same result by generating an equivalent number of random hypotheses.
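A minimal Monte Carlo sketch (my own illustration, not the paper's code) of the core point: the best correlation found among purely random "hypotheses" grows with the number of hypotheses tested against the same data.

    # Spurious correlation from multiple testing: every hypothesis is noise,
    # yet the best observed |correlation| increases with the number tested.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 50                                  # number of observations
    data = rng.standard_normal(n)

    for m in (1, 10, 100, 1000, 10000):
        hyps = rng.standard_normal((m, n))  # m random hypotheses
        best = max(abs(np.corrcoef(h, data)[0, 1]) for h in hyps)
        print(f"{m:6d} hypotheses -> best |correlation| = {best:.3f}")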
Category: Statistics
[345] viXra:2411.0008 [pdf] submitted on 2024-11-02 07:03:48
Authors: Debjoy Thakur
Comments: 28 Pages.
The application of deep neural networks to geospatial data has become a trending research problem in the present day. A significant amount of statistical research has already been introduced, such as generalized least squares optimization incorporating the spatial variance-covariance matrix, considering basis functions in the input nodes of the neural networks, and so on. However, for lattice data, there is no available literature on the asymptotic analysis of neural networks in regression for spatial data. This article proposes a consistent localized two-layer deep neural network-based regression for spatial data. We prove the consistency of this deep neural network for bounded and unbounded spatial domains under a fixed sampling design of mixed-increasing spatial regions. We prove that its asymptotic convergence rate is faster than that of the neural network of Zhan (2024), and that it is an improved generalization of the neural network structure of Shen (2023). We empirically observe the rate of convergence of discrepancy measures between the empirical probability distributions of observed and predicted data, which becomes faster for a less smooth spatial surface. We apply our asymptotic analysis of deep neural networks to the estimation of the monthly average temperature of major cities in the USA from satellite images. This application is an effective showcase of non-linear spatial regression. We demonstrate our methodology with simulated lattice data in various scenarios.
Category: Statistics
[344] viXra:2408.0103 [pdf] submitted on 2024-08-26 02:11:07
Authors: Parker Emmerson
Comments: 10 Pages. (Abstract added by viXra Admin as required - Please conform!)
We analyze the frequency of digit occurrences in that digit's factorial expression in different bases, and provide programs for visualizing the results.
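A minimal sketch of the counting task, under the assumed reading that we count how often the digit d appears when d! is written in base b:

    # Count occurrences of digit d in the base-b expansion of d!.
    from math import factorial

    def digits_in_base(n: int, base: int) -> list[int]:
        """Return the base-`base` digits of n, most significant first."""
        if n == 0:
            return [0]
        out = []
        while n:
            n, r = divmod(n, base)
            out.append(r)
        return out[::-1]

    for base in (8, 10, 16):
        for d in range(2, base):
            expansion = digits_in_base(factorial(d), base)
            print(f"base {base:2d}: digit {d} occurs "
                  f"{expansion.count(d)} time(s) in {d}! = {expansion}")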
Category: Statistics
[343] viXra:2407.0015 [pdf] submitted on 2024-07-02 06:50:01
Authors: Taha Sochi
Comments: 185 Pages.
This book is a collection of notes and solved problems about probability theory. The book also contains proposed exercises attached to the solved problems, as well as computer codes (in the C++ language) added to some of these problems for the purpose of calculation, testing and simulation. Illustrations (such as figures and tables) are added when necessary or appropriate to enhance clarity and improve understanding. In most cases intuitive arguments and methods are used to make the notes and solutions natural and instinctive. As in my previous books, maximum clarity was one of the main objectives and criteria in determining the style of writing, presenting and structuring the book, as well as in selecting its contents.
Category: Statistics
[342] viXra:2407.0002 [pdf] submitted on 2024-07-01 14:25:14
Authors: Kazi Sakib Hasan
Comments: 24 Pages.
This paper presents a new, simpler robust method for hypothesis testing of significant mean differences between two independent or paired samples, using the concepts of location, variability, confidence intervals and the Gaussian distribution. For two-sample hypothesis testing, the t-test is widely used; besides this, the Wilcoxon signed-rank test and, often, the permutation test are also conducted. Each of these methods has its own rigor and drawbacks, which often make them hard to apply for lay people and non-statistics students. To fix these issues, a new method of hypothesis testing is proposed in this paper that utilises the properties of normally distributed data and resampling, and is relatively easy to calculate using only pen and paper. A time complexity analysis of each program is also conducted to give a concise overview of which hypothesis testing algorithm is more efficient and faster to execute, since statisticians rely heavily on software for their analytical tasks.
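As a hedged illustration of the resampling ingredient (a generic bootstrap confidence interval for a mean difference, not the author's exact procedure):

    # Bootstrap CI for the difference of two sample means (hypothetical data).
    import numpy as np

    rng = np.random.default_rng(1)
    a = rng.normal(0.0, 1.0, 40)      # sample 1 (assumed data)
    b = rng.normal(0.5, 1.0, 40)      # sample 2 (assumed data)

    boot = np.array([
        rng.choice(a, a.size).mean() - rng.choice(b, b.size).mean()
        for _ in range(10_000)
    ])
    lo, hi = np.percentile(boot, [2.5, 97.5])
    print(f"95% bootstrap CI for mean difference: ({lo:.3f}, {hi:.3f})")
    print("reject H0: equal means" if lo > 0 or hi < 0 else "fail to reject H0")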
Category: Statistics
[341] viXra:2406.0160 [pdf] submitted on 2024-06-27 17:11:50
Authors: Vittorio Lippi
Comments: 4 Pages. presented at International Conference on Mathematical Analysis and Applications in Science and Engineering 20 - 22 June 2024, Porto, Portugal
The frequency response function (FRF) is an established way to describe the outcome of experiments in the posture control literature. The FRF is an empirical transfer function between an input stimulus and the induced body segment sway profile, represented as a vector of complex values associated with a vector of frequencies. Having obtained an FRF from a trial with a subject, it can be useful to quantify the likelihood that it belongs to a certain population, e.g., to diagnose a condition or to evaluate the human-likeness of a humanoid robot or a wearable device. In this work, a recently proposed method for FRF statistics based on confidence bands computed with Bootstrap is summarized, and, on its basis, possible ways to quantify the likelihood of FRFs belonging to a given set are proposed.
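A sketch of bootstrap percentile bands for a set of complex-valued FRFs (hypothetical data; subjects are resampled with replacement, and bands are computed on real and imaginary parts separately):

    # Bootstrap confidence bands for a mean FRF: rows = trials/subjects,
    # columns = frequencies.
    import numpy as np

    rng = np.random.default_rng(2)
    n_subj, n_freq = 20, 12
    frfs = (rng.normal(1.0, 0.2, (n_subj, n_freq))
            + 1j * rng.normal(-0.5, 0.2, (n_subj, n_freq)))   # assumed FRF set

    B = 2000
    means = np.empty((B, n_freq), dtype=complex)
    for b in range(B):
        idx = rng.integers(0, n_subj, n_subj)     # resample subjects
        means[b] = frfs[idx].mean(axis=0)

    band_re = np.percentile(means.real, [2.5, 97.5], axis=0)
    band_im = np.percentile(means.imag, [2.5, 97.5], axis=0)
    print("real-part band, first 3 freqs:\n", band_re[:, :3])
    print("imag-part band, first 3 freqs:\n", band_im[:, :3])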
Category: Statistics
[340] viXra:2406.0159 [pdf] submitted on 2024-06-27 20:15:06
Authors: Vittorio Lippi
Comments: 2 Pages. Presented at 9th International Posture Symposium, Smolenice 2023 (Note by viXra Admin: An abstract on the article is required)
The frequency response function (FRF) is an established way to describe the outcome of experiments in the posture control literature. Specifically, the FRF is an empirical transfer function between an input stimulus and the induced body movement. By definition, the FRF is a complex function of frequency. When statistical analysis is performed to assess differences between groups of FRFs (e.g., obtained under different conditions, or from a group of patients and a control group), the FRF's structure should be considered. Usually, the statistics are performed by defining a scalar variable to be studied, such as the norm of the difference between FRFs, or by considering the components independently (which can be applied to real and imaginary components separately); in some cases both approaches are integrated, e.g., a frequency-by-frequency comparison is used as a post hoc test when the null hypothesis is rejected on the scalar value. The two components of the complex values can be tested with multivariate methods such as Hotelling's T2, as has been done on the averages of the FRF over all the frequencies, with a further post hoc test performed by applying bootstrap on magnitude and phase separately. The problem with defining a scalar variable as the norm of the differences, or as the difference of the averages as in the previous examples, is that it introduces an arbitrary metric that, although reasonable, has no substantial connection with the experiment, unless the scalar value is assumed a priori as the object of the study, as when a human-likeness score for humanoid robots is defined on the basis of FRF differences. On the other hand, testing frequencies (and components) separately does not account for the fact that the FRF's values are not independent, and applying corrections for multiple comparisons (e.g., Bonferroni) can result in an overly conservative approach that destroys the power of the experiment. In order to properly account for the nature of the FRF, a method based on random field theory is presented, together with a case study using data from posture control experiments. To take into account the two components (imaginary and real) as two variables, and the fact that the same subject repeated the test in the two conditions, a 1-D implementation of the Hotelling T2 test is used, applied in the frequency domain instead of the time domain.
Category: Statistics
[339] viXra:2406.0055 [pdf] submitted on 2024-06-12 21:00:31
Authors: L. Martino, F. Llorente
Comments: 22 Pages.
Improper priors are not allowed for the computation of the Bayesian evidence Z = p(y) (a.k.a., marginal likelihood), since in this case Z is not completely specified, due to an arbitrary constant involved in the computation. However, in this work, we remark that they can be employed in a specific type of model selection problem: when we have several (possibly infinitely many) models belonging to the same parametric family (i.e., for tuning parameters of a parametric model). However, the quantities involved in this type of selection cannot be considered Bayesian evidences: we suggest the name "fake evidences" (or "areas under the likelihood" in the case of uniform improper priors). We also show that, in this model selection scenario, using a diffuse prior and increasing its scale parameter asymptotically to infinity, we cannot recover the value of the area under the likelihood obtained with a uniform improper prior. We first discuss this from a general point of view. Then we provide, as an illustrative example, all the details for Bayesian regression models with nonlinear bases, considering two cases: the use of a uniform improper prior and the use of a Gaussian prior, respectively. A numerical experiment is also provided, confirming and checking all the previous statements.
Category: Statistics
[338] viXra:2404.0105 [pdf] submitted on 2024-04-21 12:12:42
Authors: L. Martino, E. Morgado, R. San Millan Castillo
Comments: 20 Pages.
An index of the effective number of variables (ENV) is introduced for model selection in nested models. This is the case, for instance, when we have to decide the order of a polynomial function or the number of bases in a nonlinear regression, choose the number of clusters in a clustering problem, or the number of features in a variable selection application (to name a few examples). It is inspired by the concept of maximum area under the curve (AUC) and by the Gini index. The interpretation of the ENV index is identical to that of the effective sample size (ESS) indices with respect to a set of samples. The ENV index improves on some drawbacks of the elbow detectors described in the literature, and introduces different measures of uncertainty and reliability of the proposed solution. These novel reliability measures can also be employed jointly with different information criteria such as the well-known AIC and BIC. Comparisons with classical and recent schemes are provided in different experiments involving real datasets. Related Matlab code is given.
Category: Statistics
[337] viXra:2404.0064 [pdf] submitted on 2024-04-13 20:44:31
Authors: Mathis Antonetti
Comments: 3 Pages.
In this note, we establish a uniform lower bound (w.r.t. the number of players) for the probability of k players tied for first place in the geometric case. To derive this bound, we introduce the concept of supertelescoping series as a generalization of telescoping series. We also provide insight into the relationship between supertelescoping series and supermartingales.
Category: Statistics
[336] viXra:2402.0093 [pdf] submitted on 2024-02-18 11:04:32
Authors: L. Beleña, E. Curbelo, L. Martino, V. Laparra
Comments: 14 Pages.
Volatility estimation and quantile regression are relevant, active research areas in statistics, machine learning and econometrics. In this work, we propose two procedures to estimate local variances in generic regression problems by using kernel smoothers. The proposed schemes can be applied in multidimensional scenarios (not just time series analysis) and easily in a multi-output framework as well. Moreover, they allow the possibility of providing uncertainty estimation using a generic kernel smoother technique. Several numerical experiments show the benefits of the proposed methods, even in comparison with benchmark techniques. One of these experiments involves a real dataset analysis.
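One standard route matching the abstract's description, smooth the response and then smooth the squared residuals, is sketched below with a Nadaraya-Watson smoother on hypothetical heteroscedastic data; the paper's exact schemes may differ.

    # Local variance via kernel smoothing: (1) estimate the conditional mean,
    # (2) smooth the squared residuals to estimate the local noise variance.
    import numpy as np

    def nw_smooth(x_query, x, y, h):
        """Nadaraya-Watson estimate of E[y|x] with a Gaussian kernel."""
        w = np.exp(-0.5 * ((x_query[:, None] - x[None, :]) / h) ** 2)
        return (w * y).sum(axis=1) / w.sum(axis=1)

    rng = np.random.default_rng(3)
    x = np.sort(rng.uniform(0, 1, 300))
    sigma = 0.1 + 0.4 * x                     # true heteroscedastic noise level
    y = np.sin(2 * np.pi * x) + sigma * rng.standard_normal(x.size)

    m_hat = nw_smooth(x, x, y, h=0.05)                  # conditional mean
    v_hat = nw_smooth(x, x, (y - m_hat) ** 2, h=0.1)    # local variance
    i = np.argmin(np.abs(x - 0.9))
    print("estimated vs true noise std at x=0.9:",
          np.sqrt(v_hat[i]).round(3), "vs", 0.1 + 0.4 * 0.9)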
Category: Statistics
[335] viXra:2402.0061 [pdf] submitted on 2024-02-12 07:13:17
Authors: Dajun Chen
Comments: 2 Pages.
This paper proposes two methods for fitting a probability density function using only samples from the distribution. The methods are inspired by Generative Adversarial Networks. The demos run in PyTorch and are available at https://github.com/chendajunAlpha/Fit-probability-density-function
Category: Statistics
[334] viXra:2312.0089 [pdf] submitted on 2023-12-17 14:49:03
Authors: Hans Lugtigheid
Comments: 15 Pages.
This article analyses the conjecture that excess mortality has been underestimated during the pandemic. I use the numbers from the CBS (Dutch Central Bureau for Statistics) as an example. As a baseline we take the expected mortality for 2021 and 2022 as estimated in 2019. I correct this expected mortality with the estimated number of people who, because of the pandemic, died in earlier years than expected. For 2021 this correction is 8K. The CBS expects the mortality to be almost equal to the estimate from 2019. With this correction, the excess mortality increases from 16K (CBS) to 24K. I present the following idea to explain the difference. At the beginning of every year the numbers of people in year groups are usually adjusted by applying a historically determined percentage to the population on January first. Covid hits the weakest the hardest. This changes the distribution of the expected remaining life years in the year group, and thus the average expected remaining life years. Hence the percentage has to be adjusted. Then the expected mortality decreases and the excess mortality increases. The excess mortality within a year consists of people who, for example, died in April from covid but who would have died in October without the pandemic. With this number, total excess mortality rises by 6K to 30K. Excess mortality is divided into covid and non-covid deaths. The large increase in non-covid deaths is striking. The analysis supports the conjecture that excess mortality is underestimated. Note: the numbers in this article are for the Netherlands. For your own country, use the appropriate numbers.
Category: Statistics
[333] viXra:2312.0088 [pdf] submitted on 2023-12-17 23:25:17
Authors: Hans Lugtigheid
Comments: 4 Pages.
This article discusses the influence of a disturbance like covid on the calculation of life expectancy in year groups and similar statistics. Life expectancies in year groups are usually adjusted at the beginning of the year, based on the population at the beginning of the year. This is done with a percentage based on previous years; this percentage is a reflection of volume. During the pandemic the weak were hit heavily by covid. A consequence is that the distribution of life expectancy changes within the year groups. This increases the life expectancy and decreases the expected mortality in the year group. The calculation for the year groups then has to be adjusted accordingly. In this article I give an example of such an adjustment. Similar statistics can be adjusted likewise.
Category: Statistics
[332] viXra:2311.0085 [pdf] submitted on 2023-11-19 02:50:22
Authors: Zenin Easa Panthakkalakath, Neeraj, Jimson Mathew
Comments: 12 Pages.
The challenges posed by epidemics and pandemics are immense, especially if the causes are novel. This article introduces a versatile open-source simulation framework designed to model intricate dynamics of infectious diseases across diverse population centres. Taking inspiration from historical precedents such as the Spanish flu and COVID-19, and geographical economic theories such as Central place theory, the simulation integrates agent-based modelling to depict the movement and interactions of individuals within different settlement hierarchies. Additionally, the framework provides a tool for decision-makers to assess and strategize optimal distribution plans for limited resources like vaccines or cures as well as to impose mobility restrictions.
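A toy single-settlement agent-based sketch (hypothetical parameters; the described framework adds settlement hierarchies, mobility and resource allocation on top of this kind of core loop):

    # Minimal agent-based SIR loop: each agent is S (0), I (1) or R (2).
    import numpy as np

    rng = np.random.default_rng(4)
    N, beta, gamma = 1000, 0.3, 0.1
    state = np.zeros(N, dtype=int)
    state[rng.choice(N, 5, replace=False)] = 1      # seed 5 infections

    for day in range(120):
        n_inf = (state == 1).sum()
        # each susceptible is infected with probability beta * I / N per day
        new_inf = (state == 0) & (rng.random(N) < beta * n_inf / N)
        recov = (state == 1) & (rng.random(N) < gamma)
        state[new_inf] = 1
        state[recov] = 2
        if day % 30 == 0:
            print(f"day {day:3d}: S={np.sum(state == 0)} "
                  f"I={n_inf} R={np.sum(state == 2)}")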
Category: Statistics
[331] viXra:2310.0050 [pdf] submitted on 2023-10-10 22:03:45
Authors: Bukac Josef
Comments: 11 Pages. We use interpolation to get the starting values of the parameters. Another paper, about the singularity of the matrix appearing in the Gauss-Newton method, will follow.
We describe models of proportions depending on some independent quantitative variables. An explicit formula for inverse matrices facilitates interpolation as a way to calculate the starting values for iterations in nonlinear regression with logistic functions or ratios of exponential functions.
Category: Statistics
[330] viXra:2310.0032 [pdf] submitted on 2023-10-06 15:52:16
Authors: E. Curbelo, L. Martino, F. Llorente, D. Delgado-Gomez
Comments: 28 Pages.
In this paper we address the problem of performing Bayesian inference for the parameters of a nonlinear multi-output model and the covariance matrix of the different output signals. We propose an adaptive importance sampling (AIS) scheme for multivariate Bayesian inversion problems, which is based on two main ideas: the variables of interest are split into two blocks, and the inference takes advantage of known analytical optimization formulas. We estimate both the unknown parameters of the multivariate non-linear model and the covariance matrix of the noise. In the first part of the proposed inference scheme, a novel AIS technique called adaptive target AIS (ATAIS) is designed, which alternates iteratively between an IS technique over the parameters of the nonlinear model and a frequentist approach for the covariance matrix of the noise. In the second part of the proposed inference scheme, a prior density over the covariance matrix is considered, and the cloud of samples obtained by ATAIS is recycled and re-weighted to obtain a complete Bayesian study of the model parameters and the covariance matrix. ATAIS is the main contribution of the work. Additionally, the inverted layered importance sampling (ILIS) scheme is presented as a possible compelling alternative (based on a conceptually simpler idea). Different numerical examples show the benefits of the proposed approaches.
Category: Statistics
[329] viXra:2309.0010 [pdf] submitted on 2023-09-01 07:15:18
Authors: Randolph L. Gerl
Comments: 6 Pages.
Several traditional problems in probability theory are discussed and a resolution to them is proposed. The use of probability theory in the study of physical reality is contrasted with its use in pure mathematics and the latter is found to be problematic. The proposed resolution is postulated to work for all physical reality but is inclusive enough to cover many situations in pure mathematics.
Category: Statistics
[328] viXra:2308.0183 [pdf] submitted on 2023-08-27 16:13:11
Authors: Josef Bukac
Comments: 7 Pages. A paper on interpolation by generalized logistic functions will follow
We study the properties of regression coefficients when the sum of the dependent variables is one, i.e., the dependent variables are compositional. We show that the sum of the intercepts is equal to one and the sum of the other corresponding regression coefficients is zero. We do this for simple linear regressions and also for a more general case using matrix notation. The last part treats the case when the dependent variables do not sum to one. We simplify the well-known formula derived by the use of Lagrange multipliers.
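The stated coefficient identities are easy to check numerically; a minimal sketch with simulated compositional responses (hypothetical data, per-column OLS):

    # When the columns of Y sum to 1 row-wise, per-column OLS intercepts sum
    # to one and each slope sums to zero across columns.
    import numpy as np

    rng = np.random.default_rng(5)
    n, p, k = 200, 3, 4
    X = rng.standard_normal((n, p))
    Y = rng.dirichlet(np.ones(k), size=n)        # rows sum to 1 (compositional)

    Xd = np.column_stack([np.ones(n), X])        # design with intercept
    B = np.linalg.lstsq(Xd, Y, rcond=None)[0]    # (p+1) x k coefficient matrix

    print("sum of intercepts:", B[0].sum().round(12))                 # -> 1.0
    print("sum of each slope across columns:", B[1:].sum(axis=1).round(12))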
Category: Statistics
[327] viXra:2307.0056 [pdf] submitted on 2023-07-11 16:50:28
Authors: L. Martino, P. Paradas, L. Carro, M. M. Garcia, C. Goicoechea, S. Ingrassi
Comments: 20 Pages.
Counting immunopositive cells in biological tissues generally requires either manual annotation or (when available) automatic rough systems for scanning signal surface and intensity in whole-slide imaging. In this work, we tackle the problem of counting microglial cells in biomedical images that represent lumbar spinal cord cross-sections of rats. Counting microglial cells is typically a time-consuming task and additionally entails extensive personnel training. We skip the task of detecting the cells and focus only on the counting problem. First, a linear predictor is designed based on the information provided by filtered images, obtained by applying color threshold values to the labelled images in the dataset. Non-linear extensions and other improvements are presented. The choice of the threshold values is also discussed. Different numerical experiments show the capability of the proposed algorithms. Furthermore, the proposed schemes could be applied to counting problems involving other small objects in other types of images (from satellites, telescopes, and/or drones, to name a few).
Category: Statistics
[326] viXra:2306.0081 [pdf] submitted on 2023-06-14 03:36:54
Authors: Richard J. Mathar
Comments: 12 Pages.
The L1 distance between two points in a square lattice is the sum of the horizontal and vertical absolute differences of the Cartesian coordinates and, as in graph theory, also the minimum number of edges to walk to reach one point from the other. The manuscript contains a Java program that computes, in a finite square grid of fixed shape, the number of point pairs as a function of that distance.
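A compact Python sketch of the same count the Java program performs (brute force over all unordered pairs of a small n x n grid):

    # Histogram of L1 (Manhattan) distances over all unordered point pairs.
    from collections import Counter
    from itertools import combinations, product

    n = 5
    points = list(product(range(n), repeat=2))
    hist = Counter(abs(x1 - x2) + abs(y1 - y2)
                   for (x1, y1), (x2, y2) in combinations(points, 2))
    for d in sorted(hist):
        print(f"L1 distance {d}: {hist[d]} pairs")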
Category: Statistics
[325] viXra:2305.0011 [pdf] submitted on 2023-05-03 01:21:57
Authors: Hakon Olav Torvik
Comments: 6 Pages.
The card game "war" is a simple game usually assumed to not include any element of strategy, only luck. I challenge this notion by noticing that the order of placing cards back into the deck can be used as a strategy. I simulate the game with different strategies, and find that the strategies can significantly increase the chances of winning, but usually increase the time it takes to complete the game. This is however dependent on your opponent using specific strategies. The best advice on strategy seems to be tricking your opponent into following an ordered strategy, while you use a random strategy, a strategy some might object to.
Category: Statistics
[324] viXra:2304.0006 [pdf] submitted on 2023-04-01 22:25:09
Authors: Robin Smith, Kristian C. Z. Haverson
Comments: 5 Pages.
In England, it is anecdotally remarked that the number of Greggs bakeries to be found in a town is a reliable measure of the area’s 'Northern-ness'. Conversely, a commercial competitor to Greggs in the baked goods and sandwiches market, Pret-a-Manger, is reputed to be popular in more 'southern' areas of England. Using a Support Vector Machine and an Artificial Neural Network (ANN) Regression Model, the relative geographical distributions of Greggs and Pret have been utilised for the first time to quantify the North-South divide in England. The calculated dividing lines were each compared to another line, based on Gross Domestic Household Income (GDHI). The lines match remarkably well, and we conclude that this is likely because much of England's wealth is concentrated in London, as are most of England's Pret-a-Manger shops. Further studies were conducted based on the relative geographical distributions of popular supermarkets Morrisons and Waitrose, which are also considered to have a North-South association. This analysis yields different results. For all metrics, the North-South dividing line passes close to the M1 Watford Gap services. As a common British idiom, this location is oft quoted as one point along the English North-South divide, and it is notable that this work agrees. This tongue-in-cheek analysis aims to highlight more serious factors highlighting the North-South divide, such as life expectancy, education, and poverty.
Category: Statistics
[323] viXra:2303.0043 [pdf] submitted on 2023-03-07 02:38:32
Authors: Johnny J. Mafra Jr.
Comments: 16 Pages.
A previous work on Covid-19 forecasting failed to predict the epidemic evolution under the massive vaccination carried out during 2021. This paper aims to work around its weak point, which was not considering immunity loss in its model. The set of SIR equations was revised to include immunity loss, the Beta profile was recalculated, and the model was tuned using real data from 2021. In this way a good conformance between the simulation and the data was achieved, roughly within the calculated uncertainty of 25%. The simulation for 2022 reproduced the Omicron peak, but shifted in time. The probable explanation is an imbalance in the Beta profile at the beginning of 2022, resulting in a bigger peak in January and consequently a smaller one later, due to more immune people. The hypothesis of different immunity losses for natural and vaccine immunity was explored. This case showed a theoretical profile similar to the real data observed. As a limiting theoretical case, it was verified that the multi-year epidemic evolution most similar to the real data was the case in which vaccination did not prevent transmission, or prevented as little as 20% of it. Simulation showed, as expected, that if Beta is below some limit the epidemic vanishes. Data showed that Covid-19 seems to be naturally vanishing by itself, meaning that no measures so far were effective. New approaches are speculated to provide better performance in epidemic combat, based on ventilation and air sterilization using GUV. Suggestions on how to test those approaches are presented.
Category: Statistics
[322] viXra:2302.0081 [pdf] submitted on 2023-02-17 17:09:23
Authors: Joseph Palazzo
Comments: 4 Pages.
We establish that all the pertinent elements of an assertion must be real. That if it contains an element M which cannot be classified as real, we say that the assertion is contaminated. We then show that Bayes’ Theorem is invalid.
Category: Statistics
[321] viXra:2301.0134 [pdf] submitted on 2023-01-25 13:50:10
Authors: Kyumin Nam
Comments: 3 Pages.
Substances representing tiers (Iron, Bronze, Silver, Gold, Platinum, Diamond) and their typical prices (USD/gram) show a positive correlation in several games using a tier system [1, 2, 5].
Category: Statistics
[320] viXra:2212.0092 [pdf] submitted on 2022-12-09 13:55:42
Authors: Florentin Smarandache
Comments: 10 Pages.
In this paper we exemplify the types of Plithogenic Probability and of Plithogenic Statistics, respectively. Several applications are given. The Plithogenic Probability of an event to occur is composed from the chances that the event occurs with respect to all random variables (parameters) that determine it. Each such variable is described by a Probability Distribution (Density) Function, which may be a classical, (T,I,F)-neutrosophic, I-neutrosophic, (T,F)-intuitionistic fuzzy, (T,N,F)-picture fuzzy, (T,N,F)-spherical fuzzy, or other fuzzy-extension distribution function. The Plithogenic Probability is a generalization of the classical MultiVariate Probability. The analysis of the events described by the plithogenic probability is the Plithogenic Statistics.
Category: Statistics
[319] viXra:2209.0132 [pdf] submitted on 2022-09-23 13:33:45
Authors: E. Morgado, L. Martino, R. San Millán-Castillo
Comments: 10 Pages.
We design an automatic elbow detector (UAED) for deciding the effective number of components in model selection problems. The relationship with the information criteria widely employed in the literature is also discussed. The proposed UAED does not require knowledge of a likelihood function and can easily be applied in diverse applications, such as regression and classification, feature and/or order selection, clustering, and dimension reduction. Several experiments involving synthetic and real data show the advantages of the proposed scheme over benchmark techniques in the literature.
Category: Statistics
[318] viXra:2209.0123 [pdf] submitted on 2022-09-22 20:33:40
Authors: L. Martino, R. San Millán-Castillo, E. Morgado
Comments: 20 Pages.
We introduce a generalized information criterion which contains other well-known information criteria, such as BIC and AIC, as special cases. Furthermore, the proposed spectral information criterion (SIC) is also more general than the other information criteria, e.g., since the knowledge of a likelihood function is not strictly required. SIC extracts geometric features of the error curve and, as a consequence, it can be considered an automatic elbow detector. SIC provides a subset of all possible models, with a cardinality that often is much smaller than the total number of possible models. The elements of this subset are "elbows" of the error curve. A practical rule for selecting a unique model within the set of elbows is suggested as well. Several experiments involving ideal scenarios, synthetic data and real data show the benefits of the proposed scheme. Matlab code related to the experiments is available.
Category: Statistics
[317] viXra:2207.0168 [pdf] submitted on 2022-07-28 22:43:14
Authors: Anurag Dutta, Manan Roy Choudhury, Seemantini Chattopadhyay
Comments: 10 Pages. (A non-essential image deemed to be insensitive is blocked by viXra Admin)
Background: Suicide, the act of intentionally hurting or killing oneself, has risen sharply in recent years. It often results from mental disorders linked to depression, anxiety, or stress.
Methods: In this study, we analyzed datasets of suicide cases for one developing country, Brazil, and one developed country, Germany, and used statistical methods along with machine learning techniques to obtain a clear picture.
Results: We found that the suicide rate in Brazil is quite high in comparison to the suicide rate in Germany.
Conclusions: Our results provide evidence that the development status of a country, along with further factors such as per-capita income, employment, and literacy, in some way affects its suicide rate.
Category: Statistics
[316] viXra:2204.0154 [pdf] submitted on 2022-04-26 18:14:52
Authors: Minuk Choi
Comments: 7 Pages.
The proposition that "the ratio of numbers that have an even number and an odd number of prime factors, none repeated, is 50:50" is equivalent to the Riemann hypothesis. I prove this proposition using the posterior distribution of a discrete uniform distribution.
Category: Statistics
[315] viXra:2202.0089 [pdf] submitted on 2022-02-13 23:14:57
Authors: Gary J. Duggan
Comments: 5 Pages.
A simple coin toss game, attributed to Nicolaus Bernoulli in the early 1700s, results in a mathematical paradox which still appears to be subject to what might be described as "conceptual" rather than "mathematical" solutions. A mathematical solution is given showing that, if the number of games is 2^m - 1, then the average payout per game over this number of games is m/(2 - 1/2^(m-1)).
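A quick simulation sketch, assuming the common convention that the game pays 2^(k-1) when the first head lands on toss k; the simulated average can be checked against the stated formula, though agreement is necessarily rough because the payout distribution is extremely heavy-tailed.

    # St. Petersburg game: simulate 2^m - 1 games and compare the average
    # payout per game with m/(2 - 1/2^(m-1)).
    import numpy as np

    rng = np.random.default_rng(6)

    def one_game() -> int:
        k = 1
        while rng.random() < 0.5:   # tails: keep tossing
            k += 1
        return 2 ** (k - 1)         # assumed payout convention

    for m in (5, 10, 15):
        n_games = 2 ** m - 1
        avg = sum(one_game() for _ in range(n_games)) / n_games
        print(f"m={m:2d}: simulated avg = {avg:7.2f}, "
              f"formula = {m / (2 - 1 / 2 ** (m - 1)):.2f}")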
Category: Statistics
[314] viXra:2202.0084 [pdf] submitted on 2022-02-13 23:24:08
Authors: Kathy Dopp, Stephanie Seneff
Comments: 21 Pages.
As of 6 February 2022, based on publicly available official UK and US data, all age groups under 50 years old are at greater risk of fatality after receiving a COVID-19 inoculation than an unvaccinated person is at risk of a COVID-19 death. All age groups under 80 years old have virtually no benefit from receiving a COVID-19 inoculation, and the younger ages incur significant risk. This analysis is conservative because it ignores the fact that inoculation-induced adverse events such as thrombosis, myocarditis, Bell’s palsy, and other vaccine-induced injuries can lead to shortened life span. When one takes into consideration the fact that there is approximately a 90% decrease in risk of COVID-19 death if early treatment is provided to all symptomatic high-risk persons, one can only conclude that mandates of COVID-19 inoculations are ill-advised. Considering the emergence of antibody-resistant variants like Delta and Omicron, for most age groups COVID-19 vaccine inoculations result in higher death rates than COVID-19 does for the unvaccinated.
Category: Statistics
[313] viXra:2201.0152 [pdf] submitted on 2022-01-23 18:41:23
Authors: Robert Bennett
Comments: 6 Pages.
A quantitative test for the probability that two sets of photos are of the same woman. The result for 7 facial characteristics in each photo is that the odds are 30 million to 1 that Lucy I and Lucy II are the same person.
Category: Statistics
[312] viXra:2112.0158 [pdf] submitted on 2021-12-30 18:14:01
Authors: R. San Millán-Castillo, L. Martino, E. Morgado, F. Llorente
Comments: 18 Pages.
In recent years, soundscapes have become one of the most active topics in Acoustics, providing a holistic approach to the acoustic environment that involves human perception and context. Soundscape-elicited emotions are central, yet substantially subtle and often unnoticed (compared to speech or music). Currently, soundscape emotion recognition is a hot topic in the literature. We provide an exhaustive variable selection study (i.e., a selection of the soundscape indicators) on a well-known dataset (emo-soundscapes). We consider linear soundscape emotion models for two soundscape descriptors: arousal and valence. Several ranking schemes and procedures for selecting the number of variables are applied. We have also performed an alternating optimization scheme for obtaining the best sequences while keeping a certain number of features fixed. Furthermore, we have designed a novel technique based on Gibbs sampling, which provides a more complete and clear view of the relevance of each variable. Finally, we have also compared our results with the analysis obtained by classical methods based on p-values. As a result of our study, we suggest two simple and parsimonious linear models of only 7 and 16 variables (out of the 122 possible features) for the two outputs (arousal and valence), respectively. The suggested linear models provide very good and competitive performance, with R2 > 0.86 and R2 > 0.63 (values obtained after a cross-validation procedure), respectively.
Category: Statistics
[311] viXra:2112.0058 [pdf] submitted on 2021-12-12 20:44:49
Authors: D Williams
Comments: 2 Pages.
A "Simpson's Rule"-like Ordered Sample Mean is compared with the standard version. It appears to be better at least for small sample sizes. A related integral approximation is also given and tested against the Mid Point Rule. Other Types of Ordered Sample Means need investigating.
Category: Statistics
[310] viXra:2112.0013 [pdf] submitted on 2021-12-02 02:52:21
Authors: Josef Bukac
Comments: 28 Pages.
We present a method of minimizing an objective function subject to an inequality constraint. It enables us to minimize the sum of squares of deviations in linear regression under inequality restrictions. We demonstrate how to calculate the coefficients of a cubic function under the restriction that it is increasing, and we also mention how to fit a convex quartic polynomial. We use such results for interpolation as a method for calculating starting values for iterative methods of fitting some specific functions, such as four-parameter logistic, positive bi-exponential, or Gompertz functions. Curvature-driven interpolation enables such calculations where otherwise solutions to the interpolation equations may not exist or may not be unique. We also present examples to illustrate how it works and compare our approach with that of Zhang (2020).
Category: Statistics
[309] viXra:2111.0150 [pdf] submitted on 2021-11-28 14:29:35
Authors: F. Llorente, L. Martino, D. Delgado
Comments: 17 Pages.
The idea of using a path of tempered posterior distributions has been widely applied in the literature for the computation of marginal likelihoods (a.k.a., Bayesian evidence). Thermodynamic integration, path sampling and annealed importance sampling are well-known examples of algorithms belonging to this family of methods. In this work, we introduce a generalized thermodynamic integration (GTI) scheme which is able to perform a complete Bayesian inference, i.e., GTI can approximate generic posterior expectations (not only the marginal likelihood). Several scenarios of application of GTI are discussed and different numerical simulations are provided.
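For readers unfamiliar with the path idea, a minimal grid-based sketch of the standard thermodynamic-integration identity that GTI generalizes may help: log Z = int_0^1 E_{p_beta}[log L] d(beta), where p_beta is proportional to prior(theta) * L(theta)^beta. The 1-D Gaussian model below is hypothetical, not the paper's scheme.

    # Thermodynamic integration check on a 1-D grid: compare the TI estimate
    # of log Z with direct numerical integration of prior * likelihood.
    import numpy as np

    theta = np.linspace(-10, 10, 4001)
    prior = np.exp(-0.5 * theta**2) / np.sqrt(2 * np.pi)   # N(0,1) prior
    loglik = -0.5 * (theta - 1.0) ** 2 / 0.5**2            # unnormalized log-lik

    betas = np.linspace(0, 1, 201)
    expect = []
    for b in betas:
        unnorm = prior * np.exp(b * loglik)                # tempered posterior
        w = unnorm / np.trapz(unnorm, theta)
        expect.append(np.trapz(w * loglik, theta))         # E_beta[log L]

    ti_logZ = np.trapz(expect, betas)
    direct_logZ = np.log(np.trapz(prior * np.exp(loglik), theta))
    print(f"TI estimate: {ti_logZ:.6f}   direct: {direct_logZ:.6f}")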
Category: Statistics
[308] viXra:2111.0145 [pdf] submitted on 2021-11-28 17:11:33
Authors: L. Martino, V. Elvira
Comments: 11 Pages.
In this work, we analyze alternative effective sample size (ESS) measures for importance sampling algorithms. More specifically, we study a family of ESS approximations introduced in [11]. We show that all the ESS functions included in this family (called Huggins-Roy's family) satisfy all the required theoretical conditions introduced in [17]. We also highlight the relationship of this family with the Rényi entropy. By numerical simulations, we study the performance of different ESS approximations, also introducing an optimal linear combination of the most promising ESS indices proposed in the literature. Moreover, we obtain the best ESS approximation within the Huggins-Roy's family, which provides almost a perfect match with the theoretical ESS values.
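The abstract does not reproduce the family's formula; as a hedged stand-in, the sketch below uses an entropy-indexed ESS family, the exponential of the order-beta Rényi entropy of the normalized weights, which recovers the classical 1/sum(w^2) at beta = 2 and the perplexity as beta -> 1. This is my own minimal illustration, not necessarily the exact Huggins-Roy parameterization.

    # Entropy-indexed ESS measures for normalized importance weights.
    import numpy as np

    def ess_renyi(w: np.ndarray, beta: float) -> float:
        w = w / w.sum()
        if np.isclose(beta, 1.0):          # limit case: perplexity
            return float(np.exp(-np.sum(w * np.log(w + 1e-300))))
        return float(np.sum(w ** beta) ** (1.0 / (1.0 - beta)))

    rng = np.random.default_rng(7)
    w = rng.exponential(size=1000)         # hypothetical raw weights
    for beta in (0.5, 1.0, 2.0, 5.0):
        print(f"beta={beta}: ESS = {ess_renyi(w, beta):8.2f}")
    print("classical 1/sum(w^2):", 1.0 / np.sum((w / w.sum()) ** 2))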
Category: Statistics
[307] viXra:2111.0012 [pdf] submitted on 2021-11-02 20:50:03
Authors: Zhijing Zhang, Yue Yu, Qinghua Ma, Haixiang Yao
Comments: 18 Pages.
In light of some contradictory results in existing research, this paper selects China's latest stock data from 2005 to 2020 for empirical analysis. By choosing this period's data, we avoid the periods of China's significant stock market reforms, to reduce the impact of government policy on the factor effects. In this paper, the redundant factors (HML, CMA) are orthogonalized, and the regression analysis of 5x5 portfolios of Size-B/M and Size-Inv is carried out with these two orthogonalized factors. It is found that HML and CMA are still significant in many portfolios, indicating that they have strong explanatory ability, which is also consistent with the results of the GRS test. All of this shows that the five-factor model has a better ability to explain the excess return rate. In the concrete analysis, this paper uses the five-factor 25-group portfolio return calculation, the five-factor regression analysis, the orthogonal treatment, the five-factor 25-group regression and the GRS test to explain more comprehensively the excellent explanatory ability of the five-factor model for excess returns. Then, we analyze the possible reasons for the strong explanatory ability of HML, CMA and RMW in terms of price-to-book ratio, turnover rate and correlation coefficients. We also give a detailed explanation of the results, and analyze the changes in China's stock market policy and investors' investment styles in recent years. Finally, this paper attempts to put forward some useful suggestions for the development of asset pricing models and China's stock market.
Category: Statistics
[306] viXra:2110.0128 [pdf] submitted on 2021-10-22 04:13:21
Authors: Deep Bhattacharjee
Comments: 22 Pages, 5 Figures, TechRxiv (Computations), Ergodic Theory
The time and space averages of an ergodic system following the 5-tuple relation (A, ~, J, Σ, μ), through the initial increment from a + bθ to a + c + bθ, indicate that entropy is preserved in deterministic yet dynamical and conservative systems, holding for the set S_p = S_1 ∑_{i=2}^∞ S_i, with S the entropy and (S_∞ = ... = S_3 = S_2) > S_1, obeying the Poincaré recurrence theorem throughout the constant attractor A. This in turn states the equivalence closure as the property of the induced systems to resemble entropy-conserving scenarios.
Category: Statistics
[305] viXra:2110.0032 [pdf] submitted on 2021-10-07 09:24:06
Authors: F. Llorente, L. Martino, E. Curbelo, J. Lopez-Santiago, D. Delgado
Comments: 25 Pages.
The application of Bayesian inference in physics for the purpose of model selection is very popular nowadays. In this framework, models are compared through their marginal likelihoods or their quotients, called Bayes factors. However, marginal likelihoods show a strong dependence on the prior choice, even when the data are very informative, unlike the posterior distribution. Furthermore, when the prior is improper, the marginal likelihood of the corresponding model is undetermined. In this work, we aim to raise awareness about the issue of prior sensitivity of the marginal likelihood and its role in model selection. We also comment on the use of uninformative priors, which are very common choices in practice. Several practical suggestions are provided, and possible solutions allowing the use of improper priors are discussed. The connection between the marginal likelihood approach and the well-known information criteria is also presented. We describe all the issues and possible solutions with illustrative numerical examples (providing some related code), one of which involves a real-world application to exoplanet detection.
Category: Statistics
[304] viXra:2109.0178 [pdf] submitted on 2021-09-24 07:34:10
Authors: F. Llorente, L. Martino, J. Read, D. Delgado
Comments: 13 Pages.
Many applications in signal processing and machine learning require the study of probability density functions (pdfs) that can only be accessed through noisy evaluations. In this work, we analyze noisy importance sampling (IS), i.e., IS working with noisy evaluations of the target density. We present the general framework and derive optimal proposal densities for noisy IS estimators. The optimal proposals incorporate the information of the variance of the noisy realizations, proposing points in regions where the noise power is higher. We also compare the use of the optimal proposals with previous optimality approaches considered in a noisy IS framework.
Category: Statistics
[303] viXra:2107.0131 [pdf] submitted on 2021-07-23 19:10:35
Authors: Glenn Healey, Shiyuan Zhao
Comments: 31 Pages.
An important and challenging problem in the evaluation of baseball players is the quantification of batted-ball talent. This problem has traditionally been addressed using linear regression on the value of a statistic derived from a set of observations. We use large sets of trajectory measurements acquired by in-game sensors to show that the predictive value of a batted ball depends on its physical properties. This knowledge is exploited to estimate batted-ball distributions defined over a multidimensional measurement space from observed distributions by using regression parameters that adapt to batted-ball properties. This process is central to a new method for estimating batted-ball talent. The domain of the batted-ball distributions is defined by a partition of measurement space that is selected to optimize the accuracy of the estimates. We present examples illustrating facets of the new approach and use a set of experiments to show that the new method generates estimates that are significantly more accurate than those generated using current methods. The new methodology supports the use of fine-grained contextual adjustments, and we show that this process further improves the accuracy of the technique.
Category: Statistics
[302] viXra:2107.0031 [pdf] submitted on 2021-07-05 20:36:40
Authors: Joh. J. Sauren, Aloys J. Sipers
Comments: 8 Pages. [Corrections made by viXra Admin to conform with the requirements of viXra.org]
In this article, the Matlab program proposed in the article viXra:2103.0018 is improved. Further, the constant d3 depends on the constants d2 and a3. Three theorems are stated on the generating functions for the constants d2 and a3. The first two theorems provide analytical expressions for these generating functions, whereas the third theorem relates them.
Category: Statistics
[301] viXra:2106.0144 [pdf] submitted on 2021-06-24 18:41:26
Authors: Igor B. Krasnyuk
Comments: 23 Pages.
An initial value boundary problem for the linear Schrödinger equation with nonlinear functional boundary conditions is considered. It is shown that the attractor of the problem contains periodic piecewise constant functions on the complex plane with finitely many points of discontinuity on a period. The method of reduction of the problem to a system of integro-difference equations has been applied. Applications to optical resonators with feedback have been considered. The elements of the attractor can be interpreted as white and black solitons in nonlinear optics.
Category: Statistics
[300] viXra:2106.0036 [pdf] submitted on 2021-06-07 20:38:46
Authors: Russell Leidich
Comments: 9 Pages. [Corrections are made by viXra Admin to comply with the rules of viXra.org]
There are many applications involving physical measurements which are expected to result in a probability density function (PDF) which is asymptotically Gaussian (normal) or lognormal. In the latter case, we can simply take the logs of the (positive) samples in order to obtain the former, so the math in this paper will focus exclusively on Gaussians.
For example, we would expect the distribution of radio power received at a dish to be lognormally distributed, given a sufficiently broad swath of sky to observe for a sufficiently long duration, and in the relative absence of terrestrial radio interference. However, if we were then to focus on a particular star system, the observed "experimental" PDF could substantially deviate from that "background" PDF. It might not even be lognormal if, for example, the star exhibits peaks in radio power at a few distinct frequencies.
It would therefore be useful to have a means to quantify the "surprise" factor of experimental PDFs relative to an established background PDF which is known to be, or be equivalent to, a Gaussian. If a given experimental PDF were also known to be Gaussian, then we could do this by employing the Kullback-Leibler (KL) divergence from one to the other, as Gupta appears to have done for the multidimensional case.
When the experimental PDF is not known to be Gaussian (or any PDF archetype, for that matter), the situation is more complicated, mainly because we are forced to deal with a real-valued set of samples ordered by increasing positivity -- a 1D point cloud, to be precise, although "vector" will suffice for brevity -- rather than an analytic function. Ranking the information cost of encoding such a vector, versus others arising from other experiments, under the prior assumption of the same background PDF, is the subject of this paper. We also investigate the question of ascertaining which background PDF is the most useful for the sake of discriminating anomalous from mundane experimental PDFs.
Category: Statistics
[299] viXra:2104.0046 [pdf] submitted on 2021-04-09 17:05:32
Authors: Johnny J. Mafra Jr.
Comments: 20 Pages.
A method was researched and adopted to introduce seasonal behavior into the SIR model to study the dynamics of covid-19. This method is based on the calculation of β for each week of the year, based on the observed previous seasonal behavior for several countries and regions among the most affected in the world. Vaccination, which would be a major factor in this dynamic in 2021, was also included in the model. The model was used to build a simulator, β was determined, and covid-19 cases were forecast for the USA, Brazil and India. β was found to range seasonally from 0.15 to 0.40 or from 0.10 to 0.80, depending on the region. It was found that vaccination would be very effective in reducing cases in 2021 and that herd immunity would be reached when around 55% of the population is immune. The simulation led to some unexpected findings, such as the effect of lockdown on later waves of the epidemic and insights about the epidemic dynamics. A condition was found for exogenous respiratory viruses that triggers a major epidemic, and a condition that explains why a respiratory virus to which part of the population is already immune shows a seasonal behavior with a small number of cases. This dynamic explains the evolution of covid-19 in 2020 and 2021, and even the Spanish flu in 1918 and 1919.
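A toy discrete-time SIR integration with a weekly varying β (the sinusoidal profile below is a hypothetical stand-in for the calibrated weekly values; it spans the 0.15-0.40 range mentioned in the abstract):

    # SIR with a seasonal contact rate, integrated with daily Euler steps.
    import numpy as np

    N = 1e6
    beta_week = 0.15 + 0.125 * (1 + np.cos(2 * np.pi * np.arange(52) / 52))
    gamma = 1 / 10                  # assumed 10-day infectious period

    S, I, R = N - 100, 100.0, 0.0
    for day in range(365):
        beta = beta_week[(day // 7) % 52]
        new_inf = beta * S * I / N
        new_rec = gamma * I
        S, I, R = S - new_inf, I + new_inf - new_rec, R + new_rec
        if day % 73 == 0:
            print(f"day {day:3d}: I = {I:10.1f}, immune fraction = {R/N:.2f}")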
Category: Statistics
[298] viXra:2103.0173 [pdf] submitted on 2021-03-27 02:04:39
Authors: Toshihiro Umehara
Comments: 8 Pages.
Data processing and data cleaning are essential steps before applying statistical or machine learning procedures. R provides a flexible way of processing data using vectors. R packages also provide other ways of manipulating data, such as using SQL or chained functions. I present yet another way to process data in a row-by-row manner, using a data-manipulation-oriented script language, DataSailr. This article introduces the datasailr package and shows the potential benefits of using a domain-specific language for data processing.
Category: Statistics
[297] viXra:2103.0079 [pdf] submitted on 2021-03-12 01:15:46
Authors: Stephen P. Smith
Comments: 8 Pages.
The scale invariant prior is revisited, for a single variance parameter and for a variance-covariance matrix. These results are generalized to develop different scale invariant priors where probability measure is assigned through the sum of variance components that represent partitions of total variance, or through a sum of variance-covariance matrices representing partitions of a total variance-covariance matrix.
Category: Statistics
[296] viXra:2103.0018 [pdf] submitted on 2021-03-03 14:39:39
Authors: Joh. J. Sauren
Comments: 3 Pages.
In this communication a short and straightforward algorithm, written in Octave (version 6.1.0 (2020-11-26))/Matlab (version '9.9.0.1538559 (R2020b) Update 3'), is proposed for brute-force computation of the principal constants $d_{2}$ and $d_{3}$ used to calculate control limits for various types of variables control charts encountered in statistical process control (SPC).
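A Python analogue of the brute-force idea, assuming the standard SPC definitions: d2 is the mean and d3 the standard deviation of the range of n iid N(0,1) observations.

    # Monte Carlo estimates of the control-chart constants d2 and d3.
    import numpy as np

    rng = np.random.default_rng(8)
    reps = 1_000_000
    for n in (2, 3, 4, 5):
        x = rng.standard_normal((reps, n))
        r = x.max(axis=1) - x.min(axis=1)      # sample ranges
        print(f"n={n}: d2 ~ {r.mean():.4f}, d3 ~ {r.std():.4f}")

For n = 2 this should reproduce the tabulated d2 ~ 1.128 and d3 ~ 0.853 to within Monte Carlo error.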
Category: Statistics
[295] viXra:2103.0008 [pdf] submitted on 2021-03-02 17:19:51
Authors: L. Martino, V. Elvira, J. Lopez-Santiago, G. Camps-Valls
Comments: 41 Pages.
In many inference problems, the evaluation of complex and costly models is often required. In this context, Bayesian methods have become very popular in several fields over the last years, in order to obtain parameter inversion, model selection or uncertainty quantification. Bayesian inference requires the approximation of complicated integrals involving (often costly) posterior distributions. Generally, this approximation is obtained by means of Monte Carlo (MC) methods. In order to reduce the computational cost of the corresponding technique, surrogate models (also called emulators) are often employed. Another alternative approach is the so-called Approximate Bayesian Computation (ABC) scheme. ABC does not require the evaluation of the costly model but the ability to simulate artificial data according to that model. Moreover, in ABC, the choice of a suitable distance between real and artificial data is also required. In this work, we introduce a novel approach where the expensive model is evaluated only in some well-chosen samples. The selection of these nodes is based on the so-called compressed Monte Carlo (CMC) scheme. We provide theoretical results supporting the novel algorithms and give empirical evidence of the performance of the proposed method in several numerical experiments. Two of them are real-world applications in astronomy and satellite remote sensing.
Category: Statistics
[294] viXra:2102.0094 [pdf] submitted on 2021-02-17 23:44:54
Authors: Fuming Lin, Yingying Jiang, Yong Zhou
Comments: 57 Pages.
This paper develops the theory of kth power expectile estimation and considers the relevant hypothesis tests for coefficients of linear regression models. We prove that the asymptotic covariance matrix of kth power expectile regression converges to that of quantile regression as k converges to one, and hence we provide a moment estimator of the asymptotic covariance matrix of quantile regression. The kth power expectile regression is then utilized to test for homoskedasticity and conditional symmetry of the data. Detailed comparisons of the local power among the kth power expectile regression tests, the quantile regression test, and the expectile regression test are provided. When the underlying distribution is not standard normal, results show that the optimal k is often larger than 1 and smaller than 2, which suggests that the general kth power expectile regression is necessary.
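For concreteness, the kth power expectile can be viewed as the minimizer of an asymmetric kth-power loss, recovering the quantile at k = 1 and the usual expectile at k = 2; below is a scalar sketch with hypothetical data (my reading of the definition, not code from the paper).

    # kth power expectile as an M-estimator of an asymmetric |u|^k loss.
    import numpy as np
    from scipy.optimize import minimize_scalar

    def kth_power_expectile(x: np.ndarray, tau: float, k: float) -> float:
        def loss(m):
            u = x - m
            w = np.where(u < 0, 1 - tau, tau)   # asymmetric weights
            return np.mean(w * np.abs(u) ** k)
        return minimize_scalar(loss, bounds=(x.min(), x.max()),
                               method="bounded").x

    rng = np.random.default_rng(9)
    x = rng.standard_normal(5000)
    for k in (1.0, 1.5, 2.0):
        print(f"k={k}: tau=0.9 estimate = {kth_power_expectile(x, 0.9, k):.3f}")
    print("empirical 0.9 quantile:", np.quantile(x, 0.9).round(3))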
Category: Statistics
[293] viXra:2102.0027 [pdf] submitted on 2021-02-05 13:05:30
Authors: Stephen P. Smith
Comments: 10 Pages.
Hamiltonian Markov Chain Monte Carlo is one of the established methods to conduct a Bayesian simulation. This method uses evaluations of the probability density and its gradient at particular variables. This present paper describes how to incorporate information from second derivatives that relate to a direction set, and describes how to modify the simulation accordingly.
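For context, here is a plain HMC sketch (leapfrog integrator plus Metropolis correction) on a standard normal target; the second-derivative, direction-set modification described in the abstract is not implemented, this is only the baseline scheme it builds on.

    # Baseline Hamiltonian Monte Carlo for a 1-D standard normal target.
    import numpy as np

    def logp(x):  return -0.5 * x**2      # log density up to a constant
    def grad(x):  return -x               # its gradient

    rng = np.random.default_rng(10)
    x, eps, L, samples = 0.0, 0.2, 20, []
    for _ in range(5000):
        p = rng.standard_normal()                     # fresh momentum
        x_new, p_new = x, p + 0.5 * eps * grad(x)     # half momentum step
        for _ in range(L):                            # leapfrog trajectory
            x_new = x_new + eps * p_new
            p_new = p_new + eps * grad(x_new)
        p_new = p_new - 0.5 * eps * grad(x_new)       # undo extra half step
        logA = (logp(x_new) - 0.5 * p_new**2) - (logp(x) - 0.5 * p**2)
        if np.log(rng.random()) < logA:               # Metropolis accept
            x = x_new
        samples.append(x)
    print("mean, var:", np.mean(samples).round(3), np.var(samples).round(3))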
Category: Statistics
[292] viXra:2102.0026 [pdf] submitted on 2021-02-05 22:05:06
Authors: Michaelino Mervisiano
Comments: 37 Pages. [Corrections made by viXra Admin to conform with the requirements on the Submission Form]
On 23 June 2016, the United Kingdom (UK) European Union (EU) membership referendum resulted in 51.9% of voters voting to leave the EU, popularly termed Brexit. Given its significant implications, correctly predicting Brexit was crucial, but most pollsters predicted it incorrectly.
This paper assesses whether Brexit was evident and predictable from the pre-referendum polls data. Unlike previous studies, whose analytical tools are limited to latest-poll analysis, descriptive statistics, point estimates, and simple linear regression, this project uses more robust and sophisticated statistical methodologies.
Category: Statistics
[291] viXra:2101.0082 [pdf] submitted on 2021-01-13 14:07:54
Authors: Kuan-Shian Wang, Mei-Yu Lee
Comments: 89 Pages. [Corrections made to conform with the requirements on the Submission Form]
This book discusses the special case of the Beta distribution with α = λ + 1 and β = 2 - λ. To compare with the continuous Bernoulli distribution, we study how changing λ affects the pdf of this special Beta distribution, and then find the sufficient statistic, the point estimator, the confidence interval, the test statistic, and the goodness of fit. The special Beta distribution at λ = 0.5 differs from the continuous Bernoulli distribution. As λ goes from small to large, the pdf of the special Beta distribution changes smoothly, while the pdf of the continuous Bernoulli distribution shows a big wave. As the sample size becomes large, both distributions are approximated by a Normal distribution, with different relationships between λ and the sum of the samples.
Category: Statistics
[290] viXra:2101.0046 [pdf] submitted on 2021-01-06 17:52:45
Authors: Glenn Healey, Lequan Wang
Comments: 19 Pages.
We use Hawk-Eye measurements to analyze the side force on a baseball.
Category: Statistics
[289] viXra:2101.0034 [pdf] submitted on 2021-01-05 09:18:19
Authors: Kuan-Shian Wang, Mei-Yu Lee
Comments: 195 Pages.
This book provides four model designs to discuss how the continuous Bernoulli distribution extends to the analysis of K categories. In contrast to the discrete multinomial distribution, which extends the Bernoulli distribution through the additive property, the continuous Bernoulli random variable must have its pdf, cdf, and distribution tested, and one must check whether the characteristics of the CB distribution are maintained. Model 1 uses the random-variable (variable-added) method; Models 2 and 3 are built from the probability model and are suitable for the parameter-added case and for the conditional relationship of variables, respectively; Model 4 is derived from the continuous trinomial distribution and is suitable for the joint relationship of variables.
Category: Statistics
[288] viXra:2012.0221 [pdf] submitted on 2020-12-30 12:07:53
Authors: Kuan-Shian Wang, Mei-Yu Lee
Comments: 37 Pages. [Corrections are made by viXra Admin to comply with the rules of viXra.org]
We provide mathematical derivations and numerical illustrations to verify that, as λ → 0, the continuous Bernoulli distribution approximates the exponential distribution (Chapter 1), and that, as λ → 0 and λ → 1, the continuous binomial distribution approximates the Gamma distribution (Chapter 3). Meanwhile, Chapter 2 describes how to compute the continuous binomial distribution, which can be derived from the continuous Bernoulli.
Category: Statistics
[287] viXra:2012.0088 [pdf] submitted on 2020-12-12 09:51:59
Authors: Kuan-Sian Wang, Mei-Yu Lee
Comments: Pages.
We discuss the simulator and test statistic of the continuous Bernoulli distribution, which is important for testing the pervasive error of variational autoencoders in deep learning. We provide the sufficient statistic, the point estimator, the confidence interval, the test statistic, the goodness of fit, and the one-way test for the continuous Bernoulli distribution. Besides, the continuous binomial distribution can be derived, so the confidence interval and the test can be worked out for two continuous Bernoulli populations. The continuous trinomial distribution can also be found. Please download the computer software of this book from https://github.com/meiyulee/continuous_Bernoulli
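A minimal Python simulator for the continuous Bernoulli distribution via inverse-CDF sampling, assuming the standard density C(λ) λ^x (1−λ)^(1−x) on [0, 1]; this is an independent sketch, not the software distributed with the book.

import numpy as np

def cb_sample(lam, size, seed=None):
    # Invert the CDF F(x) = (lam^x (1-lam)^(1-x) + lam - 1) / (2*lam - 1), lam != 1/2.
    rng = np.random.default_rng(seed)
    u = rng.uniform(size=size)
    if abs(lam - 0.5) < 1e-12:
        return u                      # CB(1/2) is Uniform(0, 1)
    a = np.log(lam / (1.0 - lam))
    return np.log(1.0 + u * (2.0 * lam - 1.0) / (1.0 - lam)) / a

lam = 0.9
xs = cb_sample(lam, 100000, seed=0)
# Known mean of CB(lam) for lam != 1/2: lam/(2*lam-1) + 1/(2*arctanh(1-2*lam)).
mean_theory = lam / (2 * lam - 1) + 1.0 / (2.0 * np.arctanh(1.0 - 2.0 * lam))
print(xs.mean(), mean_theory)         # the two values should agree closely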
Category: Statistics
[286] viXra:2012.0044 [pdf] submitted on 2020-12-07 13:36:14
Authors: Abdelmajid Ben Hadj Salem
Comments: 7 Pages. In French.
In this note, we give a proof of a theorem of Linnik concerning the theory of errors, stated without proof in his book "Least squares method and the mathematical bases of the statistical theory of the treatment of observations".
Category: Statistics
[285] viXra:2012.0038 [pdf] submitted on 2020-12-06 14:50:31
Authors: L. Martino, J. Vicent, G. Camps-Valls
Comments: 5 Pages.
This paper introduces an automatic methodology to construct emulators for costly radiative transfer models (RTMs). The proposed method is sequential and adaptive, and it is based on the notion of an acquisition function: instead of optimizing the unknown underlying RTM function, we aim to achieve accurate approximations of it. The Automatic Gaussian Process Emulator (AGAPE) methodology combines the interpolation capabilities of Gaussian processes (GPs) with the careful design of an acquisition function that favors sampling in low-density regions and flatness of the interpolation function. We illustrate the good capabilities of the method in toy examples and for the construction of an optimal look-up table for atmospheric correction based on MODTRAN5.
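A simplified Python sketch of sequential GP emulation of a costly function in the spirit of this abstract, where the acquisition is reduced to the GP predictive variance alone; the kernel, its length-scale, and the one-dimensional toy function stand in for the actual RTM, and the paper's full acquisition (which also rewards low-density regions and flatness) is not reproduced.

import numpy as np

def rbf(a, b, ell=0.15):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

def gp_posterior(Xn, yn, Xs, jitter=1e-8):
    # Noise-free GP interpolation (emulation) at test points Xs.
    K = rbf(Xn, Xn) + jitter * np.eye(len(Xn))
    Ks = rbf(Xs, Xn)
    mean = Ks @ np.linalg.solve(K, yn)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    return mean, np.maximum(var, 0.0)

f = lambda x: np.sin(6 * x) + 0.5 * x          # stand-in for a costly RTM
Xn = np.array([0.1, 0.9]); yn = f(Xn)          # initial design
grid = np.linspace(0, 1, 401)
for _ in range(8):                             # sequential, adaptive design
    _, var = gp_posterior(Xn, yn, grid)
    x_new = grid[np.argmax(var)]               # acquisition: predictive variance
    Xn = np.append(Xn, x_new); yn = np.append(yn, f(x_new))
mean, _ = gp_posterior(Xn, yn, grid)
print(np.max(np.abs(mean - f(grid))))          # emulation error shrinks with more nodes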
Category: Statistics
[284] viXra:2012.0037 [pdf] submitted on 2020-12-06 19:04:22
Authors: L.Martino, D. Heestermans Svendsen, J. Vicent, G. Camps-Valls
Comments: 5 Pages.
Many fields of science and engineering require the use of complex and computationally expensive models to understand the involved processes in the system of interest. Nevertheless, due to the high cost involved, the required study becomes a cumbersome process. This paper introduces an interpolation procedure which belongs to the family of active learning algorithms, in order to construct cheap surrogate models of such costly complex systems. The proposed technique is sequential and adaptive, and is based on the optimization of a suitable acquisition function. We illustrate its efficiency in a toy example and for the construction of an emulator of an atmosphere modeling system.
Category: Statistics
[283] viXra:2012.0036 [pdf] submitted on 2020-12-06 19:06:41
Authors: L. Martino, V. Elvira, G. Camps-Valls
Comments: 5 Pages.
Monte Carlo (MC) algorithms are widely used for Bayesian inference in statistics, signal processing, and machine learning. In this work, we introduce a Markov chain Monte Carlo (MCMC) technique driven by a particle filter. The resulting scheme is a generalization of the so-called Particle Metropolis-Hastings (PMH) method, where a suitable Markov chain of sets of weighted samples is generated. We also introduce a marginal version for the goal of jointly inferring dynamic and static variables. The proposed algorithms outperform the corresponding standard PMH schemes, as shown by numerical experiments.
Category: Statistics
[282] viXra:2012.0035 [pdf] submitted on 2020-12-06 15:16:02
Authors: L. Martino, V. Elvira, G. Camps-Valls
Comments: 5 Pages.
Monte Carlo (MC) methods are widely used for Bayesian inference and optimization in statistics, signal processing and machine learning. Two well-known classes of MC methods are the Importance Sampling (IS) techniques and the Markov Chain Monte Carlo (MCMC) algorithms. In this work, we introduce the Group Importance Sampling (GIS) framework, where different sets of weighted samples are properly summarized with one summary particle and one summary weight. GIS facilitates the design of novel efficient MC techniques. For instance, we present the Group Metropolis Sampling (GMS) algorithm, which produces a Markov chain of sets of weighted samples. GMS in general outperforms other multiple-try schemes, as shown by means of numerical simulations.
Category: Statistics
[281] viXra:2012.0034 [pdf] submitted on 2020-12-05 11:18:45
Authors: D. Heestermans Svendsen, L. Martino, M. Campos-Taberner, G. Camps-Valls
Comments: 5 Pages.
Solving inverse problems is central in geosciences and remote sensing. Very often a mechanistic physical model of the system exists that solves the forward problem. Inverting the implied radiative transfer model (RTM) equations numerically implies, however, challenging and computationally demanding problems. Statistical models tackle the inverse problem and predict the biophysical parameter of interest from radiance data, exploiting either in situ data or simulated data from an RTM. We introduce a novel nonlinear and nonparametric statistical inversion model which incorporates both real observations and RTM-simulated data. The proposed Joint Gaussian Process (JGP) provides a solid framework for exploiting the regularities between the two types of data, in order to perform inverse modeling. Advantages of the JGP method over competing strategies are shown on both a simple toy example and in leaf area index (LAI) retrieval from Landsat data combined with simulated data generated by the PROSAIL model.
Category: Statistics
[280] viXra:2012.0033 [pdf] submitted on 2020-12-05 11:25:51
Authors: L. Martino, V. Elvira, G. Camps-Valls
Comments: 5 Pages.
We introduce a Particle Metropolis-Hastings algorithm driven by several parallel particle filters. The communication with the central node requires the transmission of only a set of weighted samples, one per filter. Furthermore, the marginal version of the previous scheme, called Distributed Particle Marginal Metropolis-Hastings (DPMMH) method, is also presented. DPMMH can be used for making inference on both a dynamical and static variable of interest. The ergodicity is guaranteed, and numerical simulations show the advantages of the novel schemes.
Category: Statistics
[279] viXra:2012.0032 [pdf] submitted on 2020-12-05 22:19:11
Authors: L. Martino, V. Laparra, G. Camps-Valls
Comments: 5 Pages.
Gaussian Processes (GPs) are state-of-the-art tools for regression. Inference of GP hyperparameters is typically done by maximizing the marginal log-likelihood (ML). If the data truly follow the GP model, the ML approach is optimal and computationally efficient. Unfortunately, very often this is not the case, and suboptimal results are obtained in terms of prediction error. Alternative procedures such as cross-validation (CV) schemes are often employed instead, but they usually incur high computational costs. We propose a probabilistic version of CV (PCV) based on two different model pieces in order to reduce the dependence on a specific model choice. PCV presents the benefits of both approaches, and allows us to find the solution for either the maximum a posteriori (MAP) or the Minimum Mean Square Error (MMSE) estimators. Experiments in controlled situations reveal that the PCV solution outperforms ML for both estimators, and that PCV-MMSE results outperform other traditional approaches.
Category: Statistics
[278] viXra:2012.0031 [pdf] submitted on 2020-12-05 22:21:01
Authors: L. Martino, V. Elvira, G. Camps-Valls
Comments: 5 Pages.
Gibbs sampling is a well-known Markov chain Monte Carlo (MCMC) algorithm, extensively used in signal processing, machine learning and statistics. The key point for the successful application of the Gibbs sampler is the ability to draw samples from the full-conditional probability density functions efficiently. In the general case this is not possible, so in order to speed up the convergence of the chain, it is required to generate auxiliary samples. However, such intermediate information is finally disregarded. In this work, we show that these auxiliary samples can be recycled within the Gibbs estimators, improving their efficiency with no extra cost. Theoretical and exhaustive numerical comparisons show the validity of the approach.
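A stripped-down Python illustration of the recycling idea on a bivariate Gaussian, where at each coordinate update several draws from the full conditional are all kept in the estimator while only the last one moves the chain; in the paper the auxiliary samples arise inside internal MCMC or rejection steps, so this direct-sampling version is only a sketch of the principle.

import numpy as np

# Recycling Gibbs sketch for a bivariate Gaussian with correlation rho.
rho, M, T = 0.95, 10, 2000
rng = np.random.default_rng(0)
x1, x2 = 0.0, 0.0
recycled = []
for _ in range(T):
    # Full conditional x1 | x2 ~ N(rho*x2, 1 - rho^2); keep all M draws.
    draws1 = rng.normal(rho * x2, np.sqrt(1 - rho ** 2), size=M)
    recycled.extend((d, x2) for d in draws1)
    x1 = draws1[-1]                              # only the last draw moves the chain
    draws2 = rng.normal(rho * x1, np.sqrt(1 - rho ** 2), size=M)
    recycled.extend((x1, d) for d in draws2)
    x2 = draws2[-1]
recycled = np.array(recycled)                    # no burn-in discarded, for brevity
print(recycled.mean(axis=0), np.corrcoef(recycled.T)[0, 1])  # approx (0, 0) and rho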
Category: Statistics
[277] viXra:2012.0030 [pdf] submitted on 2020-12-05 22:23:48
Authors: D. Heestermans Svendsen, L. Martino, J. Vicent, G. Camps-Valls
Comments: 4 Pages.
This paper introduces a methodology to construct emulators of costly radiative transfer models (RTMs). The proposed methodology is sequential and adaptive, and it is based on the notion of acquisition functions in Bayesian optimization. Here, instead of optimizing the unknown underlying RTM function, one aims to achieve accurate approximations. The Automatic Multi-Output Gaussian Process Emulator (AMOGAPE) methodology combines the interpolation capabilities of Gaussian processes (GPs) with the accurate design of an acquisition function that favors sampling in low density regions and flatness of the interpolation function. We illustrate the promising capabilities of the method for the construction of an emulator for a standard leaf-canopy RTM.
Category: Statistics
[276] viXra:2011.0183 [pdf] submitted on 2020-11-26 11:08:04
Authors: Sergey L. Cherkas
Comments: 4 Pages. Both English and Russian versions of the paper
A Monte Carlo simulation is performed to calculate the probability of the difference between the average value in some random sample and the average over the total set. A method for analyzing the nature of peculiarities in the probability distribution functions is suggested. The method consists in comparing the probability distribution functions for the percentage and for the number of voters for Mr. Lukashenko at each polling station.
Category: Statistics
[275] viXra:2011.0015 [pdf] submitted on 2020-11-02 21:38:17
Authors: Tsutomu Matsuura, Hiroshi Okumura, Saburou Saitoh
Comments: 21 Pages.
Professor Rolin Zhang kindly invited us to The 6th Int'l Conference on Probability and Stochastic Analysis (ICPSA 2021), January 5-7, 2021 in Sanya, China, as keynote speakers, and so we will state the basic interrelations between reproducing kernels and division by zero from the viewpoint of the conference topics. The connections between reproducing kernels and probability and stochastic analysis are already fundamental and well known, and so we will mainly refer to the basic relations with our new division by zero $1/0=0/0=z/0=\tan(\pi/2) =\log 0 =0, [(z^n)/n]_{n=0} = \log z$, $[e^{(1/z)}]_{z=0} = 1$.
Category: Statistics
[274] viXra:2010.0257 [pdf] submitted on 2020-10-31 19:46:07
Authors: Russell Leidich
Comments: 9 Pages. [Corrections made by viXra Admin to conform with the requirements on the Submission Form]
Hidden Markov models (HMMs) are a class of generative stochastic process models which seek to explain, in the simplest possible terms subject to inherent structural constraints, a set of equally long sequences (time series) of observations. Given such a set, an HMM can be trivially constructed which will reproduce the set exactly. Such an approach, however, would amount to overfitting the data, yielding a model that fails to generalize to new observations of the same physical system under analysis. It is therefore important to consider the information cost (entropy) of describing the HMM itself – not just the entropy of reproducing the observations, which would be zero in the foregoing extreme case, but in general would be the negative log of the probability of such reproduction occurring by chance. The sum of these entropies would then be suitable for the purpose of ranking a set of candidate HMMs by their respective likelihoods of having actually generated the observations in the first place. To the author's knowledge, however, no approach has yet been derived for the purpose of measuring HMM entropy from first principles, which is the subject of this paper, notwithstanding the popular use of the Bayesian information criterion (BIC) for this purpose.
Category: Statistics
[273] viXra:2010.0002 [pdf] submitted on 2020-10-01 10:42:20
Authors: Arturo Tozzi
Comments: 9 Pages.
Physical and biological phenomena are often portrayed in terms of random walks, white noise, Markov paths, and stochastic trajectories with subsequent symmetry breaks. Here we show that this approach from dynamical systems theory is not profitable when random walks occur in phase spaces of dimension higher than two. The more dimensions there are, the more the (seemingly) stochastic paths are constrained, because their trajectories cannot return to the starting point. This means that high-dimensional tracks, ubiquitous in real-world physical/biological phenomena, cannot be operationally treated in terms of closed paths, symplectic manifolds, Betti numbers, the Jordan theorem, or topological vortexes. It also means that memoryless events disconnected from the past, such as Markov chains, cannot exist in high dimensions. Having expunged the operational role of random walks in the assessment of experimental phenomena, we aim to somewhat "redeem" stochasticity. We suggest two methodological accounts alternative to random walks that partially rescue the operational role of white noise and Markov chains: the first option is to assess multidimensional systems in lower dimensions, the second is to establish a different role for random walks. We describe the two alternatives at length and provide heterogeneous examples from boosting chemistry, tunneling nanotubes, backward entropy, and chaotic attractors.
Category: Statistics
[272] viXra:2009.0135 [pdf] submitted on 2020-09-19 11:03:03
Authors: L. Martino, J. Read
Comments: 50 Pages.
The expressive power of Bayesian kernel-based methods has led them to become an important tool across many different facets of artificial intelligence, and useful to a plethora of modern application domains, providing both power and interpretability via uncertainty analysis. This article introduces and discusses two methods which straddle the areas of probabilistic Bayesian schemes and kernel methods for regression: Gaussian Processes and Relevance Vector Machines. Our focus is on developing a common framework with which to view these methods, via an intermediate method, a probabilistic version of the well-known kernel ridge regression, on drawing connections among them via dual formulations, and on discussing their application in the context of major tasks: regression, smoothing, interpolation, and filtering. Overall, we provide an understanding of the mathematical concepts behind these models, and we summarize and discuss in depth different interpretations, highlighting the relationships to other methods such as linear kernel smoothers, Kalman filtering and Fourier approximations. Throughout, we provide numerous figures to promote understanding, and we make numerous recommendations to practitioners. Benefits and drawbacks of the different techniques are highlighted. To our knowledge, this is the most in-depth study of its kind to date focused on these two methods, and will be relevant to theoretical understanding and practitioners throughout the domains of data science, signal processing, machine learning, and artificial intelligence in general.
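As a pointer to the bridge mentioned above, a minimal Python sketch of kernel ridge regression with an RBF kernel, whose prediction coincides with the posterior mean of a GP with noise variance equal to the ridge parameter; the kernel, length-scale, and toy data are illustrative assumptions.

import numpy as np

def krr_fit_predict(X, y, Xs, ell=1.0, lam=1e-2):
    # Kernel ridge regression: prediction Ks (K + lam I)^{-1} y,
    # identical to the GP posterior mean with noise variance lam.
    sq = lambda a, b: (a[:, None] - b[None, :]) ** 2
    K = np.exp(-0.5 * sq(X, X) / ell ** 2)
    Ks = np.exp(-0.5 * sq(Xs, X) / ell ** 2)
    return Ks @ np.linalg.solve(K + lam * np.eye(len(X)), y)

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 2 * np.pi, 40))
y = np.sin(X) + 0.1 * rng.normal(size=40)
Xs = np.linspace(0, 2 * np.pi, 5)
print(krr_fit_predict(X, y, Xs))   # approximately sin(Xs)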
Category: Statistics
[271] viXra:2009.0082 [pdf] submitted on 2020-09-12 12:49:17
Authors: Krish Bajaj
Comments: 10 Pages.
This paper aims to highlight the prominent position of statistics as a foundational pillar for descriptive and inferential statistical analysis, used to deduce underlying patterns in a population by looking at a sample drawn from it. It focuses on the intuitive aspects of the statistical tools and on their relevance and applicability. The paper concludes by highlighting some common misconceptions and misuses of statistics.
Category: Statistics
[270] viXra:2008.0131 [pdf] submitted on 2020-08-18 20:30:23
Authors: Glenn Healey, Lequan Wang
Comments: 24 Pages.
An accurate model for the lift force on a baseball is important for several applications. The precision of previous models has been limited by the use of small samples of measurements acquired in controlled experiments. The increased prevalence of ball-tracking radar systems provides an abundant source of data for modeling, but the effective use of these data requires overcoming several challenges. We develop a new model that uses this radar data and is constrained by the physical principles and measurements derived from the controlled experiments. The modeling process accounts for the uncertainty in different data sources while exploiting the size and diversity of the radar measurements to mitigate the effects of systematic biases, outliers, and the lack of geometric information that is typically available in controlled experiments. Fine-grained weather data is associated with each radar measurement to enable compensation for the local air density. We show that the new model is accurate enough to capture changes in lift due to small changes in surface roughness which could not be discerned by previous models.
Category: Statistics
[269] viXra:2008.0107 [pdf] submitted on 2020-08-15 11:37:29
Authors: Idd Sifael Omary, Ngong-homa Jackson, Timothy A. Peter
Comments: 35 Pages. BSc. (Mathematics and Statistics) Research Report Mwenge Catholic University July, 2016.
Completion rate and enrollment forecasting is an essential element in budgeting, resource allocation, and the overall planning for the growth of the education sector. Our paper demonstrates the use of Markov chain techniques in studying the progression of BSMST programme students from the time of entry/enrollment in each academic year to graduation after the expected years of study at MWECAU. The target population included all BSMST programme students at MWECAU from 2013 to 2015. The model was used to determine the students' completion/dropout rates, retention rates, and the expected duration of completion by sex. We established the completion rates for male and female students as well as the dropout rates. We examined how long the Markov transition probability matrices of BSMST students at MWECAU take to reach a steady state, and how the established completion and dropout rates behave as absorbing states. We also compared female and male students' expected duration of university education in the BSMST programme. The model is only suitable for making short-period projections.
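A small Python sketch of the absorbing Markov chain machinery underlying such an analysis: with transient-to-transient block Q and transient-to-absorbing block R, the fundamental matrix N = (I - Q)^{-1} gives absorption probabilities B = N R and expected times to absorption t = N 1. The transition probabilities below are purely hypothetical, not the MWECAU data.

import numpy as np

# Hypothetical transition blocks over transient states (Year1, Year2, Year3)
# and absorbing states (Graduate, Dropout); each full row sums to 1.
Q = np.array([[0.10, 0.80, 0.00],    # Year1 -> (repeat Year1, Year2, Year3)
              [0.00, 0.08, 0.85],    # Year2 -> (Year1, repeat Year2, Year3)
              [0.00, 0.00, 0.05]])   # Year3 -> (Year1, Year2, repeat Year3)
R = np.array([[0.00, 0.10],          # Year1 -> (Graduate, Dropout)
              [0.00, 0.07],          # Year2 -> (Graduate, Dropout)
              [0.90, 0.05]])         # Year3 -> (Graduate, Dropout)

N = np.linalg.inv(np.eye(3) - Q)     # fundamental matrix
B = N @ R                            # completion/dropout probabilities per entry year
t = N @ np.ones(3)                   # expected remaining duration per state
print("P(graduate | enter Year1) =", B[0, 0])
print("Expected years to absorption from Year1 =", t[0])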
Category: Statistics
[268] viXra:2008.0065 [pdf] submitted on 2020-08-10 16:54:00
Authors: Philippe Hottier, Abdelmajid Ben Hadj Salem
Comments: 137 Pages. In French. Comments welcome.
This is a digital version of a manuscript of a course on the theory of errors given by Engineer-in-Chief Philippe Hottier in the 1980s at the French National School of Geographic Sciences. The course gives the foundations of the method of least squares for the case of linear models.
Category: Statistics
[267] viXra:2007.0240 [pdf] submitted on 2020-07-30 21:01:00
Authors: D Williams
Comments: 15 Pages.
An alternative model of probability theory is given and compared with the standard version. Difficulties in extending the Central Limit Theorem to sums of random variables (rather than averages) are shown and then resolved using the new model and dx-less integrals. Some new types of sample means are proposed and tested against the standard version.
Category: Statistics
[266] viXra:2006.0023 [pdf] submitted on 2020-06-03 09:40:46
Authors: Vikas Ramachandra
Comments: 14 Pages.
The exponential spread of the COVID-19 pandemic has caused countries to impose drastic measures on the public including social distancing, movement restrictions and lockdowns. These government interventions have led to different mobility patterns for the populations. We propose a method of causal inference using community mobility datasets to determine the treatment effects of government interventions on population mobility related outcomes. We first identify the changepoint based on the data of government interventions. We also perform changepoint detection to verify that there is indeed a changepoint at the time of intervention. Then we estimate the mobility trends using a Bayesian structural causal model and project the counterfactual. This is compared to the actual values after interventions to give the treatment effect of interventions. As a specific example, we analyze mobility trends in India before and after interventions. Our analysis shows that there are significant changes in mobility due to government interventions. Our paper aims to provide insights into changes in response to government measures and we hope that it is helpful to those making critical decisions to combat COVID-19.
Category: Statistics
[265] viXra:2006.0014 [pdf] submitted on 2020-06-01 12:18:06
Authors: Ilija Barukčić
Comments: 27 pages. (C) Ilija Barukčić, 2020, Jever, Germany. All rights reserved.
Aims: Different processes or events which are objectively given and real are equally among the foundations (necessary conditions) of human life. However, a generally accepted, logically consistent (bio-)mathematical description of these natural processes is still not in sight.
Methods: Discrete random variables are analysed.
Results: The mathematical formula of the necessary condition is developed. The impact of study design on the results of a study is considered.
Conclusion: Study data can be analysed for necessary conditions.
Category: Statistics
[264] viXra:2005.0215 [pdf] submitted on 2020-05-21 20:15:49
Authors: Huda E. Khalid, Ahmed K. Essa
Comments: 167 Pages. ISBN: 978-1-59973-906-9
Although neutrosophic statistics was defined as early as 1996, and then published in 1998 in the book "Neutrosophy / Neutrosophic Logic, Set, and Probability", it has received little attention and development to this day. The same was the case for neutrosophic probability, apart from a few scattered articles whose modest development hardly does justice to the comprehensiveness of the underlying idea, published in 2013 in the book "Introduction to Neutrosophic Measure, Neutrosophic Integral, and Neutrosophic Probability".
Neutrosophic statistics is an extended concept of classical statistics: it deals with set values instead of crisp values, so that in most classical statistical equations and formulas it is easy to replace single numbers with sets. That is, operations are carried out on sets instead of on numbers, using indeterminate parameters (imprecise, uncertain, or even completely unknown) instead of the standard parameters used in classical statistics.
Category: Statistics
[263] viXra:2005.0182 [pdf] submitted on 2020-05-17 18:03:58
Authors: Tobias Martens, Wieland Lühder
Comments: 3 Pages.
The life expectancy of the currently living German population is calculated per age and as a weighted average. The same calculation is repeated after assuming everyone is infected with, and potentially killed by, SARS-CoV-2 within one year, given the current age-dependent lethality estimates from a study at Imperial College London [1]. For an average life expectancy of 83.0 years in the current population, the reduction due to SARS-CoV-2 infection amounts to 2.0 (1.1-3.9) months. The individual values show a maximum of 7.7 (4.4-15.2) months for a 70-year-old. People below age 50 lose less than 1 month on average.
Category: Statistics
[262] viXra:2004.0452 [pdf] submitted on 2020-04-19 11:42:28
Authors: Ilija Barukčić
Comments: 17 Pages. (C) Ilija Barukčić, 2019, Jever, Germany. All rights reserved.
Aim: The relationship between the Epstein-Barr virus and multiple sclerosis is assessed once again in order to gain a better understanding of this disease.
Methods: A systematic review and meta-analysis is provided, aimed to answer, among others, the following question: is there a cause-effect relationship between the Epstein-Barr virus and multiple sclerosis? The conditio sine qua non relationship was used to prove the hypothesis: without an Epstein-Barr virus infection, no multiple sclerosis. The mathematical formula of the causal relationship k was used to prove the hypothesis of a cause-effect relationship between Epstein-Barr virus infection and multiple sclerosis. Significance was indicated by a p-value of less than 0.05.
Results: The data of the studies analysed provide evidence that an Epstein-Barr virus infection is a necessary condition (a conditio sine qua non) of multiple sclerosis. More than that, the data provide impressive evidence of a cause-effect relationship between Epstein-Barr virus infection and multiple sclerosis.
Conclusion: Multiple sclerosis is caused by an Epstein-Barr virus infection.
Category: Statistics
[261] viXra:2004.0425 [pdf] submitted on 2020-04-17 13:15:53
Authors: Luca Martino
Comments: 7 Pages.
We propose a new Monte Carlo technique for Bayesian inversion problems. The power of the noise perturbation in the observation model is estimated jointly with the rest of the parameters, and it is also used as a tempering parameter. Hence, a sequence of tempered posterior densities is considered, where the tempering parameter is automatically selected according to the current estimate of the power of the noise perturbation.
Category: Statistics
[260] viXra:2004.0060 [pdf] submitted on 2020-04-02 23:05:18
Authors: Tao Guo
Comments: 11 Pages.
It has been more than 100 years since the advent of special relativity, but the reasons behind the related phenomena are still unknown. This article aims to inspire people to think about such problems. With the help of Mathematica software, I have demonstrated the following statistically: in 3-dimensional Euclidean space, for point particles whose speeds are c and whose directions are uniformly distributed in space (assuming these particles' reference system is R0 if their average velocity is 0), when some particles (assuming their reference system is Ru), as a particle swarm, move in a certain direction with a group speed u (i.e., the norm of the average velocity) relative to R0, their (or the sub-particle swarm's) average speed relative to Ru is slower than that of particles (or the same-scale sub-particle swarm) in R0 relative to R0. The degree of slowing depends on the speed u of Ru and accords with the quantitative relationship $\sqrt{c^{2}-u^{2}}$ described by the Lorentz factor.
Category: Statistics
[259] viXra:2003.0340 [pdf] submitted on 2020-03-16 13:55:18
Authors: Glenn Healey
Comments: 20 Pages.
Outcome-based statistics for representing batter and pitcher skill have been shown to have a low degree of repeatability due to the effects of multiple confounding variables such as the defense, weather, and ballpark. Statistics derived from pitch and hit-tracking data acquired by the Statcast system have been shown to provide greater repeatability and predictive value than outcome-based statistics. The wOBA cube representation uses three-dimensional hit-tracking data to compute intrinsic batted ball statistics for batters and pitchers. While providing more reliable measures than outcome-based statistics, this representation also revealed that running speed is an important determinant of batter success. We address this issue by building a four-dimensional model for a batted ball's value as a function of its physical contact parameters and the batter's time-to-first speed.
Category: Statistics
[258] viXra:2002.0368 [pdf] submitted on 2020-02-19 13:31:17
Authors: Ilija Barukčić
Comments: 9 Pages. (C) Ilija Barukčić, 2019, Jever, Germany. All rights reserved.
Many different measures of association are used in the medical literature; the relative risk is one of these measures. However, to judge whether the results of studies are reliable, it is essential to use, among others, measures of association which are logically consistent. In this paper, we present how to deal with one of the most commonly used measures of association, the relative risk. The conclusion is inescapable that the relative risk is logically inconsistent and should not be used any longer.
Category: Statistics
[257] viXra:2001.0650 [pdf] submitted on 2020-01-29 12:50:56
Authors: Abdelmajid Ben Hadj Salem
Comments: 9 Pages. In French.
In this paper, we present an example of the use of the least-squares method in topographic and surveying works.
Category: Statistics
[256] viXra:2001.0052 [pdf] submitted on 2020-01-04 16:39:29
Authors: F. Llorente, L. Martino, D. Delgado, J. Lopez-Santiago
Comments: 58 Pages.
This is an up-to-date introduction to, and overview of, marginal likelihood computation for model selection and hypothesis testing. Computing normalizing constants of probability models (or ratios of constants) is a fundamental issue in many applications in statistics, applied mathematics, signal processing and machine learning. This article provides a comprehensive study of the state of the art of the topic. We highlight limitations, benefits, connections and differences among the different techniques. Problems and possible solutions with the use of improper priors are also described. Some of the most relevant methodologies are compared through theoretical comparisons and numerical experiments.
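For orientation, a Python sketch of the most naive marginal likelihood estimator discussed in such surveys, averaging the likelihood over prior draws, checked against the closed form available in a conjugate Gaussian toy model; the model, prior scale, and sample sizes are illustrative assumptions.

import numpy as np
from scipy.stats import norm, multivariate_normal
from scipy.special import logsumexp

# Toy model: y_i ~ N(theta, 1) with prior theta ~ N(0, 10^2).
rng = np.random.default_rng(0)
y = rng.normal(1.5, 1.0, size=20)

# Naive Monte Carlo: Z = E_prior[p(y | theta)], averaging the likelihood over
# prior draws (inefficient for vague priors, one motivation for better methods).
thetas = rng.normal(0.0, 10.0, size=100000)
loglik = norm.logpdf(y[None, :], thetas[:, None], 1.0).sum(axis=1)
logZ_mc = logsumexp(loglik) - np.log(len(thetas))

# Exact value for this conjugate model: y ~ N(0, I + 100 * 11^T).
logZ_exact = multivariate_normal.logpdf(
    y, mean=np.zeros(20), cov=np.eye(20) + 100.0 * np.ones((20, 20)))
print(logZ_mc, logZ_exact)   # the two values should be close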
Category: Statistics
[255] viXra:2001.0003 [pdf] submitted on 2020-01-01 06:38:26
Authors: Robert A. Herrmann
Comments: 10 Pages.
In this paper, we show how nonstandard consequence operators, ultralogics, can generate the general informational content displayed by probability models. In particular, a model that states a specific probability that an event will occur and those models that use a specific distribution to predict that an event will occur. These results have many diverse applications and even apply to the collapse of the wave function.
Category: Statistics
[254] viXra:1912.0129 [pdf] submitted on 2019-12-06 12:00:41
Authors: Ilija Barukčić
Comments: 29 Pages. (C) Ilija Barukčić, 2019, Jever, Germany. All rights reserved.
Objective: To date, it is quite common to claim that some patient groups benefit from statin therapy in both primary and secondary prevention of cardiovascular disease while equally the use of higher-intensity statin therapies is emphasized. In this Review, the efficacy of statin therapy in light of the study data available is explored.
Methods: All in all, 40 studies with a sample size of n = 88388 were re-analyzed. The exclusion relationship was used to test the null hypothesis: a certain statin excludes death due to any cause. The causal relationship k was used to test the data for causality. The level of significance was set to alpha = 0.05.
Results: The data of the studies reanalyzed provide convincing evidence that statins unfortunately do not exclude death due to any cause. An immediate statin therapy discontinuation should be considered.
Conclusions: Overwhelming evidence suggests that the potential harmful effects of statin therapy far outweigh any real or perceived benefit.
Keywords: Statins, death, causal relationship.
Category: Statistics
[253] viXra:1911.0237 [pdf] submitted on 2019-11-13 13:18:58
Authors: Ilija Barukčić
Comments: 26 Pages. (C) Ilija Barukčić, 2019, Jever, Germany. All rights reserved.
Objective: To our knowledge, no study has provided strict evidence of a clear relationship between a human cytomegalovirus (HCMV) infection and human essential hypertension (EH).
Methods: To examine the possible role of HCMV in the etiology of EH, a literature search of the electronic database PubMed was performed. Data were accurately assessed and re-analyzed by new statistical methods.
Results: The meta-analysis results of this study provide evidence that HCMV infection and essential hypertension are connected.
Conclusions: Without HCMV infection no EH.
Keywords: Human cytomegalovirus, essential hypertension, causal relationship.
Category: Statistics
[252] viXra:1911.0184 [pdf] submitted on 2019-11-10 04:05:32
Authors: Ilija Barukčić
Comments: 17 pages. Copyright © 2018 by Ilija Barukčić, Jever, Germany. Published by:
Objective: Despite decades of research and major efforts, a cause or the cause of schizophrenia is still not identified. Although many studies indicate that infectious agents are related to schizophrenia no definite consensus has been reached on this issue.
Methods: The purpose of this study was to investigate the relationship between the varicella zoster virus (VZV) and schizophrenia while relying on new statistical methods.
Results: The meta-analysis results provide striking evidence that VZV is a necessary condition of schizophrenia.
Conclusions: There is some weak evidence that VZV infection is the cause of schizophrenia.
Keywords: Varicella zoster virus, schizophrenia, causal relationship.
Category: Statistics
[251] viXra:1911.0024 [pdf] submitted on 2019-11-01 11:17:10
Authors: Ilija Barukčić
Comments: 21 pages.
Objective: Sometimes there are circumstances where it is necessary to calculate the P value of extremely likely events xt, such as p(xt) = 1, while reliable methods are rare.
Methods: A systematic approach to the problem of the P values of extremely likely events is provided.
Results: New theorems for calculating P values of extremely likely events are developed.
Conclusions: It is possible to calculate the P values even of extreme events.
Keywords: P Value, likely events, cause, effect, causal relationship.
Category: Statistics
[250] viXra:1910.0656 [pdf] submitted on 2019-10-31 17:01:19
Authors: Kouider Mohammed Ridha
Comments: 6 Pages.
The generalized Pareto distribution (GPD) has been widely used in extreme value analysis, for example to model exceedances over a threshold. A feature of the GPD is that, when applied to real data sets, the results depend substantially and clearly on the parameter estimation process. Estimation is mostly done by maximum likelihood, because it yields a consistent estimator with low bias and variance. The objective of the present study is to develop efficient estimation methods for the maximum likelihood estimator of the shape parameter, or extreme value index, based on numerical methods for maximizing the log-likelihood; we introduce an algorithm for computing the maximum likelihood estimates of the GPD parameters. Finally, numerical examples are given to illustrate the obtained results; they are carried out to investigate the behavior of the method.
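A minimal Python sketch of numerical maximum likelihood for the GPD shape and scale (valid for shape xi != 0), of the general kind the abstract describes; the simulated data, the parameterization through log(sigma), and the optimizer choice are assumptions, not the authors' algorithm.

import numpy as np
from scipy.optimize import minimize
from scipy.stats import genpareto

def gpd_nll(params, x):
    # Negative log-likelihood of the GPD for exceedances x > 0 (xi != 0);
    # sigma is parameterized through its logarithm to enforce positivity.
    xi, log_sigma = params
    sigma = np.exp(log_sigma)
    z = 1.0 + xi * x / sigma
    if np.any(z <= 0):                 # outside the support
        return np.inf
    return len(x) * log_sigma + (1.0 + 1.0 / xi) * np.sum(np.log(z))

x = genpareto.rvs(c=0.25, scale=2.0, size=2000, random_state=0)
res = minimize(gpd_nll, x0=np.array([0.1, 0.0]), args=(x,), method="Nelder-Mead")
xi_hat, sigma_hat = res.x[0], np.exp(res.x[1])
print(xi_hat, sigma_hat)               # close to (0.25, 2.0)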
Category: Statistics
[249] viXra:1910.0219 [pdf] submitted on 2019-10-13 10:03:32
Authors: Ilija Barukčić
Comments: 46 Pages.
Objective: The possible involvement of viruses, specifically Herpes simplex virus type 1 (HSV-1), in senile dementia of the Alzheimer type has been investigated by numerous publications. Over 120 publications are providing direct or indirect evidence of a potential relationship between Herpes simplex virus type 1 and Alzheimer’s disease (AD) but a causal relation is still not established yet.
Methods: A systematic review and re-analysis of studies which investigated the relationship between HSV-1 and AD by HSV-1 immunoglobulin G (IgG) serology and polymerase chain reaction (PCR) methods was conducted. The method of the conditio sine qua non relationship (SINE) was used to prove the hypothesis: without HSV-1 infection of the human brain, no AD. The method of the conditio per quam relationship (IMP) was used to prove the hypothesis: if HSV-1 infection of the human brain, then AD. The mathematical formula of the causal relationship k was used to test whether there is a cause-effect relationship between HSV-1 and AD. Significance was indicated by a p-value of less than 0.05.
Results: The studies analyzed were able to provide strict evidence that HSV-1 is a necessary condition (a conditio sine qua non), a sufficient condition and a necessary and sufficient condition of AD. Furthermore, the cause-effect relationship between HSV-1 and AD was highly significant.
Conclusions: The data analyzed provide sufficient evidence to conclude that HSV-1 is the cause of AD.
Keywords: Herpes simplex virus type 1, Alzheimer’s disease, causal relationship.
Category: Statistics
[248] viXra:1909.0376 [pdf] submitted on 2019-09-17 06:56:34
Authors: Fabrice J. P. R. Pautot
Comments: 17 Pages.
We present a simple, fully probabilistic, Bayesian solution to k-sample omnibus tests for comparison, with the Behrens-Fisher problem as a special case, which is free from the many defects found in the standard, classical frequentist, likelihoodist and Bayesian approaches to those problems. We solve the main measure-theoretic difficulty for degenerate problems with continuous parameters of interest and Lebesgue-negligible point null hypothesis by approximating the corresponding continuous random variables by sequences of discrete ones defined on partitions of the parameter spaces and by taking the limit of the prior-to-posterior ratios of the probability of the null hypothesis for the corresponding discrete problems. Those limits are well defined under proper technicalities thanks to the Henstock-Kurzweil integral that is as powerful as the Lebesgue integral but still relies on Riemann sums, which are essential in the present approach. The solutions to the relative continuous problems take the form of Bayes-Poincaré factors that are new objects in Bayesian probability theory and should play a key role in the general theory of point null hypothesis testing, including other important problems such as the Jeffreys-Lindley paradox.
Category: Statistics
[247] viXra:1908.0288 [pdf] submitted on 2019-08-15 11:43:49
Authors: Glenn Healey, Shiyuan Zhao
Comments: 16 Pages.
We present a method for learning a function over distributions. The method is based on generalizing nonparametric kernel regression by using the earth mover's distance as a metric for distribution space. The technique is applied to the problem of learning the dependence of pitcher performance in baseball on multidimensional pitch distributions that are controlled by the pitcher. The distributions are derived from sensor measurements that capture the physical properties of each pitch. Finding this dependence allows the recovery of optimal pitch frequencies for individual pitchers. This application is amenable to the use of signatures to represent the distributions and a whitening step is employed to account for the correlations and variances of the pitch variables. Cross validation is used to optimize the kernel smoothing parameter. A set of experiments demonstrates that the method accurately predicts changes in pitcher performance in response to changes in pitch distribution.
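A toy Python sketch of Nadaraya-Watson regression over one-dimensional empirical distributions with the earth mover's distance as the metric; the paper works with multidimensional pitch distributions, signatures, and a whitening step, none of which are reproduced here, and the bandwidth and data are illustrative.

import numpy as np
from scipy.stats import wasserstein_distance

def emd_kernel_regression(train_samples, train_y, test_samples, h=0.5):
    # Nadaraya-Watson over distributions: a Gaussian kernel applied to the
    # 1-D earth mover's distance between empirical samples.
    preds = []
    for s in test_samples:
        d = np.array([wasserstein_distance(s, t) for t in train_samples])
        w = np.exp(-0.5 * (d / h) ** 2)
        preds.append(np.sum(w * train_y) / np.sum(w))
    return np.array(preds)

# Toy task: predict the mean of the distribution each sample was drawn from.
rng = np.random.default_rng(0)
mus = rng.uniform(-2, 2, size=60)
train = [rng.normal(m, 1.0, size=200) for m in mus]
test_mus = np.array([-1.0, 0.0, 1.5])
test = [rng.normal(m, 1.0, size=200) for m in test_mus]
print(emd_kernel_regression(train, mus, test))   # approximately test_mus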
Category: Statistics
[246] viXra:1907.0430 [pdf] submitted on 2019-07-24 05:53:26
Authors: Ilija Barukčić
Comments: 24 pages. Copyright © 2019 by Ilija Barukčić, Jever, Germany. All rights reserved. Published by:
Objective. Under certain circumstances, the results of multiple investigations – particularly, rigorously-designed trials, can be summarized by systematic reviews and meta-analyses. However, the results of properly conducted meta-analyses can but need not be stronger than single investigations, if (publication) bias is not considered to a necessary extent.
Methods. To assess the significance of publication bias due to study design, simple-to-handle statistical measures for quantifying publication bias are developed and discussed, which can be used as a characteristic of a meta-analysis. In addition, these measures may permit comparisons of publication biases between different meta-analyses.
Results. Various properties and the performance of the new measures of publication bias are studied and illustrated using simulations and clearly described thought experiments. As a result, individual studies can be reviewed with a higher degree of certainty.
Conclusions. Publication bias due to study design is a serious problem in scientific research, which can affect the validity and generalization of conclusions. The index of unfairness and the index of independence are of use to quantify publication bias and to improve the quality of systematic reviews and meta-analyses.
Keywords: study design, study type, measuring technique, publication bias
Category: Statistics
[245] viXra:1907.0077 [pdf] submitted on 2019-07-04 06:22:03
Authors: Jianwen Huang, Xinling Liu, Jianjun Wang
Comments: 13 Pages.
The generalized Maxwell distribution is an extension of the classic Maxwell distribution. In this paper, we concentrate on the joint distributional asymptotics of normalized maxima and minima. Under optimal normalizing constants, asymptotic expansions of the joint distribution and density of normalized partial maxima and minima are established. These expansions are used to deduce the speeds of convergence of the joint distribution and density of normalized maxima and minima to their corresponding ultimate limits. Numerical analyses are provided to support our results.
Category: Statistics
[244] viXra:1906.0370 [pdf] submitted on 2019-06-19 09:29:49
Authors: Rafif Alhabib, Moustafa Mzher Ranna, Haitham Farah, A.A. Salama
Comments: 169 Pages.
The essential core of our research is the application of neutrosophic logic to a part of classical probability theory, by presenting classical probability and some probability distributions according to neutrosophic logic, and then studying the effect of using this logic on the decision-making process, with continuous comparison between classical logic and neutrosophic logic through the studies and results.
The thesis comprises five chapters.
Category: Statistics
[143] viXra:2406.0055 [pdf] replaced on 2024-06-26 08:31:05
Authors: L. Martino, F. Llorente
Comments: 22 Pages.
Improper priors are not allowed for the computation of the Bayesian evidence Z = p(y) (a.k.a., the marginal likelihood), since in this case Z is not completely specified due to an arbitrary constant involved in the computation. However, in this work, we remark that they can be employed in a specific type of model selection problem: when we have several (possibly infinitely many) models belonging to the same parametric family (i.e., for tuning the parameters of a parametric model). However, the quantities involved in this type of selection cannot be considered Bayesian evidences: we suggest the name "fake evidences" (or "areas under the likelihood" in the case of uniform improper priors). We also show that, in this model selection scenario, by using a vague prior and increasing its scale parameter asymptotically to infinity, we cannot recover the value of the area under the likelihood obtained with a uniform improper prior. We first discuss this from a general point of view. Then we provide, as an applicative example, all the details for Bayesian regression models with nonlinear bases, considering two cases: the use of a uniform improper prior and the use of a Gaussian prior, respectively. A numerical experiment is also provided, confirming and checking all the previous statements.
Category: Statistics
[142] viXra:2310.0032 [pdf] replaced on 2024-08-04 20:54:26
Authors: E. Curbelo, L. Martino, F. Llorente, D. Delgado-Gomez
Comments: 45 Pages.
In this paper we address the problem of performing Bayesian inference for the parameters of a nonlinear multi-output model and the covariance matrix of the different output signals. We propose an adaptive importance sampling (AIS) scheme for multivariate Bayesian inversion problems, which is based on two main ideas: the variables of interest are split in two blocks and the inference takes advantage of known analytical optimization formulas. We estimate both the unknown parameters of the multivariate non-linear model and the covariance matrix of the noise. In the first part of the proposed inference scheme, a novel AIS technique called adaptive target AIS (ATAIS) is designed, which alternates iteratively between an IS technique over the parameters of the nonlinear model and a frequentist approach for the covariance matrix of the noise. In the second part of the proposed inference scheme, a prior density over the covariance matrix is considered and the cloud of samples obtained by ATAIS are recycled and re-weighted for obtaining a complete Bayesian study over the model parameters and covariance matrix. ATAIS is the main contribution of the work. Additionally, the inverted layered importance sampling (ILIS) is presented as a possible compelling algorithm (but based on a conceptually simpler idea). Different numerical examples show the benefits of the proposed approaches.
Category: Statistics
[141] viXra:2310.0032 [pdf] replaced on 2024-02-06 21:09:42
Authors: E. Curbelo, L. Martino, F. Llorente, D. Delgado-Gomez
Comments: 28 Pages.
In this paper we address the problem of performing Bayesian inference for the parameters of a nonlinear multi-output model and the covariance matrix of the different output signals. We propose an adaptive importance sampling (AIS) scheme for multivariate Bayesian inversion problems, which is based on two main ideas: the variables of interest are split in two blocks and the inference takes advantage of known analytical optimization formulas. We estimate both the unknown parameters of the multivariate non-linear model and the covariance matrix of the noise. In the first part of the proposed inference scheme, a novel AIS technique called adaptive target AIS (ATAIS) is designed, which alternates iteratively between an IS technique over the parameters of the nonlinear model and a frequentist approach for the covariance matrix of the noise. In the second part of the proposed inference scheme, a prior density over the covariance matrix is considered and the cloud of samples obtained by ATAIS are recycled and re-weighted for obtaining a complete Bayesian study over the model parameters and covariance matrix. ATAIS is the main contribution of the work. Additionally, the inverted layered importance sampling (ILIS) is presented as a possible compelling algorithm (but based on a conceptually simpler idea). Different numerical examples show the benefits of the proposed approaches.
Category: Statistics
[140] viXra:2209.0132 [pdf] replaced on 2023-06-14 09:56:05
Authors: E. Morgado, L. Martino, R. San Millán-Castillo
Comments: 25 Pages. (to appear) Digital Signal Processing, 2023
We design a Universal Automatic Elbow Detector (UAED) for deciding the effective number of components in model selection problems. The relationship with the information criteria widely employed in the literature is also discussed. The proposed UAED does not require the knowledge of a likelihood function and can be easily applied in diverse applications, such as regression and classification, feature and/or order selection, clustering, and dimension reduction. Several experiments involving synthetic and real data show the advantages of the proposed scheme with benchmark techniques in the literature.
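For intuition, a generic geometric elbow detector in Python, which flags the point of the normalized error curve farthest from the chord joining its endpoints; this is a common baseline heuristic assumed for this sketch, not the UAED construction of the paper.

import numpy as np

def elbow(errors):
    # Index of the point on the error curve farthest from the straight line
    # joining its endpoints, after normalizing both axes to [0, 1].
    e = np.asarray(errors, dtype=float)
    k = np.arange(len(e))
    x = k / k[-1]
    y = (e - e.min()) / (e.max() - e.min())
    # Perpendicular distance from each point to the chord (x0,y0)-(x1,y1).
    num = np.abs((y[-1] - y[0]) * x - (x[-1] - x[0]) * y
                 + x[-1] * y[0] - y[-1] * x[0])
    den = np.hypot(y[-1] - y[0], x[-1] - x[0])
    return int(np.argmax(num / den))

# Error curve that flattens after the third component.
errs = [10.0, 5.0, 2.5, 2.2, 2.1, 2.05, 2.0]
print(elbow(errs))   # -> 2 (third model, zero-based index)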
Category: Statistics
[139] viXra:2209.0132 [pdf] replaced on 2023-06-06 15:06:55
Authors: E. Morgado, L. Martino, R. San Millán-Castillo
Comments: 25 Pages. (to appear) Digital Signal Processing, 2023.
We design a Universal Automatic Elbow Detector (UAED) for deciding the effective number of components in model selection problems. The relationship with the information criteria widely employed in the literature is also discussed. The proposed UAED does not require the knowledge of a likelihood function and can be easily applied in diverse applications, such as regression and classification, feature and/or order selection, clustering, and dimension reduction. Several experiments involving synthetic and real data show the advantages of the proposed scheme with benchmark techniques in the literature.
Category: Statistics
[138] viXra:2209.0132 [pdf] replaced on 2022-10-11 11:49:06
Authors: E. Morgado, L. Martino, R. San Millán-Castillo
Comments: 15 Pages.
We design a universal automatic elbow detector (UAED) for deciding the effective number of components in model selection problems. The relationship with the information criteria widely employed in the literature is also discussed. The proposed UAED does not require the knowledge of a likelihood function and can be easily applied in diverse applications, such as regression and classification, feature and/or order selection, clustering, dimension reduction, etc. Several experiments involving synthetic and real data show the advantages of the proposed scheme with respect to benchmark techniques in the literature.
Category: Statistics
[137] viXra:2209.0132 [pdf] replaced on 2022-10-09 09:46:48
Authors: E. Morgado, L. Martino, R. San Millán-Castillo
Comments: 15 Pages.
We design a universal automatic elbow detector (UAED) for deciding the effective number of components in model selection problems. The relationship with the information criteria widely employed in the literature is also discussed. The proposed UAED does not require the knowledge of a likelihood function and can be easily applied in diverse applications, such as regression and classification, feature and/or order selection, clustering, dimension reduction, etc. Several experiments involving synthetic and real data show the advantages of the proposed scheme with respect to benchmark techniques in the literature.
Category: Statistics
[136] viXra:2209.0123 [pdf] replaced on 2023-06-06 14:57:34
Authors: L. Martino, R. San Millán-Castillo, E. Morgado
Comments: 22 Pages. (to appear) Expert Systems With Applications, 2023.
We introduce a generalized information criterion that contains other well-known information criteria, such as the Bayesian information criterion (BIC) and the Akaike information criterion (AIC), as special cases. Furthermore, the proposed spectral information criterion (SIC) is also more general than the other information criteria, e.g., since the knowledge of a likelihood function is not strictly required. SIC extracts geometric features of the error curve and, as a consequence, it can be considered an automatic elbow detector. SIC provides a subset of all possible models, with a cardinality that is often much smaller than the total number of possible models; the elements of this subset are "elbows" of the error curve. A practical rule for selecting a unique model within the set of elbows is suggested as well. Theoretical invariance properties of SIC are analyzed. Moreover, we test SIC in ideal scenarios, where it always provides the optimal expected results. We also test SIC in several numerical experiments: some involving synthetic data, and two involving real datasets. They are all real-world applications, such as clustering, variable selection, or polynomial order selection, to name a few. The results show the benefits of the proposed scheme. Matlab code related to the experiments is also provided. Possible future research lines are finally discussed.
Category: Statistics
[135] viXra:2204.0074 [pdf] replaced on 2022-07-17 15:30:07
Authors: Sheng-Ping Wu
Comments: 12 Pages.
This article tries to unify the four basic forces by Maxwell's equations, the only experimental theory. Self-consistent Maxwell equations with the e-current coming from the matter current are proposed, and are solved for electrons and for the structures of particles and atomic nuclei. The static properties and decays are reasoned out, all meeting the experimental data. The equation of general relativity purely with the electromagnetic field is discussed as the base of this theory. In the end, the elementary conformity between this theory and QED and the weak theory is discussed.
Category: Statistics
[134] viXra:2201.0152 [pdf] replaced on 2022-02-13 20:15:49
Authors: Robert Bennett
Comments: 6 Pages.
A quantitative test for the probability that two sets of photos are of the same woman. The result for 7 facial characteristics in each photo is that the odds are 13 million to 1 that Lucy I and Lucy II are the same person.
Category: Statistics
[133] viXra:2112.0158 [pdf] replaced on 2022-07-17 10:28:12
Authors: R. San Millán-Castillo, L. Martino, E. Morgado, F. Llorente
Comments: 26 Pages. (to appear) IEEE Transactions on Audio, Speech and Language Processing
In recent years, soundscapes have become one of the most active topics in acoustics, providing a holistic approach to the acoustic environment, which involves human perception and context. Soundscape-elicited emotions are central, and substantially more subtle and unnoticed than those elicited by speech or music. Currently, soundscape emotion recognition is a hot topic in the literature. We provide an exhaustive variable selection study (i.e., a selection of the soundscape indicators) on a well-known dataset (Emo-Soundscapes). We consider linear soundscape emotion models for two soundscape descriptors: arousal and valence. Several ranking schemes and procedures for selecting the number of variables are applied. We have also performed an alternating optimization scheme for obtaining the best sequences while keeping fixed a certain number of features. Furthermore, we have designed a novel technique based on Gibbs sampling, which provides a more complete and clear view of the relevance of each variable. Finally, we have also compared our results with the analysis obtained by the classical methods based on p-values. As a result of our study, we suggest two simple and parsimonious linear models of only 7 and 16 variables (out of the 122 possible features) for the two outputs (arousal and valence), respectively. The suggested linear models provide very good and competitive performance, with R2 > 0.86 and R2 > 0.63 (values obtained after a cross-validation procedure), respectively.
Category: Statistics
[132] viXra:2110.0032 [pdf] replaced on 2022-06-10 12:08:11
Authors: F. Llorente, L. Martino, E. Curbelo, J. Lopez-Santiago, D. Delgado
Comments: 38 Pages.
The application of Bayesian inference for the purpose of model selection is very popular nowadays. In this framework, models are compared through their marginal likelihoods, or their quotients, called Bayes factors. However, marginal likelihoods depend on the prior choice. For model selection, even diffuse priors can actually be very informative, unlike in the parameter estimation problem. Furthermore, when the prior is improper, the marginal likelihood of the corresponding model is undetermined. In this work, we discuss the issue of prior sensitivity of the marginal likelihood and its role in model selection. We also comment on the use of uninformative priors, which are very common choices in practice. Several practical suggestions are discussed and many possible solutions, proposed in the literature, to design objective priors for model selection are described. Some of them also allow the use of improper priors. The connection between the marginal likelihood approach and the well-known information criteria is also presented. We describe the main issues and possible solutions by illustrative numerical examples, providing also some related code. One of them involves a real-world application on exoplanet detection.
Category: Statistics
[131] viXra:2110.0032 [pdf] replaced on 2022-05-11 12:51:09
Authors: F. Llorente, L. Martino, E. Curbelo, J. Lopez-Santiago, D. Delgado
Comments: 38 Pages.
The application of Bayesian inference for the purpose of model selection is very popular nowadays. In this framework, models are compared through their marginal likelihoods, or their quotients, called Bayes factors. However, marginal likelihoods depend on the prior choice. For model selection, even diffuse priors can actually be very informative, unlike in the parameter estimation problem. Furthermore, when the prior is improper, the marginal likelihood of the corresponding model is undetermined. In this work, we discuss the issue of prior sensitivity of the marginal likelihood and its role in model selection. We also comment on the use of uninformative priors, which are very common choices in practice. Several practical suggestions are discussed and many possible solutions, proposed in the literature, to design objective priors for model selection are described. Some of them also allow the use of improper priors. The connection between the marginal likelihood approach and the well-known information criteria is also presented. We describe the main issues and possible solutions by illustrative numerical examples, providing also some related code. One of them involves a real-world application on exoplanet detection.
Category: Statistics
[130] viXra:2110.0032 [pdf] replaced on 2022-03-23 12:51:19
Authors: F. Llorente, L. Martino, E. Curbelo, J. Lopez-Santiago, D. Delgado
Comments: 34 Pages.
The application of Bayesian inference for the purpose of model selection is very popular nowadays. In this framework, models are compared through their marginal likelihoods, or their quotients, called Bayes factors. However, marginal likelihoods depend on the prior choice. For model selection, even diffuse priors can actually be very informative, unlike in the parameter estimation problem. Furthermore, when the prior is improper, the marginal likelihood of the corresponding model is undetermined. In this work, we discuss the issue of prior sensitivity of the marginal likelihood and its role in model selection. We also comment on the use of uninformative priors, which are very common choices in practice. Several practical suggestions are discussed, and many possible solutions proposed in the literature to design objective priors for model selection are described. Some of them also allow the use of improper priors. The connection between the marginal likelihood approach and the well-known information criteria is also presented. We describe the main issues and possible solutions by means of illustrative numerical examples, also providing some related code. One of them involves a real-world application on exoplanet detection.
Category: Statistics
[129] viXra:2110.0032 [pdf] replaced on 2021-11-07 07:59:55
Authors: F. Llorente, L. Martino, E. Curbelo, J. Lopez-Santiago, D. Delgado
Comments: 25 Pages.
The application of Bayesian inference for the purpose of model selection is very popular nowadays. In this framework, models are compared through their marginal likelihoods, or their quotients, called Bayes factors. However, marginal likelihoods show a strong dependence on the prior choice, even when the data are very informative, unlike the posterior distribution. Furthermore, when the prior is improper, the marginal likelihood of the corresponding model is undetermined. In this work, we aim to raise awareness about the issue of prior sensitivity of the marginal likelihood and its role in model selection. We also comment on the use of uninformative priors, which are very common choices in practice. Several practical suggestions are provided, and possible solutions allowing the use of improper priors are discussed. The connection between the marginal likelihood approach and the well-known information criteria is also presented. We describe all the issues and possible solutions by means of illustrative numerical examples (providing some related code). One of them involves a real-world application on exoplanet detection.
Category: Statistics
[128] viXra:2109.0178 [pdf] replaced on 2022-01-13 03:48:54
Authors: F. Llorente, L. Martino, J. Read, D. Delgado
Comments: 14 Pages. Signal Processing, Volume 194, 2022, 108455 - doi:10.1016/j.sigpro.2022.108455
Many applications in signal processing and machine learning require the study of probability density functions (pdfs) that can only be accessed through noisy evaluations. In this work, we analyze noisy importance sampling (IS), i.e., IS working with noisy evaluations of the target density. We present the general framework and derive optimal proposal densities for noisy IS estimators. The optimal proposals incorporate the information of the variance of the noisy realizations, proposing points in regions where the noise power is higher. We also compare the use of the optimal proposals with previous optimality approaches considered in the noisy IS framework.
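A minimal sketch of noisy IS with multiplicative unit-mean noise (the target, proposal and noise model are illustrative assumptions, not those of the paper; unit-mean noise keeps the evaluations unbiased):

```python
import numpy as np

def noisy_is_mean(n=100_000, noise_sigma=0.5, seed=0):
    """Self-normalized IS with noisy evaluations of the target.

    Target: unnormalized N(0,1) density observed through log-normal
    multiplicative noise with unit mean. Proposal: N(0, 2^2).
    Estimates E_pi[x^2], whose true value is 1."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 2.0, size=n)                 # proposal draws
    log_q = -0.5 * (x / 2.0) ** 2                    # proposal density (up to a constant)
    clean = np.exp(-0.5 * x ** 2)                    # unnormalized target
    noise = np.exp(noise_sigma * rng.standard_normal(n)
                   - 0.5 * noise_sigma ** 2)         # E[noise] = 1
    w = clean * noise / np.exp(log_q)                # noisy IS weights
    return np.sum(w * x ** 2) / np.sum(w)

print(noisy_is_mean())   # close to 1 despite the noisy evaluations
```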
Category: Statistics
[127] viXra:2011.0183 [pdf] replaced on 2021-01-29 20:41:23
Authors: Sergey L. Cherkas
Comments: 7 Pages.
Usually, one wants a simple picture of the trustworthiness of the main result of an election. In some situations, however, only partial information about the election is available. Here we suggest criteria for comparing the available information with the official results. One criterion consists in comparing the mean value over the available sample with the official mean value. A Monte Carlo simulation is performed to calculate the probability of the observed difference between the average value in a random sample and the average over the total set. Another method is an analysis of the nature of the peculiarities in the probability distribution functions, consisting in a comparison of the probability distribution functions for the percentage and the number of voters for Mr. Lukashenko at each polling station. The last criterion is more aesthetic than conclusive. It could be applied to arbitrary electoral systems, such as those of the United Kingdom or the United States, if one wants to summarize the main result in a few pictures.
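The first criterion can be sketched directly (the per-station percentages below are synthetic stand-ins, invented for illustration; the real election data are not reproduced here):

```python
import numpy as np

def deviation_probability(population, sample_size, observed_diff,
                          n_sim=20_000, seed=0):
    """Monte Carlo estimate of P(|mean(random sample) - mean(population)|
    >= observed_diff) under sampling without replacement."""
    rng = np.random.default_rng(seed)
    mu = population.mean()
    hits = 0
    for _ in range(n_sim):
        sample = rng.choice(population, size=sample_size, replace=False)
        hits += abs(sample.mean() - mu) >= observed_diff
    return hits / n_sim

rng = np.random.default_rng(1)
stations = rng.beta(8, 2, size=5000) * 100   # hypothetical per-station percentages
print(deviation_probability(stations, sample_size=150, observed_diff=2.0))
```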
Category: Statistics
[126] viXra:2011.0183 [pdf] replaced on 2020-12-02 10:52:49
Authors: Sergey L. Cherkas
Comments: 4 Pages. Both English and Russian versions of the paper
A Monte Carlo simulation is performed to calculate the probability of the difference between the average value in some random sample and the average over the total set. A method for analyzing the nature of the peculiarities in the probability distribution functions is suggested. The method consists of a comparison of the probability distribution functions for the percentage and the number of voters for Mr. Lukashenko at each polling station.
Category: Statistics
[125] viXra:2011.0183 [pdf] replaced on 2020-12-01 11:48:00
Authors: Sergey L. Cherkas
Comments: 4 Pages. Both English and Russian versions of the paper
A Monte Carlo simulation is performed to calculate the probability of the difference between the average value in some random sample and the average over the total set. A method for analyzing the nature of the peculiarities in the probability distribution functions is suggested. The method consists of a comparison of the probability distribution functions for the percentage and the number of voters for Mr. Lukashenko at each polling station.
Category: Statistics
[124] viXra:2009.0135 [pdf] replaced on 2021-07-11 15:23:07
Authors: L. Martino, J. Read
Comments: L. Martino, J. Read, "A Joint introduction to Gaussian Processes and Relevance Vector Machines with Connections to Kalman filtering and other Kernel Smoothers", Information Fusion, Volume 74, Pages 17-38, 2021
The expressive power of Bayesian kernel-based methods has led them to become an important tool across many different facets of artificial intelligence, and useful to a plethora of modern application domains, providing both power and interpretability via uncertainty analysis. This article introduces and discusses two methods which straddle the areas of probabilistic Bayesian schemes and kernel methods for regression: Gaussian Processes and Relevance Vector Machines. Our focus is on developing a common framework with which to view these methods, via an intermediate method, a probabilistic version of the well-known kernel ridge regression, drawing connections among them via dual formulations, and discussing their application in the context of major tasks: regression, smoothing, interpolation, and filtering. Overall, we provide an understanding of the mathematical concepts behind these models; we summarize and discuss in depth different interpretations and highlight the relationships to other methods, such as linear kernel smoothers, Kalman filtering and Fourier approximations. Throughout, we provide numerous figures to promote understanding, and we make numerous recommendations to practitioners. Benefits and drawbacks of the different techniques are highlighted. To our knowledge, this is the most in-depth study of its kind to date focused on these two methods, and it will be relevant to theoretical understanding and to practitioners throughout the domains of data science, signal processing, machine learning, and artificial intelligence in general.
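The common framework is visible in a few lines: the GP posterior mean coincides with the kernel ridge regression prediction when the ridge regularizer equals the noise variance (a minimal sketch with an assumed RBF kernel and toy data):

```python
import numpy as np

def rbf(a, b, ell=0.8):
    """Squared-exponential (RBF) kernel matrix."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-3, 3, 25))
y = np.sin(x) + 0.1 * rng.standard_normal(25)
xs = np.linspace(-3, 3, 7)
lam = 0.1 ** 2                        # noise variance = ridge regularizer

K, Ks = rbf(x, x), rbf(xs, x)
A = K + lam * np.eye(len(x))

# Gaussian process: posterior mean and variance at the test points
gp_mean = Ks @ np.linalg.solve(A, y)
gp_var = np.diag(rbf(xs, xs) - Ks @ np.linalg.solve(A, Ks.T))

# Kernel ridge regression: dual coefficients alpha = (K + lam I)^{-1} y
alpha = np.linalg.solve(A, y)
krr_pred = Ks @ alpha

print(np.allclose(gp_mean, krr_pred))   # True: the point predictions coincide
print(gp_var.round(4))                  # what the GP adds: predictive uncertainty
```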
Category: Statistics
[123] viXra:2009.0135 [pdf] replaced on 2021-03-24 18:50:49
Authors: L. Martino, J. Read
Comments: 52 Pages. (to appear) Information Fusion
The expressive power of Bayesian kernel-based methods has led them to become an important tool across many different facets of artificial intelligence, and useful to a plethora of modern application domains, providing both power and interpretability via uncertainty analysis. This article introduces and discusses two methods which straddle the areas of probabilistic Bayesian schemes and kernel methods for regression: Gaussian Processes and Relevance Vector Machines. Our focus is on developing a common framework with which to view these methods, via an intermediate method, a probabilistic version of the well-known kernel ridge regression, drawing connections among them via dual formulations, and discussing their application in the context of major tasks: regression, smoothing, interpolation, and filtering. Overall, we provide an understanding of the mathematical concepts behind these models; we summarize and discuss in depth different interpretations and highlight the relationships to other methods, such as linear kernel smoothers, Kalman filtering and Fourier approximations. Throughout, we provide numerous figures to promote understanding, and we make numerous recommendations to practitioners. Benefits and drawbacks of the different techniques are highlighted. To our knowledge, this is the most in-depth study of its kind to date focused on these two methods, and it will be relevant to theoretical understanding and to practitioners throughout the domains of data science, signal processing, machine learning, and artificial intelligence in general.
Category: Statistics
[122] viXra:2009.0135 [pdf] replaced on 2020-09-21 16:45:05
Authors: L. Martino, J. Read
Comments: 50 Pages.
The expressive power of Bayesian kernel-based methods has led them to become an important tool across many different facets of artificial intelligence, and useful to a plethora of modern application domains, providing both power and interpretability via uncertainty analysis. This article introduces and discusses two methods which straddle the areas of probabilistic Bayesian schemes and kernel methods for regression: Gaussian Processes and Relevance Vector Machines. Our focus is on developing a common framework with which to view these methods, via an intermediate method, a probabilistic version of the well-known kernel ridge regression, drawing connections among them via dual formulations, and discussing their application in the context of major tasks: regression, smoothing, interpolation, and filtering. Overall, we provide an understanding of the mathematical concepts behind these models; we summarize and discuss in depth different interpretations and highlight the relationships to other methods, such as linear kernel smoothers, Kalman filtering and Fourier approximations. Throughout, we provide numerous figures to promote understanding, and we make numerous recommendations to practitioners. Benefits and drawbacks of the different techniques are highlighted. To our knowledge, this is the most in-depth study of its kind to date focused on these two methods, and it will be relevant to theoretical understanding and to practitioners throughout the domains of data science, signal processing, machine learning, and artificial intelligence in general.
Category: Statistics
[121] viXra:2004.0425 [pdf] replaced on 2021-02-27 09:46:18
Authors: L. Martino, F. Llorente, E. Curbelo, J. Lopez-Santiago, J. Miguez
Comments: 18 Pages.
We propose a novel adaptive importance sampling scheme for Bayesian inversion problems where the inference of the variables of interest and of the power of the data noise is split. More specifically, we consider a Bayesian analysis for the variables of interest (i.e., the parameters of the model to invert), whereas we employ a maximum likelihood approach for the estimation of the noise power. The whole technique is implemented by means of an iterative procedure, alternating sampling and optimization steps. Moreover, the noise power is also used as a tempering parameter for the posterior distribution of the variables of interest. Therefore, a sequence of tempered posterior densities is generated, where the tempering parameter is automatically selected according to the current estimate of the noise power. A complete Bayesian study over the model parameters and the scale parameter can also be performed. Numerical experiments show the benefits of the proposed approach.
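A minimal sketch of the alternating scheme on a toy linear model (the model, the moment-matching proposal adaptation and all constants are assumptions for illustration, not the paper's exact algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 40)
y = 2.0 * t + 0.3 * rng.standard_normal(40)      # data: slope a = 2, noise std 0.3

def loglik(a, sigma2):
    """Gaussian log-likelihood of y = a*t + noise for a vector of slopes a."""
    r = y - a[:, None] * t[None, :]
    return (-0.5 * (r ** 2).sum(axis=1) / sigma2
            - 0.5 * len(t) * np.log(2 * np.pi * sigma2))

mu, sd, sigma2 = 0.0, 5.0, 1.0                   # initial proposal and noise power
for _ in range(20):
    a = rng.normal(mu, sd, size=2000)            # sampling step
    logw = loglik(a, sigma2) + 0.5 * ((a - mu) / sd) ** 2 + np.log(sd)
    w = np.exp(logw - logw.max()); w /= w.sum()  # self-normalized IS weights
    mu = np.sum(w * a)                           # adapt proposal moments
    sd = max(np.sqrt(np.sum(w * (a - mu) ** 2)), 1e-3)
    sigma2 = ((y - mu * t) ** 2).mean()          # optimization step: ML noise power,
                                                 # which also sets the tempering level
print(mu, np.sqrt(sigma2))                       # approximately 2.0 and 0.3
```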
Category: Statistics
[120] viXra:2004.0425 [pdf] replaced on 2020-09-06 08:24:05
Authors: L. Martino, J. Lopez-Santiago, J. Miguez
Comments: 17 Pages.
We propose a novel adaptive importance sampling scheme for Bayesian inversion problems where the inference of the variables of interest and of the power of the data noise is split. More specifically, we consider a Bayesian analysis for the variables of interest (i.e., the parameters of the model to invert), whereas we employ a maximum likelihood approach for the estimation of the noise power. The whole technique is implemented by means of an iterative procedure, alternating sampling and optimization steps. Moreover, the noise power is also used as a tempering parameter for the posterior distribution of the variables of interest. Therefore, a sequence of tempered posterior densities is generated, where the tempering parameter is automatically selected according to the current estimate of the noise power. Numerical experiments show the benefits of the proposed approach.
Category: Statistics
[119] viXra:2004.0425 [pdf] replaced on 2020-09-03 11:41:36
Authors: L. Martino, J. Lopez-Santiago, J. Miguez
Comments: 17 Pages.
We propose a novel adaptive importance sampling scheme for Bayesian inversion problems where the inference of the variables of interest and of the power of the data noise is split. More specifically, we consider a Bayesian analysis for the variables of interest (i.e., the parameters of the model to invert), whereas we employ a maximum likelihood approach for the estimation of the noise power. The whole technique is implemented by means of an iterative procedure, alternating sampling and optimization steps. Moreover, the noise power is also used as a tempering parameter for the posterior distribution of the variables of interest. Therefore, a sequence of tempered posterior densities is generated, where the tempering parameter is automatically selected according to the current estimate of the noise power. Numerical experiments show the benefits of the proposed approach.
Category: Statistics
[118] viXra:2004.0060 [pdf] replaced on 2020-04-22 09:01:21
Authors: Tao Guo
Comments: 14 Pages.
It has been more than 100 years since the advent of special relativity, but the reasons behind the related phenomena are still unknown. This article aims to inspire people to think about such problems. With the help of Mathematica software, I have proven the following statement by means of statistics: In 3-dimensional Euclidean space, for point particles whose speeds are c and whose directions are uniformly distributed in space (assuming these particles' reference system is R0 if their average velocity is 0), when some particles (assuming their reference system is Ru), as a particle swarm, move in a certain direction with a group speed u (i.e., the norm of the average velocity) relative to R0, their (or the sub-particle swarm's) average speed relative to Ru is slower than that of particles (or the same-scale sub-particle swarm) in R0 relative to R0. The degree of slowing depends on the speed u of Ru and accords with the quantitative relationship $\sqrt{c^2 - u^2}/c$ described by the Lorentz factor.
Category: Statistics
[117] viXra:2004.0060 [pdf] replaced on 2020-04-14 08:41:36
Authors: Tao Guo
Comments: 14 Pages.
It has been more than 100 years since the advent of special relativity, but the reasons behind the related phenomena are still unknown. This article aims to inspire people to think about such problems. With the help of Mathematica software, I have proven the following statement by means of statistics: In 3-dimensional Euclidean space, for point particles whose speeds are c and whose directions are uniformly distributed in space (assuming these particles' reference system is R0 if their average velocity is 0), when some particles (assuming their reference system is Ru), as a particle swarm, move in a certain direction with a group speed u (i.e., the norm of the average velocity) relative to R0, their (or the sub-particle swarm's) average speed relative to Ru is slower than that of particles (or the same-scale sub-particle swarm) in R0 relative to R0. The degree of slowing depends on the speed u of Ru and accords with the quantitative relationship $\sqrt{c^2 - u^2}/c$ described by the Lorentz factor.
Category: Statistics
[116] viXra:2002.0368 [pdf] replaced on 2020-02-29 11:03:38
Authors: Ilija Barukčić
Comments: 10 Pages. (C) Ilija Barukčić, 2019, Jever, Germany. All rights reserved.
Many different measures of association are used in the medical literature; the relative risk is one of these measures. However, to judge whether the results of studies are reliable, it is essential to use measures of association which are logically consistent. In this paper, we present how to deal with one of the most commonly used measures of association, the relative risk. The conclusion is inescapable that the relative risk is logically inconsistent and should no longer be used.
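For reference, the relative risk of a 2x2 table is computed as follows (a standard textbook definition, given here independently of the paper's consistency argument; the counts are invented):

```python
def relative_risk(a, b, c, d):
    """2x2 table: a = exposed with event,   b = exposed without event,
                  c = unexposed with event, d = unexposed without event.
    RR = risk among exposed / risk among unexposed."""
    return (a / (a + b)) / (c / (c + d))

print(relative_risk(20, 80, 10, 90))   # 0.20 / 0.10 = 2.0
```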
Category: Statistics
[115] viXra:2001.0052 [pdf] replaced on 2021-02-06 13:32:52
Authors: F. Llorente, L. Martino, D. Delgado, J. Lopez-Santiago
Comments: 91 Pages.
This is an up-to-date introduction to, and overview of, marginal likelihood computation for model selection and hypothesis testing. Computing normalizing constants of probability models (or ratio of constants) is a fundamental issue in many applications in statistics, applied mathematics, signal processing and machine learning. This article provides a comprehensive study of the state-of-the-art of the topic. We highlight limitations, benefits, connections and differences among the different techniques. Problems and possible solutions with the use of improper priors are also described. Some of the most relevant methodologies are compared through theoretical comparisons and numerical experiments.
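One of the simplest estimators covered by surveys of this kind is naive Monte Carlo over the prior (a minimal sketch on a toy conjugate model; it is known to have high variance whenever prior and posterior disagree):

```python
import numpy as np

def marglik_prior_mc(loglik, prior_sampler, n=200_000, seed=0):
    """Naive Monte Carlo marginal likelihood: Z = E_prior[L(theta)],
    estimated by averaging the likelihood over prior draws."""
    rng = np.random.default_rng(seed)
    ll = loglik(prior_sampler(rng, n))
    m = ll.max()                      # log-sum-exp for numerical stability
    return m + np.log(np.mean(np.exp(ll - m)))

# toy model: y_i ~ N(theta, 1) with prior theta ~ N(0, 1)
y = np.random.default_rng(1).normal(0.7, 1.0, size=20)
loglik = lambda th: (-0.5 * ((y[None, :] - th[:, None]) ** 2).sum(axis=1)
                     - 0.5 * len(y) * np.log(2 * np.pi))
print(marglik_prior_mc(loglik, lambda r, n: r.normal(0.0, 1.0, n)))
```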
Category: Statistics
[114] viXra:2001.0052 [pdf] replaced on 2020-05-18 05:13:39
Authors: F. Llorente, L. Martino, D. Delgado, J. Lopez-Santiago
Comments: 58 Pages.
This is an up-to-date introduction to, and overview of, marginal likelihood computation for model selection and hypothesis testing. Computing normalizing constants of probability models (or ratio of constants) is a fundamental issue in many applications in statistics, applied mathematics, signal processing and machine learning. This article provides a comprehensive study of the state-of-the-art of the topic. We highlight limitations, benefits, connections and differences among the different techniques. Problems and possible solutions with the use of improper priors are also described. Some of the most relevant methodologies are compared through theoretical comparisons and numerical experiments.
Category: Statistics
[113] viXra:2001.0052 [pdf] replaced on 2020-05-15 16:58:59
Authors: F. Llorente, L. Martino, D. Delgado, J. Lopez-Santiago
Comments: 58 Pages.
This is an up-to-date introduction to, and overview of, marginal likelihood computation for model selection and hypothesis testing. Computing normalizing constants of probability models (or ratio of constants) is a fundamental issue in many applications in statistics, applied mathematics, signal processing and machine learning. This article provides a comprehensive study of the state-of-the-art of the topic. We highlight limitations, benefits, connections and differences among the different techniques. Problems and possible solutions with the use of improper priors are also described. Some of the most relevant methodologies are compared through theoretical comparisons and numerical experiments.
Category: Statistics
[112] viXra:1910.0219 [pdf] replaced on 2019-10-15 00:13:36
Authors: Ilija Barukčić
Comments: 35 Pages.
Objective: The possible involvement of viruses, specifically Herpes simplex virus type 1 (HSV-1), in senile dementia of the Alzheimer type has been investigated in numerous publications. Over 120 publications provide direct or indirect evidence of a potential relationship between Herpes simplex virus type 1 and Alzheimer's disease (AD), but a causal relation has still not been established.
Methods: A systematic review and re-analysis of studies which investigated the relationship between HSV-1 and AD by HSV-1 immunoglobulin G (IgG) serology and polymerase chain reaction (PCR) methods was conducted. The method of the conditio sine qua non relationship (SINE) was used to test the hypothesis: without HSV-1 infection of the human brain, no AD. The method of the conditio per quam relationship (IMP) was used to test the hypothesis: if HSV-1 infection of the human brain, then AD. The mathematical formula of the causal relationship k was used to test whether there is a cause-effect relationship between HSV-1 and AD. Significance was indicated by a p-value of less than 0.05.
Results: The studies analyzed were able to provide strict evidence that HSV-1 is a necessary condition (a conditio sine qua non), a sufficient condition, and a necessary and sufficient condition of AD. Furthermore, the cause-effect relationship between HSV-1 and AD was highly significant.
Conclusions: The data analyzed provide sufficient evidence to conclude that HSV-1 is the cause of AD.
Keywords: Herpes simplex virus type 1, Alzheimer’s disease, causal relationship.
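A hedged sketch of the kind of 2x2 analysis the abstract describes (the exact definitions of SINE, IMP and k are given in the paper and are not reproduced here; the phi coefficient is a standard 2x2 association measure used as a stand-in, and the counts are invented):

```python
import math

def two_by_two(a, b, c, d):
    """2x2 table: a = HSV-1 positive with AD, b = HSV-1 positive without AD,
                  c = HSV-1 negative with AD, d = HSV-1 negative without AD."""
    n = a + b + c + d
    sine_violations = c / n   # AD without HSV-1 would contradict necessity
    imp_violations = b / n    # HSV-1 without AD would contradict sufficiency
    phi = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return sine_violations, imp_violations, phi

print(two_by_two(120, 40, 3, 60))   # illustrative counts, not study data
```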
Category: Statistics
[111] viXra:1909.0376 [pdf] replaced on 2019-09-24 07:16:46
Authors: Fabrice J.P.R. Pautot
Comments: 17 Pages.
We present a simple, fully probabilistic, Bayesian solution to k-sample omnibus tests for comparison, with the Behrens-Fisher problem as a special case, which is free from the many defects found in the standard classical, frequentist, likelihoodist, and Bayesian approaches to those problems. We solve the main measure-theoretic difficulty for degenerate problems with continuous parameters of interest and a Lebesgue-negligible point null hypothesis by approximating the corresponding continuous random variables by sequences of discrete ones defined on partitions of the parameter spaces and by taking the limit of the prior-to-posterior ratios of the probability of the null hypothesis for the corresponding discrete problems. Those limits are well defined under proper technicalities thanks to the Henstock-Kurzweil integral, which is as powerful as the Lebesgue integral but still relies on Riemann sums, which are essential in the present approach. The solutions to the corresponding continuous problems take the form of Bayes-Poincaré factors, which are new objects in Bayesian probability theory and should play a key role in the general theory of point null hypothesis testing, including other important problems such as the Jeffreys-Lindley paradox.
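The discretize-and-take-limits construction can be illustrated in a conjugate Gaussian case, where the limiting ratio of posterior to prior cell probabilities around the point null recovers the Savage-Dickey density ratio (a sketch of the idea only; the paper's Bayes-Poincaré factors are more general):

```python
import numpy as np
from scipy.stats import norm

# Model: y_i ~ N(theta, 1), prior theta ~ N(0, 1), point null theta0 = 0.
rng = np.random.default_rng(0)
y = rng.normal(0.3, 1.0, size=30)
n, ybar = len(y), y.mean()
post_var = 1.0 / (n + 1.0)              # conjugate posterior N(post_mean, post_var)
post_mean = n * ybar * post_var
post_sd = np.sqrt(post_var)

theta0 = 0.0
for eps in [0.5, 0.1, 0.01, 0.001]:
    # posterior and prior probabilities of the shrinking cell around theta0
    post_p = (norm.cdf(theta0 + eps, post_mean, post_sd)
              - norm.cdf(theta0 - eps, post_mean, post_sd))
    prior_p = norm.cdf(theta0 + eps) - norm.cdf(theta0 - eps)
    print(eps, post_p / prior_p)        # converges to the density ratio below

# Savage-Dickey limit: posterior density over prior density at theta0
print(norm.pdf(theta0, post_mean, post_sd) / norm.pdf(theta0))
```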
Category: Statistics