
[128] **viXra:1604.0302 [pdf]**
*submitted on 2016-04-22 01:25:58*

**Authors:** Bradly Alicea

**Comments:** 13 pages, 7 Figures, 2 Supplemental Figures. Full dataset can be found at doi:10.6084/m9.figshare.944542

What makes a good prediction good? Generally, the answer is thought to be a faithful accounting of both tangible and intangible factors. Among sports teams, it is thought that if you get enough of the tangible factors (e.g. roster, prior performance, schedule) correct, then the predictions will be correspondingly accurate. While there is a role for intangible factors, they are thought to gum up the works, so to speak. Here, I start with the hypothesis that the best and worst teams in a league or tournament are easy to predict relative to teams with average performance. Data from the 2013 MLB and NFL seasons plus data from the 2014 NCAA Tournament were used. Using a model-free approach, data representing various aspects of competition reveal that mainly the teams predicted to perform the worst actually conform to expectation. The reasons for this are then discussed, including the role of shot noise on performance driven by tangible factors.

**Category:** Statistics

[127] **viXra:1604.0009 [pdf]**
*submitted on 2016-04-01 12:11:19*

**Authors:** Ioannis Koukoutsidis

**Comments:** 28 pages, 8 figures

Mobile crowdsensing can facilitate environmental surveys by leveraging sensor-equipped mobile devices that carry out measurements covering a wide area in a short time, without bearing the costs of traditional field work. In this paper, we examine statistical methods to obtain an accurate estimate of the mean value of an environmental parameter in a region, based on such measurements. The main focus is on estimates produced by taking a "snapshot" of the mobile device readings at a random instant in time. We compare stratified sampling with different stratification weights to sampling without stratification, as well as an appropriately modified version of systematic sampling. Our main result is that stratification with weights proportional to stratum areas can produce significantly smaller bias, and gets arbitrarily close to the true area average as the number of mobiles and the number of strata increase. The performance of the methods is evaluated for an application scenario where we estimate the mean area temperature in a linear region that exhibits the so-called *Urban Heat Island* effect, with mobile users moving in the region according to the Random Waypoint Model.

**Category:** Statistics
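
The area-weighted stratification described in the abstract above can be illustrated with a minimal sketch. It assumes a one-dimensional region, so stratum "area" is simply stratum length, and it skips empty strata and renormalises the weights; the function name `snapshot_estimates` and that handling of empty strata are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def snapshot_estimates(positions, readings, edges):
    """Return (unstratified mean, area-weighted stratified mean) for one snapshot.

    positions : 1-D coordinates of the mobile devices in a linear region
    readings  : the measurement reported by each device
    edges     : increasing stratum boundaries covering the whole region
    """
    positions = np.asarray(positions, dtype=float)
    readings = np.asarray(readings, dtype=float)
    edges = np.asarray(edges, dtype=float)

    # Sampling without stratification: plain average of all readings.
    naive = readings.mean()

    # Assign each device to a stratum; clipping keeps devices that sit
    # exactly on the outer boundaries inside the first or last stratum.
    idx = np.clip(np.searchsorted(edges, positions, side="right") - 1,
                  0, len(edges) - 2)
    lengths = np.diff(edges)

    total, covered = 0.0, 0.0
    for s, length in enumerate(lengths):
        in_stratum = readings[idx == s]
        if in_stratum.size == 0:
            continue  # assumption: empty strata are skipped, weights renormalised
        total += length * in_stratum.mean()
        covered += length
    return naive, total / covered
```

When devices cluster in one part of the region, the plain average is pulled toward the readings there, while the area-weighted estimate gives each stratum a weight proportional to its size regardless of how many devices it contains.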

[126] **viXra:1603.0252 [pdf]**
*submitted on 2016-03-17 17:00:15*

**Authors:** Glenn Healey

**Comments:** 4 Pages.

This file contains an intrinsic contact list for batters.

**Category:** Statistics

[125] **viXra:1603.0251 [pdf]**
*submitted on 2016-03-17 17:02:40*

**Authors:** Glenn Healey

**Comments:** 3 Pages.

This file contains an intrinsic contact list for pitchers.

**Category:** Statistics

[124] **viXra:1603.0215 [pdf]**
*submitted on 2016-03-14 21:01:06*

**Authors:** Glenn Healey

**Comments:** 7 Pages.

Given a set of observed batted balls and their outcomes, we develop a method for learning
the dependence of a batted ball’s intrinsic value on its measured parameters.

**Category:** Statistics

[123] **viXra:1603.0180 [pdf]**
*submitted on 2016-03-11 17:50:17*

**Authors:** L. Martino, J. Plata, F. Louzada

**Comments:** 5 Pages.

In this work, we design an efficient Monte Carlo scheme for diffusion estimation, where global and local parameters are involved in a unique inference problem. This scenario often appears in distributed inference problems in wireless sensor networks. The proposed scheme uses parallel local MCMC chains and then an importance sampling (IS) fusion for obtaining an efficient estimation of the global parameters. The resulting algorithm is simple and flexible. It can be easily applied iteratively, or extended in a sequential framework. In order to apply the novel scheme, the only assumption required about the model is that the measurements are conditionally independent given the related parameters.

**Category:** Statistics

[122] **viXra:1602.0333 [pdf]**
*submitted on 2016-02-25 18:17:42*

**Authors:** L. Martino, V. Elvira, F. Louzada

**Comments:** 5 Pages.

The Sequential Importance Resampling (SIR) method is the core of the Sequential Monte Carlo (SMC) algorithms (a.k.a., particle filters). In this work, we point out a suitable choice for weighting properly a resampled particle. This observation entails several theoretical and practical consequences, allowing also the design of novel sampling schemes. Specifically, we describe one theoretical result about the sequential estimation of the marginal likelihood. Moreover, we suggest a novel resampling procedure for SMC algorithms called partial resampling, involving only a subset of the current cloud of particles. Clearly, this scheme attenuates the additional variance in the Monte Carlo estimators generated by the use of the resampling.

**Category:** Statistics

[121] **viXra:1602.0112 [pdf]**
*submitted on 2016-02-09 14:48:10*

**Authors:** L. Martino, V. Elvira, F. Louzada

**Comments:** 31 Pages.

The Effective Sample Size (ESS) is an important measure of efficiency of Monte Carlo methods such as Markov Chain Monte Carlo (MCMC) and Importance Sampling (IS) techniques. In IS context, an approximation of the theoretical ESS definition is widely applied, $\widehat{ESS}$, involving the sum of the squares of the normalized importance weights. This formula $\widehat{ESS}$ has become an essential piece within Sequential Monte Carlo (SMC) methods using adaptive resampling procedures. The expression $\widehat{ESS}$ is related to the Euclidean distance between the probability mass described by the normalized weights and the discrete uniform probability mass function (pmf). In this work, we derive other possible ESS functions based on different discrepancy measures between these pmfs. Several examples are provided involving, for instance, the geometric and harmonic means of the weights, the discrete entropy (including the perplexity measure, already proposed in literature) and the Gini coefficient. We list five requirements which a generic ESS function should satisfy, allowing us to classify different ESS measures. We also compare the most promising ones by means of numerical simulations.

**Category:** Statistics
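
As a small illustration of the quantities named in the abstract above, the sketch below computes the widely used approximation $\widehat{ESS} = 1 / \sum_n \bar{w}_n^2$ from normalized weights $\bar{w}_n$, together with a perplexity-based alternative (the exponential of the discrete entropy of the weights). The function names are illustrative, and the code is not claimed to reproduce the paper's full family of ESS functions.

```python
import numpy as np

def normalize(weights):
    """Scale non-negative importance weights so that they sum to one."""
    w = np.asarray(weights, dtype=float)
    return w / w.sum()

def ess_inverse_sum_squares(weights):
    """Standard approximation: inverse of the sum of squared normalized weights."""
    w_bar = normalize(weights)
    return float(1.0 / np.sum(w_bar ** 2))

def ess_perplexity(weights):
    """Perplexity: exp of the discrete entropy of the normalized weights."""
    w_bar = normalize(weights)
    nz = w_bar[w_bar > 0]          # 0 * log(0) is taken as 0
    return float(np.exp(-np.sum(nz * np.log(nz))))
```

Both functions return the number of samples $N$ when all weights are equal and 1 when a single weight carries all the mass, which are the two extreme cases any reasonable ESS measure is expected to recover.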

[120] **viXra:1602.0053 [pdf]**
*submitted on 2016-02-05 03:06:47*

**Authors:** Jason Lind

**Comments:** 2 Pages. Very early stages

Defines a rated set and uses it to calculate a weight directly from the statistics, enabling a broad, unified interpretation of data.

**Category:** Statistics

[119] **viXra:1601.0179 [pdf]**
*submitted on 2016-01-16 22:40:19*

**Authors:** D. Luengo, L. Martino, V. Elvira, M. Bugallo

**Comments:** 22 Pages.

Many signal processing applications require performing statistical inference on large datasets, where computational and/or memory restrictions become an issue. In this big data setting, computing an exact global centralized estimator is often unfeasible. Furthermore, even when approximate numerical solutions (e.g., based on Monte Carlo methods) working directly on the whole dataset can be computed, they may not provide a satisfactory performance either. Hence, several authors have recently started considering distributed inference approaches, where the data is divided among multiple workers (cores, machines or a combination of both). The computations are then performed in parallel and the resulting distributed or partial estimators are finally combined to approximate the intractable global estimator. In this paper, we focus on the scenario where no communication exists among the workers, deriving efficient linear fusion rules for the combination of the distributed estimators. Both a Bayesian perspective (based on the Bernstein-von Mises theorem and the asymptotic normality of the estimators) and a constrained optimization view are provided for the derivation of the linear fusion rules proposed. We concentrate on minimum mean squared error (MMSE) partial estimators, but the approach is more general and can be used to combine any kind of distributed estimators as long as they are unbiased. Numerical results show the good performance of the algorithms developed, both in simple problems where analytical expressions can be obtained for the distributed MMSE estimators, and in a wireless sensor network localization problem where Monte Carlo methods are used to approximate the partial estimators.

**Category:** Statistics
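
To make the idea of a linear fusion rule concrete, here is a minimal sketch for the simplest case: scalar, independent, unbiased partial estimators with known variances, combined with inverse-variance weights (the minimum-variance linear combination under those assumptions). This textbook rule is used purely as an illustration; the rules derived in the paper may differ, and `fuse_inverse_variance` is a hypothetical name.

```python
import numpy as np

def fuse_inverse_variance(estimates, variances):
    """Combine independent unbiased scalar estimators linearly.

    Each worker reports an estimate and its variance; the weights are
    proportional to the precisions (1 / variance) and sum to one, so the
    fused estimator stays unbiased.
    """
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    precisions = 1.0 / variances
    weights = precisions / precisions.sum()
    fused = float(np.dot(weights, estimates))
    fused_variance = float(1.0 / precisions.sum())
    return fused, fused_variance
```

No communication among workers is needed: each one only has to send its estimate and variance to the fusion step.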

[118] **viXra:1601.0174 [pdf]**
*submitted on 2016-01-16 07:32:42*

**Authors:** V. Elvira, L. Martino, D. Luengo, M. F. Bugallo

**Comments:** 34 Pages.

Population Monte Carlo (PMC) sampling methods are powerful tools for approximating distributions of static unknowns given a set of observations. These methods are iterative in nature: at each step they generate samples from a proposal distribution and assign them weights according to the importance sampling principle. Critical issues in applying PMC methods are the choice of the generating functions for the samples and the avoidance of the sample degeneracy. In this paper, we propose three new schemes that considerably improve the performance of the original PMC formulation by allowing for better exploration of the space of unknowns and by selecting more adequately the surviving samples.
A theoretical analysis is performed, proving the superiority of the novel schemes in terms of variance of the associated estimators and preservation of the sample diversity.
Furthermore, we show that they outperform other state of the art algorithms (both in terms of mean square error and robustness w.r.t. initialization) through extensive numerical simulations.

**Category:** Statistics

[117] **viXra:1601.0167 [pdf]**
*submitted on 2016-01-16 03:40:15*

**Authors:** Ilija Barukčić

**Comments:** Pages.

Titans like Bertrand Russell and Karl Pearson, and ultimately David Hume too, warned us to keep our mathematical and statistical hands off causality. Hume's scepticism has dominated discussion of causality in both analytic philosophy and statistical analysis for a long time. But more and more researchers are working hard in this field and trying to move beyond these positions. Much of the recent philosophical or mathematical writing on causation (Ellery Eells (1991), Daniel Hausman (1998), Pearl (2000), Peter Spirtes, Clark Glymour and Richard Scheines (2000), ...) addresses Bayes networks, the counterfactual approach to causality developed in detail by David Lewis, Reichenbach's Principle of the Common Cause, or the Causal Markov Condition. None of these approaches to causation has investigated the relationship between causation and the law of independence to the necessary extent. Nonetheless, the relationship between causation and the law of independence, one of the fundamental concepts in probability theory, is very important. May an effect occur in the absence of a cause? May an effect fail to occur in the presence of a cause? What, then, constitutes the causal relation? On the other hand, if it is unclear what constitutes the causal relation, maybe we can answer the question of what does not constitute it. If there is a deterministic causal relationship, a cause cannot be independent of its effect, and vice versa. This publication will prove that the law of independence defines causation to some extent ex negativo.

**Category:** Statistics

[116] **viXra:1601.0070 [pdf]**
*submitted on 2016-01-07 16:41:10*

**Authors:** J.Tiago de Oliveira

**Comments:** 37 Pages.

Statistical Analysis of Extremes, Chapter 3.

**Category:** Statistics

[115] **viXra:1601.0069 [pdf]**
*submitted on 2016-01-07 16:42:58*

**Authors:** J.Tiago de Oliveira

**Comments:** 11 Pages.

Statistical Analysis of Extremes, Chapter 4.

**Category:** Statistics

[114] **viXra:1601.0032 [pdf]**
*submitted on 2016-01-05 10:37:48*

**Authors:** M. Srinivas, S. Sambasiva Rao

**Comments:** 7 Pages. This paper has been published in Indian Journal of Physical Education and Allied Sciences, ISSN: 2395-6895, Vol.1, No.5, pp.37-44.

The statistical analysis of angular data is typically encountered in biological and geological studies, among several other areas of research. Circular data is the simplest case of this category of data called directional data, where the single response is not scalar, but angular or directional. A statistical analysis pertaining to two dimensional directional data is generally referred to as “Circular Statistics”. In this paper, an attempt is made to review various fundamental concepts of circular statistics and to discuss its applicability in sports science.

**Category:** Statistics
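
One of the fundamental concepts of circular statistics is the mean direction, which cannot be obtained by averaging angles arithmetically. The sketch below computes it from the mean sine and cosine, along with the mean resultant length; the function name `circular_mean` is illustrative and the code is a generic textbook construction rather than anything specific to the paper.

```python
import numpy as np

def circular_mean(angles_rad):
    """Return (mean direction in radians, mean resultant length).

    Each angle is treated as a unit vector; the mean direction is the
    angle of the average vector, and the resultant length (between 0
    and 1) measures how concentrated the angles are.
    """
    angles = np.asarray(angles_rad, dtype=float)
    c = np.cos(angles).mean()
    s = np.sin(angles).mean()
    mean_direction = float(np.arctan2(s, c))   # in (-pi, pi]
    resultant_length = float(np.hypot(c, s))
    return mean_direction, resultant_length
```

For two directions of 350° and 10°, the arithmetic mean is 180°, which points the opposite way, whereas the circular mean is 0°, as expected.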

[113] **viXra:1512.0448 [pdf]**
*submitted on 2015-12-26 16:50:32*

**Authors:** J.Tiago de Oliveira

**Comments:** 36 Pages.

Second chapter of Statistical Analysis of Extremes (Pendor, Lisbon, 1997).

**Category:** Statistics

[112] **viXra:1512.0436 [pdf]**
*submitted on 2015-12-26 12:04:44*

**Authors:** J.Tiago de Oliveira

**Comments:** 9 Pages. First chapter

J. Tiago de Oliveira's last book followed the research started by Emil Julius Gumbel.

**Category:** Statistics

[111] **viXra:1512.0420 [pdf]**
*submitted on 2015-12-25 09:53:50*

**Authors:** L. Martino, J. Read, V. Elvira, F. Louzada

**Comments:** 21 Pages.

We design a sequential Monte Carlo scheme for the joint purpose of Bayesian inference and model selection, with application to an urban mobility context in which different modalities of movement can be employed. In this case, we have the joint problem of online tracking and detection of the current modality.
For this purpose, we use interacting parallel particle filters, each one addressing a different model. They cooperate to provide a global estimator of the variable of interest and, at the same time, an approximation of the posterior density of the models given the data. The interaction occurs by a parsimonious distribution of the computational effort, adapting online the number of particles of each filter according to the posterior probability of the corresponding model. The resulting scheme is simple and provides good results in different numerical experiments with artificial and real data.

**Category:** Statistics

[110] **viXra:1512.0319 [pdf]**
*submitted on 2015-12-14 09:37:41*

**Authors:** H. Jabbari, M. Erfaniyan

**Comments:** 10 Pages.

Let $\{X_n, n \ge 1\}$ be a strictly stationary sequence of negatively associated random variables, with common continuous and bounded distribution function $F$. We consider the estimation of the two-dimensional distribution function of $(X_1, X_{k+1})$ based on kernel type estimators, as well as the estimation of the covariance function of the limit empirical process induced by the sequence $\{X_n, n \ge 1\}$, where $k \in \mathbb{N}_0$. Then, we derive uniform strong convergence rates for the kernel estimator of the two-dimensional distribution function of $(X_1, X_{k+1})$, which have not been obtained previously and do not need any conditions on the covariance structure of the variables. Furthermore, assuming a convenient decrease rate of the covariances $\mathrm{Cov}(X_1, X_{n+1})$, $n \ge 1$, we prove a uniform strong convergence rate for the covariance function of the limit empirical process based on kernel type estimators. Finally, we use a simulation study to compare the estimators of the distribution function of $(X_1, X_{k+1})$.

**Category:** Statistics

[109] **viXra:1512.0294 [pdf]**
*submitted on 2015-12-12 02:35:48*

**Authors:** Amelia Carolina Sparavigna

**Comments:** 4 Pages. Published in International Journal of Sciences, 2015, 4(10):1-4. DOI:10.18483/ijSci.845

Mutual information of two random variables can be easily obtained from their Shannon entropies. However, when nonadditive entropies are involved, the calculus of the mutual information is more complex. Here we discuss the basics of information based on Shannon entropy. Then we analyse the case of the generalized nonadditive Tsallis entropy.

**Category:** Statistics
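
The Shannon case mentioned in the abstract above rests on the standard identity $I(X;Y) = H(X) + H(Y) - H(X,Y)$. A minimal sketch for discrete variables, assuming the joint distribution is given as a table of non-negative entries (function names are illustrative):

```python
import numpy as np

def shannon_entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    p = np.asarray(p, dtype=float).ravel()
    nz = p[p > 0]                      # 0 * log(0) is taken as 0
    return float(-np.sum(nz * np.log(nz)))

def mutual_information(joint):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) from a joint probability table."""
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()        # normalise in case counts were passed
    h_x = shannon_entropy(joint.sum(axis=1))   # marginal of X (rows)
    h_y = shannon_entropy(joint.sum(axis=0))   # marginal of Y (columns)
    h_xy = shannon_entropy(joint)
    return h_x + h_y - h_xy
```

The identity relies on the additivity of Shannon entropy for independent variables, which is the property that fails for nonadditive entropies such as Tsallis entropy.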

[108] **viXra:1512.0293 [pdf]**
*submitted on 2015-12-12 02:40:18*

**Authors:** Amelia Carolina Sparavigna

**Comments:** 4 Pages. Published in International Journal of Sciences, 2015, 4(10):47-50. DOI:10.18483/ijSci.866

Tsallis and Kaniadakis entropies generalize the Shannon entropy and have it as their limit when their entropic indices approach specific values. Here we show some relations existing between Tsallis and Kaniadakis entropies. We also propose a rigorous discussion of the conditional Kaniadakis entropy, deduced from these relations.

**Category:** Statistics
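
The limiting behaviour described in the abstract above can be checked numerically. The sketch assumes the usual definitions $S_q = (1 - \sum_i p_i^q)/(q-1)$ for the Tsallis entropy and $S_\kappa = -\sum_i (p_i^{1+\kappa} - p_i^{1-\kappa})/(2\kappa)$ for the Kaniadakis entropy, in units where the Boltzmann constant is 1; function names are illustrative.

```python
import numpy as np

def _positive(p):
    """Keep only the strictly positive probabilities."""
    p = np.asarray(p, dtype=float)
    return p[p > 0]

def shannon(p):
    p = _positive(p)
    return float(-np.sum(p * np.log(p)))

def tsallis(p, q):
    """Tsallis entropy with entropic index q (q != 1)."""
    p = _positive(p)
    return float((1.0 - np.sum(p ** q)) / (q - 1.0))

def kaniadakis(p, kappa):
    """Kaniadakis entropy with entropic index kappa (kappa != 0)."""
    p = _positive(p)
    return float(-np.sum((p ** (1.0 + kappa) - p ** (1.0 - kappa)) / (2.0 * kappa)))
```

With these definitions, `tsallis(p, q)` approaches `shannon(p)` as `q` tends to 1 and `kaniadakis(p, kappa)` does so as `kappa` tends to 0. One relation that follows directly from the two formulas is that `kaniadakis(p, kappa)` equals the average of `tsallis(p, 1 + kappa)` and `tsallis(p, 1 - kappa)`.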

[107] **viXra:1511.0233 [pdf]**
*submitted on 2015-11-24 04:47:27*

**Authors:** M. F. Bugallo, L. Martino, J. Corander

**Comments:** Digital Signal Processing, Volume 47, Pages 36–49, 2015.

In Bayesian signal processing, all the information about the unknowns of interest is contained in their posterior distributions. The unknowns can be parameters of a model, or a model and its parameters. In many important problems, these distributions are impossible to obtain in analytical form. An alternative is to generate their approximations by Monte Carlo-based methods like Markov chain Monte Carlo (MCMC) sampling, adaptive importance sampling (AIS) or particle filtering (PF). While MCMC sampling and PF have received considerable attention in the literature and are reasonably well understood, the AIS methodology remains relatively unexplored. This article reviews the basics of AIS and provides a comprehensive survey of the state of the art of the topic. Some of its most relevant implementations are revisited and compared through computer simulation examples.

**Category:** Statistics

[106] **viXra:1511.0232 [pdf]**
*submitted on 2015-11-24 05:31:30*

**Authors:** V. Elvira, L. Martino, D. Luengo, M. F. Bugallo

**Comments:** 38 Pages.

Importance Sampling methods are broadly used to approximate posterior distributions or some of their moments. In its standard approach, samples are drawn from a single proposal distribution and weighted properly. However, since the performance depends on the mismatch between the targeted and the proposal distributions, several proposal densities are often employed for the generation of samples. Under this Multiple Importance Sampling (MIS) scenario, many works have addressed the selection or adaptation of the proposal distributions, interpreting the sampling and the weighting steps in different ways. In this paper, we establish a general framework for sampling and weighting procedures when more than one proposal is available. The most relevant MIS schemes in the literature are encompassed within the new framework and, moreover, novel valid schemes appear naturally. All the MIS schemes are compared and ranked in terms of the variance of the associated estimators. Finally, we provide illustrative examples which reveal that, even with a good choice of the proposal densities, a careful interpretation of the sampling and weighting procedures can make a significant difference in the performance of the method.

**Category:** Statistics
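
To illustrate how the weighting step can be interpreted in more than one way when several proposals are available, the sketch below draws one sample from each of $N$ Gaussian proposals and weights it in two ways: by its own proposal only, or by the equal-weight mixture of all proposals (often called the deterministic mixture weighting). The Gaussian proposals, the self-normalised estimator, and all function names are assumptions made for this example; it is not the paper's framework.

```python
import numpy as np

def gaussian_pdf(x, mean, std):
    """Density of a univariate Gaussian, evaluated elementwise."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def mis_estimates(target_pdf, f, means, stds, rng):
    """Self-normalised estimates of E[f(X)] under two MIS weightings.

    One sample x[n] is drawn from each proposal q_n = N(means[n], stds[n]^2).
    Returns (estimate with standard weights, estimate with mixture weights).
    """
    means = np.asarray(means, dtype=float)
    stds = np.asarray(stds, dtype=float)
    x = rng.normal(means, stds)                      # x[n] ~ q_n

    # Standard weights: each sample is weighted by its own proposal only.
    w_std = target_pdf(x) / gaussian_pdf(x, means, stds)

    # Deterministic mixture weights: the denominator is the equal-weight
    # mixture of all proposals, evaluated at each sample.
    mixture = gaussian_pdf(x[:, None], means[None, :], stds[None, :]).mean(axis=1)
    w_dm = target_pdf(x) / mixture

    est_std = float(np.sum(w_std * f(x)) / np.sum(w_std))
    est_dm = float(np.sum(w_dm * f(x)) / np.sum(w_dm))
    return est_std, est_dm
```

The mixture weighting costs $N^2$ proposal evaluations instead of $N$, which is the price paid for weights that cannot blow up when a sample lands where its own proposal has very low density but other proposals cover the point well.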

[105] **viXra:1511.0003 [pdf]**
*submitted on 2015-11-01 06:07:39*

**Authors:** John R. Dixon

**Comments:** 41 Pages.

This is the technical report to accompany:
Dixon, John R., Michael R. Kosorok, and Bee Leng Lee. "Functional inference in semiparametric models using the piggyback bootstrap." Annals of the Institute of Statistical Mathematics 57, no. 2 (2005): 255-277.

**Category:** Statistics

[104] **viXra:1509.0048 [pdf]**
*submitted on 2015-09-04 05:40:14*

**Authors:** L. Martino, F. Louzada

**Comments:** 13 Pages.

The adaptive rejection sampling (ARS) algorithm is a universal random generator for drawing samples efficiently from a univariate log-concave target probability density function (pdf). ARS generates independent samples from the target via rejection sampling with high acceptance rates. Indeed, ARS yields a sequence of proposal functions that converge toward the target pdf, so that the probability of accepting a sample approaches one. However, sampling from the proposal pdf becomes more computationally demanding each time it is updated. In this work, we propose a novel ARS scheme, called Cheap Adaptive Rejection Sampling (CARS), where the computational effort for drawing from the proposal remains constant, decided in advance by the user. For generating a large number of desired samples, CARS is faster than ARS.

**Category:** Statistics

[103] **viXra:1508.0265 [pdf]**
*submitted on 2015-08-27 02:35:07*

**Authors:** B. B. Khare, Habib Ur Rehman, U. Srivastava

**Comments:** 10 Pages.

In this paper, a study of an improved chain ratio-cum-regression type estimator for the population mean in the presence of non-response, for fixed cost and specified precision, has been made. Theoretical results are supported by one numerical illustration.

**Category:** Statistics

[102] **viXra:1508.0256 [pdf]**
*submitted on 2015-08-27 02:50:36*

**Authors:** B. B. Khare

**Comments:** 8 Pages.

Auxiliary information is used to increase the efficiency of estimators of population parameters such as the mean, ratio, and product of two population means. In this context, the estimation procedure for the ratio and product of two population means using auxiliary characters, with special reference to the non-response problem, is discussed.

**Category:** Statistics

[101] **viXra:1508.0142 [pdf]**
*submitted on 2015-08-18 02:29:47*

**Authors:** L. Martino, F. Louzada

**Comments:** 17 Pages.

The Multiple Try Metropolis (MTM) algorithm is an advanced MCMC technique based on drawing and testing several candidates at each iteration of the algorithm. One of them is selected according to certain weights and then it is tested according to a suitable acceptance probability. Clearly, since the computational cost increases as the employed number of tries grows, one expects that the performance of an MTM scheme improves as the number of tries increases, as well. However, there are scenarios where increasing the number of tries does not produce a corresponding enhancement of the performance. In this work, we describe these scenarios and then introduce possible solutions to these issues.

**Category:** Statistics

[100] **viXra:1507.0125 [pdf]**
*submitted on 2015-07-16 09:20:20*

**Authors:** Rajesh Singh, Florentin Smarandache (editors)

**Comments:** 54 Pages.

The present book aims to present some improved estimators using auxiliary and attribute information in the case of simple random sampling and stratified random sampling, and in some cases when non-response is present.
This volume is a collection of five papers, written by seven co-authors (listed in the order of the papers): Sachin Malik, Rajesh Singh, Florentin Smarandache, B. B. Khare, P. S. Jha, Usha Srivastava and Habib Ur Rehman.
The first and the second papers deal with the problem of estimating the finite population mean when some information on two auxiliary attributes is available. In the third paper, problems related to estimation of the ratio and product of two population means using auxiliary characters, with special reference to non-response, are discussed.
In the fourth paper, the problem of estimating the population mean using the coefficient of variation and shape parameters in each stratum is considered. In the fifth paper, a study of an improved chain ratio-cum-regression type estimator for the population mean in the presence of non-response, for fixed cost and specified precision, has been made.
The authors hope that the book will be helpful for researchers and students working in the field of sampling techniques.

**Category:** Statistics

[99] **viXra:1507.0110 [pdf]**
*submitted on 2015-07-14 15:18:08*

**Authors:** L. Martino, V. Elvira, D. Luengo, J. Corander, F. Louzada

**Comments:** 20 Pages.

Monte Carlo (MC) methods are widely used in signal processing, machine learning and stochastic optimization. A well-known class of MC methods are Markov Chain Monte Carlo (MCMC) algorithms. In order to foster better exploration of the state space, especially in high-dimensional applications, several schemes employing multiple parallel MCMC chains have been recently introduced. In this work, we describe a novel parallel interacting MCMC scheme, called orthogonal MCMC (O-MCMC), where a set of "vertical" parallel MCMC chains share information using some "horizontal" MCMC techniques working on the entire population of current states. More specifically, the vertical chains are led by random-walk proposals, whereas the horizontal MCMC techniques employ independent proposals, thus allowing an efficient combination of global exploration and local approximation. The interaction is contained in these horizontal iterations. Within the analysis of different implementations of O-MCMC, novel schemes for reducing the overall computational cost of parallel multiple try Metropolis (MTM) chains are also presented. Furthermore, a modified version of O-MCMC for optimization is provided by considering parallel simulated annealing (SA) algorithms. We also discuss the application of O-MCMC in a big data framework.
Numerical results show the advantages of the proposed sampling scheme in terms of efficiency in the estimation, as well as robustness in terms of independence with respect to initial values and parameter choice.

**Category:** Statistics

[98] **viXra:1507.0029 [pdf]**
*submitted on 2015-07-05 07:21:38*

**Authors:** Khaled Ouafi

**Comments:** 9 Pages.

We investigate the issue of approximate Bayesian parameter inference in nonlinear state space models with complex likelihoods. Sequential Monte Carlo with approximate Bayesian computations (SMC-ABC) is an approach to approximate the likelihood in this type of model. However, such approximations can be noisy and computationally expensive, which hinders cost-effective implementations using standard methods based on optimisation and statistical simulation. We propose a novel method based on the combination of Gaussian process optimisation (GPO) and SMC-ABC to create a Laplace approximation of the intractable posterior. The properties of the resulting GPO-ABC method are studied using stochastic volatility (SV) models with both synthetic and real-world data. We conclude that the algorithm enjoys good accuracy, comparable to particle Markov chain Monte Carlo, with a significant reduction in computational cost, and better robustness to noise in the estimates than a gradient-based optimisation algorithm. Finally, we make use of GPO-ABC to estimate the Value-at-Risk for a portfolio using a copula model with SV models for the margins.

**Category:** Statistics

[97] **viXra:1506.0175 [pdf]**
*submitted on 2015-06-24 13:01:14*

**Authors:** Ilija Barukčić

**Comments:** 19 pages. (C) Ilija Barukčić, Jever, Germany, 2015.

The deterministic relationship between cause and effect is deeply connected with our understanding of the physical sciences and their explanatory ambitions. Though progress is being made, the lack of theoretical predictions and experiments in quantum gravity makes it difficult to use empirical evidence to justify a theory of causality at quantum level in normal circumstances, i.e., by predicting the value of a well-confirmed experimental result. For a variety of reasons, the problem of the deterministic relationship between cause and effect is related to basic problems of physics as such. Despite the common belief, it is a remarkable fact that a theory of causality should be consistent with a theory of everything and is therefore linked to the problems of a theory of everything. Thus, solving the problem of causality can help to solve the problems of the theory of everything (at quantum level) too.

**Category:** Statistics

[96] **viXra:1506.0067 [pdf]**
*submitted on 2015-06-08 14:58:47*

**Authors:** Christopher Goddard

**Comments:** 4 Pages.

It is a common problem in statistics to determine the appropriate heuristic to select from a set of hypotheses (or equivalently, models), prior to optimising that model to fit the data. In this short note I sketch a technique based on the construction of an information in order to compute the optimal model within a given model space and given data.

**Category:** Statistics

[95] **viXra:1505.0136 [pdf]**
*submitted on 2015-05-19 00:31:36*

**Authors:** Vorobyev O.Yu., Golovkov L.S.

**Comments:** 10 Pages.

This article introduces two new discrete distributions: the multidimensional Binomial distribution and the multidimensional Poisson distribution. Their characteristics and properties are also presented.

**Category:** Statistics

[94] **viXra:1505.0135 [pdf]**
*submitted on 2015-05-18 10:45:07*

**Authors:** L. Martino, V. Elvira, D. Luengo, J. Corander

**Comments:** 26 Pages.

Monte Carlo algorithms represent the *de facto* standard for approximating complicated integrals involving multidimensional target distributions. In order to generate random realizations from the target distribution, Monte Carlo techniques use simpler proposal probability densities for drawing candidate samples. Performance of any such method is strictly related to the specification of the proposal distribution, such that unfortunate choices easily wreak havoc on the resulting estimators. In this work, we introduce a *layered*, that is a hierarchical, procedure for generating samples employed within a Monte Carlo scheme. This approach ensures that an appropriate equivalent proposal distribution is always obtained automatically (thus eliminating the risk of a catastrophic performance), although at the expense of a moderate increase in the complexity of the resulting algorithm. A hierarchical interpretation of two well-known methods, namely the random walk Metropolis-Hastings (MH) and the Population Monte Carlo (PMC) techniques, is provided.
Furthermore, we provide a general unified importance sampling (IS) framework where multiple proposal densities are employed, and several IS schemes are introduced applying the so-called deterministic mixture approach.
Finally, given these schemes, we also propose a novel class of adaptive importance samplers using a population of proposals, where the adaptation is driven by independent parallel or interacting Markov Chain Monte Carlo (MCMC) chains. The resulting algorithms combine efficiently the benefits of both IS and MCMC methods.

**Category:** Statistics

[93] **viXra:1503.0088 [pdf]**
*submitted on 2015-03-12 09:09:50*

**Authors:** Jianwen Huang, Shouquan Chen

**Comments:** 10 Pages.

We introduce the logarithmic generalized Maxwell distribution, which is an extension of the generalized Maxwell distribution. Some interesting properties of this distribution are studied, and the asymptotic distribution of the partial maximum of an independent and identically distributed sequence from the logarithmic generalized Maxwell distribution is obtained.

**Category:** Statistics

[92] **viXra:1412.0276 [pdf]**
*submitted on 2014-12-31 01:34:35*

**Authors:** Jianwen Huang, Yanmin Liu

**Comments:** 7 Pages.

In this paper, with optimal normalizing constants, the asymptotic expansion of the distribution of the normalized maxima from the generalized Maxwell distribution is derived. It shows that the convergence rate of the normalized maxima to the Gumbel extreme value distribution is proportional to $1/\log n$.

**Category:** Statistics

[91] **viXra:1412.0275 [pdf]**
*submitted on 2014-12-31 01:42:41*

**Authors:** Jianwen Huang, Yanmin Liu

**Comments:** 12 Pages.

In this paper, the higher-order asymptotic
expansion of the moment of the extreme from the generalized Maxwell
distribution is obtained, from which one establishes the rate of
convergence of the moment of the normalized partial
maximum to the moment of the associated Gumbel extreme value distribution.

**Category:** Statistics

[90] **viXra:1412.0247 [pdf]**
*submitted on 2014-12-26 15:30:26*

**Authors:** Sergio Arciniegas-Alarcón, Marisol García-Peña, Wojtek Krzanowski, Carlos Tadeu dos Santos Dias

**Comments:** 14 Pages.

A common problem in multi-environment trials arises when some genotype-by-environment combinations are missing. In Arciniegas-Alarcón et al. (2010) we outlined a method of data imputation to estimate the missing values, the computational algorithm for which was a mixture of regression and lower-rank approximation of a matrix based on its singular value decomposition (SVD). In the present paper we provide two extensions to this methodology, by including weights chosen by cross-validation and allowing multiple as well as simple imputation. The three methods are assessed and compared in a simulation study, using a complete set of real data in which values are deleted randomly at different rates. The quality of the imputations is evaluated using three measures: the Procrustes statistic, the squared correlation between matrices and the normalised root mean squared error between these estimates and the true observed values. None of the methods makes any distributional or structural assumptions, and all of them can be used for any pattern or mechanism of the missing values.

**Category:** Statistics

[89] **viXra:1412.0003 [pdf]**
*submitted on 2014-12-01 04:45:04*

**Authors:** Marisol García-Peña, Sergio Arciniegas-Alarcón, Décio Barbin

**Comments:** 10 Pages.

A common problem in climate data is missing information. Recently, four methods have been developed which are based on the singular value decomposition (SVD) of a matrix. The aim of this paper is to evaluate these new developments by comparing them in a simulation study based on two complete matrices of real data. One corresponds to the historical precipitation of Piracicaba / SP - Brazil and the other to multivariate meteorological characteristics in the same city from 1997 to 2012. In the study, values were deleted randomly at different percentages and subsequently imputed, and the methodologies were compared by three criteria: the normalized root mean squared error, the Procrustes similarity statistic and the Spearman correlation coefficient. It was concluded that the SVD should be used only when multivariate matrices are analyzed; when precipitation matrices are used, the monthly mean outperforms the methods based on the SVD.

**Category:** Statistics

[88] **viXra:1411.0396 [pdf]**
*submitted on 2014-11-20 03:16:54*

**Authors:** A. Borumand Saeid, A. Namdar

**Comments:** 7 Pages.

We introduce the notion of Smarandache BCH-algebra and Smarandache (fresh, clean and fantastic) ideals; some examples are given and related properties are investigated. Relationships between
Q-Smarandache (fresh, clean and fantastic) ideals and other types of ideals are given. Extension properties for Q-Smarandache (fresh, clean and fantastic) ideals are established.

**Category:** Statistics

[87] **viXra:1411.0270 [pdf]**
*submitted on 2014-11-19 01:04:21*

**Authors:** Florentin Smarandache

**Comments:** 2 Pages.

In this note the author presents a new proof for the theorem of I. Patrascu.

**Category:** Statistics

[86] **viXra:1411.0267 [pdf]**
*submitted on 2014-11-19 01:14:33*

**Authors:** Florentin Smarandache

**Comments:** 1 Page.

Is it possible to cover all (positive) integers with n geometrical progressions of integers?
Find a necessary and sufficient condition for a general class of positive integer sequences
such that, for a fixed n, there are n (distinct) sequences of this class which cover all integers.

**Category:** Statistics

[85] **viXra:1411.0265 [pdf]**
*submitted on 2014-11-19 01:17:32*

**Authors:** Marian Niţu, Florentin Smarandache, Mircea Eugen Şelariu

**Comments:** 22 Pages.

The central idea of the paper is to present some new transformations, previously nonexistent in ordinary mathematics, termed centric mathematics (MC), which have become possible thanks to the emergence of eccentric mathematics and, implicitly, of supermathematics.

**Category:** Statistics

[84] **viXra:1411.0264 [pdf]**
*submitted on 2014-11-19 01:18:41*

**Authors:** Mircea E. Şelariu, Florentin Smarandache, Marian Niţu

**Comments:** 18 Pages.

The paper presents the eccentric-mathematics counterparts of the cardinal and integral functions of centric mathematics, or ordinary mathematics. The centric functions are also presented in the introduction of the paper, because they are too little known, although they are widely used in wave physics.

**Category:** Statistics

[83] **viXra:1411.0260 [pdf]**
*submitted on 2014-11-19 01:38:40*

**Authors:** Octavian Cira, Florentin Smarandache

**Comments:** 8 Pages.

The first prime number with the special property that adding it to its reversal gives a prime number too is 229 (229 + 922 = 1151). The prime numbers with this property will be called Luhn prime numbers. In this article we present an efficient
algorithm for determining the Luhn prime numbers.
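
A brute-force search makes the definition concrete. This is only a sketch built on trial division, not the algorithm of the paper; the function names are hypothetical.

```python
def is_prime(n):
    """Trial-division primality test; adequate for small n."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def reversal(n):
    """Integer obtained by reversing the decimal digits of n."""
    return int(str(n)[::-1])


def luhn_primes(limit):
    """All primes p < limit such that p + reversal(p) is also prime."""
    return [p for p in range(2, limit)
            if is_prime(p) and is_prime(p + reversal(p))]


first_few = luhn_primes(300)
```

No one- or two-digit prime qualifies: for odd one-digit primes the sum is even, and for two-digit numbers the sum is a multiple of 11. The search therefore starts returning values in the 200s, beginning with 229.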

**Category:** Statistics

[82] **viXra:1411.0258 [pdf]**
*submitted on 2014-11-19 01:40:47*

**Authors:** Said Broumi, Pinaki Majumdar, Florentin Smarandache

**Comments:** 11 Pages.

In this paper, we define First Zadeh’s implication, First Zadeh’s intuitionistic fuzzy conjunction and intuitionistic fuzzy disjunction of two intuitionistic fuzzy soft sets, and study some of their basic properties with proofs and examples.

**Category:** Statistics

[81] **viXra:1411.0255 [pdf]**
*submitted on 2014-11-19 02:04:12*

**Authors:** Ion Patrascu, Florentin Smarandache

**Comments:** 3 Pages.

Open problem:
Construct, using a ruler and a compass, two non-congruent triangles which have equal
perimeters and areas.
In preparation for the proof of this problem we recall several notions and prove a
lemma.

**Category:** Statistics

[80] **viXra:1411.0253 [pdf]**
*submitted on 2014-11-19 02:07:41*

**Authors:** C. Dumitrescu, N. Varlan, St Zanfir, N. Radescu, F. Smarandache

**Comments:** 23 Pages.

In this paper we extend the Smarandache function.

**Category:** Statistics

[79] **viXra:1411.0252 [pdf]**
*submitted on 2014-11-19 02:09:03*

**Authors:** Ion Patrascu

**Comments:** 6 Pages.

In this article, we review some properties of the harmonic quadrilateral related to triangle symmedians and to Apollonius circles.

**Category:** Statistics

[78] **viXra:1411.0072 [pdf]**
*submitted on 2014-11-08 15:25:10*

**Authors:** Suhoparov Stanislav Yurievich

**Comments:** 5 Pages.

Derivation of the recurrence relation for orthogonal polynomials from the Gram-Schmidt orthogonalization process, together with a scheme for applying the resulting recurrence relation.
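
For orientation, the standard textbook three-term recurrence for monic orthogonal polynomials can be sketched as follows. It is not taken from the paper: the discrete inner product on five equally weighted nodes and all names are assumptions, and polynomials are stored by their values at the nodes so that exact rational arithmetic can be used.

```python
from fractions import Fraction


def inner(f_vals, g_vals, weights):
    """Discrete inner product <f, g> = sum_i w_i f(x_i) g(x_i)."""
    return sum(w * f * g for w, f, g in zip(weights, f_vals, g_vals))


def monic_orthogonal_polys(nodes, weights, max_degree):
    """Values at the nodes of monic orthogonal polynomials p_0..p_max_degree.

    Uses the three-term recurrence
        p_{k+1}(x) = (x - a_k) p_k(x) - b_k p_{k-1}(x),
        a_k = <x p_k, p_k> / <p_k, p_k>,
        b_k = <p_k, p_k> / <p_{k-1}, p_{k-1}>,
    which is what Gram-Schmidt applied to 1, x, x^2, ... reduces to.
    """
    prev = [Fraction(0)] * len(nodes)      # p_{-1} = 0
    curr = [Fraction(1)] * len(nodes)      # p_0 = 1
    polys = [curr]
    for _ in range(max_degree):
        norm_curr = inner(curr, curr, weights)
        x_curr = [x * c for x, c in zip(nodes, curr)]
        a = inner(x_curr, curr, weights) / norm_curr
        norm_prev = inner(prev, prev, weights)
        # b_0 multiplies p_{-1} = 0, so its value is irrelevant.
        b = norm_curr / norm_prev if norm_prev != 0 else Fraction(0)
        nxt = [(x - a) * c - b * p for x, c, p in zip(nodes, curr, prev)]
        prev, curr = curr, nxt
        polys.append(curr)
    return polys


nodes = [Fraction(i) for i in range(5)]
weights = [Fraction(1, 5)] * 5
polys = monic_orthogonal_polys(nodes, weights, max_degree=3)
gram = [[inner(p, q, weights) for q in polys] for p in polys]
```

The Gram matrix `gram` is diagonal, which confirms that each new polynomial only needs to be corrected against the previous two rather than against all earlier ones.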

**Category:** Statistics

[77] **viXra:1411.0064 [pdf]**
*submitted on 2014-11-07 17:22:14*

**Authors:** Jean Claude Dutailly

**Comments:** 16 Pages.

The purpose of this paper is to present a general method to estimate the probability of transitions of a system between phases. The system must be represented in a quantitative model, with vectorial variables depending on time, satisfying general conditions which are usually met. The method can be implemented in Physics, Economics or Finance.

**Category:** Statistics

[76] **viXra:1411.0016 [pdf]**
*submitted on 2014-11-03 07:05:31*

**Authors:** Sergio Arciniegas-Alarcón, Marisol García-Peña, Wojtek Krzanowski, Carlos Tadeu dos Santos Dias

**Comments:** 17 Pages.

Missing values for some genotype-environment combinations are commonly encountered in multienvironment trials. The recommended methodology for analyzing such unbalanced data combines the Expectation-Maximization (EM) algorithm with the additive main effects and multiplicative interaction (AMMI) model. Recently, however, four imputation algorithms based on the Singular Value Decomposition of a matrix (SVD) have been reported in the literature (Biplot imputation, EM+SVD, GabrielEigen imputation, and distribution free multiple imputation - DFMI). These algorithms all fill in the missing values, thereby removing the lack of balance in the original data and permitting simpler standard analyses to be performed. The aim of this paper is to compare these four algorithms with the gold standard EM-AMMI. To do this, we report the results of a simulation study based on three complete sets of real data (eucalyptus, sugar cane and beans) for various imputation percentages. The methodologies were compared using the normalised root mean squared error, the Procrustes similarity statistic and the Spearman correlation coefficient. The conclusion is that imputation using the EM algorithm plus SVD provides competitive results to those obtained with the gold standard. It is also an excellent alternative to imputation with an additive model, which in practice ignores the genotype-by-environment interaction and therefore may not be appropriate in some cases.
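
The general idea behind SVD-based imputation can be sketched in a few lines. This is a generic iterative low-rank fill, not any of the four algorithms compared in the paper: the rank is fixed at 1, the leading singular triplet comes from plain power iteration, and the toy matrix and function names are assumptions.

```python
import math


def top_singular_triplet(matrix, iterations=200):
    """Leading singular value and vectors via power iteration.

    Assumes the matrix is nonzero and the start vector is not orthogonal
    to the leading right singular vector.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols
    sigma, u = 0.0, [0.0] * n_rows
    for _ in range(iterations):
        u = [sum(matrix[i][j] * v[j] for j in range(n_cols))
             for i in range(n_rows)]
        norm_u = math.sqrt(sum(x * x for x in u))
        u = [x / norm_u for x in u]
        v = [sum(matrix[i][j] * u[i] for i in range(n_rows))
             for j in range(n_cols)]
        sigma = math.sqrt(sum(x * x for x in v))
        v = [x / sigma for x in v]
    return sigma, u, v


def svd_impute(data, iterations=100):
    """Fill None entries by alternating a rank-1 SVD fit with re-imputation."""
    n_rows, n_cols = len(data), len(data[0])
    missing = [(i, j) for i in range(n_rows) for j in range(n_cols)
               if data[i][j] is None]
    filled = [row[:] for row in data]
    # Initial guess: the mean of the observed values in each column.
    for j in range(n_cols):
        observed = [data[i][j] for i in range(n_rows) if data[i][j] is not None]
        col_mean = sum(observed) / len(observed)
        for i in range(n_rows):
            if filled[i][j] is None:
                filled[i][j] = col_mean
    # Refit the low-rank model and overwrite only the missing cells.
    for _ in range(iterations):
        sigma, u, v = top_singular_triplet(filled)
        for i, j in missing:
            filled[i][j] = sigma * u[i] * v[j]
    return filled


# Toy genotype-by-environment table with one missing cell (hypothetical data).
genotype_by_env = [
    [None, 2.0, 4.0],
    [2.0, 4.0, 8.0],
    [3.0, 6.0, 12.0],
]
completed = svd_impute(genotype_by_env)
```

The toy table is exactly rank 1 apart from the gap, so the iteration settles on the value that restores that structure. Real trial data would need a higher rank and some rule for choosing it.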

**Category:** Statistics

[75] **viXra:1410.0191 [pdf]**
*submitted on 2014-10-29 07:37:19*

**Authors:** Carlos Tadeu dos Santos Dias, Kuang Hongyu, Lúcio B. Araújo, Maria Joseane C. Silva, Marisol García-Peña, Mirian F. C. Araújo, Priscila N. Faria, Sergio Arciniegas-Alarcón

**Comments:** 19 Pages. Paper in Portuguese.

This work is based on the short course “A Metodologia AMMI: Com Aplicacão ao Melhoramento Genético”, taught during the 58th RBRAS and 15th SEAGRO held in Campina Grande - PB, and aims to introduce the AMMI method to readers with and without mathematical training. We do not intend to present a detailed work; rather, the intention is to serve as a guide for researchers and for graduate and postgraduate students. In other words, it is a work meant to stimulate research and the quest for knowledge in an area of statistical methods. For this purpose we review the genotype-by-environment interaction, the definition of the AMMI models, some selection criteria and the biplot graphic. More details can be found in the material produced for the short course.

**Category:** Statistics

[74] **viXra:1410.0121 [pdf]**
*submitted on 2014-10-21 11:16:26*

**Authors:** Sergio Arciniegas-Alarcón, Carlos Tadeu dos Santos Dias, Marisol García-Peña

**Comments:** 9 Pages. Paper in Portuguese with abstract in English.

The objective of this work was to propose a new distribution-free multiple imputation algorithm, through modifications of the simple imputation method recently developed by Yan, in order to circumvent the problem of unbalanced experiments. The method uses the singular value decomposition of a matrix and was tested using simulations based on two complete matrices of real data, obtained from eucalyptus and sugarcane trials, with values deleted randomly at different percentages. The quality of the imputations was evaluated by a measure of overall accuracy that combines the variance between imputations and their mean square deviations in relation to the deleted values. The best alternative for multiple imputation is a multiplicative model that includes weights close to 1 for the eigenvalues calculated with the decomposition. The proposed methodology does not depend on distributional or structural assumptions and does not have any restriction regarding the pattern or the mechanism of the missing data.

**Category:** Statistics

[73] **viXra:1410.0077 [pdf]**
*submitted on 2014-10-14 13:14:47*

**Authors:** T. Prabhakar Reddy, S. Sambasiva Rao, P. Ramu

**Comments:** 13 Pages. This paper has been published in Journal of Physical Education and Sports Science, pp. 226-234, Vol. 2, 2014. ISSN 2229-7049.

The unpredictable game of limited-overs cricket brings excitement for the audience, who expect mayhem on the field. The audience's high expectations of watching a good match may be ruined by an interruption due to bad weather or other circumstances. It is therefore necessary to adjust the target score in a reasonable manner when an interrupted match resumes. Several mathematical models for resetting the target in interrupted one-day international (ODI) cricket matches are available in the literature, but none of them is optimal for the Twenty20 (T20) format. The purpose of this note is to review the existing rain rules for resetting targets in interrupted ODI cricket matches and to propose a method for resetting targets in interrupted T20 cricket matches, with suitable illustrative examples.

**Category:** Statistics

[72] **viXra:1410.0070 [pdf]**
*submitted on 2014-10-13 10:09:31*

**Authors:** Huang Jianwen, Yang Hongyan

**Comments:** 6 Pages.

Let $\{X_n,~n\geq1\}$ be independent and identically distributed random
variables with each $X_n$ following the skew normal distribution. Let
$M_n=\max\{X_k,~1\leq k\leq n\}$ denote the partial maximum of
$\{X_n,~n\geq1\}$. Liao et al. (2014) considered the convergence
rate of the distribution of the maxima for random variables obeying
the skew normal distribution under linear normalization. In this
paper, we obtain the asymptotic distribution of the maximum under power
normalization, the normalizing constants, and the associated pointwise
convergence rate under power normalization.

**Category:** Statistics

[71] **viXra:1409.0127 [pdf]**
*submitted on 2014-09-16 10:08:05*

**Authors:** Jianwen Huang, Shouquan Chen

**Comments:** 15 Pages.

Let $\{X_n,~n\geq1\}$ be an independent
and identically distributed random sequence with common
distribution $F$ obeying the lognormal distribution. In
this paper, we obtain the exact uniform convergence rate of the
distribution of the maximum to its extreme value limit under power normalization.

**Category:** Statistics

[70] **viXra:1409.0119 [pdf]**
*submitted on 2014-09-15 10:24:34*

**Authors:** Jianwen Huang, Shouquan Chen

**Comments:** 9 Pages.

Motivated by Finner et al. (2008), the
asymptotic behavior of the probability density function (pdf) and
the cumulative distribution function (cdf) of the generalized
exponential and Maxwell distributions is studied. Specifically, we
consider the asymptotic behavior of the ratio of the pdfs (cdfs) of
the generalized exponential and Student's $t$-distributions (likewise
for the Maxwell and Student's $t$-distributions) as the degrees of
freedom parameter approaches infinity in an appropriate way. As
by-products, Mills' ratios for the generalized exponential and Maxwell
distributions are obtained. Moreover, we give some examples to
illustrate the application of our results in extreme value theory.

**Category:** Statistics

[69] **viXra:1409.0051 [pdf]**
*submitted on 2014-09-08 03:03:33*

**Authors:** L. Martino, J. Corander

**Comments:** 10 Pages.

Markov Chain Monte Carlo (MCMC) methods are well-known Monte Carlo methodologies, widely used in different fields for statistical inference and stochastic optimization. The Multiple Try Metropolis (MTM) algorithm is an extension of the standard Metropolis-Hastings (MH) algorithm in which the next state of the chain is chosen among a set of candidates, according to certain weights.
The Particle MH (PMH) algorithm is another advanced MCMC technique, specifically designed for scenarios where the multidimensional target density can be easily factorized as a product of (lower-dimensional) conditional densities. Both are widely studied and applied in the literature. In this note, we investigate similarities and differences among the MTM schemes and the PMH method.
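
The candidate-selection mechanism of MTM can be sketched for the simplest textbook case: a symmetric Gaussian random-walk proposal with importance weights equal to the target density. The toy target N(3, 1), the tuning values and the function names are assumptions for illustration, and the sketch does not cover the PMH comparison made in the note.

```python
import math
import random


def log_target(x):
    """Unnormalized log-density of the toy target N(3, 1) (an assumption)."""
    return -0.5 * (x - 3.0) ** 2


def mtm_step(x, n_tries, step, rng):
    """One Multiple Try Metropolis step with a symmetric Gaussian proposal.

    With a symmetric proposal and weights w(y) = target(y), the generalized
    acceptance ratio is sum_j w(y_j) / sum_j w(x*_j).
    """
    # 1. Draw candidates around the current state and weight them.
    candidates = [rng.gauss(x, step) for _ in range(n_tries)]
    cand_w = [math.exp(log_target(y)) for y in candidates]
    if sum(cand_w) == 0.0:
        return x  # every weight underflowed; stay at the current state
    # 2. Select one candidate with probability proportional to its weight.
    y = rng.choices(candidates, weights=cand_w, k=1)[0]
    # 3. Draw n_tries - 1 reference points around y; the last one is x itself.
    refs = [rng.gauss(y, step) for _ in range(n_tries - 1)] + [x]
    ref_w = [math.exp(log_target(r)) for r in refs]
    # 4. Accept y with the generalized Metropolis-Hastings probability.
    accept_prob = min(1.0, sum(cand_w) / sum(ref_w))
    return y if rng.random() < accept_prob else x


rng = random.Random(42)
x, chain = 0.0, []
for _ in range(20000):
    x = mtm_step(x, n_tries=5, step=2.0, rng=rng)
    chain.append(x)
kept = chain[1000:]  # discard a short burn-in
chain_mean = sum(kept) / len(kept)
chain_var = sum((v - chain_mean) ** 2 for v in kept) / len(kept)
```

Setting `n_tries=1` recovers ordinary random-walk Metropolis, which makes the sketch a convenient baseline when comparing the two.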

**Category:** Statistics

[68] **viXra:1409.0015 [pdf]**
*submitted on 2014-09-02 11:32:22*

**Authors:** Ellida M. Khazen

**Comments:** 25 Pages.

The problem of filtering the unobservable components x(t) of a multidimensional continuous diffusion Markov process z(t)=(x(t),y(t)), given observations of the (multidimensional) process y(t) taken at discrete consecutive times with small time steps, is investigated analytically. On the basis of that investigation, new algorithms for simulating the unobservable components x(t) and new algorithms for nonlinear filtering using sequential Monte Carlo methods, or particle filters, are developed and proposed. An analytical investigation of the observed quadratic variations is also developed. New closed-form analytical formulae are obtained, which characterize the dispersions of the deviations of the observed quadratic variations and the accuracy of some estimates for x(t). As an illustrative example, estimation of volatility (for problems in financial mathematics) is considered. The new algorithms extend the range of applications of sequential Monte Carlo methods, or particle filters, beyond hidden Markov models and improve their performance.

**Category:** Statistics

[67] **viXra:1405.0280 [pdf]**
*submitted on 2014-05-21 11:13:00*

**Authors:** L. Martino, V. Elvira, D. Luengo, J. Corander

**Comments:** 18 Pages.

Monte Carlo (MC) methods are well-known techniques in fields such as signal processing, communications and machine learning. A well-known class of MC methods is composed of importance sampling (IS) and its adaptive extensions, e.g., Adaptive Multiple IS (AMIS) and Population Monte Carlo (PMC). In this work, we introduce an adaptive and iterated importance sampler using a population of proposal densities. The novel algorithm, called Adaptive Population Importance Sampling (APIS), provides a global estimation of the variables of interest iteratively, using all the samples generated. APIS combines different convenient features of the AMIS and PMC schemes. Furthermore, APIS simultaneously uses both simple and more sophisticated approaches (such as the deterministic mixture) to build the IS estimators. The cloud of proposals is adapted by learning from a subset of previously generated samples, in such a way that local features of the target density can be better taken into account compared to single global adaptation procedures. Numerical results show the advantages of the proposed sampling scheme in terms of mean square error. The resulting algorithm is also more robust in terms of sensitivity to the initial choice of the parameters, compared with other techniques such as AMIS and PMC.

**Category:** Statistics

[66] **viXra:1405.0263 [pdf]**
*submitted on 2014-05-18 08:40:10*

**Authors:** L. Martino, H. Yang, D. Luengo, J. Kanniainen, J. Corander

**Comments:** 19 Pages.

Gibbs sampling is a well-known Markov Chain Monte Carlo (MCMC) technique, widely applied to draw samples from multivariate target distributions which appear often in many different fields (machine learning, finance, signal processing, etc.). The application of the Gibbs sampler requires being able to draw efficiently from the univariate full-conditional distributions. In this work, we present a simple, self-tuned and extremely efficient MCMC algorithm that produces virtually independent samples from the target. The proposal density used is self-tuned to the specific target but it is not adaptive. Instead, the proposal is adjusted during the initialization stage following a simple procedure.
As a consequence, there is no "fuss" about convergence or tuning, and the execution of the algorithm is remarkably sped up. Although it can be used as a stand-alone algorithm to sample from a generic univariate distribution, the proposed approach is particularly suited for use within a Gibbs sampler, especially when sampling from spiky multi-modal distributions. Hence, we call it FUSS (Fast Universal Self-tuned Sampler). Numerical experiments on several synthetic and real data sets show its good performance in terms of speed and estimation accuracy.
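
The role of the univariate full conditionals can be seen in a toy Gibbs sampler where they happen to be Gaussian and can be drawn directly. This is an assumption-laden sketch: the bivariate Gaussian target and the names are illustrative, and FUSS itself is not implemented. In the setting of the paper, a univariate sampler such as FUSS would stand in for the two direct draws when the conditionals cannot be sampled exactly.

```python
import math
import random


def gibbs_bivariate_normal(rho, n_samples, rng, burn_in=500):
    """Gibbs sampler for a zero-mean, unit-variance bivariate Gaussian.

    Both full conditionals are univariate Gaussians:
        x | y ~ N(rho * y, 1 - rho^2),   y | x ~ N(rho * x, 1 - rho^2).
    """
    cond_std = math.sqrt(1.0 - rho * rho)
    x, y = 0.0, 0.0
    samples = []
    for t in range(burn_in + n_samples):
        x = rng.gauss(rho * y, cond_std)   # draw from p(x | y)
        y = rng.gauss(rho * x, cond_std)   # draw from p(y | x)
        if t >= burn_in:
            samples.append((x, y))
    return samples


rng = random.Random(1)
draws = gibbs_bivariate_normal(rho=0.6, n_samples=20000, rng=rng)
mean_xy = sum(x * y for x, y in draws) / len(draws)  # estimates E[xy] = rho
```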

**Category:** Statistics

[65] **viXra:1404.0124 [pdf]**
*submitted on 2014-04-14 20:12:53*

**Authors:** Stefan Koester

**Comments:** 6 Pages.

The Koester Equation, and all of its processes, quantify the "loss in progress" experienced in a data set when it undergoes an abnormality, such as a missing day in testing. This loss in progress can also be viewed as a number determining by how much that data set is skewed by an abnormality. For example, if a person were to take three of the same tests for three days in a row, an obvious positive curve in their results would be apparent. If, on the fourth day, a break was taken and no testing occurred, the results after would not be the same as if the person had just continued. This is usually known as the loss in progress, and can now be quantified using The Koester Equation.

**Category:** Statistics

[64] **viXra:1404.0082 [pdf]**
*submitted on 2014-04-10 20:59:23*

**Authors:** Florentin Smarandache

**Comments:** 123 Pages.

Neutrosophic Statistics means statistical analysis of population or sample that has indeterminate (imprecise, ambiguous, vague, incomplete, unknown) data. For example, the population or sample size might not be exactly determinate because of some individuals that partially belong to the population or sample, and partially they do not belong, or individuals whose appurtenance is completely unknown. Also, there are population or sample individuals whose data could be indeterminate.
In this book, we develop the 1995 notion of neutrosophic statistics. We present various practical examples. It is possible to define the neutrosophic statistics in many ways, because there are various types of indeterminacies, depending on the problem to solve.

**Category:** Statistics

[63] **viXra:1403.0975 [pdf]**
*submitted on 2014-03-31 11:13:23*

**Authors:** editors Rajesh Singh, Florentin Smarandache

**Comments:** 71 Pages.

The purpose of writing this book is to suggest some improved estimators
using auxiliary information in sampling schemes like simple random sampling,
systematic sampling and stratified random sampling.
This volume is a collection of five papers, written by nine co-authors
(listed in the order of the papers): Rajesh Singh, Mukesh Kumar, Manoj Kr.
Chaudhary, Cem Kadilar, Prayas Sharma, Florentin Smarandache, Anil
Prajapati, Hemant Verma, and Viplav Kr. Singh.
In the first paper a dual to ratio-cum-product estimator is suggested and its
properties are studied. In the second paper an exponential ratio-product type
estimator in stratified random sampling is proposed and its properties are
studied under second-order approximation. In the third paper some estimators are
proposed in two-phase sampling and their properties are studied in the
presence of non-response.
In the fourth paper a family of median-based estimators is proposed in
simple random sampling. In the fifth paper some difference-type estimators are
suggested in simple random sampling and stratified random sampling, and their
properties are studied in the presence of measurement error.

**Category:** Statistics

[62] **viXra:1403.0948 [pdf]**
*submitted on 2014-03-27 12:42:36*

**Authors:** Nigel B. Cook

**Comments:** 1 Page.

The occurrence of pi in formulae apparently unrelated to geometry was used by Eugene Wigner in his 1960 paper *The unreasonable effectiveness of mathematics in the natural sciences*. Wigner's example is the Gaussian/normal distribution law, which is an example of obfuscation. Laplace (1782), Gauss (1809), Maxwell (1860) and Fisher (1915) wrote the normal exponential distribution with the square root of pi in the normalization outside the integral. But Stigler in 1982 rewrote the equation with pi in the exponent, making the formula look less mysterious because the exponent is then the area of a circle (in other words, Poisson's exponential distribution, adapted to circular areas, with areas expressed in dimensionless form), as becomes clear if you think of the use of the normal distribution to model CEP error probabilities for missiles landing around a target point. (Please see paper for equations.)
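
As a worked illustration of the two ways of writing the density, the usual form and the form with pi in the exponent are related by a choice of variance. The specific normalization below, with variance 1/(2π), is an assumption about which rewritten form the abstract has in mind; the paper's own equations may differ in notation.

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}}
       \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),
\qquad
\sigma^2 = \frac{1}{2\pi}
\;\Longrightarrow\;
f(x) = e^{-\pi (x-\mu)^2},
\qquad
\int_{-\infty}^{\infty} e^{-\pi x^2}\,dx = 1 .
```

With $\mu = 0$ the exponent $\pi x^2$ is the area of a circle of radius $x$, and no separate normalizing constant is needed.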

**Category:** Statistics

[61] **viXra:1403.0075 [pdf]**
*submitted on 2014-03-11 11:30:25*

**Authors:** Yuri Heymann

**Comments:** 6 Pages.

In the present study, Monte Carlo simulations show how a simple test applied to financial time-series data can discriminate among the lognormal random walk used in the Black-Scholes-Merton model, the Gaussian random walk used in the Ornstein-Uhlenbeck stochastic process, and the square-root random walk used in the Cox, Ingersoll and Ross process. Alpha-level hypothesis testing is provided. In conclusion, this test appears to be helpful for selecting the best stochastic processes for pricing contingent claims and for risk management.

**Category:** Statistics

[60] **viXra:1402.0127 [pdf]**
*submitted on 2014-02-19 09:38:22*

**Authors:** Maria Hablicsekne Richter

**Comments:** 13 Pages. Comments are welcome

One of the key questions in our lives is: how long will we live? Another is:
how long will we receive or enjoy our pensions? In this analysis I focus on the mortality of beneficiaries in receipt of old-age pensions and disability pensions in Hungary. My main objective is to demonstrate that the mortality of beneficiaries receiving different types of benefits may be significantly different from the mortality of the population. On the basis of the life tables presented, I show the graduated probability of death corresponding to different ages, benefits and genders, and also the expected number of future years at the given ages.
Considering all these, I compare the mortality of beneficiaries receiving different types of benefits with the mortality of the population.

**Category:** Statistics

[59] **viXra:1310.0183 [pdf]**
*submitted on 2013-10-21 12:06:10*

**Authors:** Sergio Arciniegas-Alarcón, Marisol García-Peña, Wojtek Janusz Krzanowski, Carlos Tadeu dos Santos Dias

**Comments:** 17 Pages.

This paper proposes five new imputation methods for unbalanced experiments with genotype-by-environment interaction. The methods use cross-validation by eigenvector, based on an iterative scheme with the singular value decomposition (SVD) of a matrix. To test the methods, we performed a simulation study using three complete matrices of real data, obtained from interaction trials of peas, cotton, and beans, and introducing lack of balance by randomly deleting in turn 10%, 20%, and 40% of the values in each matrix. The quality of the imputations was evaluated with the additive main effects and multiplicative interaction model (AMMI), using the root mean squared predictive difference (RMSPD) between the genotypes and environmental parameters of the original data set and the set completed by imputation. The proposed methodology does not make any distributional or structural assumptions and does not have any restrictions regarding the pattern or mechanism of missing values.

**Category:** Statistics

[58] **viXra:1310.0024 [pdf]**
*submitted on 2013-10-05 03:35:05*

**Authors:** Nehul Yadav

**Comments:** 8 Pages.

This research focuses primarily on the statistics and the famous mathematical models used in ecology and evolution. I chose a unique topic in applied mathematics as I aspire to become a mathematics researcher. I hope you like this research.

**Category:** Statistics

[57] **viXra:1307.0123 [pdf]**
*submitted on 2013-07-23 18:56:39*

**Authors:** editors Rajesh Singh, Florentin Smarandache

**Comments:** 64 Pages.

The purpose of writing this book is to suggest some improved estimators using auxiliary information in sampling schemes like simple random sampling and systematic sampling.
This volume is a collection of five papers, written by eight coauthors (listed in the order of the papers): Manoj K. Chaudhary, Sachin Malik, Rajesh Singh, Florentin Smarandache, Hemant Verma, Prayas Sharma, Olufadi Yunusa, and Viplav Kumar Singh, from India, Nigeria, and USA.
The following problems have been discussed in the book:
In chapter one an estimator in systematic sampling using auxiliary information is studied in the presence of non-response. In the second chapter some improved estimators are suggested using auxiliary information. In the third chapter some improved ratio-type estimators are suggested and their properties are studied under second order of approximation.
In chapters four and five some estimators are proposed for estimating unknown population parameter(s) and their properties are studied.
This book will be helpful for the researchers and students who are working in the field of finite population estimation.

**Category:** Statistics

[56] **viXra:1306.0064 [pdf]**
*submitted on 2013-06-10 23:24:51*

**Authors:** Rajesh Singh, Mukesh Kumar, Pankaj Chauhan, Nirmala Sawan, Florentin Smarandache

**Comments:** 8 Pages.

This paper presents a family of dual to ratio-cum-product estimators for the finite
population mean. Under the simple random sampling without replacement
(SRSWOR) scheme, expressions for the bias and mean-squared error (MSE) up to
the first order of approximation are derived. We show that the proposed family is
more efficient than the usual unbiased estimator, the ratio estimator, the product estimator,
and the Singh (1967), Srivenkataramana (1980), Bandyopadhyaya (1980)
and Singh et al. (2005) estimators. An empirical study is carried out to
illustrate the performance of the constructed estimator relative to the others.

**Category:** Statistics

[55] **viXra:1306.0021 [pdf]**
*submitted on 2013-06-05 07:25:19*

**Authors:** Sabiou Inoua

**Comments:** 2 Pages.

This short paper establishes one more formula for the variance. Consider a random variable $X$ whose possible values are $x_1, \dots, x_n$, occurring with probabilities $p_1, \dots, p_n$, respectively. Pick two of these possible values
successively (each $x_i$ having the probability $p_i$ of being chosen). Compute the difference between the two chosen
values. Square the difference. Claim: you are expected to get (twice) the variance of $X$. This formula makes the variance appear an even more
natural measure of dispersion than is usually thought.
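
The claim follows from expanding the square for two independent draws with the same distribution: E[(X1 - X2)^2] = E[X1^2] - 2 E[X1] E[X2] + E[X2^2] = 2 (E[X^2] - E[X]^2) = 2 Var(X). A small exact check is sketched below, assuming the two picks are independent; the three-point distribution and the function names are made up for illustration.

```python
from fractions import Fraction


def variance(values, probs):
    """Var(X) = sum_i p_i (x_i - E[X])^2 for a finite discrete distribution."""
    mean = sum(p * x for x, p in zip(values, probs))
    return sum(p * (x - mean) ** 2 for x, p in zip(values, probs))


def expected_squared_difference(values, probs):
    """E[(X1 - X2)^2] for two independent draws from the same distribution."""
    return sum(p_i * p_j * (x_i - x_j) ** 2
               for x_i, p_i in zip(values, probs)
               for x_j, p_j in zip(values, probs))


# Hypothetical three-point distribution, kept exact with rational arithmetic.
values = [Fraction(1), Fraction(2), Fraction(6)]
probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
lhs = expected_squared_difference(values, probs)
rhs = 2 * variance(values, probs)
```

Here both sides equal 113/18, so halving the expected squared difference of two draws gives the variance without ever computing the mean.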

**Category:** Statistics

[54] **viXra:1304.0143 [pdf]**
*submitted on 2013-04-25 11:24:37*

**Authors:** Zhang Huiming

**Comments:** 6 Pages. In Chinese

In this paper, using three kinds of ideas from probability theory, we prove by the method of mathematical analysis the equivalence of three kinds of probability expressions in the problem of rational division of stakes. In addition, the different ideas of probability theory yield the same identity. Treating one of the probability expressions as a function, we find that the B-function is closely related to the derivative of that function. Using the Beta distribution function, we prove that the probability expression in the problem of rational division of stakes is equal to the distribution function of the Beta distribution.
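
The kind of equivalence described here can be checked exactly for small cases. The abstract does not say which three expressions the paper uses, so the three below (a binomial tail, a negative binomial sum and the Beta distribution function) are the standard ones and are an assumption. Player A needs `a` more points, player B needs `b`, and A wins each round with probability `p`.

```python
from fractions import Fraction
from math import comb, factorial


def win_prob_binomial(a, b, p):
    """P(A wins at least a of the next a + b - 1 rounds)."""
    n = a + b - 1
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(a, n + 1))


def win_prob_negative_binomial(a, b, p):
    """P(A's a-th win arrives before B has won b rounds)."""
    return sum(comb(a - 1 + j, j) * p ** a * (1 - p) ** j for j in range(b))


def beta_cdf_integer(a, b, p):
    """Beta(a, b) distribution function at p for positive integers a, b.

    Integrates t^(a-1) (1-t)^(b-1) exactly by expanding (1-t)^(b-1).
    """
    integral = sum(comb(b - 1, k) * (-1) ** k * p ** (a + k) / (a + k)
                   for k in range(b))
    beta_ab = Fraction(factorial(a - 1) * factorial(b - 1),
                       factorial(a + b - 1))
    return integral / beta_ab


half = Fraction(1, 2)
results = [f(3, 2, half) for f in (win_prob_binomial,
                                   win_prob_negative_binomial,
                                   beta_cdf_integer)]
```

For a fair game with A needing three points and B needing two, all three expressions give 5/16, so the stakes would be divided 5 to 11.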

**Category:** Statistics

[53] **viXra:1304.0055 [pdf]**
*submitted on 2013-04-11 17:24:10*

**Authors:** Li Charlie Xia

**Comments:** 56 Pages.

Local association analysis, such as local similarity analysis and local shape analysis, of biological time series data helps elucidate the varying dynamics of biological systems. However, their applications to large scale high-throughput data are limited by slow permutation procedures for statistical significance evaluation. We developed a theoretical approach to approximate the statistical significance of local similarity and local shape analysis based on the approximate tail distribution of the maximum partial sum of independent identically distributed (i.i.d.) and Markovian random variables. Simulations show that the derived formula approximates the tail distribution reasonably well (starting at time points > 10 with no delay and > 20 with delay) and provides p-values comparable to those from permutations. The new approach enables efficient calculation of statistical significance for pairwise local association analysis, making possible all-to-all association studies otherwise prohibitive. As a demonstration, local association analysis of human microbiome time series shows that core OTUs are highly synergetic and some of the associations are body-site specific across samples. The new approach is implemented in our eLSA package, which now provides pipelines for faster local similarity and shape analysis of time series data. The tool is freely available from eLSA's website:
http://meta.usc.edu/softs/lsa.

**Category:** Statistics

[52] **viXra:1304.0054 [pdf]**
*submitted on 2013-04-11 17:27:00*

**Authors:** Li Charlie Xia

**Comments:** 127 Pages.

Recent developments in experimental molecular techniques, such as microarray and next generation sequencing technologies, have led molecular biology into a high-throughput era with emergent omics research areas, including metagenomics and transcriptomics. The massive omics datasets that have been and are being generated by experimental laboratories pose new challenges to computational biologists to develop fast and accurate quantitative analysis tools. We have developed two statistical and algorithmic methods, GRAMMy and eLSA, for metagenomics and microbial community time series analysis. GRAMMy provides a unified probabilistic framework for shotgun metagenomics, in which the maximum likelihood method is employed to accurately compute Genome Relative Abundance of microbial communities using the Mixture Model theory (GRAMMy). We extended the Local Similarity Analysis technique (eLSA) to time series data with replicates, capturing statistically significant local and potentially time-delayed associations. Both methods are validated through simulation studies and their capability to reveal new biology is also demonstrated through applications to real datasets. We implemented GRAMMy and eLSA as C++ extensions to Python, with both superior computational efficiency and easy-to-integrate programming interfaces. GRAMMy and eLSA methods will be increasingly useful tools as new omics research accelerates its pace. http://meta.usc.edu/softs/lsa.

**Category:** Statistics

[51] **viXra:1301.0113 [pdf]**
*submitted on 2013-01-18 18:27:46*

**Authors:** Sergio Arciniegas-Alarcón, Carlos Tadeu dos Santos Dias

**Comments:** 14 Pages. In Portuguese

A common problem in multienvironment trials is missing genotype-environment combinations. Recently, Bergamo proposed a distribution-free multiple imputation method for the interaction matrix. The purpose of this paper is to evaluate the new development and compare it with methodologies that have been successful in genotype-environment trials with missing data, such as alternating least squares (ALS) and robust estimates, using the Additive Main effects and Multiplicative Interaction (AMMI) models. A simulation study based on real data was carried out by deleting values at random in different percentages, imputing the observations and comparing the methodologies through three criteria: the square root of the mean predictive difference, the Procrustes statistic and Spearman's rank correlation coefficient. It was concluded that multiple imputation is not better than imputation based on an additive model without interaction, and that the best results for the variance are obtained with robust sub-models. All the methods considered in this study show a high correlation between the true and the imputed missing values.

**Category:** Statistics

[50] **viXra:1301.0031 [pdf]**
*submitted on 2013-01-06 05:54:47*

**Authors:** Dimiter Tsvetkov, Lyubomir Hristov, Ralitsa Angelova-Slavova

**Comments:** 14 Pages.

In this paper we consider Metropolis-Hastings Markov chains whose target and proposal distributions are absolutely continuous with respect to Lebesgue measure.
We show that under some very general conditions the sequence of the powers of the conjugate transition operator has a strong limit in a properly defined Hilbert space
described for example in Stroock (2005).
Then we propose conditions under which the sequence of the successive densities of such a chain converges to the
target density in total variation distance for any choice of the initial density.
In particular we prove that positivity of the target and the proposal densities is enough for the chain to
converge.
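
For readers unfamiliar with the chain being analysed, here is a minimal random-walk Metropolis-Hastings sketch. It is an illustration only, not the authors' construction: the standard normal target, the Gaussian proposal, the step size and the deliberately poor starting point are all assumptions chosen so that both densities are positive everywhere.

```python
import random


def metropolis_hastings(log_target, x0, n_steps, step=1.0, rng=None):
    """Random-walk Metropolis-Hastings with a Gaussian proposal.

    log_target returns the log of the (possibly unnormalised) target density.
    """
    import math

    rng = rng or random.Random()
    x, log_px = x0, log_target(x0)
    chain = []
    for _ in range(n_steps):
        y = x + rng.gauss(0.0, step)  # symmetric proposal, so q(x, y) = q(y, x)
        log_py = log_target(y)
        # Accept with probability min(1, pi(y) / pi(x)); the proposal ratio cancels.
        u = 1.0 - rng.random()  # uniform on (0, 1], keeps log(u) finite
        if math.log(u) < log_py - log_px:
            x, log_px = y, log_py
        chain.append(x)
    return chain


# Standard normal target (up to a constant), started far from the mode.
samples = metropolis_hastings(
    lambda x: -0.5 * x * x, x0=10.0, n_steps=20000, step=2.0, rng=random.Random(0)
)
kept = samples[2000:]  # discard an initial burn-in stretch
print(sum(kept) / len(kept))
```

After the burn-in the sample mean and variance should be near 0 and 1 regardless of the starting point, which is the practical face of the convergence result stated in the abstract.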

**Category:** Statistics

[49] **viXra:1212.0008 [pdf]**
*submitted on 2012-12-02 07:12:32*

**Authors:** Xianzhao Zhong

**Comments:** 10 Pages.

For the free electromagnetic field, there are two kinds of wave equation: one is the Maxwell
wave equation, the other is a generalized wave equation. In this paper, the author uses a matrix transformation to bring the general quadratic form into diagonal form. Both forms of the wave equation can then be obtained.
One is the Maxwell wave equation, the other is the second form of the wave equation. In the latter half of the paper the author establishes two further vibrator differential equations.

**Category:** Statistics

[48] **viXra:1211.0132 [pdf]**
*submitted on 2012-11-22 08:37:01*

**Authors:** Sergio Arciniegas-Alarcón, Marisol García-Peña, Carlos Tadeu dos Santos Dias

**Comments:** 7 Pages. Paper in Portuguese

The aim of this work was the study of prediction errors associated with four imputation methods applied to solve the problem of unbalance in experiments with genotype×environment (G×E) interaction. A simulation study was carried out based on four complete matrices of real data obtained in G×E interaction trials of pea, cotton, beans and eucalyptus, respectively. The simulation of unbalance was done with random withdrawal of 10, 20 and 40% in each matrix. The prediction errors were found using cross-validation and were tested in classic intervals of 95% for missing data. For data imputation, algorithms were considered using models of additive effects without interaction and model estimates of additive effects with multiplicative interaction based on robust submodels. In general, the best prediction errors were obtained after imputation through an additive model without interaction.

**Category:** Statistics

[47] **viXra:1211.0131 [pdf]**
*submitted on 2012-11-22 08:15:53*

**Authors:** Sergio Arciniegas-Alarcón, Carlos Tadeu dos Santos Dias

**Comments:** 7 Pages. Paper in Portuguese

The objective of this work was to evaluate the convenience of defining the number of multiplicative components of additive main effect and multiplicative interaction models (AMMI) in genotype x environment interaction experiments in cotton with imputed or unbalanced data. A simulation study was carried out based on a matrix of real seed-cotton productivity data obtained in trials with genotype x environment interaction carried out with 15 genotypes at 27 locations in Brazil. The simulation was made with random withdrawals of 10, 20 and 30% of the data. The optimal number of multiplicative components for the AMMI model was determined using the Cornelius test and the likelihood ratio test on the matrix completed by imputation. A correction based on the missing data in the Cornelius procedure was proposed for testing the hypothesis when the analysis is made from averages and the repetitions are not available. For data imputation, the methods considered used robust submodels, alternating least squares and multiple imputation. For analysis of unbalanced experiments, it is advisable to choose the number of multiplicative components of the AMMI model only from the observed information and to make the classical estimation of parameters based on the matrices completed by imputation.

**Category:** Statistics

[46] **viXra:1211.0129 [pdf]**
*submitted on 2012-11-21 13:18:47*

**Authors:** Shyam S Chandramouli

**Comments:** 10 Pages.

Many decision-making problems that arise in Finance, Economics, Inventory etc. can be formulated as Markov Decision Problems (MDPs) and solved using Dynamic Programming techniques. Further, the need to mitigate statistical errors in estimating the underlying transition matrix, or to exercise optimal control under an adversarial setup, led to the study of robust formulations of the same problems in Ghaoui and Nilim and in Iyengar. In this work, we study computational methodologies to develop and validate feasible control policies for the Robust Dynamic Programming Problem. In terms of developing control policies, the current work can be seen as generalizing the existing literature on Approximate Dynamic Programming (ADP) to its robust counterpart. The work also generalizes the Information Relaxation and Dual approach of Brown, Smith and Sun to robust multi-period problems. While discussing this framework we approach it both from a discrete control perspective and also as a set of conditional continuous measures as in Ghaoui and Nilim and in Iyengar. We show numerical experiments on applications like ... In a nutshell, we expand the gamut of problems that the dual approach can handle in terms of developing tight bounds on the value function.
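
As background, the robust dynamic programming problem that the abstract builds on replaces the expectation in the Bellman update by a worst case over an uncertainty set of transition distributions. The sketch below is a plain robust value iteration on a toy problem, not the paper's approximate or dual method; the finite uncertainty sets, the rewards and the discount factor are invented for illustration.

```python
def robust_value_iteration(rewards, uncertainty, gamma=0.9, tol=1e-10, max_iter=10_000):
    """Worst-case value iteration.

    rewards[s][a]     : immediate reward for action a in state s
    uncertainty[s][a] : list of candidate next-state distributions,
                        each a list of probabilities over the states
    """
    n = len(rewards)
    v = [0.0] * n
    for _ in range(max_iter):
        new_v = []
        for s in range(n):
            action_values = []
            for a in range(len(rewards[s])):
                # Nature picks the distribution that is worst for the controller.
                worst = min(
                    sum(p * vj for p, vj in zip(dist, v)) for dist in uncertainty[s][a]
                )
                action_values.append(rewards[s][a] + gamma * worst)
            new_v.append(max(action_values))
        if max(abs(x - y) for x, y in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v


# Hypothetical two-state, two-action problem with two candidate models per pair.
rewards = [[1.0, 0.0], [0.0, 2.0]]
uncertainty = [
    [[[0.9, 0.1], [0.7, 0.3]], [[0.2, 0.8], [0.4, 0.6]]],
    [[[0.5, 0.5], [0.8, 0.2]], [[0.1, 0.9], [0.3, 0.7]]],
]
# Nominal problem: keep only the first candidate distribution everywhere.
nominal = [[[dists[0]] for dists in per_state] for per_state in uncertainty]

robust_values = robust_value_iteration(rewards, uncertainty)
nominal_values = robust_value_iteration(rewards, nominal)
print(robust_values, nominal_values)
```

The robust values can never exceed the nominal ones, since the adversary is always free to choose the nominal model.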

**Category:** Statistics

[45] **viXra:1211.0127 [pdf]**
*submitted on 2012-11-21 10:29:40*

**Authors:** Shyam S Chandramouli

**Comments:** 22 Pages.

In this work, we generalize the recent Pathwise Optimization approach of Desai et al. to Multiple stopping problems.
The approach also minimizes the dual bound as in Desai et al. to find the best approximation architecture for the Multiple
stopping problem. Though we establish the convexity of the dual operator in this setting as well, we cannot directly take advantage of this property
because of the computational issues that arise due to the combinatorial nature of the problem. Hence, we deviate from the pure martingale dual approach
to the *marginal* dual approach of Meinshausen and Hambly and solve each such optimal stopping problem in the framework of
Desai et al. Though this Pathwise Optimization approach as generalized to the Multiple stopping problem is computationally
intensive, we highlight that it can produce superior dual and primal bounds in certain settings.

**Category:** Statistics

[44] **viXra:1211.0113 [pdf]**
*submitted on 2012-11-19 13:56:24*

**Authors:** Stephen Crowley

**Comments:** 2 Pages.

Maximum likelihood estimation of the negative binomial distribution via numerical
methods is discussed.
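
One way such a numerical fit can be set up is sketched below. This is an assumption-laden illustration rather than the paper's procedure: it uses the parameterisation P(X = k) = Γ(k + r) / (Γ(r) k!) · p^r (1 − p)^k, profiles out `p` (for fixed `r` the maximiser is p = r / (r + mean)), and maximises over `r` by golden-section search. It assumes the data are overdispersed (sample variance above the mean), otherwise no finite maximiser in `r` exists.

```python
from math import exp, lgamma, log, sqrt


def nb_loglik(r, p, data):
    """Negative binomial log-likelihood for counts in `data`."""
    return sum(
        lgamma(k + r) - lgamma(r) - lgamma(k + 1) + r * log(p) + k * log(1 - p)
        for k in data
    )


def nb_mle(data, lo=1e-3, hi=1e3, iters=200):
    """Maximise the profile likelihood over r (on a log scale) by golden-section search."""
    mean = sum(data) / len(data)

    def profile(log_r):
        r = exp(log_r)
        return nb_loglik(r, r / (r + mean), data)

    a, b = log(lo), log(hi)
    inv_phi = (sqrt(5) - 1) / 2
    for _ in range(iters):
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if profile(c) > profile(d):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
    r = exp((a + b) / 2)
    return r, r / (r + mean)


counts = [0, 0, 1, 1, 2, 3, 3, 5, 8, 13]  # made-up overdispersed counts
r_hat, p_hat = nb_mle(counts)
print(r_hat, p_hat)
```

A Newton iteration on the score equation for `r` is the more common choice in practice; the bracketed search is used here only because it needs no derivatives.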

**Category:** Statistics

[43] **viXra:1211.0094 [pdf]**
*submitted on 2012-11-16 15:47:51*

**Authors:** Stephen Crowley

**Comments:** 6 Pages.

Definitions from the theory of point processes are recalled. Models of intensity function parameterization and maximum likelihood estimation from data are explored. Closed-form log-likelihood expressions are given for the Hawkes process, Autoregressive Conditional Duration (ACD), and Log-ACD models. The Autoregressive Conditional Intensity model is also discussed.
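
To give a flavour of what a closed-form log-likelihood looks like, the sketch below evaluates it for a Hawkes process with a single exponential kernel, λ(t) = μ + α Σ_{t_i < t} exp(−β (t − t_i)), observed on [0, T]. The kernel choice and parameter names are assumptions for illustration and need not match the paper's parameterisation.

```python
from math import exp, log


def hawkes_loglik(times, T, mu, alpha, beta):
    """Log-likelihood of sorted event times on [0, T] under an exponential-kernel
    Hawkes process, using the O(n) recursion for the excitation term."""
    loglik = -mu * T  # baseline part of the compensator
    a = 0.0           # A_i = sum over j < i of exp(-beta * (t_i - t_j))
    prev = None
    for t in times:
        if prev is not None:
            a = exp(-beta * (t - prev)) * (1.0 + a)
        loglik += log(mu + alpha * a)                            # log intensity at t
        loglik -= (alpha / beta) * (1.0 - exp(-beta * (T - t)))  # kernel mass up to T
        prev = t
    return loglik


events = [0.5, 1.1, 1.3, 2.7, 4.0]  # made-up event times
print(hawkes_loglik(events, T=5.0, mu=0.4, alpha=0.6, beta=1.5))
```

The recursion avoids the double sum over event pairs, which is what makes maximum likelihood estimation practical on long event histories.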

**Category:** Statistics

[42] **viXra:1210.0065 [pdf]**
*submitted on 2012-10-12 11:13:21*

**Authors:** Sergio Arciniegas-Alarcón, Marisol García-Peña, Carlos Tadeu dos Santos Dias, Wojtek Janusz Krzanowski

**Comments:** 14 Pages.

A common problem in multi-environment trials arises when some genotype-by-environment combinations are missing. The aim of this paper is to propose a new deterministic imputation algorithm using a modification of the Gabriel cross-validation method. The method involves the singular value decomposition (SVD) of a matrix and was tested using three alternative component choices of the SVD in simulations based on two complete sets of real data, with values deleted randomly at different rates. The quality of the imputations was evaluated using the correlations and the mean square deviations between these estimates and the true observed values. The proposed methodology does not make any distributional or structural assumptions and does not have any restrictions regarding the pattern or mechanism of the missing data.

**Category:** Statistics

[41] **viXra:1208.0053 [pdf]**
*submitted on 2012-08-12 10:38:58*

**Authors:** Kiyoharu Tanaka, Evgeniy Grechnikov

**Comments:** 5 Pages.

This paper explains the Bayesian version of estimation as a method for calculating the credibility
premium or credibility number of claims for short-term insurance contracts using two ingredients: past data on the risk itself and collateral data from other sources considered to be relevant. The Poisson/gamma model to estimate the claim frequency for a portfolio of policies and the Normal/normal model to estimate the pure premium are explained and applied.
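
The Poisson/gamma case reduces to a simple weighted average, sketched below. The sketch assumes a Gamma prior with shape `alpha` and rate `beta` on the claim frequency and one Poisson count per period; the numbers are invented.

```python
def poisson_gamma_credibility(claims, alpha, beta):
    """Posterior mean claim frequency written in credibility form.

    Returns (estimate, z), where z is the weight given to the risk's own data.
    """
    n = len(claims)
    z = n / (n + beta)                # credibility factor
    sample_mean = sum(claims) / n     # evidence from the risk itself
    prior_mean = alpha / beta         # evidence from collateral data
    return z * sample_mean + (1 - z) * prior_mean, z


estimate, z = poisson_gamma_credibility([2, 0, 1, 3, 1], alpha=3.0, beta=2.0)
print(estimate, z)
```

The estimate coincides with the posterior mean (alpha + total claims) / (beta + n), and `z` moves towards 1 as more periods of data accumulate.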

**Category:** Statistics

[40] **viXra:1205.0104 [pdf]**
*submitted on 2012-05-28 01:49:47*

**Authors:** David D. Tung

**Comments:** 30 Pages.

In this paper, we use statistical data mining techniques to analyze a multivariate data set of career batting performances in Major League Baseball. Principal components analysis (PCA) is used to transform the high-dimensional data to its lower-dimensional principal components, which retain a high percentage of the sample variation, hence reducing the dimensionality of the data. From PCA, we determine a few important key factors of classical and sabermetric batting statistics, and the most important of these is a new measure, which we call Offensive Player Grade (OPG), that efficiently summarizes a player’s offensive performance on a numerical scale. The determination of these lower-dimensional principal components allows for accessible
visualization of the data, and for segmentation of players into groups using clustering, which is done here using the K-means clustering algorithm. We provide illuminating visual displays from our statistical data mining procedures, and we also furnish a player listing of the top 100 OPG scores which should be of interest to those that follow baseball.

**Category:** Statistics

[39] **viXra:1205.0093 [pdf]**
*submitted on 2012-05-24 02:48:37*

**Authors:** W. B. Vasantha Kandasamy, Florentin Smarandache, A. Praveen Prakash

**Comments:** 165 Pages.

In this book the authors analyze the socio-economic and
psychological problems faced by People with Disabilities
(PWDs) and their families. The study was made by collecting
data with a fuzzy linguistic questionnaire, or by interview
where respondents were not literate, from 2,15,811 lakhs people. The data
were collected through five Non Government Organizations
(NGOs) from northern Tamil Nadu.

**Category:** Statistics

[38] **viXra:1204.0100 [pdf]**
*submitted on 2012-04-28 18:43:37*

**Authors:** Glen Gilchrist

**Comments:** 10 Pages.

Obtaining additional computational speed from a central processing unit by driving a computer system at clock frequencies higher than the default settings has long been used as a method to inexpensively “boost” the performance of a computer. With the emergence of so-called smart-phones and the openness of the Android operating system, such tweaks have recently been applied to mobile handsets. This paper investigates the performance gains to an off-the-shelf handset with the custom Skatie Rom (C3C0, 2012) by means of a statistically designed experiment. A Taguchi Orthogonal Array was used to investigate the effect of 5 factors, each at 3 levels, on performance as measured with the Aurora Softworks Quadrant application. Unsurprisingly the core CPU speed had the largest effect on overall performance, but we demonstrate that the CPU Governor (lagfree) and the I/O Scheduler (noop) were also significant at p=0.000, whilst the size of the SD Card Cache is significant to p=0.065.

**Category:** Statistics

[37] **viXra:1204.0077 [pdf]**
*submitted on 2012-04-18 09:37:00*

**Authors:** Rajesh Singh, Sachin Malik, A. A. Adewara, Florentin Smarandache

**Comments:** 8 Pages.

In the present study, we propose estimators based on geometric and harmonic mean
for estimating population mean using information on two auxiliary attributes in simple
random sampling. We have shown that, when we have multi-auxiliary attributes, estimators
based on geometric mean and harmonic mean are less biased than Olkin (1958), Naik and
Gupta (1996) and Singh (1967) type estimators under certain conditions. However, the MSEs
of the Olkin (1958) estimator and the geometric and harmonic estimators are the same up to the first
order of approximation.

**Category:** Statistics

[36] **viXra:1203.0081 [pdf]**
*submitted on 2012-03-21 19:54:13*

**Authors:** D.S. Dihalu, B. Geelhoed

**Comments:** 7 Pages.

One of the most used theories for the sampling of materials for physical, chemical or biological testing is the theory developed by Pierre Gy. After a number of scientific publications, including several in the French language (e.g. Gy, 1953, 1964, 1975), he made –in 1979– his entire new theory available to the worldwide sampling community in a book (Gy, 1979) written in English. This book contains a complete description of Gy’s sampling theory. Later, Gy has made several refinements, but the essential character of the theory has always remained the same as the theory described in his 1979 book.
The impact of this book (and the entire theory of Gy) has been significant; even nowadays this book is regarded as the number one source of sampling-related information for engineers and process operators. Even though the practical impact and the scientific value of this work are unquestionably strong, several critical points of discussion need to be mentioned here, because the development of new technologies, recent experimental results and novel insights show that parts of Gy’s theory need to be updated or revised.

**Category:** Statistics

[35] **viXra:1112.0092 [pdf]**
*submitted on 2011-12-31 06:41:31*

**Authors:** Leonardo Rubino

**Comments:** 13 Pages.

In this paper (for the moment in Italian only, sorry) you can find a rare proof of the Gauss Distribution Law, as almost all available books (in the writer's opinion) just state it without giving any derivation.
You can also find a description of the 3-sigma rule, widely used in technology.
Finally, a peculiarity of quantum mechanics is highlighted: requiring the equality sign in the Heisenberg uncertainty relations leads to a Gaussian wave function, which reduces the quantum uncertainty to its minimum value.
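
The 3-sigma rule mentioned above can be reproduced in a few lines. This is a generic illustration using the Gaussian error function, not material from the paper.

```python
from math import erf, sqrt


def prob_within(k: float) -> float:
    """P(|X - mu| < k * sigma) for a Gaussian random variable X."""
    return erf(k / sqrt(2))


for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))  # 0.6827, 0.9545, 0.9973
```

About 99.73% of the probability mass lies within three standard deviations of the mean, which is why values outside that band are commonly treated as exceptional.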

**Category:** Statistics

[34] **viXra:1111.0073 [pdf]**
*submitted on 21 Nov 2011*

**Authors:** Popon Kangpenkae

**Comments:** 12 pages

This technical reference presents the functional structure and the algorithmic implementation of the KL (Kullback-Leibler)
simplex. It details the simplex approximation and fusion. The KL simplex is a fundamental, robust, adaptive informatics
agent for computational research in economics, finance, games and mechanisms. From this perspective the study provides
comprehensive results to facilitate future work in such areas.

**Category:** Statistics

[33] **viXra:1107.0049 [pdf]**
*submitted on 24 Jul 2011*

**Authors:** Rajesh Singh, Florentin Smarandache

**Comments:** 72 pages

This book has been designed for students and researchers who are working in the field
of time series analysis and estimation in finite populations.
There are papers by Rajesh Singh, Florentin Smarandache, Shweta Maurya, Ashish K. Singh,
Manoj Kr. Chaudhary, V. K. Singh, Mukesh Kumar and Sachin Malik.
The first chapter deals with the problem of time series and the remaining four chapters deal with
problems of estimation in finite populations.

**Category:** Statistics

[32] **viXra:1105.0038 [pdf]**
*submitted on 25 May 2011*

**Authors:** T Suslo

**Comments:** 9 pages.

We constraint on computer the best linear unbiased generalized statistics of
random field for the best linear unbiased generalized statistics of
an unknown constant mean of random field and derive the numerical
generalized least-squares estimator of an unknown constant mean of
random field. We derive the third constraint of spatial statistics and show
that the classic generalized least-squares estimator of
an unknown constant mean of the field is only an asymptotic disjunction of
the numerical one.

**Category:** Statistics

[31] **viXra:1105.0026 [pdf]**
*submitted on 16 May 2011*

**Authors:** Elia Liitiäinen

**Comments:** 37 pages, Submitted to a journal

We study the moments E[d_{1,k}^{α}]
of the k-th nearest neighbor distance
for independent identically distributed points in R^{n}. In the earlier
literature, the case α > n has been analyzed by assuming a bounded support
for the underlying density. The boundedness assumption is removed by
assuming the multivariate Gaussian distribution. In this case, the nearest
neighbor distances show very different behavior in comparison to earlier
results. In the unbounded case, it is shown that E[d_{1,k}^{α}] is asymptotically
proportional to M^{-1} log^{n-1-α/2}M instead of M^{-α/n} as in the previous
literature.

**Category:** Statistics

[30] **viXra:1011.0070 [pdf]**
*submitted on 29 Nov 2010*

**Authors:** V.V. Singh, Alka Mittal, Neetish Sharma, Florentin Smarandache

**Comments:** 12 pages

Rajasthan is the biggest State of India and is currently in the second phase of demographic
transition, moving towards the third phase at a very slow
pace. However, the state's population will continue to grow for some time. Rajasthan's
performance in the social and economic sector has been poor in the past. The poor performance
is the outcome of poverty, illiteracy and poor development, which co-exist and reinforce each
other. There are many demographic and socio-economic factors responsible for population
growth. This paper attempts to identify the demographic and socio-economic variables which
are responsible for population growth in Rajasthan with the help of multivariate analysis.

**Category:** Statistics

[29] **viXra:1010.0054 [pdf]**
*submitted on 20 Mar 2010*

**Authors:** Manoj K. Chaudhary, Rajesh Singh, Mukesh Kumar, Rakesh K. Shukla, Florentin Smarandache

**Comments:** 11 pages

The objective of the present paper is to propose a family of separate-type
estimators of population mean in stratified random sampling in presence of nonresponse
based on the family of estimators proposed by Khoshnevisan et al.
(2007). Under simple random sampling without replacement (SRSWOR) the
expressions of bias and mean square error (MSE) up to the first order of
approximation are derived. The comparative study of the family with respect to
usual estimator has been discussed. The expressions for optimum sample sizes
of the strata in respect to cost of the survey have also been derived. An
empirical study is carried out to show the properties of the estimators.

**Category:** Statistics

[28] **viXra:1008.0044 [pdf]**
*submitted on 16 Aug 2010*

**Authors:** John Michael Williams

**Comments:** 47 Pages.

In common practice, degrees of freedom (df) may be corrected for the number of
theoretical free parameters as though parameters were the same as data categories.
However, a free physical parameter generally is not equivalent to a data category in
terms of goodness of the fit.
Here we use synthetic, nonrandom data to show the effect of choice of
categorization and df on goodness of fit. We then explain the origin of the df
problem and show how to avoid it in a three-step process:
First, the theoretical curve is fit to the data to remove its
variance, leaving what, under the null hypothesis, should be
structureless residuals.
Second, the residuals are fit by a set of orthogonal polynomials up
to the degree, should it occur, at which significant variance was
removed.
Third, the number of nonsignificant polynomial terms in the
original + orthogonal set becomes the df in a standard chi square
test.
This process reduces a general df problem to one of polynomial df and allows
goodness of a fit to be determined by data categorization and significance level
alone. An example is given of an evaluation of physical data on neutrino
oscillation.

**Category:** Statistics

[27] **viXra:1008.0034 [pdf]**
*submitted on 11 Aug 2010*

**Authors:** Jayant Singh, Hansraj Yadav, Florentin Smarandache

**Comments:** 8 pages

Migration plays an important role in the urbanization of a state. In general, the greater
the migration, the higher the urbanization rate; though this may not necessarily be
true in all situations, migration is generally seen to have a
fairly large share in urbanization. A district-level analysis for Rajasthan state
is attempted to understand urbanization due to migration, and their
interlinkages and association.

**Category:** Statistics

[26] **viXra:1008.0033 [pdf]**
*submitted on 11 Aug 2010*

**Authors:** Jayant Singh, Hansraj Yadav, Florentin Smarandache

**Comments:** 10 pages

People migrate over different distances and their migration is
governed by different reasons. The distance to the place of migration plays an
important role in the migration process, and an analysis based on the
remoteness of the origin and destination will reveal the push and pull factors
in a more explicit way. However, a common phenomenon is that people
migrate over a longer distance with a more focused objective, and their
propensity to settle in urban areas is always higher than in short-distance
migration.

**Category:** Statistics

[25] **viXra:1008.0020 [pdf]**
*submitted on 7 Aug 2010*

**Authors:** Manoj K. Chaudhary, Rajesh Singh, Rakesh K. Shukla, Mukesh Kumar, Florentin Smarandache

**Comments:** 8 pages

Khoshnevisan et al. (2007) proposed a general family of estimators for the population mean using
known values of some population parameters in simple random sampling. The objective of this
paper is to propose a family of combined-type estimators in stratified random sampling, adapting
the family of estimators proposed by Khoshnevisan et al. (2007) under non-response. The
properties of the proposed family are discussed. We have also obtained the expressions for
optimum sample sizes of the strata with respect to the cost of the survey. Results are also supported by
numerical analysis.

**Category:** Statistics

[24] **viXra:1007.0034 [pdf]**
*submitted on 23 Jul 2010*

**Authors:** David D. Tung, S. Rao Jammalamadaka

**Comments:** 15 pages.

In this paper, we propose a new test of uniformity on the circle based on
the Gini mean difference of the sample arc-lengths, i.e. the gaps between
successive observations on the circumference of the circle. These sample
arc-lengths are analogous to sample spacings, which are the gaps between
successive observations on the real line. Such a Gini mean difference test is
analogous to Rao's spacings test, which has been used to test the uniformity
of circular data.
We obtain both the exact and asymptotic distributions of the Gini mean
difference arc-lengths test, under the null hypothesis of circular uniformity.
We also provide a table of upper percentile values of the exact distribution
for small to moderate sample sizes. Some examples of circular data analysis
are also considered. It is also seen that the Gini mean difference arc-lengths
test is more asymptotically efficient than Rao's test in the sense of Pitman
asymptotic relative efficiency.
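
The statistic itself is easy to compute, as the sketch below shows. The normalisation used here (the average absolute difference over all pairs of arc-lengths) is an assumption; the paper may scale the statistic differently, and the critical values it tabulates are not reproduced.

```python
from math import pi


def arc_lengths(angles):
    """Gaps between successive observations on the circle (angles in radians).
    Requires at least two observations; the gaps always sum to 2 * pi."""
    pts = sorted(a % (2 * pi) for a in angles)
    gaps = [pts[i + 1] - pts[i] for i in range(len(pts) - 1)]
    gaps.append(2 * pi - (pts[-1] - pts[0]))  # wrap-around gap
    return gaps


def gini_mean_difference(values):
    """Average of |v_i - v_j| over all pairs i != j."""
    n = len(values)
    total = sum(abs(values[i] - values[j]) for i in range(n) for j in range(i + 1, n))
    return 2.0 * total / (n * (n - 1))


evenly_spaced = [2 * pi * k / 8 for k in range(8)]
clustered = [0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 3.0]
print(gini_mean_difference(arc_lengths(evenly_spaced)))
print(gini_mean_difference(arc_lengths(clustered)))
```

Perfectly even spacing gives a value of essentially zero, while clustering produces very unequal arc-lengths and a larger statistic. The test asks whether the observed value is plausible for a uniform sample.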

**Category:** Statistics

[23] **viXra:1007.0016 [pdf]**
*submitted on 13 Mar 2010*

**Authors:** Rajesh Singh, Jayant Singh, Florentin Smarandache

**Comments:** 64 pages

This volume is a collection of five papers. Two chapters deal with problems in statistical
inference, two with inference in finite populations, and one with a demographic problem.
The ideas included here will be useful for researchers working in these fields.

**Category:** Statistics

[22] **viXra:1006.0046 [pdf]**
*submitted on 18 Jun 2010*

**Authors:** David D. Tung, S. Rao Jammalamadaka

**Comments:** 23 pages.

In this paper, we investigate the asymptotic theory for U-statistics based
on sample spacings, i.e. the gaps between successive observations. The
usual asymptotic theory for U-statistics does not apply here because spacings
are dependent variables. However, under the null hypothesis, the uniform
spacings can be expressed as conditionally independent Exponential random
variables. We exploit this idea to derive the relevant asymptotic theory both
under the null hypothesis and under a sequence of close alternatives.
The generalized Gini mean difference of the sample spacings is a prime
example of a U-statistic of this type. We show that such a Gini spacings test
is analogous to Rao's spacings test. We find the asymptotically locally most
powerful test in this class, and it has the same efficacy as the Greenwood
statistic.

**Category:** Statistics

[21] **viXra:1005.0068 [pdf]**
*submitted on 11 Mar 2010*

**Authors:** M. Khoshnevisan, S. Saxena, H. P. Singh, S. Singh, Florentin Smarandache

**Comments:** 63 pages.

The purpose of this book is to postulate some theories and test them numerically.
Estimation is often a difficult task and it has wide application in the social sciences and
financial markets. In order to obtain the optimum efficiency for some classes of
estimators, we have divided this book into three specialized sections.

**Category:** Statistics

[20] **viXra:1005.0048 [pdf]**
*submitted on 11 Mar 2010*

**Authors:** Rajesh Singh, Mukesh Kumar, Manoj K. Chaudhary, Florentin Smarandache

**Comments:** 11 pages

This paper considers the problem of estimating the population mean using
information on an auxiliary variable in the presence of non-response. Exponential ratio and
exponential product type estimators are suggested and their properties are studied. An
empirical study is carried out to support the theoretical results.
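
For orientation, the classical exponential ratio and product estimators of Bahl and Tuteja (1991) under full response are sketched below. The non-response adjustments that this paper adds are not reproduced, and the sample values are invented.

```python
from math import exp


def exp_ratio_estimator(y_bar, x_bar, x_pop_mean):
    """Exponential ratio-type estimator of the population mean of y;
    suited to an auxiliary variable positively correlated with y."""
    return y_bar * exp((x_pop_mean - x_bar) / (x_pop_mean + x_bar))


def exp_product_estimator(y_bar, x_bar, x_pop_mean):
    """Exponential product-type estimator; suited to negative correlation."""
    return y_bar * exp((x_bar - x_pop_mean) / (x_bar + x_pop_mean))


# Sample means of y and x, with the population mean of x assumed known.
print(exp_ratio_estimator(y_bar=52.0, x_bar=9.5, x_pop_mean=10.0))
print(exp_product_estimator(y_bar=52.0, x_bar=9.5, x_pop_mean=10.0))
```

When the sample mean of the auxiliary variable falls below its known population mean, the ratio form adjusts the estimate upwards and the product form adjusts it downwards; the two adjustment factors are reciprocals of each other.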

**Category:** Statistics

[19] **viXra:1005.0020 [pdf]**
*submitted on 8 May 2010*

**Authors:** David D. Tung

**Comments:** 27 Pages.

In this paper, we will investigate the problem of obtaining
confidence intervals for a baseball team's Pythagorean expectation, i.e.
their expected winning percentage and expected games won. We study
this problem from two different perspectives. First, in the framework
of regression models, we obtain confidence intervals for prediction, i.e.
more formally, prediction intervals for a new observation, on the basis
of historical binomial data for Major League Baseball teams from the
1901 through 2009 seasons, and apply this to the 2009 MLB regular
season. We also obtain a Scheffé-type simultaneous prediction band
and use it to tabulate predicted winning percentages and their
prediction intervals, corresponding to a range of values for log(RS/RA).
Second, parametric bootstrap simulation is introduced as a data-driven,
computer-intensive approach to numerically computing confidence
intervals for a team's expected winning percentage. Under the
assumption that runs scored per game and runs allowed per game are
random variables following independent Weibull distributions, we
numerically calculate confidence intervals for the Pythagorean expectation
via parametric bootstrap simulation on the basis of each team's runs
scored per game and runs allowed per game from the 2009 MLB
regular season. The interval estimates, from either framework, allow us to
infer with better certainty as to which teams are performing above or
below expectations. It is seen that the bootstrap confidence intervals
appear to be better at detecting which teams are performing above
or below expectations than the prediction intervals obtained in the
regression framework.
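
The bootstrap half of the approach can be sketched as follows. This is a simplified illustration rather than the author's procedure: the Weibull parameters are taken as given instead of being fitted to a team's game logs, the Pythagorean exponent is fixed at the classical value 2, and a plain percentile interval is used.

```python
import random


def pythagorean_expectation(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage RS^g / (RS^g + RA^g)."""
    rs, ra = runs_scored**exponent, runs_allowed**exponent
    return rs / (rs + ra)


def bootstrap_interval(rs_scale, rs_shape, ra_scale, ra_shape,
                       games=162, reps=2000, level=0.95, rng=None):
    """Percentile interval from simulated seasons with independent Weibull
    runs scored and runs allowed per game."""
    rng = rng or random.Random()
    stats = []
    for _ in range(reps):
        rs = sum(rng.weibullvariate(rs_scale, rs_shape) for _ in range(games))
        ra = sum(rng.weibullvariate(ra_scale, ra_shape) for _ in range(games))
        stats.append(pythagorean_expectation(rs, ra))
    stats.sort()
    lower = stats[int((1 - level) / 2 * reps)]
    upper = stats[int((1 + level) / 2 * reps) - 1]
    return lower, upper


# Hypothetical team: Weibull scale 5.0 for runs scored, 4.5 for runs allowed.
low, high = bootstrap_interval(5.0, 1.8, 4.5, 1.8, rng=random.Random(1))
print(low, high)
```

A team whose actual winning percentage falls outside its interval would be flagged as performing above or below expectation.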

**Category:** Statistics

[18] **viXra:1005.0003 [pdf]**
*submitted on 10 Mar 2010*

**Authors:** W. B. Vasantha Kandasamy, Florentin Smarandache

**Comments:** 209 pages

In this book, for the first time we introduce the notions of N-groups,
N-semigroups, N-loops and N-groupoids. We also
define a mixed N-algebraic structure. We expect the reader to be
well versed in group theory and have at least basic knowledge
about Smarandache groupoids, Smarandache loops,
Smarandache semigroups and bialgebraic structures and
Smarandache bialgebraic structures.

**Category:** Statistics

[17] **viXra:1004.0076 [pdf]**
*submitted on 8 Mar 2010*

**Authors:** Rajesh Singh, Pankaj Chauhan, Nirmala Sawan, Florentin Smarandache

**Comments:** 8 pages

In this paper, exponential ratio and exponential product type estimators using two
auxiliary variables are proposed for estimating the unknown population variance S_{y}^{2}. The problem is
extended to the case of two-phase sampling. Theoretical results are supported by an empirical
study.

**Category:** Statistics

[16] **viXra:1004.0064 [pdf]**
*submitted on 8 Mar 2010*

**Authors:** Rajesh Singh, Pankaj Chauhan, Nirmala Sawan, Florentin Smarandache

**Comments:** 11 pages

This study proposes an improved chain-ratio type estimator for
estimating the population mean using some known values of population
parameter(s) of the second auxiliary character. The proposed estimators have
been compared with the two-phase ratio estimator and some other chain type
estimators. The performances of the proposed estimators have been
supported with a numerical illustration.

**Category:** Statistics

[15] **viXra:1004.0063 [pdf]**
*submitted on 8 Mar 2010*

**Authors:** Rajesh Singh, Jayant Singh, Florentin Smarandache

**Comments:** 16 pages

Optimum Statistical Test Procedure

**Category:** Statistics

[14] **viXra:1004.0062 [pdf]**
*submitted on 8 Mar 2010*

**Authors:** Rajesh Singh, Pankaj Chauhan, Nirmala Sawan, Florentin Smarandache

**Comments:** 15 pages

In practice, information regarding the population proportion possessing a certain
attribute is easily available; see Jhajj et al. (2006). For estimating the population mean Y
of the study variable y, following Bahl and Tuteja (1991), a ratio-product type
exponential estimator has been proposed by using the known information of population
proportion possessing an attribute (highly correlated with y) in simple random sampling.
The expressions for the bias and the mean-squared error (MSE) of the estimator and its
minimum value have been obtained. The proposed estimator improves on the
mean per unit estimator, ratio and product type exponential estimators, as well as the Naik
and Gupta (1996) estimators. The results have also been extended to the case of two
phase sampling. The results obtained have been illustrated numerically by taking some
empirical populations considered in the literature.

**Category:** Statistics

[13] **viXra:1004.0061 [pdf]**
*submitted on 8 Mar 2010*

**Authors:** Rajesh Singh, Pankaj Chauhan, Nirmala Sawan, Florentin Smarandache

**Comments:** 12 pages

In this paper we have proposed an almost unbiased ratio and product type
exponential estimator for the finite population mean Y-bar. It has been shown that Bahl and
Tuteja (1991) ratio and product type exponential estimators are particular members of the
proposed estimator. An empirical study is carried out to demonstrate the superiority of the
proposed estimator.

**Category:** Statistics

[12] **viXra:1004.0056 [pdf]**
*submitted on 8 Mar 2010*

**Authors:** Rajesh Singh, Pankaj Chauhan, Nirmala Sawan, Florentin Smarandache

**Comments:** 11 pages

It is well recognized that the use of auxiliary information in sample survey design
results in efficient estimators of population parameters under some realistic conditions.
Out of many, ratio, product and regression methods of estimation are good examples in
this context. Using the knowledge of kurtosis of an auxiliary variable, Upadhyaya and
Singh (1999) have suggested an estimator for population variance. In this paper, following
the approach of Singh and Singh (1993), we have suggested almost unbiased ratio and
product-type estimators for population variance.

**Category:** Statistics

[11] **viXra:1004.0054 [pdf]**
*submitted on 8 Mar 2010*

**Authors:** Florentin Smarandache

**Comments:** 9 pages

This article presents several alternatives to Pearson's correlation coefficient
and many examples. In the samples where the rank in a discrete variable counts more
than the variable values, the mixture of Pearson's and Spearman's gives a better result.

**Category:** Statistics

[10] **viXra:1003.0183 [pdf]**
*submitted on 6 Mar 2010*

**Authors:** Rajesh Singh, Jayant Singh, Florentin Smarandache

**Comments:** 5 pages

In this paper, the problem of hypothesis testing is discussed when the samples
have been drawn from a normal distribution. The study of hypothesis testing
is also extended to the Bayesian setup.

**Category:** Statistics

[9] **viXra:1003.0172 [pdf]**
*submitted on 6 Mar 2010*

**Authors:** M. Khoshnevisan, Rajesh Singh, Pankaj Chauhan, Nirmala Sawan, Florentin Smarandache

**Comments:** 11 pages

A general family of estimators for estimating the population mean of the variable
under study, which makes use of known values of certain population parameter(s), is proposed.
Under the Simple Random Sampling Without Replacement (SRSWOR) scheme, the expressions of
bias and mean-squared error (MSE) up to the first order of approximation are derived. Some
well-known estimators are shown to be particular members of this family. An empirical study is
carried out to illustrate the performance of the constructed estimator over others.

**Category:** Statistics

[8] **viXra:1003.0137 [pdf]**
*submitted on 6 Mar 2010*

**Authors:** M. Khoshnevisan, F. Kaymram, Housila P. Singh, Rajesh Singh, Florentin Smarandache

**Comments:** 10 pages

This paper proposes a class of estimators for population correlation coefficient
when information about the population mean and population variance of one of the
variables is not available but information about these parameters of another variable
(auxiliary) is available, in two phase sampling and analyzes its properties. The optimum
estimator in the class is identified with its variance formula. The estimators of the class
involve unknown constants whose optimum values depend on unknown population
parameters. Following Singh (1982) and Srivastava and Jhajj (1983), it has been shown
that when these population parameters are replaced by their consistent estimates the
resulting class of estimators has the same asymptotic variance as that of optimum
estimator. An empirical study is carried out to demonstrate the performance of the
constructed estimators.

**Category:** Statistics

[7] **viXra:1003.0136 [pdf]**
*submitted on 6 Mar 2010*

**Authors:** M. Khoshnevisan, F. Kaymram, Housila P. Singh, Rajesh Singh, Florentin Smarandache

**Comments:** 11 pages

This paper investigates the efficiency of an alternative to the ratio estimator
under the super population model with uncorrelated errors and a gamma-distributed
auxiliary variable. Comparisons with the usual ratio and unbiased
estimators are also made.

**Category:** Statistics

[6] **viXra:1003.0130 [pdf]**
*submitted on 6 Mar 2010*

**Authors:** Jack Allen, Housila P. Singh, Florentin Smarandache

**Comments:** 16 pages

This paper proposes a family of estimators of population mean using information on several auxiliary variables
and analyzes its properties in the presence of measurement errors.

**Category:** Statistics

[5] **viXra:1003.0128 [pdf]**
*submitted on 6 Mar 2010*

**Authors:** Housila P. Singh, Sharad Saxena, Jack Allen, Sarjinder Singh, Florentin Smarandache

**Comments:** 20 pages

This paper proposes a class of shrinkage estimators for the shape parameter β in
failure-censored samples from the two-parameter Weibull distribution when some a priori or guessed
interval containing the parameter β is available in addition to sample information, and analyses their
properties. Some estimators are generated from the proposed class and compared with the minimum
mean squared error (MMSE) estimator. Numerical computations in terms of percent relative efficiency
and absolute relative bias indicate that certain of these estimators substantially improve the MMSE
estimator in some guessed interval of the parameter space of β, especially for censored samples with
small sizes. Subsequently, a modified class of shrinkage estimators is proposed with its properties.

**Category:** Statistics

[4] **viXra:1003.0113 [pdf]**
*submitted on 6 Mar 2010*

**Authors:** Rajesh Singh, Pankaj Chauhan, Nirmala Sawan, Florentin Smarandache

**Comments:** 6 pages

This study proposes some exponential ratio-type estimators for estimating the population
mean of the variable under study ... (see paper for full abstract)

**Category:** Statistics

[3] **viXra:1003.0109 [pdf]**
*submitted on 6 Mar 2010*

**Authors:** Jack Allen, Housila P. Singh, Sarjinder Singh, Florentin Smarandache

**Comments:** 21 pages

In this paper we have suggested two classes of estimators for the population median M_Y of the study
character Y using information on two auxiliary characters X and Z in double sampling. It has
been shown that the suggested classes of estimators are more efficient than the one suggested by
Singh et al. (2001). Estimators based on estimated optimum values have also been considered
with their properties. The optimum values of the first phase and second phase sample sizes are
also obtained for the fixed cost of survey.

**Category:** Statistics

[2] **viXra:1003.0092 [pdf]**
*submitted on 6 Mar 2010*

**Authors:** Rajesh Singh, Pankaj Chauhan, Nirmala Sawan, Florentin Smarandache

**Comments:** 75 pages

This volume is a collection of six papers on the use of auxiliary
information and a priori values in construction of improved estimators. The
work included here will be of immense application for researchers and
students who employ auxiliary information in any form.

**Category:** Statistics

[1] **viXra:1003.0091 [pdf]**
*submitted on 6 Mar 2010*

**Authors:** Rajesh Singh, Pankaj Chauhan, Nirmala Sawan, Florentin Smarandache

**Comments:** 7 pages

Some ratio estimators for estimating the population mean of the variable under study, which
make use of information regarding the population proportion possessing certain attribute, are
proposed. Under simple random sampling without replacement (SRSWOR) scheme, the
expressions of bias and mean-squared error (MSE) up to the first order of approximation are
derived. The results obtained have been illustrated numerically by taking some empirical
populations considered in the literature.

**Category:** Statistics

[53] **viXra:1603.0215 [pdf]**
*replaced on 2016-03-17 17:28:08*

**Authors:** Glenn Healey

**Comments:** 14 Pages.

Given a set of observed batted balls and their outcomes, we develop a method for learning the dependence of a batted ball’s intrinsic value on its measured parameters.

**Category:** Statistics

[52] **viXra:1603.0180 [pdf]**
*replaced on 2016-03-14 15:52:37*

**Authors:** Luca Martino, Jorge Plata-Chaves, Francisco Louzada

**Comments:** 5 Pages.

In this work, we design an efficient Monte Carlo scheme for a node-specific inference problem where a vector of global parameters and multiple vectors of local parameters are involved. This scenario often appears in inference problems over heterogeneous wireless sensor networks where each node performs observations dependent on a vector of global parameters as well as a vector of local parameters. The proposed scheme uses parallel local MCMC chains and then an importance sampling (IS) fusion step that leverages all the observations of all the nodes when estimating the global parameters. The resulting algorithm is simple and flexible. It can be easily applied iteratively, or extended in a sequential framework.

**Category:** Statistics

[51] **viXra:1603.0180 [pdf]**
*replaced on 2016-03-13 11:19:11*

**Authors:** Luca Martino, Jorge Plata-Chaves, Francisco Louzada

**Comments:** 5 Pages.

In this work, we design an efficient Monte Carlo scheme for a node-specific inference problem where a vector of global parameters and multiple vectors of local parameters are involved. This scenario often appears in inference problems over heterogeneous wireless sensor networks where each node performs observations dependent on a vector of global parameters as well as a vector of local parameters. The proposed scheme uses parallel local MCMC chains and then an importance sampling (IS) fusion step that leverages all the observations of all the nodes when estimating the global parameters. The resulting algorithm is simple and flexible. It can be easily applied iteratively, or extended in a sequential framework.

**Category:** Statistics

[50] **viXra:1603.0180 [pdf]**
*replaced on 2016-03-12 06:01:27*

**Authors:** Luca Martino, Jorge Plata-Chaves, Francisco Louzada

**Comments:** 5 Pages.

In this work, we design an efficient Monte Carlo scheme for a node-specific inference problem where a vector of global parameters and multiple vectors of local parameters are involved. This scenario often appears in inference problems over heterogeneous wireless sensor networks where each node performs observations dependent on a vector of global parameters as well as a vector of local parameters. The proposed scheme uses parallel local MCMC chains and then an importance sampling (IS) fusion step that leverages all the observations of all the nodes when estimating the global parameters. The resulting algorithm is simple and flexible. It can be easily applied iteratively, or extended in a sequential framework.

**Category:** Statistics

[49] **viXra:1602.0112 [pdf]**
*replaced on 2016-03-05 09:11:03*

**Authors:** L. Martino, V. Elvira, F. Louzada

**Comments:** 32 Pages.

The Effective Sample Size (ESS) is an important measure of efficiency of Monte Carlo methods such as Markov Chain Monte Carlo (MCMC) and Importance Sampling (IS) techniques. In the IS context, an approximation $\widehat{ESS}$ of the theoretical ESS definition is widely applied, involving the inverse of the sum of the squares of the normalized importance weights. This formula, $\widehat{ESS}$, has become an essential piece within Sequential Monte Carlo (SMC) methods, to assess the convenience of a resampling step. From another perspective, the expression $\widehat{ESS}$ is related to the Euclidean distance between the probability mass described by the normalized weights and the discrete uniform probability mass function (pmf). In this work, we derive other possible ESS functions based on different discrepancy measures between these two pmfs. Several examples are provided involving, for instance, the geometric and harmonic means of the weights, the discrete entropy (including the perplexity measure, already proposed in literature) and the Gini coefficient among others. We list five requirements which a generic ESS function should satisfy, allowing us to classify different ESS measures. We also compare the most promising ones by means of numerical simulations.
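
As a small illustration of the quantities named in this abstract, the sketch below computes the widely used approximation (the inverse of the sum of squared normalized weights) alongside the perplexity, i.e. the exponential of the discrete entropy of the normalized weights. The function names are hypothetical and the code is not taken from the paper.

```python
import numpy as np

def normalize(weights):
    """Scale non-negative importance weights so that they sum to one."""
    w = np.asarray(weights, dtype=float)
    return w / w.sum()

def ess_inverse_sum_squares(weights):
    """Common ESS approximation: 1 / sum of squared normalized weights."""
    wbar = normalize(weights)
    return 1.0 / np.sum(wbar**2)

def ess_perplexity(weights):
    """Perplexity: exp of the discrete entropy of the normalized weights."""
    wbar = normalize(weights)
    nonzero = wbar[wbar > 0]  # treat 0 * log(0) as 0
    return float(np.exp(-np.sum(nonzero * np.log(nonzero))))

# Made-up unnormalized weights for illustration.
example_weights = [0.5, 1.0, 2.0, 0.1]
print(ess_inverse_sum_squares(example_weights))
print(ess_perplexity(example_weights))
```

Both functions return N for N equal weights and 1 when a single weight carries all the mass, which matches the two extreme cases one would expect of any ESS measure.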

**Category:** Statistics

[48] **viXra:1602.0112 [pdf]**
*replaced on 2016-02-20 06:30:34*

**Authors:** L. Martino, V. Elvira, F. Louzada

**Comments:** 31 Pages.

The Effective Sample Size (ESS) is an important measure of efficiency of Monte Carlo methods such as Markov Chain Monte Carlo (MCMC) and Importance Sampling (IS) techniques. In the IS context, an approximation $\widehat{ESS}$ of the theoretical ESS definition is widely applied, involving the inverse of the sum of the squares of the normalized importance weights. This formula, $\widehat{ESS}$, has become an essential piece within Sequential Monte Carlo (SMC) methods, to assess the convenience of a resampling step. From another perspective, the expression $\widehat{ESS}$ is related to the Euclidean distance between the probability mass described by the normalized weights and the discrete uniform probability mass function (pmf). In this work, we derive other possible ESS functions based on different discrepancy measures between these two pmfs. Several examples are provided involving, for instance, the geometric and harmonic means of the weights, the discrete entropy (including the perplexity measure, already proposed in literature) and the Gini coefficient among others. We list five requirements which a generic ESS function should satisfy, allowing us to classify different ESS measures. We also compare the most promising ones by means of numerical simulations.

**Category:** Statistics

[47] **viXra:1602.0112 [pdf]**
*replaced on 2016-02-19 04:23:27*

**Authors:** L. Martino, V. Elvira, F. Louzada

**Comments:** 31 Pages.

The Effective Sample Size (ESS) is an important measure of efficiency of Monte Carlo methods such as Markov Chain Monte Carlo (MCMC) and Importance Sampling (IS) techniques. In the IS context, an approximation $\widehat{ESS}$ of the theoretical ESS definition is widely applied, involving the sum of the squares of the normalized importance weights. This formula, $\widehat{ESS}$, has become an essential piece within Sequential Monte Carlo (SMC) methods, to assess the convenience of a resampling step. From another perspective, the expression $\widehat{ESS}$ is related to the Euclidean distance between the probability mass described by the normalized weights and the discrete uniform probability mass function (pmf). In this work, we derive other possible ESS functions based on different discrepancy measures between these two pmfs. Several examples are provided involving, for instance, the geometric and harmonic means of the weights, the discrete entropy (including the perplexity measure, already proposed in literature) and the Gini coefficient among others. We list five requirements which a generic ESS function should satisfy, allowing us to classify different ESS measures. We also compare the most promising ones by means of numerical simulations.

**Category:** Statistics

[46] **viXra:1602.0112 [pdf]**
*replaced on 2016-02-14 08:13:03*

**Authors:** L. Martino, V. Elvira, F. Louzada

**Comments:** 31 Pages.

The Effective Sample Size (ESS) is an important measure of efficiency of Monte Carlo methods such as Markov Chain Monte Carlo (MCMC) and Importance Sampling (IS) techniques. In the IS context, an approximation $\widehat{ESS}$ of the theoretical ESS definition is widely applied, involving the sum of the squares of the normalized importance weights. This formula, $\widehat{ESS}$, has become an essential piece within Sequential Monte Carlo (SMC) methods, to assess the convenience of a resampling step. From another perspective, the expression $\widehat{ESS}$ is related to the Euclidean distance between the probability mass described by the normalized weights and the discrete uniform probability mass function (pmf). In this work, we derive other possible ESS functions based on different discrepancy measures between these two pmfs. Several examples are provided involving, for instance, the geometric and harmonic means of the weights, the discrete entropy (including the perplexity measure, already proposed in literature) and the Gini coefficient among others. We list five requirements which a generic ESS function should satisfy, allowing us to classify different ESS measures. We also compare the most promising ones by means of numerical simulations.

**Category:** Statistics

[45] **viXra:1602.0112 [pdf]**
*replaced on 2016-02-10 07:48:50*

**Authors:** L. Martino, V. Elvira, F. Louzada

**Comments:** 31 Pages.

The Effective Sample Size (ESS) is an important measure of efficiency of Monte Carlo methods such as Markov Chain Monte Carlo (MCMC) and Importance Sampling (IS) techniques. In IS context, an approximation of the theoretical ESS definition is widely applied, $\widehat{ESS}$, involving the sum of the squares of the normalized importance weights. This formula $\widehat{ESS}$ has become an essential piece within Sequential Monte Carlo (SMC) methods using adaptive resampling procedures. The expression $\widehat{ESS}$ is related to the Euclidean distance between the probability mass described by the normalized weights and the discrete uniform probability mass function (pmf). In this work, we derive other possible ESS functions based on different discrepancy measures between these pmfs. Several examples are provided involving, for instance, the geometric and harmonic means of the weights, the discrete entropy (including the perplexity measure, already proposed in literature) and the Gini coefficient. We list five requirements which a generic ESS function should satisfy, allowing us to classify different ESS measures. We also compare the most promising ones by means of numerical simulations.

**Category:** Statistics

[44] **viXra:1602.0053 [pdf]**
*replaced on 2016-02-05 08:42:31*

**Authors:** Jason Lind

**Comments:** 3 Pages. Added preliminary calculations for correcting non-normal distribution

Defines a rated set and uses it to calculate a weight directly from the statistics, enabling a broad unified interpretation of data.

**Category:** Statistics

[43] **viXra:1602.0053 [pdf]**
*replaced on 2016-02-05 03:29:44*

**Authors:** Jason Lind

**Comments:** Corrected table on page 2

Defines a rated set and uses it to calculate a weight directly from the statistics, enabling a broad unified interpretation of data.

**Category:** Statistics

[42] **viXra:1512.0420 [pdf]**
*replaced on 2015-12-26 13:02:26*

**Authors:** L. Martino, J. Read, V. Elvira, F. Louzada

**Comments:** 21 Pages.

We design a sequential Monte Carlo scheme for the joint purpose of Bayesian inference and model selection, with application to an urban mobility context where different modalities of transport and measurement devices can be employed. In this case, we have the joint problem of online tracking and detection of the current modality. For this purpose, we use interacting parallel particle filters, each one addressing a different model. They cooperate to provide a global estimator of the variable of interest and, at the same time, an approximation of the posterior density of the models given the data. The interaction occurs by a parsimonious distribution of the computational effort, adapting on-line the number of particles of each filter according to the posterior probability of the corresponding model. The resulting scheme is simple and flexible. We have tested the novel technique in different numerical experiments with artificial and real data, which confirm the robustness of the proposed scheme.

**Category:** Statistics

[41] **viXra:1508.0142 [pdf]**
*replaced on 2016-02-24 08:21:59*

**Authors:** L. Martino, F. Louzada

**Comments:** 15 Pages. To appear in Computational Statistics

The Multiple Try Metropolis (MTM) algorithm is an advanced MCMC technique based on drawing and testing several candidates at each iteration of the algorithm. One of them is selected according to certain weights and then it is tested according to a suitable acceptance probability. Clearly, since the computational cost increases as the employed number of tries grows, one expects that the performance of an MTM scheme improves as the number of tries increases, as well. However, there are scenarios where increasing the number of tries does not produce a corresponding enhancement of the performance. In this work, we describe these scenarios and then we introduce possible solutions to these issues.

**Category:** Statistics

[40] **viXra:1508.0142 [pdf]**
*replaced on 2015-08-19 03:39:57*

**Authors:** L. Martino, F. Louzada

**Comments:** 17 Pages.

The Multiple Try Metropolis (MTM) algorithm is an advanced MCMC technique based on drawing and testing several candidates at each iteration of the algorithm. One of them is selected according to certain weights and then it is tested according to a suitable acceptance probability. Clearly, since the computational cost increases as the employed number of tries grows, one expects that the performance of an MTM scheme improves as the number of tries increases, as well. However, there are scenarios where increasing the number of tries does not produce a corresponding enhancement of the performance. In this work, we describe these scenarios and then we introduce possible solutions to these issues.

**Category:** Statistics

[39] **viXra:1507.0110 [pdf]**
*replaced on 2015-07-30 08:34:32*

**Authors:** L. Martino, V. Elvira, D. Luengo, J. Corander, F. Louzada

**Comments:** 25 Pages.

Monte Carlo (MC) methods are widely used in statistics, signal processing and machine learning. A well-known class of MC methods are Markov Chain Monte Carlo (MCMC) algorithms. In order to foster better exploration of the state space, especially in high-dimensional applications, several schemes employing multiple parallel MCMC chains have been recently introduced. In this work, we describe a novel parallel interacting MCMC scheme, called orthogonal MCMC (O-MCMC), where a set of "vertical" parallel MCMC chains share information using some "horizontal" MCMC techniques working on the entire population of current states. More specifically, the vertical chains are led by random-walk proposals, whereas the horizontal MCMC techniques employ independent proposals, thus allowing an efficient combination of global exploration and local approximation. The interaction is contained in these horizontal iterations. Within the analysis of different implementations of O-MCMC, novel schemes for reducing the overall computational cost of parallel multiple try Metropolis (MTM) chains are also presented. Furthermore, a modified version of O-MCMC for optimization is provided by considering parallel simulated annealing (SA) algorithms. We also discuss the application of O-MCMC in a big data framework. Numerical results show the advantages of the proposed sampling scheme in terms of efficiency in the estimation, as well as robustness in terms of independence with respect to initial values and parameter choice.

**Category:** Statistics

[38] **viXra:1507.0110 [pdf]**
*replaced on 2015-07-28 23:03:29*

**Authors:** L. Martino, V. Elvira, D. Luengo, J. Corander, F. Louzada

**Comments:** 24 Pages.

Monte Carlo (MC) methods are widely used in statistics, signal processing and machine learning. A well-known class of MC methods are Markov Chain Monte Carlo (MCMC) algorithms. In order to foster better exploration of the state space, especially in high-dimensional applications, several schemes employing multiple parallel MCMC chains have been recently introduced. In this work, we describe a novel parallel interacting MCMC scheme, called orthogonal MCMC (O-MCMC), where a set of "vertical" parallel MCMC chains share information using some "horizontal" MCMC techniques working on the entire population of current states. More specifically, the vertical chains are led by random-walk proposals, whereas the horizontal MCMC techniques employ independent proposals, thus allowing an efficient combination of global exploration and local approximation. The interaction is contained in these horizontal iterations. Within the analysis of different implementations of O-MCMC, novel schemes for reducing the overall computational cost of parallel multiple try Metropolis (MTM) chains are also presented. Furthermore, a modified version of O-MCMC for optimization is provided by considering parallel simulated annealing (SA) algorithms. We also discuss the application of O-MCMC in a big data framework. Numerical results show the advantages of the proposed sampling scheme in terms of efficiency in the estimation, as well as robustness in terms of independence with respect to initial values and parameter choice.

**Category:** Statistics

[37] **viXra:1507.0110 [pdf]**
*replaced on 2015-07-28 08:47:05*

**Authors:** L. Martino, V. Elvira, D. Luengo, J. Corander, F. Louzada

**Comments:** 24 Pages.

Monte Carlo (MC) methods are widely used in statistics, signal processing and machine learning. A well-known class of MC methods are Markov Chain Monte Carlo (MCMC) algorithms. In order to foster better exploration of the state space, especially in high-dimensional applications, several schemes employing multiple parallel MCMC chains have been recently introduced. In this work, we describe a novel parallel interacting MCMC scheme, called orthogonal MCMC (O-MCMC), where a set of "vertical" parallel MCMC chains share information using some "horizontal" MCMC techniques working on the entire population of current states. More specifically, the vertical chains are led by random-walk proposals, whereas the horizontal MCMC techniques employ independent proposals, thus allowing an efficient combination of global exploration and local approximation. The interaction is contained in these horizontal iterations. Within the analysis of different implementations of O-MCMC, novel schemes for reducing the overall computational cost of parallel multiple try Metropolis (MTM) chains are also presented. Furthermore, a modified version of O-MCMC for optimization is provided by considering parallel simulated annealing (SA) algorithms. We also discuss the application of O-MCMC in a big data framework. Numerical results show the advantages of the proposed sampling scheme in terms of efficiency in the estimation, as well as robustness in terms of independence with respect to initial values and parameter choice.

**Category:** Statistics

[36] **viXra:1506.0175 [pdf]**
*replaced on 2015-10-04 03:38:05*

**Authors:** Ilija Barukčić

**Comments:** 19 Pages. (C) Ilija Barukčić, Jever, Germany, 2015. Published by: International Journal of Applied Physics and Mathematics vol. 6, no. 2, pp. 45-65, 2016. http://dx.doi.org/10.17706/ijapm.2016.6.2.45-65

The deterministic relationship between cause and effect is deeply connected with our understanding of the physical sciences and their explanatory ambitions. Though progress is being made, the lack of theoretical predictions and experiments in quantum gravity makes it difficult to use empirical evidence to justify a theory of causality at quantum level in normal circumstances, i.e. by predicting the value of a well-confirmed experimental result. For a variety of reasons, the problem of the deterministic relationship between cause and effect is related to basic problems of physics as such. Despite the common belief, it is a remarkable fact that a theory of causality should be consistent with a theory of everything and is because of this linked to problems of a theory of everything. Thus far, solving the problem of causality can help to solve the problems of the theory of everything (at quantum level) too.

**Category:** Statistics

[35] **viXra:1505.0135 [pdf]**
*replaced on 2016-02-25 06:00:34*

**Authors:** L. Martino, V. Elvira, D. Luengo, J. Corander

**Comments:** 24 Pages.

Monte Carlo methods represent the *de facto* standard for approximating complicated integrals involving multidimensional target distributions. In order to generate random realizations from the target distribution, Monte Carlo techniques use simpler proposal probability densities to draw candidate samples. The performance of any such method is strictly related to the specification of the proposal distribution, such that unfortunate choices easily wreak havoc on the resulting estimators. In this work, we introduce a *layered* (i.e., hierarchical) procedure to generate samples employed within a Monte Carlo scheme. This approach ensures that an appropriate equivalent proposal density is always obtained automatically (thus eliminating the risk of a catastrophic performance), although at the expense of a moderate increase in the complexity. Furthermore, we provide a general unified importance sampling (IS) framework, where multiple proposal densities are employed and several IS schemes are introduced by applying the so-called deterministic mixture approach. Finally, given these schemes, we also propose a novel class of adaptive importance samplers using a population of proposals, where the adaptation is driven by independent parallel or interacting Markov Chain Monte Carlo (MCMC) chains. The resulting algorithms efficiently combine the benefits of both IS and MCMC methods.

**Category:** Statistics

[34] **viXra:1505.0135 [pdf]**
*replaced on 2015-05-27 13:09:35*

**Authors:** L. Martino, V. Elvira, D. Luengo, J. Corander

**Comments:** 25 Pages.

Monte Carlo methods represent the de facto standard for approximating complicated integrals involving multidimensional target distributions. In order to generate random realizations from the target distribution, Monte Carlo techniques use simpler proposal probability densities for drawing candidate samples. Performance of any such method is strictly related to the specification of the proposal distribution, such that unfortunate choices easily wreak havoc on the resulting estimators. In this work, we introduce a layered, that is a hierarchical, procedure for generating samples employed within a Monte Carlo scheme. This approach ensures that an appropriate equivalent proposal density is always obtained automatically (thus eliminating the risk of a catastrophic performance), although at the expense of a moderate increase in the complexity. A hierarchical interpretation of two well-known methods, the random walk Metropolis-Hastings (MH) and the Population Monte Carlo (PMC) techniques, is provided. Furthermore, we provide a general unified importance sampling (IS) framework where multiple proposal densities are employed, and several IS schemes are introduced applying the so-called deterministic mixture approach. Finally, given these schemes, we also propose a novel class of adaptive importance samplers using a population of proposals, where the adaptation is driven by independent parallel or interacting Markov Chain Monte Carlo (MCMC) chains. The resulting algorithms combine efficiently the benefits of both IS and MCMC methods.

**Category:** Statistics

[33] **viXra:1409.0127 [pdf]**
*replaced on 2015-03-17 07:17:05*

**Authors:** Jianwen Huang, Shouquan Chen

**Comments:** 10 Pages.

Let $\{X_n,n\geq1\}$ be an independent and
identically distributed random sequence with common distribution $F$ obeying the lognormal distribution. In this paper, we obtain the exact uniform convergence rate of the distribution of maxima to its extreme value limit under power normalization.

**Category:** Statistics

[32] **viXra:1409.0051 [pdf]**
*replaced on 2016-03-17 14:39:23*

**Authors:** L. Martino, F. Leisen, J. Corander

**Comments:** 16 Pages.

Markov Chain Monte Carlo (MCMC) methods are well-known Monte Carlo methodologies, widely used in different fields for statistical inference and stochastic optimization. The Multiple Try Metropolis (MTM) algorithm is an extension of the standard Metropolis-Hastings (MH) algorithm in which the next state of the chain is chosen among a set of candidates, according to certain weights. The Particle MH (PMH) algorithm is another advanced MCMC technique specifically designed for scenarios where the multidimensional target density can be easily factorized as a product of (lower-dimensional) conditional densities. Both have been widely studied and applied in the literature. In this note, we investigate similarities and differences among the MTM schemes and the PMH method. Furthermore, novel schemes are also designed.

**Category:** Statistics

[31] **viXra:1409.0051 [pdf]**
*replaced on 2016-02-17 13:27:24*

**Authors:** L. Martino, F. Leisen, J. Corander

**Comments:** 15 Pages.

Markov Chain Monte Carlo (MCMC) methods are well-known Monte Carlo methodologies, widely used in different fields for statistical inference and stochastic optimization. The Multiple Try Metropolis (MTM) algorithm is an extension of the standard Metropolis-Hastings (MH) algorithm in which the next state of the chain is chosen among a set of candidates, according to certain weights. The Particle MH (PMH) algorithm is another advanced MCMC technique specifically designed for scenarios where the multidimensional target density can be easily factorized as a product of (lower-dimensional) conditional densities. Both have been widely studied and applied in the literature. In this note, we investigate similarities and differences among the MTM schemes and the PMH method. Furthermore, novel schemes are also designed.

**Category:** Statistics

[30] **viXra:1409.0051 [pdf]**
*replaced on 2016-01-14 12:54:49*

**Authors:** L. Martino, F. Leisen, J. Corander

**Comments:** 14 Pages.

Markov Chain Monte Carlo (MCMC) methods are well-known Monte Carlo methodologies, widely used in different fields for statistical inference and stochastic optimization. The Multiple Try Metropolis (MTM) algorithm is an extension of the standard Metropolis-Hastings (MH) algorithm in which the next state of the chain is chosen among a set of candidates, according to certain weights. The Particle MH (PMH) algorithm is another advanced MCMC technique specifically designed for scenarios where the multidimensional target density can be easily factorized as a product of (lower-dimensional) conditional densities. Both have been widely studied and applied in the literature. In this note, we investigate similarities and differences among the MTM schemes and the PMH method. Furthermore, novel schemes are also designed.

**Category:** Statistics

[29] **viXra:1409.0051 [pdf]**
*replaced on 2016-01-05 08:47:18*

**Authors:** L. Martino, F. Leisen, J. Corander

**Comments:** 11 Pages.

**Category:** Statistics

[28] **viXra:1409.0051 [pdf]**
*replaced on 2016-01-04 12:40:57*

**Authors:** L. Martino, F. Leisen, J. Corander

**Comments:** 11 Pages.

**Category:** Statistics

[27] **viXra:1409.0051 [pdf]**
*replaced on 2014-09-23 02:30:02*

**Authors:** L. Martino, F. Leisen, J. Corander

**Comments:** 10 Pages.

Markov Chain Monte Carlo (MCMC) methods are well-known Monte Carlo methodologies, widely used in different fields for statistical inference and stochastic optimization. The Multiple Try Metropolis (MTM) algorithm is an extension of the standard Metropolis-Hastings (MH) algorithm in which the next state of the chain is chosen among a set of candidates, according to certain weights. The Particle MH (PMH) algorithm is another advanced MCMC technique specifically designed for scenarios where the multidimensional target density can be easily factorized as a product of (lower-dimensional) conditional densities. Both are widely studied and applied in the literature. In this note, we investigate similarities and differences among the MTM schemes and the PMH method.

**Category:** Statistics

[26] **viXra:1409.0015 [pdf]**
*replaced on 2014-12-15 15:30:35*

**Authors:** Ellida M. Khazen

**Comments:** Pages. The paper is being published in Cogent Mathematics (2016), 2:1134031. http://dx.doi.org/10.1080/23311835.2015.1134031

The problem of filtering the unobservable components x(t) of a multidimensional continuous diffusion Markov process z(t)=(x(t),y(t)), given observations of the (multidimensional) process y(t) taken at discrete consecutive times with small time steps, is analytically investigated. On the basis of that investigation, new algorithms for the simulation of the unobservable components x(t) and new algorithms for nonlinear filtering using sequential Monte Carlo methods, or particle filters, are developed and suggested. The analytical investigation of observed quadratic variations is also developed. New closed-form analytical formulae are obtained, which characterize the dispersions of the deviations of the observed quadratic variations and the accuracy of some estimates for x(t). As an illustrative example, estimation of volatility (for the problems of financial mathematics) is considered. The new algorithms extend the range of applications of sequential Monte Carlo methods, or particle filters, beyond hidden Markov models and improve their performance.

**Category:** Statistics

[25] **viXra:1405.0280 [pdf]**
*replaced on 2015-03-25 13:29:09*

**Authors:** L. Martino, V. Elvira, D. Luengo, J. Corander

**Comments:** IEEE Transactions on Signal Processing, Volume 63, Issue 16, Pages 4422-4437, 2015

Monte Carlo (MC) methods are well-known computational techniques, widely used in different fields such as signal processing, communications and machine learning. An important class of MC methods is composed of importance sampling (IS) and its adaptive extensions, such as population Monte Carlo (PMC) and adaptive multiple IS (AMIS). In this work, we introduce a novel adaptive and iterated importance sampler using a population of proposal densities. The proposed algorithm, named adaptive population importance sampling (APIS), provides a global estimation of the variables of interest iteratively, making use of all the samples previously generated. APIS combines a sophisticated scheme to build the IS estimators (based on the deterministic mixture approach) with a simple temporal adaptation (based on epochs). In this way, APIS is able to keep all the advantages of both AMIS and PMC, while minimizing their drawbacks. Furthermore, APIS is easily parallelizable. The cloud of proposals is adapted in such a way that local features of the target density can be better taken into account compared to single global adaptation procedures. The result is a fast, simple, robust and high-performance algorithm applicable to a wide range of problems. Numerical results show the advantages of the proposed sampling scheme in four synthetic examples and a localization problem in a wireless sensor network.

**Category:** Statistics
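The deterministic mixture approach mentioned in the APIS abstract above can be illustrated with a short sketch. It assumes the commonly used form of the weights, in which each sample is weighted by the target divided by the equally weighted mixture of all N proposals, rather than by the single proposal that generated it. The function names `normal_pdf`, `dm_weights` and `standard_weights` are hypothetical, and this is not the authors' APIS implementation (there is no adaptation step here).

```python
import math


def normal_pdf(x, mean, std):
    """Density of a univariate Gaussian, used here as an example proposal."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def dm_weights(samples, target, proposals):
    """Deterministic-mixture weights (assumed form):
    w_i = target(x_i) / ((1/N) * sum_j q_j(x_i))."""
    n = len(proposals)
    weights = []
    for x in samples:
        mixture = sum(q(x) for q in proposals) / n
        weights.append(target(x) / mixture)
    return weights


def standard_weights(samples, target, proposals):
    """Standard IS weights: sample i is weighted by its own proposal q_i only."""
    return [target(x) / q(x) for x, q in zip(samples, proposals)]
```

With a single proposal the two weighting schemes coincide; with several proposals the mixture denominator tends to avoid the very large weights that arise when a sample falls in the tail of its own proposal.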

[24] **viXra:1405.0280 [pdf]**
*replaced on 2014-07-04 10:52:29*

**Authors:** L. Martino, V. Elvira, D. Luengo, J. Corander

**Comments:** 19 Pages.

Monte Carlo (MC) methods are well-known computational techniques widely used in different fields such as signal processing, communications and machine learning.
An important class of MC methods is composed of importance sampling (IS) and its adaptive extensions, e.g., Adaptive Multiple IS (AMIS) and Population Monte Carlo (PMC).
In this work, we introduce a novel adaptive and iterated importance sampler using a population of proposal densities.
The proposed algorithm, named *Adaptive Population Importance Sampling* (APIS), provides a global estimation of the variables of interest iteratively, making use of all the samples previously generated.
APIS combines a sophisticated scheme to build the IS estimators (based on the deterministic mixture approach) with a simple temporal adaptation (based on epochs).
In this way, APIS is able to keep all the advantages of both AMIS and PMC while minimizing their drawbacks. Furthermore, the cloud of proposals is adapted in such a way that local features of the target density can be better taken into account compared to single global adaptation procedures.
The result is a fast, simple, robust and high-performance algorithm applicable to a wide range of problems. Numerical results show the advantages of the proposed sampling scheme for a toy example and a localization problem in a wireless sensor network.

**Category:** Statistics

[23] **viXra:1405.0280 [pdf]**
*replaced on 2014-05-23 12:13:47*

**Authors:** L. Martino, V. Elvira, D. Luengo, J. Corander

**Comments:** 20 Pages.

Monte Carlo (MC) methods are well-known computational techniques used in different fields such as signal processing, communications, and machine learning. An important class of MC methods is composed of importance sampling (IS) and its adaptive extensions, e.g., Adaptive Multiple IS (AMIS) and Population Monte Carlo (PMC). In this work, we introduce an adaptive and iterated importance sampler using a population of proposal densities. The novel algorithm, called *Adaptive Population Importance Sampling* (APIS), iteratively provides a global estimation of the variables of interest, using all the samples generated. APIS combines different convenient features of the AMIS and PMC schemes. Furthermore, APIS simultaneously uses simple and more sophisticated approaches (such as the deterministic mixture) to build the IS estimators. The cloud of proposals is adapted by learning from a subset of previously generated samples, in such a way that local features of the target density can be better taken into account compared to single global adaptation procedures. Numerical results show the advantages of the proposed sampling scheme in terms of mean square error. The resulting algorithm is also less sensitive to the initial choice of the parameters than other techniques such as AMIS and PMC.

**Category:** Statistics

[22] **viXra:1405.0263 [pdf]**
*replaced on 2015-04-09 13:23:39*

**Authors:** L. Martino, H. Yang, D. Luengo, J. Kanniainen, J. Corander

**Comments:** Digital Signal Processing, Volume 47, Pages 68-83, 2015.

Bayesian inference often requires efficient numerical approximation algorithms, such as sequential Monte Carlo (SMC) and Markov chain Monte Carlo (MCMC) methods. The Gibbs sampler is a well-known MCMC technique, widely applied in many signal processing problems. Drawing samples from univariate full-conditional distributions efficiently is essential for the practical application of the Gibbs sampler. In this work, we present a simple, self-tuned and extremely efficient MCMC algorithm which produces virtually independent samples from these univariate target densities. The proposal density used is self-tuned and tailored to the specific target, but it is not adaptive. Instead, the proposal is adjusted during an initial optimization stage, following a simple and extremely effective procedure. Hence, we have named the newly proposed approach FUSS (Fast Universal Self-tuned Sampler), as it can be used to sample from any bounded univariate distribution and also from any bounded multi-variate distribution, either directly or by embedding it within a Gibbs sampler. Numerical experiments, on several synthetic data sets (including a challenging parameter estimation problem in a chaotic system) and a high-dimensional financial signal processing problem, show its good performance in terms of speed and estimation accuracy.

**Category:** Statistics

[21] **viXra:1405.0263 [pdf]**
*replaced on 2014-07-02 10:33:21*

**Authors:** L. Martino, H. Yang, D. Luengo, J. Kanniainen, J. Corander

**Comments:** 18 Pages.

Bayesian inference often requires efficient numerical approximation algorithms such as sequential Monte Carlo (SMC) and Markov chain Monte Carlo (MCMC) methods. The Gibbs sampler is a well-known MCMC technique widely applied in several fields (e.g., machine learning, finance, etc.). In the application of the Gibbs sampler one needs to efficiently generate values from univariate full-conditional distributions. In this work, we present a simple, self-tuned and extremely efficient MCMC algorithm which produces virtually independent samples from univariate target densities.
The proposal density used is self-tuned and tailored to the specific target, but it is not adaptive. Indeed, the proposal is adjusted during an initialization stage following a simple procedure. As a consequence, there is no "fuss" about convergence or tuning, and the execution of the algorithm is remarkably sped up. Although it can be used as a stand-alone algorithm to sample from a generic univariate distribution, the proposed approach is particularly suited for its use within a Gibbs sampler, especially when sampling from spiky multi-modal distributions. Hence, we call it FUSS (Fast Universal Self-tuned Sampler). Numerical experiments on several data sets show its good performance in terms of speed and estimation accuracy.

**Category:** Statistics

[20] **viXra:1405.0263 [pdf]**
*replaced on 2014-06-02 04:58:08*

**Authors:** L. Martino, H. Yang, D. Luengo, J. Kanniainen, J. Corander

**Comments:** 15 Pages.

Gibbs sampling is a well-known Markov Chain Monte Carlo (MCMC) technique, widely applied to draw samples from multivariate target distributions which often appear in many different fields (machine learning, finance, signal processing, etc.). The application of the Gibbs sampler requires being able to draw efficiently from the univariate full-conditional distributions. In this work, we present a simple, self-tuned and extremely efficient MCMC algorithm that produces virtually independent samples from the target. The proposal density used is self-tuned to the specific target but it is not adaptive. Instead, the proposal is adjusted during the initialization stage following a simple procedure. As a consequence, there is no "fuss" about convergence or tuning, and the execution of the algorithm is remarkably sped up. Although it can be used as a stand-alone algorithm to sample from a generic univariate distribution, the proposed approach is particularly suited for its use within a Gibbs sampler, especially when sampling from spiky multi-modal distributions. Hence, we call it FUSS (Fast Universal Self-tuned Sampler). Numerical experiments on several synthetic and real data sets show its good performance in terms of speed and estimation accuracy.

**Category:** Statistics

[19] **viXra:1403.0075 [pdf]**
*replaced on 2015-07-08 07:59:15*

**Authors:** Yuri Heymann

**Comments:** 11 Pages.

This paper aims to offer a testing framework for the structural properties of the Brownian motion of the underlying stochastic process of a time series. In particular, the test can be applied to financial time-series data and discriminate among the lognormal random walk used in the Black-Scholes-Merton model, the Gaussian random walk used in the Ornstein-Uhlenbeck stochastic process, and the square-root random walk used in the Cox, Ingersoll and Ross process. Alpha-level hypothesis testing is provided. This testing framework is helpful for selecting the best stochastic processes for pricing contingent claims and risk management.

**Category:** Statistics

[18] **viXra:1301.0031 [pdf]**
*replaced on 2013-03-06 20:13:47*

**Authors:** Dimiter Tsvetkov, Lyubomir Hristov, Ralitsa Angelova-Slavova

**Comments:** 14 Pages.

In this paper we consider Markov chains associated with the Metropolis-Hastings algorithm.
We propose conditions under which the sequence of the successive densities of such a chain converges to the
target density according to the total variation distance for any choice of the initial density.
In particular we prove that the positiveness of the target and the proposal densities is enough for the chain to
converge.

**Category:** Statistics
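As a point of reference for the Metropolis-Hastings chains discussed in the abstract above, here is a minimal random-walk sketch. It assumes a one-dimensional target known up to a constant through its log-density, together with a Gaussian random-walk proposal; both are strictly positive everywhere, which is the setting the abstract's convergence statement refers to. The function name `metropolis_hastings` and its parameters are illustrative choices, not taken from the paper.

```python
import math
import random


def metropolis_hastings(log_target, x0, n_steps, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal.

    log_target: log of the (possibly unnormalized) target density.
    Returns the list of visited states, one per iteration.
    """
    rng = random.Random(seed)
    x = x0
    log_px = log_target(x)
    chain = []
    for _ in range(n_steps):
        y = x + rng.gauss(0.0, step)  # candidate drawn from the proposal
        log_py = log_target(y)
        # Symmetric proposal, so the acceptance probability is
        # min(1, target(y) / target(x)); compared on the log scale.
        if math.log(1.0 - rng.random()) < log_py - log_px:
            x, log_px = y, log_py
        chain.append(x)  # a rejected move repeats the current state
    return chain
```

For example, `metropolis_hastings(lambda x: -0.5 * x * x, 3.0, 20000)` produces a chain whose empirical mean and variance should approach those of a standard normal, even though it is started away from the mode.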

[17] **viXra:1301.0031 [pdf]**
*replaced on 2013-03-01 09:56:38*

**Authors:** Dimiter Tsvetkov, Lyubomir Hristov, Ralitsa Angelova-Slavova

**Comments:** 15 Pages.

In this paper we consider Markov chains associated with the Metropolis-Hastings algorithm.
We propose conditions under which the sequence of the successive densities of such a chain converges to the target density according to the total variation distance for any choice of the initial density.
In particular we prove that the positiveness of the target and the proposal densities is enough for the chain to
converge.

**Category:** Statistics

[16] **viXra:1301.0031 [pdf]**
*replaced on 2013-02-04 04:29:28*

**Authors:** Dimiter Tsvetkov, Lyubomir Hristov, Ralitsa Angelova-Slavova

**Comments:** 14 Pages.

In this paper we consider Markov chains associated with the Metropolis-Hastings algorithm.
We show that under some very general conditions the sequence of the powers of the conjugate transition operator has a strong limit in a properly defined Hilbert space
described for example in Stroock (2005).
Then we propose conditions under which the sequence of the successive densities of such a chain converges to the
target density according to the total variation distance for any choice of the initial density.
In particular we prove that the positiveness of the target and the proposal densities is enough for the chain to
converge.

**Category:** Statistics

[15] **viXra:1211.0094 [pdf]**
*replaced on 2015-11-20 17:57:17*

**Authors:** Stephen Crowley

**Comments:** 12 Pages.

The Hawkes process having a kernel in the form of a linear combination of exponential functions $\nu(t)=\sum_{j=1}^{P}\alpha_j e^{-\beta_j t}$ has a nice recursive structure that lends itself to tractable likelihood expressions. When $P=1$ the kernel is $\nu(t)=\alpha e^{-\beta t}$ and the inverse of the compensator can be expressed in closed form as a linear combination of exponential functions and the Lambert W function having arguments which can be expressed as recursive functions of the jump times.

**Category:** Statistics
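The recursive structure referred to in the abstract above can be sketched for the case P = 1. The sketch assumes the standard univariate intensity, a constant baseline `mu` plus the sum of the kernel `alpha * exp(-beta * (t - t_i))` over past jump times; the baseline `mu`, the observation window `[0, T]` and the function names are assumptions not stated in the abstract. The recursion updates a single running sum per event instead of revisiting all earlier events.

```python
import math


def hawkes_exp_loglik(times, T, mu, alpha, beta):
    """Log-likelihood on [0, T] of a Hawkes process with kernel alpha*exp(-beta*t).

    Uses A_i = exp(-beta * (t_i - t_{i-1})) * (1 + A_{i-1}), with A_1 = 0,
    so that the intensity at the i-th jump is mu + alpha * A_i.
    """
    loglik = 0.0
    a = 0.0
    prev = None
    for t in times:
        if prev is not None:
            a = math.exp(-beta * (t - prev)) * (1.0 + a)
        loglik += math.log(mu + alpha * a)
        prev = t
    # Compensator: integral of the intensity over [0, T].
    compensator = mu * T + (alpha / beta) * sum(
        1.0 - math.exp(-beta * (T - t)) for t in times
    )
    return loglik - compensator


def hawkes_exp_loglik_naive(times, T, mu, alpha, beta):
    """Same quantity with the O(n^2) double sum, for checking the recursion."""
    loglik = 0.0
    for i, t in enumerate(times):
        excite = sum(math.exp(-beta * (t - s)) for s in times[:i])
        loglik += math.log(mu + alpha * excite)
    compensator = mu * T + (alpha / beta) * sum(
        1.0 - math.exp(-beta * (T - t)) for t in times
    )
    return loglik - compensator
```

Both functions expect the jump times in increasing order; the recursive version is linear in the number of events, which is what makes maximum likelihood estimation tractable for long event sequences.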

[14] **viXra:1211.0094 [pdf]**
*replaced on 2013-01-30 12:59:09*

**Authors:** Stephen Crowley

**Comments:** 41 Pages.

Definitions from the theory of point processes are recalled. Models of intensity function parametrization and maximum likelihood estimation from data are explored. Closed-form log-likelihood expressions are given for the (exponential) Hawkes (univariate and multivariate) process, Autoregressive Conditional Duration (ACD), with both exponential and Weibull distributed errors, and a hybrid model combining the ACD and the exponential Hawkes models. Formulas are also derived, however without the elegant recursions of the exponential kernels, for kernels of the Weibull and Gamma type, and comparisons of the Weibull fit vs exponential kernel fits via QQ and probability plots are provided. The additional complexity of the Hawkes-Weibull or the ACD-Hawkes appears to not be worth the tradeoff. Diurnal, or daily, adjustment of the deterministic predictable part of the intensity variation via piecewise polynomial splines is discussed. Data from the symbol SPY on three different electronic markets is used to estimate model parameters and generate illustrative plots. The parameters were estimated without diurnal adjustments; a repeat of the analysis with adjustments is due in a future version of this article. The connection of the Hawkes process to quantum theory is briefly mentioned. Prediction of the next point of a Hawkes process is briefly discussed and a closed-form expression in terms of the Lambert W function for the standard exponential kernel with P=1 is calculated.

**Category:** Statistics

[13] **viXra:1211.0094 [pdf]**
*replaced on 2013-01-12 16:33:02*

**Authors:** Stephen Crowley

**Comments:** 34 Pages.

Definitions from the theory of point processes are recalled. Models of intensity function parametrization and maximum likelihood estimation from data are explored. Closed-form log-likelihood expressions are given for the (exponential) Hawkes (univariate and multivariate) process, Autoregressive Conditional Duration (ACD), with both exponential and Weibull distributed errors, and a hybrid model combining the ACD and the exponential Hawkes models. Formulas are also derived, however without the elegant recursions of the exponential kernels, for kernels of the Weibull and Gamma type, and comparisons of the Weibull fit vs exponential kernel fits via QQ and probability plots are provided. The additional complexity of the Hawkes-Weibull or the ACD-Hawkes appears to not be worth the tradeoff. Diurnal, or daily, adjustment of the deterministic predictable part of the intensity variation via piecewise polynomial splines is discussed. Data from the symbol SPY on three different electronic markets is used to estimate model parameters and generate illustrative plots. The connection of the Hawkes process to quantum theory is briefly mentioned.

**Category:** Statistics

[12] **viXra:1211.0094 [pdf]**
*replaced on 2012-12-31 16:05:15*

**Authors:** Stephen Crowley

**Comments:** 23 Pages.

Definitions from the theory of point processes are recalled. Models of intensity function parametrization and maximum likelihood estimation from data are explored. Closed-form log-likelihood expressions are given for the Hawkes (univariate and multivariate) process, Autoregressive Conditional Duration (ACD), with both exponential and Weibull distributed errors, and a hybrid model combining the ACD and the Hawkes models. Diurnal, or daily, adjustment of the deterministic predictable part of the intensity variation via piecewise polynomial splines is discussed. Data from the symbol SPY on three different electronic markets is used to estimate model parameters and generate illustrative plots. The parameters were estimated without diurnal adjustments; a repeat of the analysis with adjustments is due in a future version of this article. The connection of the Hawkes process to quantum theory is briefly mentioned. The Hawkes process with a Weibull kernel is also briefly mentioned and will be explored more in the future.

**Category:** Statistics

[11] **viXra:1211.0094 [pdf]**
*replaced on 2012-12-12 10:24:21*

**Authors:** Stephen Crowley

**Comments:** 19 Pages.

Definitions from the theory of point processes are recalled. Models of intensity function parameterization and maximum likelihood estimation from data are explored. Closed-form log-likelihood expressions are given for the Hawkes (univariate and multivariate) process, Autoregressive Conditional Duration (ACD) and a hybrid model combining the ACD and the Hawkes models. Diurnal, or daily, adjustment of the deterministic predictable part of the intensity variation via piecewise polynomial splines is discussed. Data from the symbol SPY on three different electronic markets is used to estimate model parameters and generate illustrative plots. The parameters were estimated without diurnal adjustments; a repeat of the analysis with adjustments is due in a future version of this article. The connection of the Hawkes process to quantum theory is briefly mentioned.

**Category:** Statistics

[10] **viXra:1211.0094 [pdf]**
*replaced on 2012-11-29 12:25:23*

**Authors:** Stephen Crowley

**Comments:** 16 Pages.

Definitions from the theory of point processes are recalled. Models of intensity function parametrization and maximum likelihood estimation from data are explored. Closed-form log-likelihood expressions are given for the Hawkes (univariate and multivariate) process, Autoregressive Conditional Duration (ACD) and a hybrid model combining the ACD and the Hawkes models. Data from the symbol SPY on three different electronic markets is used to estimate model parameters and generate illustrative plots.

**Category:** Statistics

[9] **viXra:1211.0094 [pdf]**
*replaced on 2012-11-22 14:48:59*

**Authors:** Stephen Crowley

**Comments:** 13 Pages.

Definitions from the theory of point processes are recalled. Models of intensity function parameterization and maximum likelihood estimation from data are explored. Closed-form log-likelihood expressions are given for the Hawkes (unidimensional and multidimensional) process, Autoregressive Conditional Duration (ACD), and Log-ACD models. The Autoregressive Conditional Intensity model is also discussed. Data from the symbol SPY on the Nasdaq stock market on Oct 22nd, 2012 is used to estimate model parameters and generate illustrative plots.

**Category:** Statistics

[8] **viXra:1211.0094 [pdf]**
*replaced on 2012-11-19 18:25:11*

**Authors:** Stephen Crowley

**Comments:** 8 Pages.

Definitions from the theory of point processes are recalled. Models of intensity function parameterization and maximum likelihood estimation from data are explored. Closed-form log-likelihood expressions are given for the Hawkes process, Autoregressive Conditional Duration (ACD), and Log-ACD models. The Autoregressive Conditional Intensity model is also discussed. Data from the symbol SPY on the Nasdaq stock market on Oct 22nd, 2012 is used to estimate model parameters and generate illustrative plots.

**Category:** Statistics

[7] **viXra:1111.0073 [pdf]**
*replaced on 2012-06-15 13:36:51*

**Authors:** Popon Kangpenkae

**Comments:** 12 Pages.

This technical reference presents the functional structure and the algorithmic implementation of KL (Kullback-Leibler) simplex. It details the simplex approximation and fusion. The KL simplex is a fundamental, robust, and adaptive informatics agent for computational research in economics, finance, game and mechanism. From this perspective the study provides comprehensive results to facilitate future work in such areas.

**Category:** Statistics

[6] **viXra:1111.0073 [pdf]**
*replaced on 2011-11-25 09:15:23*

**Authors:** Popon Kangpenkae

**Comments:** 12 Pages.

This technical reference presents the functional structure and the algorithmic implementation of KL (Kullback-Leibler) simplex. It details the simplex approximation and fusion. The KL simplex is a fundamental, robust, and adaptive informatics agent for computational research in economics, finance, game and mechanism. From this perspective the study provides comprehensive results to facilitate future work in such areas.

**Category:** Statistics

[5] **viXra:1111.0073 [pdf]**
*replaced on 25 Nov 2011*

**Authors:** Popon Kangpenkae

**Comments:** 12 pages

This technical reference presents the functional structure and the algorithmic implementation of KL (Kullback-Leibler) simplex. It details the simplex approximation and fusion. The KL simplex is a fundamental, robust, and adaptive informatics agent for computational research in economics, finance, game and mechanism. From this perspective the study provides comprehensive results to facilitate future work in such areas.

**Category:** Statistics

[4] **viXra:1007.0034 [pdf]**
*replaced on 2012-01-11 19:14:18*

**Authors:** David D. Tung, S. Rao Jammalamadaka

**Comments:** 14 Pages.

In this paper, we propose a new test of uniformity on the circle based on the
Gini mean difference of the sample arc-lengths. These sample arc-lengths,
which are the gaps between successive observations on the circumference of
the circle, are analogous to sample spacings on the real line. The Gini mean
difference, which compares these arc-lengths between themselves, is
analogous to Rao's spacings statistic, which has been used to test the uniformity
of circular data.
We obtain both the exact and asymptotic distributions of the Gini mean
difference arc-lengths test, under the null hypothesis of circular uniformity.
We also provide a table of upper percentile values of the exact distribution for
small to moderate sample sizes. Illustrative examples in circular data analysis
are also given. It is shown that a generalized Gini mean difference test has
better asymptotic efficiency than the corresponding generalized Rao's test in
the sense of Pitman asymptotic relative efficiency.

**Category:** Statistics
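The quantities named in the abstract above, the sample arc-lengths and their Gini mean difference, can be computed as in the following sketch. It assumes angles given in radians and uses the plain average absolute difference over all pairs of arc-lengths; the exact scaling of the test statistic in the paper may differ, and the function names are hypothetical.

```python
import math


def arc_lengths(angles):
    """Gaps between successive observations on the circle (in radians).

    The last gap wraps around from the largest angle back to the smallest,
    so the gaps always sum to 2*pi. Requires at least two observations.
    """
    two_pi = 2.0 * math.pi
    s = sorted(a % two_pi for a in angles)
    gaps = [s[i + 1] - s[i] for i in range(len(s) - 1)]
    gaps.append(two_pi - s[-1] + s[0])
    return gaps


def gini_mean_difference(values):
    """Average absolute difference over all unordered pairs of values."""
    n = len(values)
    total = sum(
        abs(values[i] - values[j]) for i in range(n) for j in range(i + 1, n)
    )
    return total / (n * (n - 1) / 2)
```

Perfectly evenly spaced observations give equal arc-lengths and a Gini mean difference of zero, while strongly clustered observations give a large value. Assessing significance requires the null distribution described in the paper, which this sketch does not reproduce.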

[3] **viXra:1007.0034 [pdf]**
*replaced on 16 Aug 2010*

**Authors:** David D. Tung, S. Rao Jammalamadaka

**Comments:** 14 pages.

In this paper, we propose a new test of uniformity on the circle based on the
Gini mean difference of the sample arc-lengths. These sample arc-lengths,
which are the gaps between successive observations on the circumference of
the circle, are analogous to sample spacings on the real line. The Gini mean
difference, which compares these arc-lengths between themselves, is
analogous to Rao's spacings statistic, which has been used to test the uniformity
of circular data.
We obtain both the exact and asymptotic distributions of the Gini mean
difference arc-lengths test, under the null hypothesis of circular uniformity.
We also provide a table of upper percentile values of the exact distribution for
small to moderate sample sizes. Illustrative examples in circular data analysis
are also given. It is shown that a generalized Gini mean difference test has
better asymptotic efficiency than the corresponding generalized Rao's test in
the sense of Pitman asymptotic relative efficiency.

**Category:** Statistics

[2] **viXra:1006.0046 [pdf]**
*replaced on 2012-01-11 19:16:15*

**Authors:** David D. Tung, S. Rao Jammalamadaka

**Comments:** 23 Pages.

In this paper, we investigate the asymptotic theory for U-statistics based
on sample spacings, i.e. the gaps between successive observations. The
usual asymptotic theory for U-statistics does not apply here because spacings
are dependent variables. However, under the null hypothesis, the uniform
spacings can be expressed as conditionally independent Exponential random
variables. We exploit this idea to derive the relevant asymptotic theory both
under the null hypothesis and under a sequence of close alternatives.
The generalized Gini mean difference of the sample spacings is a prime
example of a U-statistic of this type. We show that such a Gini spacings test
is analogous to Rao's spacings test. We find the asymptotically locally most
powerful test in this class, and it has the same efficacy as the Greenwood
statistic.

**Category:** Statistics

[1] **viXra:1006.0046 [pdf]**
*replaced on 16 Aug 2010*

**Authors:** David D. Tung, S. Rao Jammalamadaka

**Comments:** 23 pages.

In this paper, we investigate the asymptotic theory for U-statistics based
on sample spacings, i.e. the gaps between successive observations. The
usual asymptotic theory for U-statistics does not apply here because spacings
are dependent variables. However, under the null hypothesis, the uniform
spacings can be expressed as conditionally independent Exponential random
variables. We exploit this idea to derive the relevant asymptotic theory both
under the null hypothesis and under a sequence of close alternatives.
The generalized Gini mean difference of the sample spacings is a prime
example of a U-statistic of this type. We show that such a Gini spacings test
is analogous to Rao's spacings test. We find the asymptotically locally most
powerful test in this class, and it has the same efficacy as the Greenwood
statistic.

**Category:** Statistics