Statistics

1604 Submissions

[2] viXra:1604.0302 [pdf] submitted on 2016-04-22 01:25:58

From Worst to Most Variable? Only the Worst Performers May be the Most Informative

Authors: Bradly Alicea
Comments: 13 pages, 7 figures, 2 supplemental figures. Full dataset can be found at doi:10.6084/m9.figshare.944542

What makes a good prediction good? Generally, the answer is thought to be a faithful accounting of both tangible and intangible factors. Among sports teams, it is thought that if enough of the tangible factors (e.g. roster, prior performance, schedule) are captured correctly, the predictions will be correspondingly accurate. While there is a role for intangible factors, they are thought to gum up the works, so to speak. Here, I start with the hypothesis that the best and worst teams in a league or tournament are easy to predict relative to teams with average performance. Data from the 2013 MLB and NFL seasons plus data from the 2014 NCAA Tournament were used. Using a model-free approach, data representing various aspects of competition reveal that mainly the teams predicted to perform the worst actually conform to expectation. The reasons for this are then discussed, including the role of shot noise in performance driven by tangible factors.
Category: Statistics

[1] viXra:1604.0009 [pdf] replaced on 2017-02-25 01:17:19

Estimating Spatial Averages of Environmental Parameters Based on Mobile Crowdsensing

Authors: Ioannis Koukoutsidis
Comments: 31 Pages. A short version of this article was published in Proceedings of SENSORNETS 2017 (February 2017). The final corrected version was published in ACM Transactions on Sensor Networks (TOSN), Vol. 14, Issue 1, Art. 2, December 2017

Mobile crowdsensing can facilitate environmental surveys by leveraging sensor-equipped mobile devices that carry out measurements covering a wide area in a short time, without bearing the costs of traditional field work. In this paper, we examine statistical methods for accurately estimating the mean value of an environmental parameter in a region, based on such measurements. The main focus is on estimates produced by considering the mobile device readings at a random instant in time. We compare stratified sampling with different stratification weights to sampling without stratification, as well as to an appropriately modified version of systematic sampling. Our main result is that stratification with weights proportional to stratum areas can produce significantly smaller bias and, for a moderate number of strata, gets arbitrarily close to the true area average as the number of mobiles increases. The performance of the methods is evaluated for an application scenario where we estimate the mean area temperature in a linear region that exhibits the so-called Urban Heat Island effect, with mobile users moving in the region according to the Random Waypoint Model.
Category: Statistics
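
The comparison described in the abstract above (unstratified sampling versus stratification with area-proportional weights) can be illustrated with a minimal sketch. This is not the paper's code, and every specific in it is an assumption made for illustration: the region is the unit interval, the temperature profile is a hypothetical Gaussian bump standing in for an Urban Heat Island, mobile positions are drawn from a Beta(2, 2) density as a stand-in for the centre-heavy spatial distribution of Random Waypoint users, strata have equal length (so area-proportional weights are all equal), and empty strata are simply skipped.

```python
import math
import random


def temperature(x):
    """Hypothetical heat-island profile on [0, 1]: 20 degrees plus a central bump."""
    return 20.0 + 5.0 * math.exp(-((x - 0.5) / 0.15) ** 2)


def true_area_average(n_grid=10_000):
    """Midpoint-rule approximation of the spatial mean of the profile."""
    return sum(temperature((i + 0.5) / n_grid) for i in range(n_grid)) / n_grid


def sample_positions(n_mobiles, rng):
    """Assumed stand-in for Random Waypoint positions: centre-heavy Beta(2, 2)."""
    return [rng.betavariate(2, 2) for _ in range(n_mobiles)]


def naive_estimate(positions):
    """Unstratified estimate: plain average of all readings."""
    return sum(temperature(x) for x in positions) / len(positions)


def stratified_estimate(positions, n_strata):
    """Average the per-stratum means with weights proportional to stratum length.

    Strata have equal length here, so the weights are equal. Empty strata are
    skipped and the remaining weights renormalised (an assumption of this sketch).
    """
    sums = [0.0] * n_strata
    counts = [0] * n_strata
    for x in positions:
        k = min(int(x * n_strata), n_strata - 1)
        sums[k] += temperature(x)
        counts[k] += 1
    means = [s / c for s, c in zip(sums, counts) if c > 0]
    return sum(means) / len(means)


def mean_bias(estimator, n_mobiles, n_trials, seed=0):
    """Average (estimate - truth) over repeated random snapshots of the mobiles."""
    rng = random.Random(seed)
    truth = true_area_average()
    total = 0.0
    for _ in range(n_trials):
        total += estimator(sample_positions(n_mobiles, rng)) - truth
    return total / n_trials


if __name__ == "__main__":
    print("naive bias:     ", mean_bias(naive_estimate, 200, 200))
    print("stratified bias:", mean_bias(lambda p: stratified_estimate(p, 10), 200, 200))
```

Under these assumptions the unstratified average is pulled upward, because mobiles cluster where the region is warmest, while the area-weighted estimate removes most of that effect. The residual bias of the stratified estimate comes from the non-uniform density within each stratum and from the handling of empty strata.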