Introduction to Agnentropy

Russell Leidich


Claude Shannon[1] devised a way to quantify the information entropy[2] of a finite integer set, given the probability of finding each integer in the set. Information entropy, hereinafter simply "entropy", refers to the number of bits required to encode such a set in a given numerical base (usually binary). Unfortunately, his formula for the "Shannon entropy" seems to have been widely misappropriated as a means of measuring the entropy of such sets, by supplanting the probability coefficients (which are generally unknowable) with the normalized frequencies of the integers as they actually occur in the set. This practice is so common that Shannon entropy is often defined in precisely this manner, and indeed this is how we define it here. However, the inaccuracy induced by this compromise may lead to erroneous conclusions, especially where very short or faint signals are concerned. To make matters worse, the numerical behavior of the Shannon entropy formula is rather unstable over large sets, where it would otherwise be more accurate. Herein we introduce the concept of agnentropy, short for "agnostic entropy", in the sense of an entropy metric which begins with almost no assumptions about the set under analysis. (Technically, it is a "divergence", essentially a Kullback-Leibler divergence[3] without the implicit singularities, because it fails the triangle inequality. We refer to it as a "metric" only in the qualitative sense that it measures something.) This stands in stark contrast to the (compromised) Shannon entropy, which presupposes that the frequencies of integers within a given set are already known. In addition to being more accurate when used appropriately, agnentropy is also more numerically stable and faster to compute than Shannon entropy. To be precise, Shannon entropy does not measure the number of bits in an invertibly compressed code; it is, more accurately, an underestimate of that value.
Unfortunately, the margin of underestimation is not straightforwardly computable, and has size O(Z), where Z is the number of unique integers in the set, assuming that said integers are of predetermined maximum size. By contrast, agnentropy underestimates that bit count by no more than 2 bits, plus the size of 2 logplexes. (Logplexes are universal (affine) codes introduced in [8].) In practice, this overhead amounts to tens of bits, as opposed to potentially thousands of bits for Shannon entropy. This difference has meaningful ramifications for the optimization of both lossless and lossy compression algorithms.
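The frequency-based definition of Shannon entropy adopted above can be sketched as follows. This is an illustrative sketch, not code from the paper: the function name `shannon_entropy_bits` is hypothetical, and it returns the total entropy of the whole set in bits (N times the per-integer entropy), with each probability replaced by the observed frequency n_i / N. Agnentropy itself is not implemented here, because its formula is not given in this abstract.

```python
import math
from collections import Counter
from typing import Iterable


def shannon_entropy_bits(values: Iterable[int]) -> float:
    """Hypothetical helper: total Shannon entropy, in bits, of an integer set.

    Probabilities are replaced by observed normalized frequencies
    f_i = n_i / N, so the total is
        H_total = -sum_i n_i * log2(n_i / N)
    """
    counts = Counter(values)          # n_i for each unique integer
    n = sum(counts.values())          # N, the size of the set
    if n == 0:
        return 0.0
    # Algebraically equivalent form: N*log2(N) - sum_i n_i*log2(n_i).
    return n * math.log2(n) - sum(c * math.log2(c) for c in counts.values())


if __name__ == "__main__":
    print(shannon_entropy_bits([0, 1, 0, 1]))  # 4.0
    print(shannon_entropy_bits([7, 7, 7, 7]))  # 0.0
```

Note that the result contains no term for describing the frequencies themselves (which integers occur, and how often); a decoder would need that information too, which is consistent with the abstract's point that this quantity underestimates the size of an invertibly compressed code.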

Comments: 28 Pages.


Submission history

[v1] 2017-05-11 21:39:13


Vixra.org is a pre-print repository rather than a journal. Articles hosted may not yet have been verified by peer-review and should be treated as preliminary. In particular, anything that appears to include financial or legal advice or proposed medical treatments should be treated with due caution. Vixra.org will not be responsible for any consequences of actions that result from any form of use of any documents on this website.


Digital Signal Processing

