On 11/01/2015, at 6:26 pm, Fernando Bugni wrote:
I know that: InfoGain(Class,Attribute) = H(Class) - H(Class | Attribute)
and there's a property for entropy: 0 <= H <= log_a(n), where (I think) a = 2
and n = number of samples.
But I don't know how to use this property to calculate the range of
InformationGain. If I have a classification into two groups, how could I use this
property to calculate H(Class) and H(Class | Attribute)?
The minimum information gain is zero, when H(Class) = H(Class | Attribute).
The maximum is achieved when H(Class | Attribute) = 0.
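Both extremes are easy to reproduce numerically. As a rough illustration (not WEKA code), here is a small Python sketch: an attribute that leaves the same class mix in every branch gives zero gain, and an attribute that perfectly predicts the class gives gain equal to H(Class).

```python
from math import log2

def entropy(counts):
    """Shannon entropy in bits of a class distribution given raw counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def info_gain(class_counts, partition):
    """InfoGain = H(Class) - H(Class | Attribute).

    partition: one list of class counts per attribute value, i.e. the
    class distribution inside each branch the attribute induces.
    """
    total = sum(class_counts)
    h_cond = sum(sum(part) / total * entropy(part) for part in partition)
    return entropy(class_counts) - h_cond

# Uninformative attribute: each branch has the same 50/50 mix,
# so H(Class | Attribute) = H(Class) and the gain is 0 (the minimum).
print(info_gain([50, 50], [[25, 25], [25, 25]]))  # 0.0

# Perfectly predictive attribute: each branch is pure,
# so H(Class | Attribute) = 0 and the gain equals H(Class) = 1 bit.
print(info_gain([50, 50], [[50, 0], [0, 50]]))    # 1.0
```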
Entropy is maximal when all classes are equally likely, in which case it is log_b(c),
where b = 2 (if entropy is calculated in bits) and c is the *NUMBER OF CLASS VALUES*.
In the two-class case, the maximum info gain is 1 bit (and occurs when both classes are
equally likely a priori, before the attribute is considered).
However, in most datasets, not all classes are equally likely a priori, so H(Class) will
be smaller than log_b(c).
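To see how a skewed prior lowers the bound, here is a quick Python check (illustrative counts, not a real dataset): a uniform two-class distribution reaches the maximum of log_2(2) = 1 bit, while a 90/10 split falls well short of it.

```python
from math import log2

def entropy(counts):
    """Shannon entropy in bits of a class distribution given raw counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

# Two equally likely classes: H(Class) = log2(2) = 1 bit, the largest
# possible info gain for a two-class problem.
print(entropy([50, 50]))           # 1.0

# Skewed prior: H(Class) < 1 bit, so the attainable info gain shrinks too.
print(round(entropy([90, 10]), 3))  # 0.469
```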
You can calculate H(Class) in WEKA, in the Classify panel, by running any classifier
(e.g., ZeroR), setting “Use training set” for evaluation, and “Output entropy evaluation
measures” under “More options…”.
For example, running ZeroR on the iris data gives:
=== Summary ===
Correctly Classified Instances          50               33.3333 %
Incorrectly Classified Instances       100               66.6667 %
Kappa statistic                          0
K&B Relative Info Score                  0      %
K&B Information Score                    0      bits      0      bits/instance
Class complexity | order 0             237.7444 bits     1.585  bits/instance
Class complexity | scheme              237.7444 bits     1.585  bits/instance
Complexity improvement     (Sf)          0      bits      0      bits/instance
Mean absolute error                      0.4444
Root mean squared error                  0.4714
Relative absolute error                100      %
Root relative squared error            100      %
Coverage of cases (0.95 level)         100      %
Mean rel. region size (0.95 level)     100      %
Total Number of Instances              150
"Class complexity | order 0" gives you H(Class). It is 1.585 bits/instance for
the iris data (rounded) because, for this data, all three classes are equally likely, so
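A quick Python sanity check of that figure, using the iris class counts (50 instances of each of the three classes):

```python
from math import log2

counts = [50, 50, 50]  # iris: 50 instances of each of the 3 classes
total = sum(counts)

# H(Class) per instance, in bits
h_class = -sum((c / total) * log2(c / total) for c in counts)
print(round(h_class, 3))           # 1.585  (= log2(3), the maximum for 3 classes)

# Summed over all 150 instances, matching "Class complexity | order 0"
print(round(h_class * total, 4))   # 237.7444 bits
```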