An instance of the dataset? I guess it can be a new instance that is
not in the dataset already, right?
In the sense of "an unseen one", yes.
I seem not to understand something, but I don't know exactly what. Is
there some documentation that I could read on that? Why, when I create
a new instance, do I have to set the whole dataset on it? I figured it
has to do with knowing the format, but can't one describe the format of
an instance outwith an actual dataset?
An Instance object is just a container for a double array of the actual
attribute values and the weight of this instance. For information about
the actual attributes it needs access to the Instances object it belongs
to. The attributes are ONLY stored in the Instances object. E.g., for
nominal attributes you need to know which actual value (i.e., the
nominal label) the double value in the array stands for.
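
To make the double-array idea concrete, here is a minimal sketch in
plain Java. It is not Weka code: HeaderSketch and describe() are
made-up names that only mimic the split between the values (kept per
instance) and the attribute information (kept in the dataset header).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for illustration only; this is not a Weka class.
class HeaderSketch {
    // One entry per attribute: its nominal labels, or null if the attribute is numeric.
    final List<List<String>> labels;

    HeaderSketch(List<List<String>> labels) {
        this.labels = labels;
    }

    // Interpret the raw double stored for one attribute of an instance.
    String describe(double[] values, int attIndex) {
        List<String> nominal = labels.get(attIndex);
        if (nominal == null) {
            return Double.toString(values[attIndex]); // numeric: the double is the value
        }
        return nominal.get((int) values[attIndex]);   // nominal: the double is a label index
    }

    public static void main(String[] args) {
        HeaderSketch header = new HeaderSketch(Arrays.asList(
            null,                              // attribute 0: numeric
            Arrays.asList("sunny", "rainy"))); // attribute 1: nominal
        double[] instanceValues = {21.5, 1.0};
        System.out.println(header.describe(instanceValues, 0)); // prints 21.5
        System.out.println(header.describe(instanceValues, 1)); // prints rainy
    }
}
```

Without the header, the 1.0 in the second slot is just a number; only
the label list tells you that it stands for "rainy".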
For a new instance, you need to provide values for all attributes.
And just to make sure: when you say attributes, you mean I should leave
the class empty, using setClassMissing, right? Like the code does for
the evaluation of the classifier (this doesn't get filtered by
ReplaceMissingValues, I hope!)
Interestingly enough, distributionForInstance works either with an
instance with raw (numeric) data or with one with nominal values (of
course I had to use the same names that the auto filter created and
also set the dataset of the instance to the filtered dataset, which I
don't understand, but anyway). I was happy about that and assume that
the appropriate method somehow looks up which interval the values fall
into and uses the nominal values before returning the distribution?
The normalize method gets called in the distributionForInstance method
to ensure that the instance contains only nominal values.
But I am not sure, because when I use the BIF XML file to read the BN,
distributionForInstance only works if I give it an instance that
already has the nominal values.
The Discretization filter is not initialized if you read a BIF file.
So, digging into the code, I realise it is because the
m_DiscretizeFilter would be true for the BN as I was creating it from
scratch, but of course not for the BIFReader BN.
So what I wonder is:
(1) How exactly does my instance get discretized? I understand it
uses the supervised filter that was created before, but does it
just check the values and assign them, or is it likely to affect the
actual filter?
The Discretize filter (and ReplaceMissingValues) gets set up with each
call of the buildClassifier method. Subsequent calls of
distributionForInstance only use the filter as it was set up; they
don't modify it anymore.
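
To picture the difference between setting up the filter and merely
using it, here is a minimal sketch in plain Java. It is not Weka code:
FitOnceDiscretizer and its methods are made-up names, and simple
equal-width binning stands in for the supervised discretization that
the classifier really uses.

```java
import java.util.Arrays;

// Hypothetical illustration, not Weka code. Equal-width binning is an
// assumption made to keep the example short.
class FitOnceDiscretizer {
    private double[] cutPoints; // written in fit(), only read afterwards

    // Comparable to training time: derive the cut points from the data.
    void fit(double[] trainingValues, int bins) {
        double min = Arrays.stream(trainingValues).min().getAsDouble();
        double max = Arrays.stream(trainingValues).max().getAsDouble();
        cutPoints = new double[bins - 1];
        for (int i = 0; i < cutPoints.length; i++) {
            cutPoints[i] = min + (i + 1) * (max - min) / bins;
        }
    }

    // Comparable to prediction time: find the interval, change nothing.
    int binOf(double value) {
        int bin = 0;
        while (bin < cutPoints.length && value > cutPoints[bin]) {
            bin++;
        }
        return bin;
    }

    // Defensive copy so callers cannot alter the learned cut points.
    double[] cutPoints() {
        return cutPoints.clone();
    }

    public static void main(String[] args) {
        FitOnceDiscretizer d = new FitOnceDiscretizer();
        d.fit(new double[] {0.0, 10.0}, 2);                // one cut point at 5.0
        System.out.println(d.binOf(2.0));                  // prints 0
        System.out.println(d.binOf(7.0));                  // prints 1
        System.out.println(Arrays.toString(d.cutPoints())); // prints [5.0]
    }
}
```

However many values you pass to binOf(), the cut points stay as they
were after fit(); a new instance is only looked up, never learned from.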
(2) Can't I somehow re-use/save a previous discretisation, or is there
code in Weka to help me convert the numeric attributes of my instances
to the corresponding nominal ones myself?
Instead of letting the classifier do the discretizing for you, you
should discretize the data yourself. Then you're independent of the
discretization that happens inside the classifier and you can also load
BIF files without having problems with uninitialized filters. You can
then also serialize the filter and re-use it again later.
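
The save-and-reuse idea can be sketched with plain Java serialization.
This is an illustration only, not Weka code: SavedCutPoints, its cut
points and its labels are all made up, and a byte array stands in for
a file on disk. Weka's filters implement java.io.Serializable, so the
same mechanism should carry over to a filter you have set up yourself.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical illustration, not a Weka class.
class SavedCutPoints implements Serializable {
    private static final long serialVersionUID = 1L;
    private final double[] cutPoints; // ascending interval boundaries
    private final String[] labels;    // one more label than cut points

    SavedCutPoints(double[] cutPoints, String[] labels) {
        if (labels.length != cutPoints.length + 1) {
            throw new IllegalArgumentException("need one label per interval");
        }
        this.cutPoints = cutPoints.clone();
        this.labels = labels.clone();
    }

    // Map a numeric value to the label of the interval it falls into.
    String labelFor(double value) {
        int bin = 0;
        while (bin < cutPoints.length && value > cutPoints[bin]) {
            bin++;
        }
        return labels[bin];
    }

    // Serialize this object; a real program would write to a file instead.
    byte[] toBytes() throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(this);
        }
        return buffer.toByteArray();
    }

    // Restore a previously serialized object.
    static SavedCutPoints fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (SavedCutPoints) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        SavedCutPoints original =
            new SavedCutPoints(new double[] {5.0}, new String[] {"low", "high"});
        SavedCutPoints restored = SavedCutPoints.fromBytes(original.toBytes());
        System.out.println(restored.labelFor(7.0)); // prints high
    }
}
```

The restored object maps values exactly as the original did, which is
what lets you apply the same discretization in a later session, for
example before handing instances to a network loaded from a BIF file.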
Here's more about using filters:
More about serialization:
Peter Reutemann, Dept. of Computer Science, University of Waikato, NZ
+64 (7) 838-4466 Ext. 5174