We are using classification algorithms, but we are not sure how to build the training and testing files. We believe that the training file should not include instances that appear in the testing file. We are considering two possibilities (we don't know whether either is correct):
(1) In the training file, include data only from users for whom no predictions will be made. In the testing file, include the data from users for whom predictions will be made (this file will have missing values for the items to predict, and real values for the items that the user has already rated).
(2) In the training file, include data from all the users; for some users, the items to be predicted will have missing values. In the testing file, include only the users for whom predictions will be made (again, there will be missing values for the items to predict, and real values for the items that the user has already rated). This means that some information is duplicated in both files.
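To make the two possibilities concrete, here is a minimal sketch in Python. The data layout (user, item, rating), the toy ratings, and the function names are all hypothetical and only for illustration; they are not tied to any particular toolkit or file format. A rating of None stands for an item to be predicted.

```python
# Hypothetical toy data: (user, item, rating); None marks an item to predict.
ratings = [
    ("u1", "i1", 5), ("u1", "i2", 3),
    ("u2", "i1", 4), ("u2", "i2", 2),
    ("u3", "i1", 1), ("u3", "i2", None),
]

def target_users(rows):
    """Users with at least one missing rating, i.e. users to predict for."""
    return {user for user, _, rating in rows if rating is None}

def split_option_1(rows):
    """Possibility (1): train on non-target users, test on target users."""
    targets = target_users(rows)
    train = [r for r in rows if r[0] not in targets]
    test = [r for r in rows if r[0] in targets]
    return train, test

def split_option_2(rows):
    """Possibility (2): train on all users, test on target users only."""
    targets = target_users(rows)
    train = list(rows)
    test = [r for r in rows if r[0] in targets]
    return train, test

train1, test1 = split_option_1(ratings)
train2, test2 = split_option_2(ratings)

# Rows that appear in both files under each possibility.
overlap1 = set(train1) & set(test1)
overlap2 = set(train2) & set(test2)
print("overlap under (1):", len(overlap1))
print("overlap under (2):", len(overlap2))
```

In this sketch, possibility (1) keeps the two files disjoint, while possibility (2) repeats every row of the target users in both files, which is the duplication mentioned above.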
We appreciate any help.
I am running the Id3 classifier against a data set that has no missing class
values or attribute values, but I still get a number of
Unclassified instances in the evaluation results.
I wonder why I get unclassified instances when every instance in my data (both
training and testing) has a class label?
Any answer will be appreciated.