[Wekalist] Re: how weka predicts in logistic regression?
ouyeyu panyu
ouyeyu at gmail.com
Sat Jul 28 05:46:14 NZST 2012
Can anyone help me with this issue?
2012/7/26 ouyeyu panyu <ouyeyu at gmail.com>
> Hi there,
>
> Do you know how Weka computes predictions in logistic regression?
> The usual formula for the predicted probability is:
>
> $$\mathbb{E}[Y_i \mid \mathbf{X}_i] = p_i = \operatorname{logit}^{-1}(\boldsymbol\beta \cdot \mathbf{X}_i) = \frac{1}{1+e^{-\boldsymbol\beta \cdot \mathbf{X}_i}}$$
>
> However, when I put the Weka model coefficients and the test data into this
> formula, I can NOT get the same probability that Weka yields.
>
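The formula above can be written as two small Scala helpers (Scala is the language of the snippets later in this thread). This is a minimal sketch, not Weka code: dot and inverseLogit are hypothetical names, and placing the intercept at index 0 with a constant input of 1.0 is a convention assumed here.

```scala
// Hypothetical helpers for p = 1 / (1 + e^(-beta . x)); not part of Weka.

// Dot product of a coefficient vector with a feature vector.
// Assumed convention: index 0 is the intercept, paired with x(0) = 1.0.
def dot(beta: Seq[Double], x: Seq[Double]): Double = {
  require(beta.length == x.length, "beta and x must have the same length")
  beta.zip(x).map { case (b, xi) => b * xi }.sum
}

// Inverse logit (logistic sigmoid): maps a log-odds value to a probability.
def inverseLogit(eta: Double): Double = 1.0 / (1.0 + math.exp(-eta))

// Intercept -1.0 plus 0.5 * 2.0 gives log-odds 0, i.e. probability 0.5.
println(inverseLogit(dot(Seq(-1.0, 0.5), Seq(1.0, 2.0)))) // 0.5
```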
> Here is an example:
> attribute             model_coeff      test_data
> ================================================
> Intercept             -4.440248
> price                  0.00000271591   474700
> sqft                  -0.0003140287    1693
> ltv_new                0.6966173       2
> age_cat               -0.001434963     4
> current_hold_days     -0.0001069998    1947
> appr_since_last_sqr    0.002447026     0.0961
>
> BX = -4.440248 + 0.00000271591 *474700 - 0.0003140287*1693 + 0.6966173*2 -
> -0.001434963*4 - -0.0001069998*1947 + 0.002447026*0.0961 = -3.779602634944
>
> probabilityOfTestData = 1/(1+e^3.779602634944) = 0.0223
>
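Evaluating the formula term by term with the values from the table is a useful check. In the hand calculation above, age_cat and current_hold_days are written as "- -0.001434963*4" and "- -0.0001069998*1947", which adds those terms even though their coefficients are negative. The sketch below (plain Scala, no Weka dependency; variable names are illustrative) multiplies each coefficient by its test value, keeping the coefficient's own sign, and sums the products. With these inputs the sum comes out at roughly -2.503 rather than -3.780, and 1/(1+e^2.503) is roughly 0.0756, which agrees closely with the value distributionForInstance returned.

```scala
// Coefficients and test values copied from the table above.
// The intercept is paired with a constant input of 1.0.
val terms: Seq[(String, Double, Double)] = Seq(
  ("Intercept",           -4.440248,     1.0),
  ("price",                0.00000271591, 474700.0),
  ("sqft",                -0.0003140287,  1693.0),
  ("ltv_new",              0.6966173,     2.0),
  ("age_cat",             -0.001434963,   4.0),
  ("current_hold_days",   -0.0001069998,  1947.0),
  ("appr_since_last_sqr",  0.002447026,   0.0961)
)

// Each contribution keeps the coefficient's own sign, so a negative
// coefficient yields a negative contribution that is simply added.
val contributions = terms.map { case (name, coeff, value) => (name, coeff * value) }
contributions.foreach { case (name, c) => println(f"$name%-20s $c%12.6f") }

// Linear predictor (log-odds) and the corresponding probability.
val bx = contributions.map(_._2).sum
val p = 1.0 / (1.0 + math.exp(-bx))
println(f"BX = $bx%.6f, p = $p%.6f")
```

Printing each contribution separately makes a sign slip like the one above easy to spot.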
> However, the probability that Weka's distributionForInstance method
> returned for this record is 0.07563123767893659.
>
> Do you know why the two values are so different?
> This issue is important to my work, so any ideas would be appreciated.
> Thanks in advance.
>
>
> 2012/7/25 ouyeyu panyu <ouyeyu at gmail.com>
>
>> Hi Mark and Michael,
>>
>>
>> I'm trying to do logistic regression with WEKA.
>>
>> First, I trained the model on trainData:
>> import weka.classifiers.functions.Logistic
>> val cf_train: Logistic = new Logistic()
>> cf_train.buildClassifier(trainData)
>>
>> WEKA generated the following model:
>> Variable        Coefficient
>> ===========================
>> price            0
>> sqft             0.0003
>> nod             -2.8426
>> ltv             -0.6966
>> age              0.0014
>> chd              0.0001
>> asl             -0.0024
>> Intercept        4.4403
>>
>>
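The printout above rounds each coefficient to four decimal places, so price appears as 0. The first message lists the same coefficient at fuller precision as 0.00000271591 (the signs in the two messages are flipped relative to each other), and with a test price of 474700 its contribution to the linear predictor is about 1.29, not 0. A small sketch of that comparison, assuming four-decimal half-up rounding for display; roundTo is a hypothetical helper and the two values are copied from the first message.

```scala
// Hypothetical helper mimicking four-decimal display rounding.
def roundTo(x: Double, places: Int): Double =
  BigDecimal(x).setScale(places, BigDecimal.RoundingMode.HALF_UP).toDouble

// Values from the first message: price coefficient and test price.
val priceCoeff = 0.00000271591
val testPrice = 474700.0

// Contribution with the full-precision coefficient vs. the displayed one.
val fullContribution = priceCoeff * testPrice
val roundedContribution = roundTo(priceCoeff, 4) * testPrice

println(s"price coefficient rounded to 4 places: ${roundTo(priceCoeff, 4)}")
println(f"contribution at full precision: $fullContribution%.4f")
println(f"contribution after rounding:    $roundedContribution%.4f")
```

The same effect applies, more mildly, to sqft and chd, whose rounded coefficients also multiply large test values.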
>> Then I made predictions on testData:
>> for (i <- 0 until testData.size()) {
>>   // distributionForInstance returns one probability per class value
>>   val predictedValue: Array[Double] =
>>     cf_train.distributionForInstance(testData.get(i))
>>   println(predictedValue.mkString(", "))
>> }
>> For one particular record, the predicted value is 0.07563123767893659.
>>
>> I wanted to verify whether this value is correct, so I did a manual
>> calculation by substituting the trained coefficients into the following
>> formula:
>> manual_value = intercept + price*testPrice + sqft*testSqft + nod*testNod
>>              + ltv*testLtv + age*testAge + chd*testChd + asl*testAsl
>>              = 4.4403 + 0 + 1693*0.0003 + 0 - 0.6966*2 + 0.0014*4
>>                + 0.0001*1947 - 0.0024*0.0961
>>              = 3.75506936
>>
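Note also that manual_value is the linear predictor (the log-odds), not a probability, so it has to go through the logistic function before it can be compared with distributionForInstance. The Weka Logistic javadoc describes the convention for k classes: each of the first k-1 classes has its own coefficient vector B_j, and the last class is the reference, with P_j = exp(x*B_j) / (1 + sum of exp(x*B_j)) and P_last = 1 / (1 + sum of exp(x*B_j)). The sketch below assumes that documented convention applies to this model; classDistribution is a hypothetical helper, not Weka's code. The value 2.50325 is the full-precision linear predictor for this record under the signs of this printout (the negation of what the first message's coefficient table gives). Under those assumptions the two entries come out at roughly 0.924 and 0.0756, and the second matches the value Weka reported.

```scala
// Class distribution as described in the Weka Logistic javadoc:
// classes 0..k-2 have linear predictors eta_j; the last class is the
// reference, with an implicit linear predictor of 0.
def classDistribution(etas: Seq[Double]): Seq[Double] = {
  val expEtas = etas.map(e => math.exp(e))
  val denom = expEtas.sum + 1.0
  expEtas.map(_ / denom) :+ (1.0 / denom)
}

// Full-precision linear predictor for the record, using the sign
// convention of the printout above (intercept +4.44...).
val eta = 2.50325

// Two classes: one linear predictor in, two probabilities out.
val dist = classDistribution(Seq(eta))
println(dist.map(p => f"$p%.4f").mkString("[", ", ", "]"))
```

For a two-class problem this reduces to the textbook formula: the first class gets inverse-logit(eta) and the reference class gets 1 minus that.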
>> So my manual_value is completely different from Weka's predictedValue.
>> Do you know why this happens?
>> Thanks in advance.
>>
>>
>