My question is about the logistic regression cost function from Andrew's ML course (http://feature-space.com/en/document50.pdf, page 5):
$\text{cost} = \frac{1}{m}\left[ -y \times \log\!\left(h_\theta(x)\right) - (1-y) \times \log\!\left(1-h_\theta(x)\right) \right]$
The vector $y$ holds the digit labels (1–10), so if we plug these values into the cost function, it takes on values that don't make sense to me. For instance, if $y=4,ドル then the cost function contains both log terms:
$\text{cost}(y=4) = \frac{1}{m}\left[ -4 \times \log\!\left(h_\theta(x)\right) - (1-4) \times \log\!\left(1-h_\theta(x)\right) \right]$
From Andrew's lecture, I remember him saying that only one of the log terms remains in the cost function: if the example belongs to the positive class, the $\log(h_\theta(x))$ term remains; otherwise the $\log(1-h_\theta(x))$ term remains.
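In other words, for a single training example with $y \in \{0, 1\}$ (this is my reading of the lecture, written out explicitly), only one case applies at a time:

$$\text{cost} = \begin{cases} -\log\!\left(h_\theta(x)\right) & \text{if } y = 1 \\ -\log\!\left(1-h_\theta(x)\right) & \text{if } y = 0 \end{cases}$$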
Please help me see where I'm going wrong.
Comment: Remember that $y$ is a vector in this case, not a scalar. – Bar, Jun 10, 2015
1 Answer
$y$ always takes on values of 1 or 0, as you noted. For the multi-class problem, you're going to solve the "one vs. all" case. You'll need to transform your $y$ vector into a vector of 1's and 0's depending on the class you are minimizing for. So for the number 5, you'll solve for $P(y=5)$ vs. $P(y \ne 5)$. You repeat that for all your digits, and you come up with 10 different hypotheses $h_\theta^{(i)},ドル i.e. $h_\theta^{(1)},ドル $h_\theta^{(2)},ドル ...
Also, the following is from your link at the bottom of page 8:
When training the classifier for class $k \in \{1, \ldots, K\},ドル you will want an m-dimensional vector of labels $y,ドル where $y_j \in \{0, 1\}$ indicates whether the j-th training instance belongs to class $k$ ($y_j = 1$), or if it belongs to a different class ($y_j = 0$).
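To make that transformation concrete, here is a minimal sketch in Python/NumPy (not from the course itself, which uses Octave; the names `one_vs_all_labels`, `y_bin`, and `cost` are just for illustration) of turning the digit labels into the 0/1 vector described above and plugging it into the binary cost:

```python
import numpy as np

# Digit labels for m = 6 training instances; the course codes the digit 0 as the label 10.
y = np.array([5, 10, 1, 5, 3, 7])

def one_vs_all_labels(y, k):
    """Return a 0/1 vector: 1 where the instance belongs to class k, 0 otherwise."""
    return (y == k).astype(int)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y_bin):
    """Binary logistic regression cost for one of the K classifiers."""
    m = len(y_bin)
    h = sigmoid(X @ theta)  # h_theta(x) for every instance
    return (1.0 / m) * np.sum(-y_bin * np.log(h) - (1 - y_bin) * np.log(1 - h))

# Training the classifier for class k = 5: y is replaced by a 0/1 vector first.
y_bin = one_vs_all_labels(y, 5)
print(y_bin)  # [1 0 0 1 0 0]
```

You repeat this for $k = 1, \ldots, 10,ドル training one $\theta$ per class, and at prediction time pick the class whose classifier gives the highest $h_\theta(x)$.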