Conversation
sklearn/neighbors/nca.py
Outdated
Neighborhood Component Analysis (NCA) is a machine learning algorithm for
metric learning. It learns a linear transformation in a supervised fashion
to improve the classification accuracy of a stochastic nearest neighbors
rule in the new space.
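As a quick illustration of the docstring above, here is a minimal NumPy sketch (not from the PR) of the "stochastic nearest neighbors rule": each sample picks a neighbor with probability given by a softmax over negative squared distances in the transformed space.

```python
import numpy as np
from scipy.special import logsumexp

# Toy data: two well-separated classes in 2-D.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
y = np.array([0, 0, 1, 1])
L = np.eye(2)  # identity transformation, just for illustration

# Stochastic neighbor probabilities for sample 0: softmax over the
# negative squared distances to every other sample in L's image space.
diff = X.dot(L.T)[0] - X.dot(L.T)
dist = np.einsum('ij,ij->i', diff, diff)
dist[0] = np.inf                      # a sample never selects itself
p = np.exp(-dist - logsumexp(-dist))

# Nearly all neighbor mass falls on the nearby same-class sample 1,
# so the stochastic 1-NN rule classifies sample 0 correctly.
assert p.argmax() == 1 and p[y == 0].sum() > 0.99
```

NCA adjusts `L` so that this same-class probability mass is maximized over all samples.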
sklearn/neighbors/nca.py
Outdated
As NCA is optimizing a non-convex objective function, it will
likely end up in a local optimum. Several runs with independent random
init might be necessary to get a good convergence.
Is this a standard warning used in other sklearn classes? LMNN does not have it (and is also nonconvex when solving for the linear transformation)
You are right, I just saw it for k-means as a note, and thought it could be useful, but a warning is surely too much, and even a note may not be necessary, for consistency with LMNN.
"Neighbourhood Components Analysis". Advances in Neural Information
Processing Systems. 17, 513-520, 2005.
http://www.cs.nyu.edu/~roweis/papers/ncanips.pdf
"""
you can also add a reference to the Wikipedia page
https://en.wikipedia.org/wiki/Neighbourhood_components_analysis
sklearn/neighbors/nca.py
Outdated
http://www.cs.nyu.edu/~roweis/papers/ncanips.pdf
"""

def __init__(self, n_features_out=None, init='identity', max_iter=50,
The default init for LMNN is PCA; it is probably better to have the same behavior for consistency.
sklearn/neighbors/nca.py
Outdated
from ..externals.six import integer_types


class NeighborhoodComponentAnalysis(BaseEstimator, TransformerMixin):
I think the original name in the paper (also used on the Wikipedia page) has an 's' at the end of "Component", so the class name and other occurrences should probably be changed.
sklearn/neighbors/nca.py
Outdated
soft = np.exp(-sum_of_squares - logsumexp(-sum_of_squares))
ci = masks[:, y[i]]
p_i_j = soft[ci]
not_ci = np.logical_not(ci)
can't we just use ~ci and avoid defining this variable?
Yes you are right, I changed it
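For reference, `~` on a boolean NumPy array is exactly elementwise logical negation, so the suggested change is behavior-preserving (a tiny sanity check, not from the PR; note that `~` on an *integer* array is bitwise NOT instead):

```python
import numpy as np

ci = np.array([True, False, True, False])

# For boolean arrays, ~ci is identical to np.logical_not(ci).
assert np.array_equal(~ci, np.logical_not(ci))

# Both forms can be used directly as a mask, no temporary needed.
x = np.arange(4)
assert x[~ci].tolist() == [1, 3]
```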
not_ci = np.logical_not(ci)
diff_ci = diffs[i, ci, :]
diff_not_ci = diffs[i, not_ci, :]
sum_ci = diff_ci.T.dot(
again, sum_ci and sum_not_ci are distances
sklearn/neighbors/nca.py
Outdated
# for every sample, compute its contribution to loss and gradient
for i in range(X.shape[0]):
    diff_embedded = X_embedded[i] - X_embedded
    sum_of_squares = np.einsum('ij,ij->i', diff_embedded,
this variable contains distances (in embedded space), maybe it's good to have this in the variable name for clarity
Done: I called them dist_embedded
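To see why the rename fits: the `'ij,ij->i'` einsum multiplies the two arrays elementwise and sums over the feature axis `j`, i.e. it computes the squared Euclidean norm of each row, so the result really is a vector of squared distances in the embedded space. A quick check of the equivalence (not from the PR):

```python
import numpy as np

rng = np.random.RandomState(42)
diff_embedded = rng.randn(5, 3)  # differences x_i - x_j in the embedding

# 'ij,ij->i': elementwise product summed over j -> row-wise squared norms
dist_embedded = np.einsum('ij,ij->i', diff_embedded, diff_embedded)

# Same result as the more explicit (but slower, temporary-allocating) form.
assert np.allclose(dist_embedded, (diff_embedded ** 2).sum(axis=1))
```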
sklearn/neighbors/nca.py
Outdated
sum_of_squares = np.einsum('ij,ij->i', diff_embedded,
                           diff_embedded)
sum_of_squares[i] = np.inf
soft = np.exp(-sum_of_squares - logsumexp(-sum_of_squares))
again maybe find a name which makes it clear that these are exponentiated distances
Done: I called them exp_dist_embedded
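Worth noting why the code uses the `exp(-d - logsumexp(-d))` form rather than a plain `exp(-d) / exp(-d).sum()`: it is the numerically stable softmax over negative squared distances. A naive softmax underflows as soon as the distances are large. A small demonstration (not from the PR):

```python
import numpy as np
from scipy.special import logsumexp

dist = np.array([1000.0, 1001.0, 1002.0])  # large squared distances

# Naive softmax underflows: every exp(-dist) is exactly 0.0 in float64,
# so the normalization divides 0 by 0 and produces NaNs.
with np.errstate(invalid='ignore'):
    naive = np.exp(-dist) / np.exp(-dist).sum()
assert np.isnan(naive).all()

# logsumexp shifts by the max internally, so this form stays finite
# and yields a proper probability distribution.
stable = np.exp(-dist - logsumexp(-dist))
assert np.isclose(stable.sum(), 1.0)
```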
X_embedded = transformation.dot(X.T).T

# for every sample, compute its contribution to loss and gradient
for i in range(X.shape[0]):
maybe a bit more comments describing the steps below would be useful, since this is the heart of the algorithm
I added some, but I will continue to do so, maybe with big-O notations, etc.
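In that spirit, the per-sample loop could be annotated roughly like the sketch below. This is an illustrative reconstruction following the snippets quoted in this review (variable names like `loss_and_grad` and the `A` parameter are mine, not the PR's final code), computing the NCA loss as the negative expected leave-one-out accuracy together with its gradient:

```python
import numpy as np
from scipy.special import logsumexp

def loss_and_grad(A, X, y):
    """NCA loss (negative expected leave-one-out accuracy) and its
    gradient w.r.t. the linear transformation A. Illustrative sketch."""
    X_embedded = X.dot(A.T)          # O(n * d * d_out): map all samples
    loss = 0.0
    grad = np.zeros_like(A)
    for i in range(X.shape[0]):      # one pass per sample: O(n^2) overall
        diff = X[i] - X                              # diffs, input space
        diff_embedded = X_embedded[i] - X_embedded   # diffs, embedding
        # squared embedded distances; +inf so i never selects itself
        dist_embedded = np.einsum('ij,ij->i',
                                  diff_embedded, diff_embedded)
        dist_embedded[i] = np.inf
        # stable softmax over negative distances: neighbor probabilities
        p = np.exp(-dist_embedded - logsumexp(-dist_embedded))
        ci = (y == y[i])             # mask of same-class samples
        p_i = p[ci].sum()            # prob. of classifying i correctly
        loss -= p_i
        # d(-p_i)/dA = 2 A * sum_k w_k (x_i - x_k)(x_i - x_k)^T
        w = p * (ci.astype(float) - p_i)
        grad += 2 * A.dot(diff.T.dot(diff * w[:, None]))
    return loss, grad

rng = np.random.RandomState(0)
X = rng.randn(8, 3)
y = rng.randint(0, 2, 8)
A = rng.randn(2, 3)
loss, grad = loss_and_grad(A, X, y)
assert grad.shape == A.shape and -8.0 <= loss <= 0.0
```

Minimizing this loss with an off-the-shelf optimizer (e.g. `scipy.optimize.minimize` with L-BFGS) over the flattened `A` recovers the NCA training procedure.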
sklearn/neighbors/tests/test_nca.py
Outdated
y = random_state.randint(0, n_labels, (n_samples))
point = random_state.randn(num_dims, n_features)
X = random_state.randn(n_samples, n_features)
nca = NeighborhoodComponentsAnalysis(None, init=point)
What is this first None parameter for?
Ah yes, indeed, it shouldn't be there... I think it appeared during a refactoring.