Conversation
sklearn/neighbors/nca.py
Outdated
Neighborhood Component Analysis (NCA) is a machine learning algorithm for
metric learning. It learns a linear transformation in a supervised fashion
to improve the classification accuracy of a stochastic nearest neighbors
rule in the new space.
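As a quick illustration of the docstring above, here is a minimal NumPy sketch (not from the PR) of the "stochastic nearest neighbors rule": each sample picks a neighbor with probability given by a softmax over negative squared distances in the transformed space.

```python
import numpy as np
from scipy.special import logsumexp

# Toy data: two well-separated classes in 2-D.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
y = np.array([0, 0, 1, 1])
L = np.eye(2)  # identity transformation, just for illustration

# Stochastic neighbor probabilities for sample 0: softmax over the
# negative squared distances to every other sample in L's image space.
diff = X.dot(L.T)[0] - X.dot(L.T)
dist = np.einsum('ij,ij->i', diff, diff)
dist[0] = np.inf                      # a sample never selects itself
p = np.exp(-dist - logsumexp(-dist))

# Nearly all neighbor mass falls on the nearby same-class sample 1,
# so the stochastic 1-NN rule classifies sample 0 correctly.
assert p.argmax() == 1 and p[y == 0].sum() > 0.99
```

NCA adjusts `L` so that this same-class probability mass is maximized over all samples.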
sklearn/neighbors/nca.py
Outdated
As NCA is optimizing a non-convex objective function, it will
likely end up in a local optimum. Several runs with independent random
init might be necessary to get a good convergence.
Is this a standard warning used in other sklearn classes? LMNN does not have it (and is also nonconvex when solving for the linear transformation)
You are right, I just saw it for k-means as a note, and thought it could be useful, but a warning is surely too much, and even a note may not be necessary, for consistency with LMNN.
"Neighbourhood Components Analysis". Advances in Neural Information
Processing Systems. 17, 513-520, 2005.
http://www.cs.nyu.edu/~roweis/papers/ncanips.pdf
"""
you can also add a reference to the Wikipedia page
https://en.wikipedia.org/wiki/Neighbourhood_components_analysis
sklearn/neighbors/nca.py
Outdated
http://www.cs.nyu.edu/~roweis/papers/ncanips.pdf
"""

def __init__(self, n_features_out=None, init='identity', max_iter=50,
The default init for LMNN is PCA; it is probably better to have the same behavior for consistency.
sklearn/neighbors/nca.py
Outdated
from ..externals.six import integer_types


class NeighborhoodComponentAnalysis(BaseEstimator, TransformerMixin):
I think the original name in the paper (also used on the Wikipedia page) has an 's' at the end of "Component", so the class name and other occurrences should probably be changed.
sklearn/neighbors/nca.py
Outdated
soft = np.exp(-sum_of_squares - logsumexp(-sum_of_squares))
ci = masks[:, y[i]]
p_i_j = soft[ci]
not_ci = np.logical_not(ci)
can't we just use ~ci and avoid defining this variable?
Yes you are right, I changed it
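For reference, `~` on a boolean NumPy array is exactly elementwise logical negation, so the suggested change is behavior-preserving (a tiny sanity check, not from the PR; note that `~` on an *integer* array is bitwise NOT instead):

```python
import numpy as np

ci = np.array([True, False, True, False])

# For boolean arrays, ~ci is identical to np.logical_not(ci).
assert np.array_equal(~ci, np.logical_not(ci))

# Both forms can be used directly as a mask, no temporary needed.
x = np.arange(4)
assert x[~ci].tolist() == [1, 3]
```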
not_ci = np.logical_not(ci)
diff_ci = diffs[i, ci, :]
diff_not_ci = diffs[i, not_ci, :]
sum_ci = diff_ci.T.dot(
again, sum_ci and sum_not_ci are distances
sklearn/neighbors/nca.py
Outdated
# for every sample, compute its contribution to loss and gradient
for i in range(X.shape[0]):
    diff_embedded = X_embedded[i] - X_embedded
    sum_of_squares = np.einsum('ij,ij->i', diff_embedded,
this variable contains distances (in embedded space), maybe it's good to have this in the variable name for clarity
Done: I called them dist_embedded
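To see why the rename fits: the `'ij,ij->i'` einsum multiplies the two arrays elementwise and sums over the feature axis `j`, i.e. it computes the squared Euclidean norm of each row, so the result really is a vector of squared distances in the embedded space. A quick check of the equivalence (not from the PR):

```python
import numpy as np

rng = np.random.RandomState(42)
diff_embedded = rng.randn(5, 3)  # differences x_i - x_j in the embedding

# 'ij,ij->i': elementwise product summed over j -> row-wise squared norms
dist_embedded = np.einsum('ij,ij->i', diff_embedded, diff_embedded)

# Same result as the more explicit (but slower, temporary-allocating) form.
assert np.allclose(dist_embedded, (diff_embedded ** 2).sum(axis=1))
```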
sklearn/neighbors/nca.py
Outdated
sum_of_squares = np.einsum('ij,ij->i', diff_embedded,
                           diff_embedded)
sum_of_squares[i] = np.inf
soft = np.exp(-sum_of_squares - logsumexp(-sum_of_squares))
again maybe find a name which makes it clear that these are exponentiated distances
Done: I called them exp_dist_embedded
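Worth noting why the code uses the `exp(-d - logsumexp(-d))` form rather than a plain `exp(-d) / exp(-d).sum()`: it is the numerically stable softmax over negative squared distances. A naive softmax underflows as soon as the distances are large. A small demonstration (not from the PR):

```python
import numpy as np
from scipy.special import logsumexp

dist = np.array([1000.0, 1001.0, 1002.0])  # large squared distances

# Naive softmax underflows: every exp(-dist) is exactly 0.0 in float64,
# so the normalization divides 0 by 0 and produces NaNs.
with np.errstate(invalid='ignore'):
    naive = np.exp(-dist) / np.exp(-dist).sum()
assert np.isnan(naive).all()

# logsumexp shifts by the max internally, so this form stays finite
# and yields a proper probability distribution.
stable = np.exp(-dist - logsumexp(-dist))
assert np.isclose(stable.sum(), 1.0)
```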
X_embedded = transformation.dot(X.T).T

# for every sample, compute its contribution to loss and gradient
for i in range(X.shape[0]):
maybe a bit more comments describing the steps below would be useful, since this is the heart of the algorithm
I added some, but I will continue to do so, maybe with big-O notations, etc.
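In that spirit, the per-sample loop could be annotated roughly like the sketch below. This is an illustrative reconstruction following the snippets quoted in this review (variable names like `loss_and_grad` and the `A` parameter are mine, not the PR's final code), computing the NCA loss as the negative expected leave-one-out accuracy together with its gradient:

```python
import numpy as np
from scipy.special import logsumexp

def loss_and_grad(A, X, y):
    """NCA loss (negative expected leave-one-out accuracy) and its
    gradient w.r.t. the linear transformation A. Illustrative sketch."""
    X_embedded = X.dot(A.T)          # O(n * d * d_out): map all samples
    loss = 0.0
    grad = np.zeros_like(A)
    for i in range(X.shape[0]):      # one pass per sample: O(n^2) overall
        diff = X[i] - X                              # diffs, input space
        diff_embedded = X_embedded[i] - X_embedded   # diffs, embedding
        # squared embedded distances; +inf so i never selects itself
        dist_embedded = np.einsum('ij,ij->i',
                                  diff_embedded, diff_embedded)
        dist_embedded[i] = np.inf
        # stable softmax over negative distances: neighbor probabilities
        p = np.exp(-dist_embedded - logsumexp(-dist_embedded))
        ci = (y == y[i])             # mask of same-class samples
        p_i = p[ci].sum()            # prob. of classifying i correctly
        loss -= p_i
        # d(-p_i)/dA = 2 A * sum_k w_k (x_i - x_k)(x_i - x_k)^T
        w = p * (ci.astype(float) - p_i)
        grad += 2 * A.dot(diff.T.dot(diff * w[:, None]))
    return loss, grad

rng = np.random.RandomState(0)
X = rng.randn(8, 3)
y = rng.randint(0, 2, 8)
A = rng.randn(2, 3)
loss, grad = loss_and_grad(A, X, y)
assert grad.shape == A.shape and -8.0 <= loss <= 0.0
```

Minimizing this loss with an off-the-shelf optimizer (e.g. `scipy.optimize.minimize` with L-BFGS) over the flattened `A` recovers the NCA training procedure.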
sklearn/neighbors/tests/test_nca.py
Outdated
y = random_state.randint(0, n_labels, (n_samples))
point = random_state.randn(num_dims, n_features)
X = random_state.randn(n_samples, n_features)
nca = NeighborhoodComponentsAnalysis(None, init=point)
What is this first None parameter for?
Ah yes, indeed, it shouldn't be there... I think it appeared during a refactoring.