Hello, I wanted to open this up as a discussion before making an issue out of it. If you run the code below in Google Colab, my input space does not seem to be transformed, unless I am doing something wrong.
```python
import metric_learn
import numpy as np
import pandas as pd
from sklearn.datasets import make_friedman1

lmnn = metric_learn.LMNN()


def friedman_np_to_df(X, y):
    return pd.DataFrame(X, columns=['x0', 'x1', 'x2', 'x3', 'x4']), pd.Series(y)


# Make training set. We don't care about y, so call it NA.
X_train, NA1 = make_friedman1(n_samples=1000, n_features=5, random_state=1)
X_train, NA1 = friedman_np_to_df(X_train, NA1)

# Categorize training set based on x0
domain_list = []
for i in range(len(X_train)):
    if X_train.iloc[i]['x0'] < 0.6:
        domain_list.append(1)
    else:
        domain_list.append(0)
X_train['domain'] = domain_list

# Restrict the training set to domain == 1 (x0 < 0.6),
# but also add n out-of-domain samples
n = 10
out_of_domain = X_train[X_train['domain'] == 0][:n]
X_train = X_train[X_train['domain'] == 1]
X_train = pd.concat([out_of_domain, X_train])
y_train = X_train['domain'].copy()
X_train = X_train.drop(columns=['domain'])

# Make testing set with a different random_state
X_test, NA2 = make_friedman1(n_samples=1000, n_features=5, random_state=3)
X_test, NA2 = friedman_np_to_df(X_test, NA2)

# Categorize testing set based on x0
domain_list = []
for i in range(len(X_test)):
    if X_test.iloc[i]['x0'] < 0.6:
        domain_list.append(1)
    else:
        domain_list.append(0)
X_test['domain'] = domain_list
y_test = X_test['domain'].copy()
X_test = X_test.drop(columns=['domain'])

lmnn.fit(np.array(X_train), np.array(y_train))

# Transform our input space
X_lmnn = lmnn.transform(np.array(X_test))
```
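As an aside, the row-by-row labeling loop can be written as one vectorized pandas expression. This is a sketch on a tiny made-up frame (`X` stands in for `X_train`), not the Friedman data itself:

```python
import pandas as pd

# Hypothetical mini-frame standing in for X_train
X = pd.DataFrame({'x0': [0.2, 0.7, 0.5]})

# 1 where x0 < 0.6, else 0 -- equivalent to the explicit loop
X['domain'] = (X['x0'] < 0.6).astype(int)

print(X['domain'].tolist())  # [1, 0, 1]
```

The boolean mask `X['x0'] < 0.6` is cast to integers in one shot, which avoids the per-row `iloc` lookups.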
Then in a new cell run the following. You can see they are the same:
```python
X_lmnn[:5], X_test[:5]
```
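Rather than eyeballing the first five rows, a numerical comparison makes the "nothing changed" claim precise. This is a generic numpy sketch, with `a` and `b` standing in for `np.array(X_test)` and `X_lmnn`:

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])  # stand-in for np.array(X_test)
b = a.copy()                            # stand-in for X_lmnn

# True if the "transformed" array equals the input
# (up to floating-point tolerance), i.e. the transform did nothing
print(np.allclose(a, b))  # True
```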
@angelotc thanks for reporting this! Which version of metric-learn are you using?
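For reference, here is an environment-agnostic way to check the installed version from the standard library (`importlib.metadata` is available from Python 3.8):

```python
from importlib.metadata import version, PackageNotFoundError


def installed_version(pkg):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(pkg)
    except PackageNotFoundError:
        return None


# Prints e.g. "0.6.2", or None if metric-learn is not installed
print(installed_version("metric-learn"))
```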
Running your script on Colab with the latest metric-learn version from pip, I did not obtain the same `X_lmnn` and `X_test`. (Note that I got the same result, i.e. different arrays, at least on those first 5 rows, with the LMNN PR #309.)
```python
X_lmnn[:5], X_test[:5]
```
```
(array([[ 2.48105611,  0.90078992, -0.12757663,  0.82172106,  1.26216829],
        [ 4.23874062, -0.04583753,  0.29213554, -0.02094142,  0.68556641],
        [ 0.31345599,  0.586104  ,  0.04540811,  0.42523969,  0.72904908],
        [ 2.88944521, -0.1862227 ,  0.34355608,  0.29715756,  0.51748561],
        [ 1.37856497,  0.9168011 ,  0.07280986,  0.203493  ,  0.66506381]]),
          x0        x1        x2        x3        x4
 0  0.550798  0.708148  0.290905  0.510828  0.892947
 1  0.896293  0.125585  0.207243  0.051467  0.440810
 2  0.029876  0.456833  0.649144  0.278487  0.676255
 3  0.590863  0.023982  0.558854  0.259252  0.415101
 4  0.283525  0.693138  0.440454  0.156868  0.544649)
```
Also, with the `verbose` flag enabled, iterations are happening and the algorithm appears to be training (the objective value improves).