Active Learning Yields Poor Results in Multi-Label Task #191

Open
@shadikhamsehh

Description

I am using modAL for an active learning project on multi-label classification. My implementation is in PyTorch, with DinoV2 as the backbone model.
On the same dataset I run both active learning (using the minimum-confidence and average-confidence strategies) and random sampling, selecting the same number of samples in each case, yet random sampling performs significantly better than the active learning approach. I would like to know whether this discrepancy is due to an issue in my code or in modAL's handling of multi-label classification. Below is my active learning loop, followed by a sketch of the query strategies being compared:

import csv

import numpy as np
import torch
from sklearn.metrics import accuracy_score, f1_score

# learner, X_pool/y_pool, X_test_np/y_test_np, train_df, samples_log_file,
# batch(), POWER, BATCH_SIZE, n_queries, patience, and the bookkeeping
# variables (cumulative_* lists, best_f1_score, wait, total_samples,
# acc_test_data, f1_test_data) are defined earlier in the script.
for i in range(n_queries):
    if i == 12:
        # On the thirteenth query, request the entire remaining pool.
        n_instances = X_pool.shape[0]
    else:
        n_instances = batch(int(np.ceil(np.power(10, POWER))), BATCH_SIZE)
    print(f"\nQuery {i + 1}: Requesting {n_instances} samples from a pool of size {X_pool.shape[0]}")
    if X_pool.shape[0] < n_instances:
        print("Not enough samples left in the pool to query the desired number of instances.")
        break
    query_idx, _ = learner.query(X_pool, n_instances=n_instances)
    query_idx = np.unique(query_idx)
    if len(query_idx) == 0:
        print("No indices were selected, which may indicate an issue with the query function or pool.")
        continue
    # Add the newly selected samples to the cumulative training set
    cumulative_X_train.append(X_pool[query_idx])
    cumulative_y_train.append(y_pool[query_idx])
    # Concatenate all the samples to form the cumulative training data
    X_train_cumulative = np.concatenate(cumulative_X_train, axis=0)
    y_train_cumulative = np.concatenate(cumulative_y_train, axis=0)
    # Note: ActiveLearner.teach appends the passed data to the learner's
    # stored training set before refitting, so passing the full cumulative
    # set re-adds earlier samples on every query.
    learner.teach(X_train_cumulative, y_train_cumulative)
    # Log the selected sample names. Caution: query_idx indexes the current
    # pool, which shifts after each np.delete below, while train_df keeps
    # its original indexing.
    selected_sample_names = train_df.loc[query_idx, "image"].tolist()
    print(f"Selected samples in Query {i + 1}: {selected_sample_names}")
    with open(samples_log_file, mode='a', newline='') as f:
        writer = csv.writer(f)
        writer.writerow([i + 1] + selected_sample_names)
    # Remove the selected samples from the pool
    X_pool = np.delete(X_pool, query_idx, axis=0)
    y_pool = np.delete(y_pool, query_idx, axis=0)
    # Evaluate the model on the held-out test set
    y_pred = learner.predict(X_test_np)
    accuracy = accuracy_score(y_test_np, y_pred)
    f1 = f1_score(y_test_np, y_pred, average='macro')
    acc_test_data.append(accuracy)
    f1_test_data.append(f1)
    print(f"Accuracy after query {i + 1}: {accuracy}")
    print(f"F1 Score after query {i + 1}: {f1}")
    # Early stopping logic
    if f1 > best_f1_score:
        best_f1_score = f1
        wait = 0
    else:
        wait += 1
        if wait >= patience:
            print(f"Stopping early after {i + 1} queries due to no improvement in F1 score.")
            break
    total_samples += len(query_idx)
    print(f"Total samples used for training after query {i + 1}: {total_samples}")
    POWER += 0.25
    torch.cuda.empty_cache()
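
For reference, here is a minimal, self-contained sketch of the two kinds of query strategies being compared. It is illustrative rather than my exact code: multilabel_margin_uncertainty is a hypothetical custom strategy, not a modAL built-in, and it assumes the wrapped estimator's predict_proba returns independent per-label sigmoid probabilities of shape (n_samples, n_labels).

import numpy as np
from modAL.models import ActiveLearner

rng = np.random.default_rng(42)

def multilabel_margin_uncertainty(classifier, X, n_instances=1, **kwargs):
    """Rank pool samples by the mean distance of their per-label
    probabilities from the 0.5 decision boundary; probabilities near
    0.5 on many labels mean the model is uncertain about the sample."""
    proba = classifier.predict_proba(X)  # (n_samples, n_labels)
    uncertainty = 1.0 - 2.0 * np.abs(proba - 0.5).mean(axis=1)
    # Custom modAL query strategies return the selected pool indices.
    return np.argsort(uncertainty)[-n_instances:]

def random_baseline(classifier, X, n_instances=1, **kwargs):
    """Random sampling with the same interface, for a controlled comparison."""
    return rng.choice(X.shape[0], size=n_instances, replace=False)

# learner = ActiveLearner(
#     estimator=wrapped_model,  # e.g. a skorch-wrapped DinoV2 classifier head
#     query_strategy=multilabel_margin_uncertainty,  # or random_baseline
# )

Swapping only query_strategy between runs keeps every other part of the loop identical, so any gap between the two result curves should come from the selection strategy alone.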
