Active Learning Yields Poor Results in Multi-Label Task #191

Open
@shadikhamsehh

Description

I am using modAL for an active learning project on multi-label classification. My implementation is in PyTorch, with DinoV2 as the backbone model.
On the same dataset I run both active learning (using the minimum-confidence and average-confidence strategies) and random sampling, selecting the same number of samples in each case, yet random sampling performs significantly better than the active learning approach. I would like to know whether this discrepancy is due to an issue in my code or in modAL's handling of multi-label classification. Below is my active learning loop, followed by a sketch of the query strategies being compared:

import csv

import numpy as np
import torch
from sklearn.metrics import accuracy_score, f1_score

# learner, X_pool/y_pool, X_test_np/y_test_np, train_df, samples_log_file,
# batch(), POWER, BATCH_SIZE, n_queries, patience, and the bookkeeping
# variables (cumulative_* lists, best_f1_score, wait, total_samples,
# acc_test_data, f1_test_data) are defined earlier in the script.
for i in range(n_queries):
    if i == 12:
        # On the thirteenth query, request the entire remaining pool.
        n_instances = X_pool.shape[0]
    else:
        n_instances = batch(int(np.ceil(np.power(10, POWER))), BATCH_SIZE)
    print(f"\nQuery {i + 1}: Requesting {n_instances} samples from a pool of size {X_pool.shape[0]}")
    if X_pool.shape[0] < n_instances:
        print("Not enough samples left in the pool to query the desired number of instances.")
        break
    query_idx, _ = learner.query(X_pool, n_instances=n_instances)
    query_idx = np.unique(query_idx)
    if len(query_idx) == 0:
        print("No indices were selected, which may indicate an issue with the query function or pool.")
        continue
    # Add the newly selected samples to the cumulative training set
    cumulative_X_train.append(X_pool[query_idx])
    cumulative_y_train.append(y_pool[query_idx])
    # Concatenate all the samples to form the cumulative training data
    X_train_cumulative = np.concatenate(cumulative_X_train, axis=0)
    y_train_cumulative = np.concatenate(cumulative_y_train, axis=0)
    # Note: ActiveLearner.teach appends the passed data to the learner's
    # stored training set before refitting, so passing the full cumulative
    # set re-adds earlier samples on every query.
    learner.teach(X_train_cumulative, y_train_cumulative)
    # Log the selected sample names. Caution: query_idx indexes the current
    # pool, which shifts after each np.delete below, while train_df keeps
    # its original indexing.
    selected_sample_names = train_df.loc[query_idx, "image"].tolist()
    print(f"Selected samples in Query {i + 1}: {selected_sample_names}")
    with open(samples_log_file, mode='a', newline='') as f:
        writer = csv.writer(f)
        writer.writerow([i + 1] + selected_sample_names)
    # Remove the selected samples from the pool
    X_pool = np.delete(X_pool, query_idx, axis=0)
    y_pool = np.delete(y_pool, query_idx, axis=0)
    # Evaluate the model on the held-out test set
    y_pred = learner.predict(X_test_np)
    accuracy = accuracy_score(y_test_np, y_pred)
    f1 = f1_score(y_test_np, y_pred, average='macro')
    acc_test_data.append(accuracy)
    f1_test_data.append(f1)
    print(f"Accuracy after query {i + 1}: {accuracy}")
    print(f"F1 Score after query {i + 1}: {f1}")
    # Early stopping logic
    if f1 > best_f1_score:
        best_f1_score = f1
        wait = 0
    else:
        wait += 1
        if wait >= patience:
            print(f"Stopping early after {i + 1} queries due to no improvement in F1 score.")
            break
    total_samples += len(query_idx)
    print(f"Total samples used for training after query {i + 1}: {total_samples}")
    POWER += 0.25
    torch.cuda.empty_cache()
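
For reference, here is a minimal, self-contained sketch of the two kinds of query strategies being compared. It is illustrative rather than my exact code: multilabel_margin_uncertainty is a hypothetical custom strategy, not a modAL built-in, and it assumes the wrapped estimator's predict_proba returns independent per-label sigmoid probabilities of shape (n_samples, n_labels).

import numpy as np
from modAL.models import ActiveLearner

rng = np.random.default_rng(42)

def multilabel_margin_uncertainty(classifier, X, n_instances=1, **kwargs):
    """Rank pool samples by the mean distance of their per-label
    probabilities from the 0.5 decision boundary; probabilities near
    0.5 on many labels mean the model is uncertain about the sample."""
    proba = classifier.predict_proba(X)  # (n_samples, n_labels)
    uncertainty = 1.0 - 2.0 * np.abs(proba - 0.5).mean(axis=1)
    # Custom modAL query strategies return the selected pool indices.
    return np.argsort(uncertainty)[-n_instances:]

def random_baseline(classifier, X, n_instances=1, **kwargs):
    """Random sampling with the same interface, for a controlled comparison."""
    return rng.choice(X.shape[0], size=n_instances, replace=False)

# learner = ActiveLearner(
#     estimator=wrapped_model,  # e.g. a skorch-wrapped DinoV2 classifier head
#     query_strategy=multilabel_margin_uncertainty,  # or random_baseline
# )

Swapping only query_strategy between runs keeps every other part of the loop identical, so any gap between the two result curves should come from the selection strategy alone.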
