I have 200 neural networks which I trained using transfer learning on text. They all share the same weights except for their heads, which are trained on different tasks. Is it possible to merge those networks into a single model to use with TensorFlow, such that when I call it with input (text, i) it returns the prediction for task i? The idea is to store the shared weights only once to save on model size, and to evaluate only the head of the task we want to predict in order to save on computation. The important bit is to wrap all of that into a TensorFlow model, as I want to make it easier to serve on google-ai-platform.

Note: it is fine to train all the heads independently; I just want to put all of them together into a single model for the inference part.
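
For concreteness, here is roughly what each of the 200 models looks like (names are placeholders; the backbone is a shared pre-trained text encoder that is assumed to take raw strings). Today this yields 200 separate models that each duplicate the backbone weights:

import tensorflow as tf
from tensorflow.keras import layers

def build_task_model(backbone, num_classes, task_id):
    # `backbone` is the shared, pre-trained text encoder (hypothetical name)
    inputs = tf.keras.Input(shape=(), dtype=tf.string, name="text")
    features = backbone(inputs)
    outputs = layers.Dense(num_classes, name=f"task{task_id}")(features)
    return tf.keras.Model(inputs, outputs)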

asked Sep 2, 2022 at 16:02
3 Comments
  • You could add multiple heads to your shared model and load the weights into each one of them. When you do inference you can select the right prediction with the right index. The output of a model with multiple heads is typically a list where each element is the prediction / logits from one of the outputs, so if you know which head you want, you can just index the output: predictions[head_i]. Commented Sep 2, 2022 at 17:13
  • Hey, thanks for your comment. That looks like an interesting idea. In order to save time on inference, are you aware of a way to make sure we only evaluate the code from the head we are interested in (i.e. we freeze the other heads)? Commented Sep 4, 2022 at 12:03
  • Well, if you are just doing inference there is no need to freeze the other heads, since you are not updating the weights. Just make sure to obtain the results with training=False when calling your model. I'm not entirely sure, but I think it's not that bad, performance-wise, to obtain predictions even from the other heads and just ignore them. Commented Sep 4, 2022 at 12:21

1 Answer

You probably have a model like the following:

from tensorflow.keras import Input, layers, models

# Create the model
inputs = Input(shape=(height, width, channels), name='data')
x = layers.Conv2D(...)(inputs)
# ...
x = layers.GlobalAveragePooling2D(name='penultimate_layer')(x)
x = layers.Dense(num_class, name='task0', ...)(x)
model = models.Model(inputs=inputs, outputs=[x])

Until now the model only has one output. You can add multiple outputs at model creation, or later on. You can add a new head like this:

last_layer = model.get_layer('penultimate_layer').output

# collect the outputs of the heads that already exist (named task0, task1, ...)
output_heads = []
taskID = 0
while True:
    try:
        head = model.get_layer("task" + str(taskID))
        output_heads.append(head.output)
        taskID += 1
    except ValueError:  # no layer with that name left, so we found them all
        break

# add new head
new_head = layers.Dense(num_class, name='task' + str(taskID), ...)(last_layer)
output_heads.append(new_head)
model = models.Model(inputs=model.input, outputs=output_heads)

Now, since every head has a name, you can load the weights for a specific head by calling it by name. The weights to load are the weights of the last layer of the corresponding other_model, i.e. the model that was trained for that task. You should have something like this:

model.get_layer("task0").set_weights(other_model.layers[-1].get_weights())

When you want to obtain predictions, all you need to know is the task ID of the head you want to look at:

taskID = 0  # obtain predictions from head 0
outputs = model(test_data, training=False)
predictions = outputs[taskID]
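
Indexing the outputs like this still computes every head. If you want a single servable model that takes (text, task_id) and evaluates only the selected head, one possible approach (just a sketch) is to dispatch with tf.switch_case inside a tf.function and export the result as a SavedModel. Here backbone is assumed to be the shared trunk, e.g. models.Model(model.input, model.get_layer('penultimate_layer').output), heads the list of per-task Dense layers, and every head is assumed to return a tensor of the same dtype:

import tensorflow as tf

class MultiTaskServing(tf.Module):
    def __init__(self, backbone, heads):
        super().__init__()
        self.backbone = backbone
        self.heads = heads

    @tf.function(input_signature=[
        tf.TensorSpec([None], tf.string, name="text"),  # adjust to your real input spec
        tf.TensorSpec([], tf.int32, name="task_id"),
    ])
    def serve(self, text, task_id):
        features = self.backbone(text, training=False)
        # only the branch selected by task_id is executed
        branch_fns = [lambda h=h: h(features) for h in self.heads]
        return tf.switch_case(task_id, branch_fns)

module = MultiTaskServing(backbone, heads)
tf.saved_model.save(module, "export_dir",
                    signatures={"serving_default": module.serve})

Since google-ai-platform serves standard SavedModels, this keeps deployment simple while storing the shared backbone weights only once plus one small Dense head per task.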

If you want to train new heads later on, while still sharing the same backbone, you have to freeze the other heads; otherwise they will be trained too, and you don't want that:

for layer in model.layers:
    if "task" in layer.name:
        layer.trainable = False
# code to add the new head ...

Training new tasks (that is, a new set of classes) at a later moment is called task-incremental learning. The major issue with this is catastrophic forgetting: it is pretty easy to forget prior knowledge while training new tasks, because even if the heads are frozen, the backbone obviously isn't. If you go down this route you'll have to apply some technique to mitigate it.

answered Sep 4, 2022 at 13:00

2 Comments

Thanks for your answer, it gives me a good idea of what to do. Are there cases where we would want to keep the backbone frozen when training for new tasks?
Yes, you can definitely try it out. This ensures old tasks are not forgotten: their accuracy will remain exactly the same. However, whether or not you'll learn the new task well will depend a lot on your data. If the tasks are very different it might be harder to learn the new one just by fine-tuning the head. Personally I would give it a try.
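
For what it's worth, a minimal sketch of that backbone-freezing variant (assuming, as above, that only the head layers are named task<i>):

# keep the shared backbone fixed so old tasks are untouched;
# only the newly added head's weights get updated during training
for layer in model.layers:
    if not layer.name.startswith("task"):
        layer.trainable = False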
