Non-linear fluorescence for simulation and trajectory reconstruction. #77
-
Hello,
First and foremost, thank you for your work and for making it open source and available. The only alternative I could find for such a specific function was commercial.
I have a question regarding the Fluorescence module and noise simulation: my team is using two-photon microscopy to track nanoparticles. This means we have a relatively good SNR with little background noise, but second-harmonic generation coupled with the very small size of our fluorophore means we detect a very small number of photons.
This induces a lot of spatial noise, a 'jitter' of some sort.
Here's an example of a single static particle:
[animated example of a single static particle]
Could there be a way to simulate such a phenomenon?
For now I've only experimented with simulating regular, cohesive particles, and the quality of the predictions is limited by my simulations as well as by the technical limitations of my computer and Google Colab.
Secondly, as I understand it, the prediction output is an array of coordinates where spots are located, for each frame.
Have you considered implementing a 'linking' algorithm in order to link those spots into trajectories over time, frame by frame?
We already use algorithmic approaches to do this, and I can reshape DeepTrack outputs to feed directly into them, but I would be interested to see whether there are AI-based, parameter-free approaches to this problem.
Thanks a lot!
-
Hi! We're happy to be of use!
I have definitely been able to get outputs resembling yours! Something like
```python
particle = dt.PointParticle()
optics = dt.Fluorescence(output_region=(0, 0, 64, 64))
pipeline = optics(particle) + dt.Poisson(snr=0.05)
pipeline.plot(cmap="gray")
```
[example output image]
gives something closely resembling your output! The snr in Poisson could more accurately be thought of as scaling with the photon count. The important thing is that the background is very close to 0.
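If you want to match the look of your data more closely, one quick experiment (just a sketch reusing the objects above) is to sweep the snr value and compare the result against your recordings:

```python
# sweep a few Poisson snr values to see which best reproduces the experimental "jitter"
for snr in (0.02, 0.05, 0.1):
    noisy_pipeline = optics(particle) + dt.Poisson(snr=snr)
    noisy_pipeline.plot(cmap="gray")
```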
Linking is something we are actively working on, and we expect to have a graph neural network-based particle tracer for linking frame-to-frame in a month or so!
-
Great!
np.array(validation_set + art_validation_set): here validation_set and art_validation_set are lists, so + does list concatenation. np.array(validation_set) + np.array(art_validation_set) should work better! If not, check the shape and make sure it matches (validation_set_size, 512, 512, 1); that gives you a reference point for where to debug.
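As a quick sanity check (a sketch, using the variable names from your notebook):

```python
import numpy as np

validation_data = np.array(validation_set) + np.array(art_validation_set)
print(validation_data.shape)  # expect (validation_set_size, 512, 512, 1)
```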
It is completely fine to reduce the amount of data. In fact, I would assume a min_data_size and max_data_size of a few hundred are completely sufficient. However, if you reduce min_data_size by a factor of 10, you need to increase the number of epochs by a factor of 10 to make the training equivalent. Also, a batch_size of 4-8 is enough to get a great model. If the model predicts essentially all zeros (as in your case), I would train it for longer using the weighted loss, perhaps even increasing the weight of the weighted loss. For now, I would skip the second stage where the model is trained on the unweighted loss, and just use the weighted loss. Make sure the results are reasonable after this step.
Training on the unweighted loss is meant to further refine the model and reduce false positives, but it also makes training slightly unstable, which is why it is not done from the start. I think your poor results stem from not training long enough on the weighted loss to stabilize the model, since the data size was reduced without increasing the number of epochs!
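To make that concrete, here is a minimal sketch of what I mean, assuming `model` and `generator` are the ones from your notebook and that the model can be recompiled like a standard Keras model (the weight and epoch count are just illustrative):

```python
# weighted loss only; skip the unweighted refinement stage for now
loss = dt.losses.flatten(
    dt.losses.weighted_crossentropy((10, 1))
)
model.compile(optimizer="adam", loss=loss)

# roughly compensate a 10x smaller min_data_size with 10x more epochs
with generator:
    model.fit(generator, epochs=300)
```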
-
Validation works!
I did manage to increase training time by increasing the number of epochs while keeping the batch size and dataset size relatively low.
However, one of the reasons I initially reduced training time was that the loss, as well as the other metrics, wouldn't change after 2 or 3 epochs. I don't know if it's just a display issue, but the prediction results were not proportionate to the number of epochs used for training, even when the difference in training time is an order of magnitude.
For instance, training a model for 150 epochs gives this in the first few lines:
Generating 200 / 200 samples before starting training
Epoch 1/150
50/50 [==============================] - 43s 509ms/step - loss: 0.0151 - nd_unet_crossentropy: 0.0774 - val_loss: 0.0015 - val_nd_unet_crossentropy: 7.9269e-04
Epoch 2/150
50/50 [==============================] - 16s 328ms/step - loss: 0.0015 - nd_unet_crossentropy: 7.9165e-04 - val_loss: 0.0015 - val_nd_unet_crossentropy: 7.9269e-04
Epoch 3/150
50/50 [==============================] - 16s 327ms/step - loss: 0.0015 - nd_unet_crossentropy: 7.9218e-04 - val_loss: 0.0015 - val_nd_unet_crossentropy: 7.9269e-04
Epoch 4/150
50/50 [==============================] - 16s 325ms/step - loss: 0.0015 - nd_unet_crossentropy: 7.9165e-04 - val_loss: 0.0015 - val_nd_unet_crossentropy: 7.9269e-04
Epoch 5/150
50/50 [==============================] - 16s 326ms/step - loss: 0.0015 - nd_unet_crossentropy: 7.9165e-04 - val_loss: 0.0015 - val_nd_unet_crossentropy: 7.9269e-04
Up to the last few:
Epoch 148/150
50/50 [==============================] - 16s 329ms/step - loss: 0.0015 - nd_unet_crossentropy: 7.9192e-04 - val_loss: 0.0015 - val_nd_unet_crossentropy: 7.9269e-04
Epoch 149/150
50/50 [==============================] - 17s 340ms/step - loss: 0.0015 - nd_unet_crossentropy: 7.9192e-04 - val_loss: 0.0015 - val_nd_unet_crossentropy: 7.9269e-04
Epoch 150/150
50/50 [==============================] - 17s 340ms/step - loss: 0.0015 - nd_unet_crossentropy: 7.9165e-04 - val_loss: 0.0015 - val_nd_unet_crossentropy: 7.9269e-04
As I didn't change anything in the model itself, I wonder if my data generator might be the issue here? It seems like it's training on the same data over and over. Or maybe it has something to do with how Google Colab handles it, but that would seem strange.
-
To me, this looks like a model that predicts all 0s. You can validate that your generator is creating reasonable data using
```python
with generator:
    for data, label in generator:
        plt.subplot(1, 2, 1)
        plt.imshow(data[0, ..., 0])
        plt.subplot(1, 2, 2)
        plt.imshow(label[0, ..., 0])
        plt.show()
```
I would also look at whether your images have reasonable intensities. If the max value is very low (1e-2 or lower), the model may not be able to train! In general, the pixel-intensity histogram of your training data should overlap nicely with that of the experimental data. You may also want to increase the loss weighting slightly, such as
```python
loss = dt.losses.flatten(
    dt.losses.weighted_crossentropy((25, 1))
)
```
As a point of reference, the weight should be roughly (very roughly) 1 / np.mean(labels). So, if 0s are 10 times more common in your labels than 1s, it should be about 10.
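As a rough sketch (where `labels` stands for your stacked label masks; that name is just a placeholder):

```python
import numpy as np

# heuristic: weight roughly equal to the inverse of the label density
weight = 1 / np.mean(labels)
loss = dt.losses.flatten(
    dt.losses.weighted_crossentropy((weight, 1))
)
```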
You'll probably be able to see if the model is learning after only a few epochs. When things are working well, it learns very quickly!
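And for the intensity check mentioned above, a quick way to compare the two distributions (the array names below are placeholders for your own data):

```python
import numpy as np
import matplotlib.pyplot as plt

plt.hist(np.ravel(simulated_images), bins=100, density=True, alpha=0.5, label="simulated")
plt.hist(np.ravel(experimental_images), bins=100, density=True, alpha=0.5, label="experimental")
plt.legend()
plt.show()
```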
-
I found (one of) the culprits: I had deliberately lowered the CIRCLE_RADIUS parameter used to generate labels, to see whether the datasets would properly label ground-truth particles and not artifactual ones.
Correct me if I'm wrong, but as I understand it, particle-tracking models are essentially classifiers returning, for each pixel, the probability that it belongs to a valid spot, and subpixel accuracy is derived from the center of the label rather than from the spot itself, which would mean I need my label size to be as close as possible to the size of my experimental spots?
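Concretely, my labels are just disks of radius CIRCLE_RADIUS around each ground-truth position, roughly along the lines of this simplified sketch:

```python
import numpy as np

def make_label(positions, shape=(512, 512), radius=3):
    # simplified labeling: a disk of `radius` pixels around each ground-truth (y, x) position
    yy, xx = np.mgrid[:shape[0], :shape[1]]
    label = np.zeros(shape)
    for y, x in positions:
        label[(yy - y) ** 2 + (xx - x) ** 2 <= radius ** 2] = 1
    return label[..., np.newaxis]
```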
Matching the size of the spots did improve training a lot, and I managed to get some results on generated data, albeit with a low threshold and some false positives and negatives.
The loss still stagnates around epoch 5 or 6, however. Increasing the weight does lower the loss and increases the rate at which it drops, but does not seem to extend the period during which it decreases. Increasing the minimum amount of data also didn't change much.
I would also look at whether your images have reasonable intensities.
For now I use fixed, and fairly strong, intensities, as I am still learning how to use the PointParticle and Fluorescence objects and playing around with the parameters to obtain something that is visually as close as possible to my experimental data, rather than sticking to the physical parameters we actually use. Could that be an issue?
Thanks!
Update:
I messed around with several parameters, specifically the minimum data size, batch size, weight, and even the size of the validation dataset. I got some pretty good results on simulated pictures, with no false positives and few false negatives in most cases. Results on experimental data are still rough, but I believe they could improve with some fine-tuning of my data simulation model.
However, the decrease in loss still stops pretty early. Pushing the parameters lowers the loss overall and improves the rate at which it decreases, but it still stagnates after 10 or so epochs at most. I can essentially "brute force" the model to reach a loss low enough for good results before it stagnates, but from what I've read and what you wrote before, it seems that proper training should take longer.
I also noticed issues when using np.random.normal instead of rand and randint when randomizing data, but that might just be an issue with TensorFlow itself !
-
Great! The rate of decrease and the time it takes for the training to stagnate depend mostly on the difficulty of the problem and are hard to predict from the start. I don't think 10 epochs is entirely unreasonable! Sometimes, decreasing the learning rate can help optimize the model further. In the end, I think you're on the home stretch! At this point it's mostly about testing hyperparameters to optimize the results.
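In plain Keras, one way to lower the learning rate when the loss stops improving is a ReduceLROnPlateau callback (just a sketch, not something built into DeepTrack, and assuming the `model` and `generator` from your notebook):

```python
from tensorflow.keras.callbacks import ReduceLROnPlateau

# halve the learning rate whenever the training loss has not improved for 5 epochs
reduce_lr = ReduceLROnPlateau(monitor="loss", factor=0.5, patience=5, min_lr=1e-6)

with generator:
    model.fit(generator, epochs=150, callbacks=[reduce_lr])
```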
Regardless, what will be orders of magnitude more important to the final quality is how well your simulation pipeline approximates your experimental data. So that's where I would put most of my effort!