An anomaly detection system is a real-time surveillance program designed to automatically detect and report signs of offensive or disruptive activity as soon as they occur.
In multiple instance learning (MIL), the precise temporal locations of anomalous events in the videos are unknown. Only video-level labels indicating the presence of an anomaly are available.
Each video is treated as a bag. If an instance of the video contains an anomaly, we label it a positive bag (anomalous video); otherwise we consider it a negative bag (normal video).
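The bag labelling rule can be sketched in a few lines of Python. This is only an illustration: the function name bag_label and the per-instance flags are hypothetical, and in actual MIL training the per-instance flags are not known, only the resulting video-level label.

```python
def bag_label(instance_is_anomalous):
    """Return 1 (positive bag) if any instance is anomalous, else 0 (negative bag)."""
    return 1 if any(instance_is_anomalous) else 0


# Hypothetical per-instance flags, used only to illustrate the rule.
# During MIL training these are unknown; only the bag label is given.
anomalous_video = [False, False, True, False]
normal_video = [False, False, False, False]

print(bag_label(anomalous_video))  # 1
print(bag_label(normal_video))     # 0
```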
The C3D model is given an input video segment of 16 frames (after downsampling to a fixed size, which depends on the dataset used) and outputs a 4096-element vector.
The fully connected layers have a size of 4096 dimensions, and their output is used in the DNN model for calculating the anomaly score.
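As a rough sketch of how a video could be cut into 16-frame segments before feature extraction, the helper below computes the frame ranges of consecutive clips. The name clip_ranges is hypothetical, and the choices of non-overlapping clips and dropping leftover frames are assumptions, not something this project specifies. Each range would then be passed through C3D to obtain one 4096-element vector.

```python
CLIP_LEN = 16  # frames per C3D input segment, as stated above


def clip_ranges(num_frames, clip_len=CLIP_LEN):
    """Return (start, end) frame index pairs for consecutive full clips.

    Assumption: clips do not overlap, and trailing frames that do not
    fill a whole clip are dropped.
    """
    return [(start, start + clip_len)
            for start in range(0, num_frames - clip_len + 1, clip_len)]


print(clip_ranges(50))  # [(0, 16), (16, 32), (32, 48)]
```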
Another approach
I3D inflates the filters of a 2D CNN model into 3D convolutions. Several convolutions and a max pooling are applied in parallel to the output of the previous layer, and their results are concatenated; this block is called an inception module.
The I3D architecture gives a 1024-dimensional feature, which is used in the DNN model for calculating the anomaly score.
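To see where the 1024 dimensions can come from, the small calculation below adds up the channels of the parallel branches that are concatenated in the last inception module. The branch widths are an assumption based on the final module of the Inception-v1 network that I3D inflates; they are not taken from this project's code.

```python
# Output channels of the four parallel branches of the last inception module.
# Assumption: widths follow the final Inception-v1 module that I3D inflates.
branch_channels = {
    "branch_0 (1x1x1 conv)": 384,
    "branch_1 (1x1x1 conv, then larger conv)": 384,
    "branch_2 (1x1x1 conv, then larger conv)": 128,
    "branch_3 (max pool, then 1x1x1 conv)": 128,
}

# Concatenation stacks the branches along the channel axis,
# so the output width is the sum of the branch widths.
feature_dim = sum(branch_channels.values())
print(feature_dim)  # 1024
```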
Approach
The features of each 16-frame clip (4096-D from C3D and 1024-D from I3D) are fed into a 3-layer feed-forward neural network. The network is trained with forward and backward propagation, using a hinge loss formulation with sparsity and smoothness terms.
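A minimal pure-Python sketch of the forward pass and the loss is given below. It is an illustration under stated assumptions, not this project's training code: the layer sizes are toy values (8, 4, 2, 1) instead of the real 4096-D or 1024-D inputs, the ReLU and sigmoid activations are assumed, and the weights w_smooth and w_sparse are placeholder values. All function names are hypothetical. Backward propagation is omitted.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Create one dense layer as (weights, biases) with small random weights."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer, activation):
    """Apply one dense layer followed by an element-wise activation."""
    weights, biases = layer
    return [activation(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def anomaly_score(feature, layers):
    """Forward pass: two ReLU layers, then a sigmoid output in [0, 1]."""
    h = dense(feature, layers[0], relu)
    h = dense(h, layers[1], relu)
    return dense(h, layers[2], sigmoid)[0]


def mil_ranking_loss(pos_scores, neg_scores, w_smooth=8e-5, w_sparse=8e-5):
    """Hinge loss on the top-scoring instances plus smoothness and sparsity terms.

    Assumption: the weights are placeholders, not values from this project.
    """
    # Hinge: the highest score in the positive bag should exceed the
    # highest score in the negative bag by a margin of 1.
    hinge = max(0.0, 1.0 - max(pos_scores) + max(neg_scores))
    # Smoothness: scores of neighbouring segments should change slowly.
    smooth = sum((a - b) ** 2 for a, b in zip(pos_scores, pos_scores[1:]))
    # Sparsity: only a few segments of an anomalous video should score high.
    sparse = sum(pos_scores)
    return hinge + w_smooth * smooth + w_sparse * sparse


rng = random.Random(0)
layers = [make_layer(8, 4, rng), make_layer(4, 2, rng), make_layer(2, 1, rng)]

# Toy bags: 5 segments each, 8-D random features standing in for C3D/I3D features.
pos_bag = [[rng.random() for _ in range(8)] for _ in range(5)]
neg_bag = [[rng.random() for _ in range(8)] for _ in range(5)]

pos_scores = [anomaly_score(f, layers) for f in pos_bag]
neg_scores = [anomaly_score(f, layers) for f in neg_bag]
print(mil_ranking_loss(pos_scores, neg_scores))
```

With untrained random weights the two bags score almost the same, so the hinge term stays close to its margin of 1; training is meant to push the top positive score above the top negative score.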
Explosion041_x264
Abuse040_x264
We trained our model for 4000 iterations with a batch size of 32 and a learning rate of 0.01. The resulting sum of the hinge loss, sparsity loss and smoothness loss is 1.7413.
Explosion051_x264
Assault048_x264
We trained our I3D model for 10000 iterations with a batch size of 32 and a learning rate of 0.01. The resulting sum of the hinge loss, sparsity loss and smoothness loss is 2.23.
The trained I3D model gives more accurate results than the C3D model.