This project implements our 3-step RTN and aRTN analysis algorithm, which is the subject of our paper submitted to IEEE EDL.
Please refer to our submitted paper for a detailed description of the theory of our algorithm. A preprint is available here.
See also:
To set up the project:
- Clone the repository and enter the directory:
    git clone ist-git@git.uwaterloo.ca:QuINLab/Projects/Noise/analysis-algorithm.git
    cd analysis-algorithm
- Create a new conda environment with pip installed:
    conda create -n analysis_algorithm pip
- Activate the new environment:
    conda activate analysis_algorithm
- Install the requirements:
    pip install -r requirements.txt
This project uses many different types of files. There are two "design value" files for each example:
files ending in _signals.feather and _parameters.csv.
There are also many intermediate files and result files. Each step saves its results to a file so that a single part of the algorithm can easily be run on its own. For example, when I am working on the GMM step, I don't want to have to rerun the KDE step every time, which may be slow. It is also interesting to plot these intermediate values.
Full list of files:
- _parameters.csv: The design values of the example. There is one row for each trap. The columns are amplitude, tau_high, and tau_low (there may be more columns for some aRTN cases).
- _signals.feather: The raw time-series signal with 1 million points. This file includes the "design value" signal for each trap (labelled trap_{i}), the sum of all pure RTN components (rtn_sum), and the white noise (white_noise). The algorithm is not allowed to look at these; they are only for calculating the accuracy after the algorithm has run. The main column is full_signal, which is the sum of rtn_sum and white_noise. There may also be some temporary columns added by certain steps.
- _kde_data.feather: The probability density function and other data output by the KDE step. I recommend using df.squeeze() after reading this file to convert the Pandas DataFrame into a Series. This file includes many columns:
  - intensity: The independent variable of the KDE. This column is an array of equispaced points along the intensity axis of the signal. It is like current, but we are trying to be agnostic to the measurement type (current/voltage/other). It has negative values because the intensity of the signal can sometimes be negative.
  - density: The probability density, which is the dependent variable of the KDE. This column is an array in which each value is the probability of the intensity with the same index. This is a probability and should always be positive. The sum of this array should be 1.
  - raw_density and raw_intensity: The same as density and intensity, but for the unfiltered raw signal. These are not used by the algorithm, only to make the gray plot in the KDE figure.
  - peaks_intensities: A list of the intensities of the detected peaks. This is the seed data for the GMM.
  - peaks_densities: A list of the densities of the detected peaks. This is not used by the algorithm. It is only saved to draw a marker over each peak in the KDE plot.
  - window: The width of the rolling mean used, which is a function of the estimated white noise.
- _timer_kde.txt: Time taken for the KDE step.
- _decomp_data_traps.csv: Result of the GMM step: the predicted amplitude of each trap. This is a dataframe with two columns, trap and sep. sep is the separation between two fitted Gaussian functions (i.e. the trap switching amplitude). trap is the trap index (arbitrary and may be different from the indices used elsewhere).
- _sep_error.feather: Accuracy of the GMM step: the error in the predicted amplitude of each trap (reads design values, so it cannot be used later in the algorithm).
- _gmm_fit.lmfit: The full fitted Gaussian mixture model from lmfit (the library used for the GMM).
- _timer_decompose.txt: Time taken for the GMM step.
- _time_series_predictions.feather: The raw predictions of the RNN step.
- _tf_model.h5: Weights of the trained RNN model.
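The properties stated above for the intensity and density columns can be turned into quick sanity checks. The following is a minimal sketch that runs on synthetic in-memory data rather than on a real _kde_data.feather file; the helper name check_kde_columns and the tolerance are hypothetical and not part of this repository.

```python
import math


def check_kde_columns(intensity, density, rel_tol=1e-6):
    """Return a list of problems found in KDE-style columns (empty if none).

    Hypothetical helper: it checks only the properties described in the
    file list (equispaced intensity, non-negative density, density sum of 1).
    """
    problems = []
    if len(intensity) != len(density):
        problems.append("intensity and density differ in length")
    steps = [b - a for a, b in zip(intensity, intensity[1:])]
    if steps and not all(math.isclose(s, steps[0], rel_tol=rel_tol) for s in steps):
        problems.append("intensity is not equispaced")
    if any(d < 0 for d in density):
        problems.append("density has negative values")
    if not math.isclose(sum(density), 1.0, rel_tol=rel_tol):
        problems.append("density does not sum to 1")
    return problems


# Synthetic example: a Gaussian bump on an equispaced grid that spans
# negative and positive intensities, normalised so the values sum to 1.
intensity = [-1.0 + 0.01 * i for i in range(301)]
weights = [math.exp(-0.5 * ((x - 0.5) / 0.2) ** 2) for x in intensity]
total = sum(weights)
density = [w / total for w in weights]

print(check_kde_columns(intensity, density))
```

On real data, the same checks could be applied to the intensity and density entries obtained after reading the file and calling df.squeeze().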
The class Example in example.py was created to help manage all of these files.
Since many of these files must be read for a given example,
this class lets you quickly reach each related file
just by accessing the corresponding attribute on the class.
It also lets you read and write each file
with the method appropriate to its data type (.csv, .feather, .h5, and .txt all have different functions to read and write).
Please see example.py for the full capabilities.
For example:
    import pandas as pd
    from example import Example

    example = Example('<some_example>_signals.feather')
    print(example.path)  # <some_example>_signals.feather
    print(example.parameters.path)  # <some_example>_parameters.csv
    print(example.parameters.read())  # DataFrame of the contents of <some_example>_parameters.csv
    example.kde_data.write(pd.DataFrame())  # Saves an empty dataframe to <some_example>_kde_data.feather
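To illustrate the idea of reaching related files from a single path, here is a minimal sketch of suffix-based lookup using only pathlib. It is not the implementation in example.py; the function sibling_path and the file name example_01 are hypothetical, and the only assumption is that all files of one example share the prefix that precedes _signals.feather.

```python
from pathlib import Path

SIGNALS_SUFFIX = "_signals.feather"


def sibling_path(signals_path, suffix):
    """Return the path of a related file that shares the example's prefix."""
    path = Path(signals_path)
    if not path.name.endswith(SIGNALS_SUFFIX):
        raise ValueError(f"expected a path ending in {SIGNALS_SUFFIX!r}")
    # Strip the signals suffix to recover the example prefix.
    prefix = path.name[: -len(SIGNALS_SUFFIX)]
    return path.with_name(prefix + suffix)


signals = "data/example_01_signals.feather"
print(sibling_path(signals, "_parameters.csv"))
print(sibling_path(signals, "_kde_data.feather"))
```

Keeping the lookup purely name-based means no file has to exist before its path can be computed, which suits steps that write their output for the first time.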
Many files (the large ones) are saved in .feather files instead of .csv.
This is because .csv is very inefficient.
It is a text-based format, so when saving the file, every number has to be converted into text (which is time-inefficient)
and this text is saved on the disk (which is very space-inefficient).
When the file is read, the text must be parsed back into numbers, which is time-inefficient.
Therefore, we use .feather files, which are efficient binary files, solving both of these issues.
Some small files are still saved as .csv (short list of parameters, accuracy results, etc.) for convenience.
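The size and parsing argument can be illustrated without any Feather dependency. The sketch below uses only the standard library to compare the same numbers stored as text and as fixed-size binary doubles. It is not the Feather format itself, only a demonstration of the text-versus-binary trade-off described above; the sample values are synthetic.

```python
import random
from array import array

random.seed(0)
values = [random.gauss(0.0, 1.0) for _ in range(10_000)]

# Text encoding, as a CSV column would store it: one decimal string per number.
as_text = "\n".join(repr(v) for v in values).encode("ascii")

# Binary encoding: each float stored as a fixed-size double.
as_binary = array("d", values).tobytes()

print(len(as_text), "bytes as text")
print(len(as_binary), "bytes as binary")

# Reading text back requires parsing every number; binary is copied directly.
parsed = [float(line) for line in as_text.decode("ascii").split("\n")]
restored = array("d")
restored.frombytes(as_binary)

assert parsed == values
assert list(restored) == values
```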
The main code to do the three steps of our algorithm is located in ./main_algorithm/.
There are three subfolders for our three algorithm steps:
Here, the prefixes a_, b_, c_ are used to order the steps because Python files may not start with a number.
Running the algorithm is currently done one step at a time. Later steps use data saved in files by earlier steps. This has been very helpful while working on a single step, but it would be nice to eventually add a method to run the algorithm start-to-finish.
- KDE: python main.py process-kde FILES
- GMM: python main.py process-gmm FILES
- RNN: python main.py train FILES
Please see USAGE.md for more information.
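Until a start-to-finish method exists, the three commands above could be chained from a small script. The following is only a sketch under stated assumptions: that FILES can be passed as one or more positional paths, that main.py is in the current working directory, and that a non-zero exit code signals failure. The function names and the file name example_signals.feather are hypothetical; USAGE.md remains the reference for the actual arguments.

```python
import subprocess
import sys

# Subcommands as listed above, in the order the steps depend on each other.
STEP_SUBCOMMANDS = ["process-kde", "process-gmm", "train"]


def build_pipeline_commands(files):
    """Build one argv list per step; nothing is executed here."""
    return [[sys.executable, "main.py", step, *files] for step in STEP_SUBCOMMANDS]


def run_pipeline(files):
    """Run the steps in order, stopping at the first failure."""
    for command in build_pipeline_commands(files):
        subprocess.run(command, check=True)


# Preview the commands without running them.
for command in build_pipeline_commands(["example_signals.feather"]):
    print(" ".join(command[1:]))
```

Separating command construction from execution keeps the preview free of side effects, so the sequence can be inspected before any step is actually run.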
To plot the results of each step:
- KDE: python main.py plot-kde FILES
- GMM: Already handled by process-gmm
- RNN: python main.py plot-time-series-predictions
Please see USAGE.md for more information.
To aggregate the errors:
- Switching amplitude: python main.py aggregate-amplitude-error FILES
- Digitization: python main.py aggregate-digitization-tau-error FILES
Please see USAGE.md for more information.
The code to generate RTN data is in data_generation.
This folder contains the files to generate validation data (like our 330 normal RTN dataset)
as well as files to generate training data used by the RNN (the two share the same core functions).
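As a rough illustration of what such data looks like, here is a minimal standard-library sketch that builds a two-trap signal with the same column names as described for _signals.feather. It is not the code in data_generation. It assumes exponentially distributed dwell times with means tau_high and tau_low, unit time steps, and Gaussian white noise, and all numeric values are arbitrary.

```python
import random


def generate_rtn(n_points, amplitude, tau_high, tau_low, rng):
    """Two-state telegraph signal sampled at unit time steps.

    Assumes exponentially distributed dwell times with mean tau_high in the
    high state and tau_low in the low state (a common RTN model). Dwell
    times are rounded to whole samples, with a minimum of one sample.
    """
    signal = []
    high = False
    while len(signal) < n_points:
        mean_dwell = tau_high if high else tau_low
        dwell = max(1, round(rng.expovariate(1.0 / mean_dwell)))
        signal.extend([amplitude if high else 0.0] * dwell)
        high = not high
    return signal[:n_points]


rng = random.Random(0)
n_points = 10_000
trap_0 = generate_rtn(n_points, amplitude=1.0, tau_high=50, tau_low=100, rng=rng)
trap_1 = generate_rtn(n_points, amplitude=0.4, tau_high=20, tau_low=30, rng=rng)

# Same naming as the columns of _signals.feather.
rtn_sum = [a + b for a, b in zip(trap_0, trap_1)]
white_noise = [rng.gauss(0.0, 0.1) for _ in range(n_points)]
full_signal = [r + w for r, w in zip(rtn_sum, white_noise)]

print(len(full_signal), sorted(set(trap_0)))
```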
The code to make "fancy" figures like those included in manuscripts
is in results_presentation.
There are more details in the README.md of that folder.
The data_exploration folder holds the code to generate
less fancy figures for data exploration.
These figures are typically meant for understanding the input data rather than for presenting results.
The folder results_aggregation holds files to aggregate the results from the intermediate files for each example
into a single file. These files are very messy because of the many different result formats.
The folder experiments_debugging holds
temporary code, works in progress, and debugging code.
Please see USAGE.md for the full usage.