Backend
- seeq.addons.mps._mps.gather_workbook_worksheet_meta_data(workbook_id, worksheet_id)
This function gathers the workbook object data and the worksheet index.
- Parameters:
workbook_id (str) – The Seeq ID of the source workbook
worksheet_id (str) – The Seeq ID of the source worksheet
- Returns:
desired_workbook (list of seeq.spy.workbooks._workbook objects) – The matching workbook objects, including the Seeq SPy workbook metadata.
sheet_index (int) – Integer detailing the index of the source worksheet.
- seeq.addons.mps._mps.known_select(data_pull_c, data_pull_known, select_)
This function uses the known start and end time of the reference capsule/s and extracts the time series data from the entire time series data set within the investigation range of the worksheet.
- Parameters:
data_pull_c (pd.DataFrame, pd.Series) – A dataframe or series that minimally has columns of “X, …, Y” signals requested to be pulled (detailed in items_s_ref input variable) and ‘Date-Time’ with an index of timestamps. [X, …, Y,’Date-Time’]. This dataframe has all the time series data for the reference profile.
data_pull_known (pd.DataFrame, pd.Series) – A dataframe or series that minimally has columns of “X, …, Y” signals requested to be pulled (detailed in items_s_ref input variable) and ‘Date-Time’ with an index of timestamps. [X, …, Y,’Date-Time’]. This dataframe has all the time series data for the analysis/search area.
select_ (str) – Name of the selected reference capsule.
- Returns:
known (pd.DataFrame, pd.Series) – A dataframe or series that minimally has columns of “X, …, Y” signals requested to be pulled (detailed in items_s_ref input variable) and ‘Date-Time’ with an index of timestamps. [X, …, Y,’Date-Time’]. This dataframe has strictly only the time series data for the reference profile start and end.
knownlength (int) – Length of the known dataframe.
- seeq.addons.mps._mps.load_ref(load_name, mypath)
This function loads the reference profile time series data and metadata from a previously saved pickle file.
- Parameters:
load_name (str) – Name of the pickle file to load.
mypath (str) – Path to folder containing the pickle file to load.
- Returns:
items_s_ref (pd.DataFrame, pd.Series) – A dataframe or series that minimally has columns of [‘Name’, ‘ID’, ‘Type’, ‘Color’, ‘Line Style’, ‘Line Width’, ‘Lane’, ‘Samples Display’, ‘Axis Auto Scale’, ‘Axis Align’, ‘Axis Group’, ‘Axis Show’] to detail the signals that describe the known reference profile.
data_pull_known (pd.DataFrame, pd.Series) – A dataframe or series that minimally has columns of [‘Condition’, ‘Capsule Start’,’Capsule End’,’Capsule Is Uncertain’] to detail the capsule of the known reference profile capsule.
- seeq.addons.mps._mps.pull_mps_data(workbook_id, worksheet_id, signal_pull_list, items_s_ref, data_pull_known, time_frame, grid)
This function gathers all the time series data required for the analysis, for the reference and the search area.
- Parameters:
workbook_id (str) – The Seeq ID of the source workbook.
worksheet_id (str) – The Seeq ID of the source worksheet.
signal_pull_list (list of str) – List of strings that detail the signal names to describe the reference profile.
items_s_ref (pd.DataFrame, pd.Series) – A dataframe or series that minimally has columns of [‘Name’, ‘ID’, ‘Type’, ‘Color’, ‘Line Style’, ‘Line Width’, ‘Lane’, ‘Samples Display’, ‘Axis Auto Scale’, ‘Axis Align’, ‘Axis Group’, ‘Axis Show’] to detail the signals that describe the reference profile.
data_pull_known (pd.DataFrame, pd.Series) – A dataframe or series that minimally has columns of [‘Condition’, ‘Capsule Start’, ‘Capsule End’, ‘Capsule Is Uncertain’] to detail the capsule of the known reference profile.
time_frame (list of datetime) – Start and end of the analysis range to search for the known capsule in the spy pull.
grid (str) – Resolution/griding of the spy pull.
- Returns:
data_pull (pd.DataFrame, pd.Series) – A dataframe or series that minimally has columns of “X, …, Y” signals requested to be pulled (detailed in items_s_ref input variable) and ‘Date-Time’ with an index of timestamps. [X, …, Y, ‘Date-Time’]. This dataframe has all the time series data for the analysis/search area.
data_pull_c (pd.DataFrame, pd.Series) – A dataframe or series that minimally has columns of “X, …, Y” signals requested to be pulled (detailed in items_s_ref input variable) and ‘Date-Time’ with an index of timestamps. [X, …, Y,’Date-Time’]. This dataframe has all the time series data for the reference profile.
sheet_index (int) – Integer detailing the index of the source worksheet.
- seeq.addons.mps._mps.pull_ref_data(items_s_ref, data_pull_known, grid)
This function gathers the time series data of the reference. The reference condition limits the time frame of the data to be gathered, the signal list specifies which variables are gathered, and grid sets the sampling rate or griding.
- Parameters:
items_s_ref (pd.DataFrame, pd.Series) – A dataframe or series that minimally has columns of [‘Name’, ‘ID’, ‘Type’, ‘Color’, ‘Line Style’, ‘Line Width’, ‘Lane’, ‘Samples Display’, ‘Axis Auto Scale’, ‘Axis Align’, ‘Axis Group’, ‘Axis Show’] to detail the signals that describe the known reference profile.
data_pull_known (pd.DataFrame, pd.Series) – A dataframe or series that minimally has columns of [‘Condition’, ‘Capsule Start’, ‘Capsule End’, ‘Capsule Is Uncertain’] to detail the capsule of the known reference profile.
grid (str) – Resolution/griding of the spy pull.
- Returns:
data_pull_c – A dataframe or series that minimally has columns of “X, …, Y” signals requested to be pulled (detailed in items_s_ref input variable) and ‘Date-Time’ with an index of timestamps. [X, …, Y,’Date-Time’]. This dataframe has all the time series data for the reference profile.
- Return type:
pd.DataFrame, pd.Series
- seeq.addons.mps._mps.push_mps_results(Return_top_x, min_idx_multivar, data_pull, workbook_id, condition_name, Sheet_index, grid)
This function pushes the % similarity score as a % dis-similarity time series signal to a new worksheet within the desired workbook when mps is in continuous mode. In addition, each variable’s % contribution to the dis-similarity is pushed as a separate signal per variable.
- Parameters:
Return_top_x (int) – Variable to limit number of top found capsules.
min_idx_multivar (numpy.ndarray) – numpy array of three columns 1st = found capsules similarity measurement (float 0 to 1) 2nd = integer index of each found capsule relative to data_pull 3rd = integer describing the duration/length of the found capsule (each integer is defined by griding in ‘pull_mps_data’)
data_pull (pd.DataFrame, pd.Series) – A dataframe or series that minimally has columns of “X, …, Y” signals requested to be pulled (detailed in items_s_ref input variable) and ‘Date-Time’ with an index of timestamps. [X, …, Y,’Date-Time’]. This dataframe has all the time series data for the analysis/search area.
workbook_id (str) – The Seeq ID of the source workbook.
condition_name (str) – Name of condition to leverage in the pushed item names.
Sheet_index (int) – Integer detailing the index of the source worksheet.
grid (str) – Resolution/griding of the spy pull.
- Returns:
end – Indicator for UI to display successful ending.
- Return type:
bool
- seeq.addons.mps._mps.push_mps_results_batch(batch_sim_df, workbook_id, condition_name, Sheet_index)
This function pushes the % similarity score as a % dis-similarity time series signal to a new worksheet within the desired workbook when mps is in batch mode. In addition, each variable’s % contribution to the dis-similarity is pushed as a separate signal per variable.
- Parameters:
batch_sim_df (pd.DataFrame) – A dataframe or series that minimally has columns of [‘Similarity’, ‘Date-Time’]. This has the resulting similarity measure of each defined capsule with the centered datetime.
workbook_id (str) – The Seeq ID of the source workbook.
condition_name (str) – Name of condition to leverage in the pushed item names.
Sheet_index (int) – Integer detailing the index of the source worksheet.
- Returns:
end – Indicator for UI to display successful ending.
- Return type:
bool
- seeq.addons.mps._mps.save_ref(workbook_id, worksheet_id, signal_pull_list, known_cap, time_frame, grid, save_name, mypath)
This function saves the reference profile time series data and metadata as a pickle file.
- Parameters:
workbook_id (str) – The Seeq ID of the source workbook.
worksheet_id (str) – The Seeq ID of the source worksheet.
signal_pull_list (list of str) – List of strings that detail the signal names to describe the reference profile to be saved.
known_cap (str) – Name of the capsule that defines the reference/s to be saved.
time_frame (list of datetime) – Start and end datetimes of the analysis range to be searched for the known capsule in the seeq.spy.pull.
grid (str) – Resolution/griding of the seeq.spy.pull.
save_name (str) – Name of the pickle file to save.
mypath (str) – Path to the folder in which to save the pickle file.
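The save_ref and load_ref entries describe a pickle round trip. The sketch below illustrates that pattern with the standard library only. It is not the add-on's implementation: the helper names, the ".pkl" suffix, and the dictionary payload are assumptions made for the example.

```python
import pickle
import tempfile
from pathlib import Path


def save_reference(payload, save_name, mypath):
    """Write the payload to <mypath>/<save_name>.pkl and return the file path."""
    target = Path(mypath) / f"{save_name}.pkl"  # the suffix is an assumption
    with open(target, "wb") as handle:
        pickle.dump(payload, handle)
    return target


def load_reference(load_name, mypath):
    """Read back a payload previously written by save_reference."""
    with open(Path(mypath) / f"{load_name}.pkl", "rb") as handle:
        return pickle.load(handle)


# Hypothetical stand-in for the signal metadata and the reference capsule table.
reference = {
    "items_s_ref": [{"Name": "Temperature", "Type": "Signal"}],
    "data_pull_known": [{"Condition": "Reference", "Capsule Is Uncertain": False}],
}

with tempfile.TemporaryDirectory() as folder:
    save_reference(reference, "demo_reference", folder)
    restored = load_reference("demo_reference", folder)
```

Pickle files should only be loaded from trusted locations, since unpickling can execute arbitrary code.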
- seeq.addons.mps._mps.seeq_mps_dtw(data_pull, data_pull_c, data_pull_known, threshold, normalise, sim_, time_distort)
This function measures the distance between the reference and search space time series data over one or many window sizes/durations. It uses the dynamic time warping algorithm, which searches for the smallest distance from each data point to a corresponding reference data point within an assigned search window. The function steps a window through the search area, calculating a distance for each time period. It loops through all the variables in the dataset and sums the distances into an accumulated distance measurement for each step. The function can be instructed to normalise the data before measurement. The distance scores are then converted to % similarity relative to the minimum possible distance (zero) and the maximum distance (the largest of the inverse of itself or the maximum found distance). Finally, the % contribution of each variable/signal to the similarity measurement is calculated. This function is intended to be applied to continuous process data. Technical reference: https://www.cs.unm.edu/~mueen/DTW.pdf
- Parameters:
data_pull (pd.DataFrame, pd.Series) – A dataframe or series that minimally has columns of “X, …, Y” signals requested to be pulled (detailed in items_s_ref input variable) and ‘Date-Time’ with an index of timestamps. [X, …, Y,’Date-Time’]. This dataframe has all the time series data for the analysis/search area.
data_pull_c (pd.DataFrame, pd.Series) – A dataframe or series that minimally has columns of “X, …, Y” signals requested to be pulled (detailed in items_s_ref input variable) and ‘Date-Time’ with an index of timestamps. [X, …, Y,’Date-Time’]. This dataframe has all the time series data for the reference profile.
data_pull_known (pd.DataFrame, pd.Series) – A dataframe or series that minimally has columns of [‘Condition’, ‘Capsule Start’, ‘Capsule End’, ‘Capsule Is Uncertain’] to detail the capsule of the known reference profile.
threshold (float) – 0 to 1 float to set similarity cutoff for found capsules.
normalise (bool) – Set normalisation of the input data to the algo.
sim_ (bool) – Set to return similar or dis-similar results.
time_distort (float) – 0 to 1 float to set % of time distortion of the searching window length in the window stepping of the algo.
- Returns:
found_all_sorted – numpy array of three columns 1st = found capsules similarity measurement (float 0 to 1) 2nd = integer index of each found capsule relative to data_pull 3rd = integer describing the duration/length of the found capsule (each integer is defined by griding in ‘pull_mps_data’)
- Return type:
numpy.ndarray
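As a rough illustration of the window-stepping DTW search described for seeq_mps_dtw, the sketch below computes a band-limited DTW distance per variable and sums it at each window position. It is a minimal pure-Python sketch, not the add-on's code: the function names, the `band` parameter (standing in for the assigned search window), and the toy data are assumptions, and it uses a single window length rather than several.

```python
import math


def dtw_distance(reference, candidate, band):
    """DTW distance between two sequences, restricted to a band around the diagonal.

    `band` is the largest allowed index offset between matched points.
    """
    n, m = len(reference), len(candidate)
    band = max(band, abs(n - m))  # the band must at least cover the length gap
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - band), min(m, i + band) + 1):
            step = abs(reference[i - 1] - candidate[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def sliding_dtw(signals, references, window, band):
    """Sum the per-variable DTW distances at every window start position."""
    n_steps = len(next(iter(signals.values()))) - window + 1
    totals = [0.0] * n_steps
    for name, series in signals.items():
        for start in range(n_steps):
            segment = series[start:start + window]
            totals[start] += dtw_distance(references[name], segment, band)
    return totals


# Toy search area and reference profile for two variables.
signals = {"temp": [0, 0, 1, 2, 1, 0, 0], "flow": [5, 5, 6, 7, 6, 5, 5]}
references = {"temp": [1, 2, 1], "flow": [6, 7, 6]}

totals = sliding_dtw(signals, references, window=3, band=1)
best_start = totals.index(min(totals))  # window start with the smallest summed distance
```

The smallest accumulated distance marks the window that most resembles the reference across all variables.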
- seeq.addons.mps._mps.seeq_mps_dtw_batch(batch_cond, data_pull, data_pull_c, data_pull_known, normalise, time_distort)
This function is similar to the “seeq_mps_dtw” function but instead of window stepping through the entire search area it only measures the distance between two time series datasets for each capsule in the batch condition. This function is intended to be applied to batch process data.
- Parameters:
batch_cond (pd.DataFrame, pd.Series) – A dataframe or series that minimally has columns of “X, …, Y” conditions requested to be analysed with index of (X, …, Y) and columns of at least ‘Capsule Start’ and ‘Capsule End’. This dataframe has all the data for the required batch capsules.
data_pull (pd.DataFrame, pd.Series) – A dataframe or series that minimally has columns of “X, …, Y” signals requested to be pulled (detailed in items_s_ref input variable) and ‘Date-Time’ with an index of timestamps. [X, …, Y,’Date-Time’]. This dataframe has all the time series data for the analysis/search area
data_pull_c (pd.DataFrame, pd.Series) – A dataframe or series that minimally has columns of “X, …, Y” signals requested to be pulled (detailed in items_s_ref input variable) and ‘Date-Time’ with an index of timestamps. [X, …, Y,’Date-Time’]. This dataframe has all the time series data for the reference profile.
data_pull_known (pd.DataFrame, pd.Series) – A dataframe or series that minimally has columns of [‘Condition’, ‘Capsule Start’, ‘Capsule End’, ‘Capsule Is Uncertain’] to detail the capsule of the known reference profile.
normalise (bool) – Set normalisation of the input data to the algo.
time_distort (float) – 0 to 1 float to set % of time distortion of the searching window length in the window stepping of the algo.
- Returns:
batch_sim_df – A dataframe or series that minimally has columns of [‘Similarity’, ‘Date-Time’]. This has the resulting similarity measure of each defined capsule with the centered datetime
- Return type:
pd.DataFrame
- seeq.addons.mps._mps.seeq_mps_mass(data_pull, data_pull_c, data_pull_known, threshold, normalise, sim_)
This function measures the Euclidean distance between the reference and search space time series data over the same duration (limited by the reference) using Mueen’s Algorithm for Similarity Search (MASS). The algorithm steps a window through the time series data, calculating a distance for each time period. It loops through all the variables in the dataset and sums the distances into an accumulated distance measurement for each time step. The function can be instructed to normalise the data before measurement. The distance scores are then converted to % similarity relative to the minimum possible distance (zero) and the maximum distance (the largest of the inverse of itself or the maximum found distance). Finally, the % contribution of each variable/signal to the similarity measurement is calculated. This function is intended to be applied to continuous process data.
Technical reference: https://www.cs.unm.edu/~mueen/FastestSimilaritySearch.html
- Parameters:
data_pull (pd.DataFrame, pd.Series) – A dataframe or series that minimally has columns of “X, …, Y” signals requested to be pulled (detailed in items_s_ref input variable) and ‘Date-Time’ with an index of timestamps. [X, …, Y,’Date-Time’]. This dataframe has all the time series data for the analysis/search area.
data_pull_c (pd.DataFrame, pd.Series) – A dataframe or series that minimally has columns of “X, …, Y” signals requested to be pulled (detailed in items_s_ref input variable) and ‘Date-Time’ with an index of timestamps. [X, …, Y,’Date-Time’]. This dataframe has all the time series data for the reference profile.
data_pull_known (pd.DataFrame, pd.Series) – A dataframe or series that minimally has columns of [‘Condition’, ‘Capsule Start’, ‘Capsule End’, ‘Capsule Is Uncertain’] to detail the capsule of the known reference profile.
threshold (float) – 0 to 1 float to set similarity cutoff for found capsules.
normalise (bool) – Set normalisation of the input data to the algo.
sim_ (bool) – Set to return similar or dis-similar results.
- Returns:
found_all_sorted – numpy array of three columns 1st = found capsules similarity measurement (float 0 to 1) 2nd = integer index of each found capsule relative to data_pull 3rd = integer describing the duration/length of the found capsule (each integer is defined by griding in ‘pull_mps_data’)
- Return type:
numpy.ndarray
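The quantity that seeq_mps_mass accumulates can be illustrated with a naive sliding-window version. MASS itself obtains the distance profile efficiently via the FFT; the sketch below computes a z-normalised Euclidean distance profile with plain loops instead, so it is a stand-in for illustration only. All function names and the toy data are assumptions.

```python
import math


def z_normalise(values):
    """Scale values to zero mean and unit standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        return [0.0 for _ in values]  # a flat window carries no shape information
    return [(v - mean) / std for v in values]


def distance_profile(series, query, normalise=True):
    """Euclidean distance between the query and every window of the same length."""
    m = len(query)
    q = z_normalise(query) if normalise else list(query)
    profile = []
    for start in range(len(series) - m + 1):
        window = series[start:start + m]
        w = z_normalise(window) if normalise else window
        profile.append(math.sqrt(sum((a - b) ** 2 for a, b in zip(q, w))))
    return profile


def multivariate_profile(signals, references, normalise=True):
    """Sum the per-variable distance profiles into one accumulated profile."""
    profiles = [
        distance_profile(signals[name], references[name], normalise)
        for name in references
    ]
    return [sum(step) for step in zip(*profiles)]


# Toy search area and reference profile for two variables.
signals = {"temp": [0, 0, 1, 3, 1, 0, 0, 0], "flow": [5, 5, 6, 8, 6, 5, 5, 5]}
references = {"temp": [1, 3, 1], "flow": [6, 8, 6]}

total = multivariate_profile(signals, references)
best_start = total.index(min(total))  # window start closest to the reference
```

Because the window length is fixed by the reference, every entry of the profile compares segments of equal duration, unlike the DTW variant.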
- seeq.addons.mps._mps.sort_and_prepare_results(data_pull, total_dist, window_step, threshold, known, sim_, max_)
This function takes the distance measurements from all variables at each window step, orders them numerically, computes a % similarity score by comparing against the minimum possible distance (zero) and the maximum distance (the largest of the inverse of itself or the maximum found distance), and removes overlapping ‘found’ capsules/events, working from most similar to least similar.
- Parameters:
data_pull (pd.DataFrame, pd.Series) – A dataframe or series that minimally has columns of “X, …, Y” signals requested to be pulled (detailed in items_s_ref input variable) and ‘Date-Time’ with an index of timestamps. [X, …, Y,’Date-Time’]. This dataframe has all the time series data for the analysis/search area.
total_dist (list of float) – List of distance measurements between the two curves.
window_step (int) – Size of the window stepping used by the algo.
known (pd.DataFrame, pd.Series) – A dataframe or series that minimally has columns of “X, …, Y” signals requested to be pulled (detailed in items_s_ref input variable) and ‘Date-Time’ with an index of timestamps. [X, …, Y,’Date-Time’]. This dataframe has strictly only the time series data for the reference profile start and end.
sim_ (bool) – Set to return similar or dis-similar results.
threshold (float) – 0 to 1 float to set similarity cutoff for found capsules.
max_ (float) – Max accumulative distance used to scale all other distances measured.
- Returns:
found_list (list of int) – Each int in the list is the sorted (highest similarity match 1st) index of each found capsule as an integer relative to data_pull.
found_sim (list of float) – Corresponding similarity measurement (0 to 1) for each entry in found_list, with 1 (100%) being a perfect match.
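The ranking and overlap removal described for sort_and_prepare_results can be sketched as follows. This is an illustrative sketch rather than the add-on's code: the function name is hypothetical, the scoring rule (similarity = 1 - distance / maximum distance) is an assumed simplification of the scaling described above, and the dis-similar mode selected by sim_ is not covered.

```python
def rank_matches(total_dist, window, threshold, max_dist=None):
    """Return (found_list, found_sim) for non-overlapping windows.

    Assumed scoring: similarity = 1 - distance / max_dist, clipped to [0, 1].
    """
    if max_dist is None:
        max_dist = max(total_dist)
    if max_dist == 0:
        similarity = [1.0 for _ in total_dist]
    else:
        similarity = [max(0.0, 1.0 - d / max_dist) for d in total_dist]

    # Visit window starts from most similar to least similar.
    order = sorted(range(len(total_dist)), key=lambda i: similarity[i], reverse=True)

    found_list, found_sim = [], []
    for start in order:
        if similarity[start] < threshold:
            break  # every remaining candidate is less similar still
        # Equal-length windows overlap when their starts are closer than `window`.
        if all(abs(start - kept) >= window for kept in found_list):
            found_list.append(start)
            found_sim.append(similarity[start])
    return found_list, found_sim


# Toy accumulated distances, one per window start position.
distances = [4.0, 1.0, 0.0, 1.0, 4.0, 3.0, 0.5, 3.5]
found_list, found_sim = rank_matches(distances, window=3, threshold=0.5)
```

Here the windows starting at positions 1 and 3 score above the threshold but are dropped because they overlap the better match at position 2.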