Cross-Correlation Calculations
- seeq.addons.correlation._cross_correlations.cross_corr_matrix_lagged(df_serialized, lags=100)
Returns the matrix of lags and coefficients for best cross correlation.
- Parameters:
df_serialized (bytes) – A pickled pd.DataFrame with the signals to cross correlate.
lags (float) – Maximum number of lags to shift the signals.
- Returns:
(lags, coefficients)
- Return type:
tuple
Notes
This function requires the input pd.DataFrame to be pickled to take advantage of the caching functionality.
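The note above can be illustrated with a minimal sketch. It assumes a hash-based cache such as functools.lru_cache; the caching mechanism actually used by the add-on is not described here, and cached_corr is a hypothetical stand-in, not part of seeq.addons.correlation. The point is only that pickled bytes are hashable and can serve as a cache key, whereas a pd.DataFrame cannot.

```python
import pickle
from functools import lru_cache

import numpy as np
import pandas as pd


@lru_cache(maxsize=8)
def cached_corr(df_serialized: bytes) -> np.ndarray:
    """Hypothetical cached function; NOT part of seeq.addons.correlation."""
    # Rebuild the DataFrame from the pickled bytes before computing.
    df = pickle.loads(df_serialized)
    return df.corr().to_numpy()


df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [2.0, 4.1, 5.9, 8.2]})

# bytes are hashable, so they can be used as an lru_cache key;
# passing the DataFrame itself would raise a TypeError (unhashable type).
df_serialized = pickle.dumps(df)

first = cached_corr(df_serialized)   # computed
second = cached_corr(df_serialized)  # served from the cache
hits = cached_corr.cache_info().hits
```

With the same bytes passed twice, the second call is answered from the cache and returns the very same array object.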
- seeq.addons.correlation._cross_correlations.cross_corr_matrix_raw(df_serialized)
Returns the matrix of correlation coefficients for the set of signals.
- Parameters:
df_serialized (bytes) – A pickled pd.DataFrame with the signals to cross correlate.
- Returns:
coefficients_matrix
- Return type:
np.array
Notes
This function requires the input pd.DataFrame to be pickled to take advantage of the caching functionality.
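To make the shape of the returned coefficients matrix concrete, here is a small sketch that builds a correlation matrix for three signals with NumPy. It assumes Pearson correlation, which this page does not confirm, and it does not call the add-on; the signal names and data are made up for illustration.

```python
import pickle

import numpy as np
import pandas as pd

# Hypothetical signals: signal_b is a linear function of signal_a,
# signal_c is unrelated noise.
rng = np.random.default_rng(0)
t = np.arange(200)
df = pd.DataFrame({
    "signal_a": np.sin(t / 10.0),
    "signal_b": 2.0 * np.sin(t / 10.0) + 1.0,
    "signal_c": rng.normal(size=t.size),
})

# Same input shape the documented function expects: a pickled DataFrame.
df_serialized = pickle.dumps(df)

# Assumption: Pearson coefficients. Each column is one signal, so rowvar=False.
restored = pickle.loads(df_serialized)
coefficients_matrix = np.corrcoef(restored.to_numpy(), rowvar=False)
```

The result is a square, symmetric array with one row and one column per signal and ones on the diagonal.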
- seeq.addons.correlation._cross_correlations.lags_coeffs(df, max_time_shift, time_output_unit)
Calculates the lags that maximize the correlations between signals, and the cross-correlation coefficients of the shifted signals. If max_time_shift is None, the lags are zero (raw data correlations with no time shift). This function also returns the sampling period of the signals, either inferred from the DataFrame or taken from the property value attached to it.
- Parameters:
df (pandas.DataFrame) – A DataFrame that contains a set of signals as columns and date-time as the index. This function does not call the data preprocessor, so make sure the DataFrame contains cleansed data.
max_time_shift ({'auto', str, None}, default 'auto') – Maximum time (e.g. '15s' or '1min') that the signals are allowed to slide in order to maximize cross-correlation. For times specified as a str, normal time units are accepted. If 'auto' is selected, a default maximum time shift is calculated based on the number of samples. If None, the raw signals are used and no time shifts are calculated.
time_output_unit ({'auto', str}, default 'auto') – Specifies the time unit used to display the time shifts. Valid units are the ones accepted by pd.Timedelta.
- Returns:
lags (array_like, 2d) – Lags that maximize cross correlations. Not to be confused with time shifts. These lags use the opposite sign from the typical convention.
coeffs (array_like, 2d) – Cross-correlation coefficients for the lagged signals
sampling (pd.Timedelta) – A pd.Timedelta with the grid of the data in the input DataFrame
time_unit (str) – A str of a valid pd.Timedelta unit in which the
maxlags (int) – Maximum number of lags allowed when maximizing cross correlations
Examples
Get the cross-correlation coefficients and the lags that maximize cross-correlations for a given DataFrame, letting the function guess the maximum time shift automatically:
>>> seeq.addons.correlation.lags_coeffs(df, max_time_shift='auto', time_output_unit='auto')
Get the cross-correlation coefficients for a given DataFrame using the raw data (no time shift allowed):
>>> seeq.addons.correlation.lags_coeffs(df, max_time_shift=None, time_output_unit='sec')
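For intuition about what a lag search of this kind involves, the following is a rough sketch written with plain pandas. It is not the add-on's implementation: best_lag is a hypothetical helper, the conversion from a maximum time shift to a number of lags is an assumed one (time shift divided by sampling period), Pearson correlation is assumed, and the sign convention is that of pandas' Series.shift, which may differ from the one used by lags_coeffs.

```python
import numpy as np
import pandas as pd

# Hypothetical cleansed input: y is x delayed by 5 samples on a 1-second grid.
index = pd.date_range("2024-01-01", periods=300, freq=pd.Timedelta(seconds=1))
base = np.sin(np.arange(300) / 15.0)
df = pd.DataFrame({"x": base, "y": np.roll(base, 5)}, index=index)

# Infer the sampling period from the index (assumes a regular grid).
sampling = pd.Timedelta(df.index[1] - df.index[0])

# Assumed conversion: maximum time shift -> maximum number of lags.
max_time_shift = pd.Timedelta("15s")
maxlags = int(max_time_shift / sampling)


def best_lag(a: pd.Series, b: pd.Series, maxlags: int) -> tuple[int, float]:
    """Hypothetical helper: brute-force search for the shift of b that
    maximizes its Pearson correlation with a."""
    best_shift, best_coeff = 0, -np.inf
    for lag in range(-maxlags, maxlags + 1):
        # shift(lag) moves b's values by `lag` samples; NaN pairs are ignored by corr.
        coeff = a.corr(b.shift(lag))
        if coeff > best_coeff:
            best_shift, best_coeff = lag, coeff
    return best_shift, best_coeff


lag, coeff = best_lag(df["x"], df["y"], maxlags)

# Express the lag as a time shift using the sampling period.
time_shift = lag * sampling
```

In this sketch, multiplying the lag (in samples) by the sampling period gives a time shift as a pd.Timedelta, which can then be expressed in whatever unit is convenient for display.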