Cross-Correlation Calculations

seeq.addons.correlation._cross_correlations.cross_corr_matrix_lagged(df_serialized, lags=100)[source]

Returns the matrix of lags and coefficients for best cross correlation.

Parameters:
  • df_serialized (bytes) – A pickled pd.DataFrame with the signals to cross correlate.

  • lags (float) – Maximum number of lags to shift the signals.

Returns:

(lags, coefficients)

Return type:

tuple

Notes

This function requires the input pd.DataFrame to be pickled to take advantage of the caching functionality.
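Passing the frame as bytes makes the argument hashable, which memoizing caches generally need. Below is a minimal sketch of preparing the input, assuming the standard `pickle` module produces the bytes. The `cached_shape` helper and its use of `functools.lru_cache` are hypothetical stand-ins for illustration, not the package's actual caching mechanism.

```python
import functools
import pickle

import numpy as np
import pandas as pd

# Small example frame: two signals on a regular 1-minute grid.
index = pd.date_range("2021-01-01", periods=5, freq="min")
df = pd.DataFrame({"a": np.arange(5.0), "b": np.arange(5.0) ** 2}, index=index)

# Serialize the frame; bytes are hashable, a DataFrame is not.
df_serialized = pickle.dumps(df)


@functools.lru_cache(maxsize=None)
def cached_shape(serialized: bytes) -> tuple:
    """Hypothetical cached function keyed on the pickled bytes."""
    return pickle.loads(serialized).shape


shape_first = cached_shape(df_serialized)   # computed on the first call
shape_second = cached_shape(df_serialized)  # same bytes, served from the cache
hits = cached_shape.cache_info().hits
```

The resulting `df_serialized` value is the kind of object the `df_serialized` parameter expects.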

seeq.addons.correlation._cross_correlations.cross_corr_matrix_raw(df_serialized)[source]

Returns the matrix of correlation coefficients for the set of signals.

Parameters:

df_serialized (bytes) – A pickled pd.DataFrame with the signals to cross correlate.

Returns:

coefficients_matrix

Return type:

np.array

Notes

This function requires the input pd.DataFrame to be pickled to take advantage of the caching functionality.
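For orientation, a zero-lag coefficient matrix of this kind can be computed directly with NumPy. This sketch assumes Pearson correlation between columns, which the description above does not confirm, so treat it as an illustration rather than the package's implementation.

```python
import pickle

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
x = rng.normal(size=200)
df = pd.DataFrame({
    "x": x,
    "neg_x": -x,                    # perfectly anti-correlated with x
    "noise": rng.normal(size=200),  # generated independently of x
})

df_serialized = pickle.dumps(df)

# Unpickle, then correlate the columns (rowvar=False treats columns as variables).
signals = pickle.loads(df_serialized)
coefficients_matrix = np.corrcoef(signals.to_numpy(), rowvar=False)
```

The result is a symmetric square array with one row and one column per signal and ones on the diagonal.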

seeq.addons.correlation._cross_correlations.lags_coeffs(df, max_time_shift, time_output_unit)[source]

Calculates the lags that maximize the correlations between signals, and the cross-correlation coefficients of the shifted signals. If max_time_shift is None, the lags are zero (raw data correlations with no time shift). This function also returns the sampling period of the signals, either inferred from the DataFrame or taken from the property value attached to it.

Parameters:
  • df (pandas.DataFrame) – A DataFrame that contains a set of signals as columns and date-time as the index. This function does not call the data preprocessor, so make sure the DataFrame contains cleansed data.

  • max_time_shift ({'auto', str, None}, default 'auto') – Maximum time (e.g. '15s' or '1min') that the signals are allowed to slide in order to maximize cross-correlation. For times specified as a str, normal time units are accepted. If 'auto' is selected, a default maximum time shift is calculated based on the number of samples. If None, the raw signals are used and no time shifts are calculated.

  • time_output_unit ({'auto', str}, default 'auto') – Specifies the time unit used to display the time shifts. Valid units are those accepted by pd.Timedelta.
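To show how a time-based limit relates to a sample count, the sketch below converts a `max_time_shift` string into a number of lags using the sampling period, and converts a lag back into a time shift in a chosen unit. The conversion rule (floor division of two `pd.Timedelta` values) is an assumption for illustration. The rule applied for 'auto' is not described here.

```python
import pandas as pd

sampling = pd.Timedelta("15s")          # grid of the data
max_time_shift = pd.Timedelta("1min")   # largest allowed slide

# Floor division of two Timedeltas yields an integer number of samples.
maxlags = max_time_shift // sampling

# Express a lag of 3 samples as a time shift measured in minutes.
shift_minutes = (3 * sampling) / pd.Timedelta(1, unit="min")
```

With these values, a one-minute limit on a 15-second grid allows four lags, and a three-sample lag corresponds to 0.75 minutes.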

Returns:

  • lags (array_like, 2d) – Lags that maximize the cross correlations. Not to be confused with time shifts. These lags use the opposite sign to the typical convention.

  • coeffs (array_like, 2d) – Cross-correlation coefficients for the lagged signals

  • sampling (pd.Timedelta) – A pd.Timedelta with the grid of the data in the input DataFrame

  • time_unit (str) – A str of a valid pd.Timedelta unit in which the

  • maxlags (int) – Maximum number of allowable lags used to maximize cross correlations.
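The idea behind the returned lags can be illustrated for a single pair of signals: slide one signal over a range of sample offsets, compute the correlation at each offset, and keep the offset with the largest coefficient. The sketch below is a brute-force illustration under assumed conventions (Pearson correlation, and a positive lag meaning the second signal trails the first). `best_lag` is a hypothetical helper, and its sign convention is chosen for the sketch and may not match the one this function returns.

```python
import numpy as np


def best_lag(x, y, maxlags):
    """Return (lag, coeff) maximizing the Pearson correlation of x[t] and y[t + lag]."""
    best = (0, -np.inf)
    n = len(x)
    for lag in range(-maxlags, maxlags + 1):
        if lag >= 0:
            a, b = x[: n - lag], y[lag:]    # pair x[t] with y[t + lag]
        else:
            a, b = x[-lag:], y[: n + lag]   # same pairing for negative lags
        coeff = np.corrcoef(a, b)[0, 1]
        if coeff > best[1]:
            best = (lag, coeff)
    return best


rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = np.roll(x, 7)  # y is x delayed by 7 samples (with wrap-around)

lag, coeff = best_lag(x, y, maxlags=20)
```

Here the search recovers the 7-sample delay built into `y`. Applying such a search to every pair of columns would yield 2d arrays analogous to `lags` and `coeffs`.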

Examples

Get the cross-correlation coefficients and the lags that maximize cross-correlations for a given DataFrame, letting the maximum time shift be guessed automatically:

>>> seeq.addons.correlation.lags_coeffs(df, max_time_shift='auto', time_output_unit='auto')

Get the cross-correlation coefficients for a given DataFrame using the raw data (no time shift allowed):

>>> seeq.addons.correlation.lags_coeffs(df, max_time_shift=None, time_output_unit='sec')