komanawa.kendall_stats.time_tests_check_step_window#

Usage: python time_tests.py [outdir]

Parameters:

outdir – path to save the results to; if not provided, the results are saved to the same directory as the script

created matt_dumont on: 29/09/23

Classes#

MultiPartKendall

Multi-part Mann-Kendall test to identify change point(s) in a time series, after Frollini et al., 2020, DOI: 10.1007/s11356-020-11998-0. Note: where the expected trend is zero, the lack of a trend is considered significant if p > 1-alpha.

SeasonalMultiPartKendall

Multi-part seasonal Mann-Kendall test to identify change point(s) in a time series, after Frollini et al., 2020, DOI: 10.1007/s11356-020-11998-0.

Functions#

timeit_test(function_names, npoints, check_step, ...)

Run an automated timeit test. The call must be outside of the function definition. Results are printed in scientific notation; units are seconds.

Module Contents#

class MultiPartKendall(data, nparts=2, expect_part=(1, -1), min_size=10, alpha=0.05, no_trend_alpha=0.5, data_col=None, rm_na=True, serialise_path=None, check_step=1, check_window=None, recalc=False, initalize=True)[source]#

Multi-part Mann-Kendall test to identify change point(s) in a time series, after Frollini et al., 2020, DOI: 10.1007/s11356-020-11998-0. Note: where the expected trend is zero, the lack of a trend is considered significant if p > 1-alpha.
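As a rough illustration of the idea (not the class's actual implementation), the sketch below runs a plain Mann-Kendall test on each side of one candidate breakpoint and checks each part against its expected trend. The function names are hypothetical, ties are not corrected for, and the no-trend rule follows the no_trend_alpha parameter description (accept if p > no_trend_alpha).

```python
import math


def mann_kendall(values):
    """Return (s, z, p) for a plain Mann-Kendall test (no tie correction)."""
    n = len(values)
    # S is the number of increasing pairs minus the number of decreasing pairs
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)
    var_s = n * (n - 1) * (2 * n + 5) / 18
    # continuity-corrected normal approximation
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # two-sided p-value from the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2))
    return s, z, p


def part_matches(values, expected, alpha=0.05, no_trend_alpha=0.5):
    """Check one part against its expected trend (1, -1 or 0)."""
    s, z, p = mann_kendall(values)
    if expected == 0:
        # assumption: no trend is accepted when p > no_trend_alpha
        return p > no_trend_alpha
    return p < alpha and (z > 0) == (expected > 0)


def breakpoint_is_acceptable(values, breakpoint, expect_part=(1, -1)):
    """Split at one breakpoint and test both parts (two-part case only)."""
    parts = (values[:breakpoint], values[breakpoint:])
    return all(part_matches(part, exp) for part, exp in zip(parts, expect_part))


# a series that rises for 15 samples and then falls for 15 samples
series = list(range(15)) + list(range(15, 0, -1))
print(breakpoint_is_acceptable(series, 15))
```

The class evaluates many candidate breakpoints in this spirit and keeps those where every part matches its expectation.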

Parameters:
  • data – time series data. If a DataFrame or Series, the index is expected to be the sample order (the data will be sorted on the index); if an np.array or list, the data is expected to be in sample order

  • nparts – number of parts to split the time series into

  • expect_part – expected trend in each part of the time series (1 increasing, -1 decreasing, 0 no trend)

  • min_size – minimum size for the first and last section of the time series

  • alpha – significance level

  • no_trend_alpha – significance level for no trend, e.g. a part will be accepted if p > no_trend_alpha

  • data_col – if data is a DataFrame or Series, the column to use

  • rm_na – remove na values from the data

  • serialise_path – path to serialised file (as hdf), if None will not serialise

  • check_step – int, the step to check for breakpoints, e.g. if 1 will check every point, if 2 will check every second point

  • check_window

    the window to check for breakpoints. If None, the whole data is used. Restricting the window significantly speeds up the Mann-Kendall test. Note that check_step still applies to the check_window (e.g. a check_window of (2, 6) with a check_step of 2 will check the points (2, 4, 6)). One of:

    • None or tuple (start_idx, end_idx) (one breakpoint only)

    • list of tuples of len nparts-1 with a start/end idx for each part

    • a 2d array of shape (nparts-1, 2) with a start/end idx for each part

  • recalc – if True, recalculate the Mann-Kendall test even if the serialised file exists

  • initalize – if True, initialize the class from the data; only set to False when used in self.from_file
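The interaction between check_step and check_window described above can be sketched for the single-breakpoint case. The helper name is hypothetical, and the behaviour when check_window is None (candidates running from min_size to n - min_size) is an assumption based on the min_size description, not a confirmed detail of the class.

```python
def candidate_breakpoints(n, min_size=10, check_step=1, check_window=None):
    """List the indices that would be tested as a single breakpoint."""
    if check_window is None:
        # assumption: leave at least min_size samples in the first and last part
        start, end = min_size, n - min_size
    else:
        start, end = check_window
    # the end index is inclusive, matching the documented (2, 6) example
    return list(range(start, end + 1, check_step))


print(candidate_breakpoints(100, check_step=2, check_window=(2, 6)))  # [2, 4, 6]
print(candidate_breakpoints(30, min_size=10, check_step=5))  # [10, 15, 20]
```

Fewer candidates means fewer Mann-Kendall evaluations, which is where the speed-up comes from.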

Returns:

static from_file(path)[source]#

load the class from a serialised file

Parameters:

path – path to the serialised file

Returns:

MultiPartKendall

get_acceptable_matches()[source]#

get the acceptable matches for the multipart Kendall test

Returns:

pd.DataFrame

get_all_matches()[source]#

get all matches for the multipart Kendall test (including those that are not significant)

Returns:

pd.DataFrame

get_data_from_breakpoints(breakpoints)[source]#

get the data from the breakpoints

Parameters:

breakpoints – breakpoints to split the data, e.g. from self.get_acceptable_matches

Returns:

outdata: list of dataframes for each part of the time series

kendal_stats: dataframe of Kendall stats for each part of the time series
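Conceptually, the split can be pictured as slicing the series at each breakpoint. The sketch below uses a plain list and a hypothetical helper name, and it assumes each breakpoint is the position at which the next part starts; the method itself returns dataframes plus the Kendall stats.

```python
def split_at_breakpoints(values, breakpoints):
    """Split a sequence into len(breakpoints) + 1 consecutive parts."""
    # assumption: each breakpoint is the first index of the following part
    edges = [0, *breakpoints, len(values)]
    return [values[a:b] for a, b in zip(edges[:-1], edges[1:])]


parts = split_at_breakpoints(list(range(10)), [4, 7])
print(parts)  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```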

get_maxz_breakpoints(raise_on_none=False)[source]#

get the breakpoints for the maximum joint normalised (min-max for each part) z. The best match is the maximum znorm_joint value, where:

  • if expected trend == 1 or -1: znorm = the min-max normalised z value for each part

  • else (no trend expected): znorm = 1 - the min-max normalised z value for each part

  • and znorm_joint = the sum of the znorm values for each part
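The scoring rule above can be sketched in a few lines. The helper names and input layout are hypothetical, and normalising the absolute value of z (so that a strong decreasing trend also scores highly) is an assumption; the description only says "min-max normalised z value".

```python
def min_max(values):
    """Scale values to the range [0, 1]; a constant input maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def best_breakpoint(breakpoints, z_by_part, expect_part):
    """z_by_part[i][k] is the z value of part i at candidate breakpoints[k]."""
    joint = [0.0] * len(breakpoints)
    for z_values, expected in zip(z_by_part, expect_part):
        # assumption: normalise |z| so increasing and decreasing trends compare alike
        znorm = min_max([abs(z) for z in z_values])
        if expected == 0:
            # no trend expected: a small |z| is the better match
            znorm = [1 - v for v in znorm]
        joint = [j + v for j, v in zip(joint, znorm)]
    best = max(range(len(breakpoints)), key=joint.__getitem__)
    return breakpoints[best], joint


best, joint = best_breakpoint(
    breakpoints=[10, 12, 14],
    z_by_part=[[2.0, 3.0, 4.0], [-4.0, -3.5, -2.0]],
    expect_part=(1, -1),
)
print(best, joint)  # 12 [1.0, 1.25, 1.0]
```

In this example neither part is at its strongest at breakpoint 12, but the sum of the two normalised scores is highest there.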

Parameters:

raise_on_none – bool, if True will raise an error if no acceptable matches, otherwise will return None

Returns:

array of breakpoint tuples

plot_acceptable_matches(key)[source]#

quickly plot the acceptable matches

Parameters:

key – key to plot (one of ['p', 'z', 's', 'var_s', 'znorm', 'znorm_joint']) or 'all' (a figure for each value). Note that joint stats only have 1 value

Returns:

plot_data_from_breakpoints(breakpoints, ax=None, txt_vloc=-0.05, add_labels=True, **kwargs)[source]#

plot the data from the breakpoints including the senslope fits

Parameters:
  • breakpoints

  • ax – ax to plot on; if None, a new ax is created

  • txt_vloc – vertical location of the text (in ax.transAxes)

  • add_labels – boolean, if True add labels (slope, pval) to the plot

  • kwargs – passed to ax.scatter (all parts)

Returns:

fig, ax
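For readers unfamiliar with the Sen slope fits drawn on the plot: the Sen (Theil-Sen) slope is the median of the slopes between all pairs of points. The minimal sketch below assumes evenly spaced samples, using the sample position as x; the helper name is hypothetical and this is not the plotting code itself.

```python
from statistics import median


def sen_slope(values):
    """Median of all pairwise slopes, using the sample position as x."""
    slopes = [
        (values[j] - values[i]) / (j - i)
        for i in range(len(values) - 1)
        for j in range(i + 1, len(values))
    ]
    return median(slopes)


print(sen_slope([1, 3, 5, 7]))   # 2.0
print(sen_slope([1, 3, 50, 7]))  # 2.0, the outlier barely matters
```

Because it is a median, the estimate is robust to outliers, which suits it to noisy monitoring data.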

print_mk_diffs(other)[source]#

convenience function to print the differences between two MultiPartKendall classes

Parameters:

other – another MultiPartKendall class

to_file(save_path=None, complevel=9, complib='blosc:lz4')[source]#

save the data to a hdf file

Parameters:
  • save_path – None (save to self.serialise_path) or path to save the file

  • complevel – compression level for hdf

  • complib – compression library for hdf

Returns:

class SeasonalMultiPartKendall(data, data_col, season_col, nparts=2, expect_part=(1, -1), min_size=10, alpha=0.05, no_trend_alpha=0.5, rm_na=True, serialise_path=None, freq_limit=0.05, check_step=1, check_window=None, recalc=False, initalize=True)[source]#

Bases: MultiPartKendall

Multi-part seasonal Mann-Kendall test to identify change point(s) in a time series, after Frollini et al., 2020, DOI: 10.1007/s11356-020-11998-0.
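The seasonal variant differs from the plain test in that values are only compared within the same season, and the per-season S statistics and variances are summed. A minimal sketch of that idea follows; the function name is hypothetical, ties are not corrected for, and it is not the class's own implementation.

```python
import math
from collections import defaultdict


def seasonal_mann_kendall(values, seasons):
    """Return (s, z, p), comparing values only within the same season."""
    by_season = defaultdict(list)
    for value, season in zip(values, seasons):
        by_season[season].append(value)

    s_total, var_total = 0, 0.0
    for vals in by_season.values():
        n = len(vals)
        for i in range(n - 1):
            for j in range(i + 1, n):
                diff = vals[j] - vals[i]
                s_total += (diff > 0) - (diff < 0)
        var_total += n * (n - 1) * (2 * n + 5) / 18

    # continuity-corrected normal approximation on the summed statistic
    if s_total > 0:
        z = (s_total - 1) / math.sqrt(var_total)
    elif s_total < 0:
        z = (s_total + 1) / math.sqrt(var_total)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))
    return s_total, z, p


# six years of four seasons: a strong seasonal cycle plus a steady rise
values = [year + 10 * season for year in range(6) for season in range(4)]
seasons = [season for year in range(6) for season in range(4)]
s, z, p = seasonal_mann_kendall(values, seasons)
print(s, round(z, 2), p < 0.05)
```

Grouping by season keeps the large within-year cycle from masking the year-on-year trend.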

Parameters:
  • data – time series data. If a DataFrame or Series, the index is expected to be the sample order (the data will be sorted on the index); if an np.array or list, the data is expected to be in sample order

  • data_col – if data is a DataFrame or Series, the column to use

  • season_col – the column to use for the season

  • nparts – number of parts to split the time series into

  • expect_part – expected trend in each part of the time series (1 increasing, -1 decreasing, 0 no trend)

  • min_size – minimum size for the first and last section of the time series

  • alpha – significance level

  • no_trend_alpha – significance level for no trend, e.g. a part will be accepted if p > no_trend_alpha

  • rm_na – remove na values from the data

  • serialise_path – path to serialised file (as hdf), if None will not serialise

  • check_step – int, the step to check for breakpoints, e.g. if 1 will check every point, if 2 will check every second point

  • check_window

    the window to check for breakpoints. If None, the whole data is used. Restricting the window significantly speeds up the Mann-Kendall test. Note that check_step still applies to the check_window (e.g. a check_window of (2, 6) with a check_step of 2 will check the points (2, 4, 6)). One of:

    • None or tuple (start_idx, end_idx) (one breakpoint only)

    • list of tuples of len nparts-1 with a start/end idx for each part

    • a 2d array of shape (nparts-1, 2) with a start/end idx for each part

  • recalc – if True, recalculate the Mann-Kendall test even if the serialised file exists

  • initalize – if True, initialize the class from the data; only set to False when used in self.from_file

Returns:

static from_file(path)[source]#

load the class from a serialised file

Parameters:

path – path to the serialised file

Returns:

get_acceptable_matches()[source]#

get the acceptable matches for the multipart Kendall test

Returns:

pd.DataFrame

get_all_matches()[source]#

get all matches for the multipart Kendall test (including those that are not significant)

Returns:

pd.DataFrame

get_data_from_breakpoints(breakpoints)[source]#

get the data from the breakpoints

Parameters:

breakpoints – breakpoints to split the data, e.g. from self.get_acceptable_matches

Returns:

outdata: list of dataframes for each part of the time series

kendal_stats: dataframe of Kendall stats for each part of the time series

get_maxz_breakpoints(raise_on_none=False)[source]#

get the breakpoints for the maximum joint normalised (min-max for each part) z. The best match is the maximum znorm_joint value, where:

  • if expected trend == 1 or -1: znorm = the min-max normalised z value for each part

  • else (no trend expected): znorm = 1 - the min-max normalised z value for each part

  • and znorm_joint = the sum of the znorm values for each part

Parameters:

raise_on_none – bool, if True will raise an error if no acceptable matches, otherwise will return None

Returns:

array of breakpoint tuples

plot_acceptable_matches(key)[source]#

quickly plot the acceptable matches

Parameters:

key – key to plot (one of ['p', 'z', 's', 'var_s', 'znorm', 'znorm_joint']) or 'all' (a figure for each value). Note that joint stats only have 1 value

Returns:

plot_data_from_breakpoints(breakpoints, ax=None, txt_vloc=-0.05, add_labels=True, **kwargs)[source]#

plot the data from the breakpoints including the senslope fits

Parameters:
  • breakpoints

  • ax – ax to plot on; if None, a new ax is created

  • txt_vloc – vertical location of the text (in ax.transAxes)

  • add_labels – boolean, if True add labels (slope, pval) to the plot

  • kwargs – passed to ax.scatter (all parts)

Returns:

fig, ax

print_mk_diffs(other)[source]#

convenience function to print the differences between two MultiPartKendall classes

Parameters:

other – another MultiPartKendall class

to_file(save_path=None, complevel=9, complib='blosc:lz4')[source]#

save the data to a hdf file

Parameters:
  • save_path – None (save to self.serialise_path) or path to save the file

  • complevel – compression level for hdf

  • complib – compression library for hdf

Returns:

timeit_test(function_names, npoints, check_step, check_window, n=10)[source]#

Run an automated timeit test. The call must be outside of the function definition. Results are printed in scientific notation; units are seconds.

Parameters:
  • py_file_path – path to the python file that holds the functions; if the functions are in the same script as the call then __file__ is sufficient. In this case the function call should be protected by: if __name__ == '__main__':

  • function_names – the names of the functions to test (iterable), functions must not have arguments

  • n – number of times to test
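The general pattern described here (time argument-free functions n times, print seconds in scientific notation, and guard the call with the main check) can be sketched with the standard library alone. The helper time_functions and the two sample functions are hypothetical stand-ins, not this module's timeit_test.

```python
import timeit


def time_functions(functions, n=10):
    """Time each zero-argument function n times; return mean seconds per call."""
    results = {}
    for func in functions:
        total = timeit.timeit(func, number=n)
        results[func.__name__] = total / n
        # scientific notation, units are seconds
        print(f'{func.__name__}: {results[func.__name__]:.3e} s per call')
    return results


def sum_small():
    return sum(range(1_000))


def sum_large():
    return sum(range(100_000))


if __name__ == '__main__':
    # guard the call so importing this file does not start the timing run
    time_functions([sum_small, sum_large], n=5)
```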

Returns: