komanawa.kendall_stats.time_tests_check_step_window#

Usage: python time_tests.py [outdir]

outdir – path to save the results to; if not provided, the results are saved to the same directory as the script

created matt_dumont on: 29/09/23

Classes#

`MultiPartKendall`	Multi-part Mann-Kendall test to identify change point(s) in a time series, after Frollini et al., 2020, DOI: 10.1007/s11356-020-11998-0. Note: where the expected trend is zero, the lack of a trend is considered significant if p > 1-alpha.
`SeasonalMultiPartKendall`	Multi-part seasonal Mann-Kendall test to identify change point(s) in a time series, after Frollini et al., 2020, DOI: 10.1007/s11356-020-11998-0.

Functions#

timeit_test(function_names, npoints, check_step, ...)

Run an automated timeit test. Must be outside of the function definition. Prints results in scientific notation; units are seconds.

Module Contents#

class MultiPartKendall(data, nparts=2, expect_part=(1, -1), min_size=10, alpha=0.05, no_trend_alpha=0.5, data_col=None, rm_na=True, serialise_path=None, check_step=1, check_window=None, recalc=False, initalize=True)[source]#

Multi-part Mann-Kendall test to identify change point(s) in a time series, after Frollini et al., 2020, DOI: 10.1007/s11356-020-11998-0. Note: where the expected trend is zero, the lack of a trend is considered significant if p > 1-alpha.
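For orientation, the statistics named later on this page (s, var_s, z, p) are the usual Mann-Kendall quantities. The following is a minimal standard-library sketch of the textbook test without tie correction. The function name `mann_kendall` is hypothetical and this is not the library's implementation.

```python
import math


def mann_kendall(values):
    """Textbook Mann-Kendall test (no tie correction); hypothetical helper.

    values must be in sample order. Returns (s, var_s, z, p).
    """
    n = len(values)
    # S is the number of increasing pairs minus the number of decreasing pairs
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)
    # variance of S assuming there are no tied values
    var_s = n * (n - 1) * (2 * n + 5) / 18
    # continuity-corrected z score
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # two-sided p value from the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2))
    return s, var_s, z, p
```

A multi-part test repeats this kind of calculation on each part of the series for every candidate breakpoint, which is why the number of candidates drives the run time.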

Parameters:

data – time series data. If a DataFrame or Series, the index is expected to be the sample order (the data will be sorted on the index); if an np.array or list, the data is expected to be in sample order
nparts – number of parts to split the time series into
expect_part – expected trend in each part of the time series (1 increasing, -1 decreasing, 0 no trend)
min_size – minimum size for the first and last section of the time series
alpha – significance level
no_trend_alpha – significance level for no trend, e.g. a part is accepted as having no trend if p > no_trend_alpha
data_col – if data is a DataFrame or Series, the column to use
rm_na – remove na values from the data
serialise_path – path to serialised file (as hdf), if None will not serialise
check_step – int, the step to check for breakpoints, e.g. if 1 will check every point, if 2 will check every second point
check_window – the window to check for breakpoints; if None, the whole data is used. Restricting the window can significantly speed up the Mann-Kendall test. Note that check_step still applies within the check_window (e.g. a check_window of (2, 6) with a check_step of 2 will check the points (2, 4, 6)). One of:
- None or a tuple (start_idx, end_idx) (one breakpoint only),
- a list of tuples of len nparts-1 with a start/end idx for each part,
- a 2d array of shape (nparts-1, 2) with a start/end idx for each part
recalc – if True will recalculate the mann kendall even if the serialised file exists
initalize – if True, initialise the class from the data; only set to False when used in self.from_file
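The interaction between check_step and check_window for a single breakpoint can be sketched as below. This only reproduces the documented example (a window of (2, 6) with a step of 2 gives the points 2, 4, 6). The helper name `candidate_breakpoints` is hypothetical, and min_size handling is deliberately not modelled.

```python
def candidate_breakpoints(n, check_step=1, check_window=None):
    """List candidate breakpoint indices for one breakpoint; hypothetical helper.

    n is the number of samples. The end of the window is treated as
    inclusive, matching the (2, 6) -> (2, 4, 6) example in the docs.
    """
    if check_window is None:
        # assumption: "whole data" means every index 0..n-1
        start, end = 0, n - 1
    else:
        start, end = check_window
    return list(range(start, end + 1, check_step))


if __name__ == "__main__":
    print(candidate_breakpoints(100, check_step=2, check_window=(2, 6)))
    # fewer candidates means fewer Mann-Kendall evaluations
    print(len(candidate_breakpoints(100)), len(candidate_breakpoints(100, check_step=5)))
```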

Returns:

static from_file(path)[source]#

load the class from a serialised file

Parameters: path – path to the serialised file
Returns: MultiPartKendall

get_acceptable_matches()[source]#

get the acceptable matches for the multipart kendall test

Returns: pd.DataFrame

get_all_matches()[source]#

get all matches for the multipart kendall test (including those that are not significant)

Returns: pd.DataFrame

get_data_from_breakpoints(breakpoints)[source]#

get the data from the breakpoints

Parameters: breakpoints – breakpoints to split the data, e.g. from self.get_acceptable_matches
Returns: outdata: list of dataframes for each part of the time series
Returns: kendal_stats: dataframe of Kendall stats for each part of the time series
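As a rough illustration of splitting a series at breakpoints, the sketch below cuts a plain list into nparts pieces. It assumes each breakpoint is the positional index at which the next part starts; the page does not state which side of the split a breakpoint belongs to, and the helper name `split_at_breakpoints` is hypothetical.

```python
def split_at_breakpoints(values, breakpoints):
    """Split values into len(breakpoints) + 1 parts; hypothetical helper.

    Assumption: each breakpoint is the first index of the following part.
    """
    edges = [0, *breakpoints, len(values)]
    return [values[a:b] for a, b in zip(edges[:-1], edges[1:])]


if __name__ == "__main__":
    print(split_at_breakpoints(list(range(10)), [3, 7]))
```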

get_maxz_breakpoints(raise_on_none=False)[source]#

Get the breakpoints for the maximum joint normalised z (min-max normalised for each part). The best match is the maximum znorm_joint value, where:

- if the expected trend == 1 or -1: znorm = the min-max normalised z value for each part
- else (no trend expected): znorm = 1 - the min-max normalised z value for each part
- znorm_joint = the sum of the znorm values for each part

Parameters: raise_on_none – bool, if True will raise an error if there are no acceptable matches, otherwise will return None
Returns: array of breakpoint tuples
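The scoring rule described for get_maxz_breakpoints can be sketched with plain lists as follows. The function names are hypothetical, and normalising the absolute z value (so that a strong decreasing trend also scores highly) is an assumption of this sketch that the page does not confirm.

```python
def min_max(values):
    """Min-max normalise a list to the range 0..1 (all zeros if constant)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def best_breakpoint(candidates, z_by_part, expect_part):
    """Pick the candidate with the largest znorm_joint; hypothetical helper.

    candidates: one entry per candidate breakpoint
    z_by_part: one list of z values per part, aligned with candidates
    expect_part: expected trend per part (1, -1 or 0)
    """
    joint = [0.0] * len(candidates)
    for z_values, expected in zip(z_by_part, expect_part):
        # assumption: normalise |z| so either trend direction can score high
        znorm = min_max([abs(z) for z in z_values])
        if expected == 0:
            # no trend expected: a weaker trend is a better match
            znorm = [1 - v for v in znorm]
        joint = [j + v for j, v in zip(joint, znorm)]
    best = max(range(len(candidates)), key=joint.__getitem__)
    return candidates[best], joint


if __name__ == "__main__":
    z = [[1.0, 3.0, 2.0], [-2.0, -4.0, -1.0]]
    print(best_breakpoint([10, 20, 30], z, expect_part=(1, -1)))
```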

plot_acceptable_matches(key)[source]#

quickly plot the acceptable matches

Parameters: key – key to plot (one of [‘p’, ‘z’, ‘s’, ‘var_s’, ‘znorm’, ‘znorm_joint’]) or ‘all’, which makes a figure for each value. Note that joint stats only have 1 value
Returns:

plot_data_from_breakpoints(breakpoints, ax=None, txt_vloc=-0.05, add_labels=True, **kwargs)[source]#

plot the data from the breakpoints including the senslope fits

Parameters:

breakpoints
ax – ax to plot on if None then create the ax
txt_vloc – vertical location of the text (in ax.transAxes)
add_labels – boolean, if True add labels (slope, pval) to the plot
kwargs – passed to ax.scatter (all parts)

Returns:

fig, ax

print_mk_diffs(other)[source]#

convenience function to print the differences between two MultiPartKendall classes

Parameters: other – another MultiPartKendall class

to_file(save_path=None, complevel=9, complib='blosc:lz4')[source]#

save the data to a hdf file

Parameters:

save_path – None (save to self.serialise_path) or path to save the file
complevel – compression level for hdf
complib – compression library for hdf

Returns:

class SeasonalMultiPartKendall(data, data_col, season_col, nparts=2, expect_part=(1, -1), min_size=10, alpha=0.05, no_trend_alpha=0.5, rm_na=True, serialise_path=None, freq_limit=0.05, check_step=1, check_window=None, recalc=False, initalize=True)[source]#

Bases: MultiPartKendall

Multi-part seasonal Mann-Kendall test to identify change point(s) in a time series, after Frollini et al., 2020, DOI: 10.1007/s11356-020-11998-0.
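The seasonal variant differs from the plain test in that pairs are only compared within the same season. A minimal standard-library sketch of the textbook seasonal Mann-Kendall statistic (no tie correction) is given below. The function name is hypothetical and this is not the library's implementation.

```python
import math
from collections import defaultdict


def seasonal_mann_kendall(values, seasons):
    """Textbook seasonal Mann-Kendall (no tie correction); hypothetical helper.

    values and seasons are aligned sequences in sample order.
    Returns (s, var_s, z, p) summed over seasons.
    """
    by_season = defaultdict(list)
    for value, season in zip(values, seasons):
        by_season[season].append(value)
    s_total, var_total = 0, 0.0
    for series in by_season.values():
        n = len(series)
        # S for this season: increasing pairs minus decreasing pairs
        for i in range(n - 1):
            for j in range(i + 1, n):
                diff = series[j] - series[i]
                s_total += (diff > 0) - (diff < 0)
        # per-season variance, summed across seasons
        var_total += n * (n - 1) * (2 * n + 5) / 18
    if s_total > 0:
        z = (s_total - 1) / math.sqrt(var_total)
    elif s_total < 0:
        z = (s_total + 1) / math.sqrt(var_total)
    else:
        z = 0.0
    # two-sided p value from the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2))
    return s_total, var_total, z, p
```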

Parameters:

data – time series data. If a DataFrame or Series, the index is expected to be the sample order (the data will be sorted on the index); if an np.array or list, the data is expected to be in sample order
data_col – if data is a DataFrame or Series, the column to use
season_col – the column to use for the season
nparts – number of parts to split the time series into
expect_part – expected trend in each part of the time series (1 increasing, -1 decreasing, 0 no trend)
min_size – minimum size for the first and last section of the time series
alpha – significance level
no_trend_alpha – significance level for no trend, e.g. a part is accepted as having no trend if p > no_trend_alpha
rm_na – remove na values from the data
serialise_path – path to serialised file (as hdf), if None will not serialise
check_step – int, the step to check for breakpoints, e.g. if 1 will check every point, if 2 will check every second point
check_window – the window to check for breakpoints; if None, the whole data is used. Restricting the window can significantly speed up the Mann-Kendall test. Note that check_step still applies within the check_window (e.g. a check_window of (2, 6) with a check_step of 2 will check the points (2, 4, 6)). One of:
- None or a tuple (start_idx, end_idx) (one breakpoint only),
- a list of tuples of len nparts-1 with a start/end idx for each part,
- a 2d array of shape (nparts-1, 2) with a start/end idx for each part
recalc – if True will recalculate the mann kendall even if the serialised file exists
initalize – if True, initialise the class from the data; only set to False when used in self.from_file

Returns:

static from_file(path)[source]#

load the class from a serialised file

Parameters: path – path to the serialised file
Returns:

get_acceptable_matches()[source]#

get the acceptable matches for the multipart kendall test

Returns: pd.DataFrame

get_all_matches()[source]#

get all matches for the multipart kendall test (including those that are not significant)

Returns: pd.DataFrame

get_data_from_breakpoints(breakpoints)[source]#

get the data from the breakpoints

Parameters: breakpoints – breakpoints to split the data, e.g. from self.get_acceptable_matches
Returns: outdata: list of dataframes for each part of the time series
Returns: kendal_stats: dataframe of Kendall stats for each part of the time series

get_maxz_breakpoints(raise_on_none=False)[source]#

Get the breakpoints for the maximum joint normalised z (min-max normalised for each part). The best match is the maximum znorm_joint value, where:

- if the expected trend == 1 or -1: znorm = the min-max normalised z value for each part
- else (no trend expected): znorm = 1 - the min-max normalised z value for each part
- znorm_joint = the sum of the znorm values for each part

Parameters: raise_on_none – bool, if True will raise an error if there are no acceptable matches, otherwise will return None
Returns: array of breakpoint tuples

plot_acceptable_matches(key)[source]#

quickly plot the acceptable matches

Parameters: key – key to plot (one of [‘p’, ‘z’, ‘s’, ‘var_s’, ‘znorm’, ‘znorm_joint’]) or ‘all’, which makes a figure for each value. Note that joint stats only have 1 value
Returns:

plot_data_from_breakpoints(breakpoints, ax=None, txt_vloc=-0.05, add_labels=True, **kwargs)[source]#

plot the data from the breakpoints including the senslope fits

Parameters:

breakpoints
ax – ax to plot on if None then create the ax
txt_vloc – vertical location of the text (in ax.transAxes)
add_labels – boolean, if True add labels (slope, pval) to the plot
kwargs – passed to ax.scatter (all parts)

Returns:

fig, ax

print_mk_diffs(other)[source]#

convenience function to print the differences between two MultiPartKendall classes

Parameters: other – another MultiPartKendall class

to_file(save_path=None, complevel=9, complib='blosc:lz4')[source]#

save the data to a hdf file

Parameters:

save_path – None (save to self.serialise_path) or path to save the file
complevel – compression level for hdf
complib – compression library for hdf

Returns:

timeit_test(function_names, npoints, check_step, check_window, n=10)[source]#

Run an automated timeit test. Must be outside of the function definition. Prints results in scientific notation; units are seconds.

Parameters:

py_file_path – path to the python file that holds the functions; if the functions are in the same script as the call then __file__ is sufficient. In this case the function call should be protected by: if __name__ == '__main__':
function_names – the names of the functions to test (iterable), functions must not have arguments
n – number of times to test
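The general pattern described here (timing zero-argument functions n times and printing seconds in scientific notation behind a `__main__` guard) can be sketched with the standard-library timeit module. The helper below is hypothetical: it takes callables rather than function names, and it is not the signature or implementation of timeit_test.

```python
import timeit


def time_functions(functions, n=10):
    """Time each zero-argument callable n times; hypothetical helper.

    Prints and returns the mean seconds per call for each function.
    """
    results = {}
    for func in functions:
        total = timeit.timeit(func, number=n)  # total seconds for n calls
        results[func.__name__] = total / n
        print(f"{func.__name__}: {results[func.__name__]:.3e} s per call")
    return results


def sum_small():
    """Example workload with no arguments."""
    return sum(range(1_000))


if __name__ == "__main__":
    # the guard keeps the timing from running when the file is imported
    time_functions([sum_small], n=10)
```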

Returns: