komanawa.kendall_stats.time_tests_check_step_window#
Usage: python time_tests.py [outdir], where outdir is the path to save the results to; if not provided, the results are saved to the same directory as the script.
created matt_dumont on: 29/09/23
Classes#
MultiPartKendall | Multi-part Mann-Kendall test to identify change point(s) in a time series, after Frollini et al., 2020, DOI: 10.1007/s11356-020-11998-0. Note: where the expected trend is zero, the lack of a trend is considered significant if p > 1-alpha. |
SeasonalMultiPartKendall | Multi-part seasonal Mann-Kendall test to identify change point(s) in a time series, after Frollini et al., 2020, DOI: 10.1007/s11356-020-11998-0. |
Functions#
timeit_test | Run an automated timeit test; must be outside of the function definition. Prints results in scientific notation; units are seconds. |
Module Contents#
- class MultiPartKendall(data, nparts=2, expect_part=(1, -1), min_size=10, alpha=0.05, no_trend_alpha=0.5, data_col=None, rm_na=True, serialise_path=None, check_step=1, check_window=None, recalc=False, initalize=True)[source]#
Multi-part Mann-Kendall test to identify change point(s) in a time series, after Frollini et al., 2020, DOI: 10.1007/s11356-020-11998-0. Note: where the expected trend is zero, the lack of a trend is considered significant if p > 1-alpha.
- Parameters:
data – time series data. If a DataFrame or Series, the index is expected to be the sample order (the data will be sorted on the index); if an np.array or list, the data is expected to already be in sample order
nparts – number of parts to split the time series into
expect_part – expected trend in each part of the time series (1 increasing, -1 decreasing, 0 no trend)
min_size – minimum size for the first and last section of the time series
alpha – significance level
no_trend_alpha – significance level for no trend, e.g. a no-trend part is accepted if p > no_trend_alpha
data_col – if data is a DataFrame or Series, the column to use
rm_na – remove na values from the data
serialise_path – path to serialised file (as hdf), if None will not serialise
check_step – int, the step to check for breakpoints, e.g. if 1 will check every point, if 2 will check every second point
check_window –
the window to check for breakpoints; if None, the whole data is used. Restricting the window is used to significantly speed up the Mann-Kendall test. Note that check_step still applies to the check_window (e.g. a check_window of (2, 6) with a check_step of 2 will check the points (2, 4, 6)). One of:
None or a tuple (start_idx, end_idx) (one breakpoint only),
a list of tuples of len nparts-1 with a start/end idx for each part,
or a 2d array of shape (nparts-1, 2) with a start/end idx for each part
recalc – if True, recalculate the Mann-Kendall test even if the serialised file exists
initalize – if True, initialise the class from the data; only set to False when used in self.from_file
- Returns:
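The interaction between check_window and check_step for a single breakpoint can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the helper name candidate_breakpoints is hypothetical, it ignores min_size, and it assumes the window end is inclusive, as the (2, 6) example in the check_window description suggests.

```python
def candidate_breakpoints(n_points, check_step=1, check_window=None):
    """Hypothetical helper: list the indices that would be checked as a breakpoint.

    Assumes a single breakpoint and an inclusive window end (an assumption
    taken from the (2, 6) -> (2, 4, 6) example in the parameter description).
    """
    if check_window is None:
        # assumption: with no window, every index of the series is a candidate
        start, end = 0, n_points - 1
    else:
        start, end = check_window
    return list(range(start, end + 1, check_step))


windowed = candidate_breakpoints(20, check_step=2, check_window=(2, 6))
print(windowed)  # [2, 4, 6]
```

A narrower window or a larger step reduces the number of candidate breakpoints, and therefore the number of Mann-Kendall tests that need to be run.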
- static from_file(path)[source]#
load the class from a serialised file
- Parameters:
path – path to the serialised file
- Returns:
MultiPartKendall
- get_acceptable_matches()[source]#
get the acceptable matches for the multipart Kendall test
- Returns:
pd.DataFrame
- get_all_matches()[source]#
get all matches for the multipart Kendall test (including those that are not significant)
- Returns:
pd.DataFrame
- get_data_from_breakpoints(breakpoints)[source]#
get the data from the breakpoints
- Parameters:
breakpoints – breakpoints to split the data, e.g. from self.get_acceptable_matches
- Returns:
outdata: list of dataframes for each part of the time series
kendal_stats: dataframe of Kendall stats for each part of the time series
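Splitting a series at a set of breakpoints can be sketched as follows. The helper split_at_breakpoints is hypothetical and works on a plain list rather than a DataFrame; it assumes each breakpoint is the index of the first value of the next part, which may not match the library's convention.

```python
def split_at_breakpoints(values, breakpoints):
    """Hypothetical helper: split a sequence into len(breakpoints) + 1 parts.

    Assumes each breakpoint is the index of the first value of the next part.
    """
    edges = [0, *breakpoints, len(values)]
    return [values[start:end] for start, end in zip(edges[:-1], edges[1:])]


parts = split_at_breakpoints(list(range(10)), (4, 7))
print(parts)  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```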
- get_maxz_breakpoints(raise_on_none=False)[source]#
get the breakpoints for the maximum joint normalised (min-max for each part) z. The best match is the maximum znorm_joint value, where:
if expected trend == 1 or -1: znorm = the min-max normalised z value for each part
else (no trend expected): znorm = 1 - the min-max normalised z value for each part
and znorm_joint = the sum of the znorm values for each part
- Parameters:
raise_on_none – bool, if True will raise an error if no acceptable matches, otherwise will return None
- Returns:
array of breakpoint tuples
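The znorm_joint selection described above can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: the function names are hypothetical, the z values are made up, and the sketch takes the absolute value of z before normalising (an assumption, so that strong decreasing trends also score highly; the description above only says "min-max normalised z").

```python
def min_max(values):
    """Min-max normalise a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def znorm_joint(z_by_part, expect_part):
    """Hypothetical helper: z_by_part[i][j] is z for part i at candidate breakpoint j."""
    joint = [0.0] * len(z_by_part[0])
    for z_values, expected in zip(z_by_part, expect_part):
        # assumption: normalise |z| so increasing and decreasing trends are comparable
        znorm = min_max([abs(z) for z in z_values])
        if expected == 0:
            # no trend expected: a small |z| is the better match
            znorm = [1 - v for v in znorm]
        joint = [total + v for total, v in zip(joint, znorm)]
    return joint


# made-up z values for three candidate breakpoints
z_by_part = [
    [1.0, 3.0, 2.0],     # part 1, expected increasing
    [-2.0, -4.0, -1.0],  # part 2, expected decreasing
]
joint = znorm_joint(z_by_part, expect_part=(1, -1))
best = max(range(len(joint)), key=joint.__getitem__)
print(best)  # 1
```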
- plot_acceptable_matches(key)[source]#
quickly plot the acceptable matches
- Parameters:
key – key to plot (one of ['p', 'z', 's', 'var_s', 'znorm', 'znorm_joint']) or 'all', which produces a figure for each value. Note that joint stats only have 1 value
- Returns:
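The keys 'p', 'z', 's' and 'var_s' correspond to the usual Mann-Kendall quantities. As a rough guide, the textbook form of the test (without a correction for ties) can be sketched as below; the function name mann_kendall is hypothetical and this is not necessarily how the library computes these values.

```python
from math import erfc, sqrt


def mann_kendall(values):
    """Textbook Mann-Kendall statistics for a sequence with no tied values."""
    n = len(values)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)  # sign of the pairwise difference
    var_s = n * (n - 1) * (2 * n + 5) / 18  # variance of S, no ties correction
    if s > 0:
        z = (s - 1) / sqrt(var_s)
    elif s < 0:
        z = (s + 1) / sqrt(var_s)
    else:
        z = 0.0
    p = erfc(abs(z) / sqrt(2))  # two-sided p value from the normal distribution
    return {"s": s, "var_s": var_s, "z": z, "p": p}


stats = mann_kendall([1.0, 2.0, 3.0, 4.0, 5.0])
print(stats["s"])  # 10
```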
- plot_data_from_breakpoints(breakpoints, ax=None, txt_vloc=-0.05, add_labels=True, **kwargs)[source]#
plot the data from the breakpoints, including the Sen's slope fits
- Parameters:
breakpoints
ax – ax to plot on; if None, a new ax is created
txt_vloc – vertical location of the text (in ax.transAxes)
add_labels – boolean, if True add labels (slope, pval) to the plot
kwargs – passed to ax.scatter (all parts)
- Returns:
fig, ax
- class SeasonalMultiPartKendall(data, data_col, season_col, nparts=2, expect_part=(1, -1), min_size=10, alpha=0.05, no_trend_alpha=0.5, rm_na=True, serialise_path=None, freq_limit=0.05, check_step=1, check_window=None, recalc=False, initalize=True)[source]#
Bases:
MultiPartKendall
Multi-part seasonal Mann-Kendall test to identify change point(s) in a time series, after Frollini et al., 2020, DOI: 10.1007/s11356-020-11998-0
- Parameters:
data – time series data. If a DataFrame or Series, the index is expected to be the sample order (the data will be sorted on the index); if an np.array or list, the data is expected to already be in sample order
data_col – if data is a DataFrame or Series, the column to use
season_col – the column to use for the season
nparts – number of parts to split the time series into
expect_part – expected trend in each part of the time series (1 increasing, -1 decreasing, 0 no trend)
min_size – minimum size for the first and last section of the time series
alpha – significance level
no_trend_alpha – significance level for no trend, e.g. a no-trend part is accepted if p > no_trend_alpha
rm_na – remove na values from the data
serialise_path – path to serialised file (as hdf), if None will not serialise
check_step – int, the step to check for breakpoints, e.g. if 1 will check every point, if 2 will check every second point
check_window –
the window to check for breakpoints; if None, the whole data is used. Restricting the window is used to significantly speed up the Mann-Kendall test. Note that check_step still applies to the check_window (e.g. a check_window of (2, 6) with a check_step of 2 will check the points (2, 4, 6)). One of:
None or a tuple (start_idx, end_idx) (one breakpoint only),
a list of tuples of len nparts-1 with a start/end idx for each part,
or a 2d array of shape (nparts-1, 2) with a start/end idx for each part
recalc – if True, recalculate the Mann-Kendall test even if the serialised file exists
initalize – if True, initialise the class from the data; only set to False when used in self.from_file
- Returns:
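The seasonal variant differs from the plain test in that values are only compared within the same season. A minimal sketch of the standard seasonal Mann-Kendall statistic follows; the function names are hypothetical, and the sketch assumes no tied values and no covariance between seasons, so the library's own calculation may differ.

```python
from collections import defaultdict
from math import sqrt


def mk_s(values):
    """Mann-Kendall S statistic: sum of the signs of all pairwise differences."""
    s = 0
    for i in range(len(values) - 1):
        for j in range(i + 1, len(values)):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)
    return s


def seasonal_mann_kendall(values, seasons):
    """Sum S and var(S) over seasons (assumes no ties, independent seasons)."""
    groups = defaultdict(list)
    for value, season in zip(values, seasons):
        groups[season].append(value)
    s_total, var_total = 0, 0.0
    for season_values in groups.values():
        n = len(season_values)
        s_total += mk_s(season_values)
        var_total += n * (n - 1) * (2 * n + 5) / 18
    if s_total > 0:
        z = (s_total - 1) / sqrt(var_total)
    elif s_total < 0:
        z = (s_total + 1) / sqrt(var_total)
    else:
        z = 0.0
    return s_total, var_total, z


# made-up data: both seasons increase, but summer values are always higher
seasons = ["summer", "winter"] * 4
values = [10, 1, 11, 2, 12, 3, 13, 4]
s_total, var_total, z = seasonal_mann_kendall(values, seasons)
print(s_total)  # 12
```

In this made-up series the large summer/winter offset does not mask the trend, because each season contributes its own S of 6.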
- get_acceptable_matches()[source]#
get the acceptable matches for the multipart Kendall test
- Returns:
pd.DataFrame
- get_all_matches()[source]#
get all matches for the multipart Kendall test (including those that are not significant)
- Returns:
pd.DataFrame
- get_data_from_breakpoints(breakpoints)[source]#
get the data from the breakpoints
- Parameters:
breakpoints – breakpoints to split the data, e.g. from self.get_acceptable_matches
- Returns:
outdata: list of dataframes for each part of the time series
kendal_stats: dataframe of Kendall stats for each part of the time series
- get_maxz_breakpoints(raise_on_none=False)[source]#
get the breakpoints for the maximum joint normalised (min-max for each part) z. The best match is the maximum znorm_joint value, where:
if expected trend == 1 or -1: znorm = the min-max normalised z value for each part
else (no trend expected): znorm = 1 - the min-max normalised z value for each part
and znorm_joint = the sum of the znorm values for each part
- Parameters:
raise_on_none – bool, if True will raise an error if no acceptable matches, otherwise will return None
- Returns:
array of breakpoint tuples
- plot_acceptable_matches(key)[source]#
quickly plot the acceptable matches
- Parameters:
key – key to plot (one of ['p', 'z', 's', 'var_s', 'znorm', 'znorm_joint']) or 'all', which produces a figure for each value. Note that joint stats only have 1 value
- Returns:
- plot_data_from_breakpoints(breakpoints, ax=None, txt_vloc=-0.05, add_labels=True, **kwargs)[source]#
plot the data from the breakpoints, including the Sen's slope fits
- Parameters:
breakpoints
ax – ax to plot on; if None, a new ax is created
txt_vloc – vertical location of the text (in ax.transAxes)
add_labels – boolean, if True add labels (slope, pval) to the plot
kwargs – passed to ax.scatter (all parts)
- Returns:
fig, ax
- timeit_test(function_names, npoints, check_step, check_window, n=10)[source]#
Run an automated timeit test; must be outside of the function definition. Prints results in scientific notation; units are seconds.
- Parameters:
py_file_path – path to the Python file that holds the functions; if the functions are in the same script as the call then __file__ is sufficient. In this case the function call should be protected by: if __name__ == '__main__':
function_names – the names of the functions to test (iterable), functions must not have arguments
n – number of times to test
- Returns:
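The general pattern of timing zero-argument functions and printing the results in scientific notation can be sketched with the standard library timeit module. The helper time_functions and the two example functions are hypothetical; unlike timeit_test, this sketch takes callables rather than function names, and it does not reproduce the npoints, check_step or check_window arguments.

```python
import timeit


def time_functions(functions, n=10):
    """Hypothetical helper: time each zero-argument callable over n runs."""
    results = {}
    for func in functions:
        total = timeit.timeit(func, number=n)  # total seconds for n calls
        results[func.__name__] = total / n
        print(f"{func.__name__}: {results[func.__name__]:.3e} s per call")
    return results


def sum_small():
    return sum(range(1_000))


def sum_large():
    return sum(range(100_000))


if __name__ == "__main__":
    # the guard keeps the timing from running when the file is imported
    time_functions([sum_small, sum_large], n=5)
```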