komanawa.kendall_stats#
created matt_dumont on: 21/09/23
Submodules#
Classes#
MannKendall | an object to hold and calculate Kendall trends; assumes a pandas DataFrame or Series with a time index
MultiPartKendall | multi-part Mann-Kendall test to identify change point(s) in a time series, after Frollini et al., 2020, DOI: 10.1007/s11356-020-11998-0. Note: where the expected trend is zero, the lack of a trend is considered significant if p > 1-alpha
SeasonalKendall | an object to hold and calculate seasonal Kendall trends
SeasonalMultiPartKendall | multi-part seasonal Mann-Kendall test to identify change point(s) in a time series, after Frollini et al., 2020, DOI: 10.1007/s11356-020-11998-0
Functions#
assumes a linear log-log relationship between runtime and number of points
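To illustrate what a linear log-log relationship between runtime and number of points implies, here is a minimal sketch that fits a straight line to log10(runtime) against log10(npoints) and extrapolates from it. The helper names `fit_loglog` and `predict_runtime` and the timings are hypothetical and are not part of the package; the package's own function may fit or calibrate the relationship differently.

```python
import math


def fit_loglog(npoints, runtimes):
    """Least-squares fit of log10(runtime) = intercept + slope * log10(npoints)."""
    xs = [math.log10(n) for n in npoints]
    ys = [math.log10(t) for t in runtimes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_runtime(n, intercept, slope):
    """Extrapolate the fitted log-log line to n points."""
    return 10 ** (intercept + slope * math.log10(n))


# made-up timings (seconds) that follow runtime = 1e-6 * n**2 exactly
timed_n = [100, 1000, 10000]
timed_s = [1e-6 * n ** 2 for n in timed_n]

intercept, slope = fit_loglog(timed_n, timed_s)
estimate = predict_runtime(5000, intercept, slope)
print(f"slope={slope:.3f}, estimated runtime for 5000 points={estimate:.2f} s")
```

A slope near 2 in log-log space corresponds to runtime growing roughly with the square of the number of points, which is what the made-up timings were built to show.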
Package Contents#
- class MannKendall(data, alpha=0.05, data_col=None, rm_na=True)[source]#
Bases:
object
an object to hold and calculate Kendall trends; assumes a pandas DataFrame or Series with a time index
- Parameters:
trend – the trend of the data, -1 decreasing, 0 no trend, 1 increasing
h – boolean, True if the trend is significant
p – the p value of the trend
z – the z value of the trend
s – the s value of the trend
var_s – the variance of the s value
alpha – the alpha value used to calculate the trend
data – the data used to calculate the trend
data_col – the column of the data used to calculate the trend
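As a rough illustration of how the values listed above (s, var_s, z, p, h and trend) relate to one another, the following is a minimal pure-Python sketch of the standard Mann-Kendall test. It is not the package's implementation: the function name `mann_kendall_sketch` is hypothetical, ties are ignored in the variance, and the `p < alpha` significance rule is an assumption.

```python
import math


def mann_kendall_sketch(values, alpha=0.05):
    """Standard Mann-Kendall test without a tie correction (illustrative only)."""
    n = len(values)
    # s: number of increasing pairs minus number of decreasing pairs
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)
    # variance of s, assuming there are no tied values
    var_s = n * (n - 1) * (2 * n + 5) / 18
    # continuity-corrected z score
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # two-sided p value from the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2))
    h = p < alpha  # assumed significance rule
    trend = 0 if not h else (1 if z > 0 else -1)
    return {"trend": trend, "h": h, "p": p, "z": z, "s": s, "var_s": var_s}


result = mann_kendall_sketch(list(range(10)))
print(result)
```

For a strictly increasing series of 10 values every pair is increasing, so s equals the number of pairs (45) and the trend is reported as 1.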
- calc_senslope()[source]#
calculate the senslope of the data
- Returns:
senslope, senintercept, lo_slope, up_slope
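For orientation, Sen's slope is conventionally the median of all pairwise slopes. The sketch below shows that idea in pure Python. The name `sen_slope_sketch` is hypothetical, the intercept convention (median of y minus slope times median of x) is one common choice and an assumption here, and the confidence bounds (lo_slope, up_slope) are omitted.

```python
from statistics import median


def sen_slope_sketch(x, y):
    """Median of pairwise slopes, with a median-based intercept (illustrative only)."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i in range(len(x) - 1)
        for j in range(i + 1, len(x))
        if x[j] != x[i]  # skip pairs with identical x to avoid dividing by zero
    ]
    slope = median(slopes)
    intercept = median(y) - slope * median(x)
    return slope, intercept


# made-up data: a line with slope 2 plus one large outlier at the end
x = [0, 1, 2, 3, 4]
y = [1.0, 3.0, 5.0, 7.0, 50.0]
slope, intercept = sen_slope_sketch(x, y)
print(slope, intercept)
```

Because the estimate is a median, the single outlier does not pull the slope away from 2, which is the main reason the estimator is paired with the Mann-Kendall test.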
- class MultiPartKendall(data, nparts=2, expect_part=(1, -1), min_size=10, alpha=0.05, no_trend_alpha=0.5, data_col=None, rm_na=True, serialise_path=None, check_step=1, check_window=None, recalc=False, initalize=True)[source]#
multi-part Mann-Kendall test to identify change point(s) in a time series, after Frollini et al., 2020, DOI: 10.1007/s11356-020-11998-0. Note: where the expected trend is zero, the lack of a trend is considered significant if p > 1-alpha
- Parameters:
data – time series data; if a DataFrame or Series, the index is expected to be in sample order (the data will be sorted on the index); if an np.array or list, the data are expected to be in sample order
nparts – number of parts to split the time series into
expect_part – expected trend in each part of the time series (1 increasing, -1 decreasing, 0 no trend)
min_size – minimum size for the first and last section of the time series
alpha – significance level
no_trend_alpha – significance level for no trend, e.g. a part is accepted if p > no_trend_alpha
data_col – if data is a DataFrame or Series, the column to use
rm_na – remove na values from the data
serialise_path – path to serialised file (as hdf), if None will not serialise
check_step – int, the step to check for breakpoints, e.g. if 1 will check every point, if 2 will check every second point
check_window –
the window to check for breakpoints; if None, the whole data set is used. This is used to significantly speed up the Mann-Kendall test. Note that check_step still applies to the check_window (e.g. a check_window of (2, 6) with a check_step of 2 will check the points (2, 4, 6)). One of:
None or a tuple (start_idx, end_idx) (one breakpoint only),
a list of tuples of len nparts-1 with a start/end idx for each part,
or a 2d array of shape (nparts-1, 2) with a start/end idx for each part
recalc – if True, will recalculate the Mann-Kendall test even if the serialised file exists
initalize – if True, will initialize the class from the data; only set to False when used in self.from_file
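The interaction between check_window and check_step described above can be sketched as follows. The inclusive end index follows the (2, 6) with step 2 example in the parameter description; the helper names and the requirement that successive breakpoints be strictly increasing are assumptions made for illustration, not the package's code.

```python
from itertools import product


def candidate_points(window, check_step=1):
    """Candidate breakpoint indices inside one (start_idx, end_idx) window, end inclusive."""
    start_idx, end_idx = window
    return list(range(start_idx, end_idx + 1, check_step))


def candidate_breakpoint_sets(windows, check_step=1):
    """All combinations of one candidate per window, kept only if strictly increasing."""
    per_break = [candidate_points(window, check_step) for window in windows]
    return [
        combo
        for combo in product(*per_break)
        if all(a < b for a, b in zip(combo, combo[1:]))
    ]


single = candidate_points((2, 6), check_step=2)
# two breakpoints (nparts=3), so one window per breakpoint
combos = candidate_breakpoint_sets([(2, 6), (5, 9)], check_step=2)
print(single)
print(combos)
```

Narrowing the windows or increasing check_step reduces the number of combinations, and each combination requires a Mann-Kendall test per part, which is where the speed-up comes from.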
- Returns:
- static from_file(path)[source]#
load the class from a serialised file
- Parameters:
path – path to the serialised file
- Returns:
MultiPartKendall
- get_acceptable_matches()[source]#
get the acceptable matches for the multipart Kendall test
- Returns:
pd.DataFrame
- get_all_matches()[source]#
get all matches for the multipart Kendall test (including those that are not significant)
- Returns:
pd.DataFrame
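One way to picture the difference between all matches and acceptable matches is as a filter over candidate breakpoints. The sketch below follows the alpha and no_trend_alpha parameter descriptions; the helper names, the use of the sign of z for direction, and the candidate statistics are all hypothetical, and the package's actual acceptance logic may differ.

```python
def part_is_acceptable(expected, z, p, alpha=0.05, no_trend_alpha=0.5):
    """Check one part against its expected trend (1, -1 or 0)."""
    if expected == 0:
        # no trend expected: accept when the test is far from significant
        return p > no_trend_alpha
    same_direction = (z > 0) if expected == 1 else (z < 0)
    return same_direction and p < alpha


def match_is_acceptable(expect_part, part_stats, alpha=0.05, no_trend_alpha=0.5):
    """A breakpoint set is acceptable only if every part is acceptable."""
    return all(
        part_is_acceptable(expected, z, p, alpha, no_trend_alpha)
        for expected, (z, p) in zip(expect_part, part_stats)
    )


# made-up (z, p) pairs per part for two candidate breakpoints
candidates = {
    (40,): [(3.1, 0.002), (-2.8, 0.005)],
    (55,): [(2.5, 0.012), (-1.2, 0.230)],
}
acceptable = [
    breakpoints
    for breakpoints, stats in candidates.items()
    if match_is_acceptable((1, -1), stats)
]
print(acceptable)
```

Here the second candidate is rejected because its second part is not significantly decreasing at alpha=0.05.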
- get_data_from_breakpoints(breakpoints)[source]#
get the data from the breakpoints
- Parameters:
breakpoints – breakpoints to split the data, e.g. from self.get_acceptable_matches
- Returns:
outdata: list of dataframes for each part of the time series
kendal_stats: dataframe of Kendall stats for each part of the time series
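Conceptually, splitting at breakpoints amounts to slicing the ordered data into consecutive parts. The sketch below assumes that each breakpoint is the positional index at which a new part starts; that convention and the name `split_at_breakpoints` are assumptions for illustration only.

```python
def split_at_breakpoints(values, breakpoints):
    """Slice values into len(breakpoints) + 1 consecutive parts."""
    edges = [0, *breakpoints, len(values)]
    return [values[start:stop] for start, stop in zip(edges, edges[1:])]


parts = split_at_breakpoints(list(range(10)), (4, 7))
print(parts)
```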
- get_maxz_breakpoints(raise_on_none=False)[source]#
get the breakpoints for the maximum joint normalised z (min-max normalised for each part). The best match is the maximum znorm_joint value, where:
if expected trend == 1 or -1: znorm = the min-max normalised z value for each part
else (no trend expected): znorm = 1 - the min-max normalised z value for each part
and znorm_joint = the sum of the znorm values for each part
- Parameters:
raise_on_none – bool, if True will raise an error if no acceptable matches, otherwise will return None
- Returns:
array of breakpoint tuples
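The znorm and znorm_joint definitions above can be sketched in a few lines. This version normalises the absolute value of z so that strongly decreasing parts score highly when a decreasing trend is expected; that choice, the helper names, and the z values are assumptions and may not match the package exactly.

```python
def min_max(values):
    """Min-max normalise a list to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def best_breakpoints(breakpoints, z_by_part, expect_part):
    """Return the breakpoint sets with the largest znorm_joint, plus all joint scores."""
    znorm_joint = [0.0] * len(breakpoints)
    for part_z, expected in zip(z_by_part, expect_part):
        znorm = min_max([abs(z) for z in part_z])  # assumption: normalise |z|
        if expected == 0:
            znorm = [1 - v for v in znorm]  # no trend expected: small |z| is best
        znorm_joint = [total + v for total, v in zip(znorm_joint, znorm)]
    top = max(znorm_joint)
    best = [bp for bp, score in zip(breakpoints, znorm_joint) if score == top]
    return best, znorm_joint


# made-up z values: one list per part, one entry per candidate breakpoint
breakpoints = [(10,), (12,), (14,)]
z_by_part = [[2.0, 3.0, 4.0], [-4.0, -3.5, -2.0]]
best, joint = best_breakpoints(breakpoints, z_by_part, expect_part=(1, -1))
print(best, joint)
```

The middle candidate wins because it is a good compromise for both parts, even though it is not the best candidate for either part on its own.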
- plot_acceptable_matches(key)[source]#
quickly plot the acceptable matches
- Parameters:
key – key to plot (one of ['p', 'z', 's', 'var_s', 'znorm', 'znorm_joint']) or 'all' for a figure for each value; note that joint stats only have 1 value
- Returns:
- plot_data_from_breakpoints(breakpoints, ax=None, txt_vloc=-0.05, add_labels=True, **kwargs)[source]#
plot the data from the breakpoints including the senslope fits
- Parameters:
breakpoints
ax – ax to plot on; if None, a new ax is created
txt_vloc – vertical location of the text (in ax.transAxes)
add_labels – boolean, if True add labels (slope, pval) to the plot
kwargs – passed to ax.scatter (all parts)
- Returns:
fig, ax
- class SeasonalKendall(df, data_col, season_col, alpha=0.05, rm_na=True, freq_limit=0.05)[source]#
Bases:
MannKendall
an object to hold and calculate seasonal kendall trends
- Parameters:
trend – the trend of the data, -1 decreasing, 0 no trend, 1 increasing
h – boolean, True if the trend is significant
p – the p value of the trend
z – the z value of the trend
s – the s value of the trend
var_s – the variance of the s value
alpha – the alpha value used to calculate the trend
data – the data used to calculate the trend
data_col – the column of the data used to calculate the trend
season_col – the column of the season data used to calculate the trend
freq_limit – the maximum difference in frequency between seasons (as a fraction), if greater than this will raise a warning
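In the usual formulation of the seasonal Kendall test, the Mann-Kendall s statistic and its variance are computed separately within each season and then summed, so that the seasonal cycle itself does not register as a trend. The sketch below shows that idea under the same simplifications as a plain Mann-Kendall sketch (no tie correction, no covariance between seasons); the function names are hypothetical and this is not the package's implementation.

```python
import math
from collections import defaultdict


def mk_s(values):
    """Mann-Kendall s statistic: increasing pairs minus decreasing pairs."""
    s = 0
    for i in range(len(values) - 1):
        for j in range(i + 1, len(values)):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)
    return s


def seasonal_kendall_sketch(values, seasons, alpha=0.05):
    """Sum s and var_s over seasons, then test the combined statistic."""
    by_season = defaultdict(list)
    for value, season in zip(values, seasons):
        by_season[season].append(value)  # values must already be in time order
    s = 0
    var_s = 0.0
    for group in by_season.values():
        n = len(group)
        s += mk_s(group)
        var_s += n * (n - 1) * (2 * n + 5) / 18  # no tie correction
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))
    return {"s": s, "var_s": var_s, "z": z, "p": p, "h": p < alpha}


# made-up data: 5 years x 4 seasons, a slow rise plus a strong seasonal cycle
offsets = [0.0, 10.0, 20.0, 10.0]
values, seasons = [], []
for year in range(5):
    for season, offset in enumerate(offsets):
        values.append(year + offset)
        seasons.append(season)

result = seasonal_kendall_sketch(values, seasons)
print(result)
```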
- calc_senslope()[source]#
calculate the senslope of the data
- Returns:
senslope, senintercept, lo_slope, lo_intercept
- class SeasonalMultiPartKendall(data, data_col, season_col, nparts=2, expect_part=(1, -1), min_size=10, alpha=0.05, no_trend_alpha=0.5, rm_na=True, serialise_path=None, freq_limit=0.05, check_step=1, check_window=None, recalc=False, initalize=True)[source]#
Bases:
MultiPartKendall
multi-part seasonal Mann-Kendall test to identify change point(s) in a time series, after Frollini et al., 2020, DOI: 10.1007/s11356-020-11998-0
- Parameters:
data – time series data; if a DataFrame or Series, the index is expected to be in sample order (the data will be sorted on the index); if an np.array or list, the data are expected to be in sample order
data_col – if data is a DataFrame or Series, the column to use
season_col – the column to use for the season
nparts – number of parts to split the time series into
expect_part – expected trend in each part of the time series (1 increasing, -1 decreasing, 0 no trend)
min_size – minimum size for the first and last section of the time series
alpha – significance level
no_trend_alpha – significance level for no trend, e.g. a part is accepted if p > no_trend_alpha
rm_na – remove na values from the data
serialise_path – path to serialised file (as hdf), if None will not serialise
check_step – int, the step to check for breakpoints, e.g. if 1 will check every point, if 2 will check every second point
check_window –
the window to check for breakpoints; if None, the whole data set is used. This is used to significantly speed up the Mann-Kendall test. Note that check_step still applies to the check_window (e.g. a check_window of (2, 6) with a check_step of 2 will check the points (2, 4, 6)). One of:
None or a tuple (start_idx, end_idx) (one breakpoint only),
a list of tuples of len nparts-1 with a start/end idx for each part,
or a 2d array of shape (nparts-1, 2) with a start/end idx for each part
recalc – if True, will recalculate the Mann-Kendall test even if the serialised file exists
initalize – if True, will initialize the class from the data; only set to False when used in self.from_file
- Returns:
- get_acceptable_matches()[source]#
get the acceptable matches for the multipart Kendall test
- Returns:
pd.DataFrame
- get_all_matches()[source]#
get all matches for the multipart Kendall test (including those that are not significant)
- Returns:
pd.DataFrame
- get_data_from_breakpoints(breakpoints)[source]#
get the data from the breakpoints
- Parameters:
breakpoints – breakpoints to split the data, e.g. from self.get_acceptable_matches
- Returns:
outdata: list of dataframes for each part of the time series
kendal_stats: dataframe of Kendall stats for each part of the time series
- get_maxz_breakpoints(raise_on_none=False)[source]#
get the breakpoints for the maximum joint normalised z (min-max normalised for each part). The best match is the maximum znorm_joint value, where:
if expected trend == 1 or -1: znorm = the min-max normalised z value for each part
else (no trend expected): znorm = 1 - the min-max normalised z value for each part
and znorm_joint = the sum of the znorm values for each part
- Parameters:
raise_on_none – bool, if True will raise an error if no acceptable matches, otherwise will return None
- Returns:
array of breakpoint tuples
- plot_acceptable_matches(key)[source]#
quickly plot the acceptable matches
- Parameters:
key – key to plot (one of ['p', 'z', 's', 'var_s', 'znorm', 'znorm_joint']) or 'all' for a figure for each value; note that joint stats only have 1 value
- Returns:
- plot_data_from_breakpoints(breakpoints, ax=None, txt_vloc=-0.05, add_labels=True, **kwargs)[source]#
plot the data from the breakpoints including the senslope fits
- Parameters:
breakpoints
ax – ax to plot on; if None, a new ax is created
txt_vloc – vertical location of the text (in ax.transAxes)
add_labels – boolean, if True add labels (slope, pval) to the plot
kwargs – passed to ax.scatter (all parts)
- Returns:
fig, ax