komanawa.kendall_stats

Created by matt_dumont on 21/09/23

Classes

`MannKendall`	An object to hold and calculate Kendall trends; assumes a pandas DataFrame or Series with a time index.
`MultiPartKendall`	Multi-part Mann-Kendall test to identify change point(s) in a time series, after Frollini et al., 2020, DOI: 10.1007/s11356-020-11998-0. Note: where the expected trend is zero, the lack of a trend is considered significant if p > 1-alpha.
`SeasonalKendall`	An object to hold and calculate seasonal Kendall trends.
`SeasonalMultiPartKendall`	Multi-part seasonal Mann-Kendall test to identify change point(s) in a time series, after Frollini et al., 2020, DOI: 10.1007/s11356-020-11998-0.

Functions

`estimate_runtime(npoints, func[, plot])`	Assumes a linear log-log relationship between runtime and number of points.

Package Contents

class MannKendall(data, alpha=0.05, data_col=None, rm_na=True)

Bases: object

An object to hold and calculate Kendall trends; assumes a pandas DataFrame or Series with a time index.

Parameters:

trend – the trend of the data, -1 decreasing, 0 no trend, 1 increasing
h – boolean, True if the trend is significant
p – the p value of the trend
z – the z value of the trend
s – the s value of the trend
var_s – the variance of the s value
alpha – the alpha value used to calculate the trend
data – the data used to calculate the trend
data_col – the column of the data used to calculate the trend
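
The quantities listed above (s, var_s, z, p, h, trend) can be illustrated with a textbook Mann-Kendall calculation. This is a minimal sketch in pure Python, not the library's implementation: the function name `mann_kendall` is hypothetical, no tie correction is applied to the variance, and the rule `h = p < alpha` is an assumption.

```python
import math


def mann_kendall(values, alpha=0.05):
    """Textbook Mann-Kendall test without a tie correction (illustrative only)."""
    n = len(values)
    # s: number of increasing pairs minus number of decreasing pairs
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)
    # variance of s, assuming there are no tied values
    var_s = n * (n - 1) * (2 * n + 5) / 18
    # z with the usual continuity correction
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # two-sided p value from the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2))
    h = p < alpha  # assumption: strict inequality
    trend = 0 if not h else (1 if z > 0 else -1)
    return {"trend": trend, "h": h, "p": p, "z": z, "s": s, "var_s": var_s}


result = mann_kendall([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0])
print(result["trend"], result["s"], result["var_s"])
```

The double loop is O(n²) in the number of points, which is why runtime grows quickly for long series.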

calc_senslope()

Calculate the Sen's slope of the data.

Returns: senslope, senintercept, lo_slope, up_slope
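
For orientation, Sen's slope is the median of the slopes between every pair of points. The sketch below is a pure Python illustration under stated assumptions, not the library's code: `sen_slope` is a hypothetical name, the intercept uses one common convention (median of y minus slope times median of x), and the lower and upper slope bounds are not computed.

```python
from statistics import median


def sen_slope(x, y):
    """Median of all pairwise slopes, plus a median-based intercept."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i in range(len(x) - 1)
        for j in range(i + 1, len(x))
        if x[j] != x[i]  # skip pairs with identical x to avoid dividing by zero
    ]
    slope = median(slopes)
    # assumption: one common intercept convention; other definitions exist
    intercept = median(y) - slope * median(x)
    return slope, intercept


# the last point is an outlier, but the median slope barely moves
slope, intercept = sen_slope([0, 1, 2, 3, 4], [1.0, 3.1, 4.9, 7.0, 30.0])
print(slope, intercept)
```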

classmethod map_trend()

Map the trend value to a string (1: increasing, -1: decreasing, 0: no trend).

Parameters: val – trend value
Returns:

plot_data(ax=None, **kwargs)

Plot the data and the Sen's slope fit.

Parameters:

ax – optional matplotlib axis to plot the data on
kwargs – kwargs to pass to plt.scatter for the raw data

Returns:

class MultiPartKendall(data, nparts=2, expect_part=(1, -1), min_size=10, alpha=0.05, no_trend_alpha=0.5, data_col=None, rm_na=True, serialise_path=None, check_step=1, check_window=None, recalc=False, initalize=True)

Multi-part Mann-Kendall test to identify change point(s) in a time series, after Frollini et al., 2020, DOI: 10.1007/s11356-020-11998-0. Note: where the expected trend is zero, the lack of a trend is considered significant if p > 1-alpha.

Parameters:

data – time series data; if DataFrame or Series, expects the index to be sample order (will sort on index); if np.array or list, expects the data to be in sample order
nparts – number of parts to split the time series into
expect_part – expected trend in each part of the time series (1 increasing, -1 decreasing, 0 no trend)
min_size – minimum size for the first and last section of the time series
alpha – significance level
no_trend_alpha – significance level for no trend, e.g. will accept if p > no_trend_alpha
data_col – if data is a DataFrame or Series, the column to use
rm_na – remove na values from the data
serialise_path – path to serialised file (as hdf); if None, will not serialise
check_step – int, the step to check for breakpoints, e.g. if 1 will check every point, if 2 will check every second point
check_window – the window to check for breakpoints; if None, will use the whole data. This is used to significantly speed up the Mann-Kendall test. Note that check_step still applies to the check_window (e.g. a check_window of (2, 6) with a check_step of 2 will check the points (2, 4, 6)). One of:
- None or tuple (start_idx, end_idx) (one breakpoint only),
- a list of tuples of len nparts-1 with a start/end idx for each part,
- or a 2d array of shape (nparts-1, 2) with a start/end idx for each part.
recalc – if True, will recalculate the Mann-Kendall test even if the serialised file exists
initalize – if True, will initialise the class from the data; only set to False when used in self.from_file
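
The interaction between check_step and check_window described above can be sketched for the single-breakpoint case. `candidate_breakpoints` is a hypothetical helper, not part of the package; it treats the end index as inclusive to match the (2, 6) example and ignores min_size for simplicity.

```python
def candidate_breakpoints(n, check_step=1, check_window=None):
    """List the indices that would be tested as a breakpoint (one breakpoint only)."""
    if check_window is None:
        # no window: consider the whole series (min_size is ignored in this sketch)
        start, end = 0, n - 1
    else:
        start, end = check_window
    # end is inclusive, so (2, 6) with a step of 2 gives 2, 4, 6
    return list(range(start, end + 1, check_step))


print(candidate_breakpoints(10, check_step=2, check_window=(2, 6)))
```

Narrowing the window or increasing the step reduces the number of candidate breakpoints, and therefore the number of Mann-Kendall tests that have to be run.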

Returns:

static from_file(path)

Load the class from a serialised file.

Parameters: path – path to the serialised file
Returns: MultiPartKendall

get_acceptable_matches()

Get the acceptable matches for the multipart Kendall test.

Returns: pd.DataFrame

get_all_matches()

Get all matches for the multipart Kendall test (including those that are not significant).

Returns: pd.DataFrame

get_data_from_breakpoints(breakpoints)

Get the data from the breakpoints.

Parameters: breakpoints – breakpoints to split the data, e.g. from self.get_acceptable_matches
Returns:
outdata – list of dataframes for each part of the time series
kendal_stats – dataframe of Kendall stats for each part of the time series
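
As a rough picture of what splitting at breakpoints means, the sketch below cuts a plain list into consecutive parts. `split_at_breakpoints` is a hypothetical helper, and it assumes each breakpoint is the first index of the following part; the package's own convention may differ.

```python
def split_at_breakpoints(values, breakpoints):
    """Split values into len(breakpoints) + 1 consecutive parts."""
    # assumption: a breakpoint is the first index of the next part
    edges = [0, *breakpoints, len(values)]
    return [values[start:end] for start, end in zip(edges[:-1], edges[1:])]


parts = split_at_breakpoints(list(range(10)), [4])
print(parts)
```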

get_maxz_breakpoints(raise_on_none=False)

Get the breakpoints for the maximum joint normalised (min-max for each part) z. The best match is the maximum znorm_joint value, where:

- if expected trend == 1 or -1: znorm = the min-max normalised z value for each part
- else (no trend expected): znorm = 1 - the min-max normalised z value for each part
- znorm_joint = the sum of the znorm values for each part

Parameters: raise_on_none – bool, if True will raise an error if no acceptable matches, otherwise will return None
Returns: array of breakpoint tuples
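
The scoring rule above can be sketched as follows. This is an illustration only: `min_max` and `joint_znorm` are hypothetical names, the z values are made up, and taking the absolute value of z before normalising (so that a stronger trend scores higher in either direction) is an assumption rather than something stated in the documentation.

```python
def min_max(values):
    """Scale values to the range 0-1 (all zeros if every value is identical)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def joint_znorm(z_by_part, expect_part):
    """z_by_part[k][i] is the z value of part k for candidate breakpoint set i."""
    joint = [0.0] * len(z_by_part[0])
    for z_values, expected in zip(z_by_part, expect_part):
        # assumption: use |z| so a stronger trend scores higher in either direction
        scaled = min_max([abs(z) for z in z_values])
        if expected == 0:
            # no trend expected: a weaker trend should score higher
            scaled = [1 - v for v in scaled]
        joint = [total + v for total, v in zip(joint, scaled)]
    return joint


# made-up z values for three candidate breakpoints and two parts
z_by_part = [[1.0, 2.0, 3.0], [-3.0, -1.0, -2.0]]
joint = joint_znorm(z_by_part, expect_part=(1, -1))
best = max(range(len(joint)), key=joint.__getitem__)
print(joint, best)
```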

plot_acceptable_matches(key)

Quickly plot the acceptable matches.

Parameters: key – key to plot (one of ['p', 'z', 's', 'var_s', 'znorm', 'znorm_joint']) or 'all', which gives a figure for each value. Note that joint stats only have 1 value.
Returns:

plot_data_from_breakpoints(breakpoints, ax=None, txt_vloc=-0.05, add_labels=True, **kwargs)

Plot the data from the breakpoints, including the Sen's slope fits.

Parameters:

breakpoints
ax – ax to plot on; if None, the ax is created
txt_vloc – vertical location of the text (in ax.transAxes)
add_labels – boolean, if True add labels (slope, pval) to the plot
kwargs – passed to ax.scatter (all parts)

Returns:

fig, ax

print_mk_diffs(other)

Convenience function to print the differences between two MultiPartKendall classes.

Parameters: other – another MultiPartKendall class

to_file(save_path=None, complevel=9, complib='blosc:lz4')

Save the data to an HDF file.

Parameters:

save_path – None (save to self.serialise_path) or path to save the file
complevel – compression level for hdf
complib – compression library for hdf

Returns:

class SeasonalKendall(df, data_col, season_col, alpha=0.05, rm_na=True, freq_limit=0.05)

Bases: MannKendall

An object to hold and calculate seasonal Kendall trends.

Parameters:

trend – the trend of the data, -1 decreasing, 0 no trend, 1 increasing
h – boolean, True if the trend is significant
p – the p value of the trend
z – the z value of the trend
s – the s value of the trend
var_s – the variance of the s value
alpha – the alpha value used to calculate the trend
data – the data used to calculate the trend
data_col – the column of the data used to calculate the trend
season_col – the column of the season data used to calculate the trend
freq_limit – the maximum difference in frequency between seasons (as a fraction); if the difference is greater than this, a warning is raised
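
In the textbook seasonal Kendall test, the Mann-Kendall s and var_s are computed within each season and then summed, so that a regular seasonal cycle does not mask or mimic a trend. The sketch below illustrates that idea in pure Python; the function names are hypothetical, no tie correction is applied, and it is not the package's implementation.

```python
import math
from collections import defaultdict


def mk_s_and_var(values):
    """Mann-Kendall s and its variance (no tie correction) for one season."""
    n = len(values)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)
    return s, n * (n - 1) * (2 * n + 5) / 18


def seasonal_kendall(values, seasons):
    """Sum s and var_s over seasons, then compute z and a two-sided p value."""
    by_season = defaultdict(list)
    for value, season in zip(values, seasons):
        by_season[season].append(value)
    s = 0
    var_s = 0.0
    for season_values in by_season.values():
        season_s, season_var = mk_s_and_var(season_values)
        s += season_s
        var_s += season_var
    if var_s == 0 or s == 0:
        z = 0.0
    else:
        # continuity correction: move s one step towards zero
        z = (s - 1) / math.sqrt(var_s) if s > 0 else (s + 1) / math.sqrt(var_s)
    p = math.erfc(abs(z) / math.sqrt(2))
    return s, var_s, z, p


# made-up data: a large summer/winter offset on top of a steady increase
values = [10, 0, 11, 1, 12, 2, 13, 3, 14, 4, 15, 5]
seasons = ["summer", "winter"] * 6
s, var_s, z, p = seasonal_kendall(values, seasons)
print(s, var_s, z, p)
```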

calc_senslope()

Calculate the Sen's slope of the data.

Returns: senslope, senintercept, lo_slope, lo_intercept

classmethod map_trend()

Map the trend value to a string (1: increasing, -1: decreasing, 0: no trend).

Parameters: val – trend value
Returns:

plot_data(ax=None, **kwargs)

Plot the data and the Sen's slope fit.

Parameters:

ax – optional matplotlib axis to plot the data on
kwargs – kwargs to pass to plt.scatter for the raw data (note that the seasonal column is passed to scatter as c)

Returns:

class SeasonalMultiPartKendall(data, data_col, season_col, nparts=2, expect_part=(1, -1), min_size=10, alpha=0.05, no_trend_alpha=0.5, rm_na=True, serialise_path=None, freq_limit=0.05, check_step=1, check_window=None, recalc=False, initalize=True)

Bases: MultiPartKendall

Multi-part seasonal Mann-Kendall test to identify change point(s) in a time series, after Frollini et al., 2020, DOI: 10.1007/s11356-020-11998-0.

Parameters:

data – time series data; if DataFrame or Series, expects the index to be sample order (will sort on index); if np.array or list, expects the data to be in sample order
data_col – if data is a DataFrame or Series, the column to use
season_col – the column to use for the season
nparts – number of parts to split the time series into
expect_part – expected trend in each part of the time series (1 increasing, -1 decreasing, 0 no trend)
min_size – minimum size for the first and last section of the time series
alpha – significance level
no_trend_alpha – significance level for no trend, e.g. will accept if p > no_trend_alpha
rm_na – remove na values from the data
serialise_path – path to serialised file (as hdf); if None, will not serialise
check_step – int, the step to check for breakpoints, e.g. if 1 will check every point, if 2 will check every second point
check_window – the window to check for breakpoints; if None, will use the whole data. This is used to significantly speed up the Mann-Kendall test. Note that check_step still applies to the check_window (e.g. a check_window of (2, 6) with a check_step of 2 will check the points (2, 4, 6)). One of:
- None or tuple (start_idx, end_idx) (one breakpoint only),
- a list of tuples of len nparts-1 with a start/end idx for each part,
- or a 2d array of shape (nparts-1, 2) with a start/end idx for each part.
recalc – if True, will recalculate the Mann-Kendall test even if the serialised file exists
initalize – if True, will initialise the class from the data; only set to False when used in self.from_file

Returns:

static from_file(path)

Load the class from a serialised file.

Parameters: path
Returns:

get_acceptable_matches()

Get the acceptable matches for the multipart Kendall test.

Returns: pd.DataFrame

get_all_matches()

Get all matches for the multipart Kendall test (including those that are not significant).

Returns: pd.DataFrame

get_data_from_breakpoints(breakpoints)

Get the data from the breakpoints.

Parameters: breakpoints – breakpoints to split the data, e.g. from self.get_acceptable_matches
Returns:
outdata – list of dataframes for each part of the time series
kendal_stats – dataframe of Kendall stats for each part of the time series

get_maxz_breakpoints(raise_on_none=False)

Get the breakpoints for the maximum joint normalised (min-max for each part) z. The best match is the maximum znorm_joint value, where:

- if expected trend == 1 or -1: znorm = the min-max normalised z value for each part
- else (no trend expected): znorm = 1 - the min-max normalised z value for each part
- znorm_joint = the sum of the znorm values for each part

Parameters: raise_on_none – bool, if True will raise an error if no acceptable matches, otherwise will return None
Returns: array of breakpoint tuples

plot_acceptable_matches(key)

Quickly plot the acceptable matches.

Parameters: key – key to plot (one of ['p', 'z', 's', 'var_s', 'znorm', 'znorm_joint']) or 'all', which gives a figure for each value. Note that joint stats only have 1 value.
Returns:

plot_data_from_breakpoints(breakpoints, ax=None, txt_vloc=-0.05, add_labels=True, **kwargs)

Plot the data from the breakpoints, including the Sen's slope fits.

Parameters:

breakpoints
ax – ax to plot on; if None, the ax is created
txt_vloc – vertical location of the text (in ax.transAxes)
add_labels – boolean, if True add labels (slope, pval) to the plot
kwargs – passed to ax.scatter (all parts)

Returns:

fig, ax

print_mk_diffs(other)

Convenience function to print the differences between two MultiPartKendall classes.

Parameters: other – another MultiPartKendall class

to_file(save_path=None, complevel=9, complib='blosc:lz4')

Save the data to an HDF file.

Parameters:

save_path – None (save to self.serialise_path) or path to save the file
complevel – compression level for hdf
complib – compression library for hdf

Returns:

estimate_runtime(npoints, func, plot=False)

Assumes a linear log-log relationship between runtime and number of points.

Parameters:

npoints
func
plot – if True then plot the data and the regression line

Returns:
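
A linear log-log relationship means runtime follows a power law in the number of points, so a straight-line fit to log(runtime) against log(npoints) can be used to extrapolate. The sketch below shows that idea only; `fit_loglog` and `predict_runtime` are hypothetical names, the timings are made-up illustrative values rather than measurements, and this is not how estimate_runtime is necessarily implemented.

```python
import math


def fit_loglog(npoints, runtimes):
    """Least-squares fit of log(runtime) = intercept + slope * log(npoints)."""
    xs = [math.log(n) for n in npoints]
    ys = [math.log(t) for t in runtimes]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    covariance = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    x_variance = sum((x - x_mean) ** 2 for x in xs)
    slope = covariance / x_variance
    intercept = y_mean - slope * x_mean
    return slope, intercept


def predict_runtime(n, slope, intercept):
    """Back-transform the fitted line to a runtime for n points."""
    return math.exp(intercept + slope * math.log(n))


# made-up timings (seconds) that scale with the square of the number of points
slope, intercept = fit_loglog([100, 200, 400, 800], [0.01, 0.04, 0.16, 0.64])
estimate = predict_runtime(1600, slope, intercept)
print(slope, estimate)
```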