komanawa.BASE.utils#
created by matt_dumont on: 6/14/24
Classes#
BASESolver: class for predicting the source concentration from the BPEFM model
Functions#
Load the source and receptor data from an npz file (self serialized by BASESolver.get_source_receptor_data)
Module Contents#
- class BASESolver(save_dir, n_inflections, ts_data, mrt, mrt_p1, mrt_p2, frac_p1, f_p1, f_p2, precision=2, min_conc=0.01, inflection_times=None, cdf_inflection_start=None, sigma=None, interpolator='linear', interpolator_kwargs=None, serialize_receptor_data=True, predict_ncores=None, nchains=5, niterations=50000, start_age=np.nan, verbose=True)#
class for predicting the source concentration from the BPEFM model
BASE: Bayesian approach to source estimation. This class uses the MT-DREAM(ZS) algorithm to conduct Bayesian inference on the source concentration for measured time series data and a BPEFM model.
The MT-DREAM(ZS) algorithm was introduced in: Laloy, E. & Vrugt, J. A. High-dimensional posterior exploration of hydrologic models using multiple-try DREAM(ZS) and high-performance computing. Water Resources Research 48, W01526 (2012).
Instance Variables:
- Variables:
age_bounds – Age bounds for the model (years since start date)
age_fractions – Age fractions for the ages in the BPEFM model (see komanawa.gw_age_tools.exponential_piston_flow.make_age_dist)
age_step – Age step for the BPEFM model (see komanawa.gw_age_tools.exponential_piston_flow.make_age_dist)
ages – Ages for the BPEFM model (see komanawa.gw_age_tools.exponential_piston_flow.make_age_dist)
end_date – End date of the observed time series data (pd.Timestamp)
f_p1 – Fraction of the total mass in the first peak (age model parameter)
f_p2 – Fraction of the total mass in the second peak (age model parameter)
frac_p1 – Fraction of the total mass in the first peak (age model parameter)
inflection_times – Inflection times for the model (years since start date). Source concentration will be parameterized at these times. Note that len(inflection_times) = n_inflections + 2, where n_inflections is passed in the init, as the method automatically includes the start and end of the time series
measured_age_bounds – Age bounds for the measured time series data (years since start date)
mrt – Mean residence time for the age distribution (yrs) (age model parameter)
mrt_p1 – Mean residence time for the first peak (yrs) (age model parameter)
mrt_p2 – Mean residence time for the second peak (yrs) (age model parameter)
nchains – Number of pydream chains for the model. Note that each chain is a separate process, so if nchains exceeds the available processors the run will be MUCH slower
niterations – The number of algorithm iterations to run.
precision – precision of the age distribution (decimal places years) default is 2, approximately monthly
predict_ncores – Number of cores for prediction (see param)
save_dir – working directory for the model
serialize_receptor_data – Boolean flag to serialize receptor data
start_date – Start date of the time series data (pd.Timestamp)
t – time index for the time series data (decimal years since start date)
ts_data – Time series data (pandas Series) Index=self.t, values=concentration
Class attributes:
- Variables:
available_interpolators – tuple of available interpolators
all_t – not used in this class, but used in the child classes
_force_old_predict – a flag to force the use of the old predict method for testing purposes
Basic Usage:
from komanawa.BASE import BASESolver, generators, BaseSampledParam
from pathlib import Path
import matplotlib.pyplot as plt

# Set up the problem
first_model_name = 'first_model'
n_inflections = 5  # this will increase to 7 with the addition of a start and end point
ndata = get_ndata()  # get your data, this should be a pandas series with a datetime index
output_dir = Path.home()
cdf_threshold = 0.15
mrt = 12
f_p1 = 0.7

# create the solver
solver = BASESolver(
    save_dir=output_dir,
    ts_data=ndata,
    mrt=mrt,
    mrt_p1=mrt,
    mrt_p2=mrt,
    frac_p1=1,
    f_p1=f_p1,
    f_p2=f_p1,
    n_inflections=n_inflections,
    cdf_inflection_start=cdf_threshold,
)

# set priors
lows = [0.001, 0.001, 0.1, 0.1, 0.2, 0.5, 1.0]
ups = [0.5, 2, 6, 6, 6, 6, 6]
params = BaseSampledParam(distribution=generators.UniformPathGenerator, low=lows, up=ups, n=7)

# run the model
run = solver.run_base(model_name=first_model_name, params=params, verbose=False)

# plot the results
nplot = 0.2  # plot 20% of the results
fig, ax = solver.plot_best_source_receptor(first_model_name, nplot=nplot, receptor_spacing=1)
plt.show()
- Parameters:
save_dir – directory to run dreamzs in, files are named via model name kwarg
n_inflections – number of inflection times to use (int) or frequency of inflection times (see https://pandas.pydata.org/docs/user_guide/timeseries.html#timeseries-offset-aliases)
ts_data – time series data to fit to
mrt – mean residence time of the source (yrs)
mrt_p1 – mean residence time of the first piston (yrs)
mrt_p2 – mean residence time of the second piston (yrs)
frac_p1 – fraction of the source that is in the first piston
f_p1 – fraction of exponential flow the first piston
f_p2 – fraction of exponential flow the second piston (use a dummy value if frac_p1 = 1)
precision – precision of the age distribution (decimal places years) default is 2, approximately monthly
min_conc – minimum concentration for the interpolated source concentration, default is 0.01. This prevents weird interpolation behaviours in the early, poorly constrained part of the time series. This is not a hard limit, but rather a minimum value for the interpolated source concentration: the actual value will be set to the maximum of the interpolated value and min_conc.
inflection_times – None or list of datetimes of the inflection times. If None, the inflection times will be evenly spaced between the start and end of the ages + time series. Only one of cdf_inflection_start or inflection_times must be None. inflection_times must be None if n_inflections is a frequency code
cdf_inflection_start – None or float, the threshold for the cdf of ages at which to start the evenly spaced inflection points. Only one of cdf_inflection_start or inflection_times must be None.
sigma – None or float or array of floats, the uncertainty in the time series data. If None, no uncertainty is used and self.log_likelihood_no_sigma is used; if a float, the same uncertainty is used for all points; if an array, the uncertainty is used for each point. Whenever sigma is not None, self.log_likelihood_sigma is used.
interpolator –
str, the interpolator to use to interpolate the source concentration from the n_inflection points to the needed frequency for the 1d age function (BPEFM), options are:
‘linear’: Piecewise linear interpolation (default)
‘cubicspline’: Cubic spline data interpolator, see scipy.interpolate.CubicSpline
‘pchip’: PCHIP 1-D monotonic cubic interpolation, see scipy.interpolate.PchipInterpolator
‘akima’: Akima interpolator, see scipy.interpolate.Akima1DInterpolator
interpolator_kwargs – None or dict, additional kwargs passed to the interpolator (see interpolator)
serialize_receptor_data – bool, whether to serialize the receptor data for later use (increases disk use but decreases runtime)
predict_ncores – None or int, number of cores to use for prediction, if None then use all available (virtual) cores
nchains – number of chains to run in dreamzs. Note that each chain is a separate process, so if nchains exceeds the available processors the run will be MUCH slower
niterations – The number of algorithm iterations to run. The default is 50000. Higher values will take longer to run but may produce better results.
start_age – start age for the age distribution (yrs) default is np.nan which will use the maximum of the mrt_p1 and mrt_p2
verbose – bool, whether to print progress to the console
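The sigma parameter above selects between self.log_likelihood_no_sigma and self.log_likelihood_sigma. As an illustrative sketch only (assuming a standard Gaussian likelihood and a sum-of-squares proxy for the no-sigma case; the library's actual formulas may differ, and both methods can be overridden by subclassing):

```python
import numpy as np

def log_likelihood_sigma(obs, pred, sigma):
    # Gaussian log likelihood; sigma may be a scalar or per-point array
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    sigma = np.broadcast_to(np.asarray(sigma, dtype=float), obs.shape)
    return float(np.sum(-0.5 * np.log(2 * np.pi * sigma ** 2)
                        - (obs - pred) ** 2 / (2 * sigma ** 2)))

def log_likelihood_no_sigma(obs, pred):
    # without observation sigma, a common choice is negative SSE as a
    # log-likelihood proxy (assumed here, not confirmed by the source)
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    return float(-np.sum((obs - pred) ** 2))
```

Either way, a better fit between predicted and observed concentrations yields a higher log likelihood, which is what MT-DREAM(ZS) samples against.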
- convert_age_to_decimal_years(age)#
convert an age to a decimal year
- Parameters:
age – years since start date
- Returns:
decimal year (e.g. 1950.54)
- convert_age_to_dt(age)#
convert an age to a datetime
- Parameters:
age – age(s) to convert (years since start date)
- Returns:
datetime(s)
- convert_decimal_years_to_age(decimal_years)#
convert a decimal year to an age
- Parameters:
decimal_years – decimal years to convert e.g. 1950.54
- Returns:
age (float) years since start date
- convert_decimal_years_to_dt(decimal_years)#
convert a decimal year to a datetime
- Parameters:
decimal_years – decimal years to convert e.g. 1950.54
- Returns:
datetime
- convert_dt_to_age(dt)#
convert a datetime to an age (years since start date)
- Parameters:
dt – datetime(s) to convert
- Returns:
age(s) years from start date
- convert_dt_to_decimal_years(dt)#
convert a datetime to a decimal year
- Parameters:
dt – datetime to convert
- Returns:
decimal year (e.g. 1950.54)
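The four conversion helpers above all pivot on self.start_date. The standalone sketch below (assumed logic using a 365.25-day year; the class's exact calendar handling may differ) illustrates the conventions, taking a hypothetical start date of 1950-01-01:

```python
import pandas as pd

start_date = pd.Timestamp('1950-01-01')  # stands in for self.start_date

def age_to_dt(age):
    # age is decimal years since start_date
    return start_date + pd.Timedelta(days=age * 365.25)

def dt_to_age(dt):
    # inverse of age_to_dt: decimal years since start_date
    return (dt - start_date) / pd.Timedelta(days=365.25)

def age_to_decimal_year(age):
    # decimal year, e.g. 1950.54, computed as year + fraction of year elapsed
    dt = age_to_dt(age)
    year_start = pd.Timestamp(year=dt.year, month=1, day=1)
    year_len = pd.Timestamp(year=dt.year + 1, month=1, day=1) - year_start
    return dt.year + (dt - year_start) / year_len
```

For example, an age of 0.54 converts to a datetime in mid-1950 and a decimal year of roughly 1950.54.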
- export_source_receptor(model_name, outdir, nexport, receptor_spacing=1, percentiles=(1, 5, 10, 25), tvals=None, project_forward=None, forward_ts_deltas=None, forward_ts_deltas_type='absolute', percentile_format='hdf', include_raw=False)#
export the source and receptor data to csv files or hdf file store
The HDF file will contain the following keys; the csv files will have the same suffixes as the keys below:
source_percentiles: source data percentiles
receptor_percentiles: receptor data percentiles
measured: measured data
source_raw: optional if include_raw=True, source data
receptor_raw: optional if include_raw=True, receptor data
- Parameters:
outdir – output directory
nexport – number of results to export (int) or fraction of results to export (float, nexport<=1)
receptor_spacing – space between receptor prediction points
percentiles – tuple of percentiles to export fill between. e.g. a percentile of 5 will plot the 5th and 95th
tvals – None or list of times to sample at (years since data starts, self.start_date). Note that tvals are serialized by hash, so we suggest rounding them to a reasonable precision before passing them to this method.
project_forward – None or int, if an int then project the best n params forward by this many years at the end of the time series
forward_ts_deltas – None or pd.Series (index = decimal years since the end date (self.age_bounds[1]), values = deltas from time zero). If the index of forward_ts_deltas is shorter than project_forward, the remainder of the assumed time series is assumed to have the same delta as the last point. Note that forward_ts_deltas is serialized by hash, so we suggest rounding it to a reasonable precision before passing it to this method.
forward_ts_deltas_type – str, ‘absolute’ (default) or ‘fraction’ where ss_conc is the steady state concentration (conc. at self.age_bounds[1]) the forward source is calculated as:
- if ‘absolute’:
forward_source = ss_conc + forward_ts_deltas
- if ‘fraction’:
forward_source = ss_conc * forward_ts_deltas
- Parameters:
percentile_format – str, ‘hdf’ or ‘csv’ format to save the percentiles
- Returns:
- get_parameter_log(model_name)#
get the parameter log for the model run from the model name
- Parameters:
model_name – name for the model (str)
- Returns:
dict of parameter names and str values
- get_rmse(model_name, ntest, raw=False)#
get the rmse for the model run from the model name
- Parameters:
model_name – name for the model (str)
ntest – number of test simulations to use for the rmse
raw – if True return the raw rmse values, if False return the 50th percentile and 5th and 95th percentiles
- Returns:
rmse (float)
- get_ss_conc(model_name, nplot, site=None)#
get the steady state concentration (e.g. the last concentration in the time series) for the model run
- Parameters:
model_name – model name
nplot – number of results to plot (int) or fraction of results to plot (float, nplot<=1)
site – None (not used, for compatibility with MultiSiteDilutionBASESolver)
- Returns:
pd.Series of steady state concentrations
- load_logps_params(model_name, _concat=True)#
load the unsorted logps and sampled parameters from the model run
- Parameters:
model_name – name for the model (str)
- Returns:
log_p (np.array) shape=(nsimulations)
- Returns:
params (np.array) shape=(nsimulations, nparams)
- log_likelihood_no_sigma(params)#
calculate the log likelihood of the model (where the sigma of the observations IS NOT provided). The no-sigma log likelihood function can be changed by subclassing this class and overwriting this method.
- Parameters:
params – sampled parameters for the model instance (see BASESolver.run_base)
- Returns:
likelihood (float)
- log_likelihood_sigma(params)#
calculate the log likelihood of the model (where the sigma of the observations IS provided). The log likelihood function with observation sigma can be changed by subclassing this class and overwriting this method.
- Parameters:
params – sampled parameters for the model instance (see BASESolver.run_base)
- Returns:
likelihood (float)
- make_forward_ts_deltas(time_series=None, age_series=None, age_precision=2, values_precision=2, forward_ts_deltas_type='absolute')#
make a given time series / age series into deltas (differences from the start date). This is a convenience function for creating forward_ts_deltas parameters for the model
- Parameters:
time_series – pd.Series of time series data (datetime index)
age_series – pd.Series of ages (years since start date)
age_precision – int, precision to round the age series index
values_precision – int, precision to round the values of the age series
forward_ts_deltas_type – str, ‘absolute’ (default) or ‘fraction’ where ss_conc is the steady state concentration (conc. at self.age_bounds[1]) the forward source is calculated as:
- if ‘absolute’:
forward_source = ss_conc + forward_ts_deltas
- if ‘fraction’:
forward_source = ss_conc * forward_ts_deltas
- Returns:
forward_ts_deltas: pd.Series of deltas, index = years since start date (self.age_bounds[1])
- Returns:
forward_ts_deltas_type: str, ‘absolute’ or ‘fraction’
- Returns:
project_forward: int, the number of years to project forward
- Returns:
start_val: the value at self.age_bounds[1]
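As a worked sketch of the two forward_ts_deltas_type options described above (all values here are made up; ss_conc stands in for the steady state concentration at self.age_bounds[1]):

```python
import pandas as pd

ss_conc = 5.0  # assumed steady state concentration at self.age_bounds[1]

# deltas indexed by decimal years past the end of the time series
abs_deltas = pd.Series([0.0, -0.5, -1.0], index=[0.0, 5.0, 10.0])
frac_deltas = pd.Series([1.0, 0.9, 0.8], index=[0.0, 5.0, 10.0])

# forward_ts_deltas_type='absolute': forward_source = ss_conc + deltas
forward_abs = ss_conc + abs_deltas

# forward_ts_deltas_type='fraction': forward_source = ss_conc * deltas
forward_frac = ss_conc * frac_deltas
```

Both series here describe the same falling forward source (5.0, 4.5, 4.0); the choice between the two types is simply whether the deltas are expressed as offsets or as multipliers of the steady state concentration.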
- make_receptor_points(receptor_spacing=None, project_forward=None, tvals=None, site=None)#
make the receptor points for the model, the following options are available:
make_receptor_points(receptor_spacing=None, project_forward=None, tvals=None) -> self.t (sampling points)
make_receptor_points(receptor_spacing=1, project_forward=None, tvals=None) -> evenly spaced receptor points (about yearly) from 0 to self.t.max()
make_receptor_points(receptor_spacing=1, project_forward=10, tvals=None) -> evenly spaced receptor points (about yearly) from 0 to self.t.max() + 10
make_receptor_points(receptor_spacing=None, project_forward=None, tvals=*args) -> sampling at tvals (years since data starts)
all other combinations are invalid and will raise an exception
- Parameters:
receptor_spacing – the spacing to use (decimal years)
project_forward – None or int, if an int then project the final concentration N years forward.
tvals – None or list of times to sample at (years since data starts, self.start_date)
site – None and should always be None, used to ensure same signature as other methods in children classes
- Returns:
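The valid combinations above can be pictured as a standalone dispatch function (illustrative only; the real method works on instance data and falls back to self.t, the sampling points, when all options are None):

```python
import numpy as np

def make_receptor_points(t_max, receptor_spacing=None, project_forward=None, tvals=None):
    # t_max stands in for self.t.max() (an assumption for this sketch)
    if tvals is not None:
        if receptor_spacing is not None or project_forward is not None:
            raise ValueError('tvals cannot be combined with the other options')
        return np.asarray(tvals, dtype=float)
    if receptor_spacing is None:
        if project_forward is not None:
            raise ValueError('project_forward requires receptor_spacing')
        return None  # caller falls back to self.t (the sampling points)
    end = t_max + (project_forward or 0)
    return np.arange(0, end + receptor_spacing, receptor_spacing, dtype=float)
```

For instance, with t_max=3, receptor_spacing=1 gives points at 0..3; adding project_forward=2 extends them to 0..5; passing tvals returns the requested times unchanged.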
- make_source_receptor_data(model_name, nplot, receptor_spacing=1, tvals=None, project_forward=None, source_only=False, forward_ts_deltas=None, forward_ts_deltas_type='absolute', recalc=False, _return_ncomputed=False, return_interpolated_source=True)#
make the source and receptor data for the run model. if self.serialize_receptor_data is True then the receptor data will be saved to a npz file for later use, and any receptors already computed will be loaded from the file. If the file is not found then the receptor data will be recomputed. The process of predicting the receptor data is parallelized using multiprocessing, but still can take some time, especially for large numbers of simulations.
The save file is unique to the model_name, receptor_spacing, project_forward, and tvals. Note that changes to the class instance attributes are not checked and will not cause the data to be recomputed.
Note that passing tvals or forward_ts_deltas will serialize the receptor data with the hash of the tvals/forward_ts_deltas in the filename. Tvals/forward_ts_deltas will also be saved in the npz file and checked on reload. Note that minor changes in the tvals/forward_ts_deltas (e.g. float imprecision) will cause the hash to change and the data to be recomputed. Therefore we suggest rounding tvals/forward_ts_deltas to a reasonable precision before passing them to this method.
- Parameters:
nplot – number of results to plot (int) or fraction of results to plot (float, nplot<=1)
receptor_spacing – the number of years between receptor prediction points
tvals – None or list of times to sample at (years since data starts, self.start_date). Note that tvals are serialized by hash, so we suggest rounding them to a reasonable precision before passing them to this method.
project_forward – None or int, if an int then project the best n params forward by this many years at the end of the time series
source_only – bool, if True only return the source data (don’t compute the receptor data)
forward_ts_deltas – None or pd.Series (index = decimal years since the end date (self.age_bounds[1]), values = deltas from time zero). If the index of forward_ts_deltas is shorter than project_forward, the remainder of the assumed time series is assumed to have the same delta as the last point. Note that forward_ts_deltas is serialized by hash, so we suggest rounding it to a reasonable precision before passing it to this method.
forward_ts_deltas_type – str, ‘absolute’ (default) or ‘fraction’ where ss_conc is the steady state concentration (conc. at self.age_bounds[1]) the forward source is calculated as:
- if ‘absolute’:
forward_source = ss_conc + forward_ts_deltas
- if ‘fraction’:
forward_source = ss_conc * forward_ts_deltas
- Parameters:
source_only – bool, if True only return the source data (don’t compute the receptor data)
recalc – bool, if True then recalculate the receptor data even if it has been serialized
_return_ncomputed – bool, if True return the number of computed receptor data points, for testing only
return_interpolated_source – bool, if True return the interpolated source data, if False return the raw source data
- Returns:
source_data: source concentration predictions (ndarray, shape=(nsims, nsource_points)), interpolated to self.precision if return_interpolated_source=True
- Returns:
source_points: source concentration timings, years since start date (ndarray, shape=(nsource_points,), where nsource_points = self.n_inflections if project_forward is None else self.n_inflections + 1), interpolated to self.precision if return_interpolated_source=True
- Returns:
outreceptor_data: receptor concentration predictions (ndarray, shape=(nsims, nreceptor_points))
- Returns:
receptor_points: receptor timings, years since start date (ndarray, shape=(nreceptor_points,))
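To see why the rounding advice above matters, consider a hash-based cache key like the one described (the hashing scheme below is illustrative, not necessarily the library's): tiny float noise in tvals changes the bytes, so the key changes and the serialized receptor data is recomputed.

```python
import hashlib
import numpy as np

def tvals_key(tvals):
    # illustrative cache key: hash of the raw float bytes
    arr = np.asarray(tvals, dtype=float)
    return hashlib.sha256(arr.tobytes()).hexdigest()[:12]

exact = np.linspace(0, 10, 5)        # [0, 2.5, 5, 7.5, 10]
noisy = exact + 1e-12                # float noise, e.g. from recomputation
clean = np.round(noisy, 2)           # rounding restores a stable key
```

Here tvals_key(noisy) differs from tvals_key(exact) even though the values are visually identical, while rounding both to 2 decimal places yields matching keys (and thus a cache hit).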
- parameters_match_log(model_name, params, starts, start_random, hardboundaries, multitry, **kwargs)#
check the parameters against the log file, see run_base for more info
- Parameters:
params – as would have been passed to self.run_base
starts – as would have been passed to self.run_base
start_random – as would have been passed to self.run_base
hardboundaries – as would have been passed to self.run_base
multitry – as would have been passed to self.run_base
kwargs – as would have been passed to self.run_base
- Returns:
args_match(bool), (got_args_dict, expect_arg_dict)
- plot_best_receptor(model_name, nplot, receptor_spacing=1, percentiles=(1, 5, 10, 25), pdf_alpha=0.1, plot_inflection_points=True, plot_annual_vlines=True, tvals=None, project_forward=None, forward_ts_deltas=None, forward_ts_deltas_type='absolute', figsize=(10, 10), plot_measured=True, add_rmse=False, ax_recept=None, allsite_on_one=False, rt_leg_hand_lab=False)#
plot the best n predictions of the receptor concentrations
- Parameters:
model_name – name for the model (str)
nplot – number of results to plot (int) or fraction of results to plot (float, nplot<=1)
receptor_spacing – space between receptor points (decimal years)
percentiles – tuple of percentiles to plot fill between. e.g. a percentile of 5 will plot the 5th and 95th
pdf_alpha – alpha for the pdfs. Note that alphas are additive (e.g. if there are 4 percentiles then the alpha for each percentile will be [pdf_alpha, pdf_alpha * 2, pdf_alpha * 3, pdf_alpha * 4])
plot_inflection_points – bool, whether to plot the inflection points
plot_annual_vlines – bool, whether to plot annual vertical lines
tvals – None or list of times to sample at (years since data starts, self.start_date). Note that tvals are serialized by hash, so we suggest rounding them to a reasonable precision before passing them to this method.
project_forward – None or int, if an int then project the best n params forward by this many years at the end of the time series
forward_ts_deltas – None or pd.Series (index = decimal years since the end date (self.age_bounds[1]), values = deltas from time zero). If the index of forward_ts_deltas is shorter than project_forward, the remainder of the assumed time series is assumed to have the same delta as the last point. Note that forward_ts_deltas is serialized by hash, so we suggest rounding it to a reasonable precision before passing it to this method.
forward_ts_deltas_type – str, ‘absolute’ (default) or ‘fraction’ where ss_conc is the steady state concentration (conc. at self.age_bounds[1]) the forward source is calculated as:
- if ‘absolute’:
forward_source = ss_conc + forward_ts_deltas
- if ‘fraction’:
forward_source = ss_conc * forward_ts_deltas
- Parameters:
figsize – size for the figure
plot_measured – bool, whether to plot the measured data on the plot
add_rmse – bool, whether to add the rmse to the plot. Note this will require generating the receptor concentration at all self.t times, which may add significant time to the plot generation, particularly if the predictions are not serialized
ax_recept – None or axis to plot on
allsite_on_one – not used for BASESolver
rt_leg_hand_lab – if True also return the handles and labels for the legend
- Returns:
fig, ax_recept if rt_leg_hand_lab is False (default)
- Returns:
fig, ax_recept, (handles, labels), rmse if rt_leg_hand_lab is True
- plot_best_source(model_name, nplot, percentiles=(1, 5, 10, 25), plot_inflection_points=True, plot_annual_vlines=True, project_forward=None, forward_ts_deltas=None, forward_ts_deltas_type='absolute', pdf_alpha=0.1, figsize=(10, 10), plot_measured=True, ax_source=None, rt_leg_hand_lab=False)#
plot fill between plots of the best n params and the predicted receptor concentrations
- Parameters:
model_name – name for the model (str)
nplot – number of results to plot (int) or fraction of results to plot (float, nplot<=1)
percentiles – tuple of percentiles to plot fill between. e.g. a percentile of 5 will plot the 5th and 95th
plot_inflection_points – bool, whether to plot the inflection points
plot_annual_vlines – bool, whether to plot annual vertical lines
project_forward – None or int, if an int then project the best n params forward by this many years at constant concentration
forward_ts_deltas – None or pd.Series (index = decimal years since the end date (self.age_bounds[1]), values = deltas from time zero). If the index of forward_ts_deltas is shorter than project_forward, the remainder of the assumed time series is assumed to have the same delta as the last point. Note that forward_ts_deltas is serialized by hash, so we suggest rounding it to a reasonable precision before passing it to this method.
forward_ts_deltas_type – str, ‘absolute’ (default) or ‘fraction’ where ss_conc is the steady state concentration (conc. at self.age_bounds[1]) the forward source is calculated as:
- if ‘absolute’:
forward_source = ss_conc + forward_ts_deltas
- if ‘fraction’:
forward_source = ss_conc * forward_ts_deltas
- Parameters:
pdf_alpha – alpha for the pdfs. Note that alphas are additive (e.g. if there are 4 percentiles then the alpha for each percentile will be [pdf_alpha, pdf_alpha * 2, pdf_alpha * 3, pdf_alpha * 4])
figsize – size for the figure
plot_measured – bool, whether to plot the measured data on the plot
ax_source – None or axis to plot on
rt_leg_hand_lab – if True also return the handles and labels for the legend
- Returns:
fig, ax_source - if rt_leg_hand_lab is False (default)
- Returns:
fig, ax_source (handles, labels), rmse - if rt_leg_hand_lab is True
- plot_best_source_receptor(model_name, nplot, receptor_spacing=1, sharey=True, percentiles=(1, 5, 10, 25), title_additon='', pdf_alpha=0.1, plot_inflection_points=True, plot_annual_vlines=True, tvals=None, project_forward=None, forward_ts_deltas=None, forward_ts_deltas_type='absolute', figsize=(10, 10), plot_measured=True, add_rmse=False, ax_source=None, ax_recept=None, allsite_on_one=False, rt_leg_hand_lab=False)#
plot fill between plots of the best n params and the predicted receptor concentrations
- Parameters:
model_name – name for the model (str)
nplot – number of results to plot (int) or fraction of results to plot (float, nplot<=1)
receptor_spacing – space between receptor points (decimal years)
sharey – bool, whether to share the y axis between the source and receptor plots
percentiles – tuple of percentiles to plot fill between. e.g. a percentile of 5 will plot the 5th and 95th
title_additon – str, appended to the title; no character is inserted between the model name and the addition string
pdf_alpha – alpha for the pdfs. Note that alphas are additive (e.g. if there are 4 percentiles then the alpha for each percentile will be [pdf_alpha, pdf_alpha * 2, pdf_alpha * 3, pdf_alpha * 4])
plot_inflection_points – bool, whether to plot the inflection points
plot_annual_vlines – bool, whether to plot annual vertical lines
tvals – None or list of times to sample at (years since data starts, self.start_date). Note that tvals are serialized by hash, so we suggest rounding them to a reasonable precision before passing them to this method.
project_forward – None or int, if an int then project the best n params forward by this many years at the end of the time series
forward_ts_deltas – None or pd.Series (index = decimal years since the end date (self.age_bounds[1]), values = deltas from time zero). If the index of forward_ts_deltas is shorter than project_forward, the remainder of the assumed time series is assumed to have the same delta as the last point. Note that forward_ts_deltas is serialized by hash, so we suggest rounding it to a reasonable precision before passing it to this method.
forward_ts_deltas_type – str, ‘absolute’ (default) or ‘fraction’ where ss_conc is the steady state concentration (conc. at self.age_bounds[1]) the forward source is calculated as:
- if ‘absolute’:
forward_source = ss_conc + forward_ts_deltas
- if ‘fraction’:
forward_source = ss_conc * forward_ts_deltas
- Parameters:
figsize – size for the figure
plot_measured – bool, whether to plot the measured data on the prediction plot
add_rmse – bool, whether to add the rmse to the plot. Note this will require generating the receptor concentration at all self.t times, which may add significant time to the plot generation, particularly if the predictions are not serialized
ax_source – matplotlib axis to plot the source data on or None to create a new figure
ax_recept – matplotlib axis to plot the receptor data on or None to create a new figure
allsite_on_one – not used in BASESolver
rt_leg_hand_lab – if True also return the handles and labels for the legend
- Returns:
fig, (ax_source, ax_recept) - if rt_leg_hand_lab is False (default)
- Returns:
fig, (ax_source, ax_recept), (src_handles, src_labels), (rec_handles, rec_labels), rmse - if rt_leg_hand_lab is True
- plot_breakpoints(ax=None)#
plot the inflection points on the time series data
- Parameters:
ax – matplotlib axis to plot on or None to create a new figure
- Returns:
- plot_log_ps(model_name, ax=None)#
plot the log likelihoods for each chain
- Parameters:
model_name – name for the model (str)
ax – matplotlib axis to plot on or None to create a new figure
- Returns:
- plot_measured(ax, handles, labels, c=None, site=None, label='Measured data', alpha=1, **kwargs)#
plot the measured data on the plot
- Parameters:
ax – matplotlib axis to plot on
handles – handles for the legend (don’t use ax.legend() as it may cause weird behavior)
labels – labels for the legend (don’t use ax.legend() as it may cause weird behavior)
c – color if None use ‘r’
site – site to plot (not used in BASESolver)
label – label for the measured data
alpha – alpha for the measured data
kwargs – other kwargs for the scatter plot
- Returns:
- plot_single_pdf(model_name, baselab, ax, percentiles, data, xdata, project_forward, pdf_alpha, plot_inflection_points, plot_annual_vlines, plot_xlab, plot_ylab, plot_measured, add_rmse, site=None, plot_color='b', leg_color=None, data_color=None)#
Actually plot the pdfs for the source or receptor data. This is used internally both for the source and receptor data for the plot_best_source_receptor, plot_best_source, and plot_best_receptor methods. It can also be used externally to plot pdfs in the same fashion to support custom plotting. Note that it expects specific data formats, so read the docs!
- Parameters:
model_name – name for the model (str)
baselab – the fundamental label for the dataset (e.g. Source, Receptor… etc.)
ax – matplotlib axis to plot on
percentiles – tuple of percentiles to plot fill between. e.g. a percentile of 5 will plot the 5th and 95th
data – the data to plot (np.array) shape=(nsims, time). typically the data will be either source or receptor data from self.make_source_receptor_data
xdata – the xdata to plot against (np.array) shape=(time,). typically the xdata will be either source or receptor points from self.make_source_receptor_data
project_forward – None or int, if an int then project the best n params forward by this many years at the end of the time series
pdf_alpha – alpha for the pdfs. Note that alphas are additive (e.g. if there are 4 percentiles then the alpha for each percentile will be [pdf_alpha, pdf_alpha * 2, pdf_alpha * 3, pdf_alpha * 4])
plot_inflection_points – bool, whether to plot the inflection points
plot_annual_vlines – bool, whether to plot annual vertical lines
plot_xlab – bool, whether to set the xlabel
plot_ylab – bool, whether to set the ylabel
plot_measured – bool, whether to plot the measured data
add_rmse – bool, whether to calculate the rmse of the model, which will be returned
site – None or the site to plot. site must == None in BASESolver. If the site is None in the multi solvers and plot_measured then the measured data will be plotted for all sites
plot_color – color for the pdfs
leg_color – color for the pdfs on the legend (if None then plot_color is used)
data_color – color for the measured data if None then ‘r’ will be used (BASESolver) or the site color will be used (MultiSolver+)
- Returns:
out_ups: the upper percentiles (useful for setting ylims)
- Returns:
out_downs: the lower percentiles (useful for setting ylims)
- Returns:
handles: handles for the legend (don’t use ax.legend() as it may cause weird behavior)
- Returns:
labels: labels for the legend (don’t use ax.legend() as it may cause weird behavior)
- Returns:
rmse: float or None, the rmse of the model if add_rmse is True
- predict_bespoke(ts_concs, receptor_tvals, site=None, use_dilution=0)#
Predict a bespoke time series of source concentrations at a given set of receptor time values. This is useful for predicting the concentration at a receptor point that is not in the original time series. Note that this is an advanced feature and may be more difficult to use
- Parameters:
ts_concs – pd.Series of source concentrations (index = age in years since start date, self.start_date; values = concentration). Note that the full period of the source data must be passed in, not just the time series of interest.
receptor_tvals – list of times to sample at (years since data starts, self.start_date)
site – dummy to support same call in MultiSiteDilutionBASESolver etc.
use_dilution – dummy to support same call in MultiSiteDilutionBASESolver etc.
- Returns:
pd.Series of receptor concentrations (index = receptor_tvals, values = concentration)
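Conceptually, a receptor concentration under a convolution age model is the age-fraction-weighted sum of past source concentrations, which is what predict_bespoke evaluates at arbitrary receptor times. A minimal stdlib-only sketch of that idea (not the library implementation; the helper and data below are hypothetical):

```python
def receptor_conc(source, age_fractions, t):
    """Age-fraction-weighted sum of past source concentrations.

    source: dict mapping years-since-start -> concentration
    age_fractions: dict mapping age (years) -> fraction (fractions sum to 1)
    t: receptor time in years since start
    """
    total = 0.0
    for age, frac in age_fractions.items():
        # water of a given age left the source `age` years before time t;
        # times before the record starts contribute zero
        total += frac * source.get(t - age, 0.0)
    return total


# constant source of 1.0 over years 0..10
source = {t: 1.0 for t in range(11)}
# toy three-member age distribution (real BPEFM distributions are much finer)
age_fractions = {1: 0.5, 2: 0.3, 3: 0.2}

print(receptor_conc(source, age_fractions, 5))  # -> 1.0 (all ages see the source)
print(receptor_conc(source, age_fractions, 0))  # -> 0.0 (no source yet in transit)
```

The real method uses the BPEFM age distribution (self.age_fractions on self.ages) rather than a hand-built dict, but the weighted-sum structure is the same.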
- run_base(model_name, params, starts=None, verbose=None, nverbose=100, start_random=True, hardboundaries=True, multitry=False, rerun=False, test_run=False, _clear_test_data=True, _raise_exception_on_test=True, raise_if_run=False, **kwargs)#
run the BASE model. This method will run the model using pydream and save the results to the save_dir.
If the model has already been run (and rerun=False) then it will check the log files to ensure that the args/kwargs/instance.attrs match the expected args/kwargs/instance.attrs.
If the model has not been run then it will run the model and save the results to the save_dir.
If test_run is True then it will run the model for the first 10 iterations and then raise an InterruptedError after removing the test data (assuming _clear_test_data=True). This is useful for testing the setup of the model. You may also want to set _raise_exception_on_test=False to avoid the exception being raised.
- Parameters:
model_name – name for the model (str)
params – A list of pydream.parameters.SampledParam objects. Note that SampledParam uses scipy, so you can pass either a list of instances or an instance that will produce the correct number of parameters; e.g., [SampledParam(uniform, [1, 2], [2, 3])] should be equivalent to [SampledParam(uniform, 1, 2), SampledParam(uniform, 2, 3)]. You can also pass a single SampledParam instance, which will be used for all inflection points. Be careful with scipy’s uniform distribution, which uses loc and scale rather than min and max. You can also pass a komanawa.BASE.generators.uniform_path_generator.UniformPathGenerator or komanawa.BASE.generators.normal_path_generator.NormalPathGenerator instance as parameters.
starts – Either a list of start locations to initialize chains in, or a single start location to initialize all chains in. Default: None, note that if start_random is True then this will be ignored/overwritten. Starts can be provided as a 1d iterable (nparams,) or a 2d iterable (nchains, nparams)
verbose – None (use self.verbose) or bool Whether to print verbose output (including acceptance or rejection of moves and the current acceptance rate)
nverbose – The number of iterations between verbose output. Default = 100
start_random – Whether to initialize chains from a random point in parameter space drawn from the prior (default = True). Will override the starting position set when sample was called, if any.
hardboundaries – Whether to reflect points back into the bounds of a hard prior (i.e., if using a uniform prior, reflect points outside the boundaries back in, so you don’t waste time evaluating points with logpdf = -inf).
multitry – {bool, int} Whether to use multi-try sampling. Default is False. If set to True, 5 multiple tries will be used. An integer can also be specified directly if desired.
rerun – bool, whether to rerun the model if it already exists
test_run – bool, if True then setup the model and run for the first 10 iterations to test the setup, note that this will clear any saved data after running then raise InterruptedError
_clear_test_data – bool, if True then clear the data after running, only valid if test_run is True, used for testing only
_raise_exception_on_test – bool, if True then raise an InterruptedError after running the test, only valid if test_run is True, used for testing only
raise_if_run – bool, if True then raise an InterruptedError if the model would actually run BASE. This is used to check that runs exist and parameters are set correctly while avoiding large re-runs
kwargs – other kwargs passed to pydream.core.run_dream
- Returns:
run: bool, whether the model was run
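A common stumbling block with the SampledParam priors above is scipy’s uniform parameterization: scipy.stats.uniform takes loc (the minimum) and scale (the width), not min and max. A small hypothetical helper (not part of komanawa.BASE or pydream) makes the conversion explicit:

```python
def uniform_loc_scale(vmin, vmax):
    """Convert a (min, max) bound pair to scipy's (loc, scale) convention.

    scipy.stats.uniform(loc, scale) has support [loc, loc + scale],
    so a uniform prior on [vmin, vmax] needs loc=vmin, scale=vmax - vmin.
    """
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    return vmin, vmax - vmin


loc, scale = uniform_loc_scale(1.0, 3.0)
print(loc, scale)  # -> 1.0 2.0, i.e. support [1.0, 3.0]
```

With this convention, a prior on [1, 3] would be built as SampledParam(uniform, 1.0, 2.0), not SampledParam(uniform, 1.0, 3.0).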
- load_npz_source_receptor(npzfile, interpolate_source=True, convert_to_datetime=False)#
Load the source and receptor data from a npz file (self serialized by BASESolver.get_source_receptor_data)
- Parameters:
npzfile – path to file
interpolate_source – bool to interpolate the source data
convert_to_datetime – bool to convert the source and receptor points to datetime objects instead of years since start date
- Returns:
receptor_idvs: receptor identification variables (ndarray, shape=(nreceptor_points,); values 1:nsims, with 1 being the most probable simulation)
- Returns:
source_data: source concentration predictions (ndarray, shape=(nsims, nsource_points); interpolated to self.precision if interpolate_source=True)
- Returns:
source_points: source concentration timings, years since start date (ndarray, shape=(nsource_points,); interpolated to self.precision if interpolate_source=True)
- Returns:
outreceptor_data: receptor concentration predictions (ndarray, shape=(nsims, nreceptor_points))
- Returns:
receptor_points: receptor timings, years since start date (ndarray, shape=(nreceptor_points,))
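The returned arrays follow numpy’s standard npz layout, so a round trip with numpy.savez / numpy.load illustrates the shapes described above. The array names and sizes here are assumptions for illustration, not necessarily those written by BASESolver.get_source_receptor_data:

```python
import io

import numpy as np

# hypothetical sizes: 3 posterior simulations, 4 source times, 2 receptor times
nsims, nsource_points, nreceptor_points = 3, 4, 2

# serialize to an in-memory buffer instead of a file on disk
buf = io.BytesIO()
np.savez(
    buf,
    source_data=np.ones((nsims, nsource_points)),          # (nsims, nsource_points)
    source_points=np.arange(nsource_points, dtype=float),  # years since start date
    receptor_data=np.zeros((nsims, nreceptor_points)),     # (nsims, nreceptor_points)
    receptor_points=np.array([1.5, 2.5]),                  # years since start date
)
buf.seek(0)

with np.load(buf) as data:
    print(sorted(data.files))
    print(data["source_data"].shape)    # -> (3, 4)
    print(data["receptor_data"].shape)  # -> (3, 2)
```

load_npz_source_receptor wraps this kind of read and optionally interpolates the source series and converts the time axes to datetimes.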