komanawa.BASE.multisite_with_dilution#

created matt_dumont on: 6/7/24

Classes#

BaseSampledParam

Schema for a sampled parameter that captures the distribution and its arguments, in order to check that the distribution is valid and has not changed.

MultiSiteBASESolver

Class for predicting the source concentration from the BPEFM model across multiple sites. This class is a subclass of BASESolver.

MultiSiteDilutionBASESolver

A MultiSiteBASESolver that includes dilution parameters for each site derived from a truncated half normal distribution.

Functions#

plot_comp_test(test_results, ax[, x, plot_stats, ...])

Plot the results of the distribution test.

Module Contents#

class BaseSampledParam(distribution, *args, **kwargs)#

Schema for a sampled parameter that captures the distribution and its arguments, in order to check that the distribution is valid and has not changed.

Parameters:
  • distribution – The distribution to use (e.g. scipy.stats.uniform, scipy.stats.norm)

  • args – The arguments to pass to the distribution

  • kwargs – The keyword arguments to pass to the distribution

serialize()#

Serialize the object to a unique string to capture metadata

Returns:
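The role of serialize can be sketched with a stdlib-only illustration (a hypothetical helper, not the library's actual implementation): hashing a canonical representation of the distribution name and its arguments yields a string that changes whenever the prior changes.

```python
import hashlib
import json

def serialize_param(distribution_name, *args, **kwargs):
    # build a canonical string for the distribution and its arguments,
    # then hash it so any change to the prior is detectable on reload
    spec = json.dumps({'dist': distribution_name, 'args': args, 'kwargs': kwargs},
                      sort_keys=True)
    return hashlib.sha256(spec.encode()).hexdigest()

same_a = serialize_param('uniform', low=0.0, up=1.0)
same_b = serialize_param('uniform', low=0.0, up=1.0)
changed = serialize_param('uniform', low=0.0, up=2.0)
```

The same inputs always hash to the same string, while any change to the arguments produces a different one.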

class MultiSiteBASESolver(save_dir, n_inflections, ts_data, mrt, mrt_p1, mrt_p2, frac_p1, f_p1, f_p2, precision=2, min_conc=0.01, inflection_times=None, cdf_inflection_start=None, sigma=None, interpolator='linear', interpolator_kwargs=None, serialize_receptor_data=True, predict_ncores=None, nchains=5, niterations=50000, start_age=np.nan, cmap='tab20', verbose=True)#

Bases: komanawa.BASE.BASE_solver.BASESolver

Inheritance diagram of komanawa.BASE.multisite_with_dilution.MultiSiteBASESolver

Class for predicting the source concentration from the BPEFM model across multiple sites. This class is a subclass of BASESolver.

BASE: Bayesian approach to source estimation. This class uses the MT-DREAM(ZS) algorithm to conduct Bayesian inference on the source concentration for measured time series data and a BPEFM model.

The MT-DREAM(ZS) algorithm was introduced in: Laloy, E. & Vrugt, J. A. High-dimensional posterior exploration of hydrologic models using multiple-try DREAM(ZS) and high-performance computing. Water Resources Research 48, W01526 (2012).

Instance Variables:

Variables:
  • age_bounds – Age bounds for the model (years since start date)

  • age_fractions – Dictionary of age fractions for the BPEFM model (see komanawa.gw_age_tools.exponential_piston_flow.make_age_dist), keys = site

  • age_step – Age step for the BPEFM model (see komanawa.gw_age_tools.exponential_piston_flow.make_age_dist)

  • all_ages – Dictionary of ages for the BPEFM model (see komanawa.gw_age_tools.exponential_piston_flow.make_age_dist), keys = site

  • end_date – End date of the observed time series data (pd.Timestamp)

  • f_p1 – array of Fraction of the total mass in the first peak (age model parameter), in order of self.sites

  • f_p2 – array of Fraction of the total mass in the second peak (age model parameter), in order of self.sites

  • frac_p1 – array of Fraction of the total mass in the first peak (age model parameter), in order of self.sites

  • inflection_times – Inflection times for the model (years since start date). Source concentration will be parameterized at these times. Note that len(inflection_times) = n_inflections + 2, where n_inflections is passed in the init as the method automatically includes the start and end of the time series

  • measured_age_bounds – Age bounds for the measured time series data (years since start date)

  • mrt – array of Mean residence time for the age distribution (yrs) (age model parameter), in order of self.sites

  • mrt_p1 – array of Mean residence time for the first peak (yrs) (age model parameter), in order of self.sites

  • mrt_p2 – array of Mean residence time for the second peak (yrs) (age model parameter), in order of self.sites

  • nchains – Number of pydream chains for the model. Note that each chain is a separate process, so if nchains > the available processors the run will be MUCH slower

  • niterations – The number of algorithm iterations to run.

  • precision – precision of the age distribution (decimal places years) default is 2, approximately monthly

  • predict_ncores – Number of cores for prediction (see param)

  • save_dir – working directory for the model

  • serialize_receptor_data – Boolean flag to serialize receptor data

  • start_date – Start date of the time series data (pd.Timestamp)

  • sites – set of keys for the time series data, essentially a set of sites.

  • all_t – dictionary of time index for the time series data (decimal years since start date), keys = site

  • all_ts_data – dictionary of Time series data (pandas Series) Index=self.t, values=concentration, keys = site

Class Attributes:

Variables:
  • available_interpolators – tuple of available interpolators

  • _force_old_predict – a flag to force the use of the old predict method for testing purposes, must be False

  • ts_data – key instance attrs for BASESolver that must be None for MultiSiteBASESolver

  • sigma – key instance attrs for BASESolver that must be None for MultiSiteBASESolver

  • t – key instance attrs for BASESolver that must be None for MultiSiteBASESolver

  • ages – key instance attrs for BASESolver that must be None for MultiSiteBASESolver

  • age_fractions – key instance attrs for BASESolver that must be None for MultiSiteBASESolver

  • _int_ages – key instance attrs for BASESolver that must be None for MultiSiteBASESolver

Basic Usage:

from komanawa.BASE import MultiSiteBASESolver, generators, BaseSampledParam
from pathlib import Path
import matplotlib.pyplot as plt

# Set up the problem
first_model_name = 'first_model'
n_inflections = 5  # this will increase to 7 with the addition of a start and end point
ndata_site1 = get_ndata1()  # get your data, this should be a pandas series with a datetime index
ndata_site2 = get_ndata2()  # get your data, this should be a pandas series with a datetime index
output_dir = Path.home()
cdf_threshold = 0.15
mrt_site1 = 12
f_p1_site1 = 0.7
mrt_site2 = 20
f_p1_site2 = 0.5

# create the solver
solver = MultiSiteBASESolver(
            save_dir=output_dir,
            ts_data=dict(site1=ndata_site1, site2=ndata_site2),
            mrt=dict(site1=mrt_site1, site2=mrt_site2),
            mrt_p1=dict(site1=mrt_site1, site2=mrt_site2),
            mrt_p2=dict(site1=None, site2=None),
            frac_p1=dict(site1=1, site2=1),
            f_p1=dict(site1=f_p1_site1, site2=f_p1_site2),
            f_p2=dict(site1=f_p1_site1, site2=f_p1_site2),
            n_inflections=n_inflections,
            cdf_inflection_start=cdf_threshold,
        )

# set priors
lows = [0.001, 0.001, 0.1, 0.1, 0.2, 0.5, 1.0]
ups = [0.5, 2, 6, 6, 6, 6, 6]
params = BaseSampledParam(distribution=generators.UniformPathGenerator, low=lows, up=ups, n=7)

# run the model
run = solver.run_base(model_name=first_model_name,
        params=params, verbose=False)

# plot the results
nplot = 0.2 # plot 20% of the results
fig, ax = solver.plot_best_source_receptor(first_model_name, nplot=nplot, receptor_spacing=1)
plt.show()
Parameters:
  • save_dir – directory to run dreamzs in; files are named via the model_name kwarg

  • n_inflections – number of inflection times to use (int) or frequency of inflection times (see https://pandas.pydata.org/docs/user_guide/timeseries.html#timeseries-offset-aliases)

  • ts_data – dictionary of time series data to fit to, sets the keys for the model, keys must be strings, values must be pandas Series with datetime index, monotonically increasing index, and no NaN values

  • mrt – dictionary of mean residence time of the source (yrs), keys must match ts_data

  • mrt_p1 – dictionary of mean residence time of the first piston (yrs), keys must match ts_data

  • mrt_p2 – dictionary of mean residence time of the second piston (yrs), keys must match ts_data

  • frac_p1 – dictionary of fraction of the source that is in the first piston, keys must match ts_data

  • f_p1 – dictionary of fraction of exponential flow in the first piston, keys must match ts_data

  • f_p2 – dictionary of fraction of exponential flow in the second piston (use a dummy value if frac_p1 = 1), keys must match ts_data

  • precision – precision of the age distribution (decimal places years) default is 2, approximately monthly

  • min_conc – minimum concentration for the interpolated source concentration, default is 0.01. This prevents weird interpolation behaviours in the early, poorly constrained part of the time series. It is not a hard limit, but rather a floor for the interpolated source concentration: the actual value is set to the maximum of the interpolated value and min_conc.

  • inflection_times – None or list of datetimes of the inflection times. If None, the inflection times are evenly spaced between the start and end of the ages + time series. Only one of cdf_inflection_start or inflection_times must be None. inflection_times must be None if n_inflections is a frequency code.

  • cdf_inflection_start – None or float, the threshold for the cdf of ages at which to start the evenly spaced inflection points. Only one of cdf_inflection_start or inflection_times must be None.

  • sigma – None, or a dictionary of floats or arrays of floats, the uncertainty in the time series data. If None, no uncertainty is used and self.log_likelihood_no_sigma is used. If a float, the same uncertainty is used for all points; if an array, the uncertainty is used per point. If sigma is not None, self.log_likelihood_sigma is used. If a dictionary, keys must match the keys of ts_data.

  • interpolator

    str, the interpolator used to interpolate the source concentration from the n_inflection points to the frequency needed for the 1d age function (BPEFM), options are:

    • 'linear': piecewise linear interpolation (default)

    • 'cubicspline': cubic spline interpolator, see scipy.interpolate.CubicSpline

    • 'pchip': PCHIP 1-D monotonic cubic interpolation, see scipy.interpolate.PchipInterpolator

    • 'akima': Akima interpolator, see scipy.interpolate.Akima1DInterpolator

  • interpolator_kwargs – None or dict, additional kwargs passed to the interpolator (see interpolator)

  • serialize_receptor_data – bool, whether to serialize the receptor data for later use (increases disk use but decreases runtime)

  • predict_ncores – None or int, number of cores to use for prediction, if None then use all available (virtual) cores

  • nchains – number of chains to run in dreamzs. Note that each chain is a separate process, so if nchains > the available processors the run will be MUCH slower

  • niterations – The number of algorithm iterations to run. The default is 50000. Higher values will take longer to run but may produce better results.

  • start_age – start age for the age distribution (yrs) default is np.nan which will use the maximum of the mrt_p1 and mrt_p2

  • cmap – str, the colormap to use for the plot, see matplotlib.pyplot.get_cmap

  • verbose – bool, whether to print the progress of the model
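How the interpolator and min_conc interact can be illustrated with a numpy-only sketch (the inflection times and concentrations below are hypothetical; the solver performs this step internally with its configured interpolator):

```python
import numpy as np

# hypothetical inflection times (years since start) and sampled source concentrations
inflection_times = np.array([0.0, 5.0, 10.0, 20.0])
inflection_concs = np.array([0.005, 0.5, 2.0, 1.5])

# linearly interpolate to the age-model step (precision=2 -> steps of 0.01 yr)
t = np.round(np.arange(0.0, 20.01, 0.01), 2)
source = np.interp(t, inflection_times, inflection_concs)

# apply the min_conc floor: not a hard limit on the fit, just a floor on the
# interpolated source concentration
min_conc = 0.01
source = np.maximum(source, min_conc)
```

The early, poorly constrained value of 0.005 is lifted to the min_conc floor, while all better-constrained values pass through unchanged.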

export_source_receptor(model_name, outdir, nexport, receptor_spacing=1, percentiles=(1, 5, 10, 25), tvals=None, project_forward=None, forward_ts_deltas=None, forward_ts_deltas_type='absolute', percentile_format='hdf', include_raw=False)#

export the source and receptor data to csv files or hdf file store

The HDF file will contain the following keys; the CSV files will have the same suffixes as the keys below:

  • source_percentiles: source data percentiles

  • {site}_receptor_percentiles: receptor data percentiles

  • {site}_measured: measured data

  • source_raw: optional if include_raw=True, source data

  • {site}_receptor_raw: optional if include_raw=True, receptor data

Parameters:
  • model_name – name for the model (str)

  • outdir – output directory

  • nexport – number of results to export (int) or fraction of results to export (float, nexport<=1)

  • receptor_spacing – space between receptor prediction points

  • percentiles – tuple of percentiles to export as fill-between bounds, e.g. a percentile of 5 will export the 5th and 95th

  • tvals – None or list of times to sample at (years since the data starts, self.start_date). Note that tvals are serialized by hash, so we suggest rounding them to a reasonable precision before passing them to this method.

  • project_forward – None or int, if an int then project the best n params forward by this many years at the end of the time series

  • forward_ts_deltas – None or pd.Series (index = decimal years since the end date (self.age_bounds[1]), values = deltas from time zero). If the index of forward_ts_deltas ends before project_forward, the remainder of the time series is assumed to have the same delta as the last point. Note that forward_ts_deltas is serialized by hash, so we suggest rounding it to a reasonable precision before passing it to this method.

  • forward_ts_deltas_type – str, 'absolute' (default) or 'fraction'. Where ss_conc is the steady state concentration (conc. at self.age_bounds[1]), the forward source is calculated as:

  • if ‘absolute’:
    • forward_source = ss_conc + forward_ts_deltas

  • if ‘fraction’:
    • forward_source = ss_conc * forward_ts_deltas
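The two forward_ts_deltas_type options reduce to simple arithmetic; a minimal sketch with hypothetical values:

```python
# hypothetical steady state concentration and forward deltas
ss_conc = 4.0
forward_ts_deltas = {1.0: 0.5, 2.0: 1.0}  # decimal years past the end of the data

# 'absolute': add the delta to the steady state concentration
absolute = {t: ss_conc + d for t, d in forward_ts_deltas.items()}

# 'fraction': scale the steady state concentration by the delta
fraction = {t: ss_conc * d for t, d in forward_ts_deltas.items()}
```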

Parameters:
  • percentile_format – str, ‘hdf’ or ‘csv’ format to save the percentiles

  • include_raw – bool, if True include the raw data in the output

Returns:

get_colors(rt_dict=False, repeat_threshold=None)#

get colors for a list of values

Parameters:
  • rt_dict – bool if true return a dictionary of colors else return a list of colors

  • repeat_threshold

Returns:

list of colors or dictionary

get_rmse(model_name, ntest, raw=False)#

get the rmse for the model run from the model name

Parameters:
  • model_name – name for the model (str)

  • ntest – number of test simulations to use for the rmse

  • raw – if True return the raw rmse values, if False return the 50th percentile and 5th and 95th percentiles

Returns:

rmse (float)
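For reference, the RMSE returned here is the standard root mean square error between measured and predicted concentrations (illustrative values):

```python
import numpy as np

measured = np.array([1.0, 2.0, 3.0])
predicted = np.array([1.5, 2.0, 2.5])

# root mean square error between the two series
rmse = float(np.sqrt(np.mean((measured - predicted) ** 2)))
```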

log_likelihood_no_sigma(params)#

calculate the log likelihood of the model (where the sigma of the observations IS NOT provided). The log likelihood function without observation sigma can be changed by subclassing this class and overriding this method.

Parameters:

params – sampled parameters for the model instance (see BASESolver.run_base)

Returns:

likelihood (float)

log_likelihood_sigma(params)#

calculate the log likelihood of the model (where the sigma of the observations IS provided). The log likelihood function with observation sigma can be changed by subclassing this class and overriding this method.

Parameters:

params – sampled parameters for the model instance (see BASESolver.run_base)

Returns:

likelihood (float)

make_source_receptor_data(model_name, nplot, receptor_spacing=1, tvals=None, project_forward=None, forward_ts_deltas=None, forward_ts_deltas_type='absolute', source_only=False, recalc=False, _return_ncomputed=False, return_interpolated_source=True)#

make the source and receptor data for the run model. if self.serialize_receptor_data is True then the receptor data will be saved to a npz file for later use, and any receptors already computed will be loaded from the file. If the file is not found then the receptor data will be recomputed. The process of predicting the receptor data is parallelized using multiprocessing, but still can take some time, especially for large numbers of simulations.

The save file is unique to the model_name, receptor_spacing, project_forward, and tvals. Note that changes to the class instance attributes are not checked and will not cause the data to be recomputed.

Note that passing tvals or forward_ts_deltas will serialize the receptor data with the hash of the tvals/forward_ts_deltas in the filename. tvals/forward_ts_deltas are also saved in the npz file and checked on reload. Note that minor changes in the tvals/forward_ts_deltas (e.g. float imprecision) will change the hash and cause the data to be recomputed; therefore we suggest rounding tvals/forward_ts_deltas to a reasonable precision before passing them to this method.
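Why rounding matters for this hash-based caching can be shown directly (the hashing below is illustrative, not the library's internal scheme): float imprecision produces different bytes, hence a different hash, unless the values are rounded first.

```python
import hashlib
import numpy as np

def tvals_hash(tvals):
    # hash the byte representation of the sample times (illustrative only)
    return hashlib.sha256(np.asarray(tvals, dtype=float).tobytes()).hexdigest()

a = np.arange(0.0, 1.0, 0.1)  # carries tiny float error (e.g. 0.30000000000000004)
b = np.array([0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])

different = tvals_hash(a) != tvals_hash(b)                       # imprecision changes the hash
matching = tvals_hash(np.round(a, 6)) == tvals_hash(np.round(b, 6))  # rounding restores it
```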

Parameters:
  • model_name – name for the model (str)

  • nplot – number of results to plot (int) or fraction of results to plot (float, nplot<=1)

  • receptor_spacing – the number of years between receptor prediction points

  • tvals – None or list of times to sample at (years since the data starts, self.start_date). Note that tvals are serialized by hash, so we suggest rounding them to a reasonable precision before passing them to this method.

  • project_forward – None or int, if an int then project the best n params forward by this many years at the end of the time series

  • source_only – bool, if True only return the source data (don’t compute the receptor data)

  • forward_ts_deltas – None or pd.Series (index = decimal years since the end date (self.age_bounds[1]), values = deltas from time zero). If the index of forward_ts_deltas ends before project_forward, the remainder of the time series is assumed to have the same delta as the last point. Note that forward_ts_deltas is serialized by hash, so we suggest rounding it to a reasonable precision before passing it to this method.

  • forward_ts_deltas_type – str, 'absolute' (default) or 'fraction'. Where ss_conc is the steady state concentration (conc. at self.age_bounds[1]), the forward source is calculated as:

  • if ‘absolute’:
    • forward_source = ss_conc + forward_ts_deltas

  • if ‘fraction’:
    • forward_source = ss_conc * forward_ts_deltas

Parameters:
  • recalc – bool, if True then recalculate the receptor data even if it has been serialized

  • return_interpolated_source – bool, if True return the source data interpolated to self.precision

  • _return_ncomputed – bool, if True return the number of computed receptor data points, for testing only

Returns:

source_data: source concentration predictions (ndarray, shape=(nsims, nsource_points)), interpolated to self.precision if return_interpolated_source=True

Returns:

source_points: source concentration timings in years since the start date (ndarray, shape=(self.n_inflections,) if project_forward is None else (self.n_inflections + 1,)), interpolated to self.precision if return_interpolated_source=True

Returns:

outreceptor_data: dict {site: data}: receptor concentration predictions (ndarray, shape=(nsims, nreceptor_points))

Returns:

receptor_points: dict {site: data}: receptor timings in years since the start date (ndarray, shape=(nreceptor_points,))

plot_best_receptor(model_name, nplot, receptor_spacing=1, percentiles=(1, 5, 10, 25), pdf_alpha=0.1, plot_inflection_points=True, plot_annual_vlines=True, tvals=None, project_forward=None, forward_ts_deltas=None, forward_ts_deltas_type='absolute', figsize=(10, 10), plot_measured=True, add_rmse=False, ax_recept=None, allsite_on_one=False, rt_leg_hand_lab=False)#

plot the best n predictions of the receptor concentrations for each site (one per plot)

Parameters:
  • model_name – name for the model (str)

  • nplot – number of results to plot (int) or fraction of results to plot (float, nplot<=1)

  • receptor_spacing – space between receptor points (decimal years)

  • percentiles – tuple of percentiles to plot fill between. e.g. a percentile of 5 will plot the 5th and 95th

  • pdf_alpha – alpha for the pdfs. Note that alphas are additive (e.g. if there are 4 percentiles then the alpha for each percentile will be [pdf_alpha, pdf_alpha * 2, pdf_alpha * 3, pdf_alpha * 4])

  • plot_inflection_points – bool, whether to plot the inflection points

  • plot_annual_vlines – bool, whether to plot annual vertical lines

  • tvals – None or list of times to sample at (years since the data starts, self.start_date). Note that tvals are serialized by hash, so we suggest rounding them to a reasonable precision before passing them to this method.

  • project_forward – None or int, if an int then project the best n params forward by this many years at the end of the time series

  • forward_ts_deltas – None or pd.Series (index = decimal years since the end date (self.age_bounds[1]), values = deltas from time zero). If the index of forward_ts_deltas ends before project_forward, the remainder of the time series is assumed to have the same delta as the last point. Note that forward_ts_deltas is serialized by hash, so we suggest rounding it to a reasonable precision before passing it to this method.

  • forward_ts_deltas_type – str, 'absolute' (default) or 'fraction'. Where ss_conc is the steady state concentration (conc. at self.age_bounds[1]), the forward source is calculated as:

  • if ‘absolute’:
    • forward_source = ss_conc + forward_ts_deltas

  • if ‘fraction’:
    • forward_source = ss_conc * forward_ts_deltas

Parameters:
  • figsize – size for the figure

  • plot_measured – bool, whether to plot the measured data on the plot

  • add_rmse – bool, whether to add the rmse to the plot. Note this requires generating the receptor concentration at all self.t times, which may add significant time to the plot generation, particularly if the predictions are not serialized

  • ax_recept – must be None; included only to match the BASESolver call signature.

  • allsite_on_one – bool, if True plot all sites on one plot, false make a plot for each site

  • rt_leg_hand_lab – bool, if True also return the handles and labels for the legend

Returns:

dictionary {site: (fig, ax_recept)} - if rt_leg_hand_lab is False (default)

Returns:

dictionary {site: (fig, ax_recept)}, {site: (rec_hand, rec_lab)}, rmse - if rt_leg_hand_lab is True

plot_best_source_receptor(model_name, nplot, receptor_spacing=1, sharey=True, percentiles=(1, 5, 10, 25), title_additon='', pdf_alpha=0.1, plot_inflection_points=True, plot_annual_vlines=True, tvals=None, project_forward=None, forward_ts_deltas=None, forward_ts_deltas_type='absolute', figsize=(10, 10), plot_measured=True, add_rmse=False, ax_source=None, ax_recept=None, allsite_on_one=False, rt_leg_hand_lab=False)#

plot fill between plots of the best n params and the predicted receptor concentrations for each site, one per plot

Parameters:
  • model_name – name for the model (str)

  • nplot – number of results to plot (int) or fraction of results to plot (float, nplot<=1)

  • receptor_spacing – space between receptor points (decimal years)

  • sharey – bool, whether to share the y axis between the source and receptor plots

  • percentiles – tuple of percentiles to plot fill between. e.g. a percentile of 5 will plot the 5th and 95th

  • title_additon – str, appended to the title; no character is inserted between the model name and the addition string

  • pdf_alpha – alpha for the pdfs. Note that alphas are additive (e.g. if there are 4 percentiles then the alpha for each percentile will be [pdf_alpha, pdf_alpha * 2, pdf_alpha * 3, pdf_alpha * 4])

  • plot_inflection_points – bool, whether to plot the inflection points

  • plot_annual_vlines – bool, whether to plot annual vertical lines

  • tvals – None or list of times to sample at (years since the data starts, self.start_date). Note that tvals are serialized by hash, so we suggest rounding them to a reasonable precision before passing them to this method.

  • project_forward – None or int, if an int then project the best n params forward by this many years at the end of the time series

  • forward_ts_deltas – None or pd.Series (index = decimal years since the end date (self.age_bounds[1]), values = deltas from time zero). If the index of forward_ts_deltas ends before project_forward, the remainder of the time series is assumed to have the same delta as the last point. Note that forward_ts_deltas is serialized by hash, so we suggest rounding it to a reasonable precision before passing it to this method.

  • forward_ts_deltas_type – str, 'absolute' (default) or 'fraction'. Where ss_conc is the steady state concentration (conc. at self.age_bounds[1]), the forward source is calculated as:

  • if ‘absolute’:
    • forward_source = ss_conc + forward_ts_deltas

  • if ‘fraction’:
    • forward_source = ss_conc * forward_ts_deltas

Parameters:
  • figsize – size for the figure

  • plot_measured – bool, whether to plot the measured data on the prediction plot

  • add_rmse – bool, whether to add the rmse to the plot. Note this requires generating the receptor concentration at all self.t times, which may add significant time to the plot generation, particularly if the predictions are not serialized

  • ax_source – must be None

  • ax_recept – must be None

  • allsite_on_one – bool, if True plot all sites on one plot, false make a plot for each site

  • rt_leg_hand_lab – bool, if True also return the handles and labels for the legend

Returns:

dictionary {site: (fig, (ax_source, ax_recept))} - if rt_leg_hand_lab is False (default)

Returns:

dictionary {site: (fig, (ax_source, ax_recept))}, {site: ((src_hand, src_lab), (rec_hand, rec_lab))}, rmse - if rt_leg_hand_lab is True

plot_measured(ax, handles, labels, c=None, site=None, label='Measured data', alpha=1, **kwargs)#

plot the measured data

Parameters:
  • ax – matplotlib axis to plot on

  • handles – handles for the legend (don’t use ax.legend() as it may cause weird behavior)

  • labels – labels for the legend (don’t use ax.legend() as it may cause weird behavior)

  • c – color; if None, use the site colors

  • site – site to plot, if None then plot all sites

  • label – label for the measured data

  • alpha – alpha for the measured data

  • kwargs – other kwargs for the scatter plot

Returns:

predict_bespoke(ts_concs, receptor_tvals, site=None, use_dilution=0)#

Predict a bespoke time series of source concentrations at a given set of receptor time values. This is useful for predicting the concentration at a receptor point that is not in the original time series. Note that this is an advanced feature and may be more difficult to use.

Parameters:
  • ts_concs – pd.Series of source concentrations (index = age in years since the start date, self.start_date; values = concentration). Note that the full period of the source data must be passed in, not just the time series of interest.

  • receptor_tvals – list of times to sample at (years since data starts, self.start_date)

  • site – site key (used by MultiSiteBASESolver and its subclasses)

  • use_dilution – Dilution fraction (to support MultiSiteDilutionBASESolver)

Returns:

pd.Series of receptor concentrations (index = receptor_tvals, values = concentration)
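Conceptually, the receptor concentration at a given time is the age-weighted average of past source concentrations under the site's age distribution; a minimal sketch with a hypothetical three-bin age distribution (the real prediction uses the full BPEFM age distribution):

```python
import numpy as np

# hypothetical discrete age distribution (fractions must sum to 1)
age_fractions = np.array([0.2, 0.5, 0.3])  # water aged 0, 1 and 2 years

# source concentrations that recharged 0, 1 and 2 years before the receptor time
source_history = np.array([3.0, 2.0, 1.0])

# receptor concentration is the age-weighted mean of the source history
receptor_conc = float(np.dot(age_fractions, source_history))
```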

set_log_likelihood_scaler(scales)#

set the scalers for the log likelihood of the model. The scaler for each site is multiplied by that site's log likelihood to yield the final log likelihood (i.e. used_log_likelihood[site] = log_likelihood[site] * scales[site]). If the scalers are not set by the user they default to 1 for each site. The scalers must be greater than or equal to 0.

Parameters:

scales – dictionary {site: scaler} (scaler = number), or None to reset to 1 for all sites

Returns:
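The scaling applied by this method amounts to a weighted sum of per-site log likelihoods; a minimal sketch with hypothetical values:

```python
# hypothetical per-site log likelihoods from the model
log_like = {'site1': -12.0, 'site2': -30.0}

# user-set scalers (default to 1 for every site when not set)
scales = {'site1': 1.0, 'site2': 0.5}  # down-weight site2

# total log likelihood used by the sampler
total = sum(log_like[site] * scales[site] for site in log_like)
```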

class MultiSiteDilutionBASESolver#

Bases: komanawa.BASE.multisite_base_solver.MultiSiteBASESolver

Inheritance diagram of komanawa.BASE.multisite_with_dilution.MultiSiteDilutionBASESolver

A MultiSiteBASESolver that includes dilution parameters for each site derived from a truncated half normal distribution.

The dilution parameter definitions are prescribed by the self.set_dilution_parameters method. Otherwise the model is the same as the MultiSiteBASESolver.

The dilution parameters are used to scale the source concentration at each site. Dilution is calculated as:

out_conc = receptor_concentration * (1 - dilution_parameter)

Basic Usage:

from komanawa.BASE import MultiSiteDilutionBASESolver, generators, BaseSampledParam
from pathlib import Path
import matplotlib.pyplot as plt

# Set up the problem
first_model_name = 'first_model'
n_inflections = 5  # this will increase to 7 with the addition of a start and end point
ndata_site1 = get_ndata1()  # get your data, this should be a pandas series with a datetime index
ndata_site2 = get_ndata2()  # get your data, this should be a pandas series with a datetime index
output_dir = Path.home()
cdf_threshold = 0.15
mrt_site1 = 12
f_p1_site1 = 0.7
mrt_site2 = 20
f_p1_site2 = 0.5

# create the solver
solver = MultiSiteDilutionBASESolver(
            save_dir=output_dir,
            ts_data=dict(site1=ndata_site1, site2=ndata_site2),
            mrt=dict(site1=mrt_site1, site2=mrt_site2),
            mrt_p1=dict(site1=mrt_site1, site2=mrt_site2),
            mrt_p2=dict(site1=None, site2=None),
            frac_p1=dict(site1=1, site2=1),
            f_p1=dict(site1=f_p1_site1, site2=f_p1_site2),
            f_p2=dict(site1=f_p1_site1, site2=f_p1_site2),
            n_inflections=n_inflections,
            cdf_inflection_start=cdf_threshold,
        )

# set the dilution priors
dilution_high = 0.75  # the maximum dilution
dilution_scale = 0.1  # the scale of the truncated half normal distribution
solver.set_dilution_parameters(dilution_high=dilution_high, dilution_scale=dilution_scale)

# set concentration priors
lows = [0.001, 0.001, 0.1, 0.1, 0.2, 0.5, 1.0]
ups = [0.5, 2, 6, 6, 6, 6, 6]
params = BaseSampledParam(distribution=generators.UniformPathGenerator, low=lows, up=ups, n=7)

# run the model
run = solver.run_base(model_name=first_model_name,
        params=params, verbose=False)

# plot the results
nplot = 0.2 # plot 20% of the results
fig, ax = solver.plot_best_source_receptor(first_model_name, nplot=nplot, receptor_spacing=1)
plt.show()
get_dilution_parameters(model_name, nplot, site=None)#

get the dilution parameters for each site

Parameters:
  • model_name – model name

  • nplot – number of results to plot (int) or fraction of results to plot (float, nplot<=1)

  • site – None or site to return

Returns:

pd.DataFrame or pd.Series of dilution parameters

get_ss_conc(model_name, nplot, site=None)#

get the steady state concentration for each site (e.g. last source concentration * (1 - dilution_parameter))

Parameters:
  • model_name – model name

  • nplot – number of results to plot (int) or fraction of results to plot (float, nplot<=1)

  • site – None or site to return

Returns:

pd.DataFrame or pd.Series of steady state concentrations

plot_dilution_parameters(model_name, nplot, ax=None)#

plot the dilution parameters

Parameters:
  • model_name – str, the model name

  • nplot – number of results to plot (int) or fraction of results to plot (float, nplot<=1)

  • ax – None or matplotlib axis, if None create a new figure

Returns:

plot_test_dilution_parameters(model_name, nplot, ktest_kwargs=None, plot_kwargs=None, ax=None)#

plot the results of the KS test on the dilution parameters

Parameters:
  • model_name – str, the model name

  • nplot – number of results to plot (int) or fraction of results to plot (float, nplot<=1)

  • ktest_kwargs – None or dict, kwargs to pass to the ktest method (scipy.stats.kstest)

  • plot_kwargs – None or dict, kwargs to pass to the plot method (komanawa.BASE.utils.plot_ktest)

  • ax – None or matplotlib axis, if None create a new figure

Returns:

fig, ax, (handles, labels)

set_dilution_parameters(dilution_high, dilution_scale)#

Set the dilution parameters for each site. The dilution parameters are assumed to be derived from a truncated half normal distribution. The parameters are used to scale the source concentration at each site. Dilution is calculated as:

out_conc = receptor_concentration * (1 - dilution_parameter)

A dilution_parameter of 0 means no dilution; 0.75 means 75% of the source concentration is lost to dilution.
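The prior and the dilution arithmetic can be sketched with numpy alone (rejection sampling stands in for the library's truncated half normal; the dilution_high and dilution_scale values are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
dilution_high, dilution_scale = 0.75, 0.1  # hypothetical prior settings

# rejection-sample a half normal truncated at dilution_high
samples = np.abs(rng.normal(0.0, dilution_scale, size=10000))
samples = samples[samples <= dilution_high]

# dilution scales the receptor concentration: a parameter of 0 means no dilution
receptor_conc = 5.0
out_conc = receptor_conc * (1.0 - samples)
```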

Parameters:
  • dilution_high – float or dict of floats (one per site), the upper bound of the truncated half normal distribution (the maximum dilution)

  • dilution_scale – float or dict of floats (one per site) the scale of the truncated half normal distribution

Returns:

test_dilution_parameters(model_name, nplot, test=kstest, **kwargs)#

perform a test on the dilution parameters for the best n results

The test can be any test that takes two samples e.g. scipy.stats.kstest, Mann-Whitney U test, wasserstein_distance_nd, etc.

Parameters:
  • model_name – str, the model name

  • nplot – int or float, the number of results to use; if None use all results

  • test – function, the test to run on the dilution parameters

  • kwargs – passed to the test function

Returns:

{site:KstestResult, …} for each site

plot_comp_test(test_results, ax, x=None, plot_stats=('statistic', 'pvalue'), plot_on_twinx=True, colors=None, **kwargs)#

Plot the results of the distribution test.

This can be used with any number of scipy tests like kstest, Mann-Whitney U test, wasserstein_distance_nd, etc.

Parameters:
  • test_results – [KstestResult, …] for each time step in the path

  • ax – axis to plot on or None (if None a new figure is created)

  • plot_stats – tuple subset of result values to plot or None (assumes results are float)

  • plot_on_twinx – if True plot the first of plot_stats on the ax and then each subsequent statistic on individual twinx axes

  • colors – dictionary of colors for each statistic or None (if None use default colors)

  • kwargs – other keyword arguments to pass to the ax.plot

Returns:

fig, ax, (handles, labels)