LAtools Documentation¶
latools.analyse object¶
-
class latools.latools.analyse(data_folder, errorhunt=False, config='DEFAULT', dataformat=None, extension='.csv', srm_identifier='STD', cmap=None, time_format=None, internal_standard='Ca43', names='file_names', srm_file=None, pbar=None)[source]¶
Bases: object
For processing and analysing whole LA-ICPMS datasets.
Parameters: - data_folder (str) – The path to a directory containing multiple data files.
- errorhunt (bool) – If True, latools prints the name of each file before it imports the data. This is useful for working out which data file is causing the import to fail.
- config (str) – The name of the configuration to use for the analysis. This determines which configuration set from the latools.cfg file is used, and overrides the default configuration setup. You might specify this if your lab routinely uses two different instruments.
- dataformat (str or dict) – Either a path to a data format file, or a dataformat dict. See documentation for more details.
- extension (str) – The file extension of your data files. Defaults to ‘.csv’.
- srm_identifier (str) – A string used to separate samples and standards. srm_identifier must be present in all standard measurements. Defaults to ‘STD’.
- cmap (dict) – A dictionary of {analyte: colour} pairs. Colour can be any valid matplotlib colour string, RGB or RGBA sequence, or hex string.
- time_format (str) – A regex string identifying the time format, used by pandas when creating a universal time scale. If unspecified (None), pandas attempts to infer the time format, but in some cases this might not work.
- internal_standard (str) – The name of the analyte used as an internal standard throughout analysis.
- names (str) –
- ‘file_names’ : use the file names as labels (default)
- ‘metadata_names’ : use the ‘names’ attribute of metadata as the name
- anything else : use numbers.
-
folder
¶ str – Path to the directory containing the data files, as specified by data_folder.
-
dirname
¶ str – The name of the directory containing the data files, without the entire path.
-
files
¶ array_like – A list of all files in folder.
-
param_dir
¶ str – The directory where parameters are stored.
-
report_dir
¶ str – The directory where plots are saved.
-
data
¶ dict – A dict of latools.D data objects, labelled by sample name.
-
samples
¶ array_like – A list of samples.
-
analytes
¶ array_like – A list of analytes measured.
-
stds
¶ array_like – A list of the latools.D objects containing the SRM data. These must contain srm_identifier in the file name.
-
srm_identifier
¶ str – A string present in the file names of all standards.
-
cmaps
¶ dict – An analyte-specific colour map, used for plotting.
-
apply_classifier
(name, samples=None, subset=None)[source]¶ Apply a clustering classifier based on all samples, or a subset.
Returns: name
-
autorange
(analyte='total_counts', gwin=5, swin=3, win=20, on_mult=[1.0, 1.5], off_mult=[1.5, 1], transform='log', ploterrs=True, focus_stage='despiked')[source]¶ Automatically separates signal and background data regions.
Automatically detect signal and background regions in the laser data, based on the behaviour of a single analyte. The analyte used should be abundant and homogenous in the sample.
Step 1: Thresholding. The background signal is determined using a gaussian kernel density estimator (kde) of all the data. Under normal circumstances, this kde should find two distinct data distributions, corresponding to ‘signal’ and ‘background’. The minimum between these two distributions is taken as a rough threshold to identify signal and background regions. Any point where the trace crosses this threshold is identified as a ‘transition’.
Step 2: Transition Removal. The width of the transition regions between signal and background are then determined, and the transitions are excluded from analysis. The width of the transitions is determined by fitting a gaussian to the smoothed first derivative of the analyte trace, and determining its width at a point where the gaussian intensity is at conf times the gaussian maximum. These gaussians are fit to subsets of the data centered around the transition regions determined in Step 1, +/- win data points. The peak is further isolated by finding the minima and maxima of a second derivative within this window, and the gaussian is fit to the isolated peak.
Parameters: - analyte (str) – The analyte that autorange should consider. For best results, choose an analyte that is present homogeneously in high concentrations. This can also be ‘total_counts’ to use the sum of all analytes.
- gwin (int) – The smoothing window used for calculating the first derivative. Must be odd.
- win (int) – Determines the width (c +/- win) of the transition data subsets.
- smwin (int) – The smoothing window used for calculating the second derivative. Must be odd.
- conf (float) – The proportional intensity of the fitted gaussian tails that determines the transition width cutoff (lower = wider transition regions excluded).
- trans_mult (array_like, len=2) – Multiples of the peak FWHM to add to the transition cutoffs, e.g. if the transitions consistently leave some bad data following the transition, set trans_mult to [0, 0.5] to add 0.5 * the FWHM to the right hand side of the limit.
- focus_stage (str) –
Which stage of analysis to apply processing to. Defaults to ‘despiked’, or ‘rawdata’ if not despiked. Can be one of:
- ‘rawdata’: raw data, loaded from csv file.
- ‘despiked’: despiked data.
- ‘signal’/’background’: isolated signal and background data. Created by self.separate, after signal and background regions have been identified by self.autorange.
- ‘bkgsub’: background subtracted data, created by self.bkg_correct.
- ‘ratios’: element ratio data, created by self.ratio.
- ‘calibrated’: ratio data calibrated to standards, created by self.calibrate.
Returns: - Outputs added as instance attributes. Returns None.
- bkg, sig, trn (iterable, bool) – Boolean arrays identifying background, signal and transition regions
- bkgrng, sigrng and trnrng (iterable) – (min, max) pairs identifying the boundaries of contiguous True regions in the boolean arrays.
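The thresholding step can be sketched in a few lines. This is a conceptual illustration only, not the latools implementation; kde_threshold is a hypothetical helper name:

```python
import numpy as np
from scipy.stats import gaussian_kde

def kde_threshold(trace):
    """Rough stand-in for autorange's Step 1: take the first interior
    local minimum of a kernel density estimate of the trace as the
    signal/background threshold."""
    kde = gaussian_kde(trace)
    x = np.linspace(trace.min(), trace.max(), 500)
    d = kde(x)
    # interior local minima of the density curve
    mins = np.where((d[1:-1] < d[:-2]) & (d[1:-1] < d[2:]))[0] + 1
    return x[mins[0]]

# synthetic log-scale trace: background counts near 1, signal near 10
rng = np.random.default_rng(0)
trace = np.concatenate([rng.normal(1, 0.3, 500), rng.normal(10, 0.5, 500)])
thresh = kde_threshold(trace)  # falls in the gap between the two modes
```

Points above the threshold are then treated as candidate signal, points below as candidate background, before the transition-removal step tidies the boundaries.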
-
basic_processing
(noise_despiker=True, despike_win=3, despike_nlim=12.0, despike_maxiter=4, autorange_analyte='total_counts', autorange_gwin=5, autorange_swin=3, autorange_win=20, autorange_on_mult=[1.0, 1.5], autorange_off_mult=[1.5, 1], autorange_transform='log', bkg_weight_fwhm=300.0, bkg_n_min=20, bkg_n_max=None, bkg_cstep=None, bkg_filter=False, bkg_f_win=7, bkg_f_n_lim=3, bkg_errtype='stderr', calib_drift_correct=True, calib_srms_used=['NIST610', 'NIST612', 'NIST614'], calib_zero_intercept=True, calib_n_min=10, plots=True)[source]¶
-
bkg_calc_interp1d
(analytes=None, kind=1, n_min=10, n_max=None, cstep=None, bkg_filter=False, f_win=7, f_n_lim=3, focus_stage='despiked')[source]¶ Background calculation using a 1D interpolation.
scipy.interpolate.interp1d is used for interpolation.
Parameters: - analytes (str or iterable) – Which analyte or analytes to calculate.
- kind (str or int) – Integer specifying the order of the spline interpolation used, or string specifying a type of interpolation. Passed to scipy.interpolate.interp1d
- n_min (int) – Background regions with fewer than n_min points will not be included in the fit.
- cstep (float or None) – The interval between calculated background points.
- bkg_filter (bool) – If True, apply a rolling filter to the isolated background regions to exclude regions with anomalously high values. If True, two parameters alter the filter’s behaviour:
- f_win (int) – The size of the rolling window
- f_n_lim (float) – The number of standard deviations above the rolling mean to set the threshold.
- focus_stage (str) –
Which stage of analysis to apply processing to. Defaults to ‘despiked’ if present, or ‘rawdata’ if not. Can be one of:
- ‘rawdata’: raw data, loaded from csv file.
- ‘despiked’: despiked data.
- ‘signal’/’background’: isolated signal and background data. Created by self.separate, after signal and background regions have been identified by self.autorange.
- ‘bkgsub’: background subtracted data, created by self.bkg_correct.
- ‘ratios’: element ratio data, created by self.ratio.
- ‘calibrated’: ratio data calibrated to standards, created by self.calibrate.
-
bkg_calc_weightedmean
(analytes=None, weight_fwhm=None, n_min=20, n_max=None, cstep=None, bkg_filter=False, f_win=7, f_n_lim=3, focus_stage='despiked')[source]¶ Background calculation using a gaussian weighted mean.
Parameters: - analytes (str or iterable) – Which analyte or analytes to calculate.
- weight_fwhm (float) – The full-width-at-half-maximum of the gaussian used to calculate the weighted average.
- n_min (int) – Background regions with fewer than n_min points will not be included in the fit.
- cstep (float or None) – The interval between calculated background points.
- bkg_filter (bool) – If True, apply a rolling filter to the isolated background regions to exclude regions with anomalously high values. If True, two parameters alter the filter’s behaviour:
- f_win (int) – The size of the rolling window
- f_n_lim (float) – The number of standard deviations above the rolling mean to set the threshold.
- focus_stage (str) –
Which stage of analysis to apply processing to. Defaults to ‘despiked’ if present, or ‘rawdata’ if not. Can be one of:
- ‘rawdata’: raw data, loaded from csv file.
- ‘despiked’: despiked data.
- ‘signal’/’background’: isolated signal and background data. Created by self.separate, after signal and background regions have been identified by self.autorange.
- ‘bkgsub’: background subtracted data, created by self.bkg_correct.
- ‘ratios’: element ratio data, created by self.ratio.
- ‘calibrated’: ratio data calibrated to standards, created by self.calibrate.
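The core of a gaussian weighted mean can be sketched as follows (a conceptual illustration with a hypothetical helper name, not the latools internals); weight_fwhm sets the width of the weighting gaussian via sigma = fwhm / (2 * sqrt(2 * ln 2)):

```python
import numpy as np

def gauss_weighted_mean(t, y, t0, fwhm):
    """Gaussian-weighted mean of y evaluated at time t0: nearby
    background points dominate, distant ones contribute little."""
    sigma = fwhm / (2 * np.sqrt(2 * np.log(2)))
    w = np.exp(-0.5 * ((t - t0) / sigma) ** 2)
    return np.sum(w * y) / np.sum(w)

t = np.arange(100.0)
y = np.ones(100)
y[50:] = 3.0  # background level steps up halfway through the run
early = gauss_weighted_mean(t, y, 10, 20)  # dominated by the 1.0 level
late = gauss_weighted_mean(t, y, 90, 20)   # dominated by the 3.0 level
```

Because the weighting is local, a drifting background is tracked smoothly rather than averaged into a single global value.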
-
bkg_plot
(analytes=None, figsize=None, yscale='log', ylim=None, err='stderr', save=True)[source]¶ Plot the calculated background.
Parameters: - analytes (str or iterable) – Which analyte(s) to plot.
- figsize (tuple) – The (width, height) of the figure, in inches. If None, calculated based on number of samples.
- yscale (str) – ‘log’ (default) or ‘linear’.
- ylim (tuple) – Manually specify the y scale.
- err (str) – What type of error to plot. Default is stderr.
- save (bool) – If True, figure is saved.
Returns: fig, ax
Return type: matplotlib.figure, matplotlib.axes
-
bkg_subtract
(analytes=None, errtype='stderr', focus_stage='despiked')[source]¶ Subtract calculated background from data.
Must run bkg_calc first!
Parameters: - analytes (str or iterable) – Which analyte(s) to subtract.
- errtype (str) – Which type of error to propagate. default is ‘stderr’.
- focus_stage (str) –
Which stage of analysis to apply processing to. Defaults to ‘despiked’ if present, or ‘rawdata’ if not. Can be one of:
- ‘rawdata’: raw data, loaded from csv file.
- ‘despiked’: despiked data.
- ‘signal’/’background’: isolated signal and background data. Created by self.separate, after signal and background regions have been identified by self.autorange.
- ‘bkgsub’: background subtracted data, created by self.bkg_correct.
- ‘ratios’: element ratio data, created by self.ratio.
- ‘calibrated’: ratio data calibrated to standards, created by self.calibrate.
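The arithmetic of background subtraction is simple; the sketch below assumes independent uncertainties combined in quadrature, which is an illustrative assumption rather than a statement of the exact latools error model:

```python
import numpy as np

def subtract_background(signal, bkg, signal_err, bkg_err):
    """Subtract an interpolated background from signal data,
    combining the two uncertainties in quadrature."""
    corrected = signal - bkg
    err = np.sqrt(signal_err ** 2 + bkg_err ** 2)
    return corrected, err

sig = np.array([100.0, 200.0, 300.0])
bkg = np.array([10.0, 10.0, 10.0])
corr, err = subtract_background(sig, bkg,
                                np.array([3.0, 4.0, 5.0]),
                                np.full(3, 4.0))
```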
-
calibrate
(analytes=None, drift_correct=True, srms_used=['NIST610', 'NIST612', 'NIST614'], zero_intercept=True, n_min=10, reload_srm_database=False)[source]¶ Calibrates the data to measured SRM values.
Assumes that y intercept is zero.
Parameters: - analytes (str or iterable) – Which analytes you’d like to calibrate. Defaults to all.
- drift_correct (bool) – Whether to pool all SRM measurements into a single calibration, or vary the calibration through the run, interpolating coefficients between measured SRMs.
- srms_used (str or iterable) – Which SRMs have been measured. Must match names given in SRM data file exactly.
- n_min (int) – The minimum number of data points an SRM measurement must have to be included.
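With zero_intercept=True, calibration amounts to fitting a line through the origin between measured SRM count ratios and their known values. A minimal sketch of that idea (hypothetical helper name; the least-squares slope through the origin is m = Σxy / Σx²):

```python
import numpy as np

def zero_intercept_slope(measured, known):
    """Least-squares slope through the origin, mapping measured SRM
    values onto known SRM values."""
    x = np.asarray(measured, dtype=float)
    y = np.asarray(known, dtype=float)
    return np.sum(x * y) / np.sum(x * x)

# hypothetical SRM measurements where known values are exactly 2x measured
m = zero_intercept_slope([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
calibrated = m * np.array([1.5, 2.5])  # apply the calibration to samples
```

With drift_correct=True the slope is instead interpolated through the run between SRM measurements, rather than pooled into a single value.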
-
correct_spectral_interference
(target_analyte, source_analyte, f)[source]¶ Correct spectral interference.
Subtract interference counts from target_analyte, based on the intensity of a source_analyte and a known fractional contribution (f).
Correction takes the form: target_analyte -= source_analyte * f
Only operates on background-corrected data (‘bkgsub’). To undo a correction, rerun self.bkg_subtract().
Example
To correct 44Ca+ for an 88Sr++ interference, where both the 43.5 and 44 Da peaks are known:
f = abundance(88Sr) / abundance(87Sr)
counts(44Ca) = counts(44 Da) - counts(43.5 Da) * f
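The example above reduces to simple arithmetic. The sketch below uses approximate natural isotopic abundances (88Sr ≈ 82.58 %, 87Sr ≈ 7.00 %), quoted for illustration only:

```python
import numpy as np

def correct_interference(target_counts, source_counts, f):
    """Apply the correction form given above:
    target_analyte -= source_analyte * f"""
    return target_counts - source_counts * f

# 88Sr++ interferes at 44 Da; 87Sr++ is resolved at 43.5 Da.
f = 82.58 / 7.00                        # abundance(88Sr) / abundance(87Sr)
counts_44 = np.array([1000.0, 1200.0])  # 44Ca+ plus 88Sr++ counts
counts_43_5 = np.array([10.0, 20.0])    # 87Sr++ counts
ca44 = correct_interference(counts_44, counts_43_5, f)
```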
-
correlation_plots
(x_analyte, y_analyte, window=15, filt=True, recalc=False, samples=None, subset=None, outdir=None)[source]¶ Plot the local correlation between two analytes.
Parameters: - y_analyte (x_analyte,) – The names of the x and y analytes to correlate.
- window (int, None) – The rolling window used when calculating the correlation.
- filt (bool) – Whether or not to apply existing filters to the data before calculating this filter.
- recalc (bool) – If True, the correlation is re-calculated, even if it is already present.
-
crossplot
(analytes=None, lognorm=True, bins=25, filt=False, samples=None, subset=None, figsize=(12, 12), save=False, colourful=True, mode='hist2d', **kwargs)[source]¶ Plot analytes against each other.
Parameters: - analytes (optional, array_like or str) – The analyte(s) to plot. Defaults to all analytes.
- lognorm (bool) – Whether or not to log normalise the colour scale of the 2D histogram.
- bins (int) – The number of bins in the 2D histogram.
- filt (str, dict or bool) – Either logical filter expression contained in a str, a dict of expressions specifying the filter string to use for each analyte or a boolean. Passed to grab_filt.
- figsize (tuple) – Figure size (width, height) in inches.
- save (bool or str) – If True, plot is saved as ‘crossplot.png’; if str, plot is saved as str.
- colourful (bool) – Whether or not the plot should be colourful :).
- mode (str) – ‘hist2d’ (default) or ‘scatter’
Returns: (fig, axes)
-
crossplot_filters
(filter_string, analytes=None, samples=None, subset=None, filt=None)[source]¶ Plot the results of a group of filters in a crossplot.
Returns: fig, axes objects
-
despike
(expdecay_despiker=False, exponent=None, noise_despiker=True, win=3, nlim=12.0, exponentplot=False, maxiter=4, autorange_kwargs={}, focus_stage='rawdata')[source]¶ Despikes data with exponential decay and noise filters.
Parameters: - expdecay_despiker (bool) – Whether or not to apply the exponential decay filter.
- exponent (None or float) – The exponent for the exponential decay filter. If None, it is determined automatically using find_expocoef.
- tstep (None or float) – The time interval between measurements. If None, it is determined automatically from the Time variable.
- noise_despiker (bool) – Whether or not to apply the standard deviation spike filter.
- win (int) – The rolling window over which the spike filter calculates the trace statistics.
- nlim (float) – The number of standard deviations above the rolling mean that data are excluded.
- exponentplot (bool) – Whether or not to show a plot of the automatically determined exponential decay exponent.
- maxiter (int) – The maximum number of times that the filter is applied.
- focus_stage (str) –
Which stage of analysis to apply processing to. Defaults to ‘rawdata’. Can be one of:
- ‘rawdata’: raw data, loaded from csv file.
- ‘despiked’: despiked data.
- ‘signal’/’background’: isolated signal and background data. Created by self.separate, after signal and background regions have been identified by self.autorange.
- ‘bkgsub’: background subtracted data, created by self.bkg_correct.
- ‘ratios’: element ratio data, created by self.ratio.
- ‘calibrated’: ratio data calibrated to standards, created by self.calibrate.
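The noise despiker idea can be sketched as a rolling-window filter. This is a conceptual illustration, not the latools code; for simplicity the centre point is excluded from the window statistics here so a single large spike cannot mask itself:

```python
import numpy as np

def noise_despike(y, win=3, nlim=12.0, maxiter=4):
    """Replace points more than nlim standard deviations above the
    mean of their surrounding window with that mean, iterating up
    to maxiter times."""
    y = np.asarray(y, dtype=float).copy()
    pad = win // 2
    for _ in range(maxiter):
        ypad = np.pad(y, pad, mode='edge')
        windows = np.lib.stride_tricks.sliding_window_view(ypad, win)
        mask = np.ones(win, dtype=bool)
        mask[pad] = False  # exclude the centre point from statistics
        mean = windows[:, mask].mean(axis=1)
        std = windows[:, mask].std(axis=1)
        spikes = y > mean + nlim * std
        if not spikes.any():
            break
        y[spikes] = mean[spikes]
    return y

y = np.full(20, 10.0)
y[7] = 1e6  # a single-point spike
clean = noise_despike(y, win=3, nlim=12.0)
```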
-
export_traces
(outdir=None, focus_stage=None, analytes=None, samples=None, subset='All_Analyses', filt=False, zip_archive=False)[source]¶ Function to export raw data.
Parameters: - outdir (str) – Directory to save the traces. Defaults to ‘main-dir-name_export’.
- focus_stage (str) –
The name of the analysis stage to export.
- ’rawdata’: raw data, loaded from csv file.
- ’despiked’: despiked data.
- ’signal’/’background’: isolated signal and background data. Created by self.separate, after signal and background regions have been identified by self.autorange.
- ’bkgsub’: background subtracted data, created by self.bkg_correct
- ’ratios’: element ratio data, created by self.ratio.
- ’calibrated’: ratio data calibrated to standards, created by self.calibrate.
Defaults to the most recent stage of analysis.
- analytes (str or array_like) – Either a single analyte, or list of analytes to export. Defaults to all analytes.
- samples (str or array_like) – Either a single sample name, or list of samples to export. Defaults to all samples.
- filt (str, dict or bool) – Either logical filter expression contained in a str, a dict of expressions specifying the filter string to use for each analyte or a boolean. Passed to grab_filt.
-
filter_clustering
(analytes, filt=False, normalise=True, method='kmeans', include_time=False, samples=None, sort=True, subset=None, level='sample', min_data=10, **kwargs)[source]¶ Applies an n-dimensional clustering filter to the data.
Parameters: - analytes (str) – The analyte(s) that the filter applies to.
- filt (bool) – Whether or not to apply existing filters to the data before calculating this filter.
- normalise (bool) – Whether or not to normalise the data to zero mean and unit variance. Recommended if clustering based on more than 1 analyte. Uses sklearn.preprocessing.scale.
- method (str) –
Which clustering algorithm to use:
- ’meanshift’: The sklearn.cluster.MeanShift algorithm. Automatically determines number of clusters in data based on the bandwidth of expected variation.
- ’kmeans’: The sklearn.cluster.KMeans algorithm. Determines the characteristics of a known number of clusters within the data. Must provide n_clusters to specify the expected number of clusters.
- level (str) – Whether to conduct the clustering analysis at the ‘sample’ or ‘population’ level.
- include_time (bool) – Whether or not to include the Time variable in the clustering analysis. Useful if you’re looking for spatially continuous clusters in your data, i.e. this will identify each spot in your analysis as an individual cluster.
- samples (optional, array_like or None) – Which samples to apply this filter to. If None, applies to all samples.
- sort (bool) – Whether or not you want the cluster labels to be sorted by the mean magnitude of the signals they are based on (0 = lowest)
- min_data (int) – The minimum number of data points that should be considered by the filter. Default = 10.
- **kwargs – Parameters passed to the clustering algorithm specified by method.
- Meanshift Parameters –
- bandwidth : str or float
- The bandwidth (float) or bandwidth method (‘scott’ or ‘silverman’) used to estimate the data bandwidth.
- bin_seeding : bool
- Modifies the behaviour of the meanshift algorithm. Refer to sklearn.cluster.meanshift documentation.
- K-Means Parameters –
- n_clusters : int
- The number of clusters expected in the data.
-
filter_correlation
(x_analyte, y_analyte, window=None, r_threshold=0.9, p_threshold=0.05, filt=True, samples=None, subset=None)[source]¶ Applies a correlation filter to the data.
Calculates a rolling correlation between every window points of two analytes, and excludes data where their Pearson’s R value is above r_threshold and statistically significant.
Data will be excluded where their absolute R value is greater than r_threshold AND the p-value associated with the correlation is less than p_threshold. i.e. only correlations that are statistically significant are considered.
Parameters: - y_analyte (x_analyte,) – The names of the x and y analytes to correlate.
- window (int, None) – The rolling window used when calculating the correlation.
- r_threshold (float) – The correlation index above which to exclude data. Note: the absolute Pearson R value is considered, so negative correlations below -r_threshold will also be excluded.
- p_threshold (float) – The significance level below which data are excluded.
- filt (bool) – Whether or not to apply existing filters to the data before calculating this filter.
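The rolling correlation exclusion can be sketched directly (an illustrative implementation, not the latools internals — edge points the window cannot cover are simply kept here):

```python
import numpy as np
from scipy.stats import pearsonr

def correlation_filter(x, y, window=10, r_threshold=0.9, p_threshold=0.05):
    """Exclude a point when the window centred on it has
    |R| > r_threshold and p < p_threshold."""
    n = len(x)
    keep = np.ones(n, dtype=bool)
    half = window // 2
    for i in range(half, n - half):
        r, p = pearsonr(x[i - half:i + half + 1], y[i - half:i + half + 1])
        if abs(r) > r_threshold and p < p_threshold:
            keep[i] = False
    return keep

rng = np.random.default_rng(1)
x = np.arange(100.0)
y = rng.normal(0, 1, 100)                         # uncorrelated first half
y[50:] = x[50:] * 2 + rng.normal(0, 1e-3, 50)     # strongly correlated second half
keep = correlation_filter(x, y, window=10)
```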
-
filter_defragment
(threshold, mode='include', filt=True, samples=None, subset=None)[source]¶ Remove ‘fragments’ from the calculated filter
Parameters: - threshold (int) – Contiguous data regions that contain this number or fewer points are considered ‘fragments’
- mode (str) – Specifies whether to ‘include’ or ‘exclude’ the identified fragments.
- filt (bool or filt string) – Which filter to apply the defragmenter to. Defaults to True
- samples (array_like or None) – Which samples to apply this filter to. If None, applies to all samples.
- subset (str or number) – The subset of samples (defined by make_subset) you want to apply the filter to.
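Defragmentation can be sketched as run-length filtering of the boolean filter. This is an illustrative reading of the behaviour (short True runs removed for ‘exclude’, short False gaps filled for ‘include’), not the latools implementation:

```python
from itertools import groupby
import numpy as np

def defragment(filt, threshold, mode='exclude'):
    """Flip contiguous runs of `threshold` or fewer points:
    mode='exclude' removes short True runs ('fragments');
    mode='include' fills short False gaps."""
    out = []
    frag_value = (mode == 'exclude')  # short runs of this value get flipped
    for value, run in groupby(list(filt)):
        run = list(run)
        if value == frag_value and len(run) <= threshold:
            out.extend([not value] * len(run))
        else:
            out.extend(run)
    return np.array(out, dtype=bool)

filt = np.array([True] * 10 + [False] * 2 + [True] * 1 + [False] * 5)
cleaned = defragment(filt, threshold=2, mode='exclude')  # lone True removed
```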
-
filter_effect
(analytes=None, stats=['mean', 'std'], filt=True)[source]¶ Quantify the effects of the active filters.
Parameters: Returns: Contains statistics calculated for filtered and unfiltered data, and the filtered/unfiltered ratio.
Return type: pandas.DataFrame
-
filter_exclude_downhole
(threshold, filt=True, samples=None, subset=None)[source]¶ Exclude all points down-hole (after) the first excluded data.
-
filter_gradient_threshold
(analyte, threshold, win=15, samples=None, subset=None)[source]¶ Applies a gradient threshold filter to the data.
Generates two filters above and below the threshold value for a given analyte.
Parameters: - analyte (str) – The analyte that the filter applies to.
- win (int) – The window over which to calculate the moving gradient
- threshold (float) – The threshold value.
- filt (bool) – Whether or not to apply existing filters to the data before calculating this filter.
- samples (array_like or None) – Which samples to apply this filter to. If None, applies to all samples.
- subset (str or number) – The subset of samples (defined by make_subset) you want to apply the filter to.
-
filter_gradient_threshold_percentile
(analyte, percentiles, level='population', win=15, filt=False, samples=None, subset=None)[source]¶ Applies a gradient threshold filter to the data.
Generates two filters above and below the threshold value for a given analyte.
Parameters: - analyte (str) – The analyte that the filter applies to.
- win (int) – The window over which to calculate the moving gradient
- percentiles (float or iterable of len=2) – The percentile values.
- filt (bool) – Whether or not to apply existing filters to the data before calculating this filter.
- samples (array_like or None) – Which samples to apply this filter to. If None, applies to all samples.
- subset (str or number) – The subset of samples (defined by make_subset) you want to apply the filter to.
-
filter_nremoved
(filt=True, quiet=False)[source]¶ Report how many data are removed by the active filters.
-
filter_off
(filt=None, analyte=None, samples=None, subset=None, show_status=False)[source]¶ Turns data filters off for particular analytes and samples.
Parameters: - filt (optional, str or array_like) – Name, partial name or list of names of filters. Supports partial matching. i.e. if ‘cluster’ is specified, all filters with ‘cluster’ in the name are activated. Defaults to all filters.
- analyte (optional, str or array_like) – Name or list of names of analytes. Defaults to all analytes.
- samples (optional, array_like or None) – Which samples to apply this filter to. If None, applies to all samples.
-
filter_on
(filt=None, analyte=None, samples=None, subset=None, show_status=False)[source]¶ Turns data filters on for particular analytes and samples.
Parameters: - filt (optional, str or array_like) – Name, partial name or list of names of filters. Supports partial matching. i.e. if ‘cluster’ is specified, all filters with ‘cluster’ in the name are activated. Defaults to all filters.
- analyte (optional, str or array_like) – Name or list of names of analytes. Defaults to all analytes.
- samples (optional, array_like or None) – Which samples to apply this filter to. If None, applies to all samples.
-
filter_reports
(analytes, filt_str='all', nbin=5, samples=None, outdir=None, subset='All_Samples')[source]¶ Plot filter reports for all filters that contain filt_str in the name.
-
filter_status
(sample=None, subset=None, stds=False)[source]¶ Prints the current status of filters for specified samples.
-
filter_threshold
(analyte, threshold, samples=None, subset=None)[source]¶ Applies a threshold filter to the data.
Generates two filters above and below the threshold value for a given analyte.
Parameters: - analyte (str) – The analyte that the filter applies to.
- threshold (float) – The threshold value.
- filt (bool) – Whether or not to apply existing filters to the data before calculating this filter.
- samples (array_like or None) – Which samples to apply this filter to. If None, applies to all samples.
- subset (str or number) – The subset of samples (defined by make_subset) you want to apply the filter to.
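The two generated filters can be sketched as a pair of boolean arrays. The names ‘below’ and ‘above’ are illustrative; this is the concept, not the latools internals:

```python
import numpy as np

def threshold_filter(values, threshold):
    """Return the two boolean filters a threshold generates for an
    analyte: one keeping data at or below the threshold, one keeping
    data at or above it."""
    values = np.asarray(values)
    below = values <= threshold
    above = values >= threshold
    return below, above

vals = np.array([0.5, 1.5, 2.5, 3.5])
below, above = threshold_filter(vals, 2.0)
```

Either filter can then be switched on for particular analytes with filter_on, depending on which side of the threshold you want to keep.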
-
filter_threshold_percentile
(analyte, percentiles, level='population', filt=False, samples=None, subset=None)[source]¶ Applies a threshold filter to the data.
Generates two filters above and below the threshold value for a given analyte.
Parameters: - analyte (str) – The analyte that the filter applies to.
- percentiles (float or iterable of len=2) – The percentile values.
- level (str) – Whether to calculate percentiles from the entire dataset (‘population’) or for each individual sample (‘individual’)
- filt (bool) – Whether or not to apply existing filters to the data before calculating this filter.
- samples (array_like or None) – Which samples to apply this filter to. If None, applies to all samples.
- subset (str or number) – The subset of samples (defined by make_subset) you want to apply the filter to.
-
filter_trim
(start=1, end=1, filt=True, samples=None, subset=None)[source]¶ Remove points from the start and end of filter regions.
Parameters: - start, end (int) – The number of points to remove from the start and end of the specified filter.
- filt (valid filter string or bool) – Which filter to trim. If True, applies to currently active filters.
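Trimming a boolean filter amounts to shrinking each contiguous True region from both ends. A sketch of that idea (illustrative helper, not the latools code):

```python
import numpy as np

def trim_filter(filt, start=1, end=1):
    """A point survives only if the `start` points before it and the
    `end` points after it are also True; padding with False handles
    the array edges."""
    filt = np.asarray(filt, dtype=bool)
    padded = np.pad(filt, (start, end), constant_values=False)
    out = filt.copy()
    for s in range(1, start + 1):
        out &= padded[start - s:start - s + len(filt)]   # look backwards
    for e in range(1, end + 1):
        out &= padded[start + e:start + e + len(filt)]   # look forwards
    return out

filt = np.array([False, False, True, True, True, True, True, False, False])
trimmed = trim_filter(filt, start=1, end=1)  # 5-point region shrinks to 3
```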
-
find_expcoef
(nsd_below=0.0, plot=False, trimlim=None, autorange_kwargs={})[source]¶ Determines exponential decay coefficient for despike filter.
Fits an exponential decay function to the washout phase of standards to determine the washout time of your laser cell. The exponential coefficient reported is nsd_below standard deviations below the fitted exponent, to ensure that no real data is removed.
Total counts are used in fitting, rather than a specific analyte.
Parameters: - nsd_below (float) – The number of standard deviations to subtract from the fitted coefficient when calculating the filter exponent.
- plot (bool or str) – If True, creates a plot of the fit; if str, the plot is saved to the location specified in str.
- trimlim (float) – A threshold limit used in determining the start of the exponential decay region of the washout. Defaults to half the increase in signal over background. If the data in the plot don’t fall on an exponential decay line, change this number. Normally you’ll need to increase it.
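The underlying idea is an exponential fit to the washout trace. The sketch below shows the fitting step only, on noise-free synthetic data (hypothetical helper name; find_expcoef additionally isolates the washout region and subtracts nsd_below standard deviations from the fitted coefficient):

```python
import numpy as np
from scipy.optimize import curve_fit

def fit_washout_exponent(t, counts):
    """Fit counts = A * exp(k * t) to a washout trace and return the
    decay coefficient k (negative for a decay)."""
    def expdecay(t, A, k):
        return A * np.exp(k * t)
    popt, _ = curve_fit(expdecay, t, counts, p0=(counts[0], -1.0))
    return popt[1]

t = np.linspace(0, 5, 50)
counts = 1000 * np.exp(-1.5 * t)  # synthetic washout with k = -1.5
k = fit_washout_exponent(t, counts)
```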
-
fit_classifier
(name, analytes, method, samples=None, subset=None, filt=True, sort_by=0, **kwargs)[source]¶ Create a clustering classifier based on all samples, or a subset.
Parameters: - name (str) – The name of the classifier.
- analytes (str or iterable) – Which analytes the clustering algorithm should consider.
- method (str) –
Which clustering algorithm to use. Can be:
- ’meanshift’
- The sklearn.cluster.MeanShift algorithm. Automatically determines number of clusters in data based on the bandwidth of expected variation.
- ’kmeans’
- The sklearn.cluster.KMeans algorithm. Determines the characteristics of a known number of clusters within the data. Must provide n_clusters to specify the expected number of clusters.
- samples (iterable) – list of samples to consider. Overrides ‘subset’.
- subset (str) – The subset of samples used to fit the classifier. Ignored if ‘samples’ is specified.
- sort_by (int) – Which analyte the resulting clusters should be sorted by - defaults to 0, which is the first analyte.
- **kwargs – method-specific keyword parameters - see below.
- Meanshift Parameters –
- bandwidth : str or float
- The bandwidth (float) or bandwidth method (‘scott’ or ‘silverman’) used to estimate the data bandwidth.
- bin_seeding : bool
- Modifies the behaviour of the meanshift algorithm. Refer to sklearn.cluster.meanshift documentation.
- K-Means Parameters –
- n_clusters : int
- The number of clusters expected in the data.
Returns: name
-
get_background
(n_min=10, n_max=None, focus_stage='despiked', bkg_filter=False, f_win=5, f_n_lim=3)[source]¶ Extract all background data from all samples on universal time scale. Used by both ‘polynomial’ and ‘weightedmean’ methods.
Parameters: - n_min (int) – The minimum number of points a background region must have to be included in calculation.
- n_max (int) – The maximum number of points a background region must have to be included in calculation.
- bkg_filter (bool) – If True, apply a rolling filter to the isolated background regions to exclude regions with anomalously high values. If True, two parameters alter the filter’s behaviour:
- f_win (int) – The size of the rolling window
- f_n_lim (float) – The number of standard deviations above the rolling mean to set the threshold.
- focus_stage (str) –
Which stage of analysis to apply processing to. Defaults to ‘despiked’ if present, or ‘rawdata’ if not. Can be one of:
- ‘rawdata’: raw data, loaded from csv file.
- ‘despiked’: despiked data.
- ‘signal’/’background’: isolated signal and background data. Created by self.separate, after signal and background regions have been identified by self.autorange.
- ‘bkgsub’: background subtracted data, created by self.bkg_correct.
- ‘ratios’: element ratio data, created by self.ratio.
- ‘calibrated’: ratio data calibrated to standards, created by self.calibrate.
Returns: pandas.DataFrame object containing background data.
-
get_focus
(filt=False, samples=None, subset=None, nominal=False)[source]¶ Collect all data from all samples into a single array. Data from standards is not collected.
-
get_gradients
(analytes=None, win=15, filt=False, samples=None, subset=None, recalc=True)[source]¶ Collect all data from all samples into a single array. Data from standards is not collected.
-
getstats
(save=True, filename=None, samples=None, subset=None, ablation_time=False)[source]¶ Return pandas dataframe of all sample statistics.
-
gradient_crossplot
(analytes=None, win=15, lognorm=True, bins=25, filt=False, samples=None, subset=None, figsize=(12, 12), save=False, colourful=True, mode='hist2d', recalc=True, **kwargs)[source]¶ Plot analyte gradients against each other.
Parameters: - analytes (optional, array_like or str) – The analyte(s) to plot. Defaults to all analytes.
- lognorm (bool) – Whether or not to log normalise the colour scale of the 2D histogram.
- bins (int) – The number of bins in the 2D histogram.
- filt (str, dict or bool) – Either logical filter expression contained in a str, a dict of expressions specifying the filter string to use for each analyte or a boolean. Passed to grab_filt.
- figsize (tuple) – Figure size (width, height) in inches.
- save (bool or str) – If True, the plot is saved as ‘crossplot.png’; if str, the plot is saved with that name.
- colourful (bool) – Whether or not the plot should be colourful :).
- mode (str) – ‘hist2d’ (default) or ‘scatter’
- recalc (bool) – Whether to re-calculate the gradients, or use existing gradients.
Returns: Return type: (fig, axes)
-
gradient_histogram
(analytes=None, win=15, filt=False, bins=None, samples=None, subset=None, recalc=True, ncol=4)[source]¶ Plot a histogram of the gradients in all samples.
Parameters: - filt (str, dict or bool) – Either logical filter expression contained in a str, a dict of expressions specifying the filter string to use for each analyte or a boolean. Passed to grab_filt.
- bins (None or array-like) – The bins to use in the histogram
- samples (str or list) – which samples to get
- subset (str or int) – which subset to get
- recalc (bool) – Whether to re-calculate the gradients, or use existing gradients.
Returns: Return type: fig, ax
-
gradient_plots
(analytes=None, win=15, samples=None, ranges=False, focus=None, outdir=None, figsize=[10, 4], subset='All_Analyses')[source]¶ Plot analyte gradients as a function of time.
Parameters: - analytes (optional, array_like or str) – The analyte(s) to plot. Defaults to all analytes.
- samples (optional, array_like or str) – The sample(s) to plot. Defaults to all samples.
- ranges (bool) – Whether or not to show the signal/background regions identified by ‘autorange’.
- focus (str) – The focus ‘stage’ of the analysis to plot. Can be ‘rawdata’, ‘despiked’:, ‘signal’, ‘background’, ‘bkgsub’, ‘ratios’ or ‘calibrated’.
- outdir (str) – Path to a directory where you’d like the plots to be saved. Defaults to ‘reports/[focus]’ in your data directory.
- filt (str, dict or bool) – Either logical filter expression contained in a str, a dict of expressions specifying the filter string to use for each analyte or a boolean. Passed to grab_filt.
- scale (str) – If ‘log’, plots the data on a log scale.
- figsize (array_like) – Array of length 2 specifying figure [width, height] in inches.
- stats (bool) – Whether or not to overlay the mean and standard deviations for each trace.
- stat, err (str) – The names of the statistic and error components to plot. Defaults to ‘nanmean’ and ‘nanstd’.
Returns: Return type:
-
histograms
(analytes=None, bins=25, logy=False, filt=False, colourful=True)[source]¶ Plot histograms of analytes.
Parameters: - analytes (optional, array_like or str) – The analyte(s) to plot. Defaults to all analytes.
- bins (int) – The number of bins in each histogram (default = 25)
- logy (bool) – If true, y axis is a log scale.
- filt (str, dict or bool) – Either logical filter expression contained in a str, a dict of expressions specifying the filter string to use for each analyte or a boolean. Passed to grab_filt.
- colourful (bool) – If True, histograms are colourful :)
Returns: Return type: (fig, axes)
-
make_subset
(samples=None, name=None)[source]¶ Creates a subset of samples, which can be treated independently.
Parameters:
-
minimal_export
(target_analytes=None, path=None)[source]¶ Exports analysis parameters, standard info and a minimal dataset, which can be imported by another user.
Parameters: - target_analytes (str or iterable) – Which analytes to include in the export. If specified, the export will contain these analytes, and all other analytes used during data processing (e.g. during filtering). If not specified, all analytes are exported.
- path (str) – Where to save the minimal export. If it ends with .zip, a zip file is created. If it’s a folder, all data are exported to a folder.
-
optimisation_plots
(overlay_alpha=0.5, samples=None, subset=None, **kwargs)[source]¶ Plot the result of signal_optimise.
signal_optimiser must be run first, and the output stored in the opt attribute of the latools.D object.
Parameters: - d (latools.D object) – A latools data object.
- overlay_alpha (float) – The opacity of the threshold overlays. Between 0 and 1.
- **kwargs – Passed to tplot
-
optimise_signal
(analytes, min_points=5, threshold_mode='kde_first_max', threshold_mult=1.0, x_bias=0, filt=True, weights=None, mode='minimise', samples=None, subset=None)[source]¶ Optimise data selection based on specified analytes.
Identifies the longest possible contiguous data region in the signal where the relative standard deviation (std) and concentration of all analytes are minimised.
Optimisation is performed via a grid search of all possible contiguous data regions. For each region, the mean std and mean scaled analyte concentration (‘amplitude’) are calculated.
The size and position of the optimal data region are identified using threshold std and amplitude values. Thresholds are derived from all calculated stds and amplitudes using the method specified by threshold_mode. For example, using the ‘kde_max’ method, a probability density function (PDF) is calculated for std and amplitude values, and the threshold is set as the maximum of the PDF. These thresholds are then used to identify the size and position of the longest contiguous region where the std is below the threshold, and the amplitude is also below the threshold (in ‘minimise’ mode).
All possible regions of the data that have at least min_points are considered.
For a graphical demonstration of the action of signal_optimiser, use optimisation_plot.
Parameters: - d (latools.D object) – A latools data object.
- analytes (str or array-like) – Which analytes to consider.
- min_points (int) – The minimum number of contiguous points to consider.
- threshold_mode (str) – The method used to calculate the optimisation thresholds. Can be ‘mean’, ‘median’, ‘kde_max’ or ‘bayes_mvs’, or a custom function. If a function, must take a 1D array, and return a single, real number.
- weights (array-like of length len(analytes)) – An array of numbers specifying the importance of each analyte considered. Larger number makes the analyte have a greater effect on the optimisation. Default is None.
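The grid-search idea described above can be sketched in pure Python. This is an illustrative toy, not the latools implementation: the function name and data are hypothetical, and only a single std threshold is considered (no amplitude threshold or weighting).

```python
import statistics

def longest_low_std_region(data, min_points=5, threshold=1.0):
    """Return (start, end) of the longest contiguous region whose
    standard deviation is below `threshold`, or None if there is none."""
    best = None
    n = len(data)
    for start in range(n):
        for end in range(start + min_points, n + 1):
            if statistics.pstdev(data[start:end]) < threshold:
                if best is None or (end - start) > (best[1] - best[0]):
                    best = (start, end)
    return best

signal = [10, 11, 50, 12, 12, 11, 12, 11, 12, 40]
print(longest_low_std_region(signal, min_points=3, threshold=1.0))
```

Here the stable region between the two spikes is selected; latools additionally derives the thresholds from the data themselves, as described above.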
-
ratio
(internal_standard=None)[source]¶ Calculates the ratio of all analytes to a single analyte.
Parameters: internal_standard (str) – The name of the analyte to divide all other analytes by. Returns: Return type: None
-
sample_stats
(analytes=None, filt=True, stats=['mean', 'std'], eachtrace=True, csf_dict={})[source]¶ Calculate sample statistics.
Returns samples, analytes, and arrays of statistics of shape (samples, analytes). Statistics are calculated from the ‘focus’ data variable, so output depends on how the data have been processed.
Included stat functions:
mean()
: arithmetic meanstd()
: arithmetic standard deviationse()
: arithmetic standard errorH15_mean()
: Huber mean (outlier removal)H15_std()
: Huber standard deviation (outlier removal)H15_se()
: Huber standard error (outlier removal)
Parameters: - analytes (optional, array_like or str) – The analyte(s) to calculate statistics for. Defaults to all analytes.
- filt (str, dict or bool) – Either logical filter expression contained in a str, a dict of expressions specifying the filter string to use for each analyte or a boolean. Passed to grab_filt.
- stats (array_like) – List of functions or names (see above). Functions must take a single array_like input and return a single statistic, and should be able to cope with NaN values.
- eachtrace (bool) – Whether to calculate the statistics for each analysis spot individually, or to produce per-sample means. Default is True.
Returns: Adds dict to analyse object containing samples, analytes and functions and data.
Return type:
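The eachtrace behaviour can be illustrated with a toy example: data are grouped by an analysis-spot label (in the spirit of the ns attribute of a D object) and each group is summarised with a NaN-tolerant mean. Array names and values here are hypothetical, not latools internals.

```python
import math
import statistics

values = [1.0, 1.2, float('nan'), 5.0, 5.2, 5.1]
spots  = [1,   1,   1,            2,   2,   2  ]  # spot label per data point

def nanmean(xs):
    # a mean that ignores NaN values, as the built-in stat functions must
    return statistics.fmean([x for x in xs if not math.isnan(x)])

per_trace = {s: nanmean([v for v, lab in zip(values, spots) if lab == s])
             for s in sorted(set(spots))}
print(per_trace)
```

With eachtrace=False the same summary would instead be taken over all points of a sample at once.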
-
save_log
(directory=None, logname=None, header=None)[source]¶ Save analysis.lalog in specified location
-
set_focus
(focus_stage=None, samples=None, subset=None)[source]¶ Set the ‘focus’ attribute of the data file.
The ‘focus’ attribute of the object points towards data from a particular stage of analysis. It is used to identify the ‘working stage’ of the data. Processing functions operate on the ‘focus’ stage, so if steps are done out of sequence, things will break.
Names of analysis stages:
- ‘rawdata’: raw data, loaded from csv file when object is initialised.
- ‘despiked’: despiked data.
- ‘signal’/’background’: isolated signal and background data, padded with np.nan. Created by self.separate, after signal and background regions have been identified by self.autorange.
- ‘bkgsub’: background subtracted data, created by self.bkg_correct
- ‘ratios’: element ratio data, created by self.ratio.
- ‘calibrated’: ratio data calibrated to standards, created by self.calibrate.
Parameters: focus (str) – The name of the analysis stage desired. Returns: Return type: None
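The ‘focus’ mechanism amounts to a dict of stage arrays plus a pointer naming the active stage. The toy analogue below conveys the idea; the class and data are hypothetical, not latools code.

```python
class Focusable:
    """Minimal stand-in for the focus / set_focus behaviour."""
    def __init__(self, data):
        self.data = data               # stage name -> data array
        self.focus_stage = 'rawdata'   # currently active stage

    @property
    def focus(self):
        return self.data[self.focus_stage]

    def setfocus(self, stage):
        if stage not in self.data:
            raise KeyError('unknown stage: ' + stage)
        self.focus_stage = stage

d = Focusable({'rawdata': [1, 2, 3], 'despiked': [1, 2, 2]})
d.setfocus('despiked')
print(d.focus)
```

Processing functions then read from and write to the stage that focus currently points at, which is why running steps out of sequence breaks the pipeline.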
-
srm_id_auto
(srms_used=['NIST610', 'NIST612', 'NIST614'], n_min=10, reload_srm_database=False)[source]¶ Function for automatically identifying SRMs.
Parameters: - srms_used (iterable) – Which SRMs have been used. Must match SRM names in SRM database exactly (case sensitive!).
- n_min (int) – The minimum number of data points an SRM measurement must contain to be included.
-
statplot
(analytes=None, samples=None, figsize=None, stat='mean', err='std', subset=None)[source]¶ Function for visualising per-ablation and per-sample means.
Parameters: - analytes (str or iterable) – Which analyte(s) to plot
- samples (str or iterable) – Which sample(s) to plot
- figsize (tuple) – Figure (width, height) in inches
- stat (str) – Which statistic to plot. Must match the name of the functions used in ‘sample_stats’.
- err (str) – Which uncertainty to plot.
- subset (str) – Which subset of samples to plot.
-
trace_plots
(analytes=None, samples=None, ranges=False, focus=None, outdir=None, filt=None, scale='log', figsize=[10, 4], stats=False, stat='nanmean', err='nanstd', subset='All_Analyses')[source]¶ Plot analytes as a function of time.
Parameters: - analytes (optional, array_like or str) – The analyte(s) to plot. Defaults to all analytes.
- samples (optional, array_like or str) – The sample(s) to plot. Defaults to all samples.
- ranges (bool) – Whether or not to show the signal/background regions identified by ‘autorange’.
- focus (str) – The focus ‘stage’ of the analysis to plot. Can be ‘rawdata’, ‘despiked’:, ‘signal’, ‘background’, ‘bkgsub’, ‘ratios’ or ‘calibrated’.
- outdir (str) – Path to a directory where you’d like the plots to be saved. Defaults to ‘reports/[focus]’ in your data directory.
- filt (str, dict or bool) – Either logical filter expression contained in a str, a dict of expressions specifying the filter string to use for each analyte or a boolean. Passed to grab_filt.
- scale (str) – If ‘log’, plots the data on a log scale.
- figsize (array_like) – Array of length 2 specifying figure [width, height] in inches.
- stats (bool) – Whether or not to overlay the mean and standard deviations for each trace.
- stat, err (str) – The names of the statistic and error components to plot. Defaults to ‘nanmean’ and ‘nanstd’.
Returns: Return type:
-
latools.latools.
reproduce
(past_analysis, plotting=False, data_folder=None, srm_table=None, custom_stat_functions=None)[source]¶ Reproduce a previous analysis exported with
latools.analyse.minimal_export()
For normal use, supplying log_file and specifying a plotting option should be enough to reproduce an analysis. All requisites (raw data, SRM table and any custom stat functions) will then be imported from the minimal_export folder.
You may also specify your own raw_data, srm_table and custom_stat_functions, if you wish.
Parameters: - log_file (str) – The path to the log file produced by minimal_export().
- plotting (bool) – Whether or not to output plots.
- data_folder (str) – Optional. Specify a different data folder. Data folder should normally be in the same folder as the log file.
- srm_table (str) – Optional. Specify a different SRM table. SRM table should normally be in the same folder as the log file.
- custom_stat_functions (str) – Optional. Specify a python file containing custom stat functions for use by reproduce. Any custom stat functions should normally be included in the same folder as the log file.
latools.D object¶
The Data object, used to contain single laser ablation data files.
-
class
latools.D_obj.
D
(data_file, dataformat=None, errorhunt=False, cmap=None, internal_standard=None, name='file_names')[source]¶ Bases:
object
Container for data from a single laser ablation analysis.
Parameters: - data_file (str) – The path to a data file.
- errorhunt (bool) – Whether or not to print each data file name before import. This is useful for tracing which data file is causing the import to fail.
- dataformat (str or dict) – Either a path to a data format file, or a dataformat dict. See documentation for more details.
-
sample
¶ str – Sample name.
-
meta
¶ dict – Metadata extracted from the csv header. Contents vary depending on your dataformat.
-
analytes
¶ array_like – A list of analytes measured.
-
data
¶ dict – A dictionary containing the raw data, and modified data from each processing stage. Entries can be:
- ‘rawdata’: created during initialisation.
- ‘despiked’: created by despike
- ‘signal’: created by autorange
- ‘background’: created by autorange
- ‘bkgsub’: created by bkg_correct
- ‘ratios’: created by ratio
- ‘calibrated’: created by calibrate
-
focus
¶ dict – A dictionary containing one item from data. This is the currently ‘active’ data that processing functions will work on. This data is also directly available as class attributes with the same names as the items in focus.
-
focus_stage
¶ str – Identifies which item in data is currently assigned to focus.
-
cmap
¶ dict – A dictionary containing hex colour strings corresponding to each measured analyte.
-
bkg, sig, trn
array_like, bool – Boolean arrays identifying signal, background and transition regions. Created by autorange.
-
bkgrng, sigrng, trnrng
array_like – An array of shape (n, 2) containing pairs of values that describe the Time limits of background, signal and transition regions.
-
ns
¶ array_like – An integer array the same length as the data, where each analysis spot is labelled with a unique number. Used for separating analysis spots when calculating sample statistics.
-
filt
¶ filt object – An object for storing, selecting and creating data filters.
-
ablation_times
()[source]¶ Function for calculating the ablation time for each ablation.
Returns: Return type: dict of times for each ablation.
-
autorange
(analyte='total_counts', gwin=5, swin=3, win=30, on_mult=[1.0, 1.0], off_mult=[1.0, 1.5], ploterrs=True, transform='log', **kwargs)[source]¶ Automatically separates signal and background data regions.
Automatically detect signal and background regions in the laser data, based on the behaviour of a single analyte. The analyte used should be abundant and homogeneous in the sample.
Step 1: Thresholding. The background signal is determined using a gaussian kernel density estimator (kde) of all the data. Under normal circumstances, this kde should find two distinct data distributions, corresponding to ‘signal’ and ‘background’. The minimum between these two distributions is taken as a rough threshold to identify signal and background regions. Any point where the trace crosses this threshold is identified as a ‘transition’.
Step 2: Transition Removal. The width of the transition regions between signal and background is then determined, and the transitions are excluded from analysis. The width of the transitions is determined by fitting a gaussian to the smoothed first derivative of the analyte trace, and determining its width at a point where the gaussian intensity is at conf times the gaussian maximum. These gaussians are fit to subsets of the data centred around the transition regions determined in Step 1, +/- win data points. The peak is further isolated by finding the minima and maxima of a second derivative within this window, and the gaussian is fit to the isolated peak.
Parameters: - analyte (str) – The analyte that autorange should consider. For best results, choose an analyte that is present homogeneously in high concentrations.
- gwin (int) – The smoothing window used for calculating the first derivative. Must be odd.
- win (int) – Determines the width (c +/- win) of the transition data subsets.
- on_mult and off_mult (tuple, len=2) – Factors to control the width of the excluded transition regions. A region n times the full-width-half-maximum of the transition gradient will be removed either side of the transition centre. on_mult and off_mult refer to the laser-on and laser-off transitions, respectively. See manual for full explanation. Defaults to (1.5, 1) and (1, 1.5).
Returns: - Outputs added as instance attributes. Returns None.
- bkg, sig, trn (iterable, bool) – Boolean arrays identifying background, signal and transition regions
- bkgrng, sigrng and trnrng (iterable) – (min, max) pairs identifying the boundaries of contiguous True regions in the boolean arrays.
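Step 1 above reduces to thresholding a trace and converting the resulting boolean array into (start, stop) range pairs, analogous to the bkg/sig arrays and the bkgrng/sigrng pairs. The sketch below uses a hypothetical trace and a fixed threshold; the real method derives the threshold from a kde and additionally excludes transitions.

```python
trace = [2, 3, 2, 90, 95, 93, 2, 3, 88, 91, 2]
threshold = 50                    # fixed here; derived from a kde in practice
sig = [x > threshold for x in trace]

def bool_to_ranges(mask):
    """Convert a boolean array into (start, stop) index pairs of True runs."""
    ranges, start = [], None
    for i, m in enumerate(mask):
        if m and start is None:
            start = i
        elif not m and start is not None:
            ranges.append((start, i))
            start = None
    if start is not None:
        ranges.append((start, len(mask)))
    return ranges

print(bool_to_ranges(sig))
```

The two True runs in sig become two (start, stop) pairs, which is exactly the form stored in sigrng and bkgrng.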
-
autorange_plot
(analyte='total_counts', gwin=7, swin=None, win=20, on_mult=[1.5, 1.0], off_mult=[1.0, 1.5], transform='log')[source]¶ Plot a detailed autorange report for this sample.
-
bkg_subtract
(analyte, bkg, ind=None, focus_stage='despiked')[source]¶ Subtract provided background from signal (focus stage).
Result is saved in the new ‘bkgsub’ focus stage
Returns: Return type: None
-
calc_correlation
(x_analyte, y_analyte, window=15, filt=True, recalc=True)[source]¶ Calculate local correlation between two analytes.
Parameters: - y_analyte (x_analyte,) – The names of the x and y analytes to correlate.
- window (int, None) – The rolling window used when calculating the correlation.
- filt (bool) – Whether or not to apply existing filters to the data before calculating this filter.
- recalc (bool) – If True, the correlation is re-calculated, even if it is already present.
Returns: Return type:
-
calibrate
(calib_ps, analytes=None)[source]¶ Apply calibration to data.
The calibration parameters (calib_ps) must be calculated at the analyse level, and passed to this calibrate function.
Parameters: calib_ps (dict) – A dict of calibration parameters to apply to each analyte. Returns: Return type: None
-
correct_spectral_interference
(target_analyte, source_analyte, f)[source]¶ Correct spectral interference.
Subtract interference counts from target_analyte, based on the intensity of a source_analyte and a known fractional contribution (f).
Correction takes the form: target_analyte -= source_analyte * f
Only operates on background-corrected data (‘bkgsub’).
To undo a correction, rerun self.bkg_subtract().
Parameters: Returns: Return type:
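Since the correction is simply target_analyte -= source_analyte * f, it can be checked by hand on small arrays. The counts and the value of f below are hypothetical.

```python
target = [1000.0, 1100.0, 1050.0]  # counts on the interfered (target) mass
source = [200.0, 220.0, 210.0]     # counts on the interfering (source) mass
f = 0.1                            # known fractional contribution

corrected = [t - s * f for t, s in zip(target, source)]
print(corrected)
```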
-
correlation_plot
(x_analyte, y_analyte, window=15, filt=True, recalc=False)[source]¶ Plot the local correlation between two analytes.
Parameters: - y_analyte (x_analyte,) – The names of the x and y analytes to correlate.
- window (int, None) – The rolling window used when calculating the correlation.
- filt (bool) – Whether or not to apply existing filters to the data before calculating this filter.
- recalc (bool) – If True, the correlation is re-calculated, even if it is already present.
Returns: fig, axs
Return type: figure and axes objects
-
crossplot
(analytes=None, bins=25, lognorm=True, filt=True, colourful=True, figsize=(12, 12))[source]¶ Plot analytes against each other.
Parameters: - analytes (optional, array_like or str) – The analyte(s) to plot. Defaults to all analytes.
- lognorm (bool) – Whether or not to log normalise the colour scale of the 2D histogram.
- bins (int) – The number of bins in the 2D histogram.
- filt (str, dict or bool) – Either logical filter expression contained in a str, a dict of expressions specifying the filter string to use for each analyte or a boolean. Passed to grab_filt.
Returns: Return type: (fig, axes)
-
crossplot_filters
(filter_string, analytes=None)[source]¶ Plot the results of a group of filters in a crossplot.
Parameters: Returns: Return type: fig, axes objects
-
despike
(expdecay_despiker=True, exponent=None, noise_despiker=True, win=3, nlim=12.0, maxiter=3)[source]¶ Applies expdecay_despiker and noise_despiker to data.
Parameters: - expdecay_despiker (bool) – Whether or not to apply the exponential decay filter.
- exponent (None or float) – The exponent for the exponential decay filter. If None, it is determined automatically using find_expocoef.
- noise_despiker (bool) – Whether or not to apply the standard deviation spike filter.
- win (int) – The rolling window over which the spike filter calculates the trace statistics.
- nlim (float) – The number of standard deviations above the rolling mean that data are excluded.
- maxiter (int) – The maximum number of times that the filter is applied.
Returns: Return type:
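The logic of a standard-deviation spike filter can be sketched as follows: each point is compared with the mean and standard deviation of its neighbours, and replaced with the neighbourhood mean if it sits more than nlim standard deviations above it. This is a simplified analogue for illustration only; the latools despiker differs in detail.

```python
import statistics

def noise_despike(data, win=3, nlim=2.0):
    """Replace points more than nlim std devs above their neighbours."""
    out = list(data)
    half = win // 2
    for i in range(half, len(data) - half):
        nbrs = data[i - half:i] + data[i + 1:i + half + 1]
        m = statistics.fmean(nbrs)
        s = statistics.pstdev(nbrs)
        if s > 0 and data[i] > m + nlim * s:
            out[i] = m          # replace the spike with the local mean
    return out

print(noise_despike([10, 11, 10, 500, 11, 10, 11]))
```

The single large spike is pulled back towards its neighbours, while the surrounding baseline is left untouched.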
-
filter_clustering
(analytes, filt=False, normalise=True, method='meanshift', include_time=False, sort=None, min_data=10, **kwargs)[source]¶ Applies an n-dimensional clustering filter to the data.
Available Clustering Algorithms
- ‘meanshift’: The sklearn.cluster.MeanShift algorithm. Automatically determines number of clusters in data based on the bandwidth of expected variation.
- ‘kmeans’: The sklearn.cluster.KMeans algorithm. Determines the characteristics of a known number of clusters within the data. Must provide n_clusters to specify the expected number of clusters.
- ‘DBSCAN’: The sklearn.cluster.DBSCAN algorithm. Automatically determines the number and characteristics of clusters within the data based on the ‘connectivity’ of the data (i.e. how far apart each data point is in a multi-dimensional parameter space). Requires you to set eps, the minimum distance a point must be from another point to be considered in the same cluster, and min_samples, the minimum number of points that must be within the minimum distance for it to be considered a cluster. It may also be run in automatic mode by specifying n_clusters alongside min_samples, where eps is decreased until the desired number of clusters is obtained.
For more information on these algorithms, refer to the documentation.
Parameters: - analytes (str) – The analyte(s) that the filter applies to.
- filt (bool) – Whether or not to apply existing filters to the data before calculating this filter.
- normalise (bool) – Whether or not to normalise the data to zero mean and unit variance. Recommended if clustering is based on more than one analyte. Uses sklearn.preprocessing.scale.
- method (str) – Which clustering algorithm to use (see above).
- include_time (bool) – Whether or not to include the Time variable in the clustering analysis. Useful if you’re looking for spatially continuous clusters in your data, i.e. this will identify each spot in your analysis as an individual cluster.
- sort (bool, str or array-like) – Whether or not to label the resulting clusters according to their contents. If used, the cluster with the lowest mean values will be labelled 0, with labels increasing in order of cluster mean value. The sorting rules depend on the value of ‘sort’, which can be the name of a single analyte (str), a list of several analyte names (array-like), or True (bool) to use all analytes used to calculate the clusters.
- min_data (int) – The minimum number of data points that should be considered by the filter. Default = 10.
- **kwargs – Parameters passed to the clustering algorithm specified by method.
Meanshift Parameters:
- bandwidth (str or float) – The bandwith (float) or bandwidth method (‘scott’ or ‘silverman’) used to estimate the data bandwidth.
- bin_seeding (bool) – Modifies the behaviour of the meanshift algorithm. Refer to sklearn.cluster.meanshift documentation.
K-Means Parameters:
- n_clusters (int) – The number of clusters expected in the data.
DBSCAN Parameters:
- eps (float) – The minimum ‘distance’ points must be apart for them to be in the same cluster. Defaults to 0.3. Note: If the data are normalised (they should be for DBSCAN) this is in terms of total sample variance. Normalised data have a mean of 0 and a variance of 1.
- min_samples (int) – The minimum number of samples within distance eps required to be considered as an independent cluster.
- n_clusters – The number of clusters expected. If specified, eps will be incrementally reduced until the expected number of clusters is found.
- maxiter (int) – The maximum number of iterations DBSCAN will run.
Returns: Return type:
-
filter_correlation
(x_analyte, y_analyte, window=15, r_threshold=0.9, p_threshold=0.05, filt=True, recalc=False)[source]¶ Calculate correlation filter.
Parameters: - y_analyte (x_analyte,) – The names of the x and y analytes to correlate.
- window (int, None) – The rolling window used when calculating the correlation.
- r_threshold (float) – The correlation index above which to exclude data. Note: the absolute pearson R value is considered, so negative correlations below -r_threshold will also be excluded.
- p_threshold (float) – The significance level below which data are excluded.
- filt (bool) – Whether or not to apply existing filters to the data before calculating this filter.
- recalc (bool) – If True, the correlation is re-calculated, even if it is already present.
Returns: Return type:
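The filter's core operation is a rolling-window Pearson correlation: wherever |r| within a window exceeds r_threshold, the points in that window are excluded. The sketch below is a pure-Python illustration with hypothetical traces, not the latools implementation (which also applies the p_threshold test).

```python
def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

x = [1, 2, 3, 4, 5, 1, 3, 2, 5, 4]   # hypothetical analyte traces
y = [2, 4, 6, 8, 10, 3, 1, 4, 2, 5]
window, r_threshold = 5, 0.9

keep = [True] * len(x)
for i in range(len(x) - window + 1):
    if abs(pearson(x[i:i + window], y[i:i + window])) > r_threshold:
        for j in range(i, i + window):   # exclude every point in the window
            keep[j] = False
print(keep)
```

The strongly correlated first half of the traces is excluded; the uncorrelated second half is kept.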
-
filter_exclude_downhole
(threshold, filt=True)[source]¶ Exclude all points down-hole (after) the first excluded data.
Parameters:
-
filter_gradient_threshold
(analyte, win, threshold, recalc=True)[source]¶ Apply gradient threshold filter.
Generates threshold filters for the given analytes above and below the specified threshold.
- Two filters are created with prefixes ‘_above’ and ‘_below’.
- ‘_above’ keeps all the data above the threshold. ‘_below’ keeps all the data below the threshold.
i.e. to select data below the threshold value, you should turn the ‘_above’ filter off.
Parameters: Returns: Return type:
-
filter_new
(name, filt_str)[source]¶ Make new filter from combination of other filters.
Parameters: - name (str) – The name of the new filter. Should be unique.
- filt_str (str) – A logical combination of partial strings which will create the new filter. For example, ‘Albelow & Mnbelow’ will combine all filters that partially match ‘Albelow’ with those that partially match ‘Mnbelow’ using the ‘AND’ logical operator.
Returns: Return type:
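The logical combination boils down to elementwise boolean operations on existing filters. A toy sketch with hypothetical filter arrays:

```python
# Two existing filters, as boolean arrays (hypothetical values):
Albelow = [True, True, False, True]
Mnbelow = [True, False, False, True]

# 'Albelow & Mnbelow' combines them with a logical AND:
combined = [a and m for a, m in zip(Albelow, Mnbelow)]
print(combined)
```

An ‘|’ in the filter string would combine the arrays with a logical OR instead.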
-
filter_report
(filt=None, analytes=None, savedir=None, nbin=5)[source]¶ Visualise effect of data filters.
Parameters: Returns: Return type: (fig, axes)
-
filter_threshold
(analyte, threshold)[source]¶ Apply threshold filter.
Generates threshold filters for the given analytes above and below the specified threshold.
- Two filters are created with prefixes ‘_above’ and ‘_below’.
- ‘_above’ keeps all the data above the threshold. ‘_below’ keeps all the data below the threshold.
i.e. to select data below the threshold value, you should turn the ‘_above’ filter off.
Parameters: - analyte (str) – The analyte that the filter applies to.
- threshold (float) – The threshold value. Data above and below this value are separated into the ‘_above’ and ‘_below’ filters.
Returns: Return type:
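The paired ‘_above’/‘_below’ filters are complementary boolean masks. A sketch with hypothetical concentrations (names are illustrative, not latools internals):

```python
conc = [0.5, 1.2, 2.5, 0.8, 3.1]
threshold = 1.0

filters = {
    'Al_thresh_above': [c >= threshold for c in conc],
    'Al_thresh_below': [c < threshold for c in conc],
}

# To select data below the threshold, switch the '_above' filter off
# and apply only the '_below' mask:
below = [c for c, keep in zip(conc, filters['Al_thresh_below']) if keep]
print(below)
```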
-
filter_trim
(start=1, end=1, filt=True)[source]¶ Remove points from the start and end of filter regions.
Parameters: - end (start,) – The number of points to remove from the start and end of the specified filter.
- filt (valid filter string or bool) – Which filter to trim. If True, applies to currently active filters.
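Trimming amounts to shrinking each contiguous True run of a filter by start points at its beginning and end points at its end. A pure-Python sketch (illustrative, not the latools implementation):

```python
def trim_filter(mask, start=1, end=1):
    """Shrink every contiguous True run in `mask` by `start`/`end` points."""
    out = [False] * len(mask)
    i = 0
    while i < len(mask):
        if mask[i]:
            j = i
            while j < len(mask) and mask[j]:
                j += 1                      # j marks the end of this True run
            for k in range(i + start, j - end):
                out[k] = True
            i = j
        else:
            i += 1
    return out

print(trim_filter([False, True, True, True, True, False, True, True]))
```

Runs shorter than start + end points disappear entirely, which is useful for removing edge effects at filter boundaries.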
-
get_params
()[source]¶ Returns parameters used to process data.
Returns: dict of analysis parameters Return type: dict
-
gplot
(analytes=None, win=5, figsize=[10, 4], ranges=False, focus_stage=None, ax=None)[source]¶ Plot analyte gradients as a function of Time.
Parameters: Returns: Return type: figure, axis
-
mkrngs
()[source]¶ Transform boolean arrays into list of limit pairs.
Gets Time limits of signal/background boolean arrays and stores them as sigrng and bkgrng arrays. These arrays can be saved by ‘save_ranges’ in the analyse object.
-
optimisation_plot
(overlay_alpha=0.5, **kwargs)[source]¶ Plot the result of signal_optimise.
signal_optimiser must be run first, and the output stored in the opt attribute of the latools.D object.
Parameters: - d (latools.D object) – A latools data object.
- overlay_alpha (float) – The opacity of the threshold overlays. Between 0 and 1.
- **kwargs – Passed to tplot
-
ratio
(internal_standard=None)[source]¶ Divide all analytes by a specified internal_standard analyte.
Parameters: internal_standard (str) – The analyte used as the internal_standard. Returns: Return type: None
-
sample_stats
(analytes=None, filt=True, stat_fns={}, eachtrace=True)[source]¶ Calculate sample statistics
Returns samples, analytes, and arrays of statistics of shape (samples, analytes). Statistics are calculated from the ‘focus’ data variable, so output depends on how the data have been processed.
Parameters: - analytes (array_like) – List of analytes to calculate the statistic on
- filt (bool or str) –
- The filter to apply to the data when calculating sample statistics.
- bool: True applies filter specified in filt.switches. str: logical string specifying a particular filter
- stat_fns (dict) – Dict of {name: function} pairs. Functions that take a single array_like input, and return a single statistic. Function should be able to cope with NaN values.
- eachtrace (bool) – True: per-ablation statistics. False: whole-sample statistics.
Returns: Return type:
-
setfocus
(focus)[source]¶ Set the ‘focus’ attribute of the data file.
The ‘focus’ attribute of the object points towards data from a particular stage of analysis. It is used to identify the ‘working stage’ of the data. Processing functions operate on the ‘focus’ stage, so if steps are done out of sequence, things will break.
Names of analysis stages:
- ‘rawdata’: raw data, loaded from csv file when object is initialised.
- ‘despiked’: despiked data.
- ‘signal’/’background’: isolated signal and background data, padded with np.nan. Created by self.separate, after signal and background regions have been identified by self.autorange.
- ‘bkgsub’: background subtracted data, created by self.bkg_correct
- ‘ratios’: element ratio data, created by self.ratio.
- ‘calibrated’: ratio data calibrated to standards, created by self.calibrate.
Parameters: focus (str) – The name of the analysis stage desired. Returns: Return type: None
-
signal_optimiser
(analytes, min_points=5, threshold_mode='kde_first_max', threshold_mult=1.0, x_bias=0, weights=None, filt=True, mode='minimise')[source]¶ Optimise data selection based on specified analytes.
Identifies the longest possible contiguous data region in the signal where the relative standard deviation (std) and concentration of all analytes are minimised.
Optimisation is performed via a grid search of all possible contiguous data regions. For each region, the mean std and mean scaled analyte concentration (‘amplitude’) are calculated.
The size and position of the optimal data region are identified using threshold std and amplitude values. Thresholds are derived from all calculated stds and amplitudes using the method specified by threshold_mode. For example, using the ‘kde_max’ method, a probability density function (PDF) is calculated for std and amplitude values, and the threshold is set as the maximum of the PDF. These thresholds are then used to identify the size and position of the longest contiguous region where the std is below the threshold, and the amplitude is also below the threshold (in ‘minimise’ mode).
All possible regions of the data that have at least min_points are considered.
For a graphical demonstration of the action of signal_optimiser, use optimisation_plot.
Parameters: - d (latools.D object) – A latools data object.
- analytes (str or array-like) – Which analytes to consider.
- min_points (int) – The minimum number of contiguous points to consider.
- threshold_mode (str) – The method used to calculate the optimisation thresholds. Can be ‘mean’, ‘median’, ‘kde_max’ or ‘bayes_mvs’, or a custom function. If a function, must take a 1D array, and return a single, real number.
- weights (array-like of length len(analytes)) – An array of numbers specifying the importance of each analyte considered. Larger number makes the analyte have a greater effect on the optimisation. Default is None.
-
tplot
(analytes=None, figsize=[10, 4], scale='log', filt=None, ranges=False, stats=False, stat='nanmean', err='nanstd', focus_stage=None, err_envelope=False, ax=None)[source]¶ Plot analytes as a function of Time.
Parameters: - analytes (array_like) – list of strings containing names of analytes to plot. None = all analytes.
- figsize (tuple) – size of final figure.
- scale (str or None) – ‘log’ = plot data on log scale
- filt (bool, str or dict) – False: plot unfiltered data. True: plot filtered data over unfiltered data. str: apply the named filter key to all analytes. dict: apply a separate key to each analyte; must contain all analytes plotted. Can use self.filt.keydict.
- ranges (bool) – show signal/background regions.
- stats (bool) – plot average and error of each trace, as specified by stat and err.
- stat (str) – average statistic to plot.
- err (str) – error statistic to plot.
Returns: Return type: figure, axis
Filtering¶
-
class
latools.filtering.filt_obj.
filt
(size, analytes)[source]¶ Bases:
object
Container for creating, storing and selecting data filters.
Parameters: - size (int) – The length that the filters need to be (should be the same as your data).
- analytes (array_like) – A list of the analytes measured in your data.
-
size
¶ int – The length that the filters need to be (should be the same as your data).
-
analytes
¶ array_like – A list of the analytes measured in your data.
-
components
¶ dict – A dict containing each individual filter that has been created.
-
info
¶ dict – A dict containing descriptive information about each filter in components.
-
params
¶ dict – A dict containing the parameters used to create each filter, which can be passed directly to the corresponding filter function to recreate the filter.
-
switches
¶ dict – A dict of boolean switches specifying which filters are active for each analyte.
-
keys
¶ dict – A dict of logical strings specifying which filters are applied to each analyte.
-
sequence
¶ dict – A numbered dict specifying what order the filters were applied in (for some filters, order matters).
-
n
¶ int – The number of filters applied to the data.
-
add
(name, filt, info='', params=(), setn=None)[source]¶ Add filter.
Parameters: Returns: Return type:
-
fuzzmatch
(fuzzkey, multi=False)[source]¶ Identify a filter by fuzzy string matching.
Partial (‘fuzzy’) matching performed by fuzzywuzzy.fuzz.ratio
Parameters: fuzzkey (str) – A string that partially matches one filter name more than the others. Returns: The name of the most closely matched filter. Return type: str
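The behaviour of fuzzmatch can be illustrated with the standard library's difflib (latools itself uses fuzzywuzzy, which has a similar ratio-based scoring); the filter names below are hypothetical:

```python
import difflib

# Hypothetical filter names, as they might appear in print(filt).
names = ['above_threshold_Mg24', 'below_threshold_Al27', 'signal_optimiser']

# Pick the name whose similarity ratio to the partial key is highest.
best = max(names,
           key=lambda n: difflib.SequenceMatcher(None, 'thresh_Mg', n).ratio())
print(best)  # above_threshold_Mg24
```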
-
get_components
(key, analyte=None)[source]¶ Extract filter components for specific analyte(s).
Parameters: Returns: boolean filter
Return type: array-like
-
grab_filt
(filt, analyte=None)[source]¶ Flexible access to specific filter using any key format.
Parameters: Returns: boolean filter
Return type: array_like
-
make
(analyte)[source]¶ Make filter for specified analyte(s).
Filter specified in filt.switches.
Parameters: analyte (str or array_like) – Name or list of names of analytes. Returns: boolean filter Return type: array_like
-
make_fromkey
(key)[source]¶ Make filter from logical expression.
Takes a logical expression as an input, and returns a filter. Used for advanced filtering, where combinations of nested and/or filters are desired. Filter names must exactly match the names listed by print(filt).
Example:
key = '(Filter_1 | Filter_2) & Filter_3'
is equivalent to: (Filter_1 OR Filter_2) AND Filter_3
Statements in parentheses are evaluated first. Parameters: key (str) – logical expression describing filter construction. Returns: boolean filter Return type: array_like
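The key-evaluation idea can be sketched with plain numpy boolean arrays. This is an illustration of the logic, not the latools implementation; the filter names and components are assumptions:

```python
import numpy as np

# Hypothetical stored filter components (boolean arrays of equal length).
components = {
    'Filter_1': np.array([True, False, True, False]),
    'Filter_2': np.array([False, True, False, False]),
    'Filter_3': np.array([True, True, False, True]),
}

key = '(Filter_1 | Filter_2) & Filter_3'

# numpy's & and | operate element-wise on boolean arrays, so the logical
# expression can be evaluated directly in a namespace containing only
# the filter components.
mask = eval(key, {'__builtins__': {}}, components)
print(mask)  # [ True  True False False]
```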
-
make_keydict
(analyte=None)[source]¶ Make logical expressions describing the filter(s) for specified analyte(s).
Parameters: analyte (optional, str or array_like) – Name or list of names of analytes. Defaults to all analytes. Returns: containing the logical filter expression for each analyte. Return type: dict
-
off
(analyte=None, filt=None)[source]¶ Turn off specified filter(s) for specified analyte(s).
Parameters: Returns: Return type:
Functions for automatic selection optimisation.
-
latools.filtering.signal_optimiser.
bayes_scale
(s)[source]¶ Remove mean and divide by standard deviation, using bayes_mvs statistics.
-
latools.filtering.signal_optimiser.
calc_window_mean_std
(s, min_points, ind=None)[source]¶ Calculate the mean and standard deviation of all contiguous regions in s that have at least min_points.
-
latools.filtering.signal_optimiser.
calc_windows
(fn, s, min_points)[source]¶ Apply fn to all contiguous regions in s that have at least min_points.
-
latools.filtering.signal_optimiser.
calculate_optimisation_stats
(d, analytes, min_points, weights, ind, x_bias=0)[source]¶
-
latools.filtering.signal_optimiser.
optimisation_plot
(d, overlay_alpha=0.5, **kwargs)[source]¶ Plot the result of signal_optimiser.
signal_optimiser must be run first, and the output stored in the opt attribute of the latools.D object.
Parameters: - d (latools.D object) – A latools data object.
- overlay_alpha (float) – The opacity of the threshold overlays. Between 0 and 1.
- **kwargs – Passed to tplot
-
latools.filtering.signal_optimiser.
scale
(s)[source]¶ Remove the mean, and divide by the standard deviation.
-
latools.filtering.signal_optimiser.
scaler
(s)¶ Remove median, divide by IQR.
-
latools.filtering.signal_optimiser.
signal_optimiser
(d, analytes, min_points=5, threshold_mode='kde_first_max', threshold_mult=1.0, x_bias=0, weights=None, ind=None, mode='minimise')[source]¶ Optimise data selection based on specified analytes.
Identifies the longest possible contiguous data region in the signal where the relative standard deviation (std) and concentration of all analytes is minimised.
Optimisation is performed via a grid search of all possible contiguous data regions. For each region, the mean std and mean scaled analyte concentration (‘amplitude’) are calculated.
The size and position of the optimal data region are identified using threshold std and amplitude values. Thresholds are derived from all calculated stds and amplitudes using the method specified by threshold_mode. For example, using the ‘kde_max’ method, a probability density function (PDF) is calculated for the std and amplitude values, and the threshold is set at the maximum of the PDF. These thresholds are then used to identify the size and position of the longest contiguous region where the std is below the std threshold, and the amplitude is below (or above, depending on mode) the amplitude threshold.
All possible regions of the data that have at least min_points are considered.
For a graphical demonstration of the action of signal_optimiser, use optimisation_plot.
Parameters: - d (latools.D object) – A latools data object.
- analytes (str or array-like) – Which analytes to consider.
- min_points (int) – The minimum number of contiguous points to consider.
- threshold_mode (str) – The method used to calculate the optimisation thresholds. Can be ‘mean’, ‘median’, ‘kde_max’ or ‘bayes_mvs’, or a custom function. If a function, must take a 1D array, and return a single, real number.
- threshold_mult (float or tuple) – A multiplier applied to the calculated threshold before use. If a tuple, the first value is applied to the mean threshold, and the second is applied to the standard deviation threshold. Reduce this to make data selection more stringent.
- x_bias (float) – If non-zero, a bias is applied to the calculated statistics to prefer the beginning (if > 0) or end (if < 0) of the signal. Should be between zero and 1.
- weights (array-like of length len(analytes)) – An array of numbers specifying the importance of each analyte considered. Larger numbers give the analyte a greater effect on the optimisation. Default is None.
- ind (boolean array) – A boolean array the same length as the data. Where false, data will not be included.
- mode (str) – Whether to ‘minimise’ or ‘maximise’ the concentration of the elements.
Returns: dict, str
Return type: optimisation result, error message
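The grid-search idea behind signal_optimiser can be sketched in a few lines of numpy. This is an illustrative simplification (a single analyte and a fixed, user-supplied std threshold rather than one derived by threshold_mode), not the latools implementation:

```python
import numpy as np

def longest_quiet_window(s, std_threshold, min_points=5):
    """Return (start, stop) of the longest contiguous window of at least
    min_points whose standard deviation is below std_threshold."""
    s = np.asarray(s, dtype=float)
    best = None
    # Grid search over all contiguous windows (O(n^2); fine for a sketch).
    for i in range(len(s)):
        for j in range(i + min_points, len(s) + 1):
            if s[i:j].std() <= std_threshold:
                cand = (j - i, i, j)  # prefer longer windows
                if best is None or cand > best:
                    best = cand
    if best is None:
        return None
    return best[1], best[2]

# Noisy start, then a stable 'signal' plateau of 30 points.
sig = np.r_[np.tile([0.0, 20.0], 10), np.full(30, 10.0)]
print(longest_quiet_window(sig, std_threshold=1.0))  # (20, 50)
```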
-
class
latools.filtering.classifier_obj.
classifier
(analytes, sort_by=0)[source]¶ Bases:
object
-
fit
(data, method='kmeans', **kwargs)[source]¶ Fit classifiers from a large dataset.
Parameters: - data (dict) – A dict of data for clustering. Must contain items with the same name as analytes used for clustering.
- method (str) –
A string defining the clustering method used. Can be:
- ’kmeans’ : K-Means clustering algorithm
- ’meanshift’ : Meanshift algorithm
- n_clusters (int) – K-Means only. The number of clusters to identify.
- bandwidth (float) – Meanshift only. The bandwidth value used during clustering. If None, determined automatically. Note: the data are scaled before clustering, so this is not in the same units as the data.
- bin_seeding (bool) – Meanshift only. Whether or not to use ‘bin_seeding’. See documentation for sklearn.cluster.MeanShift.
- **kwargs – passed to sklearn.cluster.MeanShift.
Returns: Return type:
-
fit_kmeans
(data, n_clusters, **kwargs)[source]¶ Fit KMeans clustering algorithm to data.
Parameters: - data (array-like) – A dataset formatted by classifier.fitting_data.
- n_clusters (int) – The number of clusters in the data.
- **kwargs – passed to sklearn.cluster.KMeans.
Returns: Return type: Fitted sklearn.cluster.KMeans object.
-
fit_meanshift
(data, bandwidth=None, bin_seeding=False, **kwargs)[source]¶ Fit MeanShift clustering algorithm to data.
Parameters: - data (array-like) – A dataset formatted by classifier.fitting_data.
- bandwidth (float) – The bandwidth value used during clustering. If None, determined automatically. Note: the data are scaled before clustering, so this is not in the same units as the data.
- bin_seeding (bool) – Whether or not to use ‘bin_seeding’. See documentation for sklearn.cluster.MeanShift.
- **kwargs – passed to sklearn.cluster.MeanShift.
Returns: Return type: Fitted sklearn.cluster.MeanShift object.
-
fitting_data
(data)[source]¶ Function to format data for cluster fitting.
Parameters: data (dict) – A dict of data, containing all elements of analytes as items. Returns: Return type: A data array for initial cluster fitting.
-
format_data
(data, scale=True)[source]¶ Function for converting a dict to an array suitable for sklearn.
Parameters: Returns: Return type: A data array suitable for use with sklearn.cluster.
-
map_clusters
(size, sampled, clusters)[source]¶ Translate cluster identity back to original data size.
Parameters: - size (int) – size of original dataset
- sampled (array-like) – integer array describing location of finite values in original data.
- clusters (array-like) – integer array of cluster identities
Returns: - list of cluster identities the same length as original
- data. Where original data are non-finite, returns -2.
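The mapping behaviour described above can be sketched directly in numpy; this is a minimal illustration of the documented behaviour, not the latools source:

```python
import numpy as np

def map_clusters(size, sampled, clusters):
    """Expand cluster identities back to the original data length.
    Positions not in `sampled` (i.e. non-finite in the original data)
    are filled with -2, as described in the docstring."""
    out = np.full(size, -2, dtype=int)
    out[sampled] = clusters
    return out

# 6 original points; points 1 and 4 were non-finite and not clustered.
out = map_clusters(6, np.array([0, 2, 3, 5]), np.array([0, 1, 1, 0]))
print(out)  # [ 0 -2  1  1 -2  0]
```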
-
predict
(data)[source]¶ Label new data with cluster identities.
Parameters: Returns: Return type: array of clusters the same length as the data.
-
Configuration¶
Note: the entire config module is available at the top level (i.e. latools.config
).
-
latools.helpers.config.
copy_SRM_file
(destination=None, config='DEFAULT')[source]¶ Creates a copy of the default SRM table at the specified location.
Parameters: - destination (str) – The save location for the SRM file. If no location specified, saves it as ‘LAtools_[config]_SRMTable.csv’ in the current working directory.
- config (str) – It’s possible to set up different configurations with different SRM files. This specifies the name of the configuration that you want to copy the SRM file from. If not specified, the ‘DEFAULT’ configuration is used.
-
latools.helpers.config.
create
(config_name, srmfile=None, dataformat=None, base_on='DEFAULT', make_default=False)[source]¶ Adds a new configuration to latools.cfg.
Parameters: - config_name (str) – The name of the new configuration. This should be descriptive (e.g. UC Davis Foram Group)
- srmfile (str (optional)) – The location of the srm file used for calibration.
- dataformat (str (optional)) – The location of the dataformat definition to use.
- base_on (str) – The name of the existing configuration to base the new one on. If either srm_file or dataformat are not specified, the new config will copy this information from the base_on config.
- make_default (bool) – Whether or not to make the new configuration the default for future analyses. Default = False.
Returns: Return type:
-
latools.helpers.config.
get_dataformat_template
(destination='./LAtools_dataformat_template.json')[source]¶ Copies a data format description JSON template to the specified location.
-
latools.helpers.config.
read_configuration
(config='DEFAULT')[source]¶ Read LAtools configuration file, and return parameters as dict.
-
latools.helpers.config.
read_latoolscfg
()[source]¶ Reads configuration, returns a ConfigParser object.
Distinct from read_configuration, which returns a dict.
-
latools.helpers.config.
test_dataformat
(data_file, dataformat_file, name_mode='file_names')[source]¶ Test a dataformat file against a particular data file.
This goes through all the steps of data import printing out the results of each step, so you can see where the import fails.
Parameters: - data_file (str) – Path to data file, including extension.
- dataformat (dict or str) – A dataformat dict, or path to file. See example below.
- name_mode (str) – How to identify sample names. If ‘file_names’, uses the input name of the file, stripped of the extension. If ‘metadata_names’, uses the ‘name’ attribute of the ‘meta’ sub-dictionary in dataformat. If any other str, uses this str as the sample name.
Example
>>> {'genfromtext_args': {'delimiter': ',', 'skip_header': 4},  # passed directly to np.genfromtxt
     'column_id': {'name_row': 3,  # which row contains the column names
                   'delimiter': ',',  # delimiter between column names
                   'timecolumn': 0,  # which column contains the 'time' variable
                   'pattern': '([A-z]{1,2}[0-9]{1,3})'},  # a regex pattern which captures the column names
     'meta_regex': {  # a dict of {line_no: ([descriptors], regex)} pairs
         0: (['path'], '(.*)'),
         2: (['date', 'method'],  # MUST include date
             '([A-Z][a-z]+ [0-9]+ [0-9]{4}[ ]+[0-9:]+ [amp]+).* ([A-z0-9]+\.m)')}}
Returns: sample, analytes, data, meta Return type: tuple
Preprocessing¶
-
latools.preprocessing.split.
by_regex
(file, outdir=None, split_pattern=None, global_header_rows=0, fname_pattern=None, trim_tail_lines=0, trim_head_lines=0)[source]¶ Split one long analysis file into multiple smaller ones.
Parameters: - file (str) – The path to the file you want to split.
- outdir (str) – The directory to save the split files to. If None, files are saved to a new directory called ‘split’, which is created inside the data directory.
- split_pattern (regex string) – A regular expression that will match lines in the file that mark the start of a new section. Does not have to match the whole line, but must provide a positive match to the lines containing the pattern.
- global_header_rows (int) – How many rows at the start of the file to include in each new sub-file.
- fname_pattern (regex string) – A regular expression that identifies a new file name in the lines identified by split_pattern. If none, files will be called ‘noname_N’. The extension of the main file will be used for all sub-files.
- trim_head_lines (int) – If greater than zero, this many lines are removed from the start of each segment
- trim_tail_lines (int) – If greater than zero, this many lines are removed from the end of each segment
Returns: Path to new directory
Return type:
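The split logic can be sketched with the standard library's re module: lines matching split_pattern start a new section, and fname_pattern extracts a name from those lines. The patterns and file contents below are illustrative assumptions, not a latools data format:

```python
import re

# Hypothetical long-file contents with two embedded samples.
text = """# Sample: foo
1,2
3,4
# Sample: bar
5,6
"""
split_pattern = r'# Sample'          # marks the start of a new section
fname_pattern = r'# Sample: (\w+)'   # captures the new file name

sections, current, name = {}, [], None
for line in text.splitlines():
    if re.search(split_pattern, line):
        m = re.search(fname_pattern, line)
        name = m.group(1) if m else f'noname_{len(sections)}'
        current = sections.setdefault(name, [])
    elif name is not None:
        current.append(line)

print(sorted(sections))  # ['bar', 'foo']
```

In by_regex itself, each collected section would then be written to its own file in outdir, with global_header_rows prepended and trim_head_lines/trim_tail_lines applied.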
Helpers¶
-
latools.processes.data_read.
read_data
(data_file, dataformat, name_mode)[source]¶ Load data_file described by a dataformat dict.
Parameters: - data_file (str) – Path to data file, including extension.
- dataformat (dict) – A dataformat dict, see example below.
- name_mode (str) – How to identify sample names. If ‘file_names’, uses the input name of the file, stripped of the extension. If ‘metadata_names’, uses the ‘name’ attribute of the ‘meta’ sub-dictionary in dataformat. If any other str, uses this str as the sample name.
Example
>>> {'genfromtext_args': {'delimiter': ',', 'skip_header': 4},  # passed directly to np.genfromtxt
     'column_id': {'name_row': 3,  # which row contains the column names
                   'delimiter': ',',  # delimiter between column names
                   'timecolumn': 0,  # which column contains the 'time' variable
                   'pattern': '([A-z]{1,2}[0-9]{1,3})'},  # a regex pattern which captures the column names
     'meta_regex': {  # a dict of {line_no: ([descriptors], regex)} pairs
         0: (['path'], '(.*)'),
         2: (['date', 'method'],  # MUST include date
             '([A-Z][a-z]+ [0-9]+ [0-9]{4}[ ]+[0-9:]+ [amp]+).* ([A-z0-9]+\.m)')}}
Returns: sample, analytes, data, meta Return type: tuple
-
latools.processes.despiking.
expdecay_despike
(sig, expdecay_coef, tstep, maxiter=3)[source]¶ Apply exponential decay filter to remove physically impossible data based on instrumental washout.
The filter is re-applied until no more points are removed, or maxiter is reached.
Parameters: Returns: Return type:
-
latools.processes.despiking.
noise_despike
(sig, win=3, nlim=24.0, maxiter=4)[source]¶ Apply standard deviation filter to remove anomalous values.
Parameters: Returns: Return type:
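A minimal sketch of the rolling standard-deviation idea behind noise_despike: points far from the mean of their neighbours (relative to the neighbours' spread) are replaced. This illustrates the approach and differs in detail from latools' implementation:

```python
import numpy as np

def noise_despike(sig, win=3, nlim=2.0):
    """Replace points more than nlim neighbour-standard-deviations
    from the neighbour mean with that mean (single pass)."""
    sig = np.asarray(sig, dtype=float).copy()
    half = win // 2
    for i in range(half, len(sig) - half):
        # neighbours in the window, excluding the point itself
        window = np.r_[sig[i - half:i], sig[i + 1:i + half + 1]]
        # small constant guards against zero std in flat regions
        if abs(sig[i] - window.mean()) > nlim * (window.std() + 1e-12):
            sig[i] = window.mean()
    return sig

out = noise_despike([1.0, 1.0, 10.0, 1.0, 1.0])
print(out)  # [1. 1. 1. 1. 1.]
```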
-
latools.processes.signal_id.
autorange
(t, sig, gwin=7, swin=None, win=30, on_mult=(1.5, 1.0), off_mult=(1.0, 1.5), nbin=10, transform='log', thresh=None)[source]¶ Automatically separates signal and background in an on/off data stream.
Step 1: Thresholding. The background signal is determined using a gaussian kernel density estimator (kde) of all the data. Under normal circumstances, this kde should find two distinct data distributions, corresponding to ‘signal’ and ‘background’. The minimum between these two distributions is taken as a rough threshold to identify signal and background regions. Any point where the trace crosses this threshold is identified as a ‘transition’.
Step 2: Transition Removal. The width of the transition regions between signal and background is then determined, and the transitions are excluded from analysis. The width of the transitions is determined by fitting a gaussian to the smoothed first derivative of the analyte trace, and determining its width at the point where the gaussian intensity is conf times the gaussian maximum. These gaussians are fit to subsets of the data centred on the transition regions determined in Step 1, +/- win data points. The peak is further isolated by finding the minima and maxima of a second derivative within this window, and the gaussian is fit to the isolated peak.
Parameters: - t (array-like) – Independent variable (usually time).
- sig (array-like) – Dependent signal, with distinctive ‘on’ and ‘off’ regions.
- gwin (int) – The window used for calculating first derivative. Defaults to 7.
- swin (int) – The window used for signal smoothing. If None, gwin // 2 is used.
- win (int) – The width (c +/- win) of the transition data subsets. Defaults to 30.
- on_mult and off_mult (tuple) – Control the width of the excluded transition regions, which is defined relative to the peak full-width-half-maximum (FWHM) of the transition gradient. The region n * FWHM below the transition, and m * FWHM above the transition, will be excluded, where (n, m) are specified in on_mult and off_mult. on_mult and off_mult apply to the off-on and on-off transitions, respectively. Defaults to (1.5, 1) and (1, 1.5).
- transform (str) – How to transform the data. Default is ‘log’.
Returns: fbkg, fsig, ftrn, failed – where fbkg, fsig and ftrn are boolean arrays the same length as sig, that are True where sig is background, signal and transition, respectively. failed contains a list of transition positions where gaussian fitting has failed.
Return type:
-
latools.processes.signal_id.
autorange_components
(t, sig, transform='log', gwin=7, swin=None, win=30, on_mult=(1.5, 1.0), off_mult=(1.0, 1.5), thresh=None)[source]¶ Returns the components underlying the autorange algorithm.
Returns: - t (array-like) – Time axis (independent variable)
- sig (array-like) – Raw signal (dependent variable)
- sigs (array-like) – Smoothed signal (swin)
- tsig (array-like) – Transformed raw signal (transform)
- tsigs (array-like) – Transformed smoothed signal (transform, swin)
- kde_x (array-like) – kernel density estimate of smoothed signal.
- yd (array-like) – bins of kernel density estimator.
- g (array-like) – gradient of smoothed signal (swin, gwin)
- trans (dict) – per-transition data.
- thresh (float) – threshold identified from kernel density plot
Helper functions used by multiple parts of LAtools.
-
class
latools.helpers.helpers.
Bunch
(*args, **kwds)[source]¶ Bases:
dict
-
clear
() → None. Remove all items from D.¶
-
copy
() → a shallow copy of D¶
-
fromkeys
()¶ Returns a new dict with keys from iterable and values equal to value.
-
get
(k[, d]) → D[k] if k in D, else d. d defaults to None.¶
-
items
() → a set-like object providing a view on D's items¶
-
keys
() → a set-like object providing a view on D's keys¶
-
pop
(k[, d]) → v, remove specified key and return the corresponding value.¶ If key is not found, d is returned if given, otherwise KeyError is raised
-
popitem
() → (k, v), remove and return some (key, value) pair as a¶ 2-tuple; but raise KeyError if D is empty.
-
setdefault
(k[, d]) → D.get(k,d), also set D[k]=d if k not in D¶
-
update
([E, ]**F) → None. Update D from dict/iterable E and F.¶ If E is present and has a .keys() method, then does: for k in E: D[k] = E[k] If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v In either case, this is followed by: for k in F: D[k] = F[k]
-
values
() → an object providing a view on D's values¶
-
-
latools.helpers.helpers.
analyte_2_massname
(s)[source]¶ Converts analytes in format ‘Al27’ to ‘27Al’.
Parameters: s (str) – of format [0-9]{1,3}[A-z]{1,3} Returns: Name in format [A-z]{1,3}[0-9]{1,3} Return type: str
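The conversion described here is a simple regex swap; a minimal sketch of the documented behaviour (not the latools source):

```python
import re

def analyte_2_massname(s):
    """Convert 'Al27' style analyte names to '27Al' style."""
    el, mass = re.match(r'([A-Za-z]{1,3})([0-9]{1,3})', s).groups()
    return mass + el

print(analyte_2_massname('Al27'))  # 27Al
```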
-
latools.helpers.helpers.
analyte_2_namemass
(s)[source]¶ Converts analytes in format ‘27Al’ to ‘Al27’.
Parameters: s (str) – of format [A-z]{1,3}[0-9]{1,3} Returns: Name in format [0-9]{1,3}[A-z]{1,3} Return type: str
-
latools.helpers.helpers.
bool_2_indices
(a)[source]¶ Convert boolean array into a 2D array of (start, stop) pairs.
-
latools.helpers.helpers.
calc_grads
(x, dat, keys=None, win=5)[source]¶ Calculate gradients of values in dat.
Parameters: Returns: Return type: dict of gradients
-
latools.helpers.helpers.
collate_data
(in_dir, extension='.csv', out_dir=None)[source]¶ Copy all csvs in a nested directory to a single directory.
Function to copy all csvs from a directory, and place them in a new directory.
Parameters: Returns: Return type:
-
latools.helpers.helpers.
enumerate_bool
(bool_array, nstart=0)[source]¶ Consecutively numbers contiguous booleans in array.
i.e. given a boolean sequence, the resulting numbering is:
T F T T T F T F F F T T F
0 - 1 1 1 - 2 - - - 3 3 -
where ‘-’ marks False values, which are not numbered.
Parameters: - bool_array (array_like) – Array of booleans.
- nstart (int) – The number of the first boolean group.
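The numbering described above can be sketched as a single pass over the array; this illustration uses -1 for False positions (the ‘-’ in the example) and is not the latools source:

```python
import numpy as np

def enumerate_bool(bool_array, nstart=0):
    """Number contiguous runs of True consecutively; False -> -1."""
    a = np.asarray(bool_array, dtype=bool)
    out = np.full(a.size, -1, dtype=int)
    n = nstart
    in_run = False
    for i, v in enumerate(a):
        if v:
            out[i] = n
            in_run = True
        else:
            if in_run:
                n += 1  # a run just ended; next run gets the next number
            in_run = False
    return out

a = [True, False, True, True, True, False, True]
print(enumerate_bool(a))  # [ 0 -1  1  1  1 -1  2]
```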
-
latools.helpers.helpers.
fastgrad
(a, win=11)[source]¶ Returns the rolling-window gradient of a.
Function to efficiently calculate the rolling gradient of a numpy array, using ‘stride_tricks’ to split a 1D array into an ndarray of sub-sections of the original array, of dimensions [len(a) - win, win].
Parameters: - a (array_like) – The 1D array to calculate the rolling gradient of.
- win (int) – The width of the rolling window.
Returns: Gradient of a, assuming a constant integer x-scale.
Return type: array_like
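The stride-tricks idea can be sketched with numpy's sliding_window_view (a safe wrapper around the same mechanism): split the array into overlapping windows, then compute a closed-form least-squares slope per window against an integer x-scale. A sketch of the approach, not the latools implementation:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def fastgrad(a, win=5):
    """Rolling least-squares gradient of a, assuming x = 0, 1, 2, ..."""
    views = sliding_window_view(np.asarray(a, dtype=float), win)
    x = np.arange(win)
    xm = x - x.mean()
    # slope = sum((x - xbar) * y) / sum((x - xbar)^2), vectorised per window
    return (views * xm).sum(1) / (xm ** 2).sum()

a = 2.0 * np.arange(10)      # a line with slope 2 everywhere
g = fastgrad(a, win=5)
print(np.allclose(g, 2.0))  # True
```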
-
latools.helpers.helpers.
fastsmooth
(a, win=11)[source]¶ Returns the rolling-window mean (smooth) of a.
Function to efficiently calculate the rolling mean of a numpy array, using ‘stride_tricks’ to split a 1D array into an ndarray of sub-sections of the original array, of dimensions [len(a) - win, win].
Parameters: - a (array_like) – The 1D array to calculate the rolling mean of.
- win (int) – The width of the rolling window.
Returns: Smoothed a.
Return type: array_like
-
latools.helpers.helpers.
findmins
(x, y)[source]¶ Function to find local minima.
Parameters: y (x,) – 1D arrays of the independent (x) and dependent (y) variables. Returns: Array of points in x where y has a local minimum. Return type: array_like
-
latools.helpers.helpers.
get_date
(datetime, time_format=None)[source]¶ Return a datetime object from a string, with optional time format.
Parameters: - datetime (str) – Date-time as string in any sensible format.
- time_format (datetime str (optional)) – String describing the datetime format. If missing uses dateutil.parser to guess time format.
-
latools.helpers.helpers.
get_total_n_points
(d)[source]¶ Returns the total number of data points in values of dict.
d : dict
-
latools.helpers.helpers.
pretty_element
(s)[source]¶ Returns formatted element name.
Parameters: s (str) – of format [A-Z][a-z]?[0-9]+ Returns: LaTeX formatted string with superscript numbers. Return type: str
-
latools.helpers.helpers.
rolling_window
(a, window, pad=None)[source]¶ Returns an (n, window) rolling-window array of data.
Parameters: - a (array_like) – Array to calculate the rolling window of
- window (int) – The width of the rolling window.
- pad (same as dtype(a)) – The value used to pad the output to the length of a.
Returns: An array of shape (n, window), where n is either len(a) - window if pad is None, or len(a) if pad is not None.
Return type: array_like
-
latools.helpers.helpers.
stack_keys
(ddict, keys, extra=None)[source]¶ Combine elements of ddict into an array of shape (len(ddict[key]), len(keys)).
Useful for preparing data for sklearn.
Parameters: - ddict (dict) – A dict containing arrays or lists to be stacked. Must be of equal length.
- keys (list or str) – The keys of dict to stack. Must be present in ddict.
- extra (list (optional)) – A list of additional arrays to stack. Elements of extra must be the same length as arrays in ddict. Extras are inserted as the first columns of output.
-
latools.helpers.helpers.
tuples_2_bool
(tuples, x)[source]¶ Generate boolean array from list of limit tuples.
Parameters: - tuples (array_like) – [2, n] array of (start, end) values
- x (array_like) – x scale the tuples are mapped to
Returns: boolean array, True where x is between each pair of tuples.
Return type: array_like
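The mapping from (start, end) tuples to a boolean mask can be sketched as an element-wise OR of interval tests; a minimal illustration of the documented behaviour, not the latools source:

```python
import numpy as np

def tuples_2_bool(tuples, x):
    """True where x falls inside any (start, end) pair (inclusive)."""
    x = np.asarray(x)
    out = np.zeros(x.size, dtype=bool)
    for lo, hi in tuples:
        out |= (x >= lo) & (x <= hi)
    return out

x = np.arange(10)
print(tuples_2_bool([(1, 3), (7, 8)], x))
```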
-
class
latools.helpers.helpers.
un_interp1d
(x, y, **kwargs)[source]¶ Bases:
object
object for handling interpolation of values with uncertainties.
-
latools.helpers.helpers.
unitpicker
(a, llim=0.1, denominator=None, focus_stage=None)[source]¶ Determines the most appropriate plotting unit for data.
Parameters: Returns: (multiplier, unit)
Return type:
-
latools.helpers.chemistry.
calc_M
(molecule)[source]¶ Returns molecular weight of molecule.
Where molecule is in standard chemical notation, e.g. ‘CO2’, ‘HCO3’ or B(OH)4
Returns: molecular_weight Return type: float
-
latools.helpers.chemistry.
elements
(all_isotopes=True)[source]¶ Loads a DataFrame of all elements and isotopes.
Scraped from https://www.webelements.com/
Returns: Return type: pandas DataFrame with columns (element, atomic_number, isotope, atomic_weight, percent)
-
latools.helpers.chemistry.
to_mass_fraction
(molar_ratio, massfrac_denominator, numerator_mass, denominator_mass)[source]¶ Converts molar elemental ratios to mass fractions.
Be careful with units.
Parameters: Returns: The mass fraction of the numerator element.
Return type: float or array-like
-
latools.helpers.chemistry.
to_molar_ratio
(massfrac_numerator, massfrac_denominator, numerator_mass, denominator_mass)[source]¶ Converts per-mass concentrations to molar elemental ratios.
Be careful with units.
Parameters: - denominator_mass (numerator_mass,) – The atomic mass of the numerator and denominator.
- massfrac_denominator (massfrac_numerator,) – The per-mass fraction of the numerator and denominator.
Returns: The molar ratio of elements in the material.
Return type: float or array-like
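The unit conversion implied here is: divide each mass fraction by the corresponding atomic mass to get moles, then take the ratio. A sketch of that arithmetic (the atomic masses below are illustrative values for Mg and Ca, not taken from the latools SRM tables):

```python
def to_molar_ratio(massfrac_numerator, massfrac_denominator,
                   numerator_mass, denominator_mass):
    """Convert per-mass fractions to a molar elemental ratio."""
    return ((massfrac_numerator / numerator_mass) /
            (massfrac_denominator / denominator_mass))

# Equal mass fractions of Mg (24.305 g/mol) and Ca (40.078 g/mol)
# give a molar Mg/Ca ratio of 40.078 / 24.305:
r = to_molar_ratio(1.0, 1.0, 24.305, 40.078)
print(round(r, 3))  # 1.649
```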
-
latools.helpers.stat_fns.
H15_se
(x)[source]¶ Calculate the Huber (H15) robust standard error of x.
-
latools.helpers.stat_fns.
H15_std
(x)[source]¶ Calculate the Huber (H15) Robust standard deviation of x.
-
latools.helpers.stat_fns.
gauss
(x, *p)[source]¶ Gaussian function.
Parameters: - x (array_like) – Independent variable.
- *p (parameters unpacked to A, mu, sigma) – A = amplitude, mu = centre, sigma = width
Returns: gaussian described by *p.
Return type: array_like
-
latools.helpers.stat_fns.
gauss_weighted_stats
(x, yarray, x_new, fwhm)[source]¶ Calculate gaussian weighted moving mean, SD and SE.
Parameters: - x (array-like) – The independent variable
- yarray ((n,m) array) – Where n = x.size, and m is the number of dependent variables to smooth.
- x_new (array-like) – The new x-scale to interpolate the data
- fwhm (int) – FWHM of the gaussian kernel.
Returns: (mean, std, se)
Return type:
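The gaussian-weighted moving mean can be sketched as follows: each output point is a weighted average of all y values, with weights drawn from a gaussian centred on the new x position. A simplified single-variable illustration (latools also returns SD and SE, omitted here):

```python
import numpy as np

def gauss_weighted_mean(x, y, x_new, fwhm):
    """Gaussian-weighted moving mean of y, evaluated at x_new."""
    # convert FWHM to gaussian sigma
    sigma = fwhm / (2 * np.sqrt(2 * np.log(2)))
    # (len(x_new), len(x)) weight matrix, one gaussian per output point
    w = np.exp(-0.5 * ((x_new[:, None] - x[None, :]) / sigma) ** 2)
    w /= w.sum(1, keepdims=True)  # normalise weights per row
    return (w * y[None, :]).sum(1)

x = np.linspace(0, 10, 50)
y = np.ones(50)                  # constant signal
m = gauss_weighted_mean(x, y, np.array([2.0, 5.0]), fwhm=1.0)
print(np.allclose(m, 1.0))  # True: weighted mean of a constant is the constant
```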
-
latools.helpers.stat_fns.
unpack_uncertainties
(uarray)[source]¶ Convenience function to unpack nominal values and uncertainties from an
uncertainties.uarray
.Returns: (nominal_values, std_devs)
-
latools.helpers.plot.
autorange_plot
(t, sig, gwin=7, swin=None, win=30, on_mult=(1.5, 1.0), off_mult=(1.0, 1.5), nbin=10, thresh=None)[source]¶ Function for visualising the autorange mechanism.
Parameters: - t (array-like) – Independent variable (usually time).
- sig (array-like) – Dependent signal, with distinctive ‘on’ and ‘off’ regions.
- gwin (int) – The window used for calculating first derivative. Defaults to 7.
- swin (int) – The window used for signal smoothing. If None, gwin // 2.
- win (int) – The width (c +/- win) of the transition data subsets. Defaults to 20.
- on_mult and off_mult (tuple) – Control the width of the excluded transition regions, which is defined relative to the peak full-width-half-maximum (FWHM) of the transition gradient. The region n * FWHM below the transition, and m * FWHM above the transition, will be excluded, where (n, m) are specified in on_mult and off_mult. on_mult and off_mult apply to the off-on and on-off transitions, respectively. Defaults to (1.5, 1) and (1, 1.5).
- nbin (int) – Used to calculate the number of bins in the data histogram. bins = len(sig) // nbin
Returns: Return type: fig, axes
-
latools.helpers.plot.
calibration_drift_plot
(self, analytes=None, ncol=3, save=True)[source]¶ Plot calibration slopes through the entire session.
Parameters: - self (latools.analyse) – Analyse object, containing
- analytes (optional, array_like or str) – The analyte(s) to plot. Defaults to all analytes.
- ncol (int) – Number of columns of plots
- save (bool) – Whether or not to save the plot.
Returns: Return type: (fig, axes)
-
latools.helpers.plot.
calibration_plot
(self, analytes=None, datarange=True, loglog=False, ncol=3, save=True)[source]¶ Plot the calibration lines between measured and known SRM values.
Parameters: - analytes (optional, array_like or str) – The analyte(s) to plot. Defaults to all analytes.
- datarange (boolean) – Whether or not to show the distribution of the measured data alongside the calibration curve.
- loglog (boolean) – Whether or not to plot the data on a log-log scale. This is useful if you have two low standards very close together, and want to check whether your data are between them, or below them.
Returns: Return type: (fig, axes)
-
latools.helpers.plot.
crossplot
(dat, keys=None, lognorm=True, bins=25, figsize=(12, 12), colourful=True, focus_stage=None, denominator=None, mode='hist2d', cmap=None, **kwargs)[source]¶ Plot analytes against each other.
The number of plots is n**2 - n, where n = len(keys).
Parameters: Returns: Return type: (fig, axes)
-
latools.helpers.plot.
filter_report
(Data, filt=None, analytes=None, savedir=None, nbin=5)[source]¶ Visualise effect of data filters.
Parameters: Returns: Return type: (fig, axes)
-
latools.helpers.plot.
gplot
(self, analytes=None, win=25, figsize=[10, 4], ranges=False, focus_stage=None, ax=None, recalc=True)[source]¶ Plot analytes gradients as a function of Time.
Parameters: Returns: Return type: figure, axis
-
latools.helpers.plot.
histograms
(dat, keys=None, bins=25, logy=False, cmap=None, ncol=4)[source]¶ Plot histograms of all items in dat.
Parameters: - dat (dict) – Data in {key: array} pairs.
- keys (array-like) – The keys in dat that you want to plot. If None, all are plotted.
- bins (int) – The number of bins in each histogram (default = 25)
- logy (bool) – If true, y axis is a log scale.
- cmap (dict) – The colours that the different items should be. If None, all are grey.
Returns: Return type: fig, axes
-
latools.helpers.plot.
tplot
(self, analytes=None, figsize=[10, 4], scale='log', filt=None, ranges=False, stats=False, stat='nanmean', err='nanstd', focus_stage=None, err_envelope=False, ax=None)[source]¶ Plot analytes as a function of Time.
Parameters: - analytes (array_like) – list of strings containing names of analytes to plot. None = all analytes.
- figsize (tuple) – size of final figure.
- scale (str or None) – ‘log’ = plot data on log scale
- filt (bool, str or dict) – False: plot unfiltered data. True: plot filtered data over unfiltered data. str: apply the named filter key to all analytes. dict: apply a separate key to each analyte; must contain all analytes plotted. Can use self.filt.keydict.
- ranges (bool) – show signal/background regions.
- stats (bool) – plot average and error of each trace, as specified by stat and err.
- stat (str) – average statistic to plot.
- err (str) – error statistic to plot.
Returns: Return type: figure, axis