LAtools Documentation¶
latools.analyse object¶
-
class latools.latools.analyse(data_folder, errorhunt=False, config='DEFAULT', dataformat=None, extension='.csv', srm_identifier='STD', cmap=None, time_format=None, internal_standard='Ca43', names='file_names', srm_file=None, pbar=None)[source]¶
Bases: object
For processing and analysing whole LA-ICPMS datasets.
Parameters: - data_folder (str) – The path to a directory containing multiple data files.
- errorhunt (bool) – If True, latools prints the name of each file before it imports the data. This is useful for working out which data file is causing the import to fail.
- config (str) – The name of the configuration to use for the analysis. This determines which configuration set from the latools.cfg file is used, and overrides the default configuration setup. You might specify this if your lab routinely uses two different instruments.
- dataformat (str or dict) – Either a path to a data format file, or a dataformat dict. See documentation for more details.
- extension (str) – The file extension of your data files. Defaults to ‘.csv’.
- srm_identifier (str) – A string used to separate samples and standards. srm_identifier must be present in all standard measurements. Defaults to ‘STD’.
- cmap (dict) – A dictionary of {analyte: colour} pairs. Colour can be any valid matplotlib colour string, RGB or RGBA sequence, or hex string.
- time_format (str) – A regex string identifying the time format, used by pandas when creating a universal time scale. If unspecified (None), pandas attempts to infer the time format, but in some cases this might not work.
- internal_standard (str) – The name of the analyte used as an internal standard throughout analysis.
- names (str) –
- ‘file_names’ : use the file names as labels (default)
- ‘metadata_names’ : use the ‘names’ attribute of metadata as the name
- anything else : use numbers.
-
folder
¶ str – Path to the directory containing the data files, as specified by data_folder.
-
dirname
¶ str – The name of the directory containing the data files, without the entire path.
-
files
¶ array_like – A list of all files in folder.
-
param_dir
¶ str – The directory where parameters are stored.
-
report_dir
¶ str – The directory where plots are saved.
-
data
¶ dict – A dict of latools.D data objects, labelled by sample name.
-
samples
¶ array_like – A list of samples.
-
analytes
¶ array_like – A list of analytes measured.
-
stds
¶ array_like – A list of the latools.D objects containing the SRM data. These must contain srm_identifier in the file name.
-
srm_identifier
¶ str – A string present in the file names of all standards.
-
cmaps
¶ dict – An analyte-specific colour map, used for plotting.
-
apply_classifier
(name, samples=None, subset=None)[source]¶ Apply a clustering classifier based on all samples, or a subset.
Returns: name
-
autorange
(analyte='total_counts', gwin=5, swin=3, win=20, on_mult=[1.0, 1.5], off_mult=[1.5, 1], transform='log', ploterrs=True, focus_stage='despiked')[source]¶ Automatically separates signal and background data regions.
Automatically detect signal and background regions in the laser data, based on the behaviour of a single analyte. The analyte used should be abundant and homogenous in the sample.
Step 1: Thresholding. The background signal is determined using a gaussian kernel density estimator (kde) of all the data. Under normal circumstances, this kde should find two distinct data distributions, corresponding to ‘signal’ and ‘background’. The minimum between these two distributions is taken as a rough threshold to identify signal and background regions. Any point where the trace crosses this threshold is identified as a ‘transition’.
Step 2: Transition Removal. The width of the transition regions between signal and background are then determined, and the transitions are excluded from analysis. The width of the transitions is determined by fitting a gaussian to the smoothed first derivative of the analyte trace, and determining its width at a point where the gaussian intensity is at conf times the gaussian maximum. These gaussians are fit to subsets of the data centered around the transition regions determined in Step 1, +/- win data points. The peak is further isolated by finding the minima and maxima of a second derivative within this window, and the gaussian is fit to the isolated peak.
Parameters: - analyte (str) – The analyte that autorange should consider. For best results, choose an analyte that is present homogeneously in high concentrations. This can also be ‘total_counts’ to use the sum of all analytes.
- gwin (int) – The smoothing window used for calculating the first derivative. Must be odd.
- win (int) – Determines the width (c +/- win) of the transition data subsets.
- smwin (int) – The smoothing window used for calculating the second derivative. Must be odd.
- conf (float) – The proportional intensity of the fitted gaussian tails that determines the transition width cutoff (lower = wider transition regions excluded).
- trans_mult (array_like, len=2) – Multiples of the peak FWHM to add to the transition cutoffs, e.g. if the transitions consistently leave some bad data following the transition, set trans_mult to [0, 0.5] to add 0.5 * the FWHM to the right hand side of the limit.
- focus_stage (str) –
Which stage of analysis to apply processing to. Defaults to ‘despiked’, or ‘rawdata’ if not despiked. Can be one of:
- ‘rawdata’: raw data, loaded from csv file.
- ‘despiked’: despiked data.
- ‘signal’/’background’: isolated signal and background data. Created by self.separate, after signal and background regions have been identified by self.autorange.
- ‘bkgsub’: background subtracted data, created by self.bkg_correct.
- ‘ratios’: element ratio data, created by self.ratio.
- ‘calibrated’: ratio data calibrated to standards, created by self.calibrate.
Returns: - Outputs added as instance attributes. Returns None.
- bkg, sig, trn (iterable, bool) – Boolean arrays identifying background, signal and transition regions
- bkgrng, sigrng and trnrng (iterable) – (min, max) pairs identifying the boundaries of contiguous True regions in the boolean arrays.
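The thresholding step can be sketched in a few lines. This is a conceptual illustration only, not the latools implementation; kde_threshold is a hypothetical helper name:

```python
import numpy as np
from scipy.stats import gaussian_kde

def kde_threshold(trace):
    """Rough stand-in for autorange's Step 1: take the first interior
    local minimum of a kernel density estimate of the trace as the
    signal/background threshold."""
    kde = gaussian_kde(trace)
    x = np.linspace(trace.min(), trace.max(), 500)
    d = kde(x)
    # interior local minima of the density curve
    mins = np.where((d[1:-1] < d[:-2]) & (d[1:-1] < d[2:]))[0] + 1
    return x[mins[0]]

# synthetic log-scale trace: background counts near 1, signal near 10
rng = np.random.default_rng(0)
trace = np.concatenate([rng.normal(1, 0.3, 500), rng.normal(10, 0.5, 500)])
thresh = kde_threshold(trace)  # falls in the gap between the two modes
```

Points above the threshold are then treated as candidate signal, points below as candidate background, before the transition-removal step tidies the boundaries.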
-
basic_processing
(noise_despiker=True, despike_win=3, despike_nlim=12.0, despike_maxiter=4, autorange_analyte='total_counts', autorange_gwin=5, autorange_swin=3, autorange_win=20, autorange_on_mult=[1.0, 1.5], autorange_off_mult=[1.5, 1], autorange_transform='log', bkg_weight_fwhm=300.0, bkg_n_min=20, bkg_n_max=None, bkg_cstep=None, bkg_filter=False, bkg_f_win=7, bkg_f_n_lim=3, bkg_errtype='stderr', calib_drift_correct=True, calib_srms_used=['NIST610', 'NIST612', 'NIST614'], calib_zero_intercept=True, calib_n_min=10, plots=True)[source]¶
-
bkg_calc_interp1d
(analytes=None, kind=1, n_min=10, n_max=None, cstep=None, bkg_filter=False, f_win=7, f_n_lim=3, focus_stage='despiked')[source]¶ Background calculation using a 1D interpolation.
scipy.interpolate.interp1d is used for interpolation.
Parameters: - analytes (str or iterable) – Which analyte or analytes to calculate.
- kind (str or int) – Integer specifying the order of the spline interpolation used, or string specifying a type of interpolation. Passed to scipy.interpolate.interp1d
- n_min (int) – Background regions with fewer than n_min points will not be included in the fit.
- cstep (float or None) – The interval between calculated background points.
- bkg_filter (bool) – If True, apply a rolling filter to the isolated background regions to exclude regions with anomalously high values. If True, two parameters alter the filter’s behaviour:
- f_win (int) – The size of the rolling window
- f_n_lim (float) – The number of standard deviations above the rolling mean to set the threshold.
- focus_stage (str) –
Which stage of analysis to apply processing to. Defaults to ‘despiked’ if present, or ‘rawdata’ if not. Can be one of:
- ‘rawdata’: raw data, loaded from csv file.
- ‘despiked’: despiked data.
- ‘signal’/’background’: isolated signal and background data. Created by self.separate, after signal and background regions have been identified by self.autorange.
- ‘bkgsub’: background subtracted data, created by self.bkg_correct.
- ‘ratios’: element ratio data, created by self.ratio.
- ‘calibrated’: ratio data calibrated to standards, created by self.calibrate.
-
bkg_calc_weightedmean
(analytes=None, weight_fwhm=None, n_min=20, n_max=None, cstep=None, bkg_filter=False, f_win=7, f_n_lim=3, focus_stage='despiked')[source]¶ Background calculation using a gaussian weighted mean.
Parameters: - analytes (str or iterable) – Which analyte or analytes to calculate.
- weight_fwhm (float) – The full-width-at-half-maximum of the gaussian used to calculate the weighted average.
- n_min (int) – Background regions with fewer than n_min points will not be included in the fit.
- cstep (float or None) – The interval between calculated background points.
- bkg_filter (bool) – If True, apply a rolling filter to the isolated background regions to exclude regions with anomalously high values. If True, two parameters alter the filter’s behaviour:
- f_win (int) – The size of the rolling window
- f_n_lim (float) – The number of standard deviations above the rolling mean to set the threshold.
- focus_stage (str) –
Which stage of analysis to apply processing to. Defaults to ‘despiked’ if present, or ‘rawdata’ if not. Can be one of:
- ‘rawdata’: raw data, loaded from csv file.
- ‘despiked’: despiked data.
- ‘signal’/’background’: isolated signal and background data. Created by self.separate, after signal and background regions have been identified by self.autorange.
- ‘bkgsub’: background subtracted data, created by self.bkg_correct.
- ‘ratios’: element ratio data, created by self.ratio.
- ‘calibrated’: ratio data calibrated to standards, created by self.calibrate.
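The core of a gaussian weighted mean can be sketched as follows (a conceptual illustration with a hypothetical helper name, not the latools internals); weight_fwhm sets the width of the weighting gaussian via sigma = fwhm / (2 * sqrt(2 * ln 2)):

```python
import numpy as np

def gauss_weighted_mean(t, y, t0, fwhm):
    """Gaussian-weighted mean of y evaluated at time t0: nearby
    background points dominate, distant ones contribute little."""
    sigma = fwhm / (2 * np.sqrt(2 * np.log(2)))
    w = np.exp(-0.5 * ((t - t0) / sigma) ** 2)
    return np.sum(w * y) / np.sum(w)

t = np.arange(100.0)
y = np.ones(100)
y[50:] = 3.0  # background level steps up halfway through the run
early = gauss_weighted_mean(t, y, 10, 20)  # dominated by the 1.0 level
late = gauss_weighted_mean(t, y, 90, 20)   # dominated by the 3.0 level
```

Because the weighting is local, a drifting background is tracked smoothly rather than averaged into a single global value.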
-
bkg_plot
(analytes=None, figsize=None, yscale='log', ylim=None, err='stderr', save=True)[source]¶ Plot the calculated background.
Parameters: - analytes (str or iterable) – Which analyte(s) to plot.
- figsize (tuple) – The (width, height) of the figure, in inches. If None, calculated based on number of samples.
- yscale (str) – ‘log’ (default) or ‘linear’.
- ylim (tuple) – Manually specify the y scale.
- err (str) – What type of error to plot. Default is stderr.
- save (bool) – If True, figure is saved.
Returns: fig, ax
Return type: matplotlib.figure, matplotlib.axes
-
bkg_subtract
(analytes=None, errtype='stderr', focus_stage='despiked')[source]¶ Subtract calculated background from data.
Must run bkg_calc first!
Parameters: - analytes (str or iterable) – Which analyte(s) to subtract.
- errtype (str) – Which type of error to propagate. default is ‘stderr’.
- focus_stage (str) –
Which stage of analysis to apply processing to. Defaults to ‘despiked’ if present, or ‘rawdata’ if not. Can be one of:
- ‘rawdata’: raw data, loaded from csv file.
- ‘despiked’: despiked data.
- ‘signal’/’background’: isolated signal and background data. Created by self.separate, after signal and background regions have been identified by self.autorange.
- ‘bkgsub’: background subtracted data, created by self.bkg_correct.
- ‘ratios’: element ratio data, created by self.ratio.
- ‘calibrated’: ratio data calibrated to standards, created by self.calibrate.
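The arithmetic of background subtraction is simple; the sketch below assumes independent uncertainties combined in quadrature, which is an illustrative assumption rather than a statement of the exact latools error model:

```python
import numpy as np

def subtract_background(signal, bkg, signal_err, bkg_err):
    """Subtract an interpolated background from signal data,
    combining the two uncertainties in quadrature."""
    corrected = signal - bkg
    err = np.sqrt(signal_err ** 2 + bkg_err ** 2)
    return corrected, err

sig = np.array([100.0, 200.0, 300.0])
bkg = np.array([10.0, 10.0, 10.0])
corr, err = subtract_background(sig, bkg,
                                np.array([3.0, 4.0, 5.0]),
                                np.full(3, 4.0))
```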
-
calibrate
(analytes=None, drift_correct=True, srms_used=['NIST610', 'NIST612', 'NIST614'], zero_intercept=True, n_min=10, reload_srm_database=False)[source]¶ Calibrates the data to measured SRM values.
Assumes that y intercept is zero.
Parameters: - analytes (str or iterable) – Which analytes you’d like to calibrate. Defaults to all.
- drift_correct (bool) – Whether to pool all SRM measurements into a single calibration, or vary the calibration through the run, interpolating coefficients between measured SRMs.
- srms_used (str or iterable) – Which SRMs have been measured. Must match names given in SRM data file exactly.
- n_min (int) – The minimum number of data points an SRM measurement must have to be included.
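With zero_intercept=True, calibration amounts to fitting a line through the origin between measured SRM count ratios and their known values. A minimal sketch of that idea (hypothetical helper name; the least-squares slope through the origin is m = Σxy / Σx²):

```python
import numpy as np

def zero_intercept_slope(measured, known):
    """Least-squares slope through the origin, mapping measured SRM
    values onto known SRM values."""
    x = np.asarray(measured, dtype=float)
    y = np.asarray(known, dtype=float)
    return np.sum(x * y) / np.sum(x * x)

# hypothetical SRM measurements where known values are exactly 2x measured
m = zero_intercept_slope([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
calibrated = m * np.array([1.5, 2.5])  # apply the calibration to samples
```

With drift_correct=True the slope is instead interpolated through the run between SRM measurements, rather than pooled into a single value.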
-
correct_spectral_interference
(target_analyte, source_analyte, f)[source]¶ Correct spectral interference.
Subtract interference counts from target_analyte, based on the intensity of a source_analyte and a known fractional contribution (f).
Correction takes the form: target_analyte -= source_analyte * f
Only operates on background-corrected data (‘bkgsub’). To undo a correction, rerun self.bkg_subtract().
Example
To correct 44Ca+ for an 88Sr++ interference, where both the 43.5 and 44 Da peaks are known:
f = abundance(88Sr) / abundance(87Sr)
counts(44Ca) = counts(44 Da) - counts(43.5 Da) * f
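The example above reduces to simple arithmetic. The sketch below uses approximate natural isotopic abundances (88Sr ≈ 82.58 %, 87Sr ≈ 7.00 %), quoted for illustration only:

```python
import numpy as np

def correct_interference(target_counts, source_counts, f):
    """Apply the correction form given above:
    target_analyte -= source_analyte * f"""
    return target_counts - source_counts * f

# 88Sr++ interferes at 44 Da; 87Sr++ is resolved at 43.5 Da.
f = 82.58 / 7.00                        # abundance(88Sr) / abundance(87Sr)
counts_44 = np.array([1000.0, 1200.0])  # 44Ca+ plus 88Sr++ counts
counts_43_5 = np.array([10.0, 20.0])    # 87Sr++ counts
ca44 = correct_interference(counts_44, counts_43_5, f)
```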
-
correlation_plots
(x_analyte, y_analyte, window=15, filt=True, recalc=False, samples=None, subset=None, outdir=None)[source]¶ Plot the local correlation between two analytes.
Parameters: - y_analyte (x_analyte,) – The names of the x and y analytes to correlate.
- window (int, None) – The rolling window used when calculating the correlation.
- filt (bool) – Whether or not to apply existing filters to the data before calculating this filter.
- recalc (bool) – If True, the correlation is re-calculated, even if it is already present.
-
crossplot
(analytes=None, lognorm=True, bins=25, filt=False, samples=None, subset=None, figsize=(12, 12), save=False, colourful=True, mode='hist2d', **kwargs)[source]¶ Plot analytes against each other.
Parameters: - analytes (optional, array_like or str) – The analyte(s) to plot. Defaults to all analytes.
- lognorm (bool) – Whether or not to log normalise the colour scale of the 2D histogram.
- bins (int) – The number of bins in the 2D histogram.
- filt (str, dict or bool) – Either logical filter expression contained in a str, a dict of expressions specifying the filter string to use for each analyte or a boolean. Passed to grab_filt.
- figsize (tuple) – Figure size (width, height) in inches.
- save (bool or str) – If True, plot is saved as ‘crossplot.png’; if str, plot is saved as str.
- colourful (bool) – Whether or not the plot should be colourful :).
- mode (str) – ‘hist2d’ (default) or ‘scatter’
Returns: (fig, axes)
-
crossplot_filters
(filter_string, analytes=None, samples=None, subset=None, filt=None)[source]¶ Plot the results of a group of filters in a crossplot.
Returns: fig, axes objects
-
despike
(expdecay_despiker=False, exponent=None, noise_despiker=True, win=3, nlim=12.0, exponentplot=False, maxiter=4, autorange_kwargs={}, focus_stage='rawdata')[source]¶ Despikes data with exponential decay and noise filters.
Parameters: - expdecay_despiker (bool) – Whether or not to apply the exponential decay filter.
- exponent (None or float) – The exponent for the exponential decay filter. If None, it is determined automatically using find_expocoef.
- tstep (None or float) – The time interval between measurements. If None, it is determined automatically from the Time variable.
- noise_despiker (bool) – Whether or not to apply the standard deviation spike filter.
- win (int) – The rolling window over which the spike filter calculates the trace statistics.
- nlim (float) – The number of standard deviations above the rolling mean that data are excluded.
- exponentplot (bool) – Whether or not to show a plot of the automatically determined exponential decay exponent.
- maxiter (int) – The maximum number of times that the filter is applied.
- focus_stage (str) –
Which stage of analysis to apply processing to. Defaults to ‘rawdata’. Can be one of:
- ‘rawdata’: raw data, loaded from csv file.
- ‘despiked’: despiked data.
- ‘signal’/’background’: isolated signal and background data. Created by self.separate, after signal and background regions have been identified by self.autorange.
- ‘bkgsub’: background subtracted data, created by self.bkg_correct.
- ‘ratios’: element ratio data, created by self.ratio.
- ‘calibrated’: ratio data calibrated to standards, created by self.calibrate.
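The noise despiker idea can be sketched as a rolling-window filter. This is a conceptual illustration, not the latools code; for simplicity the centre point is excluded from the window statistics here so a single large spike cannot mask itself:

```python
import numpy as np

def noise_despike(y, win=3, nlim=12.0, maxiter=4):
    """Replace points more than nlim standard deviations above the
    mean of their surrounding window with that mean, iterating up
    to maxiter times."""
    y = np.asarray(y, dtype=float).copy()
    pad = win // 2
    for _ in range(maxiter):
        ypad = np.pad(y, pad, mode='edge')
        windows = np.lib.stride_tricks.sliding_window_view(ypad, win)
        mask = np.ones(win, dtype=bool)
        mask[pad] = False  # exclude the centre point from statistics
        mean = windows[:, mask].mean(axis=1)
        std = windows[:, mask].std(axis=1)
        spikes = y > mean + nlim * std
        if not spikes.any():
            break
        y[spikes] = mean[spikes]
    return y

y = np.full(20, 10.0)
y[7] = 1e6  # a single-point spike
clean = noise_despike(y, win=3, nlim=12.0)
```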
-
export_traces
(outdir=None, focus_stage=None, analytes=None, samples=None, subset='All_Analyses', filt=False, zip_archive=False)[source]¶ Function to export raw data.
Parameters: - outdir (str) – Directory to save the traces. Defaults to ‘main-dir-name_export’.
- focus_stage (str) –
The name of the analysis stage to export.
- ’rawdata’: raw data, loaded from csv file.
- ’despiked’: despiked data.
- ’signal’/’background’: isolated signal and background data. Created by self.separate, after signal and background regions have been identified by self.autorange.
- ’bkgsub’: background subtracted data, created by self.bkg_correct
- ’ratios’: element ratio data, created by self.ratio.
- ’calibrated’: ratio data calibrated to standards, created by self.calibrate.
Defaults to the most recent stage of analysis.
- analytes (str or array_like) – Either a single analyte, or list of analytes to export. Defaults to all analytes.
- samples (str or array_like) – Either a single sample name, or list of samples to export. Defaults to all samples.
- filt (str, dict or bool) – Either logical filter expression contained in a str, a dict of expressions specifying the filter string to use for each analyte or a boolean. Passed to grab_filt.
-
filter_clustering
(analytes, filt=False, normalise=True, method='kmeans', include_time=False, samples=None, sort=True, subset=None, level='sample', min_data=10, **kwargs)[source]¶ Applies an n-dimensional clustering filter to the data.
Parameters: - analytes (str) – The analyte(s) that the filter applies to.
- filt (bool) – Whether or not to apply existing filters to the data before calculating this filter.
- normalise (bool) – Whether or not to normalise the data to zero mean and unit variance. Recommended if clustering based on more than 1 analyte. Uses sklearn.preprocessing.scale.
- method (str) –
Which clustering algorithm to use:
- ’meanshift’: The sklearn.cluster.MeanShift algorithm. Automatically determines number of clusters in data based on the bandwidth of expected variation.
- ’kmeans’: The sklearn.cluster.KMeans algorithm. Determines the characteristics of a known number of clusters within the data. Must provide n_clusters to specify the expected number of clusters.
- level (str) – Whether to conduct the clustering analysis at the ‘sample’ or ‘population’ level.
- include_time (bool) – Whether or not to include the Time variable in the clustering analysis. Useful if you’re looking for spatially continuous clusters in your data, i.e. this will identify each spot in your analysis as an individual cluster.
- samples (optional, array_like or None) – Which samples to apply this filter to. If None, applies to all samples.
- sort (bool) – Whether or not you want the cluster labels to be sorted by the mean magnitude of the signals they are based on (0 = lowest)
- min_data (int) – The minimum number of data points that should be considered by the filter. Default = 10.
- **kwargs – Parameters passed to the clustering algorithm specified by method.
- Meanshift Parameters –
- bandwidth : str or float
- The bandwidth (float) or bandwidth method (‘scott’ or ‘silverman’) used to estimate the data bandwidth.
- bin_seeding : bool
- Modifies the behaviour of the meanshift algorithm. Refer to sklearn.cluster.meanshift documentation.
- K-Means Parameters –
- n_clusters : int
- The number of clusters expected in the data.
-
filter_correlation
(x_analyte, y_analyte, window=None, r_threshold=0.9, p_threshold=0.05, filt=True, samples=None, subset=None)[source]¶ Applies a correlation filter to the data.
Calculates a rolling correlation between every window points of two analytes, and excludes data where their Pearson’s R value is above r_threshold and statistically significant.
Data will be excluded where their absolute R value is greater than r_threshold AND the p-value associated with the correlation is less than p_threshold. i.e. only correlations that are statistically significant are considered.
Parameters: - y_analyte (x_analyte,) – The names of the x and y analytes to correlate.
- window (int, None) – The rolling window used when calculating the correlation.
- r_threshold (float) – The correlation index above which to exclude data. Note: the absolute Pearson R value is considered, so negative correlations below -r_threshold will also be excluded.
- p_threshold (float) – The significance level below which data are excluded.
- filt (bool) – Whether or not to apply existing filters to the data before calculating this filter.
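The rolling correlation exclusion can be sketched directly (an illustrative implementation, not the latools internals — edge points the window cannot cover are simply kept here):

```python
import numpy as np
from scipy.stats import pearsonr

def correlation_filter(x, y, window=10, r_threshold=0.9, p_threshold=0.05):
    """Exclude a point when the window centred on it has
    |R| > r_threshold and p < p_threshold."""
    n = len(x)
    keep = np.ones(n, dtype=bool)
    half = window // 2
    for i in range(half, n - half):
        r, p = pearsonr(x[i - half:i + half + 1], y[i - half:i + half + 1])
        if abs(r) > r_threshold and p < p_threshold:
            keep[i] = False
    return keep

rng = np.random.default_rng(1)
x = np.arange(100.0)
y = rng.normal(0, 1, 100)                         # uncorrelated first half
y[50:] = x[50:] * 2 + rng.normal(0, 1e-3, 50)     # strongly correlated second half
keep = correlation_filter(x, y, window=10)
```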
-
filter_defragment
(threshold, mode='include', filt=True, samples=None, subset=None)[source]¶ Remove ‘fragments’ from the calculated filter
Parameters: - threshold (int) – Contiguous data regions that contain this number or fewer points are considered ‘fragments’
- mode (str) – Specifies whether to ‘include’ or ‘exclude’ the identified fragments.
- filt (bool or filt string) – Which filter to apply the defragmenter to. Defaults to True
- samples (array_like or None) – Which samples to apply this filter to. If None, applies to all samples.
- subset (str or number) – The subset of samples (defined by make_subset) you want to apply the filter to.
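Defragmentation can be sketched as run-length filtering of the boolean filter. This is an illustrative reading of the behaviour (short True runs removed for ‘exclude’, short False gaps filled for ‘include’), not the latools implementation:

```python
from itertools import groupby
import numpy as np

def defragment(filt, threshold, mode='exclude'):
    """Flip contiguous runs of `threshold` or fewer points:
    mode='exclude' removes short True runs ('fragments');
    mode='include' fills short False gaps."""
    out = []
    frag_value = (mode == 'exclude')  # short runs of this value get flipped
    for value, run in groupby(list(filt)):
        run = list(run)
        if value == frag_value and len(run) <= threshold:
            out.extend([not value] * len(run))
        else:
            out.extend(run)
    return np.array(out, dtype=bool)

filt = np.array([True] * 10 + [False] * 2 + [True] * 1 + [False] * 5)
cleaned = defragment(filt, threshold=2, mode='exclude')  # lone True removed
```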
-
filter_effect
(analytes=None, stats=['mean', 'std'], filt=True)[source]¶ Quantify the effects of the active filters.
Parameters: Returns: Contains statistics calculated for filtered and unfiltered data, and the filtered/unfiltered ratio.
Return type: pandas.DataFrame
-
filter_exclude_downhole
(threshold, filt=True, samples=None, subset=None)[source]¶ Exclude all points down-hole (after) the first excluded data.
-
filter_gradient_threshold
(analyte, threshold, win=15, samples=None, subset=None)[source]¶ Applies a gradient threshold filter to the data.
Generates two filters above and below the threshold value for a given analyte.
Parameters: - analyte (str) – The analyte that the filter applies to.
- win (int) – The window over which to calculate the moving gradient
- threshold (float) – The threshold value.
- filt (bool) – Whether or not to apply existing filters to the data before calculating this filter.
- samples (array_like or None) – Which samples to apply this filter to. If None, applies to all samples.
- subset (str or number) – The subset of samples (defined by make_subset) you want to apply the filter to.
-
filter_gradient_threshold_percentile
(analyte, percentiles, level='population', win=15, filt=False, samples=None, subset=None)[source]¶ Applies a gradient threshold filter to the data.
Generates two filters above and below the threshold value for a given analyte.
Parameters: - analyte (str) – The analyte that the filter applies to.
- win (int) – The window over which to calculate the moving gradient
- percentiles (float or iterable of len=2) – The percentile values.
- filt (bool) – Whether or not to apply existing filters to the data before calculating this filter.
- samples (array_like or None) – Which samples to apply this filter to. If None, applies to all samples.
- subset (str or number) – The subset of samples (defined by make_subset) you want to apply the filter to.
-
filter_nremoved
(filt=True, quiet=False)[source]¶ Report how many data are removed by the active filters.
-
filter_off
(filt=None, analyte=None, samples=None, subset=None, show_status=False)[source]¶ Turns data filters off for particular analytes and samples.
Parameters: - filt (optional, str or array_like) – Name, partial name or list of names of filters. Supports partial matching. i.e. if ‘cluster’ is specified, all filters with ‘cluster’ in the name are activated. Defaults to all filters.
- analyte (optional, str or array_like) – Name or list of names of analytes. Defaults to all analytes.
- samples (optional, array_like or None) – Which samples to apply this filter to. If None, applies to all samples.
-
filter_on
(filt=None, analyte=None, samples=None, subset=None, show_status=False)[source]¶ Turns data filters on for particular analytes and samples.
Parameters: - filt (optional, str or array_like) – Name, partial name or list of names of filters. Supports partial matching. i.e. if ‘cluster’ is specified, all filters with ‘cluster’ in the name are activated. Defaults to all filters.
- analyte (optional, str or array_like) – Name or list of names of analytes. Defaults to all analytes.
- samples (optional, array_like or None) – Which samples to apply this filter to. If None, applies to all samples.
-
filter_reports
(analytes, filt_str='all', nbin=5, samples=None, outdir=None, subset='All_Samples')[source]¶ Plot filter reports for all filters that contain filt_str in the name.
-
filter_status
(sample=None, subset=None, stds=False)[source]¶ Prints the current status of filters for specified samples.
-
filter_threshold
(analyte, threshold, samples=None, subset=None)[source]¶ Applies a threshold filter to the data.
Generates two filters above and below the threshold value for a given analyte.
Parameters: - analyte (str) – The analyte that the filter applies to.
- threshold (float) – The threshold value.
- filt (bool) – Whether or not to apply existing filters to the data before calculating this filter.
- samples (array_like or None) – Which samples to apply this filter to. If None, applies to all samples.
- subset (str or number) – The subset of samples (defined by make_subset) you want to apply the filter to.
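The two generated filters can be sketched as a pair of boolean arrays. The names ‘below’ and ‘above’ are illustrative; this is the concept, not the latools internals:

```python
import numpy as np

def threshold_filter(values, threshold):
    """Return the two boolean filters a threshold generates for an
    analyte: one keeping data at or below the threshold, one keeping
    data at or above it."""
    values = np.asarray(values)
    below = values <= threshold
    above = values >= threshold
    return below, above

vals = np.array([0.5, 1.5, 2.5, 3.5])
below, above = threshold_filter(vals, 2.0)
```

Either filter can then be switched on for particular analytes with filter_on, depending on which side of the threshold you want to keep.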
-
filter_threshold_percentile
(analyte, percentiles, level='population', filt=False, samples=None, subset=None)[source]¶ Applies a threshold filter to the data.
Generates two filters above and below the threshold value for a given analyte.
Parameters: - analyte (str) – The analyte that the filter applies to.
- percentiles (float or iterable of len=2) – The percentile values.
- level (str) – Whether to calculate percentiles from the entire dataset (‘population’) or for each individual sample (‘individual’)
- filt (bool) – Whether or not to apply existing filters to the data before calculating this filter.
- samples (array_like or None) – Which samples to apply this filter to. If None, applies to all samples.
- subset (str or number) – The subset of samples (defined by make_subset) you want to apply the filter to.
-
filter_trim
(start=1, end=1, filt=True, samples=None, subset=None)[source]¶ Remove points from the start and end of filter regions.
Parameters: - start, end (int) – The number of points to remove from the start and end of the specified filter.
- filt (valid filter string or bool) – Which filter to trim. If True, applies to currently active filters.
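Trimming a boolean filter amounts to shrinking each contiguous True region from both ends. A sketch of that idea (illustrative helper, not the latools code):

```python
import numpy as np

def trim_filter(filt, start=1, end=1):
    """A point survives only if the `start` points before it and the
    `end` points after it are also True; padding with False handles
    the array edges."""
    filt = np.asarray(filt, dtype=bool)
    padded = np.pad(filt, (start, end), constant_values=False)
    out = filt.copy()
    for s in range(1, start + 1):
        out &= padded[start - s:start - s + len(filt)]   # look backwards
    for e in range(1, end + 1):
        out &= padded[start + e:start + e + len(filt)]   # look forwards
    return out

filt = np.array([False, False, True, True, True, True, True, False, False])
trimmed = trim_filter(filt, start=1, end=1)  # 5-point region shrinks to 3
```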
-
find_expcoef
(nsd_below=0.0, plot=False, trimlim=None, autorange_kwargs={})[source]¶ Determines exponential decay coefficient for despike filter.
Fits an exponential decay function to the washout phase of standards to determine the washout time of your laser cell. The exponential coefficient reported is nsd_below standard deviations below the fitted exponent, to ensure that no real data is removed.
Total counts are used in fitting, rather than a specific analyte.
Parameters: - nsd_below (float) – The number of standard deviations to subtract from the fitted coefficient when calculating the filter exponent.
- plot (bool or str) – If True, creates a plot of the fit; if str, the plot is saved to the location specified in str.
- trimlim (float) – A threshold limit used in determining the start of the exponential decay region of the washout. Defaults to half the increase in signal over background. If the data in the plot don’t fall on an exponential decay line, change this number. Normally you’ll need to increase it.
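The underlying idea is an exponential fit to the washout trace. The sketch below shows the fitting step only, on noise-free synthetic data (hypothetical helper name; find_expcoef additionally isolates the washout region and subtracts nsd_below standard deviations from the fitted coefficient):

```python
import numpy as np
from scipy.optimize import curve_fit

def fit_washout_exponent(t, counts):
    """Fit counts = A * exp(k * t) to a washout trace and return the
    decay coefficient k (negative for a decay)."""
    def expdecay(t, A, k):
        return A * np.exp(k * t)
    popt, _ = curve_fit(expdecay, t, counts, p0=(counts[0], -1.0))
    return popt[1]

t = np.linspace(0, 5, 50)
counts = 1000 * np.exp(-1.5 * t)  # synthetic washout with k = -1.5
k = fit_washout_exponent(t, counts)
```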
-
fit_classifier
(name, analytes, method, samples=None, subset=None, filt=True, sort_by=0, **kwargs)[source]¶ Create a clustering classifier based on all samples, or a subset.
Parameters: - name (str) – The name of the classifier.
- analytes (str or iterable) – Which analytes the clustering algorithm should consider.
- method (str) –
Which clustering algorithm to use. Can be:
- ’meanshift’
- The sklearn.cluster.MeanShift algorithm. Automatically determines number of clusters in data based on the bandwidth of expected variation.
- ’kmeans’
- The sklearn.cluster.KMeans algorithm. Determines the characteristics of a known number of clusters within the data. Must provide n_clusters to specify the expected number of clusters.
- samples (iterable) – list of samples to consider. Overrides ‘subset’.
- subset (str) – The subset of samples used to fit the classifier. Ignored if ‘samples’ is specified.
- sort_by (int) – Which analyte the resulting clusters should be sorted by - defaults to 0, which is the first analyte.
- **kwargs – method-specific keyword parameters - see below.
- Meanshift Parameters –
- bandwidth : str or float
- The bandwidth (float) or bandwidth method (‘scott’ or ‘silverman’) used to estimate the data bandwidth.
- bin_seeding : bool
- Modifies the behaviour of the meanshift algorithm. Refer to sklearn.cluster.meanshift documentation.
- K-Means Parameters –
- n_clusters : int
- The number of clusters expected in the data.
Returns: name
-
get_background
(n_min=10, n_max=None, focus_stage='despiked', bkg_filter=False, f_win=5, f_n_lim=3)[source]¶ Extract all background data from all samples on universal time scale. Used by both ‘polynomial’ and ‘weightedmean’ methods.
Parameters: - n_min (int) – The minimum number of points a background region must have to be included in calculation.
- n_max (int) – The maximum number of points a background region must have to be included in calculation.
- bkg_filter (bool) – If True, apply a rolling filter to the isolated background regions to exclude regions with anomalously high values. If True, two parameters alter the filter’s behaviour:
- f_win (int) – The size of the rolling window
- f_n_lim (float) – The number of standard deviations above the rolling mean to set the threshold.
- focus_stage (str) –
Which stage of analysis to apply processing to. Defaults to ‘despiked’ if present, or ‘rawdata’ if not. Can be one of:
- ‘rawdata’: raw data, loaded from csv file.
- ‘despiked’: despiked data.
- ‘signal’/’background’: isolated signal and background data. Created by self.separate, after signal and background regions have been identified by self.autorange.
- ‘bkgsub’: background subtracted data, created by self.bkg_correct.
- ‘ratios’: element ratio data, created by self.ratio.
- ‘calibrated’: ratio data calibrated to standards, created by self.calibrate.
Returns: pandas.DataFrame object containing background data.
-
get_focus
(filt=False, samples=None, subset=None, nominal=False)[source]¶ Collect all data from all samples into a single array. Data from standards is not collected.
-
get_gradients
(analytes=None, win=15, filt=False, samples=None, subset=None, recalc=True)[source]¶ Collect all data from all samples into a single array. Data from standards is not collected.
-
getstats
(save=True, filename=None, samples=None, subset=None, ablation_time=False)[source]¶ Return pandas dataframe of all sample statistics.
-
gradient_crossplot
(analytes=None, win=15, lognorm=True, bins=25, filt=False, samples=None, subset=None, figsize=(12, 12), save=False, colourful=True, mode='hist2d', recalc=True, **kwargs)[source]¶ Plot analyte gradients against each other.
Parameters: - analytes (optional, array_like or str) – The analyte(s) to plot. Defaults to all analytes.
- lognorm (bool) – Whether or not to log normalise the colour scale of the 2D histogram.
- bins (int) – The number of bins in the 2D histogram.
- filt (str, dict or bool) – Either logical filter expression contained in a str, a dict of expressions specifying the filter string to use for each analyte or a boolean. Passed to grab_filt.
- figsize (tuple) – Figure size (width, height) in inches.
- save (bool or str) – If True, the plot is saved as ‘crossplot.png’; if str, the plot is saved with that name.
- colourful (bool) – Whether or not the plot should be colourful :).
- mode (str) – ‘hist2d’ (default) or ‘scatter’
- recalc (bool) – Whether to re-calculate the gradients, or use existing gradients.
Returns: Return type: (fig, axes)
-
gradient_histogram
(analytes=None, win=15, filt=False, bins=None, samples=None, subset=None, recalc=True, ncol=4)[source]¶ Plot a histogram of the gradients in all samples.
Parameters: - filt (str, dict or bool) – Either logical filter expression contained in a str, a dict of expressions specifying the filter string to use for each analyte or a boolean. Passed to grab_filt.
- bins (None or array-like) – The bins to use in the histogram
- samples (str or list) – which samples to get
- subset (str or int) – which subset to get
- recalc (bool) – Whether to re-calculate the gradients, or use existing gradients.
Returns: Return type: fig, ax
-
gradient_plots
(analytes=None, win=15, samples=None, ranges=False, focus=None, outdir=None, figsize=[10, 4], subset='All_Analyses')[source]¶ Plot analyte gradients as a function of time.
Parameters: - analytes (optional, array_like or str) – The analyte(s) to plot. Defaults to all analytes.
- samples (optional, array_like or str) – The sample(s) to plot. Defaults to all samples.
- ranges (bool) – Whether or not to show the signal/background regions identified by ‘autorange’.
- focus (str) – The focus ‘stage’ of the analysis to plot. Can be ‘rawdata’, ‘despiked’:, ‘signal’, ‘background’, ‘bkgsub’, ‘ratios’ or ‘calibrated’.
- outdir (str) – Path to a directory where you’d like the plots to be saved. Defaults to ‘reports/[focus]’ in your data directory.
- filt (str, dict or bool) – Either logical filter expression contained in a str, a dict of expressions specifying the filter string to use for each analyte or a boolean. Passed to grab_filt.
- scale (str) – If ‘log’, plots the data on a log scale.
- figsize (array_like) – Array of length 2 specifying figure [width, height] in inches.
- stats (bool) – Whether or not to overlay the mean and standard deviations for each trace.
- stat, err (str) – The names of the statistic and error components to plot. Defaults to ‘nanmean’ and ‘nanstd’.
Returns: Return type:
-
histograms
(analytes=None, bins=25, logy=False, filt=False, colourful=True)[source]¶ Plot histograms of analytes.
Parameters: - analytes (optional, array_like or str) – The analyte(s) to plot. Defaults to all analytes.
- bins (int) – The number of bins in each histogram (default = 25)
- logy (bool) – If true, y axis is a log scale.
- filt (str, dict or bool) – Either logical filter expression contained in a str, a dict of expressions specifying the filter string to use for each analyte or a boolean. Passed to grab_filt.
- colourful (bool) – If True, histograms are colourful :)
Returns: Return type: (fig, axes)
-
make_subset
(samples=None, name=None)[source]¶ Creates a subset of samples, which can be treated independently.
Parameters:
-
minimal_export
(target_analytes=None, path=None)[source]¶ Exports analysis parameters, standard info and a minimal dataset, which can be imported by another user.
Parameters: - target_analytes (str or iterable) – Which analytes to include in the export. If specified, the export will contain these analytes, and all other analytes used during data processing (e.g. during filtering). If not specified, all analytes are exported.
- path (str) – Where to save the minimal export. If it ends with .zip, a zip file is created. If it’s a folder, all data are exported to a folder.
-
optimisation_plots
(overlay_alpha=0.5, samples=None, subset=None, **kwargs)[source]¶ Plot the result of signal_optimise.
signal_optimiser must be run first, and the output stored in the opt attribute of the latools.D object.
Parameters: - d (latools.D object) – A latools data object.
- overlay_alpha (float) – The opacity of the threshold overlays. Between 0 and 1.
- **kwargs – Passed to tplot
-
optimise_signal
(analytes, min_points=5, threshold_mode='kde_first_max', threshold_mult=1.0, x_bias=0, filt=True, weights=None, mode='minimise', samples=None, subset=None)[source]¶ Optimise data selection based on specified analytes.
Identifies the longest possible contiguous data region in the signal where the relative standard deviation (std) and concentration of all analytes are minimised.
Optimisation is performed via a grid search of all possible contiguous data regions. For each region, the mean std and mean scaled analyte concentration (‘amplitude’) are calculated.
The size and position of the optimal data region are identified using threshold std and amplitude values. Thresholds are derived from all calculated stds and amplitudes using the method specified by threshold_mode. For example, using the ‘kde_max’ method, a probability density function (PDF) is calculated for std and amplitude values, and the threshold is set as the maximum of the PDF. These thresholds are then used to identify the size and position of the longest contiguous region where the std is below the threshold, and the amplitude is also below the threshold (in ‘minimise’ mode).
All possible regions of the data that have at least min_points are considered.
For a graphical demonstration of the action of signal_optimiser, use optimisation_plot.
Parameters: - d (latools.D object) – A latools data object.
- analytes (str or array-like) – Which analytes to consider.
- min_points (int) – The minimum number of contiguous points to consider.
- threshold_mode (str) – The method used to calculate the optimisation thresholds. Can be ‘mean’, ‘median’, ‘kde_max’ or ‘bayes_mvs’, or a custom function. If a function, must take a 1D array, and return a single, real number.
- weights (array-like of length len(analytes)) – An array of numbers specifying the importance of each analyte considered. Larger number makes the analyte have a greater effect on the optimisation. Default is None.
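The grid-search idea described above can be sketched in pure Python. This is an illustrative toy, not the latools implementation: the function name and data are hypothetical, and only a single std threshold is considered (no amplitude threshold or weighting).

```python
import statistics

def longest_low_std_region(data, min_points=5, threshold=1.0):
    """Return (start, end) of the longest contiguous region whose
    standard deviation is below `threshold`, or None if there is none."""
    best = None
    n = len(data)
    for start in range(n):
        for end in range(start + min_points, n + 1):
            if statistics.pstdev(data[start:end]) < threshold:
                if best is None or (end - start) > (best[1] - best[0]):
                    best = (start, end)
    return best

signal = [10, 11, 50, 12, 12, 11, 12, 11, 12, 40]
print(longest_low_std_region(signal, min_points=3, threshold=1.0))
```

Here the stable region between the two spikes is selected; latools additionally derives the thresholds from the data themselves, as described above.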
-
ratio
(internal_standard=None)[source]¶ Calculates the ratio of all analytes to a single analyte.
Parameters: internal_standard (str) – The name of the analyte to divide all other analytes by. Returns: Return type: None
-
sample_stats
(analytes=None, filt=True, stats=['mean', 'std'], eachtrace=True, csf_dict={})[source]¶ Calculate sample statistics.
Returns samples, analytes, and arrays of statistics of shape (samples, analytes). Statistics are calculated from the ‘focus’ data variable, so output depends on how the data have been processed.
Included stat functions:
mean()
: arithmetic meanstd()
: arithmetic standard deviationse()
: arithmetic standard errorH15_mean()
: Huber mean (outlier removal)H15_std()
: Huber standard deviation (outlier removal)H15_se()
: Huber standard error (outlier removal)
Parameters: - analytes (optional, array_like or str) – The analyte(s) to calculate statistics for. Defaults to all analytes.
- filt (str, dict or bool) – Either logical filter expression contained in a str, a dict of expressions specifying the filter string to use for each analyte or a boolean. Passed to grab_filt.
- stats (array_like) – List of functions or names (see above). Functions must take a single array_like input and return a single statistic, and should be able to cope with NaN values.
- eachtrace (bool) – Whether to calculate the statistics for each analysis spot individually, or to produce per-sample means. Default is True.
Returns: Adds dict to analyse object containing samples, analytes and functions and data.
Return type:
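The eachtrace behaviour can be illustrated with a toy example: data are grouped by an analysis-spot label (in the spirit of the ns attribute of a D object) and each group is summarised with a NaN-tolerant mean. Array names and values here are hypothetical, not latools internals.

```python
import math
import statistics

values = [1.0, 1.2, float('nan'), 5.0, 5.2, 5.1]
spots  = [1,   1,   1,            2,   2,   2  ]  # spot label per data point

def nanmean(xs):
    # a mean that ignores NaN values, as the built-in stat functions must
    return statistics.fmean([x for x in xs if not math.isnan(x)])

per_trace = {s: nanmean([v for v, lab in zip(values, spots) if lab == s])
             for s in sorted(set(spots))}
print(per_trace)
```

With eachtrace=False the same summary would instead be taken over all points of a sample at once.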
-
save_log
(directory=None, logname=None, header=None)[source]¶ Save analysis.lalog in specified location
-
set_focus
(focus_stage=None, samples=None, subset=None)[source]¶ Set the ‘focus’ attribute of the data file.
The ‘focus’ attribute of the object points towards data from a particular stage of analysis. It is used to identify the ‘working stage’ of the data. Processing functions operate on the ‘focus’ stage, so if steps are done out of sequence, things will break.
Names of analysis stages:
- ‘rawdata’: raw data, loaded from csv file when object is initialised.
- ‘despiked’: despiked data.
- ‘signal’/’background’: isolated signal and background data, padded with np.nan. Created by self.separate, after signal and background regions have been identified by self.autorange.
- ‘bkgsub’: background subtracted data, created by self.bkg_correct
- ‘ratios’: element ratio data, created by self.ratio.
- ‘calibrated’: ratio data calibrated to standards, created by self.calibrate.
Parameters: focus (str) – The name of the analysis stage desired. Returns: Return type: None
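The ‘focus’ mechanism amounts to a dict of stage arrays plus a pointer naming the active stage. The toy analogue below conveys the idea; the class and data are hypothetical, not latools code.

```python
class Focusable:
    """Minimal stand-in for the focus / set_focus behaviour."""
    def __init__(self, data):
        self.data = data               # stage name -> data array
        self.focus_stage = 'rawdata'   # currently active stage

    @property
    def focus(self):
        return self.data[self.focus_stage]

    def setfocus(self, stage):
        if stage not in self.data:
            raise KeyError('unknown stage: ' + stage)
        self.focus_stage = stage

d = Focusable({'rawdata': [1, 2, 3], 'despiked': [1, 2, 2]})
d.setfocus('despiked')
print(d.focus)
```

Processing functions then read from and write to the stage that focus currently points at, which is why running steps out of sequence breaks the pipeline.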
-
srm_id_auto
(srms_used=['NIST610', 'NIST612', 'NIST614'], n_min=10, reload_srm_database=False)[source]¶ Function for automatically identifying SRMs.
Parameters: - srms_used (iterable) – Which SRMs have been used. Must match SRM names in SRM database exactly (case sensitive!).
- n_min (int) – The minimum number of data points an SRM measurement must contain to be included.
-
statplot
(analytes=None, samples=None, figsize=None, stat='mean', err='std', subset=None)[source]¶ Function for visualising per-ablation and per-sample means.
Parameters: - analytes (str or iterable) – Which analyte(s) to plot
- samples (str or iterable) – Which sample(s) to plot
- figsize (tuple) – Figure (width, height) in inches
- stat (str) – Which statistic to plot. Must match the name of the functions used in ‘sample_stats’.
- err (str) – Which uncertainty to plot.
- subset (str) – Which subset of samples to plot.
-
trace_plots
(analytes=None, samples=None, ranges=False, focus=None, outdir=None, filt=None, scale='log', figsize=[10, 4], stats=False, stat='nanmean', err='nanstd', subset='All_Analyses')[source]¶ Plot analytes as a function of time.
Parameters: - analytes (optional, array_like or str) – The analyte(s) to plot. Defaults to all analytes.
- samples (optional, array_like or str) – The sample(s) to plot. Defaults to all samples.
- ranges (bool) – Whether or not to show the signal/background regions identified by ‘autorange’.
- focus (str) – The focus ‘stage’ of the analysis to plot. Can be ‘rawdata’, ‘despiked’:, ‘signal’, ‘background’, ‘bkgsub’, ‘ratios’ or ‘calibrated’.
- outdir (str) – Path to a directory where you’d like the plots to be saved. Defaults to ‘reports/[focus]’ in your data directory.
- filt (str, dict or bool) – Either logical filter expression contained in a str, a dict of expressions specifying the filter string to use for each analyte or a boolean. Passed to grab_filt.
- scale (str) – If ‘log’, plots the data on a log scale.
- figsize (array_like) – Array of length 2 specifying figure [width, height] in inches.
- stats (bool) – Whether or not to overlay the mean and standard deviations for each trace.
- stat, err (str) – The names of the statistic and error components to plot. Defaults to ‘nanmean’ and ‘nanstd’.
Returns: Return type:
-
latools.latools.
reproduce
(past_analysis, plotting=False, data_folder=None, srm_table=None, custom_stat_functions=None)[source]¶ Reproduce a previous analysis exported with
latools.analyse.minimal_export()
For normal use, supplying log_file and specifying a plotting option should be enough to reproduce an analysis. All requisites (raw data, SRM table and any custom stat functions) will then be imported from the minimal_export folder.
You may also specify your own raw_data, srm_table and custom_stat_functions, if you wish.
Parameters: - log_file (str) – The path to the log file produced by minimal_export().
- plotting (bool) – Whether or not to output plots.
- data_folder (str) – Optional. Specify a different data folder. Data folder should normally be in the same folder as the log file.
- srm_table (str) – Optional. Specify a different SRM table. SRM table should normally be in the same folder as the log file.
- custom_stat_functions (str) – Optional. Specify a python file containing custom stat functions for use by reproduce. Any custom stat functions should normally be included in the same folder as the log file.
latools.D object¶
The Data object, used to contain single laser ablation data files.
-
class
latools.D_obj.
D
(data_file, dataformat=None, errorhunt=False, cmap=None, internal_standard=None, name='file_names')[source]¶ Bases:
object
Container for data from a single laser ablation analysis.
Parameters: - data_file (str) – The path to a data file.
- errorhunt (bool) – Whether or not to print each data file name before import. This is useful for tracing which data file is causing the import to fail.
- dataformat (str or dict) – Either a path to a data format file, or a dataformat dict. See documentation for more details.
-
sample
¶ str – Sample name.
-
meta
¶ dict – Metadata extracted from the csv header. Contents vary depending on your dataformat.
-
analytes
¶ array_like – A list of analytes measured.
-
data
¶ dict – A dictionary containing the raw data, and modified data from each processing stage. Entries can be:
- ‘rawdata’: created during initialisation.
- ‘despiked’: created by despike
- ‘signal’: created by autorange
- ‘background’: created by autorange
- ‘bkgsub’: created by bkg_correct
- ‘ratios’: created by ratio
- ‘calibrated’: created by calibrate
-
focus
¶ dict – A dictionary containing one item from data. This is the currently ‘active’ data that processing functions will work on. This data is also directly available as class attributes with the same names as the items in focus.
-
focus_stage
¶ str – Identifies which item in data is currently assigned to focus.
-
cmap
¶ dict – A dictionary containing hex colour strings corresponding to each measured analyte.
-
bkg, sig, trn
array_like, bool – Boolean arrays identifying signal, background and transition regions. Created by autorange.
-
bkgrng, sigrng, trnrng
array_like – An array of shape (n, 2) containing pairs of values that describe the Time limits of background, signal and transition regions.
-
ns
¶ array_like – An integer array the same length as the data, where each analysis spot is labelled with a unique number. Used for separating analysis spots when calculating sample statistics.
-
filt
¶ filt object – An object for storing, selecting and creating data filters.
-
ablation_times
()[source]¶ Function for calculating the ablation time for each ablation.
Returns: Return type: dict of times for each ablation.
-
autorange
(analyte='total_counts', gwin=5, swin=3, win=30, on_mult=[1.0, 1.0], off_mult=[1.0, 1.5], ploterrs=True, transform='log', **kwargs)[source]¶ Automatically separates signal and background data regions.
Automatically detect signal and background regions in the laser data, based on the behaviour of a single analyte. The analyte used should be abundant and homogeneous in the sample.
Step 1: Thresholding. The background signal is determined using a gaussian kernel density estimator (kde) of all the data. Under normal circumstances, this kde should find two distinct data distributions, corresponding to ‘signal’ and ‘background’. The minimum between these two distributions is taken as a rough threshold to identify signal and background regions. Any point where the trace crosses this threshold is identified as a ‘transition’.
Step 2: Transition Removal. The width of the transition regions between signal and background is then determined, and the transitions are excluded from analysis. The width of the transitions is determined by fitting a gaussian to the smoothed first derivative of the analyte trace, and determining its width at a point where the gaussian intensity is at conf times the gaussian maximum. These gaussians are fit to subsets of the data centred around the transition regions determined in Step 1, +/- win data points. The peak is further isolated by finding the minima and maxima of a second derivative within this window, and the gaussian is fit to the isolated peak.
Parameters: - analyte (str) – The analyte that autorange should consider. For best results, choose an analyte that is present homogeneously in high concentrations.
- gwin (int) – The smoothing window used for calculating the first derivative. Must be odd.
- win (int) – Determines the width (c +/- win) of the transition data subsets.
- on_mult and off_mult (tuple, len=2) – Factors to control the width of the excluded transition regions. A region n times the full-width-half-maximum of the transition gradient will be removed either side of the transition centre. on_mult and off_mult refer to the laser-on and laser-off transitions, respectively. See manual for full explanation. Defaults to (1.5, 1) and (1, 1.5).
Returns: - Outputs added as instance attributes. Returns None.
- bkg, sig, trn (iterable, bool) – Boolean arrays identifying background, signal and transition regions
- bkgrng, sigrng and trnrng (iterable) – (min, max) pairs identifying the boundaries of contiguous True regions in the boolean arrays.
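Step 1 above reduces to thresholding a trace and converting the resulting boolean array into (start, stop) range pairs, analogous to the bkg/sig arrays and the bkgrng/sigrng pairs. The sketch below uses a hypothetical trace and a fixed threshold; the real method derives the threshold from a kde and additionally excludes transitions.

```python
trace = [2, 3, 2, 90, 95, 93, 2, 3, 88, 91, 2]
threshold = 50                    # fixed here; derived from a kde in practice
sig = [x > threshold for x in trace]

def bool_to_ranges(mask):
    """Convert a boolean array into (start, stop) index pairs of True runs."""
    ranges, start = [], None
    for i, m in enumerate(mask):
        if m and start is None:
            start = i
        elif not m and start is not None:
            ranges.append((start, i))
            start = None
    if start is not None:
        ranges.append((start, len(mask)))
    return ranges

print(bool_to_ranges(sig))
```

The two True runs in sig become two (start, stop) pairs, which is exactly the form stored in sigrng and bkgrng.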
-
autorange_plot
(analyte='total_counts', gwin=7, swin=None, win=20, on_mult=[1.5, 1.0], off_mult=[1.0, 1.5], transform='log')[source]¶ Plot a detailed autorange report for this sample.
-
bkg_subtract
(analyte, bkg, ind=None, focus_stage='despiked')[source]¶ Subtract provided background from signal (focus stage).
Result is saved in the new ‘bkgsub’ focus stage
Returns: Return type: None
-
calc_correlation
(x_analyte, y_analyte, window=15, filt=True, recalc=True)[source]¶ Calculate local correlation between two analytes.
Parameters: - y_analyte (x_analyte,) – The names of the x and y analytes to correlate.
- window (int, None) – The rolling window used when calculating the correlation.
- filt (bool) – Whether or not to apply existing filters to the data before calculating this filter.
- recalc (bool) – If True, the correlation is re-calculated, even if it is already present.
Returns: Return type:
-
calibrate
(calib_ps, analytes=None)[source]¶ Apply calibration to data.
The calibration parameters (calib_ps) must be calculated at the analyse level, and passed to this calibrate function.
Parameters: calib_ps (dict) – A dict of calibration parameters to apply to each analyte. Returns: Return type: None
-
correct_spectral_interference
(target_analyte, source_analyte, f)[source]¶ Correct spectral interference.
Subtract interference counts from target_analyte, based on the intensity of a source_analyte and a known fractional contribution (f).
Correction takes the form: target_analyte -= source_analyte * f
Only operates on background-corrected data (‘bkgsub’).
To undo a correction, rerun self.bkg_subtract().
Parameters: Returns: Return type:
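Since the correction is simply target_analyte -= source_analyte * f, it can be checked by hand on small arrays. The counts and the value of f below are hypothetical.

```python
target = [1000.0, 1100.0, 1050.0]  # counts on the interfered (target) mass
source = [200.0, 220.0, 210.0]     # counts on the interfering (source) mass
f = 0.1                            # known fractional contribution

corrected = [t - s * f for t, s in zip(target, source)]
print(corrected)
```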
-
correlation_plot
(x_analyte, y_analyte, window=15, filt=True, recalc=False)[source]¶ Plot the local correlation between two analytes.
Parameters: - y_analyte (x_analyte,) – The names of the x and y analytes to correlate.
- window (int, None) – The rolling window used when calculating the correlation.
- filt (bool) – Whether or not to apply existing filters to the data before calculating this filter.
- recalc (bool) – If True, the correlation is re-calculated, even if it is already present.
Returns: fig, axs
Return type: figure and axes objects
-
crossplot
(analytes=None, bins=25, lognorm=True, filt=True, colourful=True, figsize=(12, 12))[source]¶ Plot analytes against each other.
Parameters: - analytes (optional, array_like or str) – The analyte(s) to plot. Defaults to all analytes.
- lognorm (bool) – Whether or not to log normalise the colour scale of the 2D histogram.
- bins (int) – The number of bins in the 2D histogram.
- filt (str, dict or bool) – Either logical filter expression contained in a str, a dict of expressions specifying the filter string to use for each analyte or a boolean. Passed to grab_filt.
Returns: Return type: (fig, axes)
-
crossplot_filters
(filter_string, analytes=None)[source]¶ Plot the results of a group of filters in a crossplot.
Parameters: Returns: Return type: fig, axes objects
-
despike
(expdecay_despiker=True, exponent=None, noise_despiker=True, win=3, nlim=12.0, maxiter=3)[source]¶ Applies expdecay_despiker and noise_despiker to data.
Parameters: - expdecay_despiker (bool) – Whether or not to apply the exponential decay filter.
- exponent (None or float) – The exponent for the exponential decay filter. If None, it is determined automatically using find_expocoef.
- noise_despiker (bool) – Whether or not to apply the standard deviation spike filter.
- win (int) – The rolling window over which the spike filter calculates the trace statistics.
- nlim (float) – The number of standard deviations above the rolling mean that data are excluded.
- maxiter (int) – The maximum number of times that the filter is applied.
Returns: Return type:
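The logic of a standard-deviation spike filter can be sketched as follows: each point is compared with the mean and standard deviation of its neighbours, and replaced with the neighbourhood mean if it sits more than nlim standard deviations above it. This is a simplified analogue for illustration only; the latools despiker differs in detail.

```python
import statistics

def noise_despike(data, win=3, nlim=2.0):
    """Replace points more than nlim std devs above their neighbours."""
    out = list(data)
    half = win // 2
    for i in range(half, len(data) - half):
        nbrs = data[i - half:i] + data[i + 1:i + half + 1]
        m = statistics.fmean(nbrs)
        s = statistics.pstdev(nbrs)
        if s > 0 and data[i] > m + nlim * s:
            out[i] = m          # replace the spike with the local mean
    return out

print(noise_despike([10, 11, 10, 500, 11, 10, 11]))
```

The single large spike is pulled back towards its neighbours, while the surrounding baseline is left untouched.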
-
filter_clustering
(analytes, filt=False, normalise=True, method='meanshift', include_time=False, sort=None, min_data=10, **kwargs)[source]¶ Applies an n-dimensional clustering filter to the data.
Available Clustering Algorithms
- ‘meanshift’: The sklearn.cluster.MeanShift algorithm. Automatically determines number of clusters in data based on the bandwidth of expected variation.
- ‘kmeans’: The sklearn.cluster.KMeans algorithm. Determines the characteristics of a known number of clusters within the data. Must provide n_clusters to specify the expected number of clusters.
- ‘DBSCAN’: The sklearn.cluster.DBSCAN algorithm. Automatically determines the number and characteristics of clusters within the data based on the ‘connectivity’ of the data (i.e. how far apart each data point is in a multi-dimensional parameter space). Requires you to set eps, the minimum distance a point must be from another point to be considered in the same cluster, and min_samples, the minimum number of points that must be within the minimum distance for it to be considered a cluster. It may also be run in automatic mode by specifying n_clusters alongside min_samples, where eps is decreased until the desired number of clusters is obtained.
For more information on these algorithms, refer to the documentation.
Parameters: - analytes (str) – The analyte(s) that the filter applies to.
- filt (bool) – Whether or not to apply existing filters to the data before calculating this filter.
- normalise (bool) – Whether or not to normalise the data to zero mean and unit variance. Recommended if clustering is based on more than one analyte. Uses sklearn.preprocessing.scale.
- method (str) – Which clustering algorithm to use (see above).
- include_time (bool) – Whether or not to include the Time variable in the clustering analysis. Useful if you’re looking for spatially continuous clusters in your data, i.e. this will identify each spot in your analysis as an individual cluster.
- sort (bool, str or array-like) – Whether or not to label the resulting clusters according to their contents. If used, the cluster with the lowest mean values will be labelled 0, with labels increasing in order of cluster mean value. The sorting rules depend on the value of ‘sort’, which can be the name of a single analyte (str), a list of several analyte names (array-like), or True (bool) to use all analytes used to calculate the clusters.
- min_data (int) – The minimum number of data points that should be considered by the filter. Default = 10.
- **kwargs – Parameters passed to the clustering algorithm specified by method.
Meanshift Parameters:
- bandwidth (str or float) – The bandwith (float) or bandwidth method (‘scott’ or ‘silverman’) used to estimate the data bandwidth.
- bin_seeding (bool) – Modifies the behaviour of the meanshift algorithm. Refer to sklearn.cluster.meanshift documentation.
K-Means Parameters:
- n_clusters (int) – The number of clusters expected in the data.
DBSCAN Parameters:
- eps (float) – The minimum ‘distance’ points must be apart for them to be in the same cluster. Defaults to 0.3. Note: If the data are normalised (they should be for DBSCAN) this is in terms of total sample variance. Normalised data have a mean of 0 and a variance of 1.
- min_samples (int) – The minimum number of samples within distance eps required to be considered as an independent cluster.
- n_clusters – The number of clusters expected. If specified, eps will be incrementally reduced until the expected number of clusters is found.
- maxiter (int) – The maximum number of iterations DBSCAN will run.
Returns: Return type:
-
filter_correlation
(x_analyte, y_analyte, window=15, r_threshold=0.9, p_threshold=0.05, filt=True, recalc=False)[source]¶ Calculate correlation filter.
Parameters: - y_analyte (x_analyte,) – The names of the x and y analytes to correlate.
- window (int, None) – The rolling window used when calculating the correlation.
- r_threshold (float) – The correlation index above which to exclude data. Note: the absolute pearson R value is considered, so negative correlations below -r_threshold will also be excluded.
- p_threshold (float) – The significance level below which data are excluded.
- filt (bool) – Whether or not to apply existing filters to the data before calculating this filter.
- recalc (bool) – If True, the correlation is re-calculated, even if it is already present.
Returns: Return type:
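The filter's core operation is a rolling-window Pearson correlation: wherever |r| within a window exceeds r_threshold, the points in that window are excluded. The sketch below is a pure-Python illustration with hypothetical traces, not the latools implementation (which also applies the p_threshold test).

```python
def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

x = [1, 2, 3, 4, 5, 1, 3, 2, 5, 4]   # hypothetical analyte traces
y = [2, 4, 6, 8, 10, 3, 1, 4, 2, 5]
window, r_threshold = 5, 0.9

keep = [True] * len(x)
for i in range(len(x) - window + 1):
    if abs(pearson(x[i:i + window], y[i:i + window])) > r_threshold:
        for j in range(i, i + window):   # exclude every point in the window
            keep[j] = False
print(keep)
```

The strongly correlated first half of the traces is excluded; the uncorrelated second half is kept.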
-
filter_exclude_downhole
(threshold, filt=True)[source]¶ Exclude all points down-hole (after) the first excluded data.
Parameters:
-
filter_gradient_threshold
(analyte, win, threshold, recalc=True)[source]¶ Apply gradient threshold filter.
Generates threshold filters for the given analytes above and below the specified threshold.
- Two filters are created with prefixes ‘_above’ and ‘_below’.
- ‘_above’ keeps all the data above the threshold. ‘_below’ keeps all the data below the threshold.
i.e. to select data below the threshold value, you should turn the ‘_above’ filter off.
Parameters: Returns: Return type:
-
filter_new
(name, filt_str)[source]¶ Make new filter from combination of other filters.
Parameters: - name (str) – The name of the new filter. Should be unique.
- filt_str (str) – A logical combination of partial strings which will create the new filter. For example, ‘Albelow & Mnbelow’ will combine all filters that partially match ‘Albelow’ with those that partially match ‘Mnbelow’ using the ‘AND’ logical operator.
Returns: Return type:
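The logical combination boils down to elementwise boolean operations on existing filters. A toy sketch with hypothetical filter arrays:

```python
# Two existing filters, as boolean arrays (hypothetical values):
Albelow = [True, True, False, True]
Mnbelow = [True, False, False, True]

# 'Albelow & Mnbelow' combines them with a logical AND:
combined = [a and m for a, m in zip(Albelow, Mnbelow)]
print(combined)
```

An ‘|’ in the filter string would combine the arrays with a logical OR instead.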
-
filter_report
(filt=None, analytes=None, savedir=None, nbin=5)[source]¶ Visualise effect of data filters.
Parameters: Returns: Return type: (fig, axes)
-
filter_threshold
(analyte, threshold)[source]¶ Apply threshold filter.
Generates threshold filters for the given analytes above and below the specified threshold.
- Two filters are created with prefixes ‘_above’ and ‘_below’.
- ‘_above’ keeps all the data above the threshold. ‘_below’ keeps all the data below the threshold.
i.e. to select data below the threshold value, you should turn the ‘_above’ filter off.
Parameters: - analyte (str) – The analyte that the filter applies to.
- threshold (float) – The threshold value. Data above and below this value are separated into the ‘_above’ and ‘_below’ filters.
Returns: Return type:
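The paired ‘_above’/‘_below’ filters are complementary boolean masks. A sketch with hypothetical concentrations (names are illustrative, not latools internals):

```python
conc = [0.5, 1.2, 2.5, 0.8, 3.1]
threshold = 1.0

filters = {
    'Al_thresh_above': [c >= threshold for c in conc],
    'Al_thresh_below': [c < threshold for c in conc],
}

# To select data below the threshold, switch the '_above' filter off
# and apply only the '_below' mask:
below = [c for c, keep in zip(conc, filters['Al_thresh_below']) if keep]
print(below)
```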
-
filter_trim
(start=1, end=1, filt=True)[source]¶ Remove points from the start and end of filter regions.
Parameters: - end (start,) – The number of points to remove from the start and end of the specified filter.
- filt (valid filter string or bool) – Which filter to trim. If True, applies to currently active filters.
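Trimming amounts to shrinking each contiguous True run of a filter by start points at its beginning and end points at its end. A pure-Python sketch (illustrative, not the latools implementation):

```python
def trim_filter(mask, start=1, end=1):
    """Shrink every contiguous True run in `mask` by `start`/`end` points."""
    out = [False] * len(mask)
    i = 0
    while i < len(mask):
        if mask[i]:
            j = i
            while j < len(mask) and mask[j]:
                j += 1                      # j marks the end of this True run
            for k in range(i + start, j - end):
                out[k] = True
            i = j
        else:
            i += 1
    return out

print(trim_filter([False, True, True, True, True, False, True, True]))
```

Runs shorter than start + end points disappear entirely, which is useful for removing edge effects at filter boundaries.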
-
get_params
()[source]¶ Returns parameters used to process data.
Returns: dict of analysis parameters Return type: dict
-
gplot
(analytes=None, win=5, figsize=[10, 4], ranges=False, focus_stage=None, ax=None)[source]¶ Plot analyte gradients as a function of Time.
Parameters: Returns: Return type: figure, axis
-
mkrngs
()[source]¶ Transform boolean arrays into list of limit pairs.
Gets Time limits of signal/background boolean arrays and stores them as sigrng and bkgrng arrays. These arrays can be saved by ‘save_ranges’ in the analyse object.
-
optimisation_plot
(overlay_alpha=0.5, **kwargs)[source]¶ Plot the result of signal_optimise.
signal_optimiser must be run first, and the output stored in the opt attribute of the latools.D object.
Parameters: - d (latools.D object) – A latools data object.
- overlay_alpha (float) – The opacity of the threshold overlays. Between 0 and 1.
- **kwargs – Passed to tplot
-
ratio
(internal_standard=None)[source]¶ Divide all analytes by a specified internal_standard analyte.
Parameters: internal_standard (str) – The analyte used as the internal_standard. Returns: Return type: None
-
sample_stats
(analytes=None, filt=True, stat_fns={}, eachtrace=True)[source]¶ Calculate sample statistics
Returns samples, analytes, and arrays of statistics of shape (samples, analytes). Statistics are calculated from the ‘focus’ data variable, so output depends on how the data have been processed.
Parameters: - analytes (array_like) – List of analytes to calculate the statistic on
- filt (bool or str) –
- The filter to apply to the data when calculating sample statistics.
- bool: True applies filter specified in filt.switches. str: logical string specifying a particular filter
- stat_fns (dict) – Dict of {name: function} pairs. Functions that take a single array_like input, and return a single statistic. Function should be able to cope with NaN values.
- eachtrace (bool) – True: per-ablation statistics. False: whole-sample statistics.
Returns: Return type:
-
setfocus
(focus)[source]¶ Set the ‘focus’ attribute of the data file.
The ‘focus’ attribute of the object points towards data from a particular stage of analysis. It is used to identify the ‘working stage’ of the data. Processing functions operate on the ‘focus’ stage, so if steps are done out of sequence, things will break.
Names of analysis stages:
- ‘rawdata’: raw data, loaded from csv file when object is initialised.
- ‘despiked’: despiked data.
- ‘signal’/’background’: isolated signal and background data, padded with np.nan. Created by self.separate, after signal and background regions have been identified by self.autorange.
- ‘bkgsub’: background subtracted data, created by self.bkg_correct
- ‘ratios’: element ratio data, created by self.ratio.
- ‘calibrated’: ratio data calibrated to standards, created by self.calibrate.
Parameters: focus (str) – The name of the analysis stage desired. Returns: Return type: None
-
signal_optimiser
(analytes, min_points=5, threshold_mode='kde_first_max', threshold_mult=1.0, x_bias=0, weights=None, filt=True, mode='minimise')[source]¶ Optimise data selection based on specified analytes.
Identifies the longest possible contiguous data region in the signal where the relative standard deviation (std) and concentration of all analytes are minimised.
Optimisation is performed via a grid search of all possible contiguous data regions. For each region, the mean std and mean scaled analyte concentration (‘amplitude’) are calculated.
The size and position of the optimal data region are identified using threshold std and amplitude values. Thresholds are derived from all calculated stds and amplitudes using the method specified by threshold_mode. For example, using the ‘kde_max’ method, a probability density function (PDF) is calculated for std and amplitude values, and the threshold is set as the maximum of the PDF. These thresholds are then used to identify the size and position of the longest contiguous region where the std is below the threshold, and the amplitude is also below the threshold (in ‘minimise’ mode).
All possible regions of the data that have at least min_points are considered.
For a graphical demonstration of the action of signal_optimiser, use optimisation_plot.
Parameters: - d (latools.D object) – A latools data object.
- analytes (str or array-like) – Which analytes to consider.
- min_points (int) – The minimum number of contiguous points to consider.
- threshold_mode (str) – The method used to calculate the optimisation thresholds. Can be ‘mean’, ‘median’, ‘kde_max’ or ‘bayes_mvs’, or a custom function. If a function, must take a 1D array, and return a single, real number.
- weights (array-like of length len(analytes)) – An array of numbers specifying the importance of each analyte considered. Larger number makes the analyte have a greater effect on the optimisation. Default is None.
-
tplot
(analytes=None, figsize=[10, 4], scale='log', filt=None, ranges=False, stats=False, stat='nanmean', err='nanstd', focus_stage=None, err_envelope=False, ax=None)[source]¶ Plot analytes as a function of Time.
Parameters: - analytes (array_like) – list of strings containing names of analytes to plot. None = all analytes.
- figsize (tuple) – size of final figure.
- scale (str or None) – ‘log’ = plot data on log scale
- filt (bool, str or dict) – False: plot unfiltered data. True: plot filtered data over unfiltered data. str: apply the named filter key to all analytes. dict: apply a separate key to each analyte; must contain all analytes plotted. Can use self.filt.keydict.
- ranges (bool) – show signal/background regions.
- stats (bool) – plot average and error of each trace, as specified by stat and err.
- stat (str) – average statistic to plot.
- err (str) – error statistic to plot.
Returns: Return type: figure, axis
Filtering¶
-
class
latools.filtering.filt_obj.
filt
(size, analytes)[source]¶ Bases:
object
Container for creating, storing and selecting data filters.
Parameters: - size (int) – The length that the filters need to be (should be the same as your data).
- analytes (array_like) – A list of the analytes measured in your data.
-
size
¶ int – The length that the filters need to be (should be the same as your data).
-
analytes
¶ array_like – A list of the analytes measured in your data.
-
components
¶ dict – A dict containing each individual filter that has been created.
-
info
¶ dict – A dict containing descriptive information about each filter in components.
-
params
¶ dict – A dict containing the parameters used to create each filter, which can be passed directly to the corresponding filter function to recreate the filter.
-
switches
¶ dict – A dict of boolean switches specifying which filters are active for each analyte.
-
keys
¶ dict – A dict of logical strings specifying which filters are applied to each analyte.
-
sequence
¶ dict – A numbered dict specifying what order the filters were applied in (for some filters, order matters).
-
n
¶ int – The number of filters applied to the data.
-
add
(name, filt, info='', params=(), setn=None)[source]¶ Add filter.
Parameters: Returns: Return type:
-
fuzzmatch
(fuzzkey, multi=False)[source]¶ Identify a filter by fuzzy string matching.
Partial (‘fuzzy’) matching performed by fuzzywuzzy.fuzz.ratio
Parameters: fuzzkey (str) – A string that partially matches one filter name more than the others. Returns: The name of the most closely matched filter. Return type: str
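The behaviour of fuzzmatch can be illustrated with the standard library's difflib (latools itself uses fuzzywuzzy, which has a similar ratio-based scoring); the filter names below are hypothetical:

```python
import difflib

# Hypothetical filter names, as they might appear in print(filt).
names = ['above_threshold_Mg24', 'below_threshold_Al27', 'signal_optimiser']

# Pick the name whose similarity ratio to the partial key is highest.
best = max(names,
           key=lambda n: difflib.SequenceMatcher(None, 'thresh_Mg', n).ratio())
print(best)  # above_threshold_Mg24
```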
-
get_components
(key, analyte=None)[source]¶ Extract filter components for specific analyte(s).
Parameters: Returns: boolean filter
Return type: array-like
-
grab_filt
(filt, analyte=None)[source]¶ Flexible access to specific filter using any key format.
Parameters: Returns: boolean filter
Return type: array_like
-
make
(analyte)[source]¶ Make filter for specified analyte(s).
Filter specified in filt.switches.
Parameters: analyte (str or array_like) – Name or list of names of analytes. Returns: boolean filter Return type: array_like
-
make_fromkey
(key)[source]¶ Make filter from logical expression.
Takes a logical expression as an input, and returns a filter. Used for advanced filtering, where combinations of nested and/or filters are desired. Filter names must exactly match the names listed by print(filt).
Example:
key = '(Filter_1 | Filter_2) & Filter_3'
is equivalent to: (Filter_1 OR Filter_2) AND Filter_3
Statements in parentheses are evaluated first. Parameters: key (str) – logical expression describing filter construction. Returns: boolean filter Return type: array_like
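The key-evaluation idea can be sketched with plain numpy boolean arrays. This is an illustration of the logic, not the latools implementation; the filter names and components are assumptions:

```python
import numpy as np

# Hypothetical stored filter components (boolean arrays of equal length).
components = {
    'Filter_1': np.array([True, False, True, False]),
    'Filter_2': np.array([False, True, False, False]),
    'Filter_3': np.array([True, True, False, True]),
}

key = '(Filter_1 | Filter_2) & Filter_3'

# numpy's & and | operate element-wise on boolean arrays, so the logical
# expression can be evaluated directly in a namespace containing only
# the filter components.
mask = eval(key, {'__builtins__': {}}, components)
print(mask)  # [ True  True False False]
```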
-
make_keydict
(analyte=None)[source]¶ Make logical expressions describing the filter(s) for specified analyte(s).
Parameters: analyte (optional, str or array_like) – Name or list of names of analytes. Defaults to all analytes. Returns: containing the logical filter expression for each analyte. Return type: dict
-
off
(analyte=None, filt=None)[source]¶ Turn off specified filter(s) for specified analyte(s).
Parameters: Returns: Return type:
Functions for automatic selection optimisation.
-
latools.filtering.signal_optimiser.
bayes_scale
(s)[source]¶ Remove mean and divide by standard deviation, using bayes_mvs statistics.
-
latools.filtering.signal_optimiser.
calc_window_mean_std
(s, min_points, ind=None)[source]¶ Calculate the mean and standard deviation of all contiguous regions in s that have at least min_points.
-
latools.filtering.signal_optimiser.
calc_windows
(fn, s, min_points)[source]¶ Apply fn to all contiguous regions in s that have at least min_points.
-
latools.filtering.signal_optimiser.
calculate_optimisation_stats
(d, analytes, min_points, weights, ind, x_bias=0)[source]¶
-
latools.filtering.signal_optimiser.
optimisation_plot
(d, overlay_alpha=0.5, **kwargs)[source]¶ Plot the result of signal_optimiser.
signal_optimiser must be run first, and the output stored in the opt attribute of the latools.D object.
Parameters: - d (latools.D object) – A latools data object.
- overlay_alpha (float) – The opacity of the threshold overlays. Between 0 and 1.
- **kwargs – Passed to tplot
-
latools.filtering.signal_optimiser.
scale
(s)[source]¶ Remove the mean, and divide by the standard deviation.
-
latools.filtering.signal_optimiser.
scaler
(s)¶ Remove median, divide by IQR.
-
latools.filtering.signal_optimiser.
signal_optimiser
(d, analytes, min_points=5, threshold_mode='kde_first_max', threshold_mult=1.0, x_bias=0, weights=None, ind=None, mode='minimise')[source]¶ Optimise data selection based on specified analytes.
Identifies the longest possible contiguous data region in the signal where the relative standard deviation (std) and concentration of all analytes is minimised.
Optimisation is performed via a grid search of all possible contiguous data regions. For each region, the mean std and mean scaled analyte concentration (‘amplitude’) are calculated.
The size and position of the optimal data region are identified using threshold std and amplitude values. Thresholds are derived from all calculated stds and amplitudes using the method specified by threshold_mode. For example, using the ‘kde_max’ method, a probability density function (PDF) is calculated for the std and amplitude values, and the threshold is set at the maximum of the PDF. These thresholds are then used to identify the size and position of the longest contiguous region where the std is below the std threshold, and the amplitude is below (or above, depending on mode) the amplitude threshold.
All possible regions of the data that have at least min_points are considered.
For a graphical demonstration of the action of signal_optimiser, use optimisation_plot.
Parameters: - d (latools.D object) – A latools data object.
- analytes (str or array-like) – Which analytes to consider.
- min_points (int) – The minimum number of contiguous points to consider.
- threshold_mode (str) – The method used to calculate the optimisation thresholds. Can be ‘mean’, ‘median’, ‘kde_max’ or ‘bayes_mvs’, or a custom function. If a function, must take a 1D array, and return a single, real number.
- threshold_mult (float or tuple) – A multiplier applied to the calculated threshold before use. If a tuple, the first value is applied to the mean threshold, and the second is applied to the standard deviation threshold. Reduce this to make data selection more stringent.
- x_bias (float) – If non-zero, a bias is applied to the calculated statistics to prefer the beginning (if > 0) or end (if < 0) of the signal. Should be between zero and 1.
- weights (array-like of length len(analytes)) – An array of numbers specifying the importance of each analyte considered. Larger numbers give the analyte a greater effect on the optimisation. Default is None.
- ind (boolean array) – A boolean array the same length as the data. Where false, data will not be included.
- mode (str) – Whether to ‘minimise’ or ‘maximise’ the concentration of the elements.
Returns: dict, str
Return type: optimisation result, error message
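The grid-search idea behind signal_optimiser can be sketched in a few lines of numpy. This is an illustrative simplification (a single analyte and a fixed, user-supplied std threshold rather than one derived by threshold_mode), not the latools implementation:

```python
import numpy as np

def longest_quiet_window(s, std_threshold, min_points=5):
    """Return (start, stop) of the longest contiguous window of at least
    min_points whose standard deviation is below std_threshold."""
    s = np.asarray(s, dtype=float)
    best = None
    # Grid search over all contiguous windows (O(n^2); fine for a sketch).
    for i in range(len(s)):
        for j in range(i + min_points, len(s) + 1):
            if s[i:j].std() <= std_threshold:
                cand = (j - i, i, j)  # prefer longer windows
                if best is None or cand > best:
                    best = cand
    if best is None:
        return None
    return best[1], best[2]

# Noisy start, then a stable 'signal' plateau of 30 points.
sig = np.r_[np.tile([0.0, 20.0], 10), np.full(30, 10.0)]
print(longest_quiet_window(sig, std_threshold=1.0))  # (20, 50)
```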
-
class
latools.filtering.classifier_obj.
classifier
(analytes, sort_by=0)[source]¶ Bases:
object
-
fit
(data, method='kmeans', **kwargs)[source]¶ Fit classifiers from a large dataset.
Parameters: - data (dict) – A dict of data for clustering. Must contain items with the same name as analytes used for clustering.
- method (str) –
A string defining the clustering method used. Can be:
- ’kmeans’ : K-Means clustering algorithm
- ’meanshift’ : Meanshift algorithm
- n_clusters (int) – K-Means only. The number of clusters to identify.
- bandwidth (float) – Meanshift only. The bandwidth value used during clustering. If None, determined automatically. Note: the data are scaled before clustering, so this is not in the same units as the data.
- bin_seeding (bool) – Meanshift only. Whether or not to use ‘bin_seeding’. See documentation for sklearn.cluster.MeanShift.
- **kwargs – passed to sklearn.cluster.MeanShift.
Returns: Return type:
-
fit_kmeans
(data, n_clusters, **kwargs)[source]¶ Fit KMeans clustering algorithm to data.
Parameters: - data (array-like) – A dataset formatted by classifier.fitting_data.
- n_clusters (int) – The number of clusters in the data.
- **kwargs – passed to sklearn.cluster.KMeans.
Returns: Return type: Fitted sklearn.cluster.KMeans object.
-
fit_meanshift
(data, bandwidth=None, bin_seeding=False, **kwargs)[source]¶ Fit MeanShift clustering algorithm to data.
Parameters: - data (array-like) – A dataset formatted by classifier.fitting_data.
- bandwidth (float) – The bandwidth value used during clustering. If None, determined automatically. Note: the data are scaled before clustering, so this is not in the same units as the data.
- bin_seeding (bool) – Whether or not to use ‘bin_seeding’. See documentation for sklearn.cluster.MeanShift.
- **kwargs – passed to sklearn.cluster.MeanShift.
Returns: Return type: Fitted sklearn.cluster.MeanShift object.
-
fitting_data
(data)[source]¶ Function to format data for cluster fitting.
Parameters: data (dict) – A dict of data, containing all elements of analytes as items. Returns: Return type: A data array for initial cluster fitting.
-
format_data
(data, scale=True)[source]¶ Function for converting a dict to an array suitable for sklearn.
Parameters: Returns: Return type: A data array suitable for use with sklearn.cluster.
-
map_clusters
(size, sampled, clusters)[source]¶ Translate cluster identity back to original data size.
Parameters: - size (int) – size of original dataset
- sampled (array-like) – integer array describing location of finite values in original data.
- clusters (array-like) – integer array of cluster identities
Returns: - list of cluster identities the same length as original
- data. Where original data are non-finite, returns -2.
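The mapping behaviour described above can be sketched directly in numpy; this is a minimal illustration of the documented behaviour, not the latools source:

```python
import numpy as np

def map_clusters(size, sampled, clusters):
    """Expand cluster identities back to the original data length.
    Positions not in `sampled` (i.e. non-finite in the original data)
    are filled with -2, as described in the docstring."""
    out = np.full(size, -2, dtype=int)
    out[sampled] = clusters
    return out

# 6 original points; points 1 and 4 were non-finite and not clustered.
out = map_clusters(6, np.array([0, 2, 3, 5]), np.array([0, 1, 1, 0]))
print(out)  # [ 0 -2  1  1 -2  0]
```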
-
predict
(data)[source]¶ Label new data with cluster identities.
Parameters: Returns: Return type: array of clusters the same length as the data.
-
Configuration¶
Note: the entire config module is available at the top level (i.e. latools.config
).
-
latools.helpers.config.
copy_SRM_file
(destination=None, config='DEFAULT')[source]¶ Creates a copy of the default SRM table at the specified location.
Parameters: - destination (str) – The save location for the SRM file. If no location specified, saves it as ‘LAtools_[config]_SRMTable.csv’ in the current working directory.
- config (str) – It’s possible to set up different configurations with different SRM files. This specifies the name of the configuration that you want to copy the SRM file from. If not specified, the ‘DEFAULT’ configuration is used.
-
latools.helpers.config.
create
(config_name, srmfile=None, dataformat=None, base_on='DEFAULT', make_default=False)[source]¶ Adds a new configuration to latools.cfg.
Parameters: - config_name (str) – The name of the new configuration. This should be descriptive (e.g. UC Davis Foram Group)
- srmfile (str (optional)) – The location of the srm file used for calibration.
- dataformat (str (optional)) – The location of the dataformat definition to use.
- base_on (str) – The name of the existing configuration to base the new one on. If either srm_file or dataformat are not specified, the new config will copy this information from the base_on config.
- make_default (bool) – Whether or not to make the new configuration the default for future analyses. Default = False.
Returns: Return type:
-
latools.helpers.config.
get_dataformat_template
(destination='./LAtools_dataformat_template.json')[source]¶ Copies a data format description JSON template to the specified location.
-
latools.helpers.config.
read_configuration
(config='DEFAULT')[source]¶ Read LAtools configuration file, and return parameters as dict.
-
latools.helpers.config.
read_latoolscfg
()[source]¶ Reads configuration, returns a ConfigParser object.
Distinct from read_configuration, which returns a dict.
-
latools.helpers.config.
test_dataformat
(data_file, dataformat_file, name_mode='file_names')[source]¶ Test a dataformat file against a particular data file.
This goes through all the steps of data import printing out the results of each step, so you can see where the import fails.
Parameters: - data_file (str) – Path to data file, including extension.
- dataformat (dict or str) – A dataformat dict, or path to file. See example below.
- name_mode (str) – How to identify sample names. If ‘file_names’, uses the input name of the file, stripped of the extension. If ‘metadata_names’, uses the ‘name’ attribute of the ‘meta’ sub-dictionary in dataformat. If any other str, uses this str as the sample name.
Example
>>> {'genfromtext_args': {'delimiter': ',', 'skip_header': 4},  # passed directly to np.genfromtxt
     'column_id': {'name_row': 3,  # which row contains the column names
                   'delimiter': ',',  # delimiter between column names
                   'timecolumn': 0,  # which column contains the 'time' variable
                   'pattern': '([A-z]{1,2}[0-9]{1,3})'},  # a regex pattern which captures the column names
     'meta_regex': {  # a dict of {line_no: ([descriptors], regex)} pairs
         0: (['path'], '(.*)'),
         2: (['date', 'method'],  # MUST include date
             '([A-Z][a-z]+ [0-9]+ [0-9]{4}[ ]+[0-9:]+ [amp]+).* ([A-z0-9]+\.m)')}}
Returns: sample, analytes, data, meta Return type: tuple
Preprocessing¶
-
latools.preprocessing.split.
by_regex
(file, outdir=None, split_pattern=None, global_header_rows=0, fname_pattern=None, trim_tail_lines=0, trim_head_lines=0)[source]¶ Split one long analysis file into multiple smaller ones.
Parameters: - file (str) – The path to the file you want to split.
- outdir (str) – The directory to save the split files to. If None, files are saved to a new directory called ‘split’, which is created inside the data directory.
- split_pattern (regex string) – A regular expression that will match lines in the file that mark the start of a new section. Does not have to match the whole line, but must provide a positive match to the lines containing the pattern.
- global_header_rows (int) – How many rows at the start of the file to include in each new sub-file.
- fname_pattern (regex string) – A regular expression that identifies a new file name in the lines identified by split_pattern. If none, files will be called ‘noname_N’. The extension of the main file will be used for all sub-files.
- trim_head_lines (int) – If greater than zero, this many lines are removed from the start of each segment
- trim_tail_lines (int) – If greater than zero, this many lines are removed from the end of each segment
Returns: Path to new directory
Return type:
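The split logic can be sketched with the standard library's re module: lines matching split_pattern start a new section, and fname_pattern extracts a name from those lines. The patterns and file contents below are illustrative assumptions, not a latools data format:

```python
import re

# Hypothetical long-file contents with two embedded samples.
text = """# Sample: foo
1,2
3,4
# Sample: bar
5,6
"""
split_pattern = r'# Sample'          # marks the start of a new section
fname_pattern = r'# Sample: (\w+)'   # captures the new file name

sections, current, name = {}, [], None
for line in text.splitlines():
    if re.search(split_pattern, line):
        m = re.search(fname_pattern, line)
        name = m.group(1) if m else f'noname_{len(sections)}'
        current = sections.setdefault(name, [])
    elif name is not None:
        current.append(line)

print(sorted(sections))  # ['bar', 'foo']
```

In by_regex itself, each collected section would then be written to its own file in outdir, with global_header_rows prepended and trim_head_lines/trim_tail_lines applied.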
Helpers¶
-
latools.processes.data_read.
read_data
(data_file, dataformat, name_mode)[source]¶ Load data_file described by a dataformat dict.
Parameters: - data_file (str) – Path to data file, including extension.
- dataformat (dict) – A dataformat dict, see example below.
- name_mode (str) – How to identify sample names. If ‘file_names’, uses the input name of the file, stripped of the extension. If ‘metadata_names’, uses the ‘name’ attribute of the ‘meta’ sub-dictionary in dataformat. If any other str, uses this str as the sample name.
Example
>>> {'genfromtext_args': {'delimiter': ',', 'skip_header': 4},  # passed directly to np.genfromtxt
     'column_id': {'name_row': 3,  # which row contains the column names
                   'delimiter': ',',  # delimiter between column names
                   'timecolumn': 0,  # which column contains the 'time' variable
                   'pattern': '([A-z]{1,2}[0-9]{1,3})'},  # a regex pattern which captures the column names
     'meta_regex': {  # a dict of {line_no: ([descriptors], regex)} pairs
         0: (['path'], '(.*)'),
         2: (['date', 'method'],  # MUST include date
             '([A-Z][a-z]+ [0-9]+ [0-9]{4}[ ]+[0-9:]+ [amp]+).* ([A-z0-9]+\.m)')}}
Returns: sample, analytes, data, meta Return type: tuple
-
latools.processes.despiking.
expdecay_despike
(sig, expdecay_coef, tstep, maxiter=3)[source]¶ Apply exponential decay filter to remove physically impossible data based on instrumental washout.
The filter is re-applied until no more points are removed, or maxiter is reached.
Parameters: Returns: Return type:
-
latools.processes.despiking.
noise_despike
(sig, win=3, nlim=24.0, maxiter=4)[source]¶ Apply standard deviation filter to remove anomalous values.
Parameters: Returns: Return type:
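A minimal sketch of the rolling standard-deviation idea behind noise_despike: points far from the mean of their neighbours (relative to the neighbours' spread) are replaced. This illustrates the approach and differs in detail from latools' implementation:

```python
import numpy as np

def noise_despike(sig, win=3, nlim=2.0):
    """Replace points more than nlim neighbour-standard-deviations
    from the neighbour mean with that mean (single pass)."""
    sig = np.asarray(sig, dtype=float).copy()
    half = win // 2
    for i in range(half, len(sig) - half):
        # neighbours in the window, excluding the point itself
        window = np.r_[sig[i - half:i], sig[i + 1:i + half + 1]]
        # small constant guards against zero std in flat regions
        if abs(sig[i] - window.mean()) > nlim * (window.std() + 1e-12):
            sig[i] = window.mean()
    return sig

out = noise_despike([1.0, 1.0, 10.0, 1.0, 1.0])
print(out)  # [1. 1. 1. 1. 1.]
```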
-
latools.processes.signal_id.
autorange
(t, sig, gwin=7, swin=None, win=30, on_mult=(1.5, 1.0), off_mult=(1.0, 1.5), nbin=10, transform='log', thresh=None)[source]¶ Automatically separates signal and background in an on/off data stream.
Step 1: Thresholding. The background signal is determined using a gaussian kernel density estimator (kde) of all the data. Under normal circumstances, this kde should find two distinct data distributions, corresponding to ‘signal’ and ‘background’. The minimum between these two distributions is taken as a rough threshold to identify signal and background regions. Any point where the trace crosses this threshold is identified as a ‘transition’.
Step 2: Transition Removal. The width of the transition regions between signal and background is then determined, and the transitions are excluded from analysis. The width of the transitions is determined by fitting a gaussian to the smoothed first derivative of the analyte trace, and determining its width at the point where the gaussian intensity is conf times the gaussian maximum. These gaussians are fit to subsets of the data centred on the transition regions determined in Step 1, +/- win data points. The peak is further isolated by finding the minima and maxima of a second derivative within this window, and the gaussian is fit to the isolated peak.
Parameters: - t (array-like) – Independent variable (usually time).
- sig (array-like) – Dependent signal, with distinctive ‘on’ and ‘off’ regions.
- gwin (int) – The window used for calculating first derivative. Defaults to 7.
- swin (int) – The window used for signal smoothing. If None, gwin // 2 is used.
- win (int) – The width (c +/- win) of the transition data subsets. Defaults to 30.
- on_mult and off_mult (tuple) – Control the width of the excluded transition regions, which is defined relative to the peak full-width-half-maximum (FWHM) of the transition gradient. The region n * FWHM below the transition, and m * FWHM above the transition, will be excluded, where (n, m) are specified in on_mult and off_mult. on_mult and off_mult apply to the off-on and on-off transitions, respectively. Defaults to (1.5, 1) and (1, 1.5).
- transform (str) – How to transform the data. Default is ‘log’.
Returns: fbkg, fsig, ftrn, failed – where fbkg, fsig and ftrn are boolean arrays the same length as sig, that are True where sig is background, signal and transition, respectively. failed contains a list of transition positions where gaussian fitting has failed.
Return type:
-
latools.processes.signal_id.
autorange_components
(t, sig, transform='log', gwin=7, swin=None, win=30, on_mult=(1.5, 1.0), off_mult=(1.0, 1.5), thresh=None)[source]¶ Returns the components underlying the autorange algorithm.
Returns: - t (array-like) – Time axis (independent variable)
- sig (array-like) – Raw signal (dependent variable)
- sigs (array-like) – Smoothed signal (swin)
- tsig (array-like) – Transformed raw signal (transform)
- tsigs (array-like) – Transformed smoothed signal (transform, swin)
- kde_x (array-like) – kernel density estimate of smoothed signal.
- yd (array-like) – bins of kernel density estimator.
- g (array-like) – gradient of smoothed signal (swin, gwin)
- trans (dict) – per-transition data.
- thresh (float) – threshold identified from kernel density plot
Helper functions used by multiple parts of LAtools.
-
class
latools.helpers.helpers.
Bunch
(*args, **kwds)[source]¶ Bases:
dict
-
clear
() → None. Remove all items from D.¶
-
copy
() → a shallow copy of D¶
-
fromkeys
()¶ Returns a new dict with keys from iterable and values equal to value.
-
get
(k[, d]) → D[k] if k in D, else d. d defaults to None.¶
-
items
() → a set-like object providing a view on D's items¶
-
keys
() → a set-like object providing a view on D's keys¶
-
pop
(k[, d]) → v, remove specified key and return the corresponding value.¶ If key is not found, d is returned if given, otherwise KeyError is raised
-
popitem
() → (k, v), remove and return some (key, value) pair as a¶ 2-tuple; but raise KeyError if D is empty.
-
setdefault
(k[, d]) → D.get(k,d), also set D[k]=d if k not in D¶
-
update
([E, ]**F) → None. Update D from dict/iterable E and F.¶ If E is present and has a .keys() method, then does: for k in E: D[k] = E[k] If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v In either case, this is followed by: for k in F: D[k] = F[k]
-
values
() → an object providing a view on D's values¶
-
-
latools.helpers.helpers.
analyte_2_massname
(s)[source]¶ Converts analytes in format ‘Al27’ to ‘27Al’.
Parameters: s (str) – of format [0-9]{1,3}[A-z]{1,3} Returns: Name in format [A-z]{1,3}[0-9]{1,3} Return type: str
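The conversion described here is a simple regex swap; a minimal sketch of the documented behaviour (not the latools source):

```python
import re

def analyte_2_massname(s):
    """Convert 'Al27' style analyte names to '27Al' style."""
    el, mass = re.match(r'([A-Za-z]{1,3})([0-9]{1,3})', s).groups()
    return mass + el

print(analyte_2_massname('Al27'))  # 27Al
```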
-
latools.helpers.helpers.
analyte_2_namemass
(s)[source]¶ Converts analytes in format ‘27Al’ to ‘Al27’.
Parameters: s (str) – of format [A-z]{1,3}[0-9]{1,3} Returns: Name in format [0-9]{1,3}[A-z]{1,3} Return type: str
-
latools.helpers.helpers.
bool_2_indices
(a)[source]¶ Convert boolean array into a 2D array of (start, stop) pairs.
-
latools.helpers.helpers.
calc_grads
(x, dat, keys=None, win=5)[source]¶ Calculate gradients of values in dat.
Parameters: Returns: Return type: dict of gradients
-
latools.helpers.helpers.
collate_data
(in_dir, extension='.csv', out_dir=None)[source]¶ Copy all csvs in a nested directory to a single directory.
Function to copy all csvs from a directory, and place them in a new directory.
Parameters: Returns: Return type:
-
latools.helpers.helpers.
enumerate_bool
(bool_array, nstart=0)[source]¶ Consecutively numbers contiguous booleans in array.
i.e. given a boolean sequence, the resulting numbering is:
T F T T T F T F F F T T F
0 - 1 1 1 - 2 - - - 3 3 -
where ‘-’ marks False values, which are not numbered.
Parameters: - bool_array (array_like) – Array of booleans.
- nstart (int) – The number of the first boolean group.
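The numbering described above can be sketched as a single pass over the array; this illustration uses -1 for False positions (the ‘-’ in the example) and is not the latools source:

```python
import numpy as np

def enumerate_bool(bool_array, nstart=0):
    """Number contiguous runs of True consecutively; False -> -1."""
    a = np.asarray(bool_array, dtype=bool)
    out = np.full(a.size, -1, dtype=int)
    n = nstart
    in_run = False
    for i, v in enumerate(a):
        if v:
            out[i] = n
            in_run = True
        else:
            if in_run:
                n += 1  # a run just ended; next run gets the next number
            in_run = False
    return out

a = [True, False, True, True, True, False, True]
print(enumerate_bool(a))  # [ 0 -1  1  1  1 -1  2]
```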
-
latools.helpers.helpers.
fastgrad
(a, win=11)[source]¶ Returns the rolling-window gradient of a.
Function to efficiently calculate the rolling gradient of a numpy array, using ‘stride_tricks’ to split a 1D array into an ndarray of sub-sections of the original array, of dimensions [len(a) - win, win].
Parameters: - a (array_like) – The 1D array to calculate the rolling gradient of.
- win (int) – The width of the rolling window.
Returns: Gradient of a, assuming a constant integer x-scale.
Return type: array_like
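The stride-tricks idea can be sketched with numpy's sliding_window_view (a safe wrapper around the same mechanism): split the array into overlapping windows, then compute a closed-form least-squares slope per window against an integer x-scale. A sketch of the approach, not the latools implementation:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def fastgrad(a, win=5):
    """Rolling least-squares gradient of a, assuming x = 0, 1, 2, ..."""
    views = sliding_window_view(np.asarray(a, dtype=float), win)
    x = np.arange(win)
    xm = x - x.mean()
    # slope = sum((x - xbar) * y) / sum((x - xbar)^2), vectorised per window
    return (views * xm).sum(1) / (xm ** 2).sum()

a = 2.0 * np.arange(10)      # a line with slope 2 everywhere
g = fastgrad(a, win=5)
print(np.allclose(g, 2.0))  # True
```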
-
latools.helpers.helpers.
fastsmooth
(a, win=11)[source]¶ Returns the rolling-window mean (smooth) of a.
Function to efficiently calculate the rolling mean of a numpy array, using ‘stride_tricks’ to split a 1D array into an ndarray of sub-sections of the original array, of dimensions [len(a) - win, win].
Parameters: - a (array_like) – The 1D array to calculate the rolling mean of.
- win (int) – The width of the rolling window.
Returns: Smoothed a.
Return type: array_like
-
latools.helpers.helpers.
findmins
(x, y)[source]¶ Function to find local minima.
Parameters: y (x,) – 1D arrays of the independent (x) and dependent (y) variables. Returns: Array of points in x where y has a local minimum. Return type: array_like
-
latools.helpers.helpers.
get_date
(datetime, time_format=None)[source]¶ Return a datetime object from a string, with optional time format.
Parameters: - datetime (str) – Date-time as string in any sensible format.
- time_format (datetime str (optional)) – String describing the datetime format. If missing uses dateutil.parser to guess time format.
-
latools.helpers.helpers.
get_total_n_points
(d)[source]¶ Returns the total number of data points in values of dict.
d : dict
-
latools.helpers.helpers.
pretty_element
(s)[source]¶ Returns formatted element name.
Parameters: s (str) – of format [A-Z][a-z]?[0-9]+ Returns: LaTeX formatted string with superscript numbers. Return type: str
-
latools.helpers.helpers.
rolling_window
(a, window, pad=None)[source]¶ Returns an (n, window) rolling-window array of data.
Parameters: - a (array_like) – Array to calculate the rolling window of
- window (int) – The width of the rolling window.
- pad (same as dtype(a)) – The value used to pad the output to the length of a.
Returns: An array of shape (n, window), where n is either len(a) - window if pad is None, or len(a) if pad is not None.
Return type: array_like
-
latools.helpers.helpers.
stack_keys
(ddict, keys, extra=None)[source]¶ Combine elements of ddict into an array of shape (len(ddict[key]), len(keys)).
Useful for preparing data for sklearn.
Parameters: - ddict (dict) – A dict containing arrays or lists to be stacked. Must be of equal length.
- keys (list or str) – The keys of dict to stack. Must be present in ddict.
- extra (list (optional)) – A list of additional arrays to stack. Elements of extra must be the same length as arrays in ddict. Extras are inserted as the first columns of output.
-
latools.helpers.helpers.
tuples_2_bool
(tuples, x)[source]¶ Generate boolean array from list of limit tuples.
Parameters: - tuples (array_like) – [2, n] array of (start, end) values
- x (array_like) – x scale the tuples are mapped to
Returns: boolean array, True where x is between each pair of tuples.
Return type: array_like
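The mapping from (start, end) tuples to a boolean mask can be sketched as an element-wise OR of interval tests; a minimal illustration of the documented behaviour, not the latools source:

```python
import numpy as np

def tuples_2_bool(tuples, x):
    """True where x falls inside any (start, end) pair (inclusive)."""
    x = np.asarray(x)
    out = np.zeros(x.size, dtype=bool)
    for lo, hi in tuples:
        out |= (x >= lo) & (x <= hi)
    return out

x = np.arange(10)
print(tuples_2_bool([(1, 3), (7, 8)], x))
```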
-
class
latools.helpers.helpers.
un_interp1d
(x, y, **kwargs)[source]¶ Bases:
object
object for handling interpolation of values with uncertainties.
-
latools.helpers.helpers.
unitpicker
(a, llim=0.1, denominator=None, focus_stage=None)[source]¶ Determines the most appropriate plotting unit for data.
Parameters: Returns: (multiplier, unit)
Return type:
-
latools.helpers.chemistry.
calc_M
(molecule)[source]¶ Returns molecular weight of molecule.
Where molecule is in standard chemical notation, e.g. ‘CO2’, ‘HCO3’ or B(OH)4
Returns: molecular_weight Return type: float
-
latools.helpers.chemistry.
elements
(all_isotopes=True)[source]¶ Loads a DataFrame of all elements and isotopes.
Scraped from https://www.webelements.com/
Returns: Return type: pandas DataFrame with columns (element, atomic_number, isotope, atomic_weight, percent)
-
latools.helpers.chemistry.
to_mass_fraction
(molar_ratio, massfrac_denominator, numerator_mass, denominator_mass)[source]¶ Converts molar elemental ratios to mass fractions.
Be careful with units.
Parameters: Returns: The mass fraction of the numerator element.
Return type: float or array-like
-
latools.helpers.chemistry.
to_molar_ratio
(massfrac_numerator, massfrac_denominator, numerator_mass, denominator_mass)[source]¶ Converts per-mass concentrations to molar elemental ratios.
Be careful with units.
Parameters: - denominator_mass (numerator_mass,) – The atomic mass of the numerator and denominator.
- massfrac_denominator (massfrac_numerator,) – The per-mass fraction of the numerator and denominator.
Returns: The molar ratio of elements in the material.
Return type: float or array-like
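The unit conversion implied here is: divide each mass fraction by the corresponding atomic mass to get moles, then take the ratio. A sketch of that arithmetic (the atomic masses below are illustrative values for Mg and Ca, not taken from the latools SRM tables):

```python
def to_molar_ratio(massfrac_numerator, massfrac_denominator,
                   numerator_mass, denominator_mass):
    """Convert per-mass fractions to a molar elemental ratio."""
    return ((massfrac_numerator / numerator_mass) /
            (massfrac_denominator / denominator_mass))

# Equal mass fractions of Mg (24.305 g/mol) and Ca (40.078 g/mol)
# give a molar Mg/Ca ratio of 40.078 / 24.305:
r = to_molar_ratio(1.0, 1.0, 24.305, 40.078)
print(round(r, 3))  # 1.649
```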
-
latools.helpers.stat_fns.
H15_se
(x)[source]¶ Calculate the Huber (H15) robust standard error of x.
-
latools.helpers.stat_fns.
H15_std
(x)[source]¶ Calculate the Huber (H15) Robust standard deviation of x.
-
latools.helpers.stat_fns.
gauss
(x, *p)[source]¶ Gaussian function.
Parameters: - x (array_like) – Independent variable.
- *p (parameters unpacked to A, mu, sigma) – A = amplitude, mu = centre, sigma = width
Returns: gaussian described by *p.
Return type: array_like
-
latools.helpers.stat_fns.
gauss_weighted_stats
(x, yarray, x_new, fwhm)[source]¶ Calculate gaussian weighted moving mean, SD and SE.
Parameters: - x (array-like) – The independent variable
- yarray ((n,m) array) – Where n = x.size, and m is the number of dependent variables to smooth.
- x_new (array-like) – The new x-scale to interpolate the data
- fwhm (int) – FWHM of the gaussian kernel.
Returns: (mean, std, se)
Return type:
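The gaussian-weighted moving mean can be sketched as follows: each output point is a weighted average of all y values, with weights drawn from a gaussian centred on the new x position. A simplified single-variable illustration (latools also returns SD and SE, omitted here):

```python
import numpy as np

def gauss_weighted_mean(x, y, x_new, fwhm):
    """Gaussian-weighted moving mean of y, evaluated at x_new."""
    # convert FWHM to gaussian sigma
    sigma = fwhm / (2 * np.sqrt(2 * np.log(2)))
    # (len(x_new), len(x)) weight matrix, one gaussian per output point
    w = np.exp(-0.5 * ((x_new[:, None] - x[None, :]) / sigma) ** 2)
    w /= w.sum(1, keepdims=True)  # normalise weights per row
    return (w * y[None, :]).sum(1)

x = np.linspace(0, 10, 50)
y = np.ones(50)                  # constant signal
m = gauss_weighted_mean(x, y, np.array([2.0, 5.0]), fwhm=1.0)
print(np.allclose(m, 1.0))  # True: weighted mean of a constant is the constant
```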
-
latools.helpers.stat_fns.
unpack_uncertainties
(uarray)[source]¶ Convenience function to unpack nominal values and uncertainties from an
uncertainties.uarray
.Returns: (nominal_values, std_devs)
-
latools.helpers.plot.
autorange_plot
(t, sig, gwin=7, swin=None, win=30, on_mult=(1.5, 1.0), off_mult=(1.0, 1.5), nbin=10, thresh=None)[source]¶ Function for visualising the autorange mechanism.
Parameters: - t (array-like) – Independent variable (usually time).
- sig (array-like) – Dependent signal, with distinctive ‘on’ and ‘off’ regions.
- gwin (int) – The window used for calculating first derivative. Defaults to 7.
- swin (int) – The window used for signal smoothing. If None, gwin // 2.
- win (int) – The width (c +/- win) of the transition data subsets. Defaults to 20.
- on_mult and off_mult (tuple) – Control the width of the excluded transition regions, which is defined relative to the peak full-width-half-maximum (FWHM) of the transition gradient. The region n * FWHM below the transition, and m * FWHM above the transition, will be excluded, where (n, m) are specified in on_mult and off_mult. on_mult and off_mult apply to the off-on and on-off transitions, respectively. Defaults to (1.5, 1) and (1, 1.5).
- nbin (int) – Used to calculate the number of bins in the data histogram. bins = len(sig) // nbin
Returns: Return type: fig, axes
-
latools.helpers.plot.
calibration_drift_plot
(self, analytes=None, ncol=3, save=True)[source]¶ Plot calibration slopes through the entire session.
Parameters: - self (latools.analyse) – Analyse object, containing
- analytes (optional, array_like or str) – The analyte(s) to plot. Defaults to all analytes.
- ncol (int) – Number of columns of plots
- save (bool) – Whether or not to save the plot.
Returns: Return type: (fig, axes)
-
latools.helpers.plot.
calibration_plot
(self, analytes=None, datarange=True, loglog=False, ncol=3, save=True)[source]¶ Plot the calibration lines between measured and known SRM values.
Parameters: - analytes (optional, array_like or str) – The analyte(s) to plot. Defaults to all analytes.
- datarange (boolean) – Whether or not to show the distribution of the measured data alongside the calibration curve.
- loglog (boolean) – Whether or not to plot the data on a log-log scale. This is useful if you have two low standards very close together, and want to check whether your data are between them, or below them.
Returns: Return type: (fig, axes)
-
latools.helpers.plot.
crossplot
(dat, keys=None, lognorm=True, bins=25, figsize=(12, 12), colourful=True, focus_stage=None, denominator=None, mode='hist2d', cmap=None, **kwargs)[source]¶ Plot analytes against each other.
The number of plots is n**2 - n, where n = len(keys).
Parameters: Returns: Return type: (fig, axes)
-
latools.helpers.plot.
filter_report
(Data, filt=None, analytes=None, savedir=None, nbin=5)[source]¶ Visualise effect of data filters.
Parameters: Returns: Return type: (fig, axes)
-
latools.helpers.plot.
gplot
(self, analytes=None, win=25, figsize=[10, 4], ranges=False, focus_stage=None, ax=None, recalc=True)[source]¶ Plot analytes gradients as a function of Time.
Parameters: Returns: Return type: figure, axis
-
latools.helpers.plot.
histograms
(dat, keys=None, bins=25, logy=False, cmap=None, ncol=4)[source]¶ Plot histograms of all items in dat.
Parameters: - dat (dict) – Data in {key: array} pairs.
- keys (array-like) – The keys in dat that you want to plot. If None, all are plotted.
- bins (int) – The number of bins in each histogram (default = 25)
- logy (bool) – If true, y axis is a log scale.
- cmap (dict) – The colours that the different items should be. If None, all are grey.
Returns: Return type: fig, axes
-
latools.helpers.plot.
tplot
(self, analytes=None, figsize=[10, 4], scale='log', filt=None, ranges=False, stats=False, stat='nanmean', err='nanstd', focus_stage=None, err_envelope=False, ax=None)[source]¶ Plot analytes as a function of Time.
Parameters: - analytes (array_like) – list of strings containing names of analytes to plot. None = all analytes.
- figsize (tuple) – size of final figure.
- scale (str or None) – ‘log’ = plot data on log scale
- filt (bool, str or dict) – False: plot unfiltered data. True: plot filtered data over unfiltered data. str: apply the named filter key to all analytes. dict: apply a separate key to each analyte; must contain all analytes plotted. Can use self.filt.keydict.
- ranges (bool) – show signal/background regions.
- stats (bool) – plot average and error of each trace, as specified by stat and err.
- stat (str) – average statistic to plot.
- err (str) – error statistic to plot.
Returns: Return type: figure, axis