SST2.analysis package

Submodules

SST2.analysis.data_plot module

SST2.analysis.data_plot.compare_weight_RMSD(df, x='$Time\\;(\\mu s)$', hue='sim', ener='new_pot', time_ax_name='$Time\\;(\\mu s)$', max_data=50000)

SST2.analysis.data_plot.compute_Tm(temperatures, folding_fraction)

Compute the melting temperature using a sigmoidal curve fit.

Parameters:

temperatures (list): List of temperatures.
folding_fraction (list): List of folded fractions, one per temperature.

Returns:

Tm (float): Melting temperature.
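The docstring does not say which sigmoid compute_Tm fits. The sketch below assumes a two-state logistic model f(T) = 1 / (1 + exp((T - Tm) / k)), so Tm is the temperature at which f = 0.5. To avoid a SciPy dependency, it fits the linearized form logit(f) = (Tm - T) / k with np.polyfit rather than using nonlinear least squares. The helper name estimate_tm is hypothetical.

```python
import numpy as np


def estimate_tm(temperatures, folding_fraction, eps=1e-6):
    """Estimate Tm from a two-state logistic model (hypothetical helper).

    Assumes f(T) = 1 / (1 + exp((T - Tm) / k)), which gives the linear
    relation logit(f) = ln(f / (1 - f)) = (Tm - T) / k.
    """
    t = np.asarray(temperatures, dtype=float)
    f = np.asarray(folding_fraction, dtype=float)
    # Keep only points strictly between 0 and 1, where the logit is finite.
    mask = (f > eps) & (f < 1.0 - eps)
    if mask.sum() < 2:
        raise ValueError("need at least two fractions strictly between 0 and 1")
    logit = np.log(f[mask] / (1.0 - f[mask]))
    # Fit logit = slope * T + intercept, where slope = -1/k and intercept = Tm/k.
    slope, intercept = np.polyfit(t[mask], logit, 1)
    # logit(f) = 0 (that is, f = 0.5) at T = Tm.
    return -intercept / slope
```

With noisy data, a direct nonlinear fit such as scipy.optimize.curve_fit weights the points differently, so treat the linearized fit as a quick approximation.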

SST2.analysis.data_plot.compute_cluster_hdbscan(pca_df, min_cluster_size=50, min_samples=50)

Cluster the data using the HDBSCAN algorithm.

Parameters:

pca_df (pandas.DataFrame): Dataframe with the data to cluster.
min_cluster_size (int, optional): Minimum cluster size. The default is 50.
min_samples (int, optional): Number of samples in a neighbourhood for a point to be considered a core point. The default is 50.

Returns:

clust_serie (pandas.Categorical): Categorical series with the cluster labels.

SST2.analysis.data_plot.compute_cluster_kmean(pca_df, max_cluster=20, random_state=0)

Cluster the data using the KMeans algorithm.

Parameters:

pca_df (pandas.DataFrame): Dataframe with the data to cluster.
max_cluster (int, optional): Maximum number of clusters to test. The default is 20.
random_state (int, optional): Random state for the algorithm. The default is 0.

Returns:

clust_serie (pandas.Categorical): Categorical series with the cluster labels.
kmeans.cluster_centers_ (numpy.ndarray): Cluster centers.

SST2.analysis.data_plot.compute_exchange_prob(df, temp_col='Aim Temp (K)', time_ax_name='$Time\\;(\\mu s)$', exchange_time=2)

Compute the exchange probability and the round-trip time from a dataframe produced by an SST2 simulation. The dataframe must contain a temperature column and a time column.

Parameters:

df (pandas.DataFrame): Dataframe with the SST2 simulation data.
temp_col (str, optional): Name of the temperature column. The default is "Aim Temp (K)".
time_ax_name (str, optional): Name of the time column. The default is r"$Time\;(\mu s)$".
exchange_time (float, optional): Time in ps between two exchange attempts. The default is 2 ps.

Returns:

ex_prob (float): Exchange probability.
trip_time (float): Round-trip time in ns.
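Both quantities can be computed from the sequence of temperatures a walker visits. The sketch below makes three assumptions that compute_exchange_prob may not share. First, it assumes one sample per exchange attempt, spaced exchange_time ps apart. Second, it counts an attempt as accepted when the temperature changes. Third, it defines a round trip as a walk from the lowest temperature to the highest and back. The name exchange_stats is hypothetical.

```python
import numpy as np


def exchange_stats(temps, exchange_time_ps=2.0):
    """Return (exchange probability, mean round-trip time in ns).

    Assumes one sample per exchange attempt, spaced exchange_time_ps apart.
    """
    temps = np.asarray(temps, dtype=float)
    # An attempt counts as accepted when the temperature changes.
    accepted = temps[1:] != temps[:-1]
    ex_prob = float(accepted.mean()) if accepted.size else float("nan")

    t_min, t_max = temps.min(), temps.max()
    durations = []  # completed round trips, in number of attempts
    start, reached_top = None, False
    for i, t in enumerate(temps):
        if t == t_min:
            if start is None:
                start = i  # first arrival at the lowest temperature
            elif reached_top:
                # Back at T_min after visiting T_max: one round trip is done,
                # and the next one starts here.
                durations.append(i - start)
                start, reached_top = i, False
        elif t == t_max and start is not None:
            reached_top = True

    if not durations:
        return ex_prob, float("nan")
    # attempts -> ps -> ns
    trip_time_ns = float(np.mean(durations)) * exchange_time_ps / 1000.0
    return ex_prob, trip_time_ns
```

For example, the walk 300, 310, 320, 310, 300, 300, 310, 320, 310, 300 K has 8 accepted moves out of 9 attempts. It completes two round trips, of 4 and 5 attempts.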

SST2.analysis.data_plot.compute_folding_fraction(df, col='RMSD (nm)', cutoff=0.18, temp_col='Aim Temp (K)', temp_list=None)

Compute the fraction of folded protein at each temperature.

Parameters:

df (pandas.DataFrame): Dataframe with the data to analyze.
col (str, optional): Column used to compute the folding fraction. The default is "RMSD (nm)".
cutoff (float, optional): Cutoff value for the folding fraction. The default is 0.18 nm.
temp_col (str, optional): Column with the temperature. The default is "Aim Temp (K)".
temp_list (list, optional): List of temperatures to use. The default is None.

Returns:

fold_frac (list): List of folded fractions.
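The folded fraction at a given temperature is the share of frames whose RMSD lies below the cutoff. A minimal pandas sketch follows. It assumes a strict "<" comparison and that temp_list only reorders or selects temperatures. The helper name folded_fraction_by_temp is hypothetical.

```python
import pandas as pd


def folded_fraction_by_temp(df, col="RMSD (nm)", cutoff=0.18,
                            temp_col="Aim Temp (K)", temp_list=None):
    """Fraction of rows with col < cutoff at each temperature."""
    # 1.0 for a folded frame, 0.0 otherwise; the group mean is the fraction.
    is_folded = (df[col] < cutoff).astype(float)
    frac = is_folded.groupby(df[temp_col]).mean()
    if temp_list is not None:
        # Follow the order of temp_list; temperatures never visited give NaN.
        frac = frac.reindex(temp_list)
    return frac.tolist()
```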

SST2.analysis.data_plot.compute_folding_fraction_RMSD(df, col='RMSD (nm)', temp_col='Aim Temp (K)', cutoff=0.18, start_time=0, time_ax_name='$Time\\;(\\mu s)$', ref_fold_frac=None, time_interval=2.0)

SST2.analysis.data_plot.compute_moving_average(df, ener='new_pot', col_name='avg_ener')

SST2.analysis.data_plot.compute_weight_RMSD(df, final_weight_dict=None, ener='new_pot')

SST2.analysis.data_plot.count_clust_transition(df, dt=None, sim_name_col='sim', clust_col='clust', time_ax_name='$Time\\;(\\mu s)$')

SST2.analysis.data_plot.count_rmsd_transition(df, rmsd_fold=0.2, rmsd_unfold=0.4, dt=None, sim_name_col='sim', rmsd_col='RMSD (nm)', time_ax_name='$Time\\;(\\mu s)$')

SST2.analysis.data_plot.filter_df(df, max_point_number)

Filter a dataframe to keep a maximum number of data points. The dataframe is subsampled with a regular step size, chosen so that about max_point_number data points are kept.

Parameters:

df (pandas.DataFrame): Dataframe to filter.
max_point_number (int): Maximum number of data points to keep.

Returns:

local_df (pandas.DataFrame): Filtered dataframe.
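One way to choose the step, assuming the goal is a hard upper bound, is to use a step of ceil(len(df) / max_point_number). This guarantees that no more than max_point_number rows are kept. filter_df may round differently, and subsample_rows is a hypothetical name.

```python
import math

import pandas as pd


def subsample_rows(df: pd.DataFrame, max_point_number: int) -> pd.DataFrame:
    """Keep every step-th row so that at most max_point_number rows remain."""
    if max_point_number <= 0:
        raise ValueError("max_point_number must be positive")
    # ceil() guarantees len(df) / step <= max_point_number.
    step = max(1, math.ceil(len(df) / max_point_number))
    return df.iloc[::step]
```

Taking rows at a regular step, rather than sampling them at random, preserves the time ordering of a trajectory, which matters for line plots.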

SST2.analysis.data_plot.get_quant_min_max(pd_serie, quant=0.001)

Get the minimum and maximum values of a pandas Series from a quantile.

Parameters:

pd_serie (pandas.Series): Pandas Series to analyze.
quant (float, optional): Quantile to use. The default is 0.001.

Returns:

val_min (float): Minimum value.
val_max (float): Maximum value.
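get_quant_min_max presumably trims both tails of the distribution. The sketch below assumes the minimum is the quant quantile and the maximum is the 1 - quant quantile, a common way to set axis limits that ignore rare outliers. quantile_range is a hypothetical name.

```python
import pandas as pd


def quantile_range(series: pd.Series, quant: float = 0.001):
    """Return (low, high): the quant and 1 - quant quantiles of series."""
    if not 0.0 <= quant < 0.5:
        raise ValueError("quant must be in [0, 0.5)")
    return float(series.quantile(quant)), float(series.quantile(1.0 - quant))
```

A typical use is ax.set_xlim(*quantile_range(df["RMSD (nm)"])).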

SST2.analysis.data_plot.plot_distri_norm(df, x, hue, x_label=None, max_data=50000, bins=100, element='step', quant=None, bw_adjust=None)

Plot a distribution plot with a Gaussian filter on the y axis.

Parameters:

df (pandas.DataFrame): Dataframe with the data to plot.
x (str): Name of the column with the x axis data.
hue (str): Name of the column with the hue data.
x_label (str, optional): Label of the x axis. The default is None.
max_data (int, optional): Maximum number of data points to plot. The default is 50000.
bins (int, optional): Number of bins. The default is 100.
element (str, optional): Visual style of the histogram. The default is "step".
quant (float, optional): Quantile used to filter the data. The default is None.
bw_adjust (float, optional): Bandwidth adjustment for the kernel density estimate. The default is None.

Returns:

ax1 (matplotlib.axes._subplots.AxesSubplot): Axes of the plot.

SST2.analysis.data_plot.plot_energie_swap_convergence(df, ener_name='new_pot', lag_num=4, time_ax_name='$Time\\;(\\mu s)$', ylabel='$E_{p}$', split_graph=False, ci=95, avg_start=None)

SST2.analysis.data_plot.plot_energie_swap_convergence_diff(df, ener_name='new_pot', lag_num=4, time_ax_name='$Time\\;(\\mu s)$', ylabel='$E_{p}$', hue=None, color=None, label='$T_{m-1}$ update to $T_{m}$', errorbar=('ci', 95), avg_start=None)

SST2.analysis.data_plot.plot_energie_swap_distri_diff(df, lag_num_list, ener_name='new_pot', time_ax_name='$Time\\;(\\mu s)$', temp_index=1, ylabel='$E_{p}$', hue=None, bins=100, element='step', ci=95, avg_start=0)

SST2.analysis.data_plot.plot_folding_fraction(df, col='RMSD (nm)', cutoff=0.18, label=None, start_time=0, time_ax_name='$Time\\;(\\mu s)$', recompute_temp_flag=True, temp_col='Aim Temp (K)', ref_temp=300.0)

Plot the fraction of folded protein as a function of the temperature.

Parameters:

df (pandas.DataFrame): Dataframe with the data to plot.
col (str, optional): Column used to compute the folding fraction. The default is "RMSD (nm)".
cutoff (float, optional): Cutoff value for the folding fraction. The default is 0.18 nm.
label (str, optional): Label of the plot. The default is None.
start_time (float, optional): Start time of the simulation. The default is 0 µs.
time_ax_name (str, optional): Name of the time axis. The default is r"$Time\;(\mu s)$".
recompute_temp_flag (bool, optional): If True, recompute the temperature. The default is True.
temp_col (str, optional): Column with the temperature. The default is "Aim Temp (K)".
ref_temp (float, optional): Reference temperature in K. The default is 300.0.

Returns:

Tm (float): Melting temperature.

SST2.analysis.data_plot.plot_folding_fraction_RMSD(df, col='RMSD (nm)', cutoff=0.18, label=None, start_time=0, time_ax_name='$Time\\;(\\mu s)$', ref_fold_frac=None, color=None, ls='-', s=20, alpha=1.0, time_interval=2.0)

SST2.analysis.data_plot.plot_folding_fraction_convergence(df, col='RMSD (nm)', cutoff=0.18, label=None, start_time=0, time_ax_name='$Time\\;(\\mu s)$', recompute_temp_flag=False, ref_temp=300.0, time_interval=2.0)

SST2.analysis.data_plot.plot_free_energy(xall, yall, weights=None, ax=None, nbins=100, ncontours=100, avoid_zero_count=False, minener_zero=True, kT=2.479, vmin=None, vmax=None, cmap='nipy_spectral', cbar=True, cbar_label='free energy (kJ/mol)', cax=None, levels=None, cbar_orientation='vertical', norm=None, range=None, level_gap=None)

Plot the free-energy surface computed from a 2D histogram of the data.

Adapted from: https://github.com/markovmodel/PyEMMA/blob/devel/pyemma/plots/plots2d.py

Parameters:

xall (np.array): Array with the x data.
yall (np.array): Array with the y data.
weights (np.array, optional): Array with the weights. The default is None.
ax (matplotlib.axes._subplots.AxesSubplot, optional): Axes of the plot. The default is None.
nbins (int, optional): Number of bins. The default is 100.
ncontours (int, optional): Number of contours. The default is 100.
avoid_zero_count (bool, optional): If True, raise empty bins to the smallest non-zero count so that they do not yield an infinite free energy. The default is False.
minener_zero (bool, optional): If True, shift the free energy so that its minimum is zero. The default is True.
kT (float, optional): Thermal energy kT in kJ/mol. The default is 2.479 (about 298 K).
vmin (float, optional): Minimum value of the color scale. The default is None.
vmax (float, optional): Maximum value of the color scale. The default is None.
cmap (str, optional): Colormap. The default is "nipy_spectral".
cbar (bool, optional): Add a colorbar. The default is True.
cbar_label (str, optional): Label of the colorbar. The default is "free energy (kJ/mol)".
cax (matplotlib.axes._subplots.AxesSubplot, optional): Axes of the colorbar. The default is None.
levels (int, optional): Number of levels. The default is None.
cbar_orientation (str, optional): Orientation of the colorbar. The default is "vertical".
norm (matplotlib.colors.Normalize, optional): Normalize object. The default is None.
range (list, optional): Range of the data. The default is None.
level_gap (float, optional): Gap between levels. The default is None.

Returns:

fig (matplotlib.figure.Figure): Figure of the plot.
ax (matplotlib.axes._subplots.AxesSubplot): Axes of the plot.
misc (dict): Dictionary with the colorbar.
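The quantity being plotted is F = -kT ln p, where p is the normalized (optionally weighted) 2D histogram. The NumPy sketch below computes that grid in the spirit of the PyEMMA routine cited above and leaves the plotting out. The function name free_energy_2d is hypothetical, and the handling of empty bins follows the avoid_zero_count and minener_zero descriptions in the parameter list.

```python
import numpy as np


def free_energy_2d(x, y, weights=None, nbins=100, kT=2.479,
                   avoid_zero_count=False, minener_zero=True):
    """Return (F, xcenters, ycenters), where F = -kT ln p on a 2D grid.

    F has shape (len(ycenters), len(xcenters)), as expected by
    matplotlib's contourf(xcenters, ycenters, F). Empty bins are +inf.
    """
    counts, xedges, yedges = np.histogram2d(x, y, bins=nbins, weights=weights)
    if avoid_zero_count:
        # Lift empty bins to the smallest non-zero count so ln(0) never occurs.
        counts = np.maximum(counts, counts[counts > 0].min())
    prob = counts / counts.sum()
    free = np.full(prob.shape, np.inf)
    filled = prob > 0
    free[filled] = -kT * np.log(prob[filled])
    if minener_zero:
        # Shift so that the most populated bin sits at zero.
        free[filled] -= free[filled].min()
    xcenters = 0.5 * (xedges[:-1] + xedges[1:])
    ycenters = 0.5 * (yedges[:-1] + yedges[1:])
    # histogram2d indexes counts[x_bin, y_bin]; transpose so rows follow y.
    return free.T, xcenters, ycenters
```

With kT = 1 and three samples in one bin against one sample in another, the less populated bin sits ln 3 above the minimum.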

SST2.analysis.data_plot.plot_lineplot_avg(df, x, y, quant=None, color='black', max_data=50000, avg_win=1000, alpha=0.3)

Plot a line plot of y against x, smoothed with a Gaussian filter.

Parameters:

df (pandas.DataFrame): Dataframe with the data to plot.
x (str): Name of the column with the x axis data.
y (str): Name of the column with the y axis data.
quant (float, optional): Quantile used to filter the data. The default is None.
color (str, optional): Color of the line. The default is "black".
max_data (int, optional): Maximum number of data points to plot. The default is 50000.
avg_win (int, optional): Window size of the Gaussian filter. The default is 1000.

Returns:

g (matplotlib.axes._subplots.AxesSubplot): Axes of the plot.
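The docstring does not say how avg_win maps to the filter width. The NumPy sketch below assumes avg_win is the kernel length in samples and that sigma defaults to one sixth of it, so the kernel covers about ±3 sigma. It normalizes by the convolved weights so the ends of the trace are not pulled towards zero. gaussian_smooth is a hypothetical name, and SciPy's gaussian_filter1d would be an alternative if that dependency is acceptable.

```python
import numpy as np


def gaussian_smooth(values, window=1000, sigma=None):
    """Smooth a 1D signal with a normalized Gaussian kernel spanning `window` points.

    Assumption: sigma defaults to window / 6, so the kernel covers about +/-3 sigma.
    """
    y = np.asarray(values, dtype=float)
    if y.size == 0:
        return y
    # Keep the kernel odd (so it is centred) and no longer than the signal,
    # otherwise np.convolve(mode="same") would return `window` samples.
    window = max(1, min(int(window), y.size))
    if window % 2 == 0:
        window -= 1
    sigma = window / 6.0 if sigma is None else float(sigma)
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    half = window // 2
    offsets = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    # Dividing by the convolved weights keeps edge points from being pulled to 0.
    smooth = np.convolve(y, kernel, mode="same")
    weight = np.convolve(np.ones_like(y), kernel, mode="same")
    return smooth / weight
```

A symmetric kernel leaves constant and linear trends unchanged away from the edges, so the smoothed line follows the drift of the raw data without shifting it.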

SST2.analysis.data_plot.plot_rung_occupancy(df, hue='group')

SST2.analysis.data_plot.plot_scatter(df, x, y, hue=None, x_label=None, y_label=None, quant=None, s=10, color=None, linewidth=0, label=None, legend='auto', alpha=None, max_data=50000)

Plot a scatter plot.

Parameters:

df (pandas.DataFrame): Dataframe with the data to plot.
x (str): Name of the column with the x axis data.
y (str): Name of the column with the y axis data.
hue (str, optional): Name of the column with the hue data. The default is None.
x_label (str, optional): Label of the x axis. The default is None.
y_label (str, optional): Label of the y axis. The default is None.
quant (float, optional): Quantile used to filter the data. The default is None.
s (int, optional): Size of the points. The default is 10.
color (str, optional): Color of the points. The default is None.
linewidth (float, optional): Width of the point edges. The default is 0.
label (str, optional): Label of the plot. The default is None.
legend (str, optional): How to draw the legend (for example "auto", "brief", "full" or False, following seaborn). The default is "auto".
alpha (float, optional): Transparency of the points. The default is None.
max_data (int, optional): Maximum number of data points to plot. The default is 50000.

Returns:

g (matplotlib.axes._subplots.AxesSubplot): Axes of the plot.

SST2.analysis.data_plot.plot_weight_RMSD(df, x='$Time\\;(\\mu s)$', hue='Temp (K)', ener='new_pot', time_ax_name='$Time\\;(\\mu s)$', final_weight_dict=None, max_data=50000, plot_weights=False)

SST2.analysis.data_plot.read_SST2_data(generic_name, dt=0.004, full_sep=',', save_step_dcd=100000, lambda_T_ref=300.0)

Read the SST2 data from the CSV files.

Supports both the new per-fraction format:

Step, Aim Temp (K), E frac 0.25 (kJ/mole), …, E solute not scaled (kJ/mole), …

and the legacy single-column format:

Step, Aim Temp (K), E solute scaled (kJ/mole), …

Parameters:

generic_name (str): Generic name of the CSV files (without the .csv extension).
dt (float, optional): Time step in ps. The default is 0.004 ps.
full_sep (str, optional): Separator used in the full CSV file. The default is ",".
save_step_dcd (int, optional): Step interval used when saving the DCD file. The default is 100000.
lambda_T_ref (float, optional): Reference temperature for lambda. The default is 300.0.

Returns:

df_all (pandas.DataFrame): Dataframe with all the data.
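The two formats differ only in their energy columns, so a reader can tell them apart from the header alone. The sketch below assumes the column names are spelled exactly as in the listing above (for example "E frac 0.25 (kJ/mole)"). It ignores the elided "…" columns. The helper detect_energy_format and its return values are hypothetical.

```python
import re

import pandas as pd

# Matches per-fraction columns such as "E frac 0.25 (kJ/mole)".
FRAC_COL = re.compile(r"^E frac ([0-9.]+) \(kJ/mole\)$")


def detect_energy_format(df: pd.DataFrame):
    """Return ("per-fraction", fractions) or ("legacy", []) from column names."""
    fractions = []
    for name in df.columns:
        match = FRAC_COL.match(str(name))
        if match:
            fractions.append(float(match.group(1)))
    if fractions:
        return "per-fraction", sorted(fractions)
    if "E solute scaled (kJ/mole)" in df.columns:
        return "legacy", []
    raise ValueError("no recognised solute-energy column")
```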

SST2.analysis.data_plot.read_ST_data(generic_name, dt=0.004, fields=None, full_sep=',', save_step_dcd=100000, lambda_T_ref=None)

Read SST2/ST data from CSV files, merging restart parts if present.

Supports both the new per-fraction format and the legacy single-column format. If fields is None, all columns are read.

Parameters:

generic_name (str): Generic name of the CSV files (without the .csv extension).
dt (float, optional): Time step in ps. The default is 0.004 ps.
fields (list or None, optional): List of columns to read. If None, all columns are read. The default is None.
full_sep (str, optional): Separator used in the full CSV file. The default is ",".
save_step_dcd (int, optional): Step interval used when saving the DCD file. The default is 100000.
lambda_T_ref (float or None, optional): Reference temperature for lambda. The default is None.

Returns:

df_all (pandas.DataFrame): Dataframe with all the data.
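How read_ST_data locates and merges the restart parts is not documented. The sketch below starts from already-loaded parts, listed in restart order. It assumes that a restarted run may rewrite steps already present in the previous part and that the later copy should win. The helper merge_restart_parts is hypothetical and does no file I/O.

```python
import pandas as pd


def merge_restart_parts(parts, step_col="Step"):
    """Concatenate restart chunks in order, keeping the last row for each step."""
    merged = pd.concat(parts, ignore_index=True)
    # A restarted run may repeat steps already written by the previous part.
    merged = merged.drop_duplicates(subset=step_col, keep="last")
    return merged.sort_values(step_col).reset_index(drop=True)
```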

SST2.analysis.data_plot.recompute_temp(df, ref_temp=300.0)

SST2.analysis.trajectory module

SST2.analysis.trajectory.align_traj(md, ref, ref_Sel, tol_mass=0.1)

SST2.analysis.trajectory.compute_native_contact(md, ref, sel='protein and not name H*', sel_2=None)

SST2.analysis.trajectory.compute_pca(md, ref, sel='backbone', cum_var=0.8)

SST2.analysis.trajectory.prepare_traj(md, ref_Sel, compound='fragments')

SST2.analysis.trajectory.read_traj(start_pdb, generic_name)

Module contents

SST2.analysis module.