SST2.analysis package

Submodules

SST2.analysis.data_plot module

SST2.analysis.data_plot.compare_weight_RMSD(df, x='$Time\\;(\\mu s)$', hue='sim', ener='new_pot', time_ax_name='$Time\\;(\\mu s)$', max_data=50000)

SST2.analysis.data_plot.compute_Tm(temperatures, folding_fraction)

Compute the melting temperature using a sigmoidal curve fit.

Parameters:
temperatures : list

List of temperatures.

folding_fraction : list

List of folding fractions.

Returns:
Tm : float

Melting temperature.
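
As a rough illustration of what a sigmoidal melting-curve fit can look like, the sketch below fits a two-state logistic curve with scipy.optimize.curve_fit and takes Tm as the midpoint of the curve. The functional form, the width parameter, the starting guess, and the helper names sigmoid and fit_tm are assumptions made for this example; the model actually used by compute_Tm may differ.

```python
import numpy as np
from scipy.optimize import curve_fit


def sigmoid(temp, t_m, width):
    """Two-state folded fraction: 1 at low temperature, 0 at high temperature."""
    return 1.0 / (1.0 + np.exp((temp - t_m) / width))


def fit_tm(temperatures, folding_fraction):
    """Hypothetical helper: fit the sigmoid and return its midpoint."""
    temps = np.asarray(temperatures, dtype=float)
    frac = np.asarray(folding_fraction, dtype=float)
    # Start from the middle of the ladder with a 10 K transition width.
    popt, _ = curve_fit(sigmoid, temps, frac, p0=[temps.mean(), 10.0])
    return popt[0]


# Synthetic, noise-free data generated with a known midpoint of 335 K.
temps = np.linspace(280, 400, 13)
frac = sigmoid(temps, 335.0, 8.0)
tm = fit_tm(temps, frac)
```

With real data, the list returned by compute_folding_fraction would take the place of the synthetic frac array.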

SST2.analysis.data_plot.compute_cluster_hdbscan(pca_df, min_cluster_size=50, min_samples=50)

Cluster the data using the HDBSCAN algorithm.

Parameters:
pca_df : pandas.DataFrame

Dataframe with the data to cluster.

min_cluster_size : int, optional

Minimum cluster size. The default is 50.

min_samples : int, optional

Minimum number of samples. The default is 50.

Returns:
clust_serie : pandas.Categorical

Categorical series with the cluster labels.

SST2.analysis.data_plot.compute_cluster_kmean(pca_df, max_cluster=20, random_state=0)

Cluster the data using the KMeans algorithm.

Parameters:
pca_df : pandas.DataFrame

Dataframe with the data to cluster.

max_cluster : int, optional

Maximum number of clusters to test. The default is 20.

random_state : int, optional

Random state for the algorithm. The default is 0.

Returns:
clust_serie : pandas.Categorical

Categorical series with the cluster labels.

kmeans.cluster_centers_ : numpy.ndarray

Cluster centers.
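
The description does not say how the number of clusters is chosen among the values tested. One common approach, shown here purely as an assumption, is to run scikit-learn's KMeans for each k up to max_cluster and keep the model with the highest silhouette score. The helper name kmeans_best_k and the synthetic PC1/PC2 data are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def kmeans_best_k(pca_df, max_cluster=20, random_state=0):
    """Try k = 2..max_cluster and keep the model with the best silhouette score.

    The silhouette criterion is an assumption of this sketch, not a
    documented behaviour of compute_cluster_kmean.
    """
    data = pca_df.to_numpy()
    best_score, best_model = -1.0, None
    for k in range(2, max_cluster + 1):
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(data)
        score = silhouette_score(data, model.labels_)
        if score > best_score:
            best_score, best_model = score, model
    return pd.Categorical(best_model.labels_), best_model.cluster_centers_


# Three well-separated synthetic blobs in a 2D "PCA" space.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
points = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])
demo_df = pd.DataFrame(points, columns=["PC1", "PC2"])
labels, found_centers = kmeans_best_k(demo_df, max_cluster=6)
```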

SST2.analysis.data_plot.compute_exchange_prob(df, temp_col='Aim Temp (K)', time_ax_name='$Time\\;(\\mu s)$', exchange_time=2)

Compute the exchange probability and the round-trip time from the dataframe produced by an SST2 simulation. The dataframe must contain a temperature column and a time column.

Parameters:
df : pandas.DataFrame

Dataframe with the SST2 simulation data.

temp_col : str, optional

Name of the column with the temperature. The default is “Aim Temp (K)”.

time_ax_name : str, optional

Name of the column with the time. The default is r"$Time\;(\mu s)$".

exchange_time : float, optional

Time in ps between two exchanges. The default is 2 ps.

Returns:
ex_prob : float

Exchange probability.

trip_time : float

Round trip time in ns.
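
To make the two quantities concrete, here is a minimal pure-Python sketch. It assumes that the exchange probability is the fraction of exchange attempts after which the temperature changed, and that a round trip is a lowest to highest to lowest excursion on the temperature ladder. Both definitions, and the helper name exchange_stats, are assumptions; compute_exchange_prob may define them differently.

```python
def exchange_stats(temps, exchange_time=2.0):
    """Return (exchange probability, mean round-trip time).

    `temps` holds the ladder temperature at each exchange attempt and
    `exchange_time` is the time between attempts; the round-trip time
    is returned in the same unit as `exchange_time`.
    """
    # Fraction of attempts after which the temperature actually changed.
    changes = sum(1 for a, b in zip(temps, temps[1:]) if a != b)
    ex_prob = changes / (len(temps) - 1)

    # A round trip is counted as lowest -> highest -> lowest temperature.
    t_min, t_max = min(temps), max(temps)
    first_start = last_end = None
    reached_top = False
    n_trips = 0
    for i, temp in enumerate(temps):
        if temp == t_min:
            if first_start is None:
                first_start = i          # first visit to the bottom rung
            elif reached_top:
                n_trips += 1             # came back down: one trip done
                last_end = i
                reached_top = False
        elif temp == t_max and first_start is not None:
            reached_top = True

    trip_time = None                     # no complete round trip observed
    if n_trips:
        trip_time = (last_end - first_start) * exchange_time / n_trips
    return ex_prob, trip_time


ladder = [300, 310, 320, 310, 300, 300, 310, 320, 320, 310, 300]
ex_prob, trip_time = exchange_stats(ladder, exchange_time=2.0)
```

In this toy series the temperature changes after 8 of the 10 attempts, and two round trips complete over 10 intervals of 2 ps each.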

SST2.analysis.data_plot.compute_folding_fraction(df, col='RMSD (nm)', cutoff=0.18, temp_col='Aim Temp (K)', temp_list=None)

Compute the fraction of folded protein.

Parameters:
df : pandas.DataFrame

Dataframe with the data to analyze.

col : str, optional

Column used to compute the folding fraction. The default is “RMSD (nm)”.

cutoff : float, optional

Cutoff value for the folding fraction. The default is 0.18 nm.

temp_col : str, optional

Column with the temperature. The default is ‘Aim Temp (K)’.

temp_list : list, optional

List of temperatures to use. The default is None.

Returns:
fold_frac : list

List of folding fractions.
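
A minimal pandas sketch of the idea, assuming a frame counts as folded when its RMSD is strictly below the cutoff and that the fraction is taken per temperature. The strict comparison and the helper name folding_fraction are assumptions.

```python
import pandas as pd


def folding_fraction(df, col="RMSD (nm)", cutoff=0.18, temp_col="Aim Temp (K)"):
    """Fraction of frames below `cutoff`, grouped by temperature."""
    folded = df[col] < cutoff                      # boolean mask per frame
    return folded.groupby(df[temp_col]).mean()     # mean of booleans = fraction


demo = pd.DataFrame({
    "Aim Temp (K)": [300, 300, 300, 300, 350, 350, 350, 350],
    "RMSD (nm)": [0.10, 0.15, 0.30, 0.10, 0.50, 0.10, 0.40, 0.60],
})
frac = folding_fraction(demo)
```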

SST2.analysis.data_plot.compute_folding_fraction_RMSD(df, col='RMSD (nm)', temp_col='Aim Temp (K)', cutoff=0.18, start_time=0, time_ax_name='$Time\\;(\\mu s)$', ref_fold_frac=None, time_interval=2.0)

SST2.analysis.data_plot.compute_moving_average(df, ener='new_pot', col_name='avg_ener')

SST2.analysis.data_plot.compute_weight_RMSD(df, final_weight_dict=None, ener='new_pot')

SST2.analysis.data_plot.count_clust_transition(df, dt=None, sim_name_col='sim', clust_col='clust', time_ax_name='$Time\\;(\\mu s)$')

SST2.analysis.data_plot.count_rmsd_transition(df, rmsd_fold=0.2, rmsd_unfold=0.4, dt=None, sim_name_col='sim', rmsd_col='RMSD (nm)', time_ax_name='$Time\\;(\\mu s)$')

SST2.analysis.data_plot.filter_df(df, max_point_number)

Subsample a dataframe so that it keeps at most a given number of data points. Rows are taken at a regular step size computed from that maximum.

Parameters:
df : pandas.DataFrame

Dataframe to filter.

max_point_number : int

Maximum number of data points to keep.

Returns:
local_df : pandas.DataFrame

Filtered dataframe.
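
The step-size subsampling can be sketched as follows. The rounding rule (ceiling division) and the helper name subsample are assumptions; filter_df may compute its step differently.

```python
import math

import pandas as pd


def subsample(df, max_point_number):
    """Keep every `step`-th row so that at most `max_point_number` rows remain."""
    step = max(1, math.ceil(len(df) / max_point_number))
    return df.iloc[::step]


demo = pd.DataFrame({"x": range(1000)})
small = subsample(demo, 300)   # step is 4, so rows 0, 4, 8, ... are kept
```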

SST2.analysis.data_plot.get_quant_min_max(pd_serie, quant=0.001)

Get the minimum and maximum values of a pandas Series from a quantile.

Parameters:
pd_serie : pandas.Series

Pandas Series to analyze.

quant : float, optional

Quantile to use. The default is 0.001.

Returns:
val_min : float

Minimum value.

val_max : float

Maximum value.
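
A plausible reading, stated here as an assumption, is that the bounds are the quant and 1 - quant quantiles, which trims extreme outliers symmetrically before plotting. The helper name quantile_bounds is hypothetical.

```python
import pandas as pd


def quantile_bounds(pd_serie, quant=0.001):
    """Return the (quant, 1 - quant) quantiles as robust min/max bounds."""
    return pd_serie.quantile(quant), pd_serie.quantile(1 - quant)


values = pd.Series(range(1001), dtype=float)
val_min, val_max = quantile_bounds(values, quant=0.001)
```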

SST2.analysis.data_plot.plot_distri_norm(df, x, hue, x_label=None, max_data=50000, bins=100, element='step', quant=None, bw_adjust=None)

Plot a distribution plot with a gaussian filter on the y axis.

Parameters:
df : pandas.DataFrame

Dataframe with the data to plot.

x : str

Name of the column with the x axis data.

hue : str

Name of the column with the hue data.

x_label : str, optional

Label of the x axis. The default is None.

max_data : int, optional

Maximum number of data points to plot. The default is 50000.

bins : int, optional

Number of bins. The default is 100.

element : str, optional

Element of the plot. The default is “step”.

quant : float, optional

Quantile to use to filter the data. The default is None.

bw_adjust : float, optional

Bandwidth adjustment for the kernel density estimate. The default is None.

Returns:
ax1 : matplotlib.axes._subplots.AxesSubplot

Axes of the plot.

SST2.analysis.data_plot.plot_energie_swap_convergence(df, ener_name='new_pot', lag_num=4, time_ax_name='$Time\\;(\\mu s)$', ylabel='$E_{p}$', split_graph=False, ci=95, avg_start=None)

SST2.analysis.data_plot.plot_energie_swap_convergence_diff(df, ener_name='new_pot', lag_num=4, time_ax_name='$Time\\;(\\mu s)$', ylabel='$E_{p}$', hue=None, color=None, label='$T_{m-1}$ update to $T_{m}$', errorbar=('ci', 95), avg_start=None)

SST2.analysis.data_plot.plot_energie_swap_distri_diff(df, lag_num_list, ener_name='new_pot', time_ax_name='$Time\\;(\\mu s)$', temp_index=1, ylabel='$E_{p}$', hue=None, bins=100, element='step', ci=95, avg_start=0)

SST2.analysis.data_plot.plot_folding_fraction(df, col='RMSD (nm)', cutoff=0.18, label=None, start_time=0, time_ax_name='$Time\\;(\\mu s)$', recompute_temp_flag=True, temp_col='Aim Temp (K)', ref_temp=300.0)

Plot the fraction of folded protein as a function of the temperature.

Parameters:
df : pandas.DataFrame

Dataframe with the data to plot.

col : str, optional

Column used to compute the folding fraction. The default is “RMSD (nm)”.

cutoff : float, optional

Cutoff value for the folding fraction. The default is 0.18 nm.

label : str, optional

Label of the plot. The default is None.

start_time : float, optional

Start time of the simulation. The default is 0 µs.

time_ax_name : str, optional

Name of the time axis. The default is r"$Time\;(\mu s)$".

recompute_temp_flag : bool, optional

Recompute the temperature. The default is True.

temp_col : str, optional

Column with the temperature. The default is ‘Aim Temp (K)’.

ref_temp : float, optional

Reference temperature. The default is 300.0.

Returns:
Tm : float

Melting temperature.

SST2.analysis.data_plot.plot_folding_fraction_RMSD(df, col='RMSD (nm)', cutoff=0.18, label=None, start_time=0, time_ax_name='$Time\\;(\\mu s)$', ref_fold_frac=None, color=None, ls='-', s=20, alpha=1.0, time_interval=2.0)

SST2.analysis.data_plot.plot_folding_fraction_convergence(df, col='RMSD (nm)', cutoff=0.18, label=None, start_time=0, time_ax_name='$Time\\;(\\mu s)$', recompute_temp_flag=False, ref_temp=300.0, time_interval=2.0)

SST2.analysis.data_plot.plot_free_energy(xall, yall, weights=None, ax=None, nbins=100, ncontours=100, avoid_zero_count=False, minener_zero=True, kT=2.479, vmin=None, vmax=None, cmap='nipy_spectral', cbar=True, cbar_label='free energy (kJ/mol)', cax=None, levels=None, cbar_orientation='vertical', norm=None, range=None, level_gap=None)

Plot the free energy of a 2D histogram.

Adapted from: https://github.com/markovmodel/PyEMMA/blob/devel/pyemma/plots/plots2d.py

Parameters:
xall : np.array

Array with the x data.

yall : np.array

Array with the y data.

weights : np.array, optional

Array with the weights. The default is None.

ax : matplotlib.axes._subplots.AxesSubplot, optional

Axes of the plot. The default is None.

nbins : int, optional

Number of bins. The default is 100.

ncontours : int, optional

Number of contours. The default is 100.

avoid_zero_count : bool, optional

Avoid zero counts in the histogram. The default is False.

minener_zero : bool, optional

Shift the minimum energy to zero. The default is True.

kT : float, optional

kT value. The default is 2.479.

vmin : float, optional

Minimum value. The default is None.

vmax : float, optional

Maximum value. The default is None.

cmap : str, optional

Colormap. The default is ‘nipy_spectral’.

cbar : bool, optional

Add a colorbar. The default is True.

cbar_label : str, optional

Label of the colorbar. The default is ‘free energy (kJ/mol)’.

cax : matplotlib.axes._subplots.AxesSubplot, optional

Axes of the colorbar. The default is None.

levels : int, optional

Number of levels. The default is None.

cbar_orientation : str, optional

Orientation of the colorbar. The default is ‘vertical’.

norm : matplotlib.colors.Normalize, optional

Normalize object. The default is None.

range : list, optional

Range of the data. The default is None.

level_gap : float, optional

Gap between levels. The default is None.

Returns:
fig : matplotlib.figure.Figure

Figure of the plot.

ax : matplotlib.axes._subplots.AxesSubplot

Axes of the plot.

misc : dict

Dictionary with the colorbar.
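
The quantity being plotted can be sketched without any plotting code: a 2D histogram is normalized to a probability and converted with F = -kT ln(P). The sketch below follows that standard relation. The helper name free_energy_2d is hypothetical, and the handling of empty bins (left at +inf) is an assumption. The default kT of 2.479 kJ/mol corresponds to roughly 298 K.

```python
import numpy as np


def free_energy_2d(xall, yall, weights=None, nbins=100, kT=2.479, minener_zero=True):
    """Free energy surface F = -kT * ln(P) from a 2D histogram."""
    counts, xedges, yedges = np.histogram2d(xall, yall, bins=nbins, weights=weights)
    prob = counts / counts.sum()
    free = np.full(prob.shape, np.inf)      # empty bins stay at +inf
    filled = prob > 0
    free[filled] = -kT * np.log(prob[filled])
    if minener_zero:
        free[filled] -= free[filled].min()  # put the global minimum at zero
    return free, xedges, yedges


rng = np.random.default_rng(0)
x = rng.normal(size=5000)
y = rng.normal(size=5000)
free, xedges, yedges = free_energy_2d(x, y, nbins=20)
```

The resulting array could then be passed to a contour plot, which is the role plot_free_energy plays in the package.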

SST2.analysis.data_plot.plot_lineplot_avg(df, x, y, quant=None, color='black', max_data=50000, avg_win=1000)

Plot a lineplot with a gaussian filter on the y axis.

Parameters:
df : pandas.DataFrame

Dataframe with the data to plot.

x : str

Name of the column with the x axis data.

y : str

Name of the column with the y axis data.

quant : float, optional

Quantile to use to filter the data. The default is None.

color : str, optional

Color of the line. The default is “black”.

max_data : int, optional

Maximum number of data points to plot. The default is 50000.

avg_win : int, optional

Window size of the gaussian filter. The default is 1000.

Returns:
g : matplotlib.axes._subplots.AxesSubplot

Axes of the plot.

SST2.analysis.data_plot.plot_rung_occupancy(df, hue='group')

SST2.analysis.data_plot.plot_scatter(df, x, y, hue=None, x_label=None, y_label=None, quant=None, s=10, color=None, linewidth=0, label=None, legend='auto', alpha=None, max_data=50000)

Plot a scatter plot.

Parameters:
df : pandas.DataFrame

Dataframe with the data to plot.

x : str

Name of the column with the x axis data.

y : str

Name of the column with the y axis data.

hue : str, optional

Name of the column with the hue data. The default is None.

x_label : str, optional

Label of the x axis. The default is None.

y_label : str, optional

Label of the y axis. The default is None.

quant : float, optional

Quantile to use to filter the data. The default is None.

s : int, optional

Size of the points. The default is 10.

color : str, optional

Color of the points. The default is None.

linewidth : float, optional

Width of the points. The default is 0.

label : str, optional

Label of the plot. The default is None.

legend : str, optional

Position of the legend. The default is “auto”.

alpha : float, optional

Transparency of the points. The default is None.

max_data : int, optional

Maximum number of data points to plot. The default is 50000.

Returns:
g : matplotlib.axes._subplots.AxesSubplot

Axes of the plot.

SST2.analysis.data_plot.plot_weight_RMSD(df, x='$Time\\;(\\mu s)$', hue='Temp (K)', ener='new_pot', time_ax_name='$Time\\;(\\mu s)$', final_weight_dict=None, max_data=50000, plot_weights=False)

SST2.analysis.data_plot.read_SST2_data(generic_name, dt=0.004, full_sep=',', save_step_dcd=100000, lambda_T_ref=300.0)

Read the SST2 data from the CSV files. The data may be split across several files if the simulation had to be restarted; the function merges all the files into one dataframe.

Parameters:
generic_name : str

Generic name of the CSV files (without the .csv extension).

dt : float, optional

Time step in ps of the simulation. The default is 0.004 ps.

full_sep : str, optional

Separator used in the full CSV file. The default is “,”.

save_step_dcd : int, optional

Step number used in the dcd file. The default is 100000.

lambda_T_ref : float, optional

Reference temperature for the lambda. The default is 300.0.

Returns:
df_all : pandas.DataFrame

Dataframe with all the data.
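
The merge step can be pictured with the short pandas sketch below. It reads each part, concatenates them, and derives a time column in µs from the step counter and dt. How the restart files are named and discovered, whether the step counter continues across restarts (assumed here), and the helper name merge_parts are all assumptions; only the “Steps” column name and the time axis label come from this page.

```python
import io

import pandas as pd


def merge_parts(csv_sources, dt=0.004, sep=","):
    """Concatenate restart parts and add a time column in microseconds."""
    parts = [pd.read_csv(src, sep=sep) for src in csv_sources]
    df_all = pd.concat(parts, ignore_index=True)
    # Steps * dt is in ps; 1 µs = 1e6 ps.
    df_all[r"$Time\;(\mu s)$"] = df_all["Steps"] * dt / 1e6
    return df_all


# In-memory stand-ins for two CSV files written by a restarted simulation.
part_1 = io.StringIO("Steps,Aim Temp (K)\n0,300\n500000,310\n")
part_2 = io.StringIO("Steps,Aim Temp (K)\n1000000,320\n1500000,310\n")
df_all = merge_parts([part_1, part_2])
```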

SST2.analysis.data_plot.read_ST_data(generic_name, dt=0.004, fields=['Steps', 'Aim Temp (K)', 'E solute scaled (kJ/mole)', 'E solute not scaled (kJ/mole)', 'E solvent (kJ/mole)', 'E solvent-solute (kJ/mole)'], full_sep=',', save_step_dcd=100000, lambda_T_ref=None)

Read the ST data from the CSV files. The data may be split across several files if the simulation had to be restarted; the function merges all the files into one dataframe.

Parameters:
generic_name : str

Generic name of the CSV files (without the .csv extension).

dt : float, optional

Time step in ps of the simulation. The default is 0.004 ps.

fields : list, optional

List of the fields to read in the CSV files. The default is ["Steps", "Aim Temp (K)", "E solute scaled (kJ/mole)", "E solute not scaled (kJ/mole)", "E solvent (kJ/mole)", "E solvent-solute (kJ/mole)"].

full_sep : str, optional

Separator used in the full CSV file. The default is “,”.

save_step_dcd : int, optional

Step number used in the dcd file. The default is 100000.

lambda_T_ref : float, optional

Reference temperature for the lambda. The default is None.

SST2.analysis.data_plot.recompute_temp(df, ref_temp=300.0)

SST2.analysis.trajectory module

SST2.analysis.trajectory.align_traj(md, ref, ref_Sel, tol_mass=0.1)

SST2.analysis.trajectory.compute_native_contact(md, ref, sel='protein and not name H*', sel_2=None)

SST2.analysis.trajectory.compute_pca(md, ref, sel='backbone', cum_var=0.8)

SST2.analysis.trajectory.prepare_traj(md, ref_Sel, compound='fragments')

SST2.analysis.trajectory.read_traj(start_pdb, generic_name)

Module contents

SST2.analysis module.