SST2.analysis package
Submodules
SST2.analysis.data_plot module
- SST2.analysis.data_plot.compare_weight_RMSD(df, x='$Time\\;(\\mu s)$', hue='sim', ener='new_pot', time_ax_name='$Time\\;(\\mu s)$', max_data=50000)
- SST2.analysis.data_plot.compute_Tm(temperatures, folding_fraction)
Compute the melting temperature using a sigmoidal curve fit.
- Parameters:
- temperatures : list
List of temperatures.
- folding_fraction : list
List of folding fractions.
- Returns:
- Tm : float
Melting temperature.
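As an illustration of what a sigmoidal fit of this kind can look like, here is a minimal sketch. It is not the SST2 implementation: the two-parameter logistic form, the `width` parameter, the initial guess, and the helper names `sigmoid` and `fit_tm` are all assumptions made for this example.

```python
import numpy as np
from scipy.optimize import curve_fit


def sigmoid(temp, t_m, width):
    """Two-state curve: close to 1 at low temperature, 0.5 at t_m, close to 0 at high temperature."""
    return 1.0 / (1.0 + np.exp((temp - t_m) / width))


def fit_tm(temperatures, folding_fraction):
    """Fit the sigmoid (hypothetical helper) and return its midpoint temperature."""
    temperatures = np.asarray(temperatures, dtype=float)
    folding_fraction = np.asarray(folding_fraction, dtype=float)
    # Assumed initial guess: middle of the temperature range, 10 K transition width.
    p0 = [temperatures.mean(), 10.0]
    popt, _ = curve_fit(sigmoid, temperatures, folding_fraction, p0=p0)
    return popt[0]


# Synthetic folding fractions generated with a known midpoint of 330 K.
temps = np.linspace(280.0, 400.0, 13)
fractions = sigmoid(temps, 330.0, 12.0)
tm = fit_tm(temps, fractions)
```

With noise-free synthetic data the fitted midpoint should recover the value used to generate the curve. Real folding fractions are noisy, so the initial guess and the temperature range covered both matter.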
- SST2.analysis.data_plot.compute_cluster_hdbscan(pca_df, min_cluster_size=50, min_samples=50)
Cluster the data using the HDBSCAN algorithm.
- Parameters:
- pca_df : pandas.DataFrame
Dataframe with the data to cluster.
- min_cluster_size : int, optional
Minimum cluster size. The default is 50.
- min_samples : int, optional
Minimum number of samples. The default is 50.
- Returns:
- clust_serie : pandas.Categorical
Categorical series with the cluster labels.
- SST2.analysis.data_plot.compute_cluster_kmean(pca_df, max_cluster=20, random_state=0)
Cluster the data using the KMeans algorithm.
- Parameters:
- pca_df : pandas.DataFrame
Dataframe with the data to cluster.
- max_cluster : int, optional
Maximum number of clusters to test. The default is 20.
- random_state : int, optional
Random state for the algorithm. The default is 0.
- Returns:
- clust_serie : pandas.Categorical
Categorical series with the cluster labels.
- kmeans.cluster_centers_ : numpy.ndarray
Cluster centers.
- SST2.analysis.data_plot.compute_exchange_prob(df, temp_col='Aim Temp (K)', time_ax_name='$Time\\;(\\mu s)$', exchange_time=2)
Compute the exchange probability and the round trip time for a given dataframe. The dataframe should be the result of an SST2 simulation and must contain a temperature column and a time column.
- Parameters:
- df : pandas.DataFrame
Dataframe with the SST2 simulation data.
- temp_col : str, optional
Name of the column with the temperature. The default is “Aim Temp (K)”.
- time_ax_name : str, optional
Name of the column with the time. The default is r"$Time\;(\mu s)$".
- exchange_time : float, optional
Time in ps between two exchanges. The default is 2 ps.
- Returns:
- ex_prob : float
Exchange probability.
- trip_time : float
Round trip time in ns.
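The two returned quantities can be illustrated with a small pure-Python sketch. The definitions used here are assumptions for illustration, not necessarily those of SST2: the exchange probability is taken as the fraction of exchange attempts after which the temperature changed, and a round trip is counted from the last visit to the lowest temperature, up to the highest temperature, and back to the lowest. The helper name `exchange_stats` is hypothetical.

```python
def exchange_stats(temps, exchange_time=2.0):
    """Estimate exchange probability and mean round-trip time (in ns).

    `temps` holds the temperature at each exchange attempt and
    `exchange_time` is the interval between attempts in ps.
    """
    # Fraction of attempts after which the temperature changed.
    changes = sum(1 for prev, cur in zip(temps, temps[1:]) if prev != cur)
    ex_prob = changes / (len(temps) - 1)

    # Round trip: lowest temperature -> highest temperature -> lowest temperature.
    t_min, t_max = min(temps), max(temps)
    trip_lengths = []
    start = None  # index of the most recent visit to t_min
    reached_top = False
    for i, temp in enumerate(temps):
        if temp == t_min:
            if start is not None and reached_top:
                trip_lengths.append(i - start)
            start, reached_top = i, False
        elif temp == t_max and start is not None:
            reached_top = True

    if not trip_lengths:
        return ex_prob, None  # no complete round trip observed
    mean_trip_ps = exchange_time * sum(trip_lengths) / len(trip_lengths)
    return ex_prob, mean_trip_ps / 1000.0  # ps -> ns


# Toy temperature ladder with three rungs and two complete round trips.
ladder = [300, 300, 320, 340, 340, 320, 300, 320, 340, 320, 300]
ex_prob, trip_time = exchange_stats(ladder, exchange_time=2.0)
```

In this toy trajectory 8 of the 10 attempts change the temperature, and the two round trips last 5 and 4 exchange intervals.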
- SST2.analysis.data_plot.compute_folding_fraction(df, col='RMSD (nm)', cutoff=0.18, temp_col='Aim Temp (K)', temp_list=None)
Compute the fraction of folded protein.
- Parameters:
- df : pandas.DataFrame
Dataframe with the data to plot.
- col : str, optional
Column to compute folding fraction. The default is “RMSD (nm)”.
- cutoff : float, optional
Cutoff value for the folding fraction. The default is 0.18 nm.
- temp_col : str, optional
Column with the temperature. The default is ‘Aim Temp (K)’.
- temp_list : list, optional
List of temperatures to use. The default is None.
- Returns:
- fold_frac : list
List of folding fractions.
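A minimal sketch of a cutoff-based folding fraction is shown below. It assumes that a frame counts as folded when its RMSD is strictly below the cutoff and that the fraction is computed separately for each temperature. The helper name `folding_fraction_by_temp` and the toy data are hypothetical, and the result is a pandas Series rather than the list documented above.

```python
import pandas as pd


def folding_fraction_by_temp(df, col="RMSD (nm)", cutoff=0.18, temp_col="Aim Temp (K)"):
    """Return, for each temperature, the fraction of frames with `col` below `cutoff`."""
    folded = df[col] < cutoff  # boolean Series, True for folded frames
    return folded.groupby(df[temp_col]).mean()


frames = pd.DataFrame(
    {
        "Aim Temp (K)": [300.0, 300.0, 300.0, 300.0, 350.0, 350.0, 350.0, 350.0],
        "RMSD (nm)": [0.10, 0.12, 0.15, 0.30, 0.11, 0.40, 0.50, 0.60],
    }
)
frac = folding_fraction_by_temp(frames)
```

Here three of the four frames at 300 K and one of the four frames at 350 K fall below the 0.18 nm cutoff.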
- SST2.analysis.data_plot.compute_folding_fraction_RMSD(df, col='RMSD (nm)', temp_col='Aim Temp (K)', cutoff=0.18, start_time=0, time_ax_name='$Time\\;(\\mu s)$', ref_fold_frac=None, time_interval=2.0)
- SST2.analysis.data_plot.count_clust_transition(df, dt=None, sim_name_col='sim', clust_col='clust', time_ax_name='$Time\\;(\\mu s)$')
- SST2.analysis.data_plot.count_rmsd_transition(df, rmsd_fold=0.2, rmsd_unfold=0.4, dt=None, sim_name_col='sim', rmsd_col='RMSD (nm)', time_ax_name='$Time\\;(\\mu s)$')
- SST2.analysis.data_plot.filter_df(df, max_point_number)
Filter a dataframe to keep at most a given number of data points. Rows are kept at a regular step, chosen so that the result does not exceed that number.
- Parameters:
- df : pandas.DataFrame
Dataframe to filter.
- max_point_number : int
Maximum number of data points to keep.
- Returns:
- local_df : pandas.DataFrame
Filtered dataframe.
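The step-based filtering described for `filter_df` can be sketched as follows. This is an assumption about the approach rather than the SST2 code: the step is taken as the ceiling of the row count divided by the requested maximum, and the helper name `subsample` is hypothetical.

```python
import math

import pandas as pd


def subsample(df, max_point_number):
    """Keep every `step`-th row so that at most `max_point_number` rows remain."""
    step = max(1, math.ceil(len(df) / max_point_number))
    return df.iloc[::step]


data = pd.DataFrame({"step": range(1000)})
small = subsample(data, 300)      # step of 4, rows 0, 4, 8, ...
unchanged = subsample(data, 5000)  # already small enough, step of 1
```

Because the step is an integer, the result is usually smaller than the requested maximum: 1000 rows with a maximum of 300 give a step of 4 and 250 rows.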
- SST2.analysis.data_plot.get_quant_min_max(pd_serie, quant=0.001)
Get the minimum and maximum values of a pandas Series from a quantile.
- Parameters:
- pd_serie : pandas.Series
Pandas Series to analyze.
- quant : float, optional
Quantile to use. The default is 0.001.
- Returns:
- val_min : float
Minimum value.
- val_max : float
Maximum value.
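One plausible reading of quantile-based bounds is that the lower bound is the `quant` quantile and the upper bound is the `1 - quant` quantile, which trims outliers symmetrically. The sketch below makes that assumption; the helper name `quantile_bounds` is hypothetical.

```python
import pandas as pd


def quantile_bounds(pd_serie, quant=0.001):
    """Return the `quant` and `1 - quant` quantiles of a Series."""
    return pd_serie.quantile(quant), pd_serie.quantile(1.0 - quant)


values = pd.Series(range(101), dtype=float)  # 0.0, 1.0, ..., 100.0
val_min, val_max = quantile_bounds(values, quant=0.1)
```

For the evenly spaced values above, a quantile of 0.1 gives bounds close to 10 and 90, so the lowest and highest 10% of the points fall outside the range.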
- SST2.analysis.data_plot.plot_distri_norm(df, x, hue, x_label=None, max_data=50000, bins=100, element='step', quant=None, bw_adjust=None)
Plot a distribution plot with a gaussian filter on the y axis.
- Parameters:
- df : pandas.DataFrame
Dataframe with the data to plot.
- x : str
Name of the column with the x axis data.
- hue : str
Name of the column with the hue data.
- x_label : str, optional
Label of the x axis. The default is None.
- max_data : int, optional
Maximum number of data points to plot. The default is 50000.
- bins : int, optional
Number of bins. The default is 100.
- element : str, optional
Element of the plot. The default is “step”.
- quant : float, optional
Quantile to use to filter the data. The default is None.
- bw_adjust : float, optional
Bandwidth adjustment for the kernel density estimate. The default is None.
- Returns:
- ax1 : matplotlib.axes._subplots.AxesSubplot
Axes of the plot.
- SST2.analysis.data_plot.plot_energie_swap_convergence(df, ener_name='new_pot', lag_num=4, time_ax_name='$Time\\;(\\mu s)$', ylabel='$E_{p}$', split_graph=False, ci=95, avg_start=None)
- SST2.analysis.data_plot.plot_energie_swap_convergence_diff(df, ener_name='new_pot', lag_num=4, time_ax_name='$Time\\;(\\mu s)$', ylabel='$E_{p}$', hue=None, color=None, label='$T_{m-1}$ update to $T_{m}$', errorbar=('ci', 95), avg_start=None)
- SST2.analysis.data_plot.plot_energie_swap_distri_diff(df, lag_num_list, ener_name='new_pot', time_ax_name='$Time\\;(\\mu s)$', temp_index=1, ylabel='$E_{p}$', hue=None, bins=100, element='step', ci=95, avg_start=0)
- SST2.analysis.data_plot.plot_folding_fraction(df, col='RMSD (nm)', cutoff=0.18, label=None, start_time=0, time_ax_name='$Time\\;(\\mu s)$', recompute_temp_flag=True, temp_col='Aim Temp (K)', ref_temp=300.0)
Plot the fraction of folded protein as a function of the temperature.
- Parameters:
- df : pandas.DataFrame
Dataframe with the data to plot.
- col : str, optional
Column to compute folding fraction. The default is “RMSD (nm)”.
- cutoff : float, optional
Cutoff value for the folding fraction. The default is 0.18 nm.
- label : str, optional
Label of the plot. The default is None.
- start_time : float, optional
Start time of the simulation. The default is 0 µs.
- time_ax_name : str, optional
Name of the time axis. The default is r"$Time\;(\mu s)$".
- recompute_temp_flag : bool, optional
Recompute the temperature. The default is True.
- temp_col : str, optional
Column with the temperature. The default is ‘Aim Temp (K)’.
- ref_temp : float, optional
Reference temperature. The default is 300.0.
- Returns:
- Tm : float
Melting temperature.
- SST2.analysis.data_plot.plot_folding_fraction_RMSD(df, col='RMSD (nm)', cutoff=0.18, label=None, start_time=0, time_ax_name='$Time\\;(\\mu s)$', ref_fold_frac=None, color=None, ls='-', s=20, alpha=1.0, time_interval=2.0)
- SST2.analysis.data_plot.plot_folding_fraction_convergence(df, col='RMSD (nm)', cutoff=0.18, label=None, start_time=0, time_ax_name='$Time\\;(\\mu s)$', recompute_temp_flag=False, ref_temp=300.0, time_interval=2.0)
- SST2.analysis.data_plot.plot_free_energy(xall, yall, weights=None, ax=None, nbins=100, ncontours=100, avoid_zero_count=False, minener_zero=True, kT=2.479, vmin=None, vmax=None, cmap='nipy_spectral', cbar=True, cbar_label='free energy (kJ/mol)', cax=None, levels=None, cbar_orientation='vertical', norm=None, range=None, level_gap=None)
Plot the free energy of a 2D histogram.
Adapted from: https://github.com/markovmodel/PyEMMA/blob/devel/pyemma/plots/plots2d.py
- Parameters:
- xall : np.array
Array with the x data.
- yall : np.array
Array with the y data.
- weights : np.array, optional
Array with the weights. The default is None.
- ax : matplotlib.axes._subplots.AxesSubplot, optional
Axes of the plot. The default is None.
- nbins : int, optional
Number of bins. The default is 100.
- ncontours : int, optional
Number of contours. The default is 100.
- avoid_zero_count : bool, optional
Avoid zero count. The default is False.
- minener_zero : bool, optional
Shift the minimum energy to zero. The default is True.
- kT : float, optional
kT value. The default is 2.479.
- vmin : float, optional
Minimum value. The default is None.
- vmax : float, optional
Maximum value. The default is None.
- cmap : str, optional
Colormap. The default is ‘nipy_spectral’.
- cbar : bool, optional
Add colorbar. The default is True.
- cbar_label : str, optional
Label of the colorbar. The default is ‘free energy (kJ/mol)’.
- cax : matplotlib.axes._subplots.AxesSubplot, optional
Axes of the colorbar. The default is None.
- levels : int, optional
Number of levels. The default is None.
- cbar_orientation : str, optional
Orientation of the colorbar. The default is ‘vertical’.
- norm : matplotlib.colors.Normalize, optional
Normalize object. The default is None.
- range : list, optional
Range of the data. The default is None.
- level_gap : float, optional
Gap between levels. The default is None.
- Returns:
- fig : matplotlib.figure.Figure
Figure of the plot.
- ax : matplotlib.axes._subplots.AxesSubplot
Axes of the plot.
- misc : dict
Dictionary with the colorbar.
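The quantity behind such a plot is the free energy estimated from a 2D histogram, F = -kT ln(P), where P is the normalized bin population. The sketch below computes only that grid, without any plotting. It is an assumption-based illustration rather than the SST2 or PyEMMA code, and the helper name `free_energy_2d` is hypothetical. The default kT of 2.479 is taken to be in kJ/mol, consistent with the default colorbar label, which corresponds to roughly 298 K.

```python
import numpy as np


def free_energy_2d(xall, yall, weights=None, nbins=100, kT=2.479, minener_zero=True):
    """Return bin centers and the free energy F = -kT ln(P) on a 2D grid."""
    hist, xedges, yedges = np.histogram2d(xall, yall, bins=nbins, weights=weights)
    prob = hist / hist.sum()
    # Empty bins have zero probability, so their free energy is left at +inf.
    free = np.full(prob.shape, np.inf)
    populated = prob > 0
    free[populated] = -kT * np.log(prob[populated])
    if minener_zero:
        # Shift the surface so that the most populated bin sits at zero.
        free -= free[populated].min()
    x_centers = 0.5 * (xedges[:-1] + xedges[1:])
    y_centers = 0.5 * (yedges[:-1] + yedges[1:])
    return x_centers, y_centers, free


# Toy data: an uncorrelated 2D Gaussian, giving a single free-energy basin.
rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
y = rng.normal(size=10_000)
xc, yc, free = free_energy_2d(x, y, nbins=20)
```

The returned grid could then be passed to a contour plot. Leaving empty bins at infinity keeps them out of the contour levels without assigning them an arbitrary energy.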
- SST2.analysis.data_plot.plot_lineplot_avg(df, x, y, quant=None, color='black', max_data=50000, avg_win=1000)
Plot a lineplot with a gaussian filter on the y axis.
- Parameters:
- df : pandas.DataFrame
Dataframe with the data to plot.
- x : str
Name of the column with the x axis data.
- y : str
Name of the column with the y axis data.
- quant : float, optional
Quantile to use to filter the data. The default is None.
- color : str, optional
Color of the line. The default is “black”.
- max_data : int, optional
Maximum number of data points to plot. The default is 50000.
- avg_win : int, optional
Window size of the gaussian filter. The default is 1000.
- Returns:
- g : matplotlib.axes._subplots.AxesSubplot
Axes of the plot.
- SST2.analysis.data_plot.plot_scatter(df, x, y, hue=None, x_label=None, y_label=None, quant=None, s=10, color=None, linewidth=0, label=None, legend='auto', alpha=None, max_data=50000)
Plot a scatter plot.
- Parameters:
- df : pandas.DataFrame
Dataframe with the data to plot.
- x : str
Name of the column with the x axis data.
- y : str
Name of the column with the y axis data.
- hue : str, optional
Name of the column with the hue data. The default is None.
- x_label : str, optional
Label of the x axis. The default is None.
- y_label : str, optional
Label of the y axis. The default is None.
- quant : float, optional
Quantile to use to filter the data. The default is None.
- s : int, optional
Size of the points. The default is 10.
- color : str, optional
Color of the points. The default is None.
- linewidth : float, optional
Line width of the point edges. The default is 0.
- label : str, optional
Label of the plot. The default is None.
- legend : str, optional
Position of the legend. The default is “auto”.
- alpha : float, optional
Transparency of the points. The default is None.
- max_data : int, optional
Maximum number of data points to plot. The default is 50000.
- Returns:
- g : matplotlib.axes._subplots.AxesSubplot
Axes of the plot.
- SST2.analysis.data_plot.plot_weight_RMSD(df, x='$Time\\;(\\mu s)$', hue='Temp (K)', ener='new_pot', time_ax_name='$Time\\;(\\mu s)$', final_weight_dict=None, max_data=50000, plot_weights=False)
- SST2.analysis.data_plot.read_SST2_data(generic_name, dt=0.004, full_sep=',', save_step_dcd=100000, lambda_T_ref=300.0)
Read the SST2 data from the CSV files. The data may be split across several files if the simulation had to be restarted. The function merges all the files into one dataframe.
- Parameters:
- generic_name : str
Generic name of the CSV files (without the .csv extension).
- dt : float, optional
Time step in ps of the simulation. The default is 0.004 ps.
- full_sep : str, optional
Separator used in the full CSV file. The default is “,”.
- save_step_dcd : int, optional
Step number used in the dcd file. The default is 100000.
- lambda_T_ref : float, optional
Reference temperature for the lambda. The default is 300.0.
- Returns:
- df_all : pandas.DataFrame
Dataframe with all the data.
- SST2.analysis.data_plot.read_ST_data(generic_name, dt=0.004, fields=['Steps', 'Aim Temp (K)', 'E solute scaled (kJ/mole)', 'E solute not scaled (kJ/mole)', 'E solvent (kJ/mole)', 'E solvent-solute (kJ/mole)'], full_sep=',', save_step_dcd=100000, lambda_T_ref=None)
Read the SST2 data from the CSV files. The data may be split across several files if the simulation had to be restarted. The function merges all the files into one dataframe.
- Parameters:
- generic_name : str
Generic name of the CSV files (without the .csv extension).
- dt : float, optional
Time step in ps of the simulation. The default is 0.004 ps.
- fields : list, optional
List of the fields to read in the CSV files. The default is ["Steps", "Aim Temp (K)", "E solute scaled (kJ/mole)", "E solute not scaled (kJ/mole)", "E solvent (kJ/mole)", "E solvent-solute (kJ/mole)"].
- full_sep : str, optional
Separator used in the full CSV file. The default is “,”.
- save_step_dcd : int, optional
Step number used in the dcd file. The default is 100000.
- lambda_T_ref : float, optional
Reference temperature for the lambda. The default is None.
SST2.analysis.trajectory module
Module contents
SST2.analysis module.