cellblender package
Submodules
cellblender.core module
- cellblender.core.make_blender_referennce_database(reference_database_file='DMSO_abundances.txt')
Loads and processes a reference abundance file to create a peptide-cell line mapping database.
The function reads a tab-delimited file containing peptide abundance values across various samples, reshapes the data, extracts cell line identifiers, removes missing values, and returns a cleaned DataFrame of unique (CellLine, peptide) pairs.
Parameters:
- reference_database_file : str, optional
Path to the input reference database file (default is DRUG_MAP_ABUNDANCE_FILE, which appears as 'DMSO_abundances.txt' in the signature above). The file should be a tab-separated values (TSV) file with a ‘peptide’ column and one column per sample, named in the form ‘CellLine-Replicate’.
Returns:
: pandas.DataFrame
A cleaned DataFrame with columns:
  - ‘CellLine’: identifier extracted from the sample ID (the prefix before ‘-’)
  - ‘peptide’: peptide sequence, unique within each cell line
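The reshaping steps described above can be sketched with pandas as follows. This is a minimal illustration, not the package's implementation: the helper name build_reference_pairs, the intermediate column names ‘sample’ and ‘value’, and the example data are all assumptions made for the sketch.

```python
import io

import pandas as pd


def build_reference_pairs(tsv_source):
    """Hypothetical stand-in for the documented steps (not the package's code)."""
    wide = pd.read_csv(tsv_source, sep="\t")
    # Reshape to one row per (peptide, sample) with its abundance value.
    long = wide.melt(id_vars="peptide", var_name="sample", value_name="value")
    # Drop missing abundances, then take the sample-ID prefix before '-'.
    long = long.dropna(subset=["value"]).assign(
        CellLine=lambda d: d["sample"].str.split("-").str[0]
    )
    # Keep each (CellLine, peptide) pair once.
    return long[["CellLine", "peptide"]].drop_duplicates().reset_index(drop=True)


# Made-up abundance table: two A549 replicates and one HeLa replicate.
example = io.StringIO(
    "peptide\tA549-1\tA549-2\tHeLa-1\n"
    "ACDK\t1.0\t2.0\t\n"
    "CLMR\t\t\t3.5\n"
)
pairs = build_reference_pairs(example)
print(pairs)
```

In this example the two A549 replicates collapse into a single (A549, ACDK) pair, which is the deduplication the description refers to.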
- cellblender.core.make_cumulative_count(Cys_df)
Computes the cumulative number of unique peptides identified across cell lines, ordered to maximize incremental peptide discovery.
Each unique (CellLine, peptide) pair in the input is treated as an identified peptide. The function pivots the input into a binary matrix, orders rows using a greedy strategy (via order_rows_by_features), and computes the cumulative count of unique peptides row by row.
Parameters:
- Cys_df : pandas.DataFrame
A long-format DataFrame containing at least two columns:
  - ‘CellLine’: identifiers for each cell line/sample.
  - ‘peptide’: peptide sequences associated with the sample.
Returns:
: pandas.DataFrame
A pivoted DataFrame with:
  - Cell lines as rows
  - Peptides as columns (binary presence/absence values)
  - An additional column ‘cum_count’ indicating the cumulative number of unique peptides identified up to and including that cell line.
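To make the ‘cum_count’ column concrete, the sketch below builds a small presence/absence matrix and counts peptides seen so far, row by row. It is an illustration under assumptions: the row order is fixed by hand here (the documented function derives it through order_rows_by_features), and the use of crosstab and cummax is one possible way to obtain the described result, not necessarily the package's approach.

```python
import pandas as pd

# Made-up (CellLine, peptide) pairs.
pairs = pd.DataFrame({
    "CellLine": ["A", "A", "B", "B", "B", "C"],
    "peptide": ["p1", "p2", "p2", "p3", "p4", "p1"],
})

# Binary presence/absence matrix: cell lines as rows, peptides as columns.
binary = pd.crosstab(pairs["CellLine"], pairs["peptide"]).clip(upper=1)

# Assumed row order for this example (largest cell line first).
binary = binary.loc[["B", "A", "C"]]

# A peptide counts as seen from the first row in which it appears,
# so a running maximum down each column marks everything found so far.
seen = binary.cummax(axis=0)
binary["cum_count"] = seen.sum(axis=1)
print(binary)
```

Cell line B contributes three peptides, A adds one new peptide (p1), and C adds none, so the counts are 3, 4 and 4.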
- cellblender.core.make_input_from_msfrager_results(path_to_msfrager_results, cell_name='new')
Generates a formatted DataFrame suitable for the Blender pipeline using output from the MSFragger pipeline.
This function reads a tab-separated MSFragger result file, filters for cysteine-containing peptides, reshapes the intensity data, and assigns the specified cell name to all entries.
Parameters:
- path_to_msfrager_results : str
Path to the folder containing the MSFragger results. The expected file (defined by the constant MS_FRAGGER_INPUT_FILE) must be present in this directory.
- cell_name : str, optional (default='new')
Name of the cell line associated with the experiment. This value will be assigned to all rows in the ‘CellLine’ column of the output.
Returns:
: pandas.DataFrame
A DataFrame with the following columns:
  - ‘peptide’: peptide sequences containing cysteine residues
  - ‘CellLine’: cell line name provided as input
  - ‘value’: corresponding intensity values
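The filter-and-reshape logic can be sketched on an in-memory table. This reference does not give the MSFragger file's column names, so the input columns below (‘peptide’, ‘rep1’, ‘rep2’) are hypothetical placeholders, and the cysteine filter is assumed to be a simple test for the letter C in the sequence.

```python
import pandas as pd

# Hypothetical result table; real MSFragger column names are not given here.
results = pd.DataFrame({
    "peptide": ["ACDK", "LMNR", "CLMR"],
    "rep1": [10.0, 20.0, 30.0],
    "rep2": [11.0, 21.0, 31.0],
})

# Keep only peptides whose sequence contains a cysteine (C).
cys = results[results["peptide"].str.contains("C", regex=False)]

# Reshape intensities to long format and tag every row with the cell name.
formatted = (
    cys.melt(id_vars="peptide", var_name="sample", value_name="value")
    .drop(columns="sample")
    .assign(CellLine="new")[["peptide", "CellLine", "value"]]
)
print(formatted)
```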
- cellblender.core.make_line_plot(df_pivot, xLabel='Cell line', title='', where_to_save='linePlot.svg')
Generates and saves a line plot from a DataFrame containing cumulative peptide counts.
Parameters:
- df_pivot : pandas.DataFrame
A DataFrame that must contain a ‘cum_count’ column representing cumulative counts (e.g., number of peptides or modifications across samples or time points).
- xLabel : str, optional (default='Cell line')
Label for the x-axis of the plot.
- title : str, optional (default='')
Title of the plot.
- where_to_save : str, optional (default='linePlot.svg')
File path to save the generated plot (format inferred from file extension).
Returns:
: None
The plot is saved to disk; nothing is returned.
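For orientation, a plot of this kind can be produced directly with matplotlib as sketched below. This is an assumption about the general approach, not the function's actual code: the marker style, the y-axis label, and the temporary output directory are choices made for the example.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Made-up cumulative counts indexed by cell line.
df_pivot = pd.DataFrame({"cum_count": [3, 4, 4]}, index=["B", "A", "C"])

fig, ax = plt.subplots()
positions = range(len(df_pivot))
ax.plot(positions, df_pivot["cum_count"], marker="o")
ax.set_xticks(positions)
ax.set_xticklabels(df_pivot.index)
ax.set_xlabel("Cell line")
ax.set_ylabel("Cumulative unique peptides")  # label chosen for this sketch

# The output format is inferred from the .svg extension.
out_path = os.path.join(tempfile.mkdtemp(), "linePlot.svg")
fig.savefig(out_path)
plt.close(fig)
```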
- cellblender.core.make_venn_diagram2(list1, list2, labels, where_to_save='venn2.svg')
Generates and saves a 2-set Venn diagram from two input lists.
- Return type:
None
Parameters:
- list1 : list
The first list of elements to be compared in the Venn diagram.
- list2 : list
The second list of elements to be compared in the Venn diagram.
- labels : tuple
A tuple of two strings specifying the labels for each set (e.g., (‘Group A’, ‘Group B’)).
- where_to_save : str, optional (default='venn2.svg')
The file path where the Venn diagram image will be saved. The format is inferred from the file extension.
Returns:
: None
The Venn diagram is saved to disk; nothing is returned.
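The three numbers a 2-set Venn diagram displays can be checked with plain set arithmetic. The sketch below assumes the input lists are treated as sets, so repeated elements count once; this reference does not state how the function handles duplicates.

```python
# Made-up peptide lists; "p3" is repeated on purpose.
list1 = ["p1", "p2", "p3", "p3"]
list2 = ["p3", "p4"]

set1, set2 = set(list1), set(list2)

only_first = len(set1 - set2)   # elements found only in list1
only_second = len(set2 - set1)  # elements found only in list2
shared = len(set1 & set2)       # elements found in both lists

print(only_first, only_second, shared)
```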
- cellblender.core.make_venn_diagram3(list1, list2, list3, labels, where_to_save='venn3.svg')
Generates and saves a 3-set Venn diagram from three input lists.
- Return type:
None
Parameters:
- list1 : list
The first list of elements to be included in the Venn diagram.
- list2 : list
The second list of elements to be included in the Venn diagram.
- list3 : list
The third list of elements to be included in the Venn diagram.
- labels : tuple
A tuple of three strings specifying the labels for each set (e.g., (‘Set A’, ‘Set B’, ‘Set C’)).
- where_to_save : str, optional (default='venn3.svg')
The file path where the Venn diagram image will be saved. The format is inferred from the file extension.
Returns:
: None
The Venn diagram is saved to disk; nothing is returned.
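A 3-set diagram has seven regions. As a way to sanity-check a saved figure, the region sizes can be computed independently with set operations. The region names and example data below are made up for the sketch, and the inputs are again assumed to behave as sets.

```python
# Made-up sets standing in for the three input lists.
a = {"p1", "p2", "p3", "p4"}
b = {"p3", "p4", "p5"}
c = {"p4", "p6"}

# Each region holds elements belonging to exactly that combination of sets.
regions = {
    "A only": a - b - c,
    "B only": b - a - c,
    "C only": c - a - b,
    "A and B only": (a & b) - c,
    "A and C only": (a & c) - b,
    "B and C only": (b & c) - a,
    "A and B and C": a & b & c,
}
sizes = {name: len(members) for name, members in regions.items()}
print(sizes)
```

Because the regions are disjoint, their sizes add up to the size of the union of the three sets.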
- cellblender.core.order_rows_by_features(matrix)
Orders the rows of a binary (0/1) NumPy matrix to maximize incremental feature coverage.
The function begins by selecting the row with the highest number of features (i.e., most 1s). It then iteratively selects the next row that contributes the most new features not yet seen in the previously selected rows.
Parameters:
- matrix : array-like or numpy.ndarray
A 2D binary matrix (rows: samples, columns: features) where 1 indicates the presence of a feature and 0 its absence.
Returns:
: list of int
A list of row indices representing the order in which rows should be selected to maximize new feature discovery step-by-step.
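The greedy strategy described above can be sketched in a few lines of NumPy. This is an illustrative re-implementation, not the package's code: the function name greedy_row_order is hypothetical, and breaking ties in favor of the lowest row index is an assumption, since the reference does not specify tie handling.

```python
import numpy as np


def greedy_row_order(matrix):
    """Illustrative sketch of the described greedy ordering (hypothetical name)."""
    m = np.asarray(matrix).astype(bool)
    covered = np.zeros(m.shape[1], dtype=bool)  # features seen so far
    remaining = list(range(m.shape[0]))
    order = []
    while remaining:
        # Number of not-yet-covered features each remaining row would add.
        # On the first pass nothing is covered, so this is the row's 1-count.
        gains = [int(np.count_nonzero(m[i] & ~covered)) for i in remaining]
        # Assumed tie-break: argmax returns the first (lowest-index) maximum.
        best = remaining[int(np.argmax(gains))]
        order.append(best)
        covered |= m[best]
        remaining.remove(best)
    return order


demo = [
    [1, 1, 0, 0],
    [0, 1, 1, 1],
    [1, 0, 0, 0],
]
order = greedy_row_order(demo)
print(order)  # [1, 0, 2]
```

Row 1 is picked first because it has the most features. Rows 0 and 2 then each add one new feature (column 0), and the assumed tie-break places row 0 before row 2.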