omicsdata.ssm package

Submodules

omicsdata.ssm.columns module

class omicsdata.ssm.columns.PARAMS_Columns(SAMPLES: str = 'samples', CLUSTERS: str = 'clusters', GARBAGE: str = 'garbage')

Bases: object

Dataclass used to define the key names in a .params.json file

CLUSTERS: str = 'clusters'

GARBAGE: str = 'garbage'

SAMPLES: str = 'samples'

class omicsdata.ssm.columns.SSM_Columns(NAME: str = 'name', ID: str = 'id', VAR_READS: str = 'var_reads', TOTAL_READS: str = 'total_reads', VAR_READ_PROB: str = 'var_read_prob', COL_ORDER: tuple = ('id', 'name', 'var_reads', 'total_reads', 'var_read_prob'))

Bases: object

Dataclass used to define .ssm column headers

COL_ORDER: tuple = ('id', 'name', 'var_reads', 'total_reads', 'var_read_prob')

ID: str = 'id'

NAME: str = 'name'

TOTAL_READS: str = 'total_reads'

VAR_READS: str = 'var_reads'

VAR_READ_PROB: str = 'var_read_prob'
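To show how these headers fit together, here is a minimal sketch that renders one row of a .ssm file with its columns in COL_ORDER. It assumes, beyond what is documented here, that .ssm files are tab-separated with a header row and that per-sample values are stored as comma-delimited strings. `SSMColumnsSketch` and `ssm_text` are hypothetical local stand-ins, not part of omicsdata.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class SSMColumnsSketch:
    """Hypothetical local mirror of omicsdata.ssm.columns.SSM_Columns."""
    NAME: str = "name"
    ID: str = "id"
    VAR_READS: str = "var_reads"
    TOTAL_READS: str = "total_reads"
    VAR_READ_PROB: str = "var_read_prob"
    COL_ORDER: tuple = ("id", "name", "var_reads", "total_reads", "var_read_prob")


def ssm_text(rows):
    """Render rows (dicts keyed by column name) as tab-separated text in COL_ORDER."""
    cols = SSMColumnsSketch()
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=cols.COL_ORDER, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Assumption: per-sample values are joined with commas in a single field.
        writer.writerow({
            cols.ID: row[cols.ID],
            cols.NAME: row[cols.NAME],
            cols.VAR_READS: ",".join(str(x) for x in row[cols.VAR_READS]),
            cols.TOTAL_READS: ",".join(str(x) for x in row[cols.TOTAL_READS]),
            cols.VAR_READ_PROB: ",".join(str(x) for x in row[cols.VAR_READ_PROB]),
        })
    return buf.getvalue()


example = ssm_text([{"id": "s0", "name": "chr1_12345", "var_reads": [10, 4],
                     "total_reads": [50, 40], "var_read_prob": [0.5, 0.5]}])
print(example)
```

Each variant occupies one row, with one comma-separated entry per sample in the read-count and probability columns.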

omicsdata.ssm.common module

omicsdata.ssm.common.extract_vids(variants)

Extracts the ‘id’ values of all variants and sorts them by their numerical value

Parameters:: variants (dictionary) – A dictionary where the keys are unique variant ‘id’ values and the value is a dictionary for each variant containing the variant’s ‘id’ (unique identifier), ‘name’ (string identifier), ‘var_reads’ (array of variant reads for each sample), ‘total_reads’ (array of total reads for each sample), and ‘omega_v’ (array of variant read probabilities for each sample)
Returns:: list of sorted variant ‘id’ values
Return type:: list
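Variant ids carry a prefix, so plain string sorting would place 's10' before 's2'. The following is a minimal sketch of a numeric sort of variant ids. It assumes every id matches the r's\d+' pattern that sort_vids documents. `extract_vids_sketch` is hypothetical and may differ from the library's implementation.

```python
def extract_vids_sketch(variants):
    """Return the variant ids in `variants`, sorted by their numeric suffix.

    Assumes every id is 's' followed by digits (e.g. 's0', 's12').
    """
    # Sorting on the integer suffix avoids lexicographic order ('s10' < 's2').
    return sorted(variants.keys(), key=lambda vid: int(vid[1:]))


variants = {"s10": {}, "s2": {}, "s0": {}}
print(extract_vids_sketch(variants))  # ['s0', 's2', 's10']
```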

omicsdata.ssm.common.sort_vids(vids)

Extracts the numerical value of each variant id (vid) and sorts the results. Assumes that all vids match the regular expression r's\d+'.

Parameters:: vids (list) – list of ‘id’ values for variants
Returns:: sorted list of only the numeric values of each ‘id’ from the inputted list of variant ‘id’ values
Return type:: list
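Taking the Returns description above literally (a sorted list of the numeric parts), a minimal sketch could look like this. `sort_vids_sketch` is hypothetical. Using `re.fullmatch` to reject malformed ids is a choice made for this sketch. The library may use a different regular-expression call or may not validate ids at all.

```python
import re

VID_PATTERN = re.compile(r"s(\d+)")  # assumed id format: 's' followed by digits


def sort_vids_sketch(vids):
    """Return the numeric part of each variant id, sorted ascending."""
    nums = []
    for vid in vids:
        match = VID_PATTERN.fullmatch(vid)
        if match is None:
            raise ValueError(f"unexpected variant id: {vid!r}")
        nums.append(int(match.group(1)))
    return sorted(nums)


print(sort_vids_sketch(["s10", "s2", "s0"]))  # [0, 2, 10]
```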

omicsdata.ssm.constants module

class omicsdata.ssm.constants.Variants_Keys(NAME: str = 'name', ID: str = 'id', CHROM: str = 'chrom', POS: str = 'pos', OMEGA_V: str = 'omega_v', VAR_READS: str = 'var_reads', REF_READS: str = 'ref_reads', TOTAL_READS: str = 'total_reads', VAF: str = 'vaf')

Bases: object

Dataclass used to define variant column headers

CHROM: str = 'chrom'

ID: str = 'id'

NAME: str = 'name'

OMEGA_V: str = 'omega_v'

POS: str = 'pos'

REF_READS: str = 'ref_reads'

TOTAL_READS: str = 'total_reads'

VAF: str = 'vaf'

VAR_READS: str = 'var_reads'
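As an illustration of these keys, the sketch below builds a single variant dictionary and fills in the derived per-sample fields. It assumes ref_reads = total_reads - var_reads and vaf = var_reads / total_reads (the usual definitions, not stated in this reference). `with_derived_counts` is hypothetical. Reporting a VAF of 0.0 for zero-depth samples is a choice made only for this sketch.

```python
def with_derived_counts(variant):
    """Return a copy of `variant` with 'ref_reads' and 'vaf' filled in per sample.

    Assumes ref_reads = total_reads - var_reads and vaf = var_reads / total_reads.
    """
    out = dict(variant)
    var, total = variant["var_reads"], variant["total_reads"]
    out["ref_reads"] = [t - v for v, t in zip(var, total)]
    # Sketch choice: a sample with zero total reads gets a VAF of 0.0.
    out["vaf"] = [v / t if t > 0 else 0.0 for v, t in zip(var, total)]
    return out


variant = {
    "id": "s0", "name": "chr1_12345", "chrom": "1", "pos": 12345,
    "var_reads": [10, 4], "total_reads": [50, 40], "omega_v": [0.5, 0.5],
}
print(with_derived_counts(variant)["vaf"])  # [0.2, 0.1]
```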

omicsdata.ssm.convert module

omicsdata.ssm.convert.ssm_to_pyclone(pyclone_fn, ssm_fn, params_fn)

Converts a simple somatic mutation (ssm) file into a tab-separated values (TSV) file that can be used by PyClone-VI (https://github.com/Roth-Lab/pyclone-vi)

Parameters:

pyclone_fn (str) – Path to the TSV file to which the converted ssm data is written
ssm_fn (str) – The simple somatic mutation file
params_fn (str) – The parameters file

Return type:

None
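A .ssm file is wide, with one row per variant and one comma-delimited entry per sample. PyClone-VI input is long, with one row per mutation and sample. The sketch below shows only that reshaping. The column names are assumed to be the required columns listed in the PyClone-VI README. The copy-number values are diploid placeholders and are an assumption of this sketch, not how omicsdata derives them. `to_pyclone_rows` is hypothetical.

```python
import csv
import io

# Assumed required input columns for PyClone-VI.
PYCLONE_COLS = ["mutation_id", "sample_id", "ref_counts", "alt_counts",
                "major_cn", "minor_cn", "normal_cn"]


def to_pyclone_rows(variants, samples):
    """Reshape {id: variant} (one entry per variant) into one row per variant and sample."""
    rows = []
    for vid, var in variants.items():
        for s, sample in enumerate(samples):
            rows.append({
                "mutation_id": vid,
                "sample_id": sample,
                "ref_counts": var["total_reads"][s] - var["var_reads"][s],
                "alt_counts": var["var_reads"][s],
                "major_cn": 1,   # placeholder copy number (assumption)
                "minor_cn": 1,   # placeholder copy number (assumption)
                "normal_cn": 2,  # placeholder copy number (assumption)
            })
    return rows


variants = {"s0": {"var_reads": [10, 4], "total_reads": [50, 40]}}
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=PYCLONE_COLS, delimiter="\t", lineterminator="\n")
writer.writeheader()
writer.writerows(to_pyclone_rows(variants, ["S1", "S2"]))
print(buf.getvalue())
```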

omicsdata.ssm.convert.ssm_to_viber(viber_dir, ssm_fn, params_fn)

Converts a simple somatic mutation (ssm) file into tab-separated values (TSV) format for use by VIBER (https://github.com/caravagnalab/VIBER)

Parameters:

viber_dir (str) – Path to the directory in which all VIBER-format files are stored
ssm_fn (str) – The simple somatic mutation file
params_fn (str) – The parameters file

Return type:

None

omicsdata.ssm.parse module

omicsdata.ssm.parse.extract_nums(S, dtype)[source]: Extracts and converts values from a comma delimited string

omicsdata.ssm.parse.load_params(params_fn)

Loads a params file into a dictionary

Parameters:: params_fn (str) – The parameters file
Returns:: A dictionary where each of the key, value pairs are the same as those listed in the params_fn file
Return type:: dictionary

omicsdata.ssm.parse.load_ssm(ssm_fn, rescale_depth=False)

Loads an ssm file and extracts the read count and copy number data for each variant

Parameters:

ssm_fn (str) – The simple somatic mutation file
rescale_depth (bool, optional) – A flag for whether or not to rescale the read depth of each variant using the average read depth across samples (default is False)

Returns:

A dictionary where the keys are unique variant ‘id’ values and the value is a dictionary for each variant containing the variant’s ‘id’ (unique identifier), ‘name’ (string identifier), ‘var_reads’ (array of variant reads for each sample), ‘total_reads’ (array of total reads for each sample), and ‘omega_v’ (array of variant read probabilities for each sample)

Return type:

dictionary
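To make the returned structure concrete, here is a minimal sketch that parses .ssm text into that dictionary. It assumes a tab-separated header row with the SSM_Columns names and comma-delimited per-sample values. It also assumes the ‘var_read_prob’ column is stored under ‘omega_v’. The rescale_depth option is not modelled. `parse_ssm_text` is hypothetical.

```python
import csv
import io


def parse_ssm_text(text):
    """Parse tab-separated .ssm text into {id: variant_dict}.

    Assumptions: header row uses the SSM_Columns names, per-sample values are
    comma-delimited, and 'var_read_prob' is exposed as 'omega_v'.
    """
    variants = {}
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        variants[row["id"]] = {
            "id": row["id"],
            "name": row["name"],
            "var_reads": [int(x) for x in row["var_reads"].split(",")],
            "total_reads": [int(x) for x in row["total_reads"].split(",")],
            "omega_v": [float(x) for x in row["var_read_prob"].split(",")],
        }
    return variants


ssm = ("id\tname\tvar_reads\ttotal_reads\tvar_read_prob\n"
       "s0\tchr1_12345\t10,4\t50,40\t0.5,0.5\n")
print(parse_ssm_text(ssm)["s0"]["omega_v"])  # [0.5, 0.5]
```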

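omicsdata.ssm.parse.load_ssms_and_params(ssm_fn, params_fn, remove_garb=True, rescale_depth=False)

Loads an ssm file and a params file

Parameters:

ssm_fn (str) – The simple somatic mutation file
params_fn (str) – The parameters file
remove_garb (bool, optional) – A flag to remove the garbage variants from the ‘variants’ dictionary (default is True)
rescale_depth (bool, optional) – A flag for whether or not to rescale the read depth of each variant using the average read depth across samples (default is False)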

omicsdata.ssm.parse.remove_garbage(variants, garbage)

Removes garbage variants from a dictionary of variants

Parameters:

variants (dictionary) – A dictionary where the keys are unique variant ‘id’ values and the value is a dictionary for each variant containing the variant’s ‘id’ (unique identifier), ‘name’ (string identifier), ‘var_reads’ (array of variant reads for each sample), ‘total_reads’ (array of total reads for each sample), and ‘omega_v’ (array of variant read probabilities for each sample)
garbage (list) – A list of variant ‘id’ values to be removed from the ‘variants’ dictionary

Returns:

A ‘variants’ dictionary (the same as what’s returned by load_ssm) with the variants listed in the garbage parameter removed

Return type:

dictionary
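The filtering step can be sketched as a dictionary comprehension. `remove_garbage_sketch` is hypothetical and returns a new dictionary. This reference does not say whether the library modifies `variants` in place.

```python
def remove_garbage_sketch(variants, garbage):
    """Return a new dict without the variant ids listed in `garbage`."""
    garbage_ids = set(garbage)  # set lookups keep the filter linear in len(variants)
    return {vid: var for vid, var in variants.items() if vid not in garbage_ids}


variants = {"s0": {"id": "s0"}, "s1": {"id": "s1"}, "s2": {"id": "s2"}}
print(sorted(remove_garbage_sketch(variants, ["s1"])))  # ['s0', 's2']
```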

omicsdata.ssm.parse.write_params(params, params_fn)

Writes a dictionary of parameters to a params file (.params.json) as a JSON string

Parameters:

params (dictionary) – A dictionary that contains a set of key/value pairs to write to a file as a JSON string
params_fn (str) – The parameters file (.params.json) to write the params to

Return type:

None
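Because a params file is a JSON object, writing it with write_params and reading it back with load_params amounts to a JSON round trip. The sketch below uses only the standard library. It assumes ‘samples’ holds sample names, ‘clusters’ holds lists of variant ids, and ‘garbage’ holds variant ids to discard, as the parameter descriptions in this reference suggest. The file name is hypothetical.

```python
import json
import os
import tempfile

params = {
    "samples": ["S1", "S2"],             # assumed: sample names
    "clusters": [["s0", "s2"], ["s3"]],  # each sublist holds the variant ids of one cluster
    "garbage": ["s1"],                   # variant ids to discard
}

with tempfile.TemporaryDirectory() as tmp:
    params_fn = os.path.join(tmp, "example.params.json")  # hypothetical file name
    # Write the dictionary as a JSON string, as write_params describes.
    with open(params_fn, "w") as f:
        json.dump(params, f)
    # Read it back into a dictionary, as load_params describes.
    with open(params_fn) as f:
        loaded = json.load(f)

print(loaded == params)  # True
```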

omicsdata.ssm.parse.write_ssms(variants, ssm_fn)

Writes a set of variants along with their associated read count and copy number information to an ssm file

Parameters:

variants (dictionary) – A dictionary where the keys are unique variant ‘id’ values and the value is a dictionary for each variant containing the variant’s ‘id’ (unique identifier), ‘name’ (string identifier), ‘var_reads’ (array of variant reads for each sample), ‘total_reads’ (array of total reads for each sample), and ‘omega_v’ (array of variant read probabilities for each sample)
ssm_fn (str) – The simple somatic mutation file

Return type:

None

omicsdata.ssm.supervariants module

class omicsdata.ssm.supervariants.Variant(id, var_reads, ref_reads, total_reads, vaf, omega_v)

Bases: tuple

id: Alias for field number 0

omega_v: Alias for field number 5

ref_reads: Alias for field number 2

total_reads: Alias for field number 3

vaf: Alias for field number 4

var_reads: Alias for field number 1

omicsdata.ssm.supervariants.clusters_to_supervars(clusters, variants, fill_chr_pos=False)

Converts clusters into supervariants

Parameters:

clusters (list) – A list of lists, where each sublist contains the ‘id’ values for the variants that are in that cluster
variants (dictionary) – A dictionary where the keys are unique variant ‘id’ values and the value is a dictionary for each variant containing the variant’s ‘id’ (unique identifier), ‘name’ (string identifier), ‘var_reads’ (array of variant reads for each sample), ‘total_reads’ (array of total reads for each sample), and ‘omega_v’ (array of variant read probabilities for each sample)
fill_chr_pos (bool) – A flag to fill the chromosome and position fields for each supervariant. This will only work if all variant names match the pattern ‘{chromosome}_{position}’

Returns:

A dictionary of supervariants, where the keys are the supervariant ‘id’ values and the values are a dictionary containing the data for the supervariant

Return type:

dictionary

omicsdata.ssm.supervariants.convert_all_variants_to_tuples(variants)

Converts a dictionary of variants, each of which is represented by a dictionary, into a list of namedtuples

Parameters:: variants (dictionary) – A dictionary where the keys are unique variant ‘id’ values and the value is a dictionary for each variant containing the variant’s ‘id’ (unique identifier), ‘name’ (string identifier), ‘var_reads’ (array of variant reads for each sample), ‘total_reads’ (array of total reads for each sample), and ‘omega_v’ (array of variant read probabilities for each sample)
Returns:: A list of namedtuples, one for each variant in the variants input. Each value in the list is a ‘Variant’ namedtuple with the following keys: ‘id’, ‘var_reads’, ‘ref_reads’, ‘total_reads’, ‘vaf’, ‘omega_v’
Return type:: list

omicsdata.ssm.supervariants.convert_variant_dict_to_tuple(variant)

Converts a single variant dictionary into a ‘Variant’ namedtuple

Parameters:: variant (dictionary) – A dictionary containing all of the following keys for a particular variant: ‘id’, ‘var_reads’, ‘ref_reads’, ‘total_reads’, ‘vaf’, ‘omega_v’
Returns:: A ‘Variant’ namedtuple with the same keys as the input dictionary
Return type:: namedtuple
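The following is a minimal sketch of both conversions, using a local `Variant` namedtuple with the field order documented for omicsdata.ssm.supervariants.Variant. `to_variant_tuple` and `all_to_tuples` are hypothetical helpers. This sketch ignores extra keys such as ‘name’. A missing field raises KeyError.

```python
from collections import namedtuple

# Local stand-in with the field order documented for the Variant namedtuple.
Variant = namedtuple("Variant", ["id", "var_reads", "ref_reads", "total_reads", "vaf", "omega_v"])


def to_variant_tuple(variant):
    """Build a Variant from a dict containing every Variant field; other keys are ignored."""
    return Variant(**{field: variant[field] for field in Variant._fields})


def all_to_tuples(variants):
    """Convert every variant dict in `variants` (keyed by id) to a Variant."""
    return [to_variant_tuple(var) for var in variants.values()]


v = {"id": "s0", "name": "chr1_12345", "var_reads": [10], "ref_reads": [40],
     "total_reads": [50], "vaf": [0.2], "omega_v": [0.5]}
t = to_variant_tuple(v)
print(t.vaf, t[4])  # [0.2] [0.2]
```

Field access by name (`t.vaf`) and by position (`t[4]`) return the same value, which matches the "Alias for field number" entries above.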

omicsdata.ssm.supervariants.make_superclusters(supervars)

Generates a clustering where each supervariant is in its own cluster

Parameters:: supervars (dictionary) – A dictionary of supervariants, where the keys are the supervariant ‘id’ values and the values are a dictionary containing the data for the supervariant
Returns:: A list of lists where each sublist contains a single supervariant
Return type:: list

omicsdata.ssm.supervariants.make_supervar(name, variants, fill_chr_pos=False)

Makes a supervariant given a list of variants

Parameters:

name (str) – A name/id value to give the supervariant
variants (list) – A list of ‘variant’ dictionaries. Each variant dictionary contains the following keys: ‘id’ (unique identifier), ‘name’ (string identifier), ‘var_reads’ (array of variant reads for each sample), ‘total_reads’ (array of total reads for each sample), and ‘omega_v’ (array of variant read probabilities for each sample)
fill_chr_pos (bool) – A flag to fill the chromosome and position fields for each supervariant. This will only work if all variant names match the pattern ‘{chromosome}_{position}’

Returns:

A dictionary that summarizes the information in the input list of variants. The supervariant has the following (used) keys: ‘id’ (unique id for the supervariant), ‘name’ (string name of the supervariant), ‘var_reads’ (array of variant reads for each sample), ‘total_reads’ (array of total reads for each sample), and ‘omega_v’ (array of variant read probabilities for each sample)

Return type:

dictionary
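The following is a minimal sketch of pooling a cluster's variants into one supervariant. It assumes plain per-sample summation of read counts. That is only consistent when all members share the same ‘omega_v’, so the sketch enforces this. The library may instead weight or rescale counts, for example using omega_v. `make_supervar_sketch` and the name "S0" are hypothetical, and fill_chr_pos is not modelled.

```python
def make_supervar_sketch(name, variants):
    """Combine variants into one supervariant by summing reads per sample.

    Assumption: plain summation, valid only if all members share omega_v.
    """
    if not variants:
        raise ValueError("a supervariant needs at least one variant")
    omega_v = variants[0]["omega_v"]
    if any(v["omega_v"] != omega_v for v in variants):
        raise ValueError("this sketch only handles variants that share omega_v")
    n_samples = len(variants[0]["total_reads"])
    return {
        "id": name,
        "name": name,
        "var_reads": [sum(v["var_reads"][s] for v in variants) for s in range(n_samples)],
        "total_reads": [sum(v["total_reads"][s] for v in variants) for s in range(n_samples)],
        "omega_v": list(omega_v),
    }


members = [
    {"id": "s0", "var_reads": [10, 4], "total_reads": [50, 40], "omega_v": [0.5, 0.5]},
    {"id": "s2", "var_reads": [6, 0], "total_reads": [30, 20], "omega_v": [0.5, 0.5]},
]
sv = make_supervar_sketch("S0", members)
print(sv["var_reads"], sv["total_reads"])  # [16, 4] [80, 60]
```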

omicsdata.ssm.supervariants.supervars_to_binom_params(supervars)

Extracts the binomial parameters for each supervariant.

Parameters:

supervars (dictionary) – A dictionary of supervariants, where the keys are the supervariant ‘id’ values and the values are a dictionary containing the data for the supervariant

Returns:

ndarray – An n × m ndarray of variant read counts, where row i (i = 1,…,n) holds the variant reads of supervariant i in all m samples, and entry (i, s) (s = 1,…,m) is the number of variant reads for supervariant i in sample s.
ndarray – An n × m ndarray of total read counts, where row i holds the total reads of supervariant i in all m samples, and entry (i, s) is the number of total reads for supervariant i in sample s.
ndarray – An n × m ndarray of variant read probabilities, where row i holds the variant read probabilities of supervariant i in all m samples, and entry (i, s) is the variant read probability for supervariant i in sample s.
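The three return values can be pictured as row-stacked per-sample arrays, with rows for supervariants and columns for samples. The sketch below assumes row order follows the dictionary's insertion order. The library may order rows differently, for example by sorted id. `supervars_to_binom_params_sketch` is hypothetical.

```python
import numpy as np


def supervars_to_binom_params_sketch(supervars):
    """Stack per-sample values into n x m arrays (rows: supervariants, columns: samples)."""
    ids = list(supervars)  # assumption: rows follow dict insertion order
    V = np.array([supervars[i]["var_reads"] for i in ids])
    N = np.array([supervars[i]["total_reads"] for i in ids])
    omega = np.array([supervars[i]["omega_v"] for i in ids])
    return V, N, omega


supervars = {
    "S0": {"var_reads": [16, 4, 9], "total_reads": [80, 60, 30], "omega_v": [0.5, 0.5, 0.5]},
    "S1": {"var_reads": [30, 25, 0], "total_reads": [60, 50, 40], "omega_v": [0.5, 0.5, 0.5]},
}
V, N, omega = supervars_to_binom_params_sketch(supervars)
print(V.shape)  # (2, 3)
```

Entry (i, s) of each array then gives the binomial parameters for supervariant i in sample s: V[i, s] successes out of N[i, s] trials, with variant read probability omega[i, s].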

Module contents