omicsdata.ssm package
Submodules
omicsdata.ssm.columns module
- class omicsdata.ssm.columns.PARAMS_Columns(SAMPLES: str = 'samples', CLUSTERS: str = 'clusters', GARBAGE: str = 'garbage')[source]
Bases: object
Dataclass used to define .params.json column headers
- CLUSTERS: str = 'clusters'
- GARBAGE: str = 'garbage'
- SAMPLES: str = 'samples'
- class omicsdata.ssm.columns.SSM_Columns(NAME: str = 'name', ID: str = 'id', VAR_READS: str = 'var_reads', TOTAL_READS: str = 'total_reads', VAR_READ_PROB: str = 'var_read_prob', COL_ORDER: tuple = ('id', 'name', 'var_reads', 'total_reads', 'var_read_prob'))[source]
Bases: object
Dataclass used to define .ssm column headers
- COL_ORDER: tuple = ('id', 'name', 'var_reads', 'total_reads', 'var_read_prob')
- ID: str = 'id'
- NAME: str = 'name'
- TOTAL_READS: str = 'total_reads'
- VAR_READS: str = 'var_reads'
- VAR_READ_PROB: str = 'var_read_prob'
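As a rough illustration of how a header dataclass of this kind can be used, the sketch below defines a stand-in with the same field names and defaults listed above and joins COL_ORDER into a header line. The class name `SsmColumnsSketch` is hypothetical, and both the frozen setting and the tab separator are assumptions rather than documented behaviour.

```python
from dataclasses import dataclass


# Hypothetical stand-in that mirrors the documented SSM_Columns defaults.
# frozen=True is an assumption made here so the headers cannot be reassigned.
@dataclass(frozen=True)
class SsmColumnsSketch:
    NAME: str = 'name'
    ID: str = 'id'
    VAR_READS: str = 'var_reads'
    TOTAL_READS: str = 'total_reads'
    VAR_READ_PROB: str = 'var_read_prob'
    COL_ORDER: tuple = ('id', 'name', 'var_reads', 'total_reads', 'var_read_prob')


cols = SsmColumnsSketch()

# Build a header line in the documented column order (tab separator assumed).
header_line = '\t'.join(cols.COL_ORDER)
```

Referring to `cols.VAR_READS` instead of the literal string keeps header names in one place if they ever change.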
omicsdata.ssm.common module
- omicsdata.ssm.common.extract_vids(variants)[source]
Extracts the unique numerical value of all variants and sorts them
- Parameters:
variants (dictionary) – A dictionary where the keys are unique variant ‘id’ values and the value is a dictionary for each variant containing the variant’s ‘id’ (unique identifier), ‘name’ (string identifier), ‘var_reads’ (array of variant reads for each sample), ‘total_reads’ (array of total reads for each sample), and ‘omega_v’ (array of variant read probabilities for each sample)
- Returns:
list of sorted variant ‘id’ values
- Return type:
list
- omicsdata.ssm.common.sort_vids(vids)[source]
Extracts the unique numerical value of a variant id (vid). Assumes that all vids match the regular expression r’s\d+’.
- Parameters:
vids (list) – list of ‘id’ values for variants
- Returns:
sorted list of only the numeric values of each ‘id’ from the inputted list of variant ‘id’ values
- Return type:
list
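The point of sorting on the numeric part is that plain string sorting would place an id such as 's10' before 's2'. The sketch below shows one way to do this for ids made of 's' followed by digits. The function name `sort_vids_sketch` is hypothetical, and keeping the full ids in the result (rather than only their numeric parts) is an assumption of this sketch.

```python
import re

# Assumed id format: the letter 's' followed by one or more digits, e.g. 's0'.
VID_PATTERN = re.compile(r's\d+')


def sort_vids_sketch(vids):
    """Sort ids such as 's10' by their numeric part rather than as strings."""
    for vid in vids:
        if not VID_PATTERN.fullmatch(vid):
            raise ValueError(f'unexpected variant id: {vid!r}')
    # Drop the leading 's' and compare the remainder as an integer.
    return sorted(vids, key=lambda vid: int(vid[1:]))


ordered = sort_vids_sketch(['s10', 's2', 's0'])
```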
omicsdata.ssm.constants module
- class omicsdata.ssm.constants.Variants_Keys(NAME: str = 'name', ID: str = 'id', CHROM: str = 'chrom', POS: str = 'pos', OMEGA_V: str = 'omega_v', VAR_READS: str = 'var_reads', REF_READS: str = 'ref_reads', TOTAL_READS: str = 'total_reads', VAF: str = 'vaf')[source]
Bases: object
Dataclass used to define variant column headers
- CHROM: str = 'chrom'
- ID: str = 'id'
- NAME: str = 'name'
- OMEGA_V: str = 'omega_v'
- POS: str = 'pos'
- REF_READS: str = 'ref_reads'
- TOTAL_READS: str = 'total_reads'
- VAF: str = 'vaf'
- VAR_READS: str = 'var_reads'
omicsdata.ssm.convert module
- omicsdata.ssm.convert.ssm_to_pyclone(pyclone_fn, ssm_fn, params_fn)[source]
Processes simple somatic mutation (ssm) file to a tab separated file (tsv) that can be used by PyClone-VI (https://github.com/Roth-Lab/pyclone-vi)
- Parameters:
pyclone_fn (str) – path to the tsv file to which the converted ssm data is written
ssm_fn (str) – The simple somatic mutation file
params_fn (str) – The parameters file
- Return type:
None
- omicsdata.ssm.convert.ssm_to_viber(viber_dir, ssm_fn, params_fn)[source]
Processes simple somatic mutation (ssm) file to a tab separated file (tsv) that can be used by VIBER (https://github.com/caravagnalab/VIBER)
- Parameters:
viber_dir (str) – path to a directory to store all VIBER format files
ssm_fn (str) – The simple somatic mutation file
params_fn (str) – The parameters file
- Return type:
None
omicsdata.ssm.parse module
- omicsdata.ssm.parse.extract_nums(S, dtype)[source]
Extracts and converts values from a comma delimited string
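A minimal sketch of this kind of conversion follows, assuming `dtype` is a callable such as `int` or `float` and that a plain list is an acceptable return value (the real function may return an array instead). The name `extract_nums_sketch` is hypothetical.

```python
def extract_nums_sketch(s, dtype):
    """Split a comma-delimited string and convert each piece with dtype."""
    return [dtype(piece.strip()) for piece in s.split(',')]


# Per-sample read counts and probabilities as they might appear in one field.
var_reads = extract_nums_sketch('12,0,7', int)
probs = extract_nums_sketch('0.5, 0.5, 1.0', float)
```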
- omicsdata.ssm.parse.load_params(params_fn)[source]
Loads a params file into a dictionary
- Parameters:
params_fn (str) – The parameters file
- Returns:
A dictionary where each of the key, value pairs are the same as those listed in the params_fn file
- Return type:
dictionary
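Since the params file is JSON, loading it can be pictured as a single `json.load` call. The sketch below writes a small example file to a temporary directory and reads it back. The function name `load_params_sketch`, the file name, and the sample and variant names are all made up for illustration; only the 'samples', 'clusters', and 'garbage' keys come from the column definitions above.

```python
import json
import os
import tempfile


def load_params_sketch(params_fn):
    """Read a JSON params file into a dictionary."""
    with open(params_fn) as f:
        return json.load(f)


# Hypothetical contents using the documented top-level keys.
example = {
    'samples': ['sample_a', 'sample_b'],
    'clusters': [['s0', 's1'], ['s2']],
    'garbage': ['s3'],
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'example.params.json')
    with open(path, 'w') as f:
        json.dump(example, f)
    params = load_params_sketch(path)
```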
- omicsdata.ssm.parse.load_ssm(ssm_fn, rescale_depth=False)[source]
Loads an ssm file and extracts the read count and copy number data for each variant
- Parameters:
ssm_fn (str) – The simple somatic mutation file
rescale_depth (bool, optional) – A flag for whether or not to rescale the read depth for each variant using the average across each sample.
- Returns:
A dictionary where the keys are unique variant ‘id’ values and the value is a dictionary for each variant containing the variant’s ‘id’ (unique identifier), ‘name’ (string identifier), ‘var_reads’ (array of variant reads for each sample), ‘total_reads’ (array of total reads for each sample), and ‘omega_v’ (array of variant read probabilities for each sample)
- Return type:
dictionary
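To make the shape of the returned dictionary concrete, the sketch below parses a tiny in-memory example. It assumes the file is tab-separated with the SSM_Columns headers and that per-sample values are comma-delimited. The function name `parse_ssm_sketch` and the example rows are hypothetical, values are kept as plain lists rather than arrays, and no depth rescaling is attempted.

```python
import csv
import io

# Made-up example content: two variants measured in two samples.
SSM_TEXT = (
    'id\tname\tvar_reads\ttotal_reads\tvar_read_prob\n'
    's0\tchr1_100\t10,20\t100,100\t0.5,0.5\n'
    's1\tchr2_200\t0,5\t50,80\t0.5,0.5\n'
)


def parse_ssm_sketch(handle):
    """Build a variants dictionary keyed by variant id."""
    variants = {}
    for row in csv.DictReader(handle, delimiter='\t'):
        variants[row['id']] = {
            'id': row['id'],
            'name': row['name'],
            'var_reads': [int(x) for x in row['var_reads'].split(',')],
            'total_reads': [int(x) for x in row['total_reads'].split(',')],
            # The var_read_prob column is stored under the 'omega_v' key.
            'omega_v': [float(x) for x in row['var_read_prob'].split(',')],
        }
    return variants


variants = parse_ssm_sketch(io.StringIO(SSM_TEXT))
```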
- omicsdata.ssm.parse.load_ssms_and_params(ssm_fn, params_fn, remove_garb=True, rescale_depth=False)[source]
Loads an ssm file and a params file
- Parameters:
ssm_fn (str) – The simple somatic mutation file
params_fn (str) – The parameters file
remove_garb (bool, optional) – A flag to remove garbage from the ‘variants’ dictionary (default is True)
- omicsdata.ssm.parse.remove_garbage(variants, garbage)[source]
Removes garbage variants from dictionary of variants
- Parameters:
variants (dictionary) – A dictionary where the keys are unique variant ‘id’ values and the value is a dictionary for each variant containing the variant’s ‘id’ (unique identifier), ‘name’ (string identifier), ‘var_reads’ (array of variant reads for each sample), ‘total_reads’ (array of total reads for each sample), and ‘omega_v’ (array of variant read probabilities for each sample)
garbage (list) – A list of variant ‘id’ values to be removed from the ‘variants’ dictionary
- Returns:
A ‘variants’ dictionary (same as what’s returned by load_ssm) with the variants listed in the garbage parameter removed
- Return type:
dictionary
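The filtering step can be sketched as a dictionary comprehension. The name `remove_garbage_sketch` is hypothetical, the example data is made up, and returning a new dictionary instead of modifying the input in place is an assumption of this sketch.

```python
def remove_garbage_sketch(variants, garbage):
    """Return a copy of variants without the ids listed in garbage."""
    garbage_ids = set(garbage)
    return {vid: v for vid, v in variants.items() if vid not in garbage_ids}


# Made-up variants with only the 'id' field filled in for brevity.
variants = {
    's0': {'id': 's0'},
    's1': {'id': 's1'},
    's2': {'id': 's2'},
}
kept = remove_garbage_sketch(variants, ['s1'])
```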
- omicsdata.ssm.parse.write_params(params, params_fn)[source]
Writes a dictionary of parameters to a params file (.params.json) as a JSON string
- Parameters:
params (dictionary) – A dictionary that contains a set of key/value pairs to write to a file as a JSON string
params_fn (str) – The parameters file (.params.json) to write the params to
- Return type:
None
- omicsdata.ssm.parse.write_ssms(variants, ssm_fn)[source]
Writes a set of variants along with their associated read count and copy number information to an ssm file
- Parameters:
variants (dictionary) – A dictionary where the keys are unique variant ‘id’ values and the value is a dictionary for each variant containing the variant’s ‘id’ (unique identifier), ‘name’ (string identifier), ‘var_reads’ (array of variant reads for each sample), ‘total_reads’ (array of total reads for each sample), and ‘omega_v’ (array of variant read probabilities for each sample)
ssm_fn (str) – The simple somatic mutation file
- Return type:
None
omicsdata.ssm.supervariants module
- class omicsdata.ssm.supervariants.Variant(id, var_reads, ref_reads, total_reads, vaf, omega_v)
Bases: tuple
- id
Alias for field number 0
- omega_v
Alias for field number 5
- ref_reads
Alias for field number 2
- total_reads
Alias for field number 3
- vaf
Alias for field number 4
- var_reads
Alias for field number 1
- omicsdata.ssm.supervariants.clusters_to_supervars(clusters, variants, fill_chr_pos=False)[source]
Converts clusters into supervariants
- Parameters:
clusters (list) – A list of lists, where each sublist contains the ‘id’ values for the variants that are in that cluster
variants (dictionary) – A dictionary where the keys are unique variant ‘id’ values and the value is a dictionary for each variant containing the variant’s ‘id’ (unique identifier), ‘name’ (string identifier), ‘var_reads’ (array of variant reads for each sample), ‘total_reads’ (array of total reads for each sample), and ‘omega_v’ (array of variant read probabilities for each sample)
fill_chr_pos (bool) – A flag to fill the chromosome and position fields for each supervariant. This will only work if all variant names match the pattern ‘{chromosome}_{position}’
- Returns:
A dictionary of supervariants, where the keys are the supervariant ‘id’ values and the values are a dictionary containing the data for the supervariant
- Return type:
dictionary
- omicsdata.ssm.supervariants.convert_all_variants_to_tuples(variants)[source]
Converts a dictionary of variants, each of which is represented by a dictionary, into a list of namedtuples
- Parameters:
variants (dictionary) – A dictionary where the keys are unique variant ‘id’ values and the value is a dictionary for each variant containing the variant’s ‘id’ (unique identifier), ‘name’ (string identifier), ‘var_reads’ (array of variant reads for each sample), ‘total_reads’ (array of total reads for each sample), and ‘omega_v’ (array of variant read probabilities for each sample)
- Returns:
A list with one ‘Variant’ namedtuple for each variant in the variants input. Each namedtuple has the following fields: ‘id’, ‘var_reads’, ‘ref_reads’, ‘total_reads’, ‘vaf’, ‘omega_v’
- Return type:
list
- omicsdata.ssm.supervariants.convert_variant_dict_to_tuple(variant)[source]
Converts a dictionary representing a single variant into a ‘Variant’ namedtuple
- Parameters:
variant (dictionary) – A dictionary containing all of the following keys for a particular variant: ‘id’, ‘var_reads’, ‘ref_reads’, ‘total_reads’, ‘vaf’, ‘omega_v’
- Returns:
A ‘Variant’ named tuple with all of the same keys as the inputted dictionary
- Return type:
namedtuple
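The conversion from a variant dictionary to a namedtuple can be sketched as follows. `VariantSketch` and `to_tuple_sketch` are hypothetical names that mirror the field order documented for Variant, and the example values are made up. Ignoring extra keys such as ‘name’ is an assumption of this sketch.

```python
from collections import namedtuple

# Field order follows the documented Variant class (id is field 0, omega_v is field 5).
VariantSketch = namedtuple(
    'VariantSketch',
    ['id', 'var_reads', 'ref_reads', 'total_reads', 'vaf', 'omega_v'],
)


def to_tuple_sketch(variant):
    """Pick the namedtuple fields out of a variant dictionary."""
    return VariantSketch(**{field: variant[field] for field in VariantSketch._fields})


variant = {
    'id': 's0',
    'name': 'chr1_100',  # extra key, not part of the tuple
    'var_reads': [10, 20],
    'ref_reads': [90, 80],
    'total_reads': [100, 100],
    'vaf': [0.1, 0.2],
    'omega_v': [0.5, 0.5],
}
as_tuple = to_tuple_sketch(variant)
```

Fields can then be read either by name (`as_tuple.vaf`) or by position (`as_tuple[4]`).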
- omicsdata.ssm.supervariants.make_superclusters(supervars)[source]
Generates a clustering where each supervariant is in its own cluster
- Parameters:
supervars (dictionary) – A dictionary of supervariants, where the keys are the supervariant ‘id’ values and the values are a dictionary containing the data for the supervariant
- Returns:
A list of lists where each sublist contains a single supervariant
- Return type:
list
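A clustering with one supervariant per cluster amounts to wrapping each id in its own list. The sketch below does that; the function name `make_superclusters_sketch`, the ‘S’-prefixed ids, and the numeric ordering of the result are assumptions made for illustration.

```python
def make_superclusters_sketch(supervars):
    """Place every supervariant id in its own single-element cluster."""
    # Assumed id format: one letter followed by an integer, e.g. 'S0'.
    ordered_ids = sorted(supervars, key=lambda sid: int(sid[1:]))
    return [[sid] for sid in ordered_ids]


# Made-up supervariants; their contents are not needed for this step.
supervars = {'S10': {}, 'S2': {}, 'S0': {}}
superclusters = make_superclusters_sketch(supervars)
```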
- omicsdata.ssm.supervariants.make_supervar(name, variants, fill_chr_pos=False)[source]
Makes a supervariant given a list of variants
- Parameters:
name (str) – A name/id value to give the supervariant
variants (list) – A list of ‘variant’ dictionaries. Each variant dictionary contains the following keys: ‘id’ (unique identifier), ‘name’ (string identifier), ‘var_reads’ (array of variant reads for each sample), ‘total_reads’ (array of total reads for each sample), and ‘omega_v’ (array of variant read probabilities for each sample)
fill_chr_pos (bool) – A flag to fill the chromosome and position fields for each supervariant. This will only work if all variant names match the pattern ‘{chromosome}_{position}’
- Returns:
A dictionary that summarizes the information in the inputted list of variants. The supervariant has the following (used) keys: ‘id’ (unique id for supervariant), ‘name’ (string name of supervariant), ‘var_reads’ (array of variant reads for each sample), ‘total_reads’ (array of total reads for each sample), and ‘omega_v’ (array of variant read probabilities for each sample)
- Return type:
dictionary
- omicsdata.ssm.supervariants.supervars_to_binom_params(supervars)[source]
Extracts the binomial parameters for each supervariant.
- Parameters:
supervars (dictionary) – A dictionary of supervariants, where the keys are the supervariant ‘id’ values and the values are a dictionary containing the data for the supervariant
- Returns:
ndarray – An n × m ndarray of variant reads, where row i = 1,…,n corresponds to supervariant i and column s = 1,…,m corresponds to sample s, so entry (i, s) is the variant read count for supervariant i in sample s.
ndarray – An n × m ndarray of total reads, where entry (i, s) is the total read count for supervariant i in sample s.
ndarray – An n × m ndarray of variant read probabilities, where entry (i, s) is the variant read probability for supervariant i in sample s.
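The row and column layout of the three return values can be pictured with the sketch below, which stacks the per-supervariant arrays into one row per supervariant and one column per sample. To stay dependency-free it uses nested lists in place of ndarrays. The function name `binom_params_sketch`, the ‘S’-prefixed ids, the row ordering by id, and the example counts are all assumptions.

```python
def binom_params_sketch(supervars):
    """Stack per-supervariant data into row-per-supervariant matrices."""
    # Assumed id format: one letter followed by an integer, e.g. 'S0'.
    ids = sorted(supervars, key=lambda sid: int(sid[1:]))
    var_reads = [list(supervars[sid]['var_reads']) for sid in ids]
    total_reads = [list(supervars[sid]['total_reads']) for sid in ids]
    omega = [list(supervars[sid]['omega_v']) for sid in ids]
    return var_reads, total_reads, omega


# Made-up data: n = 2 supervariants measured in m = 3 samples.
supervars = {
    'S1': {'var_reads': [5, 0, 9], 'total_reads': [50, 40, 60], 'omega_v': [0.5, 0.5, 0.5]},
    'S0': {'var_reads': [10, 20, 30], 'total_reads': [100, 100, 100], 'omega_v': [0.5, 0.5, 0.5]},
}
V, N, omega = binom_params_sketch(supervars)
```

With this layout, `V[i][s]` and `N[i][s]` are the variant and total read counts for supervariant i in sample s (zero-based here).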