[Figure: the CoSMic GUI, with the numbered controls referenced in the steps below]

1. Browse to the file that contains the miR data. The file should contain only the expression data, with each row a miR and each column a sample. It must be a tab-delimited text file without missing or NA values. <example>

2. Browse to the file that contains the miR names corresponding to the rows of the miR data file. <example>

3. Browse to the file that contains the mRNA data. The file should contain only the expression data, with each row a gene and each column a sample. The order of the samples (columns) must be the same as in the miR data file. It must be a tab-delimited text file without missing or NA values. <example>

4. Browse to the file that contains the official gene symbols corresponding to the rows of the mRNA data file. <example>

5. Select the sequence-based prediction algorithm to use.

6. Select the species.

7. If you want to use your own predictions, select the custom option. NOTE: the format of the predictions file is specified in the Version4 notes at the bottom of this page <example>.

8. Check this box to run the new version of CoSMic, which is suitable for large datasets (100 samples or more; see details below in <NEW VERSION NOTES (Version2)>).

9. Browse to the directory in which you would like to save the output files.

10. Choose a name for the CoSMic results file.

11. Run your analysis. When CoSMic finishes, a message will appear on the screen.
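Before loading the four input files it can help to check them against the requirements in steps 1 to 4. The following Python sketch is not part of CoSMic; the function names, the list of tokens treated as missing, and the one-name-per-line layout of the names files are all assumptions made for illustration. It checks that the data files are tab-delimited and numeric with no missing values, that each names file has one name per data row, and that both data files have the same number of samples. The sample order itself cannot be checked, because the data files contain no column labels.

```python
import csv
import io

# Tokens treated as missing values; this list is an assumption, not CoSMic's own rule.
MISSING_TOKENS = {"", "na", "nan", "n/a", "null"}


def read_matrix(handle):
    """Read a tab-delimited expression matrix, rejecting missing or non-numeric cells."""
    rows = []
    for line_no, fields in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        for col_no, cell in enumerate(fields, start=1):
            if cell.strip().lower() in MISSING_TOKENS:
                raise ValueError(f"missing value at row {line_no}, column {col_no}")
            float(cell)  # raises ValueError if the cell is not numeric
        rows.append(fields)
    if not rows:
        raise ValueError("file is empty")
    if len({len(row) for row in rows}) != 1:
        raise ValueError("rows have different numbers of columns")
    return rows


def read_names(handle):
    """Read one name per line (assumed layout), skipping blank lines."""
    return [line.strip() for line in handle if line.strip()]


def check_inputs(mir_data, mir_names, mrna_data, gene_names):
    """Return (number of miRs, number of genes, number of samples) if the files agree."""
    mirs = read_matrix(mir_data)
    genes = read_matrix(mrna_data)
    if len(read_names(mir_names)) != len(mirs):
        raise ValueError("miR names file and miR data file have different row counts")
    if len(read_names(gene_names)) != len(genes):
        raise ValueError("gene names file and mRNA data file have different row counts")
    if len(mirs[0]) != len(genes[0]):
        raise ValueError("miR and mRNA data files have different numbers of samples")
    return len(mirs), len(genes), len(mirs[0])


# Small in-memory demo with made-up values: 2 miRs, 3 genes, 3 samples.
demo_shape = check_inputs(
    io.StringIO("1.0\t2.0\t3.0\n4.0\t5.0\t6.0\n"),
    io.StringIO("miR-A\nmiR-B\n"),
    io.StringIO("0.1\t0.2\t0.3\n0.4\t0.5\t0.6\n0.7\t0.8\t0.9\n"),
    io.StringIO("GENE1\nGENE2\nGENE3\n"),
)
print(demo_shape)
```

With real data, pass file objects opened with `open(path, newline="")` in place of the `io.StringIO` objects.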

 

CoSMic’s output files:

1. param_<output_file>.txt – A parameter file containing the selected species and the prediction algorithm.

2. list_of_mirs_in_sb_prediction_file.txt – Names of the miRs that appear in the selected sequence-based predictions.

3. mirs_found_in_sb_predictions.txt – Names of the miRs, supplied by the user in the miR names file, that were found in the sequence-based predictions.

4. mirs_missing_in_sb_predictions.txt – Names of the miRs, supplied by the user in the miR names file, that were not found in the sequence-based predictions.

Note that because miR naming conventions change rapidly, the same miR (i.e. the same sequence) sometimes has different names (for example, miR-122a in the old version of the Agilent array is miR-122 in the current miRBase version). If this happens, the user can change the name of the missing miR in the miR names file to match the name used in the sequence-based predictions.
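The found/missing split, and the search for a likely renamed miR, can be sketched in Python as follows. This is an illustration, not CoSMic's own code: the function names and the similarity cutoff are assumptions, and the suggestions come from plain string similarity (`difflib`), so each one still needs to be checked against miRBase before renaming anything.

```python
import difflib


def split_by_prediction(user_mirs, predicted_mirs):
    """Split the user's miR names into those found in the predictions and those missing."""
    predicted = set(predicted_mirs)
    found = [name for name in user_mirs if name in predicted]
    missing = [name for name in user_mirs if name not in predicted]
    return found, missing


def suggest_renames(missing, predicted_mirs, cutoff=0.85):
    """For each missing name, list similarly spelled names from the predictions."""
    suggestions = {}
    for name in missing:
        close = difflib.get_close_matches(name, predicted_mirs, n=3, cutoff=cutoff)
        if close:
            suggestions[name] = close
    return suggestions


# Example using the renaming case mentioned above (miR-122a versus miR-122).
user_mirs = ["hsa-miR-122a", "hsa-miR-21"]
predicted_mirs = ["hsa-miR-122", "hsa-miR-21", "hsa-let-7a"]

found, missing = split_by_prediction(user_mirs, predicted_mirs)
suggestions = suggest_renames(missing, predicted_mirs)
print(found, missing, suggestions)
```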

5. <output_file>.txt – Contains the CoSMic results in the following format:

[Figure: example of a CoSMic results file]

 

Column A: miR names

Column B: The p-value obtained by each miR for the enrichment of the group of anti-correlated genes in the group of predicted target genes (after the random model).

Column C: The q-value obtained by each miR for the enrichment of the group of anti-correlated genes in the group of predicted target genes (after the random model).

Column D: The number of target genes identified by CoSMic as functional and context-specific for the corresponding miR (in the negative correlation procedure).

Column E: The names of the target genes identified by CoSMic as functional and context-specific for the corresponding miR (in the negative correlation procedure).

Column F: The p-value obtained by each miR for the enrichment of the group of correlated genes in the group of predicted target genes (after the random model).

Column G: The q-value obtained by each miR for the enrichment of the group of correlated genes in the group of predicted target genes (after the random model).

Column H: The number of target genes identified by CoSMic as functional and context-specific for the corresponding miR (in the positive correlation procedure).

Column I: The names of the target genes identified by CoSMic as functional and context-specific for the corresponding miR (in the positive correlation procedure).
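The nine columns can be read programmatically to pick out the miRs that pass a q-value cutoff. The Python sketch below rests on several assumptions that this page does not confirm: that the results file is tab-delimited, that the columns appear in the order A to I, and that any header row can be recognized because its p- and q-value cells are not numeric. The field names and the 0.05 threshold are illustrative choices, and the sample rows are made up.

```python
import csv
import io

# Illustrative field names for columns A to I; CoSMic itself does not define them.
COLUMNS = ["mir", "neg_p", "neg_q", "neg_n_targets", "neg_targets",
           "pos_p", "pos_q", "pos_n_targets", "pos_targets"]


def read_results(handle):
    """Parse result rows, skipping any row whose p/q-value cells are not numeric."""
    records = []
    for fields in csv.reader(handle, delimiter="\t"):
        # Pad short rows so that trailing empty columns do not shift the mapping.
        fields = fields + [""] * (len(COLUMNS) - len(fields))
        record = dict(zip(COLUMNS, fields))
        try:
            for key in ("neg_p", "neg_q", "pos_p", "pos_q"):
                record[key] = float(record[key])
        except ValueError:
            continue  # header line or malformed row
        records.append(record)
    return records


def significant(records, q_key="neg_q", threshold=0.05):
    """Return the records whose chosen q-value passes the threshold, best first."""
    hits = [record for record in records if record[q_key] <= threshold]
    return sorted(hits, key=lambda record: record[q_key])


# Made-up rows, for illustration only.
sample = (
    "miR\tp\tq\tn\ttargets\tp\tq\tn\ttargets\n"
    "miR-A\t0.001\t0.01\t2\tGENE1,GENE2\t0.5\t0.8\t0\t\n"
    "miR-B\t0.3\t0.6\t0\t\t0.2\t0.4\t0\t\n"
)
records = read_results(io.StringIO(sample))
hits = significant(records)
print([record["mir"] for record in hits])
```

Passing `q_key="pos_q"` selects on column G (the positive correlation procedure) instead of column C.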

 

 

NEW VERSION NOTES

Version2 (17/6/2013)

CoSMic searches for the group of genes that are correlated with the miR expression levels and are enriched for its sequence-based predicted target genes. The statistic in the original version of CoSMic was based on a combination of the p-value of the enrichment score and the q-value of the correlation (as described in detail in the supplementary methods of the paper). For a large dataset (more than ~100 samples), the p- and q-values of the correlations become extremely low, even when the actual value of the correlation coefficient is low. Since in the random model the q-value of the correlation is 1, all results tend to be identified as significant by CoSMic because of the low q-values of the correlation, irrespective of the enrichment score and its p-value. Therefore, in the new version we modified the statistic of CoSMic to contain the correlation coefficient (ρcc) instead of its q-value. The correlation coefficient is dependent on the sample size, and in the random model the correlation coefficients are taken from the normal distribution with
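The effect of sample size on the correlation p-value can be illustrated with a short Python calculation. This is not CoSMic's statistic: this page does not say which correlation measure or test CoSMic uses, so the sketch assumes a Pearson correlation and uses the standard Fisher z approximation for its two-sided p-value. The function name is hypothetical.

```python
import math


def approx_correlation_p_value(r, n):
    """Two-sided p-value for a correlation r over n samples (Fisher z approximation)."""
    if n <= 3 or not -1.0 < r < 1.0:
        raise ValueError("need n > 3 and -1 < r < 1")
    # Under the null hypothesis, atanh(r) * sqrt(n - 3) is approximately standard normal.
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2.0))


# The same weak correlation (r = 0.2) at increasing sample sizes.
for n in (20, 100, 400):
    print(n, approx_correlation_p_value(0.2, n))
```

For the same weak correlation of 0.2, the approximate p-value is far from significant at 20 samples and drops well below 0.001 at 400 samples. This is the behaviour that makes a q-value-based statistic flag almost everything in large datasets.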

 

Version3 (30/9/2013)

Fixed a bug in which negatively correlated targets appeared in the group of positively correlated targets and vice versa.

 

Version4 (19/11/2013)

1) We added error messages for invalid input.

2) We added sequence-based predictions for rat microRNAs (rno-miRs) from the miRanda algorithm.

3) We added an option for users to upload their own predictions (from any algorithm/species) (option (7) in the CoSMic GUI, see above).

IMPORTANT NOTE: the sequence-based prediction file must be a .mat file (MATLAB file) in the following format, otherwise it will not work <example>:

The prediction file should contain 3 variables:

a)      miR_names – a cell (nx1) with the names of all miRs (n = number of miRs in the predictions).

The miR names should be the same as in the miR names file.

b)      targets_mRNA – a cell (mx1) with the gene symbols of all target genes that exist in the predictions (m = number of target genes in the predictions).

The gene symbols of the target genes should be the same as in the mRNA names file.

c)       scores_data – a cell (nx2) with the sorted target genes of each miR (column 1) and their corresponding scores (column 2).

The first column contains the indexes of the predicted target genes of each miR (the order of the miRs should be the same as in the variable miR_names).

The target genes should be sorted from the best prediction to the worst.

The indexes correspond to the positions of the target genes in the variable targets_mRNA.

The second column contains the sorted scores of the predictions (each entry contains the sorted scores of all target genes of the specific miR).

The scores should be normalized between 0 and 1, with 1 = the best prediction and 0 = the worst prediction.

Do not change the variable names or the format of each variable; otherwise the prediction file cannot be used by the CoSMic algorithm!
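The rules above can be checked before the prediction file is written. The Python sketch below models the three variables as plain lists rather than MATLAB cell arrays, so it only validates the logical structure and does not produce a .mat file. The function name is hypothetical, the example names and scores are made up, and the indexes are assumed to be 1-based, as is usual in MATLAB, which this page does not state explicitly.

```python
def check_predictions(miR_names, targets_mRNA, scores_data):
    """Validate the three prediction variables; return (n miRs, m target genes)."""
    n, m = len(miR_names), len(targets_mRNA)
    if len(scores_data) != n:
        raise ValueError("scores_data must have one row per entry in miR_names")
    for mir, (indexes, scores) in zip(miR_names, scores_data):
        if len(indexes) != len(scores):
            raise ValueError(f"{mir}: each target index needs exactly one score")
        # Assumption: indexes are 1-based positions in targets_mRNA (MATLAB style).
        if any(not 1 <= index <= m for index in indexes):
            raise ValueError(f"{mir}: target index outside 1..{m}")
        if any(not 0.0 <= score <= 1.0 for score in scores):
            raise ValueError(f"{mir}: scores must lie between 0 and 1")
        # Best prediction first means the scores must never increase.
        if any(earlier < later for earlier, later in zip(scores, scores[1:])):
            raise ValueError(f"{mir}: targets are not sorted from best to worst")
    return n, m


# Made-up example: 2 miRs and 3 target genes.
miR_names = ["miR-A", "miR-B"]
targets_mRNA = ["GENE1", "GENE2", "GENE3"]
scores_data = [
    ([2, 1], [1.0, 0.4]),  # miR-A targets GENE2 (best), then GENE1
    ([3], [0.7]),          # miR-B targets GENE3
]
shape = check_predictions(miR_names, targets_mRNA, scores_data)
print(shape)
```

Once the structure passes, the variables still have to be saved under exactly the names miR_names, targets_mRNA and scores_data in a .mat file, for example from MATLAB itself.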