

1. Browse to the file that contains the miR data. The file should contain only the expression data, with each row a miR and each column a sample. It must be a tab-delimited text file without missing or NA values. <example>
2. Browse to the file that contains the miR names corresponding to the miR data file. <example>
3. Browse to the file that contains the mRNA data. The file should contain only the expression data, with each row a gene and each column a sample. The order of the samples (columns) should be the same as in the miR data file. It must be a tab-delimited text file without missing or NA values. <example>
4. Browse to the file that contains the official gene symbols corresponding to the mRNA data file. <example>
5. Select the sequence-based prediction algorithm to use.
6. Select the species.
7. If you want to use your own predictions, select the custom option. NOTE: the format of the predictions file is specified at the bottom, in the notes for version 4 <example>.
8. Check this box if you would like to run the new version of CoSMic, which is suitable for large datasets (100 samples or more; see details below <NEW VERSION NOTE (version2)>).
9. Browse to the directory in which you would like to save the output files.
10. Choose a name for the CoSMic results file.
11. Run your analysis. When CoSMic finishes your analysis, a message will appear on the screen.
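The input requirements in steps 1-4 can be checked before starting CoSMic. The following is a minimal sketch in Python and is not part of CoSMic. The function names (read_matrix, check_inputs) are hypothetical, and it assumes that each names file lists one name per line, which the instructions above do not state explicitly.

```python
import csv
import io


def read_matrix(handle):
    """Read a tab-delimited expression matrix, rejecting missing or NA values."""
    rows = []
    for line_no, fields in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if not fields:  # skip completely blank lines
            continue
        values = []
        for field in fields:
            text = field.strip()
            if text == "" or text.upper() in {"NA", "NAN"}:
                raise ValueError(f"missing or NA value on line {line_no}")
            values.append(float(text))  # raises ValueError for non-numeric text
        rows.append(values)
    if not rows:
        raise ValueError("the file contains no data")
    if len({len(row) for row in rows}) > 1:
        raise ValueError("rows have different numbers of columns")
    return rows


def check_inputs(mir_data, mir_names, mrna_data, gene_names):
    """Check that names match rows and that both matrices share the same samples."""
    if len(mir_data) != len(mir_names):
        raise ValueError("number of miR names differs from number of miR rows")
    if len(mrna_data) != len(gene_names):
        raise ValueError("number of gene symbols differs from number of mRNA rows")
    if len(mir_data[0]) != len(mrna_data[0]):
        raise ValueError("miR and mRNA files have different numbers of samples")
    return len(mir_data[0])  # number of samples


# Small in-memory example (assumed data, two miRs and one gene over three samples).
mir = read_matrix(io.StringIO("1.0\t2.0\t3.0\n4.0\t5.0\t6.0\n"))
mrna = read_matrix(io.StringIO("0.1\t0.2\t0.3\n"))
n_samples = check_inputs(mir, ["hsa-miR-21", "hsa-miR-122"], mrna, ["TP53"])
print("samples:", n_samples)
```

A check like this only confirms that the column counts agree. It cannot confirm that the samples are in the same order in both files, so that still has to be verified by hand.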
CoSMic's output files:
1. param_<output_file>.txt - A parameter file containing the selected species and the prediction algorithm.
2. list_of_mirs_in_sb_prediction_file.txt - Names of the miRs that appear in the selected sequence-based predictions.
3. mirs_found_in_sb_predictions.txt - Names of miRs, entered by the user in the miR names file, that were found in the sequence-based predictions.
4. mirs_missing_in_sb_predictions.txt - Names of miRs, entered by the user in the miR names file, that were not found in the sequence-based predictions.
Note that because the miR naming convention changes rapidly, the same miR (i.e. the same sequence) sometimes has different names (for example, miR-122a in the old version of the Agilent array is miR-122 in the current miRBase version). If this happens, the user can change the name of the missing miR in the miR names file so that it matches the name used in the sequence-based predictions.
5. <output_file>.txt - Contains the CoSMic results in the following format:

Column A: miR names
Column B: The p-value each miR obtained for the enrichment of the group of anti-correlated genes in the group of predicted target genes (after the random model).
Column C: The q-value each miR obtained for the enrichment of the group of anti-correlated genes in the group of predicted target genes (after the random model).
Column D: The number of target genes identified by CoSMic as functional and context-specific for the corresponding miR (in the negative correlation procedure).
Column E: The names of the target genes identified by CoSMic as functional and context-specific for the corresponding miR (in the negative correlation procedure).
Column F: The p-value each miR obtained for the enrichment of the group of correlated genes in the group of predicted target genes (after the random model).
Column G: The q-value each miR obtained for the enrichment of the group of correlated genes in the group of predicted target genes (after the random model).
Column H: The number of target genes identified by CoSMic as functional and context-specific for the corresponding miR (in the positive correlation procedure).
Column I: The names of the target genes identified by CoSMic as functional and context-specific for the corresponding miR (in the positive correlation procedure).
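The miR naming note under output file 4 describes a manual step: finding, for each missing miR, the name under which it may appear in the sequence-based predictions. The sketch below illustrates one way to list candidates for review. It is not how CoSMic matches names; the helper names are hypothetical, and the rule of dropping a trailing letter suffix is only a heuristic. Names that differ by a letter can also be distinct miRs (for example, family members), so every suggestion should be checked against miRBase before the miR names file is edited.

```python
import re


def split_by_prediction(user_names, predicted_names):
    """Split the user's miR names into those found and those missing in the predictions."""
    predicted = set(predicted_names)
    found = [name for name in user_names if name in predicted]
    missing = [name for name in user_names if name not in predicted]
    return found, missing


def base_name(name):
    """Lower-case the name and drop one trailing letter that follows a digit.

    Heuristic only: "miR-122a" and "miR-122" both become "mir-122".
    """
    return re.sub(r"(?<=\d)[a-z]$", "", name.strip().lower())


def suggest_renames(missing, predicted_names):
    """For each missing name, list prediction names that share the same base name."""
    suggestions = {}
    for name in missing:
        candidates = [p for p in predicted_names if base_name(p) == base_name(name)]
        if candidates:
            suggestions[name] = candidates
    return suggestions


# Example with the case mentioned in the note (assumed input lists).
user = ["miR-122a", "miR-21"]
predictions = ["miR-122", "miR-21"]
found, missing = split_by_prediction(user, predictions)
renames = suggest_renames(missing, predictions)
print(found, missing, renames)
```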
NEW VERSION NOTES
Version2 (17/6/2013)
CoSMic searches for the group of genes that are correlated with the miR expression levels and are enriched for its sequence-based predicted target genes. The statistic in the original version of CoSMic was based on a combination of the p-value of the enrichment score and the q-value of the correlation (as described in detail in the supplementary methods of the paper). For a large dataset (more than ~100 samples), the p-values and q-values of the correlations become extremely low, even when the actual value of the correlation coefficient is low. Since in the random model the q-value of the correlation is 1, all results tend to be identified as significant by CoSMic because of the low q-values of the correlation, irrespective of the enrichment score and its p-value.
Therefore, in the new version we modified the CoSMic statistic to contain the correlation coefficient (ρcc) instead of its q-value. The correlation coefficient is dependent on the sample size, and in the random model the correlation coefficients are taken from the normal distribution with ![]()
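The sample-size effect described above can be illustrated numerically. The sketch below is not CoSMic's statistic. It assumes a Pearson correlation (the notes do not say which correlation measure is used) and uses the standard Fisher z-transform approximation to show how the p-value of one fixed, weak correlation coefficient shrinks as the number of samples grows.

```python
import math


def correlation_p_value(r, n):
    """Approximate two-sided p-value of a Pearson correlation r over n samples.

    Uses the Fisher z-transform: atanh(r) * sqrt(n - 3) is approximately
    standard normal when the true correlation is zero.
    """
    if n <= 3:
        raise ValueError("n must be greater than 3")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2.0))


# The same weak correlation (r = 0.2) evaluated at increasing sample sizes.
for n in (20, 100, 1000):
    print(f"n = {n:4d}  p = {correlation_p_value(0.2, n):.3g}")
```

With a few dozen samples, r = 0.2 is not significant at the usual 0.05 level, while with a thousand samples the same coefficient yields a very small p-value. This is the behaviour that motivated replacing the q-value of the correlation with the correlation coefficient itself.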
Version3 (30/9/2013)
Fixed a bug in which negatively correlated targets appeared in the group of positively correlated targets and vice versa.
Version4 (19/11/2013)
1) We added error messages for invalid input.
2) We added predictions for rat microRNAs (rno-miRs) from the miRanda algorithm to the sequence-based predictions.
3) We added an option for users to upload their own predictions (from any algorithm/species) (option (7) in the CoSMic GUI, see above).
IMPORTANT NOTE: the sequence-based prediction file should be a .mat file (MATLAB file) with the following format (otherwise it will not work) <example>
The prediction file should contain 3 variables:
a) miR_names - a cell (nx1) with the names of all miRs (n = number of miRs in the predictions). The miR names should be the same as in the miR names file.
b) targets_mRNA - a cell (mx1) with the gene symbols of all target genes that exist in the predictions (m = number of all target genes that exist in the predictions). The gene symbols of the target genes should be the same as in the mRNA names file.
c) scores_data - a cell (nx2) with the sorted target genes of each miR (column 1) and their corresponding scores (column 2).
The first column contains the indexes of the predicted target genes of each miR (the order of the miRs should be the same as in the variable miR_names). The target genes should be sorted from the best prediction to the worst. The indexes correspond to the indexes of the target genes in the variable targets_mRNA.
The second column contains the sorted scores of the predictions (each entry contains the sorted scores of all target genes of the specific miR). The scores should be normalized between 0 and 1, with 1 = the best prediction and 0 = the worst prediction.
The names of the variables and the format of each variable should not be changed; otherwise the CoSMic algorithm will not be able to use the prediction file!
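The layout of the three variables can be illustrated with a small sketch. It is written in Python with plain lists purely to show the structure and does not write a .mat file; the result would still have to be saved as MATLAB cell arrays under the exact variable names above. Several points are assumptions rather than part of the specification: the function name is hypothetical, the input is a mapping from each miR to (gene symbol, raw score) pairs in which a higher raw score means a better prediction, the indexes are 1-based as in MATLAB, and the scores are min-max normalized over all predictions together rather than per miR.

```python
def build_prediction_tables(raw_predictions):
    """Arrange raw predictions into the miR_names / targets_mRNA / scores_data layout."""
    mir_names = sorted(raw_predictions)
    targets_mrna = sorted(
        {gene for pairs in raw_predictions.values() for gene, _ in pairs}
    )
    # Assumption: 1-based positions, matching MATLAB indexing into targets_mRNA.
    gene_index = {gene: pos + 1 for pos, gene in enumerate(targets_mrna)}

    all_scores = [score for pairs in raw_predictions.values() for _, score in pairs]
    lowest, highest = min(all_scores), max(all_scores)
    span = highest - lowest

    scores_data = []
    for mir in mir_names:  # same order as mir_names
        # Sort from the best prediction to the worst.
        ranked = sorted(raw_predictions[mir], key=lambda pair: pair[1], reverse=True)
        indexes = [gene_index[gene] for gene, _ in ranked]
        # Min-max normalization to [0, 1]; if all scores are equal, use 1.0.
        scores = [(score - lowest) / span if span else 1.0 for _, score in ranked]
        scores_data.append([indexes, scores])
    return mir_names, targets_mrna, scores_data


# Example with made-up scores for two miRs.
raw = {
    "miR-21": [("PTEN", 10.0), ("PDCD4", 30.0)],
    "miR-122": [("CAT1", 20.0)],
}
miR_names, targets_mRNA, scores_data = build_prediction_tables(raw)
print(miR_names)
print(targets_mRNA)
print(scores_data)
```

Each row of scores_data pairs one list of indexes with one list of scores of the same length, which mirrors the nx2 cell described in item (c).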