Exploratory data analysis is critical in a broad range of research areas, where large collections of data need to be meaningfully arranged and presented. I have developed SPIN (Sorting Points Into Neighborhoods), a novel method for organizing and visualizing data, implemented in a simple tool. SPIN exploits properties of distance matrices to sort objects into a natural ordering that highlights the underlying structure of the original, multidimensional data. The relationships between objects can then be read from the reordered distance matrix that SPIN produces. As an unsupervised analysis tool, SPIN does not rely on any external labels. Instead, it explores the inherent characteristics of the data.

SPIN has been applied successfully to the analysis of high-throughput biological experiments. In such experiments, data with discrete labels, such as the clinical labels 'sick' versus 'healthy', are traditionally organized by various clustering approaches. However, when the objects are characterized by continuous variables, e.g. survival intervals of patients or expression levels of genes, any sharp separation into distinct clusters is largely arbitrary. An approach that emphasizes ordering rather than grouping can therefore be more relevant.

In several cases, the structure uncovered by SPIN has a clear biological interpretation, such as the cyclic nature of cell-cycle progression, which appears as a ring conformation. In another example, the tissue composition of the tested samples is captured by their relative placement along an ordered, elongated cluster formed in the space of tissue-specific genes. More generally, SPIN is not tied to biology and is relevant to diverse scientific disciplines.
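The text does not spell out how the distance matrix is sorted. As a hedged illustration only, the sketch below reorders a distance matrix with a simple iterative heuristic that is an assumption, not the SPIN implementation. Each object gets a position weight (negative on the left, positive on the right) and is scored by its distances weighted by those positions. The objects are then re-sorted by score until the order stops changing. The effect is to push dissimilar objects toward opposite ends so that similar objects become neighbors. The names `side_to_side_order` and `reorder_matrix` are hypothetical.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def side_to_side_order(dist: Matrix, max_iter: int = 100) -> List[int]:
    """Hypothetical ordering heuristic (not the published SPIN code).

    Returns `order`, where order[k] is the index of the object placed at position k.
    """
    n = len(dist)
    center = (n - 1) / 2.0
    order = list(range(n))  # start from the input order
    for _ in range(max_iter):
        # Position weight of each object: negative on the left, positive on the right.
        weight = [0.0] * n
        for k, obj in enumerate(order):
            weight[obj] = k - center
        # score[i] is large when object i is far from the right-hand objects and
        # close to the left-hand ones, so it should move toward the left end.
        score = [sum(dist[i][j] * weight[j] for j in range(n)) for i in range(n)]
        # Highest scores go first; sorted() is stable, so ties keep index order.
        new_order = sorted(range(n), key=lambda i: -score[i])
        if new_order == order:  # the ordering is a fixed point: stop
            break
        order = new_order
    return order


def reorder_matrix(dist: Matrix, order: Sequence[int]) -> List[List[float]]:
    """Permute the rows and columns of `dist` by the same ordering."""
    return [[dist[a][b] for b in order] for a in order]


if __name__ == "__main__":
    # Five objects on a line, given in shuffled order; distance = |v_i - v_j|.
    values = [3, 0, 4, 1, 2]
    dist = [[abs(a - b) for b in values] for a in values]
    order = side_to_side_order(dist)
    print([values[i] for i in order])  # → [4, 3, 2, 1, 0]
    for row in reorder_matrix(dist, order):
        print(row)
```

In the toy run the shuffled points come back in monotone order, and the reordered matrix grows smoothly away from its diagonal. The heuristic only reaches a locally stable ordering, and `max_iter` guards against oscillation on harder inputs.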
Figure: features of sorted distance matrices. Unsorted (middle row) and sorted (bottom row) distance matrices of the four simple objects shown in the top row.
Figure: SPIN vs. hierarchical clustering. (a) A toy example. (b) Single-linkage dendrogram of the objects. (c) The distance matrix sorted according to the dendrogram. (d) The distance matrix after sorting by SPIN.
Seven orthogonal cylinders in 7 dimensions, twisted with angles that increase linearly with the distance from the origin.
Gene expression profiles across several stages of colon cancer, from normal colon through adenoma and carcinoma to metastasis. Data from Tsafrir et al., AACR 2004.
Figure: the 1000 highest-variance genes over the 144 samples. (a) Original, unsorted expression matrix. (b) Sorted expression matrix and (c) distance matrix after applying SPIN.
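The analysis above is restricted to the 1000 highest-variance genes. The following is a minimal sketch of that filtering step. It assumes an expression matrix stored as a list of rows, one row per gene and one column per sample. It also uses population variance, which is an assumption since the text does not say which estimator was used. `top_variance_rows` is a hypothetical helper.

```python
from statistics import pvariance
from typing import List, Sequence


def top_variance_rows(expr: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return the indices of the k rows (genes) with the highest variance across columns (samples)."""
    variances = [pvariance(row) for row in expr]
    # Rank rows by decreasing variance; sorted() is stable, so ties keep their original order.
    ranked = sorted(range(len(expr)), key=lambda i: -variances[i])
    # Return the selected rows in their original order, so that the filtering step
    # does not itself impose an ordering on the genes (that is left to SPIN).
    return sorted(ranked[:k])


if __name__ == "__main__":
    expr = [
        [1, 1, 1, 1],    # flat profile, variance 0
        [0, 5, 0, 5],    # variance 6.25
        [2, 3, 2, 3],    # variance 0.25
        [10, 0, 10, 0],  # variance 25
    ]
    print(top_variance_rows(expr, 2))  # → [1, 3]
```

On the real data this would be called with `k=1000` on the 144-sample matrix. The selected rows would then be passed on for distance computation and sorting.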
Figure: yeast expression data from Spellman et al., Molecular Biology of the Cell 9, 3273-3297 (1998). (a) Expression matrix with the genes sorted by SPIN and the samples ordered by time. (b) The sorted distance matrix reveals the interplay between genes associated with different stages of the cell cycle. (c) Projection of the genes onto the first two principal components.
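Panel (c) projects the genes onto the first two principal components. The following is a minimal NumPy sketch of such a projection. It assumes genes as rows and time points as columns, and computes PCA by singular value decomposition of the column-centered matrix. The text does not describe any further preprocessing (for example, normalization of the profiles), so none is applied here. `project_on_first_two_pcs` is a hypothetical name.

```python
import numpy as np


def project_on_first_two_pcs(X: np.ndarray) -> np.ndarray:
    """Project the rows of X (here: genes) onto the first two principal components.

    X has shape (n_genes, n_samples) with both dimensions >= 2; the result has shape (n_genes, 2).
    """
    Xc = X - X.mean(axis=0)  # center each sample (column) across genes
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T


if __name__ == "__main__":
    # Toy check: eight points on a unit circle embedded in 5 dimensions.
    theta = np.linspace(0, 2 * np.pi, 8, endpoint=False)
    ring = np.zeros((8, 5))
    ring[:, 0], ring[:, 1] = np.cos(theta), np.sin(theta)
    print(np.linalg.norm(project_on_first_two_pcs(ring), axis=1))
```

In the toy check, the circle's plane carries all of the variance, so the projected points keep their unit distance from the origin. The signs and rotation of the two components are arbitrary.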
We are currently in the process of filing a patent application for SPIN.