Information theory, originally developed for the mathematical analysis of communication systems, has been applied to molecular biology for decades. In this context, entropy is used to measure the compositional complexity of genomes, which store all of the hereditary information needed to build and maintain an organism. The recent explosion in available genomic data, coupled with considerable improvements in computational processing power, makes it possible to investigate genomes in far greater scope and depth than was previously achievable. In this work, we propose to characterize the informational properties of ~5000 genomes by assessing the statistical abundance and sequence space coverage of fixed-length substrings (known as ‘kmers’). Additionally, we aim to identify unique kmers that can be used as genome-specific markers for taxonomic profiling.
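
As a rough illustration of the quantities named above, the following Python sketch counts kmers in a sequence, computes the Shannon entropy of the resulting abundance distribution, measures the fraction of the 4^k sequence space that is observed, and extracts kmers found in only one genome of a set. It is a minimal sketch under stated assumptions, not the pipeline used in this work: the function names are hypothetical, counting is strand-specific (no reverse-complement merging), windows containing symbols other than A, C, G, and T are skipped, and everything is held in memory, which would not scale to thousands of full genomes.

```python
from collections import Counter
from math import log2

ALPHABET = set("ACGT")  # assumption: only unambiguous DNA bases are counted


def count_kmers(seq, k):
    """Count overlapping kmers, skipping windows with non-ACGT symbols."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= ALPHABET:
            counts[kmer] += 1
    return counts


def shannon_entropy(counts):
    """Shannon entropy (in bits) of the kmer abundance distribution."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * log2(n / total) for n in counts.values())


def space_coverage(counts, k):
    """Fraction of the 4**k possible kmers that occur at least once."""
    return len(counts) / 4 ** k


def genome_specific_kmers(profiles):
    """Map each genome name to the kmers that occur in that genome only.

    `profiles` is a dict of genome name -> Counter from count_kmers.
    """
    owners = Counter()
    for counts in profiles.values():
        owners.update(set(counts))  # one vote per genome containing the kmer
    return {
        name: {kmer for kmer in counts if owners[kmer] == 1}
        for name, counts in profiles.items()
    }


if __name__ == "__main__":
    # Toy sequences for illustration only; not real genomic data.
    k = 2
    profiles = {
        "genome_a": count_kmers("ACGTACGT", k),
        "genome_b": count_kmers("ACGG", k),
    }
    for name, counts in profiles.items():
        print(name, shannon_entropy(counts), space_coverage(counts, k))
    print(genome_specific_kmers(profiles))
```

For realistic values of k and genome sizes, a dedicated kmer counter and a disk-backed or probabilistic data structure would likely replace the in-memory `Counter`; the definitions of entropy and coverage stay the same.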