This post is not related to Knoware, but an old half-finished project called CASPER.

CASPER is a recursive acronym standing for Casper Artist-Sorted Playlist Entropy Reducer. The idea relies on the fact that there are two extremes dominating the way people can sort a playlist: alphabetically or completely random. Since the former would be a strange way to listen to music, most people go with the shuffled approach. In my experience, the reasoning behind listening to a shuffled playlist is to promote variety. But while complete chaos might increase variety, it certainly does not maximize it. Some people notice that their media players seem to have an affinity for certain artists or songs. So I came up with a method to increase the variety of any given attribute in a playlist.

The method places songs throughout the playlist such that items with the same key (for example, songs by the same artist) occur as far apart as possible, and it does this for every song in the list. For example, if the playlist contains 3 songs by X and 3 songs by Y, one result might be X2-Y3-X1-Y2-X3-Y1, where the order of the songs by X and Y are random (so there are lots of possibilities for even such simple cases). With large playlists, this careful song placement is much less apparent, so it appears random. The total entropy of the shuffled sequence is decreased, but the variety is increased.

I've decided to release some simple, documented Python code demonstrating the algorithm used for CASPER. This should make it easy for anyone to implement this shuffling method into their media player if they so desire. If your media player can easily make use of Python, it would probably take less than a minute to do. Check out casper.py if you're curious. Running the script as-is will produce a simple example. Requires Python 2.4 or higher, but is easy to port.

The full source code of the linked file is in my extended entry if you're really lazy, but it's probably formatted poorly and won't have syntax highlighting.


# CASPER List-Generating Algorithm by Brian Beck
#
# Requirements: Python 2.4 or higher. Could be easily ported to older versions.
#
# Given a sequence, generate a new sequence in which each item is placed with
# maximum distance from items with the same key. For most sequences,
# especially large ones, the new sequence will still appear to be random.
# This process is somewhere between a sorting algorithm and a shuffling
# algorithm.
#
# The ideal result is that when the sequence is traversed in order, maximum
# key variety is experienced.
#
# Performing a sort-in-place would make the algorithm much more complicated,
# so the casperSort function generates a new sequence.

from operator import attrgetter, itemgetter
from random import random, shuffle

def casperSort(seq, keyFunc, idFunc=lambda x: x, normFunc=str.lower):
    """
    Return a new sequence in which the items are sorted with maximum distance
    between items with the same key.

    seq         Input sequence. For example, a playlist.
    
    keyFunc     Given an item, returns its key. CASPER is named for using a
                song's artist field as its key.
                
    idFunc      Given an item, return a unique identifier. For example, a
                playlist might want to use the item's filename. The resulting
                sequence will consist of these identifiers. The default is to
                just use the item itself.
                
    normFunc    Key normalizing function. For example, the default is to remove
                case. A good function might also consider artist names that
                start like "The ..."
    """
    newSeq = []
    keyItems = {}

    # Group items in the sequence with the same key
    # The key is retrieved with attrFunc and normalized with normFunc
    for item in iter(seq):
        key = normFunc(keyFunc(item))
        item = idFunc(item)
        try:
            keyItems[key].append(item)
        except KeyError:
            keyItems[key] = [item]

    # The algorithm works the best when least-occuring items are mixed into the
    # new sequence first, so count each key's items
    keys = sorted(((key, len(items)) for key, items in keyItems.iteritems()),
                  key=itemgetter(1))

    # Distribute each key's items throughout the new sequence
    for key, count in keys:
        items = keyItems[key]
        shuffle(items)
        newLen = len(newSeq) + count # Sequence length after current insertion
        spacing = newLen / float(count) # Distance each item will ideally occur
        offset = random() * (spacing - 0.5) # Common offset for each insertion
        positions = [int(round(spacing * i + offset)) for i in xrange(count)]
        for position in positions:
            newSeq.insert(position, items.pop())
    return newSeq

if __name__ == "__main__":
    class Song(object):
        def __init__(self, artist, title):
            self.artist = artist
            self.title = title

        def __str__(self):
            return "%s - %s" % (self.artist, self.title)

    print """
    This is a simple example that clearly illustrates how CASPER works.
    Notice how the songs are spaced for maximum artist variety.  In
    larger playlists, the result looks much less structured and more
    random.  In this case, the script can produce 100% ideal results every
    time, but for some playlists it is inevitable that songs with the same
    key will be adjacent.  Running the script again will produce different
    results.
    """
    playlist = [Song("The Beatles", "Hey Jude"),
                Song("The Beatles", "A Day in the Life"),
                Song("The Beatles", "Come Together"),
                Song("The Rolling Stones", "Satisfaction"),
                Song("The Rolling Stones", "Brown Sugar"),
                Song("The Rolling Stones", "Can't Always Get What You Want")]

    newPlaylist = casperSort(playlist, attrgetter("artist"))
    print "Playlist:"
    for index, song in enumerate(newPlaylist):
        print " %d. %s" % (index + 1, song)