This document describes the design choices made when applying the standard Self-Organizing Map (SOM) algorithm to a piece of interactive digital art. The first part gives an overview of the program's functionality and of the areas where the standard SOM has been extended to deliver the performance this project requires. The second part describes the actual technical implementation and provides documentation so that a software engineer with sufficient knowledge of SOMs and distributed systems can handle maintenance and possible upgrading of the system.
The functioning of neural networks in general and of basic SOMs is beyond the scope of this report. The reader is advised to consult the book Self-Organizing Maps [1] by Teuvo Kohonen for a good introduction to the algorithm and its various applications.
The module transforms data from users' descriptions of objects into a form suitable for SOM calculation, calculates maps, and maintains temporal coherence between maps. The database and the user interface, for both data input and visualization, are implemented in other systems.
The biggest difference between the SOM techniques used in the Pockets Full of Memories project and standard textbook usage of the SOM is the dynamic nature of the data: as more and more data comes in, the map should reflect the current situation in a more or less real-time fashion. Two other noteworthy advances are the refined method for labelling maps to achieve the desired image mosaic, and the representation of the keywords that describe objects with random projection, a technique pioneered in WEBSOM [2].
Input data for the SOM algorithm is extracted solely from the user-supplied features associated with the objects; no image data is used. The data consists of the users' judgements of how well the object fits several (eight in this case) qualities, fed as numerical values using slider controls, and of several freeform keywords describing the object.
The numerical values can be used as is, but since all data processed by the SOM must consist of fixed-length numerical vectors, the keywords cannot be used in such a straightforward way. We deal with this problem using the random vector method developed by Timo Honkela and introduced in his doctoral thesis [3]. In short: each keyword is mapped to a random vector of high dimensionality. While these vectors are not truly orthogonal, high dimensionality guarantees a good probability that two random vectors are quite close to orthogonal, which is sufficient in this application. The vectors are also normalized, and every component is made non-negative. Individual vectors that meet these conditions and represent single words can be summed together to yield a fingerprint representing a set of words.
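The random-vector fingerprint described above can be sketched as follows. The dimensionality, the seed, and the helper names are illustrative choices, not values taken from the actual system.

```python
import numpy as np

DIM = 500          # assumed fingerprint dimensionality
rng = np.random.default_rng(0)
dictionary = {}    # keyword -> its fixed random vector

def word_vector(word):
    """Map a keyword to a normalized, non-negative random vector.
    The same word always yields the same vector."""
    if word not in dictionary:
        v = np.abs(rng.standard_normal(DIM))   # non-negative components
        dictionary[word] = v / np.linalg.norm(v)
    return dictionary[word]

def fingerprint(keywords):
    """Summing the word vectors yields the fingerprint of a keyword set."""
    return np.sum([word_vector(w) for w in keywords], axis=0)
```

Because the per-word vectors are fixed once generated, the same set of keywords always produces the same fingerprint, which is what allows fingerprints of different objects to be compared.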
The final data used in calculating the maps is thus a vector whose first eight components are the scaled feature values from the user input and whose remaining components are the fingerprint calculated from the keywords with the method described above.
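Assembling the final data vector then amounts to concatenating the scaled slider values with the keyword fingerprint. The slider range used for scaling here is a placeholder assumption.

```python
import numpy as np

def data_vector(slider_values, keyword_fingerprint, slider_max=100.0):
    """First eight components: slider values scaled to [0, 1].
    Remaining components: the keyword fingerprint."""
    features = np.asarray(slider_values, dtype=float) / slider_max
    return np.concatenate([features, keyword_fingerprint])

# v[:8] holds the scaled features, v[8:] the fingerprint
v = data_vector([10, 90, 50, 0, 100, 25, 75, 60], np.full(500, 0.1))
```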
We use a set of fixed size, drawn from the pool of latest objects, to teach and label the map. While the visualization of the map can only hold a fixed number of objects, dictated by the number of cells, using a larger set in teaching is motivated by the fact that it results in more accurate maps and better temporal coherence.
There are two reasons why we do not include all objects from day one in the calculation: we and the visitors want to see the new objects, and the map should keep changing at a constant speed. If the size of the dataset grew all the time, the contribution of each new object to the outcome would grow smaller and smaller and thus slow down the changes. The computational load would also grow continually, and when the piece is exhibited for six months one must take such practicalities into account.
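The fixed-size pool of most recent objects can be expressed as a simple bounded queue; the pool size here is an illustrative assumption, not the value used in the installation.

```python
from collections import deque

POOL_SIZE = 400              # assumed size of the teaching set
pool = deque(maxlen=POOL_SIZE)

def add_object(obj):
    """Once the pool is full, each new object pushes the oldest one out,
    so every incoming object keeps a constant-sized contribution."""
    pool.append(obj)
```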
As the map is continually updated with the most recent dataset and a fresh map is output each time, the runs are done with relatively few rounds, in contrast to the classical method where all the data is used once with a large number of rounds. Many subsequent short learning periods result in a comparable map. We still get good temporal coherence between maps (i.e. gradual changes) and are able to change the data between runs.
Objects are placed on the map by finding the cell (called the best matching unit, or BMU) whose prototype vector is most similar to the data vector associated with the object. So far the method is the same as that used for labelling classic SOMs. This method has two disadvantages in this application: two or more objects can overlap each other, and many cells tend to remain empty. As a solution, we consider only unoccupied cells when finding the best matching units. To make sure that the objects which fit the map best also get the optimal placement, objects are placed on the map in order of increasing quantization error (that is, the distance between the BMU and the object's data vector).
As a result, when the number of objects is larger than the number of cells, the objects that fit the map best get shown. These objects also tend to be the ones that have participated in the organization of the map and can therefore be thought of as relevant.
The downside of this method is that it penalizes recent objects with new features. Because users often wish to see the object they have just submitted appear on the map, the ten most recent objects are always placed on the map, by placing them first no matter how large their quantization error is.
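Putting the placement rules together: the most recent objects go first, the rest follow in order of increasing quantization error, and every object takes the best unoccupied cell. The following is a minimal sketch of this scheme; the function signature and data layout are illustrative, not the actual interface.

```python
import numpy as np

def place_objects(objects, prototypes, n_recent=10):
    """objects: list of (id, data_vector), ordered oldest -> newest.
    prototypes: (n_cells, dim) array of prototype vectors.
    Returns {object_id: cell_index}; objects that find no free
    cell are left off the map."""
    def qerror(vec):
        # Quantization error: distance from the vector to its BMU.
        return np.min(np.linalg.norm(prototypes - vec, axis=1))

    recent = objects[-n_recent:] if n_recent else []
    older = sorted(objects[:len(objects) - len(recent)],
                   key=lambda o: qerror(o[1]))
    placement, occupied = {}, set()
    for obj_id, vec in recent + older:
        dists = np.linalg.norm(prototypes - vec, axis=1)
        for cell in np.argsort(dists):
            c = int(cell)
            if c not in occupied:      # only unoccupied cells compete
                placement[obj_id] = c
                occupied.add(c)
                break
    return placement
```

Note that the recent objects are simply processed before the sorted list, which realizes the "always placed, regardless of quantization error" rule without a special case.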
As can be seen, only the map (that is, the prototype vectors) from the previous step is used in the calculation of the current map; the placement of the objects (the labelling) is discarded. Temporal coherence between maps keeps the positioning of objects steady.
Automatic selection of the various parameters associated with the learning rate or with the size and structure of the network remains a hard, open problem for many artificial neural network algorithms. Often one must resort to a simple trial-and-error strategy, tweaking parameters by hand and using one's know-how about the network architecture at hand.
The SOM makes this problem even harder, because there is no clear way to judge the quality of the resulting map objectively. Quantization error is one way of measuring how well the map fits the data, but it is vulnerable to overfitting and unsuitable for many applications, such as data exploration.
After experimenting with the system, the following conclusions were drawn: as the map is constantly updated and changes should be gradual, the input data is run through the algorithm twice during each update. On the first round the neighbourhood radius is circa half of the map's size, to facilitate reorganization, and on the second it is only one, just for fine-tuning. The teaching constant is 0.1 for both rounds. This differs from the traditional way of building static maps, because larger values would cause too rapid changes in the map. The same reason prevents us from running the data through many times for one update.
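The two-round update can be sketched as below, on a square grid with a Gaussian neighbourhood function. The grid layout and the Gaussian kernel are assumptions for illustration; only the two radii (half the map size, then one) and the teaching constant 0.1 come from the text above.

```python
import numpy as np

def update_map(prototypes, grid, data, map_size, alpha=0.1):
    """prototypes: (n_cells, dim) prototype vectors from the previous map.
    grid: (n_cells, 2) cell coordinates on the map lattice.
    data: iterable of data vectors (the current teaching set)."""
    for radius in (map_size / 2.0, 1.0):   # round 1: coarse, round 2: fine
        for vec in data:
            bmu = np.argmin(np.linalg.norm(prototypes - vec, axis=1))
            d2 = np.sum((grid - grid[bmu]) ** 2, axis=1)
            h = np.exp(-d2 / (2.0 * radius ** 2))   # Gaussian neighbourhood
            # Pull every prototype toward the data vector, weighted by h.
            prototypes += alpha * h[:, None] * (vec - prototypes)
    return prototypes
```

Starting each update from the previous map's prototype vectors, rather than from scratch, is what makes these short two-round runs converge to something comparable to a long classical run while keeping the changes gradual.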
An overview of the operation is as follows:
List of configuration parameters:
Calculation of the map takes more time as the size of the map, the number of objects, the number of rounds, and the size of the neighbourhood grow.
Limiting the number of live objects is the responsibility of the server; the client takes all objects given by the server into account when calculating maps. If the server does not drop older objects out of the set of live objects, there is no guarantee that the changes in the map represent the features of incoming objects in a desirable and realistic way. The author of this document does not know how c3 has implemented this feature.
The behaviour of the client is unspecified if the disk fills up and there is no more space to work with. Aside from a couple of small temporary files, the dictionary and the maparchive grow over time. The size of the dictionary should stay reasonable, but it is wise to keep track of the size of the maparchive. The program does not check for free disk space and fails silently!
The client may also fail if it is pointed to a corrupt mapfile on startup. Changing the map topology or dimensions and trying to continue calculation from a map in the older format will probably also lead to failure.
[1] Teuvo Kohonen, Self-Organizing Maps, Springer, 1995.
[2] WEBSOM, http://websom.hut.fi/websom/
[3] Timo Honkela, Self-Organizing Maps in Natural Language Processing, doctoral thesis, 1997.