Working with submodels : One-sided relationship enumeration

One-sided relationship enumeration

One-sided relationship enumeration, also known as base instance lookup, is a feature that makes it quicker to create associations between submodels with large numbers of instances. The problem is that when creating an association, Simile would normally loop over instances of one of the base submodels, and for each instance, loop over the instances of the other, checking the existence condition of the association for each possible pairing of base models. This means that if one base model has n instances and the other m, the number of checks that must be made is nxm, potentially a very large number. Base instance lookup can be used if there is sufficient data in the instances of one base submodel to calculate the indices of the instance(s) of the other base model with which it will be associated. For this to be the case, the association must be independent of the actual values in one of the submodels (except those that can be derived from its index, e.g., physical location of a grid square). It can of course be used in self-associations, where the same submodel serves as base model on both sides of the association.

To illustrate the technique, we will discuss the case of using it specifically to make more efficient the creation of large rasters. A raster is a grid of pixels. Each pixel is equivalent in size and shape. Usually, each pixel is square, though it is also possible to work with other rasters. In this discussion, we shall consider only square grids. Note that Simile version 6.1 introduces a feature that allows a raster to have a built-in neighbour relationship without any need for an association submodel, and that this feature internally uses something like base instance lookup to create this relationship.

Square grid example

In a square grid, each pixel has potentially eight neighbours: north-west, north, north-east, west, east, south-west, south and south-east. However, pixels at the edge of the grid will have only a subset of those neighbours. For example a pixel on the northern edge of the grid will have five neighbours (west, east, south-west, south, and south-west) and one in the south-west corner will have only three: (north, north-east and east).

To take advantage of the powerful features in Simile, we represent the pixels as instances of a multiple-instance submodel. The number of instances is equal to the width of the grid (in pixels) multiplied by the height. Each pixel has an id number, equal to the index of its position. The row and column that the pixel is in can be calculated from the id number, knowing the width of the grid.

Within each pixel, we can create an eight-instance submodel representing each possible neighbour. To allow for edge effects, we can make the existence of each of the eight conditional. Each (possible) neighbour-instance has a row and column calculated from its position relative to its parent pixel. Now each pixel has access to a list of its neighbours.

To use this knowledge, we can create an association submodel, representing the relationship of "neighbourhood" between the central pixel and the (up to) eight pixels that border it. Two role arrows are drawn between the pixel submodel and the neighbourhood submodel, representing the pixel in the centre and the pixels in the border.

The final step is to create a condition in the neighbourhood model that evaluates for each pixel whether any given pixel is in the neighbourhood. Since each pixel has a list of its own neighbours, this condition is simply:

any(my_id_border == {neighbours_ids_centre} )

In this expression, neighbours_ids_centre is the list of ids of neighbours from the instance in the "centre" role, and my_id_border is the index of the pixel that is potentially in the border (set to index(1) in a variable inside the grid square submodel). The condition is true (and the relationship therefore exists) for a centre and a border pixel, if any of the elements of the list of the neighbour_ids in the centre equates to the index of the pixel in the border.

So far, it is important to emphasise that we have followed the procedure that is standard in Simile for many different types of relationship. We now need to look carefully at what this model represents, how Simile is converting it into a computer program, and how it can be changed to make it more efficient.

What does the model represent?

This model looks unusual as the condition for the relation uses values from a submodel of the base model. But the operation is pretty similar to the self-relation in the previous section, with the difference that each base model instance can be in relation with many other instances. Any time the index of the instance in the "border" role matches any of the indices from the "neighbours" submodel of the instance in the "centre" role, an association instance will be created. Now if the "centre" instance needs to have information from all the "border" instances, this can be passed to the association via the "border" role and back via the "centre" role. Because different instances have different numbers of "neighbours" submodels, they have different numbers of associations.

How is Simile running it?

Now for the bad news. When Simile creates a program for any association submodel, it first creates nested loops for each of the base models. If there are two role arrows from the same base model as in this case, it will loop through all its instances, and for each one, loop through all its instances again. This is so it can compare values from every instance with those from every other instance, in order to decide which pairs the association holds between. In this model there is yet another loop inside these -- it has to loop through the instances of "neighbours" in the "centre" role to see if any match the index of the square in the "border" role.

This only has to be done when the model is initialized, as the definition of the relationship clearly dos not change over time or from run to run. Once the relationship is created, the association submodel instances have pointers to their base instances so they remember which ones to use. But if there are, say, a million grid squares (not an excessive number for Simile) it takes 8x1012 comparisons to set up the relationship, which would keep your computer busy for hours. So what can we do?

How can we improve efficiency?

Very easily, just get rid of the association submodel, and instead have any variable that needs to be shared between neighbours taken outside the grid square submodel into an array variable of its own. Then draw an influence from that array to a variable in the "neighbours" submodel, and give it the equation element([array_outside],neighbours_ids).  The values of this variable can then be summed inside the grid square but outside the "neighbours" submodel to give the sum of the original values from all an individual's neighbours. This only needs one level of looping through the grid square model, since it is using the neighbours_ids values directly to index the array rather than comparing them with those from other instances.

But this model is not as clear as one with an explicit neighbour relationship, especially if many values are passed between neighbours. So we have added a feature to the interpretation of the condition symbol that allows an association to be built in a similar manner, i.e., by using the index to look up one of the base instances rather than seaching through all of them and comparing their indices with a value we got from the first base instance. To do this, we rewrite the condition's equation like this:

any(index(1) is {neighbours_ids_centre} )

Previously, this expression would have been written with the equality operator "==", instead of the key word "is". The new key word has been introduced in order to invoke the efficient program generation that is possible for rasters. Note that we have also replaced the reference to the variable my_id in the submodel with the "border" role, with the function index(1). This is because index(1) means the same thing (because we set "Allow base instance lookup" in the properties dialogue of the "border" role arrow), and we cannot use a variable from the base model in that role until we have looked up the base instance itself.

This is not the only situation that is applicable: in general, as the name implies, the feature is useful for any situation where the relationship can be calculated in a loop over only ONE of the roles. In this case, because each pixel is able to calculate its own list of neighbours, it is not necessary to take each possible pair of pixels and determine if they are neighbours. It is from this short cut that the improvement in efficiency stems.

There is a model in the catalogue on Simulistics web site, showing all this in practice. If, without looking at the example, you could create an equivalent model, you are a graduate summa cum laude of Simile use. The example can be used as a starting-point for most work on rasters, grid squares and so on.

Variable-membership base models

In the original implementation, it was not possible to set an association model's membership by base instance lookup if the base instance was itself variable-membership. This was because a variable-membership submodel is implemented as a linked list of data structures, and the only way to find a member with a particular index is to search sequentially through the list. This is the normal way of setting up an association, so in the case of a one-to-one or many-to-one association, having a special construct would not improve efficiency.

However, in Simile v6.1 we have extended the element() function to select sublists from lists. This it does by going through the list sequentially, and returning values as it find indices that match the elements of the second argument (which therefore need to be in ascending order). So, we can do something similar when building a many-to-many association. If the submodel on one side can generate an array or list of index values for a variable-membership submodel on the other side (possibly the same submodel) these can now all be looked up in a single pass by using the 'any(...)' construct described above.

Having implemented this, we found that one-sided enumeration is quite a lot faster than checking every combination of base models even in the case of a one-to-one relationship involving a variable-membership base model. This is partly because searching for a single index value can be stopped as soon as it is found, but mostly because a lot less pointer arithmetic is required just to check an index than to actually get values from the model and plug them into the existence condition equation. So, using 'index(1) is...' is now also recommended for variable-membership base submodels.

Looking up instances by multiple indices

Another small change in Simile v6.1 is to allow more than one index to be specified when looking up a base submodel instance. The indices must be specified in descending order (i.e., outermost first) and joined by 'and', e.g.,


index(2) is y and index(1) is x

or

any(index(2) is {list_of_y} and index(1) is {matching_list_of_x})

This will work when looking up instances of fixed or variable membership base submodels, but the number of indices specified must be exactly the number of indices that the base submodel has. This capability has been added to make it possible to look up instances of the new special-purpose submodels.

In: Contents >> Working with submodels >> Association submodels