Deep learning has been one of the great breakthroughs in artificial intelligence, and some believe it may be all that is needed to replicate human intelligence. In reality, challenges remain: exposing a neural network to unfamiliar data often reveals how fragile it is.

Autonomous cars are one example. Their AI systems appear effective, but they can easily get things wrong: if a system has only been trained to identify objects from the side, it is unlikely to recognise them from above.

Given the impact of AI and the large investment in its development today, new proposals for advancing deep learning, machine learning and new algorithms emerge every day.

Recently, Geoffrey Hinton presented GLOM, a project that tackles two of the most difficult problems for visual perception systems: first, understanding an entire scene in terms of objects and their natural parts, and second, recognising objects when they are viewed from a new perspective. Although GLOM currently focuses on vision, it is expected to be extended to language applications.

Grouping parts into wholes is hard for computers because parts can be ambiguous: a circle might be an eye or a wheel. The first generation of AI tried to recognise objects from geometry, using so-called part-whole relationships, that is, the spatial orientation between parts, and between each part and the whole.

In contrast, the second generation relied on deep learning, training neural networks on large volumes of data. GLOM aims to combine the best properties of both generations.

GLOM therefore offers good prospects for making AI perception more human-like than that of current neural networks.

If his bet pays off, Hinton could trigger a new revolution in artificial intelligence, as he has on previous occasions. GLOM may represent the next generation of artificial neural networks.

The GLOM architecture

The development of this architecture has essentially been about introducing intuitive strategies to AI. That means looking at the heuristics people use, building neural networks that embody them, and then demonstrating that those networks perform better at vision as a result.

In visual perception, the parts of an object are analysed so that the AI can understand the whole. In facial recognition, for example, a system could use just a person’s nose to identify him or her; the nose and the face it belongs to form a part-whole hierarchy.

In humans, the brain is able to understand this kind of part-whole hierarchy by creating something known as a “parse tree”, a branching diagram that shows the hierarchical relationship between the whole, its parts and its subparts. For example, a face would sit at the top of the tree, and the nose, mouth and eyes would be the branches below it that together form the whole.
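The face example above can be written down as a small tree data structure. This is only an illustrative sketch: the `Node` class, the `depth` helper and the choice of subparts (pupil, eyelashes) are assumptions made for the example, not part of GLOM itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One node of a part-whole parse tree (hypothetical helper)."""
    name: str
    parts: List["Node"] = field(default_factory=list)


def depth(node: Node) -> int:
    """Number of levels from this node down to its deepest subpart."""
    return 1 + max((depth(part) for part in node.parts), default=0)


# The whole (face) sits at the top; parts and subparts branch below it.
face = Node("face", [
    Node("eye", [Node("pupil"), Node("eyelashes")]),
    Node("nose"),
    Node("mouth"),
])

print([part.name for part in face.parts])
print(depth(face))
```

A different image would need a tree with a different shape, which is exactly what a fixed network architecture finds hard to accommodate.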

Hinton’s initial idea with GLOM is therefore to replicate this tree in a neural network, although doing so exactly is technically difficult. The network would have to parse each individual image into its own parse tree, and the static architecture of neural networks makes it hard to adopt a new structure for each new image.

One way to understand the GLOM architecture is to divide the image of interest into a grid, where each cell of the grid is a location in the image. Each location has about five levels and, level by level, the system makes a prediction in the form of a vector that represents the information at that level.
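One possible way to picture this layout is as a single array indexed by grid row, grid column and level, with a vector stored at each position. The sketch below uses NumPy; the grid size, the vector length and the random initial values are assumptions chosen only for illustration.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
GRID_H, GRID_W = 4, 4   # the image is divided into a 4x4 grid of locations
N_LEVELS = 5            # "about five levels" per location
DIM = 8                 # length of each vector (an assumption)

rng = np.random.default_rng(0)

# state[row, col, level] is the vector held at that location and level.
state = rng.normal(size=(GRID_H, GRID_W, N_LEVELS, DIM))


def column(state, row, col):
    """Return all level vectors for one grid location, lowest level first."""
    return state[row, col]


print(state.shape)
print(column(state, 1, 2).shape)
```

Every location holds the same stack of levels, so the structure of the array stays fixed no matter which image is being processed.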

For example, the first level might describe the eyelashes of an eye, while the next level, which is responsible for building a coherent representation, predicts that they are part of a face seen from a particular angle. This coherence is achieved when vectors at the same level from several locations point in the same direction: the system concludes that those vectors belong to the same eye and, at higher levels of the tree, to the same face.
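"Pointing in the same direction" can be measured with cosine similarity. The following is a minimal sketch; the three example vectors, the `agree` helper and the 0.9 threshold are illustrative assumptions rather than values taken from GLOM.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def agree(a, b, threshold=0.9):
    """Treat two vectors as agreeing when they point in nearly the same direction."""
    return cosine_similarity(a, b) >= threshold


# Made-up vectors for three locations at the same level.
eye_left = np.array([1.0, 0.9, 0.1])
eye_right = np.array([0.9, 1.0, 0.0])
wheel = np.array([-0.2, 0.1, 1.0])

print(agree(eye_left, eye_right))   # nearly parallel vectors
print(agree(eye_left, wheel))       # nearly orthogonal vectors
```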

However, the network averages selectively: each location takes into account only those neighbouring predictions that resemble its own. The result works like an echo chamber, that is, a situation in which information is amplified by transmission and repetition in a closed system.

Thus, in GLOM, vectors that point in nearly the same direction generate collective predictions that are reinforced and amplified across the image.
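The selective averaging described above can be sketched as a similarity-weighted average, in which similar neighbours receive large weights and dissimilar ones are almost ignored. The softmax weighting over cosine similarities, the `beta` sharpness parameter and the toy vectors are assumptions made for this illustration, not a definitive account of the GLOM update.

```python
import numpy as np


def selective_average(vectors, beta=5.0):
    """Update each location's vector with a similarity-weighted average.

    vectors: array of shape (n_locations, dim), all from the same level.
    beta:    sharpness of the weighting (hypothetical parameter).
    """
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    scores = beta * (unit @ unit.T)                # scaled cosine similarities
    scores -= scores.max(axis=1, keepdims=True)    # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # each row sums to 1
    return weights @ vectors


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Two locations roughly agree with each other, and so do the other two.
level = np.array([[1.0, 0.1], [1.0, -0.1], [0.1, 1.0], [-0.1, 1.0]])
updated = selective_average(level)

# Agreement inside a group before and after one update.
print(cosine(level[0], level[1]), cosine(updated[0], updated[1]))
# Similarity between the two groups after the update.
print(cosine(updated[0], updated[2]))
```

After one update the vectors inside each group are closer to each other than before, while the two groups remain clearly apart, which is the echo chamber effect in miniature.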

What makes GLOM different from other neural networks?

Some recent neural networks use agreement between vectors to activate their units, while GLOM uses islands of matching vectors, known as islands of agreement, to represent the parse tree within the neural network.

Facial parts make a convenient example for understanding GLOM. When several vectors agree that they all represent part of an eye, the group they form stands for the eye in the network’s parse tree of the face.

In turn, another, larger group of matching vectors may represent the nose in the tree, and the group at the top of the tree would represent the conclusion that the image as a whole shows a human face.

In this representation, the object as a whole corresponds to a large island, its parts correspond to lower-level islands, and its subparts to smaller and smaller islands. In the case of the face, the pupil is an island smaller than the one formed by the whole eye, which in turn is smaller than the island formed by the face.
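To make the idea of islands concrete, the sketch below labels islands along a one-dimensional strip of four locations, giving neighbouring locations the same label when their vectors nearly match. The `find_islands` helper, the 0.9 threshold and the toy vectors are assumptions for illustration; a real image would use a two-dimensional grid.

```python
import numpy as np


def find_islands(vectors, threshold=0.9):
    """Label each location in a 1-D strip of locations.

    Adjacent locations share a label when the cosine similarity of their
    vectors is at least `threshold` (a hypothetical cut-off).
    """
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    labels = [0]
    for i in range(1, len(unit)):
        same_island = float(unit[i] @ unit[i - 1]) >= threshold
        labels.append(labels[-1] if same_island else labels[-1] + 1)
    return labels


# Part level: the first two locations agree (say, an eye), the last two
# agree with each other (say, a nose), but the two pairs disagree.
part_level = np.array([[1.0, 0.0], [0.98, 0.05], [0.0, 1.0], [0.05, 0.99]])

# Object level: all four locations agree (say, a face).
object_level = np.array([[1.0, 1.0], [1.1, 0.9], [0.9, 1.0], [1.0, 1.05]])

print(find_islands(part_level))
print(find_islands(object_level))
```

The higher level forms one large island that covers the two smaller islands found at the lower level, which is how nested islands can stand in for the branches of a parse tree.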

GLOM: intuition is crucial for perception

The main purpose of GLOM is to model intuition, which according to Hinton is crucial for perception. In this context, intuition is defined as the ability to make analogies effortlessly.

Human beings make sense of their environment by analogical reasoning, mapping similarities between objects, ideas or concepts. In a neural network, the counterpart is mapping similarities from one vector to another: similarity between vectors would be the mechanism by which the network performs intuitive analogical reasoning. Put differently, intuition captures the distinctive way in which the human brain generates knowledge.

Thus, GLOM aims to model intuition for parts that are not well defined or whose perception is hindered by factors such as the position of the object or the angle from which it is observed.

First launch of GLOM

In Toronto, Google Research has started the experimental phase of GLOM. Software engineers are using computer simulations to check whether GLOM can form islands that capture the parts and wholes of an object, even when some of those parts are ambiguous.

Currently, the experiments use ellipses of different sizes that can be arranged to form either a sheep or a face. Given random inputs of different ellipses, the model must make predictions and deal with the uncertainty of whether each ellipse is part of the sheep or the face, and which part of that object it forms. The model should also correct itself when it encounters something unexpected.

The next step will be to define a baseline that shows whether a standard deep learning network would fail at this task. So far, the work on GLOM has relied on creating and labelling data in order to check for correct predictions.


In conclusion, if GLOM meets the challenge of representing a parse tree in a neural network, it would be a significant success, as it could make neural networks considerably more reliable.

Geoff Hinton has made valuable contributions to the world of AI, and many of his intuitions have proven correct. GLOM is expected to be one more of them, especially since the creator of the model himself has such high hopes for it.

Moreover, the power of this model lies in the echo chamber analogy, in mathematical analogies and even in some biological analogies. Together, these bring a novel and disruptive design to AI engineering.

Initially, the idea of GLOM was born as a kind of philosophical musing, but early tests at Google Research are starting to examine whether it is a valid and effective model. In their early days, neural networks did not seem a feasible idea either, yet they have proven to work remarkably well. The GLOM model is expected to follow the same trend.

GLOM is not intended to be the key to an AI capable of solving large problems with agility. It is, however, one of those advances that will drive the future of neural networks by bringing them closer to human thinking, with abilities such as building on past experience, generalising, extrapolating and understanding.

The future of AI therefore lies in being as similar as possible to human thinking: if neural networks were more human-like, their mistakes and negative aspects could be predicted and their origin understood.

For now, the project is still at an experimental stage, but Hinton wanted to make the idea public so that anyone who wants to can try it out. He also encourages others to build new combinations of this idea, in the hope of arriving at a new philosophy in AI science.