The Perception–Action Hierarchy and its Implementation Using Binons (Binary Neurons)


The perception–action hierarchy contains a model of the environment as experienced based on what has been recognized and done.  Binons (binary neurons) can be used to represent and implement this hierarchy. Binons are simple deterministic artificial neural nodes that represent relationships. They have two source nodes and are reused by zero or more target nodes. Binons are general purpose components that interact in an object-oriented fashion. The two types of binons are spatial and temporal.  Spatial binons represent simultaneously occurring patterns of percepts and actions. Temporal binons represent sequential patterns of percepts and actions. Two kinds of temporal binons are used to learn and control behaviour. They are the action and expectation control binons.  They are equivalent to command neurons in neuroscience, production rules in cognitive architectures, or the forward model in motor control when combined together. Learning takes place in the three stages of babbling, practicing and automaticity.  The resulting hierarchy is a transparent, compositional, unsupervised, continuously growing, deep learning artificial neural network. The hierarchy is part of the Adaptron cognitive architecture.

1. Introduction

In cognitive science, psychology and neuroscience it is commonly understood that there is a coupling between perception and action. Actions produce percepts and percepts precede actions. The design of a cognitive architecture that incorporates this fundamental idea requires components that represent percepts and actions and are bi-directionally coupled. The Adaptron cognitive architecture represents patterns of these components using spatial and temporal binary neurons (binons). This paper explains how binons are combined and hierarchically structured, and how they dynamically interact to perform the perception–action coupling.

2. Perception–Action Hierarchies

Figure 1 shows the classic behavioural perception–action hierarchies as described by Joaquín M. Fuster [4]. In his terminology it is the “Sensory–Motor Perception–Action Cycle”. Stimuli from the environment originate on the bottom left via the senses. They are recognized by the sensory hierarchy on the left. This produces percepts which are combined into more complex multimodal percepts. Motor responses (actions) leave on the right via motor devices and have an effect on the environment. The motor hierarchy on the right decomposes complex actions into more primitive actions. The horizontal arrows between the two hierarchies associate actions with percepts and vice versa.

Figure 1. The Perception–Action Hierarchies.

A key characteristic of this structure is its compositionality. Lower level components representing percepts and actions are reused and combined into higher level components. Lower level components are more general, simple and primitive while higher levels are more specific, complex and sophisticated. At the lowest levels components represent properties, features and atomic acts while higher levels represent multimodal percepts and action patterns. Learning takes place from the bottom up because higher level components are built from already learnt lower levels.

A mature perception–action structure takes years to learn. There are three stages in the learning of percepts, actions and their couplings: babbling, practicing, and automaticity. Babbling is initially used by babies to perform random motor and speech acts. Once the resulting percepts are reliably associated with these acts they are reused and incorporated into more complex action sequences. Actions that produce interesting or pleasant percepts are practiced until they too become reliable. Once learnt they become habits that can be done automatically without attentional control. Habits are then reused and incorporated into more complex actions.

3. Binary Neurons (Binons)

Binary neurons (Binons), as illustrated in the binon structure diagram of Figure 2, are simple artificial neural nodes. They are composed of two source binons but are reused and shared by many target binons. Similar binary structures can be found in numerous cognitive architectures [6, 15]. Binons represent a relationship between their two source binons [8]. They contain a ratio value that captures the relationship property. They are discrete, uniquely identifiable representations and therefore symbolic, but they are not symbols. They can represent percepts, actions, and the relationships between them. Binons are also functional components that perform operations when activated and triggered. They are learnt and can be performed. Each binon is effectively a program designed to be used for recognition [11], action performance and behavioural control. Binons interact with each other to propagate waves of activation that result in the recognition of percepts and performance of actions.

Figure 2. Binon structure diagram.

The words “source” and “target” are used to describe the roles played by binons based on their level of complexity. Source binons are closer to sensors and motor/action devices and are simpler and more general.  Lowest level source binons represent the more primitive properties and features of percepts and acts. Two source binons are combined to form a target binon. Binons can be reused by multiple target binons. Target binons are complex and more specific.  They represent percepts and acts composed of simpler properties and features. The words “left” and “right” are used to describe the roles played by binons based on their left to right relationship.
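The source and target roles described above can be sketched as a small data structure. This is only an illustrative sketch and not Adaptron's implementation: the class name `Binon`, its fields, the `level` property and the example node names (`edge`, `corner`, `curve`) are all assumptions introduced here.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # compare by identity: each binon is a unique node
class Binon:
    """Hypothetical node: two sources, a ratio value, any number of targets."""
    name: str
    left: Optional[Binon] = None    # left source (None for lowest-level binons)
    right: Optional[Binon] = None   # right source
    ratio: float = 1.0              # relationship property between the sources
    targets: list[Binon] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Register this binon as a target of each of its sources (reuse links).
        for source in (self.left, self.right):
            if source is not None:
                source.targets.append(self)

    @property
    def level(self) -> int:
        """0 for lowest-level binons, else one above the deepest source."""
        if self.left is None or self.right is None:
            return 0
        return 1 + max(self.left.level, self.right.level)


# Three primitive binons combined into two targets, then into one more.
edge = Binon("edge")
corner = Binon("corner")
curve = Binon("curve")
shape_a = Binon("shape_a", left=edge, right=corner, ratio=2.0)
shape_b = Binon("shape_b", left=corner, right=curve, ratio=0.5)
obj = Binon("object", left=shape_a, right=shape_b)
```

Here `corner` is reused by two target binons, so `corner.targets` holds both `shape_a` and `shape_b`, while `obj` sits two levels above the primitives.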

Binons are deterministic. There are no probabilities involved and there are no weights on the links. This means that what they do and why they do it can be clearly explained. Binons form an unsupervised feedforward artificial neural network. There is no backpropagation to adjust weights on links as found in popular machine learning software. In Adaptron, a detected error means that something novel has occurred, and a new binon is created to represent the novel type of percept or action. When the same type of percept or action is re-experienced the binon becomes familiar and available for reuse. New binons can only be created when the two source binons are familiar. This means there is no combinatorial explosion of binons: complex percepts and actions are only learnt when their components have been learnt and are familiar due to repetition.
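The rule that a new binon is created only when both of its sources are familiar can be illustrated with a minimal sketch. The class `BinonMemory`, its method names, the use of strings as binon identifiers and the choice to treat primitives as familiar from the start are assumptions made for this example, not details taken from Adaptron.

```python
from __future__ import annotations


class BinonMemory:
    """Hypothetical store: a pair of sources maps to at most one binon."""

    def __init__(self) -> None:
        self.by_sources: dict[tuple[str, str], str] = {}
        self.familiar: set[str] = set()

    def add_primitive(self, name: str) -> None:
        # Assumption: lowest-level binons are treated as familiar from the start.
        self.familiar.add(name)

    def experience(self, left: str, right: str) -> str | None:
        """Return the binon for (left, right), creating it only if allowed."""
        key = (left, right)
        if key in self.by_sources:
            binon = self.by_sources[key]
            self.familiar.add(binon)      # re-experienced: now familiar
            return binon
        if left in self.familiar and right in self.familiar:
            binon = f"({left} {right})"
            self.by_sources[key] = binon  # novel: created but not yet familiar
            return binon
        return None                       # a source is still novel: no new binon


memory = BinonMemory()
for name in ("a", "b", "c"):
    memory.add_primitive(name)

first = memory.experience("a", "b")         # novel pair: binon created
blocked = memory.experience("(a b)", "c")   # "(a b)" not yet familiar
second = memory.experience("a", "b")        # repetition makes it familiar
allowed = memory.experience("(a b)", "c")   # now the higher binon can form
```

In this sketch the higher-level pair is refused on its first occurrence and accepted only after its left source has been experienced twice, which is how repetition gates growth.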

There are two types of binons: spatial and temporal. The “+” and “→” symbols appear under a binon to indicate these two types. For recognition, a spatial binon represents a pattern of percepts in which the directly or indirectly linked source percepts have occurred simultaneously (in parallel). Perceptra [11] is a supervised pattern classification deep learning program that converts non-symbolic stimulus values from sensors into spatial binons to represent percepts such as shapes. Shapes are composed of widths, and a width is the number of adjacent sensors with the same intensity reading within a just noticeable difference. In Adaptron such binons are called recognition binons, as illustrated in Figure 3. Activation of recognition binons flows from the bottom up. For performing acts, a spatial binon represents a pattern of acts in which the directly or indirectly linked source acts are done simultaneously on two or more action devices. These are called action binons. Activation of spatial action binons flows from the top down. An underlined letter in a binon (R or A) indicates the role the binon is playing.
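The notion of a width can be made concrete with a short sketch. The function name `widths` and the comparison rule (each reading is compared with the first reading of the current run) are assumptions; Perceptra may group readings differently.

```python
def widths(readings: list[float], jnd: float) -> list[int]:
    """Lengths of runs of adjacent sensors whose readings match within jnd."""
    runs: list[int] = []
    run_start = None
    for value in readings:
        # Assumption: a reading joins the run if it is within jnd of the
        # run's first reading; other comparison rules are possible.
        if run_start is not None and abs(value - run_start) <= jnd:
            runs[-1] += 1
        else:
            runs.append(1)
            run_start = value
    return runs


# Six adjacent sensors with a just noticeable difference of 1 intensity unit.
example_widths = widths([10, 10, 11, 30, 31, 10], jnd=1)  # [3, 2, 1]
```

The six readings fall into three runs, giving widths of 3, 2 and 1 sensors.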

Figure 3. Recognition and Action spatial binons.

Temporal binons represent percepts and actions that take place in sequence. They are necessary to recognize patterns such as speech and also to control sequences of actions. They are equivalent to production rules [13] as used in cognitive architectures such as ACT-R [16] and SOAR [9]. As a production rule, a temporal binon represents a program that is a conditional statement of the form “When activated if triggered then perform operation.” The operation a temporal binon performs may include the activation of a source binon and/or the triggering of a target binon to notify it that the operation has finished.

4. Adaptron Cognitive Architecture

The Adaptron cognitive architecture in Figure 4 is a two-layer architecture consisting of two subsystems: one for representing and controlling behavioural experiences and the other for representing and controlling mental processes. Each subsystem consists of binons efficiently structured using compositional hierarchies [3, 10]. This division corresponds to the dual process theory in psychology [1, 5]. The behavioural layer contains memories of the experiences that result from acting in the world. As such it is a model of reality represented as percepts and actions coupled together by control binons. The mental control layer sits on top and uses the memories in the behavioural layer as its world. The concepts in the mental hierarchies result from recalling percepts and actions from the behavioural layer, and their operations activate the actions or recall the percepts and actions in the behavioural layer.

Figure 4. The Adaptron Cognitive Architecture.

Adaptron seeks novel experiences and avoids familiar ones. It performs random babbling acts when it is bored. Boredom is recognized when a familiar percept repeats. However, when a novel percept occurs Adaptron is surprised and memorizes the percept. It searches these memories for actions that can be done to achieve previously novel experiences. If any such action exists then the action is performed. If none can be found then Adaptron is bored and babbles. Seeking novel experiences produces exploratory behaviour driven by curiosity [14].

5. Behavioural Perception–Action Hierarchies

In the Adaptron cognitive architecture temporal action and expectation binons are used to associate actions with percepts and vice versa, as in the perception–action hierarchies. As illustrated in Figure 5a, temporal action binons represent and control the percept to action associations. Their program is: “When activated if recognize a percept then do an action and trigger all the right target binons.” These are equivalent to command neurons in neuroscience [2]. They are the gatekeepers for controlling the beginning of overt action sequences and must be activated before an action can be performed. They are analogous to a transistor in electronics: they act as a switch or relay.

Figure 5. (a) Temporal action control binon; (b) Temporal expectation control binon.

As illustrated in Figure 5b, temporal expectation binons represent and control the action to percept associations. Their program is: “When activated if done an action then expect a percept and trigger all the right target binons.” This is the second part of the forward model of motor control [7, 12] used in neural networks and neuroscience. The complete forward model says: “Given a percept and an action you can expect the next percept.”

The interaction diagram in Figure 6a captures the order in which the action control process takes place. The activate signals are messages or priming signals to enable a binon while the trigger signals are messages notifying the destination binon to perform its operation.  In psychology the Activate (do) is the same concept as preparatory motor set. After an action control binon has been activated it waits for its left source binon to be recognized. Then it can fire all of its outgoing links – the right source binon and all the right target binons. The notification of all the right targets is equivalent to corollary discharge in neuroscience. Nothing will happen if the action control binon is activated but the left source does not occur. Similarly, nothing will happen if the left source occurs but the control binon has not been activated. This dynamic control flow is the same for all temporal binons.  The wave of activity flows in sequence from left to right.
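The gating behaviour described above (nothing happens unless the binon is both activated and its left source is recognized) can be sketched as a small class. The class `ActionControlBinon`, its methods, the string log and the rule that one activation permits one firing are assumptions for illustration; the percept and action labels follow the baby example used in this paper.

```python
class ActionControlBinon:
    """Hypothetical gate: fires only when activated AND its left source occurs."""

    def __init__(self, percept: str, action: str, right_targets=()):
        self.percept = percept              # left source: percept to wait for
        self.action = action                # right source: act to perform
        self.right_targets = list(right_targets)
        self.activated = False

    def activate(self) -> None:
        self.activated = True               # priming signal, e.g. Activate (do)

    def trigger(self, percept: str, log: list[str]) -> bool:
        """Called when a percept is recognized; returns True if the binon fired."""
        if not self.activated or percept != self.percept:
            return False                    # unprimed, or not the awaited percept
        self.activated = False              # assumption: one firing per activation
        log.append(f"do {self.action}")
        for target in self.right_targets:
            log.append(f"notify {target}")  # corollary-discharge-like notification
        return True


log: list[str] = []
look = ActionControlBinon("R Rattle heard", "A Look at object",
                          right_targets=["expectation binon"])

fired_unprimed = look.trigger("R Rattle heard", log)   # not activated yet
look.activate()
fired_wrong = look.trigger("R Toy seen", log)          # not the left source
fired = look.trigger("R Rattle heard", log)            # activated and recognized
```

Only the last call fires: it performs the right source action and then notifies the right targets, in that left-to-right order.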

Figure 6. (a) Action control interaction diagram; (b) Example – baby hears the toy rattle and looks at the object.

An example of a baby hearing, seeing and reaching for a toy is used to show how this process works. Figure 6b illustrates the first action step in the task. The baby is actively listening. It hears a rattling noise and it looks for the object by moving its eyes.

After performing an action, some resulting percept is expected. The interaction diagram in Figure 7a captures the order in which this expectation control process takes place. The process starts when the expectation binon is activated (1). It is primed, waiting for the action to be performed. The action will be performed if it has been activated by a previous action control binon, as in step 4 in Figure 6b. After the action is performed (2) the expectation binon is triggered (3). The expectation binon then activates the recognition binon (4) to attend to the expected result. In psychology the Activate (attend) is the same concept as perceptual set. If this percept occurs (5) the recognition binon will trigger all its right target action binons (shown as signal (3) in Figure 6a). If the percept is not recognized, any associated activated action control binon will not be triggered. Without waiting for the recognition binon to recognize its percept, the expectation binon notifies all of its right targets that it is done (*6). Any previously activated right targets will then be triggered. If there are no right targets to activate then the process is finished.
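The order of signals in the expectation control process can be written out as a trace. This is a sketch only: the function `expectation_trace`, its parameters and the message strings are hypothetical, and placing the "done" notification before the recognition result is one reading of the statement that the expectation binon does not wait for the percept.

```python
def expectation_trace(action_done: str, expected: str, observed: str,
                      activated: bool = True) -> list[str]:
    """Ordered signals for one expectation control binon (hypothetical names)."""
    if not activated:
        return []                                          # step 1 never happened
    trace = [
        f"trigger expectation after '{action_done}'",      # steps 2 and 3
        f"activate (attend) recognition of '{expected}'",  # step 4
        "notify right targets: expectation done",          # step 6, no waiting
    ]
    if observed == expected:
        # Step 5: the expected percept occurred, so the recognition binon
        # triggers its own right target action binons.
        trace.append(f"'{expected}' recognized: trigger its right targets")
    return trace


seen = expectation_trace("A Look at object", "R Toy seen", "R Toy seen")
missed = expectation_trace("A Look at object", "R Toy seen", "R Nothing seen")
unprimed = expectation_trace("A Look at object", "R Toy seen", "R Toy seen",
                             activated=False)
```

When the expected percept does not occur, the trace simply lacks the final entry, so no dependent action control binon is triggered.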

Figure 7. (a) Expectation control interaction diagram; (b) Example – baby looks at the object and sees a toy as expected.

As an example, in Figure 7b, once the baby’s eyes have moved to where the noise originated it is expecting to see the rattling object. What it sees is a toy.

Combining the previous action and expectation examples based on the common action (A Look at object) produces a perceptual sequence equivalent to the forward model. In Figure 8 it is represented at the top of the hierarchy by the temporal recognition binon (R See rattling object). It starts with hearing the rattle and ends with seeing the toy. This perceptual sequence can now be reused as a recognition binon in more complicated action control sequences.

Figure 8. Baby hears a rattle and sees a toy.

This structure also represents the information that is found in the backward or inverse model in motor control.  The inverse model says: “Given the current percept and a desired future percept then perform the action.” Once this perception–action hierarchy is learnt it must be reactivated if it is to be performed. This begins when the “R Rattle heard” percept is recognized. The rattle attracts attention.  The mental control layer of Adaptron recalls the “R Toy seen” percept and decides that this is a goal worth achieving.  To repeat the perceptual sequence the mental layer activates the hierarchy with the initial Activate (do) and Activate (attend) signals. As a result the hierarchy is then activated up the left side.
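The point that one learnt structure serves both the forward and the inverse model can be illustrated with a flat lookup. Storing each step as a (percept, action, next percept) tuple is a simplification assumed for this sketch; in Adaptron the same information is held in the binon hierarchy, and the function names `forward` and `inverse` are hypothetical.

```python
# Each learnt step is stored once as (percept, action, next percept).
experiences = [
    ("R Rattle heard", "A Look at object", "R Toy seen"),
]


def forward(percept, action):
    """Forward model: given a percept and an action, the expected next percept."""
    for p, a, nxt in experiences:
        if (p, a) == (percept, action):
            return nxt
    return None


def inverse(percept, goal):
    """Inverse model: given a percept and a desired percept, the action to do."""
    for p, a, nxt in experiences:
        if (p, nxt) == (percept, goal):
            return a
    return None


expected = forward("R Rattle heard", "A Look at object")   # "R Toy seen"
to_do = inverse("R Rattle heard", "R Toy seen")            # "A Look at object"
```

Both queries read the same stored triple; only the element treated as unknown differs.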

In Figure 9 the expectation binon (E Object) is combined with a new reach action control binon to produce an action sequence.  It is represented at the top of the hierarchy by a temporal action sequence binon (A Look and reach for toy). It starts with looking at the object and ends with moving a hand to the toy.

Figure 9. Baby looks at the object and reaches for the toy.

The perceptual sequence (R See rattling object from Figure 8) and action sequence (A Look and reach for toy from Figure 9) can now be reused in a more complicated action control sequence as illustrated in Figure 10. In Adaptron, hierarchies of control binons like these continue to grow as they are learnt, practiced and reused.

Figure 10. Combined recognition and action sequence.

Well-practiced habits can contain multiple action control hierarchies that are simultaneously activated. Each one is waiting for its left source binon to be recognized. This allows Adaptron to perform complex tasks while being ready to handle unexpected events. An example is walking down a hallway while avoiding obstacles and opening closed doors. Walking is a stepping action that repeats the same perceptual–action sequence. At the same time the “avoid obstacle” and “open closed door” action control binons can be activated but not yet triggered. They start if and when the triggering percepts of “obstacle seen” and “closed door encountered” are recognized.

6. Simulation results

The perception–action hierarchy using action and expectation binons has been successfully implemented in a simulated robot called “Smarty.” It explores any designed maze world given a configuration of sensors and wheels.  As illustrated in Figure 11, the simulation allows for a world to be designed on a grid of squares up to 8x8 in size. Each square is represented by a letter of the alphabet if symbolic stimulus recognition is being used.

Figure 11. Simulated robot world, sensors and action devices.

The robot body can be configured with sensors and action devices (wheels) as illustrated in Figures 12a and 12b. The sensors are cameras and a motion detector. Cameras can be configured to see a letter in a square, the distance to a block (magnitude, non-symbolic stimulus) or an adjacent block or empty square. The wheels can be configured to move one square in any direction or rotate at the current position. A proprioceptive wheel sensor can be enabled to detect the motion of the robot. If the robot tries to move forward into a block the action is shown as an “f” but the motion sensor will indicate no motion as a “–”. In this example the vision sensor values are non-symbolic magnitudes.
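The difference between an attempted action and the sensed motion can be sketched for a single forward step. This is not the Smarty simulator: the function `step_forward`, the use of "#" for a block and "." for an empty square, and the choice to report successful motion as “f” are assumptions made for this example. Only the “f” action label and the “–” no-motion symbol come from the description above.

```python
NO_MOTION = "–"   # symbol used in the text for "no motion"


def step_forward(grid: list[str], pos: tuple[int, int],
                 heading: tuple[int, int]) -> tuple[tuple[int, int], str, str]:
    """Try to move one square; return (new position, action, motion percept)."""
    row, col = pos[0] + heading[0], pos[1] + heading[1]
    inside = 0 <= row < len(grid) and 0 <= col < len(grid[0])
    if inside and grid[row][col] != "#":
        return (row, col), "f", "f"      # assumption: motion echoes the action
    return pos, "f", NO_MOTION           # blocked: act attempted, nothing moved


# A tiny world: "#" is a block, "." is an empty square (assumed encoding).
world = [
    "..#",
    "...",
]
east = (0, 1)
moved = step_forward(world, (0, 0), east)     # free square ahead
blocked = step_forward(world, (0, 1), east)   # block ahead
```

In the blocked case the action is still recorded as “f” while the motion percept is “–”, which is the mismatch the motion sensor exists to detect.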

Figure 12. (a) Vision configuration; (b) Wheel and Motion configuration.

The results of the robot performing 29 “Go / Continue” steps are illustrated in Figure 13. It shows the robot starting to explore the given maze world. Note that at step 20 the robot tries to move forwards but does not move. The motion percept in step 21 is a “–”, indicating that no motion was recognized.

Figure 13. History of percepts and actions.

This simulation only implements the behavioural perception–action hierarchies. It does not implement the Adaptron mental control hierarchy. Instead it contains a procedure that:

  • selects babbling actions when percepts repeat,
  • memorizes novel percepts, actions and perceptual–action sequences,
  • repeats novel sequences so they become familiar,
  • searches these sequences for expected novel percepts when familiar percepts or sequences are recognized, and
  • performs these sequences.
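The steps listed above can be approximated by a very small selection procedure. This is a hedged sketch and not the simulation's code: the class `Explorer`, its methods, the flat (percept, action, result) memory, and the rule that a sequence counts as novel until it has been experienced twice are all assumptions.

```python
import random


class Explorer:
    """Hypothetical stand-in for the simulation's control procedure."""

    def __init__(self, actions, rng=None):
        self.actions = list(actions)
        self.rng = rng or random.Random(0)
        self.counts = {}          # (percept, action, result) -> times experienced
        self.last_percept = None

    def record(self, percept, action, result):
        """Memorize a perceptual-action sequence; repeats make it familiar."""
        key = (percept, action, result)
        self.counts[key] = self.counts.get(key, 0) + 1

    def choose(self, percept):
        """Pick the next action for the current percept."""
        repeated = percept == self.last_percept
        self.last_percept = percept
        if not repeated:
            # Look for a remembered sequence from this percept that is still
            # novel (experienced once) and repeat it to make it familiar.
            for (p, action, _), n in self.counts.items():
                if p == percept and n == 1:
                    return action
        return self.rng.choice(self.actions)   # bored: babble


explorer = Explorer(["f", "l", "r"])
explorer.record("A", "f", "B")       # a novel sequence, seen once
practice = explorer.choose("A")      # repeats the novel sequence: "f"
explorer.record("A", "f", "B")       # second experience: now familiar
babble = explorer.choose("A")        # same percept again: a babbling action
```

The first choice practices the sequence that is still novel. The second choice is a babbling action because the percept has repeated and nothing novel remains to be pursued from it.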

The display of some of the binons in memory and the processing of the most recent step are shown in Figure 14.

Figure 14. Binons and the results of processing the most recent step.

7. Conclusions

Spatial and temporal binons can be used to implement a functioning perception–action hierarchy.  The result is a transparent, compositional, unsupervised, continuously growing, deep learning artificial neural network.


[1] Evans, J. St. B. T. (2003) “In two minds: dual-process accounts of reasoning.” Trends in Cognitive Sciences 7 (10): 454–459.

[2] Ewert, J. (2003) “Command Neurons and Command Systems.” In Arbib, M. A. (Ed.), The Handbook of Brain Theory and Neural Networks, 2nd ed, Cambridge, Massachusetts: MIT Press 233–238.

[3] Fidler, S., Boben, M. and Leonardis, A. (2009) “Learning Hierarchical Compositional Representations of Object Structure.” In Dickinson, S., Leonardis, A., Schiele, B. and Tarr, M. (Eds.), Object Categorization: Computer and Human Vision Perspectives, New York: Cambridge University Press.

[4] Fuster, J. M. (2003) Cortex and Mind: Unifying Cognition, New York: Oxford University Press.

[5] Gawronski, B. and Creighton, L. A. (2013) “Dual Process Theories.” In Carlston, D. E. (Ed.), The Oxford Handbook of Social Cognition, New York: Oxford University Press 14: 282–312.

[6] Georgeon, O. L. and Ritter, F. E. (2012) “An intrinsically-motivated schema mechanism to model and simulate emergent cognition.” Cognitive Systems Research 15–16: 73–92.

[7] Grush, R. (2004) “The emulation theory of representation: Motor control, imagery, and perception.” Behavioral and Brain Sciences 27: 377–442.

[8] Halford, G. S., Wilson, W. H. and Phillips, S. (2010) “Relational knowledge: The foundation of higher cognition.” Trends in Cognitive Sciences 14 (11): 497–505.

[9] Laird, J. E. (2012) “The soar cognitive architecture.” Artificial Intelligence and Simulation of Behaviour Quarterly 171 (134): 224–235.

[10] Leonardis, A. and Fidler, S. (2007) “Learning hierarchical representations of object categories for robot vision.” International Symposium on Robotics Research 66: 99–110.

[11] Martensen, B. N. (2013) “Perceptra: A New Approach to Pattern Classification Using a Growing Network of Binary Neurons (Binons).” In West, R. and Stewart, T. (Eds.), Proceedings of the 12th International Conference on Cognitive Modeling (ICCM 2013).

[12] Miall, R.C. and Wolpert, D.M. (1996) “Forward Models for Physiological Motor Control.” Neural Networks 9 (8): 1265–1279.

[13] Newell, A. (1973) “Production systems: Models of control structures.” In Chase, W. G. (Ed.), Visual Information Processing, New York: Academic Press 463–526.

[14] Oudeyer, P. (2018) “Computational Theories of Curiosity-Driven Learning.” Computing Research Repository (CoRR) arXiv:1802.10546v2 [cs.AI].

[15] Ring, M. (1992) “Two Methods for Hierarchy Learning in Reinforcement Environments.” From Animals to Animats 2: Proceedings of the Second International Conference on Simulation of Adaptive Behavior (SAB’92) 148–155.

[16] Stewart, T. C. and West, R. L. (2006) “Deconstructing ACT-R.” In Proceedings of the Seventh International Conference on Cognitive Modeling 298–303.