Friday, July 22, 2011

A Fundamental Flaw in Numenta's HTM Design, Part II

Part I, II

Abstract

In Part I, I wrote that, in Rebel Cortex (RC), only the bottom level of the memory hierarchy handles concurrent inputs and that all sequences have a maximum capacity of seven nodes. I claimed that Numenta's HTM design is fundamentally flawed because, unlike the RC memory model, every level in Numenta's memory hierarchy handles both concurrent and sequential signals. I revealed the surprising source of my knowledge about the brain's memory architecture and I promised to answer two questions: a) why should only the bottom level of the memory hierarchy receive concurrent signals; and b) why should all sequences have a maximum capacity of seven nodes? So here are my answers, as promised.

Visual Recognition Is a Sequential and Predictive Process

Most visual recognition researchers have a more or less static view of vision. This is true even among those who profess a belief in the fundamental temporal nature of sensory learning and recognition. The usual assumption is that visual processors at the bottom level of the memory hierarchy recognize only small areas of the image and that, as one goes up the hierarchy, bigger and bigger areas are recognized in terms of lower level patches, and so on. At the top of the hierarchy, the entire scene is recognized all at once. Here is how Jeff Hawkins and Dileep George of Numenta describe (pdf) their hierarchical model:
The level 1 modules have small receptive fields compared to the size of the total image, i.e., these modules receive their inputs from a small patch of the visual field. Several such level 1 modules tile the visual field, possibly with overlap. A module at level 2 is connected to several adjoining level 1 modules below. Thus a level 2 module covers more of the visual field compared to a level 1 module. However, a level 2 module gets it information only through a level 1 module. This pattern is repeated in the hierarchy. Thus the receptive field sizes increase as one goes up the hierarchy. The module at the root of the tree covers the entire visual field, by pooling inputs from its child modules.
This seems to make perfect sense and it would appear that this is the way it should work in the brain's visual cortex. But is this really what happens when we look at a scene? Do we really see an entire visual field in terms of smaller receptive fields? My experience is that it is simply not true. In fact, there is every reason to doubt the claim that the size of the visual receptive field changes at all as one goes up the hierarchy. Let's take the above paragraph as an example. Try as we may, we find it impossible to see a sentence all at once, let alone the entire paragraph or the entire computer screen. Even though our peripheral vision allows us to sense a big picture, our visual acuity is limited to a very small part of the visual field at a time. The size of this small part is invariant. It is a fallacy that we see a big pattern in terms of dozens or hundreds of small patterns.
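For reference, the receptive-field growth in the quoted description can be sketched in a few lines. The numbers are assumptions chosen only for illustration (a level 1 patch 4 pixels wide, each parent covering 2 adjoining children per side, no overlap); neither they nor the function name `receptive_field_widths` come from the Numenta paper.

```python
def receptive_field_widths(patch: int, fan_in: int, image_width: int) -> list:
    """Width of the receptive field at each level, bottom to top.

    Hypothetical helper: each parent covers `fan_in` adjoining children
    per side with no overlap, until the root spans the whole image.
    """
    if patch < 1 or fan_in < 2 or image_width < patch:
        raise ValueError("need patch >= 1, fan_in >= 2, image_width >= patch")
    widths = [patch]
    while widths[-1] < image_width:
        # A parent pools fan_in children, capped at the image width.
        widths.append(min(widths[-1] * fan_in, image_width))
    return widths


print(receptive_field_widths(4, 2, 32))  # [4, 8, 16, 32]
```

It is this level-by-level widening that is disputed here.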

The above raises the question: how do we understand a visual scene if we can't combine small pieces into bigger pieces? I believe that visual understanding, like everything else, is entirely dependent on the temporal expectations that we have of our environment. In this light, visual recognition is not unlike speech or music recognition. A spoken sentence or a musical tune is a sequence of sequences. We never sense a whole sentence or a whole song at once but only snippets at a time. Certainly, at the bottom level of the auditory hierarchy, we find many sequences of learned patterns, each of which consists of a set of concurrent signals generated by various audio sensors tuned to different frequencies. All concurrent patterns reside at the bottom level. This is true of both the visual and auditory cortices.

So yes, of course, there is a hierarchy. It is not a hierarchy of patterns (see note below), however, but one of sequences. The main difference between audio and visual recognition is that the latter requires that we frequently move our gaze from one spot of the visual field to another, i.e., from one pattern to another. If Hawkins et al. were correct, eye movements would rarely be necessary. Hawkins should know better because the ability to make predictions is the cornerstone of his theory of intelligence.

Note: There is a pattern hierarchy in the brain but it is used for learning and recognition purposes, not for enlarging the receptive field. 

Sequences of Sequences

The two main functions of hierarchical memory are to classify sensory knowledge and to predict the future. Classification is needed to make predictions. It consists of grouping incoming sensory signals into various families and subfamilies. There are only two kinds of signal families: concurrent and sequential. A concurrent family is what some call a pattern. Hawkins and others refer to it as a spatial pattern, but I think this is a seriously confusing misnomer since there is nothing spatial about it. From the point of view of making predictions, a pattern is a single and unique event in time.


In the Rebel Cortex model (see illustration below), a bottom level node (BLN) is just a pattern. Every upper level node (ULN) is a sequence of lower level nodes.

Rebel Cortex Memory Hierarchy

It is easy to understand a sequence of patterns. Every pattern can be seen as a single discrete signal arriving one after another. But what is a sequence of sequences? Can two succeeding sequences overlap? I'll return to this important topic in an upcoming post.

Fundamental Building Blocks of Memory

In order to facilitate the creation of as many combinations of sequences as possible, we should make them as short as possible, i.e., with just two nodes. The problem with short sequences, however, is that they require many levels in the hierarchy. As such, they slow down learning, recognition and prediction. At the other extreme, long sequences would be too coarse and would result in overlooked combinations. It is a good bet to suppose that the human cortex uses seven-node sequences. After all, psychology teaches us that human short-term memory has a capacity of about seven items. But why seven? I think seven is a compromise, one that is fine-grained enough without being too taxing on the nervous system. However, I chose to use seven-node sequences in RC for a different reason altogether. My choice had to do with my being a Christian and, more specifically, with my ongoing research in decoding certain ancient Biblical metaphorical texts. There's more to come. Stay tuned.
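The trade-off between sequence length and hierarchy depth can be made concrete with a little arithmetic. The sketch below assumes, purely for illustration, that every sequence is filled to capacity and that sequences never share nodes, so the hierarchy is a plain tree; the function name `levels_needed` is made up for this example. Under those assumptions, spanning n bottom-level patterns with sequences of k nodes takes the smallest number of levels d for which k^d >= n.

```python
def levels_needed(num_patterns: int, seq_len: int) -> int:
    """Smallest number of sequence levels whose top node spans
    `num_patterns` bottom-level patterns when every sequence holds
    `seq_len` nodes (illustrative tree model, not the RC design)."""
    if num_patterns < 1 or seq_len < 2:
        raise ValueError("need num_patterns >= 1 and seq_len >= 2")
    levels, span = 0, 1
    while span < num_patterns:
        span *= seq_len  # each extra level multiplies the span by seq_len
        levels += 1
    return levels


for k in (2, 7, 50):
    print(k, levels_needed(2401, k))
# 2 12
# 7 4
# 50 2
```

With 2,401 patterns, two-node sequences need 12 levels and seven-node sequences need 4. Fifty-node sequences need only 2, but at the cost of much coarser building blocks.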

Menorah

See Also:

How Jeff Hawkins Reneged on his Own Principles
Invariant Visual Recognition, Patterns and Sequences
Missing Pieces in Numenta's Memory Model
Rebel Cortex

Monday, July 18, 2011

A Fundamental Flaw in Numenta's HTM Design, Part I

Part I, II

Abstract

A couple of days ago, I reread Numenta's latest document (pdf) on HTM. I thought it was strange that there was no mention of the maximum number of nodes that an HTM sequence is allowed to have, because a sequence in Rebel Cortex (RC) has a maximum of seven nodes. I tried to understand the reason for the lack of a maximum sequence size in HTM and failed. Then I began to reflect on all the differences between HTM and RC. That's when I noticed another fundamental difference: every level in Numenta's knowledge hierarchy works exactly the same way. This is not true in Rebel Cortex. I explain why below.

Nodes and Sequences of Nodes

There are two types of nodes in RC: bottom level and upper level. A bottom level node (BLN) is a group of concurrent sensory inputs that are connected to the bottom level sequences of the memory hierarchy. A BLN is similar to what Numenta calls a spatial pattern in HTM. There is no limit to the number of concurrent inputs a BLN can have. A bottom level sequence can have up to seven BLNs. An upper level node (ULN), by contrast, is just a sequence of up to seven lower level nodes. Again, there are no concurrent inputs in the upper levels of the hierarchy.
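The two node types can be sketched as a small data structure. This is only an illustration under stated assumptions, not code from the RC design document: the class names `BottomLevelNode` and `UpperLevelNode`, the constant `MAX_SEQUENCE_LENGTH` and the `unroll` helper are all hypothetical, and the sketch assumes that child sequences play back one after another without overlapping in time.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Union

MAX_SEQUENCE_LENGTH = 7  # sequence capacity stated in the post


@dataclass(frozen=True)
class BottomLevelNode:
    """A pattern: a set of concurrent sensory inputs of any size."""
    inputs: FrozenSet[str]


@dataclass
class UpperLevelNode:
    """A sequence of lower level nodes; it holds no concurrent inputs."""
    children: List["Node"] = field(default_factory=list)

    def append(self, node: "Node") -> None:
        if len(self.children) >= MAX_SEQUENCE_LENGTH:
            raise ValueError("a sequence holds at most seven nodes")
        self.children.append(node)

    def unroll(self) -> List[BottomLevelNode]:
        """Flatten to the temporal order of bottom level patterns.
        Assumption: child sequences do not overlap in time."""
        out: List[BottomLevelNode] = []
        for child in self.children:
            if isinstance(child, BottomLevelNode):
                out.append(child)
            else:
                out.extend(child.unroll())
        return out


Node = Union[BottomLevelNode, UpperLevelNode]

# Usage: two patterns form a low sequence; a higher sequence reuses it.
a = BottomLevelNode(frozenset({"sensor_1", "sensor_2"}))
b = BottomLevelNode(frozenset({"sensor_3"}))
low = UpperLevelNode()
low.append(a)
low.append(b)
top = UpperLevelNode()
top.append(low)
top.append(a)
print(len(top.unroll()))  # 3
```

Concurrency appears only inside `BottomLevelNode`; every level above it is purely sequential, which is the distinction the paragraph above draws.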

Rebel Cortex Memory Hierarchy

The bottom level, level 0, receives its inputs indirectly from the sensory layer by way of the signal separation layer (see RC Document for a description of the SSL). Only the bottom level nodes receive concurrent inputs. The question that comes to mind is, why only the bottom level? The answer is not immediately obvious; otherwise the super-smart and math-savvy intelligentsia at Numenta and elsewhere would have discovered it. Not that I am any smarter, mind you, not by a long shot. I did not discover it either. So before I reveal the answer to the question I posed, let me explain how I came to know that it was the way to go long before I understood the actual reason. Check out what happens when you flip the image vertically.


What you see above is a faithful representation of the metaphorical vision of the golden lampstand (menorah) by the Old Testament prophet Zechariah, whose name means 'Yahweh remembers' in Hebrew. Here's how Zechariah described the symbolic lampstand:
Zechariah 4:2
And he said unto me, What seest thou? And I said, I have seen, and, behold, a candlestick all of gold, with its bowl upon the top of it, and its seven lamps thereon; there are seven pipes to each of the lamps, which are upon the top thereof. (American Standard Version)
The only thing that is missing from the diagram is the bowl, the meaning of which I will explain in a future post. The design of Zechariah's lampstand is so strange that several translators, including the translators of the King James Version, mistakenly assumed that he meant to write that he saw seven pipes, one for each of the seven lamps, which would be the description of a normal seven-branch menorah. Indeed, why would every lamp need seven pipes? Nevertheless, that is what Zechariah described.

Menorah

Here is the English Standard Version:
Zechariah 4:2
And he said to me, "What do you see?" I said, "I see, and behold, a lampstand all of gold, with a bowl on the top of it, and seven lamps on it, with seven lips on each of the lamps that are on the top of it.
There are more translations like these in many languages. They use words like pipes, tubes, channels or conduits to describe the seven ducts attached to each of the seven lamps. I figured out many years ago (around 2002) that Zechariah's vision was a symbolic description of the brain's memory architecture. The clues were unmistakable then. Any lingering doubts that I had in the beginning have long since vanished. Now it's mostly a matter of implementation. I am coming out publicly with all of it because I want there to be a record of it all, or prior art, as intellectual property lawyers would call it. To those folks out there who want to corner the market in true artificial intelligence by acquiring a huge patent portfolio, all I can say is this: it will not work.

Coming Up

I think that Numenta's HTM design is fundamentally flawed. In Part II, I will explain why the upper level nodes of the memory hierarchy must not receive concurrent inputs. I will also explain why memory sequences are limited to seven nodes.

See Also:

How Jeff Hawkins Reneged on his Own Principles
Jeff Hawkins Is Close to Something Big
The Rebel Science Speech Recognition Project

Monday, July 4, 2011

Rebel Cortex

My New AI Project

I started a new AI research project. It's called Rebel Cortex (pdf). After much consideration, I decided that the best way to demonstrate my intelligence hypothesis (and raise some cash) is to implement a visual recognition program. Don't worry; I will not abandon my old computer chess project. It's just that I noticed that several companies are coming out with sophisticated visual recognition products while new Silicon Valley startups (e.g., Vicarious Systems, Inc.) with ambitious AI programs are being funded. I think I can do better. I think I can design a visual recognition system that will blow everybody else out of the water.

I strongly disagree with the notion, held by researchers like Dileep George and others, that the solution to human-level visual recognition requires a mathematical model of the cortex. I think that mathematics is a false god, especially in brain research, because it explains nothing. It only confuses. But then again, I have my own unconventional source of knowledge about the brain that mainstream researchers like Dr. George would not touch with a ten-foot pole. In my opinion, what we have here is a race to see who will be the first to crack the AI nut. May the best approach win.

Rebel Cortex will be unlike any other visual recognition system out there. For starters, it will include a circular retina and an integrated motor cortex. Please read the preliminary design document (pdf) for Rebel Cortex and write to me to let me know if you're interested in participating in this project and in what capacity. The document is not yet completed but there is enough to give one an idea of where I am going with this project.

PS. I plan to start a new discussion forum under the same name in a few days. Keep an eye on this blog for periodic updates.

See Also:

Missing Pieces in Numenta's Memory Model
A Fundamental Flaw in Numenta's HTM Design
Invariant Visual Recognition, Patterns and Sequences