Unlike conventional database systems, cognitive agents just want to recall what is most important based upon experience. This is similar to web search engines which seek to provide the results that are likely to be most relevant given the words in the search query.
- Forgetting as intrinsic memory decay or as interference from other memories, or some combination of the two?
- Exponential decay over time provides good fit to lab data
- But can also be ascribed to interference from new memories
- Priming effect on related memories as spreading activation
- We can recall memories from many years ago given the right cues
- New memories lack strong evidence for their lasting value
- Such evidence has to be acquired with experience
- What’s the most effective model for all of these points?
Underwood (1957) showed that memory loss is largely attributable to interference with other memories. Memories can thus be recalled after an interval of many years provided that the interference is small. This reflects experience in selecting memories that have been more valuable.
For ACT-R, the decay of activation is only one component of the activation equation. There is also a context component to activation which works to increase the activation of items based on the current context. Thus, even chunks which have decayed significantly over time can have activations above the threshold if they are strongly related to the current context.
Proposed approach for the chunks specification
- Chunks have parameters for an activation level and a timestamp
- Activation decays over time like a leaky capacitor losing its charge
- Recalling or updating a chunk boosts its activation level
- Boost is weaker for closely spaced rehearsals – aka the spacing effect – and is based on the Logistic function
- Decaying wave spreads through linked chunks to boost related concepts
- Stochastic recall - chunks with higher activation levels are more likely to be recalled, but sometimes weaker chunks are recalled in place of stronger chunks
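The decay described in the list above can be sketched as follows. This is only an illustration, assuming exponential decay with a time constant `tau` and assuming that decay is applied lazily when the activation is read. The helper name `currentActivation` is hypothetical and is not part of the proposal.

```javascript
// Hypothetical helper: exponential decay of a chunk's activation,
// computed lazily from the time elapsed since its timestamp.
// tau is an assumed time constant in milliseconds.
function currentActivation(chunk, now, tau) {
  const elapsed = Math.max(0, now - chunk.timestamp);
  return chunk.activation * Math.exp(-elapsed / tau);
}

// Example: a chunk last boosted exactly one time constant ago
const chunk = { activation: 1.0, timestamp: 0 };
const tau = 60000; // 1 minute, chosen arbitrarily
console.log(currentActivation(chunk, tau, tau)); // about 0.368 (1/e)
```

Computing the decay on demand avoids having to sweep the whole graph on a timer, at the cost of needing a timestamp that records when the stored activation was last valid.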
Spreading Activation
- Why is it easier to remember items in a group for groups with fewer items?
- A wave of spreading activation provides one possible explanation
- Activation of one item in the group spreads to other items in the same group following property links in both directions
- The amount of wave activation for each item is inversely related to the number of items in the group
- What is the underlying computational model for pulsed neural networks?
Here is an example:
# items belonging to group animals
item {word dog; group animals}
item {word horse; group animals}
item {word cat; group animals}
- Remembering the item for dog boosts the chunk for the group (animals) and spreads out to boost the other items in that group
- Does this depend on the property (in this case group) being the same?
- How can we implement this efficiently on conventional computers?
One implementation strategy is to have one index mapping from chunk IDs to chunks, and another index from chunk IDs to the set of chunk IDs for chunks that have the given ID as a property value. A further index maps chunk types to the set of IDs for chunks with that type. Care is needed to keep these indexes up to date when chunks are added to or removed from a graph, and when a chunk's type or properties are updated.
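A minimal sketch of these three indexes is given below. The class name `ChunkIndex`, its method names, and the chunk shape `{id, type, properties}` are assumptions made for illustration. Property values are treated as references when they are strings that do not start with a double quote, mirroring the test used in the implementation that follows.

```javascript
// Hypothetical sketch of the three indexes described above.
class ChunkIndex {
  constructor() {
    this.byId = new Map();    // chunk ID -> chunk
    this.byValue = new Map(); // ID -> Set of IDs of chunks using it as a property value
    this.byType = new Map();  // chunk type -> Set of chunk IDs
  }

  // property values that look like names rather than quoted string literals
  static references(chunk) {
    return Object.values(chunk.properties)
      .filter(v => typeof v === "string" && v[0] !== '"');
  }

  static addTo(map, key, id) {
    if (!map.has(key)) map.set(key, new Set());
    map.get(key).add(id);
  }

  static removeFrom(map, key, id) {
    const set = map.get(key);
    if (set) {
      set.delete(id);
      if (set.size === 0) map.delete(key);
    }
  }

  add(chunk) {
    this.byId.set(chunk.id, chunk);
    ChunkIndex.addTo(this.byType, chunk.type, chunk.id);
    for (const ref of ChunkIndex.references(chunk))
      ChunkIndex.addTo(this.byValue, ref, chunk.id);
  }

  remove(chunk) {
    this.byId.delete(chunk.id);
    ChunkIndex.removeFrom(this.byType, chunk.type, chunk.id);
    for (const ref of ChunkIndex.references(chunk))
      ChunkIndex.removeFrom(this.byValue, ref, chunk.id);
  }

  // a change of type or properties: drop old entries, mutate, re-index
  update(chunk, changes) {
    this.remove(chunk);
    Object.assign(chunk, changes);
    this.add(chunk);
  }
}

const index = new ChunkIndex();
index.add({ id: "i1", type: "item", properties: { word: "dog", group: "animals" } });
index.add({ id: "i2", type: "item", properties: { word: "cat", group: "animals" } });
console.log([...index.byValue.get("animals")]); // the IDs i1 and i2
```

Routing every change through `update` is one way to keep the indexes consistent: the old entries are removed before the chunk is mutated, so stale references cannot be left behind.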
Here is an implementation in JavaScript:
// To mimic human learning and forgetting, the activation
// of a chunk is modelled as a leaky capacitor where charge
// is injected each time the chunk is recalled or updated,
// and then decays over time. Related chunks are primed with
// a fraction of the injected charge being divided across
// linked chunks in a wave of spreading activation until a
// cut-off threshold is reached.
// This algorithm follows links unidirectionally from
// properties to values, and needs to be extended to work
// bidirectionally using a new index that lists chunks
// with a type or property value equal to the given ID
graph.activate = function (chunk) {
  // parameters for spreading activation
  const base = 1.0;
  const fraction = 0.5;
  const cutoff = 1E-5;
  const tau = 60000; // arbitrarily 1 minute, in ms

  // The spacing effect is that massed presentations have
  // reduced novelty, and are less effective for learning.
  // The logistic function is used to mimic the effect,
  // mapping the time interval since the chunk was last
  // recalled or updated to the boost in its activation,
  // see: https://en.wikipedia.org/wiki/Logistic_function
  function logistic (x) {
    return (1 + Math.tanh(x/2))/2;
  }

  function prime (chunk, boost) {
    chunk.activation += boost;

    // spread activation through linked chunks
    if (boost > cutoff) {
      // determine list of linked chunks: property values
      // that are names (not quoted strings) of known chunks
      let chunks = [];
      let props = chunk.properties;
      for (let name in props) {
        if (props.hasOwnProperty(name)) {
          let id = props[name];
          if (typeof (id) === "string" && id[0] !== '"') {
            let c = graph.chunks[id];
            if (c)
              chunks.push(c);
          }
        }
      }

      // prime the linked chunks, sharing a fraction of
      // the boost equally between them
      if (chunks.length) {
        boost = boost*fraction/chunks.length;
        for (let i = 0; i < chunks.length; ++i) {
          prime(chunks[i], boost);
        }
      }
    }
  }

  let now = Date.now();
  let boost = base;

  // reduce the boost if the chunk was recalled or updated recently
  if (chunk.timestamp)
    boost *= logistic(Math.log((now - chunk.timestamp)/tau));

  chunk.timestamp = now;
  prime(chunk, boost);
};
// Used as part of stochastic recall of chunks, where
// stronger chunks are more likely to be selected.
// This implementation uses the Box–Muller algorithm
graph.gaussian = function (stdev) {
  const epsilon = 1E-20;
  const TwoPI = 2 * Math.PI;
  let u1, u2;
  // reject values of u1 that are too close to zero for Math.log
  do {
    u1 = Math.random();
    u2 = Math.random();
  } while (u1 < epsilon);
  return stdev*Math.sqrt(-2*Math.log(u1))*Math.cos(TwoPI*u2);
};
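As a side note on the spacing effect in graph.activate above: because the logistic function is applied to the logarithm of the ratio elapsed/tau, the boost simplifies algebraically to elapsed/(elapsed + tau). The sketch below checks this numerically. The helper name `spacingBoost` is hypothetical and is used only for illustration.

```javascript
// logistic function, as used in graph.activate
function logistic(x) {
  return (1 + Math.tanh(x / 2)) / 2;
}

// Hypothetical helper: the boost as a function of the interval (in ms)
// since the chunk was last recalled or updated.
function spacingBoost(elapsed, tau) {
  return logistic(Math.log(elapsed / tau));
}

const tau = 60000; // same arbitrary 1 minute constant as above
for (const elapsed of [1000, 60000, 3600000]) {
  const closedForm = elapsed / (elapsed + tau);
  // the two columns should agree to within rounding error
  console.log(elapsed, spacingBoost(elapsed, tau).toFixed(4), closedForm.toFixed(4));
}
```

With this form, a rehearsal exactly one time constant after the previous one receives half of the full boost, rehearsals much closer together receive almost none, and widely spaced rehearsals approach the full boost.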
Chunk recall first identifies matching chunks and for each match, applies gaussian noise to the chunk's activation level, and selects the matching chunk with the highest resulting score. The selected chunk is activated as above. Selection fails if the score is below a given threshold.
The Gaussian distribution is centred on zero and falls off symmetrically for negative and positive values. The graph.gaussian function above therefore mostly returns values close to zero, and only rarely returns large negative or positive values.

To apply Gaussian noise to an activation level, multiply the level by e raised to the power of the noise value computed from graph.gaussian. The standard deviation should be a system-wide constant.
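The recall procedure described above can be sketched as follows. This is an illustration under stated assumptions rather than a definitive implementation: the function name `recall`, its parameters, and the option to pass in the noise source are hypothetical, and the Box–Muller generator is repeated locally so that the example is self-contained.

```javascript
// Box–Muller noise, equivalent to graph.gaussian above
function gaussian(stdev) {
  let u1;
  do {
    u1 = Math.random();
  } while (u1 < 1e-20);
  const u2 = Math.random();
  return stdev * Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// Hypothetical recall: scale each match's activation by e^noise,
// pick the highest score, and fail if it is below the threshold.
// The noise source is a parameter so it can be replaced in tests.
function recall(matches, threshold, stdev, noise = gaussian) {
  let best = null;
  let bestScore = -Infinity;
  for (const chunk of matches) {
    const score = chunk.activation * Math.exp(noise(stdev));
    if (score > bestScore) {
      best = chunk;
      bestScore = score;
    }
  }
  return bestScore >= threshold ? best : null;
}

const matches = [
  { id: "dog", activation: 0.9 },
  { id: "cat", activation: 0.6 }
];
const chosen = recall(matches, 0.1, 0.3); // threshold and stdev chosen arbitrarily
console.log(chosen ? chosen.id : "recall failed");
```

The stronger chunk is usually selected, but the weaker one is occasionally returned when the noise favours it. A caller would then activate the selected chunk as described above.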
For the memory test task, the successfully recalled items in the test are treated as an iteration (see @do next). Rules then have access to the number of items recalled as well as to the sequence of items. Items may fail to be recalled if their activation level is low, or if the stochastic noise depresses the score below the threshold.
Summary
Human memory is functionally modelled in terms of a graph of chunks where each chunk is associated with an activation level and a timestamp. Activation decays exponentially with time (like a leaky capacitor), but is boosted by recall or update, and via spreading activation in both directions through links between chunks. Recall is stochastic with noise being applied to the chunk activation level before comparison with a cut-off threshold.