Age of Acquisition Effect: Evidence From Single-Word Reading and Neural Networks

Introduction: Many studies show that words learned early in life are read more easily than the ones learned later and are less vulnerable to brain damage. Methods: the first part of the current study, 25 primary school students in the 5th grade read the word groups learned initially during a previous grade. The words used in the experiments were 327 Farsi monosyllable words matched on the other factors involved in Farsi word naming. Results: The analysis of covariance (the consistency and frequency as covariates) showed that words learned in earlier grades were read more easily than the ones learned later, showing the known effect of the Age of Acquisition (AoA). In the second part of the study, it was tried to simulate AoA in word naming by a neural network model developed earlier based on connectionist approach. While previous studies used random patterns, in the current study words from primary school books were used. Likewise, words learned early by the model were read better than words learned later. However, there was a failure in replicating previous simulation of AoA in English reading by an algorithm called Quick prop for Farsi. In addition, the model was lesioned by removing some hidden units to see its effect on word reading. As a result, words learned earlier were less vulnerable to damage compared with the ones learned later. Conclusion: The findings showed that words learned earlier, compared to those learned later, were read better and were less vulnerable to damage. These effects are explained by considering the nature of learning in neural networks trained by error back-propagation.


Introduction
f the important variables in word recognition such as consistency, frequency, and word length are equal, it is shown that still the words learned early in life are read better than the ones learned later and are less vulnerable to brain damages. These effects are also shown in other cognitive processes including object naming, face recognition, and spoken word recognition (Morrison & Ellis, 1995;Gerhand & Barry, 1998;Zevin & Seidenberg, 2004;Izura et al., 2011;Wilson, Ellis, & Burani, 2012).
In several cognitive domains, early learning can result in a decrease in plasticity, which limits the ability to acquire new information. Phonological acquisition is a good example (Werker & Tees,1984), i.e. learning the phonological structure of a language lessens the ability to learn new phonetic contrasts (e.g. in a second language).
Likewise, studies show that the ability to learn the morphology and syntax of a language drops steadily after approximately seven years of age (Flege, Yeni-Komshian, & Liu, 1999). However, other faculties such as lexical acquisition do not seem to be equally age-dependent (Markson & Bloom, 1997;McCandliss, Posner, & Givon, 1997). Carroll and White (1973) found a relationship between word learning age and later age processing speed. They showed that object naming latency had a high correlation with the age at which children learn the different object names. Through multiple regression analysis they further found that age of acquisition was the only significant independent variable to predict the naming latency.
The better performance in the case of words learned in early childhood is not due to more exposure (called frequency) in that time and later (called cumulative frequency), but has its own effect. Therefore, for example, words learned in early grades of primary school (e.g. grades 1 and 2) are recognized and read better than the words learned later (e.g. grades 4 and 5). There are other important factors involved in word recognition such as frequency, consistency (Coltheart's N), and length of words (Coltheart, Davelaar, Jonasson, & Besner, 1997;Seidenberg, 1985;Cited in Sohrabi, 1999). The highly frequent and/or consistent words are better recognized than the low frequent and/or inconsistent ones. Similar to many cognitive processes, word reading is simulated and explained well by connectionist model or artificial neural network (Seidenberg & McClelland, 1989;Zorzi, Houghton, & Butterworth, 1998;Sohrabi, 2001;Zevin & Seidenberg, 2004).Thus the current study aimed at investigating word naming in human subjects and simulating the connectionist models.

Plain Language Summary
Both children and artificial computerized brains known as neural networks read words that appeared at early grades better than the words appeared at later grades. This reflects an age acquisition effect. However, when the model uses quick prop algorithm, it fails to show the same effect with Farsi words, unlike the previous studies results on English words. Moreover, when some neurons in the artificial model are destroyed, the words learned earlier are less affected by damage than the words learned later. These effects are explained by the mathematical nature of the learning in artificial brains.

Investigating the age of acquisition effects by human data in primary school
This part of the study aimed at showing an effect of Age of Acquisition (AoA) controlling other important factors in naming words in primary school books in the 5 th grade students.

Materials
The items were 327 words from primary school books. All words were grouped based on their initial appearance in one of the five grades. In addition, two other important factors were also considered: consistency by Coltheart N, frequency by objective measurement in primary school books as a good estimate of words that children of a small town in Iran are exposed to in primary school. All words were monosyllable.

Participants
The study was conducted on 25 subjects of the fifth grade nearly at the end of the academic year. All participants were male and had no problems in speech and vision. At the end of the experiment, playing a computer game and an ice-cream were offered to all participants.

Procedure
All words were printed in an MS-Windows font with size 100 points each in a 3"×6" card. The cards were put in random order. Subjects, sat in front of the experimenter, one by one, in a quiet room and all words were presented to them after the following instruction: "You will see some words from the primary school, each on a card. Read them aloud, as soon as possible, when they are shown one by one". The experiment took about 25 minutes for each subject. The reading errors were recorded by an assistant on a three-level scale: Failure (3), failure with correction (2) and considerable delay (1).

Methods
Neurocomputational modeling was employed to simulate and explain cognitive processes (McClelland, Mc-Naughton, & O'reilly, 1995;Seidenberg & McClelland, 1989;Zorzi et al., 1998;Zevin & Seidenberg, 2004;Sohrabi & West, 2009;Ludvig, Sutton & Kehoe, 2012) and lower and higher level neural functions (Daneshparvar & Daliri, 2012;Soltanzadeh & Daliri, 2014;Friston & Frith, 2015). However, the AoA was not modeled until recent years (Ellis & Lambon-Ralph, 2000). Previously, McClelland et al. (1995) simulated the graded improvement of children's lexicon. They examined the outcomes of adding a novel concept (penguin) after training the network. This was done using either focused or interleaved learning. With focused learning, the system is presented to new knowledge without interleaving with old knowledge, i.e. no further exposure to the earlier training set. Under this condition, information about penguins was learned rapidly but at the cost of the pre-existing knowledge. In other words, the model underwent catastrophic interference. However, in the case of interleaved learning, i.e. the model being exposed to the old material alongside the new, the new information was learned without costing the old. (2000), based on such findings, studied the AoA effect on random patterns through some simulations. They showed that AoA effect was different from cumulative frequency and attributed it to inevitable consequence of losing flexibility in artificial neural networks as it is the case for matured subjects. This is due to error back-propagation algorithm that changes the connections weights at the early training more than later training. Therefore, early training has more effects than late one. In this part of the study, using words from primary school books with their real frequency, some simulations were carried out to compare with those of human data.

Architecture of the model
A distributed connectionist model based on Seidenberg and McClelland (1989) was used adopted for the Farsi language in a previous work (Sohrabi, 2001). In such models neuron-like units are used to connect input (letters) and output (phonemes). In the model, letters and phonemes are distributed among words, for each one a certain pattern of activation is involved. The input layer is 33 letters in the Farsi language with Arabic script and its output layer is 28 phonemes (in contrast to English, Farsi has less phonemes than letters). For better representation of rimes that have important roles in reading Farsi (Sohrabi, 1999), similar to that of English (Coltheart et al., 1997), four slots of units were considered for input and output layers similar to that of Zorzi (1998) and Sohrabi (2001).
Additionally, since Farsi has deep (quasi-regular) orthography (Sohrabi, 1999), a hidden layer including 100 units was used. Thus, a model with an input layer of 132 units and an output layer of 112 units was used in all simulations (Figure 1). At the start of the training, connections were weighted initially by random values between +0.5 and -0.5. Then the model was trained by error back-propagation algorithm (Rumelhart, Hinton, & Williams, 1986). The learning rate was 0.05 in all simulations except for 1, 2, and 3 in which it was 0.01. The momentum value that speeds up the learning was 0.9 in all simulations, but simulations 5, in which it was omitted to see its effect.

Human data
Mean errors for each word made by all subjects were analyzed by analysis of covariance (item analysis) as a dependent factor. Five grades at which the words were initially learned made the five levels of fixed factor.
Consistency and frequency were two covariates found as important factors in reading Farsi words by a multiple regression analysis (Sohrabi, 1999).The frequency, as a covariate, was not significant, but was included in the analyses to control variance; as shown in Tables 1 and 2 and Figure 2. The consistency covariate was significant; therefore, AoA was mainly due to inconsistent words.
As can be observed in Tables 1 and 2, word initially learned in early grades had less errors compared to the ones learned in later grades. The effect of word groups was significant (F 4,320 =7.757; MSE=0.17; P<0.001).
Though the differences between immediate grades were not significant (except for 4 and 5), other contrasts were significant. Additionally, there was a significant linear trend for AoA effect. Controlling other factors, the earlier the word, the less the error. The model based on Seidenberg and McClelland (1989) was used, which was adopted for the Farsi language in a previous work (Sohrabi, 2001).

Simulation 1
This simulation was intended to show an effect of AoA even by controlling the frequency effect. First, the model was trained for 150 epochs on 82 early words and then for 150 epochs on 82 early words (from grade 1) as well as 87 late words (from grade 2). The late words were trained twice, to be equal to the early ones in frequency. The words were trained in a random order. The result showed that early words had significantly less errors than the late ones ( Figure 3).

Simulation 2
This simulation was the same as simulation 1. But here, training the mixed words groups continued twice, as shown in Figure 4. The effect of AoA remained superior after several epochs as shown in human beings after some decades (Ellis & Morrison, 1998).

Simulation 3
The aim of this simulation was to control the possible difference between the selected two groups. Thus, the order to train the two groups reversed. Again, the gain for the early ones was observed ( Figure 5).

Simulation 4
This simulation was similar to simulation 1, but the numbers of epochs in both groups were twice, and the learning rate was 0.05. As shown in Figure 6, the error of early was much less than that of late and independent of the learning rate. Figure 6. The Errors for Early and Late After 300 Epochs With Learning Rate 0.05

Simulation 5
Here, the aim was to control the momentum, the value that speeds up the learning. Without momentum, after 3000 epochs (1500 early and 1500 mixed), AoA effect was observed similar to the other simulations, as shown in Figure 7.

Simulation 6: Relationship between AoA and brain damage
In this simulation, the relationship between brain damage and AoA (Ellis, & Morrison, 1998) was simulated. The trained model in simulation 1 was lesioned at three levels of severity (5%, 10%, and 20%). Two methods of lesioning were used (both had the same effects): Setting weights to 0, and adding small number of noises to hidden neurons. In each one of the 20 samples, a random portion of neurons in the hidden layer was lesioned at each of the three levels of severity. The obtained result showed that errors raised in the late than the early group. And there was an increase in error, as a function of severity of lesion. Thus, as observed in the diseases such as aphasia and dementia, the effect of damage in the late experience was more than that of an early one (Figure 8).

Simulation 7: The nature of AoA effect
This is about the nature of AoA effect. As noted earlier, there is more change in the start of learning by error back-propagation due to initial random weights and using sigmoid activation function. Activation is in its highest level when the inputs are around zero (thus at the beginning of learning and in the case of early items) and there is little space to change as learning progresses (Figure 9). (2000), but not practically shown. Here, with another algorithm called Quick prop (Fahlman, 1987), the simulation 1 was replicated. Since this algorithm prevents the activation of function from moving to its extreme value, it causes flexibility in the network and eliminates AoA ( Figure 10). But Ellis and Rambon-Ralph (2000) showed AoA even using this algorithm and it is, presumably, due to using random patterns instead of real words.

Discussion
In the first part of the current study, the effect of AoA on word reading and controlling other factors was shown in the 5 th grade students. They read words initially learned in the first grades better than the ones learned in the later grades. The consistency covariate was significant and consistent with the assumption of Zevin and Seidenberg (2004). Thus, AoA is mainly due to inconsistent words. Age of acquisition affects the results due to a loss of plasticity related to successful mastering of a task. This phenomenon can occur in several types of learning tasks including reading. It happens due to an interaction between AoA and consistency shown by covariance analysis in the current study.
When a new word is learned after an old one, it can catastrophically affect the learning of the first one if the first one is not presented even for refreshing its learning. But if the first word is presented alongside the second one, it keeps its superiority even after a long time.
As an explanation of AoA by connectionist models, it is noteworthy that in such models, the weights are initially assigned to random values and output units take values of 1 or 0. The weight adjustments that occur during back-propagation with a logistic activation function are proportional to the unit activation (Sohrabi, 2002).Thus, these adjustments mostly occur with the activations in the middle of the logistic function (inputs are around 0), as it happens when small random weights are used for early initialization of the network. Therefore, the plasticity is involved in learning when the early-trained patterns are lost. This effect resembles inflexibility of human brain to new learnings such as infrequent and new words, despite its great flexibility during this age following neurofeedback (Rahmati, Rostami, Zali, Nowicki, & Zare, 2014) and education (Klingberg, 2010;Kranfnick et al., 2011).
When the model was lesioned by removing some hidden units to see its effect on word reading, words learned earlier were less vulnerable to damage compared to the ones learned later. These effects can be explained by considering the nature of learning in neural networks trained by error back-propagation as mentioned above, i e, early learning uses more resources and leaves less for later learning, making it prone to be destroyed by damages.Moreover, there was a failure in replicating previous simulation of AoA in English reading by an algorithm called Quick prop (Fahlman, 1989) for Farsi, The slope of the function is greatest, when its input is around 0 as in the start of training, while it decreases in both positive and negative directions as training proceeds.

Net Input
Activation +5 +5 0 1 presumably due to the flexibility in the network that affects AoA.
This happened in simulating AoA in Farsi, perhaps it is more regular than English. But Ellis and Lambon-Ralph (2000) showed AoA even using this algorithm and it is, presumably, due to using random patterns instead of real words. The problem with using random patterns was also shown by Zevin and Seidenberg (2004). So, the relationship between input and output are arbitrary instead of being quasi-regular as in the case of word naming. This gets even more important knowing that the AoA effect mainly occurs in irregular domains, as in less predictable Chinese characters (Chen, Zhou, Dunlap & Perfetti, 2007), less practiced reading aloud (Zevin & Seidenberg, 2004), and less regular stress in Italian word reading (Wilson et al., 2012).

Compliance with ethical guidelines
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or not-forprofit sectors.

Conflict of interest
The author declared no conflict of interest.