Language skills in student essays: social disparities and later educational attainment

  • 1 Goethe University Frankfurt, Germany

Abstract

This article examines the role of language skills in socially stratified educational attainment. Using essays written at the age of 11 in a large British cohort study, the National Child Development Study (NCDS), two measures of written language skills are derived: lexical diversity and the number of spelling and grammar errors. Results show that participants from the lower social strata misspelt more words and used a smaller variety of words in their essays than more socially privileged cohort members. Those language skills mediate part of the association between social origin and the highest level of educational attainment achieved. An even higher mediation of about half can be observed if standardised test measures for verbal and non-verbal cognitive abilities are included in the model. The models show that language skills mediate the social origin effect on educational attainment by about a quarter.

Key messages

  • Student essays are used to measure language skills.

  • Socially disadvantaged students’ essays contain more errors.

  • Essays from socially advantaged cohort members are more lexically diverse.

  • Language skills mediate part of the social origin effect on educational attainment.

Introduction

The intergenerational reproduction of social inequality is strongly driven by educational attainment (Blau and Duncan, 1967). Children from families with high socio-economic status (SES) have higher educational outcomes on average (Machin and Vignoles, 2004; Sirin, 2005; Bukodi and Goldthorpe, 2013). Those higher educational outcomes are found to fully or partly explain the intergenerational transmission of social status (Breen and Karlson, 2014; Sullivan et al, 2018). Social inequality in the highest educational level achieved is often explained by socially unequal educational decision making and socially unequal skills (Boudon, 1974). Studies have reported that differences in cognitive skills can partially explain socially unequal educational attainment (Bukodi et al, 2014; Erikson, 2016; Bourne et al, 2018; Betthäuser et al, 2020). In those studies, mediating skills comprise verbal and non-verbal skills. Differences in language use and skills according to social background are well documented. For Bourdieu and Bernstein, the social use of language is related to educational outcomes and social reproduction (Bernstein, 1971; Bourdieu et al, 1994). Bernstein argues that the transmission of a specific usage of language at home leads to higher educational qualifications of students from the middle class compared to those from working-class backgrounds (Bernstein, 1971; 1975).

Previous studies of the relationship between language skills and socially unequal educational outcomes have focused on short-term consequences (Durham et al, 2007; Spencer et al, 2017). Little research has been conducted on the long-term consequences of socially unequal language skills in mediating the effect of social origin on educational attainment. Empirical research focusing on the role of skills in mediating the relationship between social origin and educational attainment often does not differentiate between verbal and non-verbal cognitive skills. This paper empirically examines the long-term consequences of socially differentiated language skills by analysing the highest educational qualification achieved. Studies based on a large number of observations often use standardised testing procedures to determine children’s language skills. In contrast to this approach, this study also analyses student essays written by cohort members in the British National Child Development Study (NCDS). Expressive language skills in the essays are measured by lexical diversity and the number of spelling and grammar errors. This study seeks to address the following questions: How are the written language skills of children socially stratified? And to what extent do expressive language skills in childhood mediate the influence of social origin on educational attainment?

Children are raised under different socialisation conditions (Becker, 2011; Ermisch, 2008; McNally et al, 2019), which can lead to disparate outcomes in abilities. For example, toddlers in socially disadvantaged families are, on average, exposed to fewer words and a less lexically diverse vocabulary during the phase of language acquisition (Hart and Risley, 1995; Hoff, 2003). This results in a smaller vocabulary and less efficient speech processing for children from lower SES backgrounds (Fernald et al, 2013). Differences continue in later stages of life: social differences in students’ language abilities are apparent in preschool, primary and secondary education (Dämmrich and Triventi, 2018). For example, socio-economic differentials in standardised vocabulary tests have been reported for ages three and five, in young children and adolescents (Becker, 2011; Spencer et al, 2012; Sullivan and Brown, 2013; Sullivan et al, 2017; McAvinue, 2018).

In contrast to standardised tests, social differences in language use can also be observed in children’s written output. Richardson et al (1976) analysed a subsample of the written essays in the NCDS and found social class differences for essay length but not for mean T-unit length (a way of measuring the length of sentences). Similar results were found by Lawton (1963), who worked with a different sample of essays written by boys ‘matched for verbal and non-verbal intelligence on Raven’s Progressive Matrices and the Mill Hill Vocabulary Scale’ (Lawton, 1963: 120). In addition, Lawton identified more diverse vocabulary use by middle-class boys compared to those with working-class backgrounds.

Socially differentiated use of language has already been addressed in classical theories, for example by Bernstein, Labov and Bourdieu. Bernstein argued that specific forms of language can be observed, for example formal and public language. The latter ‘is a form of language use which can be marked off from other forms by the rigidity of its syntax and the restricted use of formal possibilities for verbal organisation’ (Bernstein, 1961: 169). This type of language use can be observed more frequently in the lower social strata. For Bernstein, children’s use of language is linked to learning processes and educational outcomes and can thus explain socially unequal educational achievements. Labov (1970) criticised Bernstein’s work and argued that the use of language according to social origin should be viewed differentially rather than as deficient. Bourdieu (1977) argues in a similar vein, but also describes a stratification of language products, which he links to socialisation experiences. Both Bourdieu and Bernstein point to the connection between language use and academic success (Bernstein, 1961; 1971; Bourdieu et al, 1994), especially since language skills are a basic requirement for learning. The transfer of knowledge and skills usually takes place through linguistic products – written or oral. In other words ‘the currency of education is language and it is the medium of knowledge transmission’ as noted by Grenfell (2011: 39) in reference to Bourdieu.

Indeed, studies indicate that students with better language skills perform better in school and achieve higher educational qualifications (Strand, 2006; Parsons et al, 2011; Spencer et al, 2017; Schuth et al, 2017). The language abilities of young adolescents in Britain, measured by tests on grammar, receptive vocabulary and expression, correlate with later GCSE grades at age 16 (Spencer et al, 2017). Not only are language skills important at the primary or secondary level: language experience and knowledge before entering school are also significant, as they are positively linked to later skills and school performance (Walker et al, 1994; Durham et al, 2007; Claessens et al, 2009; Gilkerson et al, 2018). However, Parsons et al (2011) showed that poor vocabulary knowledge at age 5 but good reading ability at age 10 results in higher educational attainment by age 34 compared to poor readers at age 10. Hence, strong language skills at the end of primary education are positively correlated with later educational outcomes, even if skills were poor before formal education.

Boudon (1974) proposed splitting the effect of social origin on educational outcome into primary and secondary effects. Primary effects refer to social differences in abilities, while secondary effects refer to social differences in transition probabilities: given the same abilities, students from higher social strata are more likely to seek additional schooling. Decomposition analyses of the effect of social origin on educational attainment have shown that both primary and secondary effects are important (Karlson and Holm, 2011; Jackson et al, 2007).

Existing studies that focus on the primary effect conclude that cognitive abilities partly explain differences in educational outcomes according to social origin. This applies to studies that deal with the mediation effect on educational qualifications or performance. Language skills in kindergarten can account for most of the effect of SES on educational outcomes in elementary school (Durham et al, 2007). Children from three regions in the US showed better abilities at school if they had better language skills in kindergarten. Maternal education was mainly accountable for better preschool language skills. These language skills mediated socially unequal performance in primary school. In a sample of two schools in England, Spencer et al (2017) found that teenage language skills are correlated with socio-economic background. Those language abilities were also related to GCSE grades.

Studies focusing on the mediation effect on educational attainment with a large number of observations mostly use a combined measure of verbal and non-verbal cognitive skills (Bukodi et al, 2014; Erikson, 2016; Bourne et al, 2018; Betthäuser et al, 2020). These studies find that cognitive abilities in childhood explain one third to one half of the social origin effect on educational attainment. In these studies, cognitive abilities are measured using scales that combine linguistic and non-linguistic tasks. Language skills are likely to account for part of the impact of social origin on the highest level of education. To analyse the role of language skills in socially unequal educational attainment, this study uses two measurements of expressive language skills derived from essays.

The lexical diversity and the number of spelling and grammar errors are extracted from essays written in a large British cohort study, the NCDS. Relatively few studies have analysed the number of mistakes in written output by social origin. However, efficient spelling is related to reading skills and the development of the phonological and orthographic system (Sawyer and Joyce, 2005). An analysis of children in primary school shows that spelling mistakes are more pronounced in writing by children from lower social backgrounds compared to children from a higher social background (Korat and Levin, 2002). For a standardised spelling test conducted at age 16, Sullivan and Brown (2013) find that adolescents scored better when their parents had achieved higher formal education. For lexical diversity, Lawton (1963) was able to demonstrate that, in writing tasks, children from families with higher SES use more different words than children from disadvantaged backgrounds.

The advantage of quantitatively analysing essays is that writing is a fairly common task familiar to schoolchildren. Metrics taken from the essays could help to understand the influence of language on stratified educational outcomes. The essays, which participants were asked to write at age 11, are used to measure expressive language skills. In addition to the essays, information about the cohort members available from the ongoing longitudinal study is included. The language skills derived from the essays are linked to the educational qualifications obtained by the cohort members.

In summary, previous research has identified that language skills are related to social origin and that both are associated with educational outcomes. Cognitive skills partly mediate the association between social background and educational qualification. However, previous large-scale studies into the mediating role of cognitive abilities in socially unequal educational attainment have not addressed the role of language skills. Nevertheless, socially unequal performance in primary schoolchildren can be traced back to socially unequal language skills before entering school (Durham et al, 2007). This study adds to the literature by using student essays to analyse language skills in primary school and their role in mediating the relationship between social origin and educational attainment.

This study tests the following hypotheses:

  • Written language skills derived from student essays are socially stratified (H1).

  • Expressive language skills in childhood mediate part of the relationship between social background and educational attainment (H2).

Data and methods

This study uses data from the British NCDS (Power and Elliott, 2006). All births in England, Scotland and Wales from one week in March 1958 were registered and are included in the survey. Initially, the sample included over 17,000 babies. At age 11 the children were asked to write an essay about their imagined life at the age of 25. The question was: ‘Imagine you are now 25 years old. Write about the life you are leading, your interests, your home life and your work at the age of 25. (You have 30 minutes to do this)’. The average essay consists of 200 words (Table A1). Most essays had been in storage since the collection and were not available in digital form. The texts were manually transcribed and all spelling errors were reproduced. In 2018, the essays were made available in the UK Data Archive (Centre for Longitudinal Studies, 2018). At the age of 11, the study participants were in primary school, so pupils had not yet been assigned to different school tracks.

Two example essays are shown below to give the reader a clearer picture of the raw material:

If I were twenty five I would marry and have some children. I would have some pets and ride. I would send my children to school and do the house work, do some cooking and flower arranging & shopping. I would have a large house in the country with lots of ground for ponies and some garden. My husband would go to work in [city] and earn some money. My garden would have tulips, roses, lupins and a fruit cage with blackberries, raspberries, blackcurrants and loganberries. I would have a nice swimming pool with a diving board and a tennis court. I would have a rope at one end of the garden and a sand pit for long Jump and high jump. My house would have six bedrooms and two bath rooms up stairs and a large play room with a table tennis table in it a kitchen a drawing room a dining room down stairs. (Father’s social class: managerial and technical occupation; parents’ education: compulsory + 6 years)

my name is [name] and I am 25 yers old I am batchler and I am a milk man I work for the [company] and I get 15 pound a week I like wark ing on a milk raund. travel about 120 miles eatch day nearley and I have got a lot of customors who have milk of me and I have plenty of sparre time to play with the dog and the budgierigat to and I am sit-ing down all day nearly to so I musht grumble (Father’s social class: partly-skilled occupation; parents’ education: compulsory)

Measurements

This study analyses two measures of language skills derived from the written texts of the cohort members. Lexical diversity and the number of spelling and grammar mistakes in the essays are the metrics used to rate expressive language skills. The errors were extracted using open-source software provided by LanguageTool (LanguageTooler GmbH, 2021). Using the Java API, the proofreading software automatically detected potential spelling and grammar errors. The LanguageTool library identifies the number of potential errors for each essay, which is divided by the total number of words to obtain the percentage of errors in an essay. Punctuation errors were ignored. On average there are 6.6 spelling and grammar errors per 100 words. The rate of identified spelling mistakes (6.0%) is much higher than that for grammar mistakes (0.6%). Sahu et al (2020) evaluate the software and report a high accuracy (95%) for spelling mistakes and a lower level of accuracy (44%) for syntax errors. These results are consistent with a small-scale check on the essays. Although it does not detect all errors, the software provides a good proxy of the number of errors, especially spelling errors.
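The error-rate calculation described above can be sketched as follows. This is only an illustration of the arithmetic (flagged items divided by total words, times 100): the tiny `KNOWN_WORDS` lexicon and both function names are hypothetical stand-ins, not the LanguageTool API used in the study, and the sample sentence is adapted from the second example essay.

```python
import re

# Hypothetical stand-in for a proofreading engine: a tiny set of known words.
# The study used LanguageTool; this toy lexicon is an assumption for illustration.
KNOWN_WORDS = {"i", "am", "a", "milk", "man", "and", "years", "old", "25"}


def tokenize(text):
    """Split text into lowercase word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def errors_per_100_words(text, known_words=KNOWN_WORDS):
    """Return the number of flagged tokens per 100 words of text."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    flagged = [t for t in tokens if t not in known_words]
    return 100.0 * len(flagged) / len(tokens)


sample = "I am 25 yers old and I am a milk man"
rate = errors_per_100_words(sample)
print(f"{rate:.1f} flagged tokens per 100 words")
```

Dividing by essay length keeps long and short essays comparable, which matters because essay length itself varies with social origin.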

Lexical diversity is a measure of the number of different words in a text. Of the various formulas available, this study uses the Measure of Textual Lexical Diversity (MTLD) introduced by McCarthy and Jarvis (2010), as it does not correlate with text length. The MTLD assesses a text’s lexical richness and serves as a proxy to measure productive vocabulary. The R package ‘koRpus’ (Michalke, 2019) is used to calculate this measure. Since some essays contain many errors, automatically corrected essays are used to calculate the lexical richness. In addition, a lemmatisation is conducted using the Stanford CoreNLP software (Manning et al, 2014). A lemma is the dictionary form of a word. This means, for example, that conjugated verbs or plural nouns are reduced to the basic form. Lemmatisation and correction are performed to ensure that the different written forms of a lemma are counted only once as a unique word.
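To make the MTLD idea concrete, the following is a minimal sketch, not the koRpus implementation used in the study. It assumes the commonly cited factor threshold of 0.72, treats "type-token ratio at or below the threshold" as the point where a factor is closed (implementations may differ on this boundary), and expects a list of tokens that have already been corrected and lemmatised. The function names are illustrative.

```python
def _mtld_one_direction(tokens, threshold=0.72):
    """Count factors in one pass and return len(tokens) / factor count."""
    factors = 0.0
    seen = set()
    count = 0
    for token in tokens:
        count += 1
        seen.add(token)
        if len(seen) / count <= threshold:
            # Type-token ratio reached the threshold: close a full factor.
            factors += 1.0
            seen = set()
            count = 0
    if count > 0:
        # The leftover segment contributes a partial factor.
        ttr = len(seen) / count
        factors += (1.0 - ttr) / (1.0 - threshold)
    if factors == 0.0:
        return float("inf")  # no repetition at all, so no factor was completed
    return len(tokens) / factors


def mtld(tokens, threshold=0.72):
    """Average the forward and the backward pass over the token list."""
    forward = _mtld_one_direction(tokens, threshold)
    backward = _mtld_one_direction(tokens[::-1], threshold)
    return (forward + backward) / 2.0


lemmas = "of the people by the people for the people".split()
print(mtld(lemmas))
```

Because the score is the average number of words needed for the type-token ratio to fall to the threshold, it does not grow mechanically with text length, which is the property the study relies on.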

To obtain one metric for written language skills, the metrics for errors and lexical diversity are combined using principal component analysis. The component with an eigenvalue higher than 1 is used as the variable for written language skills in the regression models.
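For two standardised variables, principal component analysis has a closed form, which the sketch below uses to show why exactly one component has an eigenvalue above 1. The data are hypothetical, the function names are illustrative, and the orientation of the score (higher means more diverse vocabulary and fewer errors) is an assumption made here for readability.

```python
import math
import statistics


def zscores(values):
    """Standardise values to mean 0 and (population) standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def first_principal_component(error_rate, diversity):
    """Return the eigenvalues and first-component scores for two variables."""
    z_err = zscores(error_rate)
    z_div = zscores(diversity)
    n = len(z_err)
    r = sum(e * d for e, d in zip(z_err, z_div)) / n  # Pearson correlation
    # For a 2x2 correlation matrix the eigenvalues are 1 + |r| and 1 - |r|.
    eigenvalues = (1.0 + abs(r), 1.0 - abs(r))
    # Orient the component so that higher lexical diversity raises the score.
    sign = 1.0 if r >= 0 else -1.0
    scores = [(d + sign * e) / math.sqrt(2.0) for e, d in zip(z_err, z_div)]
    return eigenvalues, scores


# Hypothetical values for five essays, not taken from the NCDS data.
error_rate = [2.0, 4.0, 6.0, 8.0, 10.0]    # errors per 100 words
diversity = [60.0, 55.0, 50.0, 40.0, 45.0]  # MTLD scores
eigenvalues, scores = first_principal_component(error_rate, diversity)
print(eigenvalues)
print(scores)
```

With only two inputs the eigenvalues always sum to 2, so as long as the two metrics are correlated at all, one component exceeds 1 and the other falls below it.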

Highest educational qualification is measured with an ordered categorical variable constructed by Bukodi (2017). Information about academic and vocational qualifications up to the cohort members’ age of 46–47 (wave 7) is included, so as to consider the highest qualification achieved by study participants who were not interviewed in young adulthood. Overall, the data set includes information on educational qualifications from wave 4 to wave 7. This analysis differentiates between three categories: no formal qualification or sub-secondary education, secondary and tertiary education.

The analysis considers two dimensions of social origin, parental education and social class of the father. The original data set only provides parental education in years of schooling, measured with a categorical variable. In line with Connelly and Gayle (2019), information about the father’s and mother’s education after compulsory schooling is mapped to the highest value of those variables. The parental education variable differentiates between four categories (compulsory schooling only, compulsory schooling + 1 to 3 years, compulsory schooling + 4 to 5 years, compulsory schooling + 6 or more years).

Social class is measured via the father’s occupational status at the time of the age 11 survey. A study by Gregg (2012) coded the job titles mentioned. These codes were mapped onto the Registrar General’s Social Classes (RGSC) codes from 1990 and have been stored (Cohort and Longitudinal Studies Enhancement Resources, 2018). Social class derived from job title is more reliable than the information about social class from the original questionnaire, so the coded job titles are preferred. Where this coding method yields no valid code, the father’s social class from the original questionnaire at age 11 is used, if available. Social class is only included in the analysis via the father’s occupational status since the measurement of maternal status is only available to a limited extent.

Standardised test scores at age 11 are used in the mediation analysis. A general ability test included a verbal and a non-verbal subtask. For the non-verbal task, the cohort member was asked to identify the correct shape or symbol for a missing item in a set of shapes or symbols from one of five alternatives (Shepherd, 2012). For the verbal test, respondents were asked to complete a list of words with a word from a list of alternatives that matched logically, semantically or phonologically (Shepherd, 2012). A reading assessment and a mathematical test were also administered. The four scales have been combined into two variables, one for non-verbal test scores and one for verbal test scores, using principal component analysis.

In addition, the models include gender, migration status of the cohort member (born in GB vs outside of GB), the language usually spoken at home (English vs other) and region as control variables.

Analytical plan

The first step of the analysis is to demonstrate the effect of social origin on language skills in written output at the age of 11. The Stata add-ons ttesttable (Chávez Juárez, 2012) and coefplot (Jann, 2014) were used to depict group differences for the categories of the social origin variables. Second, the effect of receptive language skills on educational attainment is presented. Third, the mediation effect of expressive language skills on the relationship between social background and educational attainment is analysed. The latter is done with the KHB method (Karlson et al, 2012). This approach enables the coefficients of nested logistic regression models to be compared by rescaling log odds; in addition, mediation analysis can be performed (Kohler et al, 2011).
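The logic of the mediation percentages can be sketched in a few lines. The sketch assumes that the coefficients have already been put on a common scale (which is what the KHB rescaling is for), so that the total effect equals the direct effect plus the indirect effects running through the mediators. The function names and all numbers are hypothetical and are not taken from the study.

```python
def percent_mediated(total_effect, direct_effect):
    """Share of the total effect (in %) that runs through the mediators."""
    return 100.0 * (total_effect - direct_effect) / total_effect


def mediator_shares(direct_effect, indirect_effects):
    """Split the mediated percentage across the individual mediators.

    indirect_effects maps a mediator name to its indirect contribution,
    expressed on the same rescaled log-odds metric as the direct effect.
    """
    total = direct_effect + sum(indirect_effects.values())
    return {name: 100.0 * effect / total
            for name, effect in indirect_effects.items()}


# Hypothetical, already rescaled coefficients.
direct = 0.5
indirect = {"verbal test": 0.4, "essay skills": 0.1}
total = direct + sum(indirect.values())

print(percent_mediated(total, direct))
print(mediator_shares(direct, indirect))
```

Without the rescaling step, the coefficients of nested logistic models are not directly comparable, so a naive difference between them would mix mediation with a change of scale.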

The main analysis only considers complete cases, due to the KHB method’s limited compatibility for handling imputed values. As stated in the description of the data repository, probably 13,732 of the 15,337 originally surveyed cohort members wrote an essay at age 11 (Centre for Longitudinal Studies, 2018) and the data set includes 10,511 transcribed essays. Many essays are not available because either the microfiche on which they were stored is missing (1,535) or they were never stored on microfiche (1,477) in the first place. Of the cases where an essay is available, missing data includes: the highest education qualification achieved (12%), father’s social class (7%) and parental education (5%). After removing incomplete cases, the final sample contains 8,019 cohort members. In general, the response rate for the longitudinal study declined in adulthood: while 91% responded at the age of 11, the response rate drops to 71% at age 33 (Hawkes and Plewis, 2006). Therefore, missing values are imputed for a robustness check.

Results

Table 1 tabulates the combined metric for written language skills by the social origin variables. Cohort members whose parents achieved higher levels of education score higher on that metric. The differences between the categories are significant (Table A4). Similar results are found for social class (Table A3). The differences in the combined measure of expressive language skills result from the social differences in the number of errors and lexical diversity. Table A1 in the appendix gives an overview of the language skill variables derived from the essays. Children with a father grouped in the lowest social class category incorrectly spelt 8% of the words in their essay, on average. In contrast, children with a father in the professional field averaged five errors per 100 words. The lowest average error rate is observed among students with at least one parent who has six or more years of additional education beyond compulsory schooling. An opposite pattern is found for the productive vocabulary size metric MTLD. The highest social class group scored the highest on this measure. In other words, they used more unique lemmas in their essays on average than those from the other social class backgrounds. Cohort members from the higher social strata also wrote longer essays on average.

Table 1: Descriptive statistics concerning the relationship between social origin, educational qualification achieved (up to age 46/47), gender and written language skills at age 11

                                        Language skills derived
                                        from essays (z-std)
                                        Mean      Std. Dev.   N        %
Father’s social class (age 11)
 Unskilled occupations                  −0.35     1.05        621      7.74
 Partly-skilled occupations             −0.14     0.99        1,140    14.22
 Skilled occupations (manual)           −0.11     1.01        3,206    39.98
 Skilled occupations (non-manual)       0.24      0.96        825      10.29
 Managerial and technical occupations   0.27      0.89        1,556    19.40
 Professional occupations and similar   0.45      0.89        355      4.43
 No male head                           −0.16     0.97        316      3.94
Parents’ education (birth)
 Compulsory                             −0.14     1.00        5,727    71.42
 Compulsory + 1–3                       0.28      0.91        1,831    22.83
 Compulsory + 4–5                       0.47      0.85        133      1.66
 Compulsory + 6+                        0.66      0.84        328      4.09
Achieved qualification
 No qualification or sub-secondary      −0.51     1.09        2,251    28.07
 Secondary                              0.05      0.89        3,446    42.97
 Tertiary                               0.42      0.83        2,322    28.96
Gender
 Male                                   −0.14     1.08        4,056    50.58
 Female                                 0.14      0.89        3,963    49.42
Total                                   0.00      1.00        8,019    100

A linear regression model of written language skills can be found in the appendix (Table A5). Figure 1 displays the coefficients for the social origin variables, gender, migration status and language spoken at home. The model reveals significant differences for social origin: the coefficients for both social origin variables are significant when controlling for the other. For example, boys whose parents have a low level of education and a father in an unskilled occupation scored worse than average on this measure. Having at least one highly educated parent or having a father with an occupation in the professional field increases the score. The model also shows that girls write stronger essays than boys. Cohort members who immigrated to GB have, on average, a lower score. However, no significant effect is found for language spoken at home when controlling for the other variables.

Figure 1: Linear regression model on language skills derived from essays

[Dot plot of the values listed in the appendix, Table A5.]

Citation: Longitudinal and Life Course Studies 13, 2; 10.1332/175795921X16244506861047

Note: Full model in the appendix, Table A5.

An ordinal logistic regression model is used to examine the influence of social background on the highest academic qualification achieved. The log odds are reported for the dependent variable indicating the highest academic and vocational qualification achieved (Table 2). Model 1 contains the control variables gender, migration status, language spoken at home and region, and the independent variables for social class and parental education. Unsurprisingly, respondents from advantaged social strata are more likely to obtain a higher qualification compared to the reference categories (unskilled occupation and compulsory schooling only).

Table 2: Ordinal logistic regression for the dependent variable educational attainment

Log odds, with standard errors (SE) in parentheses
                                        Model 1             Model 2             Model 3
Gender
 Female                                 −0.078 (0.044)      −0.266*** (0.045)   −0.205*** (0.045)
Parents’ education (birth)
 Compulsory                             ref.                ref.                ref.
 Compulsory + 1–3                       0.914*** (0.057)    0.692*** (0.057)    0.452*** (0.057)
 Compulsory + 4–5                       1.310*** (0.183)    1.008*** (0.184)    0.717*** (0.184)
 Compulsory + 6+                        1.735*** (0.135)    1.327*** (0.135)    0.948*** (0.135)
Father’s social class (age 11)
 Unskilled occupations                  ref.                ref.                ref.
 Partly-skilled occupations             0.285** (0.099)     0.155 (0.099)       0.081 (0.099)
 Skilled occupations (manual)           0.492*** (0.088)    0.337*** (0.088)    0.225* (0.088)
 Skilled occupations (non-manual)       1.142*** (0.108)    0.816*** (0.108)    0.549*** (0.108)
 Managerial and technical occupations   1.185*** (0.099)    0.871*** (0.099)    0.564*** (0.099)
 Professional occupations and similar   1.609*** (0.147)    1.271*** (0.147)    0.915*** (0.147)
 No male head                           0.223 (0.138)       0.113 (0.138)       0.031 (0.138)
Migration
 CM born outside of GB                  0.070 (0.185)       0.195 (0.185)       0.384* (0.185)
English usually spoken at home
 Not spoken                             −0.068 (0.136)      0.019 (0.136)       0.072 (0.136)
Essay: language skills (z-std)                              0.680*** (0.025)    0.175*** (0.030)
Test: verbal skills (z-std)                                                     0.870*** (0.031)
Controlled for region                   yes                 yes                 yes
Cutpoints
 cut1                                   −0.148 (0.098)      −0.523*** (0.099)   −0.675*** (0.099)
 cut2                                   2.255*** (0.102)    1.880*** (0.101)    1.728*** (0.101)
McFadden’s adj. R²                      0.05                0.10                0.14
Cragg and Uhler’s R²                    0.13                0.22                0.30
McKelvey and Zavoina’s R²               0.12                0.22                0.30
BIC                                     16,554.16           15,811.96           15,014.44
Observations                            8,019               8,019               8,019

Notes: KHB corrected log odds; * p < .05, ** p < .01, *** p < .001.
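To show how the log odds and cutpoints of an ordinal logistic model translate into probabilities for the three qualification levels, the sketch below uses a common proportional-odds parameterisation, P(Y ≤ j) = logistic(cut_j − linear predictor). The two cutpoints and the coefficient 1.735 are taken from model 1 in Table 2; treating every other covariate, including the region controls, as contributing zero is an assumption made purely for illustration, so the resulting probabilities are not estimates from the study.

```python
import math


def logistic(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def category_probabilities(linear_predictor, cutpoints):
    """Probabilities of each ordered category in a proportional-odds model."""
    cumulative = [logistic(c - linear_predictor) for c in cutpoints] + [1.0]
    probabilities = []
    previous = 0.0
    for value in cumulative:
        probabilities.append(value - previous)
        previous = value
    return probabilities


CUTPOINTS = (-0.148, 2.255)  # cut1 and cut2 of model 1, Table 2

# Illustrative comparison: reference categories versus 'Compulsory + 6+'
# parental education, with all other covariates assumed to contribute zero.
reference = category_probabilities(0.0, CUTPOINTS)
advantaged = category_probabilities(1.735, CUTPOINTS)

labels = ("none or sub-secondary", "secondary", "tertiary")
for label, p_ref, p_adv in zip(labels, reference, advantaged):
    print(f"{label}: {p_ref:.3f} vs {p_adv:.3f}")
```

A positive coefficient shifts probability mass from the lowest towards the highest category, which is how the positive log odds for the advantaged groups in Table 2 should be read.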

Model 2 includes the language skills metric derived from the essays. Written language skills are positively associated with the qualification level achieved later in life. The log odds for the social origin variables decrease once the language skills variable is included in the model. However, most coefficients remain significant, which indicates that the mediation is only partial. To illustrate the mediation effect, Table 3 shows mediation percentages. The values for model 2 indicate that around one quarter of the effect of social origin on educational attainment is due to expressive language skills. For example, the log odds from model 1 for having at least one highly educated parent are reduced by 24.48% if the model includes the essay measure for language skills. The simple averages of the mediating percentages are 24.62 for parental education and 31.55 for father’s social class.
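The simple averages quoted above can be recomputed from the model 2 percentages reported in Table 3. The short check below is only arithmetic on the published figures; the variable names are illustrative, and the category 'no male head' is left out of the social class average, as in the table.

```python
from statistics import fmean

# Model 2 mediation percentages as reported in Table 3.
model2_parental_education = [25.19, 24.18, 24.48]
model2_social_class = [46.31, 32.51, 29.65, 27.38, 21.92]  # without 'no male head'

average_education = round(fmean(model2_parental_education), 2)
average_social_class = round(fmean(model2_social_class), 2)

print(average_education, average_social_class)
```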

Table 3: Mediation analysis, percentage mediated by language skills

                                        Model 2       Model 3       Model 3          Model 3
                                        % Mediated    % Mediated    % Mediated       % Mediated
                                                                    language tests   essays
Parents’ education (birth)
 Compulsory                             ref.          ref.          ref.             ref.
 Compulsory + 1–3                       25.19         50.72**       44.45            6.27
 Compulsory + 4–5                       24.18**       45.32***      39.39            5.93
 Compulsory + 6+                        24.48***      45.55***      39.48            6.07
Average                                 24.62         47.20         41.11            6.09
Father’s social class (age 11)
 Unskilled occupations                  ref.          ref.          ref.             ref.
 Partly-skilled occupations             46.31         70.21         58.68            11.53
 Skilled occupations (manual)           32.51         54.15         46.04            8.11
 Skilled occupations (non-manual)       29.65**       52.22***      44.84            7.38
 Managerial and technical occupations   27.38**       52.55***      45.73            6.82
 Professional occupations and similar   21.92**       43.41***      37.98            5.43
 No male head                           46.8          87.76         74.82            12.95
Average (without no male head)          31.55         54.51         46.65            7.85

The scores from standardised verbal ability tests are included in model 3. In doing so, the log odds for the social origin measures decrease further (Table 2). However, the log odds for language skills variables derived from the essays also drop, while the average mediating percentages rise to 47.2 for parental education and 54.5 for social class (Table 3). Therefore, expressive language skills measured via essays in combination with standardised verbal cognitive skills reduce the total effect of social background on the educational outcome to around one half. To depict the influence of the two mediators, Table 3 shows the disentangled mediating percentage separately for the social origin categories. For example, for the social class category ‘skilled occupations (non-manual)’, the total effect is reduced by 52% if the model includes the mediators. Of the 52.2%, 44.8% is due to the standardised verbal test scores. The smaller proportion of 7.4% is due to the written language skills. Similar effects can be found for the other categories of the social origin variables.
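The disentangled percentages are additive: for each social origin category, the share mediated by the standardised tests plus the share mediated by the essays should equal the total share mediated in model 3. The sketch below checks this against the figures in Table 3. The tolerance of 0.015 is an assumption that allows for the rounding of three two-decimal numbers, and the names are illustrative.

```python
# (total % mediated in model 3, % via language tests, % via essays), from Table 3
TABLE3_MODEL3 = {
    "Compulsory + 1–3": (50.72, 44.45, 6.27),
    "Compulsory + 4–5": (45.32, 39.39, 5.93),
    "Compulsory + 6+": (45.55, 39.48, 6.07),
    "Partly-skilled occupations": (70.21, 58.68, 11.53),
    "Skilled occupations (manual)": (54.15, 46.04, 8.11),
    "Skilled occupations (non-manual)": (52.22, 44.84, 7.38),
    "Managerial and technical occupations": (52.55, 45.73, 6.82),
    "Professional occupations and similar": (43.41, 37.98, 5.43),
    "No male head": (87.76, 74.82, 12.95),
}

ROUNDING_TOLERANCE = 0.015  # assumed allowance for two-decimal rounding


def discrepancies(rows):
    """Absolute gap between the total and the sum of its two parts, per row."""
    return {label: abs(total - (tests + essays))
            for label, (total, tests, essays) in rows.items()}


gaps = discrepancies(TABLE3_MODEL3)
for label, gap in gaps.items():
    status = "ok" if gap <= ROUNDING_TOLERANCE else "check"
    print(f"{label}: gap {gap:.2f} ({status})")
```

The additivity is what allows the text to attribute, for example, 44.8 of the 52.2 percentage points for 'skilled occupations (non-manual)' to the verbal test scores and the remaining 7.4 to the essays.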

Robustness checks

Two robustness checks are performed. First, the change in the mediation percentage of language skills is analysed when test measures for non-verbal skills are taken into account. Second, the analysis is repeated with imputed values.

In addition to the variables in model 3, model 4 includes a score for two non-verbal tests (mathematical and non-verbal cognitive abilities) to test whether the estimated mediation percentage of language skills changes when non-verbal skills are taken into account. The coefficient for expressive language skills declines slightly but remains significant (Table A7). A larger decrease can be observed for the coefficient for standardised verbal skills. Figure 2 shows the disentangled mediation effects. Taken together, the language skill variables reduce the effect of social origin on educational attainment by approximately one quarter. A slightly higher reduction is due to non-verbal cognitive skills.

The graph shows, on a horizontal axis scaled from 0 to 100 in steps of 10, the percentage mediated by each of the three disentangled mediators (all z-standardised) for each social origin category:

| Category | Essay language skills | Test verbal skills | Test non-verbal |
|---|---|---|---|
| Comp +1-3 | 5.57 | 20.55 | 29.57 |
| Comp +4-5 | 5.27 | 18.25 | 26.85 |
| Comp +6+ | 5.39 | 18.24 | 24.1 |
| Partly-skilled occupations | 10.45 | 27.71 | 44.87 |
| Skilled occupations (manual) | 7.23 | 21.39 | 35.48 |
| Skilled occupations (non-manual) | 6.54 | 20.07 | 30.93 |
| Managerial and technical occ | 6.06 | 21.17 | 32.77 |
| Professional etc. occupations | 4.8 | 17.51 | 27.06 |
| No male head | 11.28 | 33.96 | 46.17 |

Figure 2:

Disentangled mediators for the effect of social origin on educational attainment

Citation: Longitudinal and Life Course Studies 13, 2; 10.1332/175795921X16244506861047

Note: Full model in the appendix, Table A7.
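
To make the comparison in Figure 2 concrete, the sketch below adds the two language measures (essay and verbal test) for each category and sets the sum against the share attributed to non-verbal skills. The values are those reported for Figure 2; the dictionary layout and the function name `combine` are assumptions made for illustration.

```python
# Percentages reported for Figure 2: (essay, verbal test, non-verbal test).
figure2 = {
    "Compulsory + 1-3": (5.57, 20.55, 29.57),
    "Compulsory + 4-5": (5.27, 18.25, 26.85),
    "Compulsory + 6+": (5.39, 18.24, 24.1),
    "Partly-skilled occupations": (10.45, 27.71, 44.87),
    "Skilled occupations (manual)": (7.23, 21.39, 35.48),
    "Skilled occupations (non-manual)": (6.54, 20.07, 30.93),
    "Managerial and technical occupations": (6.06, 21.17, 32.77),
    "Professional etc. occupations": (4.8, 17.51, 27.06),
    "No male head": (11.28, 33.96, 46.17),
}


def combine(essay, verbal, nonverbal):
    """Sum the two language measures and the overall mediated percentage."""
    language = essay + verbal
    return {
        "language": round(language, 2),
        "non_verbal": nonverbal,
        "total": round(language + nonverbal, 2),
    }


summary = {category: combine(*values) for category, values in figure2.items()}

for category, parts in summary.items():
    print(category, parts)
```

For the parental education categories the combined language share lies between about 23.5 and 26.1, which is the basis for the "approximately one quarter" reading; in every category the non-verbal share is larger than the combined language share.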

Since the KHB Stata add-on does not fully support the use of imputed values, only models 2 and 3 are estimated with imputed data. Imputation is performed through chained equations. The dependent variable, all independent variables, predictors of educational attainment at age 50 and non-response are included in the imputation command to generate 60 complete data sets (Mostafa et al, 2021). The target population excludes those who migrated or died before turning 33 (N = 16,062). With imputed data, the mediation percentages remain similar to those of the complete case analysis (Table 4). However, effects are more often significant, probably due to the larger sample size.
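
The text does not state how estimates from the 60 imputed data sets were combined. A conventional approach is pooling with Rubin's rules, sketched below purely as an assumption about what such a step could look like. The function name and the example numbers are hypothetical and are not taken from the study.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed data sets with Rubin's rules.

    estimates: point estimates of the same coefficient, one per data set
    variances: squared standard errors of that coefficient, one per data set
    Returns the pooled estimate and its pooled standard error.
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)        # average sampling variance
    between = variance(estimates)   # sample variance of the estimates across data sets
    total = within + (1 + 1 / m) * between
    return pooled, total ** 0.5


# Hypothetical example with three imputed data sets instead of 60.
estimate, std_error = pool_rubin([0.5, 0.6, 0.7], [0.01, 0.01, 0.01])
print(round(estimate, 3), round(std_error, 3))
```

The pooled standard error grows with the disagreement between imputed data sets, so it reflects uncertainty due to missing data in addition to ordinary sampling error.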

Table 4:

Mediation analysis on imputed data, percentage mediated by language skills

| | Model I2: % mediated | Model I3: % mediated |
|---|---|---|
| Parents’ education (birth) | | |
| Compulsory | ref. | ref. |
| Compulsory + 1–3 | 26.83** | 54.54*** |
| Compulsory + 4–5 | 23.92** | 50.04*** |
| Compulsory + 6+ | 25.27*** | 48.60*** |
| Average | 25.34 | 51.06 |
| Father’s social class (age 11) | | |
| Unskilled occupations | ref. | ref. |
| Partly-skilled occupations | 39.94 | 63.14* |
| Skilled occupations (manual) | 32.24*** | 54.13** |
| Skilled occupations (non-manual) | 29.57*** | 55.12*** |
| Managerial and technical occupations | 28.13*** | 56.17*** |
| Professional occupations and similar | 24.24*** | 46.46*** |
| No male head | 61.25 | 94.41 |
| Average (without no male head) | 30.82 | 55.01 |

To conclude, the analysis indicates that expressive language skills partly mediate the effect of social origin. The percentage varies depending on the comparison. Roughly speaking, expressive language skills mediate about one quarter of the effect when standardised verbal and non-verbal cognitive skills are not controlled for. When standardised cognitive skills (verbal and non-verbal) are controlled for, language skills derived from essays at age 11 still have a significant effect on the highest educational qualification achieved later in life. The two language measures together, that is, the standardised verbal test and the essay measure, mediate around one quarter of the total effect of social origin on educational levels (Figure 2).

Conclusion

This study reveals significant differences in expressive language skills by social origin at age 11. Language skills were measured with an approach that differs from most research on social stratification: the analysis of a unique data set of essays written by children in a British longitudinal study. Examining spelling and grammar errors as well as lexical diversity has shown that the use of written language in childhood differs by social origin. Schoolchildren from lower social strata misspelt more words and used a smaller variety of words in their essays than those who were socially more privileged. The data show a relationship between the measures of language abilities derived from written texts and the educational qualifications obtained later. The initial hypothesis argued that this relationship could mediate the effect of social origin on educational attainment. Results from the mediation analysis show that written language skills partially mediate (by approximately one quarter) the effect of social background on the highest educational qualification achieved. Including skills from standardised tests in the model raises the mediation percentage for verbal and non-verbal skills together with the language skills derived from the essays to approximately 50%. Taken together, the largest share of the mediation results from the variables measuring non-verbal skills. As other studies have shown, cognitive abilities in childhood partly mediate the effect of social origin on educational outcomes (Bukodi et al, 2014; Erikson, 2016; Bourne et al, 2018; Betthäuser et al, 2020). This study demonstrates that expressive language skills derived from essays at age 11 have a lower impact on this association than verbal and non-verbal cognitive skills measured with a standardised test. However, language skills are related to non-verbal cognitive skills (Deák, 2014), as can also be observed in the NCDS data (Table A6). Despite their subordinate role in mediating the effect of social origin, written language skills are important for educational success.

In contrast to Durham et al (2007), who could explain socially unequal performance in primary school by social differences before school entry, this study reveals that expressive language skills mediate part of the social origin effects when considering educational attainment later in life. Approximately half of the social origin effect is explained by including both verbal and non-verbal abilities in the model, leaving half unexplained. In addition to the primary effects of social origin, there are also secondary and institutional effects, which this study does not investigate. Such effects could be more important for the most disadvantaged students, as the analysis reveals that language skills explain a larger part of the educational reproduction of socially worse-off cohort members.

The most serious limitation of this study is related to the survey conditions at age 11. The instructions asked participants to write the essay in 30 minutes. However, reports indicate that some participants did not adhere to the time frame and may even have finished the essay at home. This is why the analysis disregards the length of the texts. Furthermore, if the children only wrote the essay in school, the test conditions may have varied in terms of the time of day or help from the teachers. Richardson et al (1976) briefly discuss the differing test conditions.

As similar mediation percentages of cognitive abilities are found for the association between social origin and educational attainment in the 1990s British cohort (Bourne et al, 2018), I assume that the mediating role of language skills presented here can also be found in cohorts more recent than that of 1958, although the mediation percentages may differ.

In summary, these findings suggest that language skills are one mechanism, among others, that explains socially unequal educational attainment. Attempts to promote language skills should begin in early childhood, since social differences in language skills occur at an early age (Becker, 2011). This may reduce the role of language skills in mediating the effect of social origin on educational attainment.

Acknowledgement

I would like to thank the participants of the ISA RC28 Spring Meeting 2019 for their helpful comments and suggestions.

Data availability statement

The analysis is based on data available from the UK Data Archive. Data are available at http://doi.org/10.5255/UKDA-SN-8313-1 with the permission of the UK Data Archive.

Conflict of interest

The author declares that there is no conflict of interest.

References

  • Becker, B. (2011) Social disparities in children’s vocabulary in early childhood: does pre-school education help to close the gap?, The British Journal of Sociology, 62(1): 69–88. doi: 10.1111/j.1468-4446.2010.01345.x

  • Bernstein, B. (1961) Social structure, language and learning, Educational Research, 3(3): 163–76. doi: 10.1080/0013188610030301

  • Bernstein, B. (1971) Class, Codes and Control. Volume I: Theoretical Studies Towards a Sociology of Language, London: Routledge & Paul.

  • Bernstein, B. (1975) Class, Codes and Control, Volume III: Towards a Theory of Educational Transmissions, London: Routledge & Paul.

  • Betthäuser, B.A., Bourne, M. and Bukodi, E. (2020) Understanding the mobility chances of children from working-class backgrounds in Britain: how important are cognitive ability and locus of control?, The British Journal of Sociology, 71(2): 349–65. doi: 10.1111/1468-4446.12732

  • Blau, P.M. and Duncan, O.D. (1967) The American Occupational Structure, New York: Wiley. https://hds.hebis.de/ubffm/Record/HEB034667288.

  • Boudon, R. (1974) Education, Opportunity, and Social Inequality: Changing Prospects in Western Society, New York: Wiley.

  • Bourdieu, P. (1977) The economics of linguistic exchanges, Social Science Information, 16(6): 645–68. doi: 10.1177/053901847701600601

  • Bourdieu, P., Passeron, J.C. and de Saint Martin, M. (1994) Academic Discourse, R. Teese (trans) Cambridge: Polity.

  • Bourne, M., Bukodi, E., Betthäuser, B. and Goldthorpe, J.H. (2018) ‘Persistence of the social’: the role of cognitive ability in mediating the effects of social origins on educational attainment in Britain, Research in Social Stratification and Mobility, 58: 11–21. doi: 10.1016/j.rssm.2018.09.001

  • Breen, R. and Karlson, K.B. (2014) Education and social mobility: new analytical approaches, European Sociological Review, 30(1): 107–18. doi: 10.1093/esr/jct025

  • Bukodi, E. (2017) National Child Development Study and 1970 British Cohort Study Educational Qualifications Histories, 1981–2009 [data collection], London: University of London, Institute of Education, Centre for Longitudinal Studies, http://doi.org/10.5255/UKDA-SN-8127-1.

  • Bukodi, E. and Goldthorpe, J.H. (2013) Decomposing ‘social origins’: the effects of parents’ class, status, and education on the educational attainment of their children, European Sociological Review, 29(5): 1024–39. doi: 10.1093/esr/jcs079

  • Bukodi, E., Erikson, R. and Goldthorpe, J.H. (2014) The effects of social origins and cognitive ability on educational attainment: evidence from Britain and Sweden, Acta Sociologica, 57(4): 293–310. doi: 10.1177/0001699314543803

  • Centre for Longitudinal Studies (2018) National Child Development Study: ‘Imagine you are 25’ Essays (Sweep 2, Age 11), 1969 [data collection], UK Data Service SN 8313, London: University of London, Institute of Education, Centre for Longitudinal Studies, http://doi.org/10.5255/UKDA-SN-8313-1.

  • Chávez Juárez, F. (2012) TTESTTABLE: Stata Module to Compute Differences in Means by Groups Including the T-test, Statistical Software Components S457401, Boston, MA: Boston College Department of Economics, revised 15 Feb 2015, https://econpapers.repec.org/software/bocbocode/s457401.htm.

  • Claessens, A., Duncan, G. and Engel, M. (2009) Kindergarten skills and fifth-grade achievement: evidence from the ECLS-K, Economics of Education Review, 28(4): 415–27. doi: 10.1016/j.econedurev.2008.09.003

  • Cohort and Longitudinal Studies Enhancement Resources (2018) CLOSER: National Child Development Study Cross-cohort harmonised data [data collection], UK Data Service, SN: 8342, http://doi.org/10.5255/UKDA-SN-8342-1.

  • Connelly, R. and Gayle, V. (2019) An investigation of social class inequalities in general cognitive ability in two British birth cohorts, The British Journal of Sociology, 70(1): 90–108. doi: 10.1111/1468-4446.12343

  • Dämmrich, J. and Triventi, M. (2018) The dynamics of social inequalities in cognitive-related competencies along the early life course – a comparative study, International Journal of Educational Research, 88: 73–84. doi: 10.1016/j.ijer.2018.01.006

  • Deák, G.O. (2014) Interrelations of language and cognitive development, in P. Brooks and V. Kempe (eds) Encyclopedia of Language Development, Los Angeles: Sage, pp 284–91.

  • Durham, R.E., Farkas, G., Hammer, C.S., Tomblin, J.B. and Catts, H.W. (2007) Kindergarten oral language skill: a key variable in the intergenerational transmission of socioeconomic status, Research in Social Stratification and Mobility, 25(4): 294–305. doi: 10.1016/j.rssm.2007.03.001

  • Erikson, R. (2016) Is it enough to be bright? Parental background, cognitive ability and educational attainment, European Societies, 18(2): 117–35. doi: 10.1080/14616696.2016.1141306

  • Ermisch, J. (2008) Origins of social immobility and inequality: parenting and early child development, National Institute Economic Review, 205(1): 62–71. doi: 10.1177/0027950108096589

  • Fernald, A., Marchman, V.A. and Weisleder, A. (2013) SES differences in language processing skill and vocabulary are evident at 18 months, Developmental Science, 16(2): 234–48. doi: 10.1111/desc.12019

  • Gilkerson, J., Richards, J.A., Warren, S.F., Oller, D.K., Russo, R. and Vohr, B. (2018) Language experience in the second year of life and language outcomes in late childhood, Pediatrics, 142(4): 1–11. doi: 10.1542/peds.2017-4276

  • Gregg, P. (2012) Occupational Coding for the National Child Development Study (1969, 1991–2008) and the 1970 British Cohort Study (1980, 2000–2008) [data collection], UK Data Service SN 7023, London: University of London, Institute of Education, Centre for Longitudinal Studies, http://doi.org/10.5255/UKDA-SN-7023-1.

  • Grenfell, M. (2011) Theory, in M. Grenfell (ed) Bourdieu, Language and Linguistics, London: Continuum, pp 37–63.

  • Hart, B. and Risley, T.R. (1995) Meaningful Differences in the Everyday Experience of Young American Children, Baltimore, MD: Paul H. Brookes Publishing.

  • Hawkes, D. and Plewis, I. (2006) Modelling Non-response in the National Child Development Study, Journal of the Royal Statistical Society: Series A (Statistics in Society), 169(3): 479–91. doi: 10.1111/j.1467-985X.2006.00401.x

  • Hoff, E. (2003) The specificity of environmental influence: socioeconomic status affects early vocabulary development via maternal speech, Child Development, 74(5): 1368–78. doi: 10.1111/1467-8624.00612

  • Jackson, M., Erikson, R., Goldthorpe, J.H. and Yaish, M. (2007) Primary and secondary effects in class differentials in educational attainment: the transition to A-level courses in England and Wales, Acta Sociologica, 50(3): 211–29. doi: 10.1177/0001699307080926

  • Jann, B. (2014) Plotting regression coefficients and other estimates, Stata Journal, 14(4): 708–37. doi: 10.1177/1536867X1401400402

  • Karlson, K.B. and Holm, A. (2011) Decomposing primary and secondary effects: a new decomposition method, Research in Social Stratification and Mobility, 29(2): 221–37. doi: 10.1016/j.rssm.2010.12.005

  • Karlson, K.B., Holm, A. and Breen, R. (2012) Comparing regression coefficients between same-sample nested models using logit and probit, Sociological Methodology, 42(1): 286–313. doi: 10.1177/0081175012444861

  • Kohler, U., Karlson, K.B. and Holm, A. (2011) Comparing coefficients of nested nonlinear probability models, Stata Journal, 11(3): 420–38. doi: 10.1177/1536867X1101100306

  • Korat, O. and Levin, I. (2002) Spelling acquisition in two social groups: Mother–child interaction, maternal beliefs and child’s spelling, Journal of Literacy Research, 34(2): 209–36. doi: 10.1207/s15548430jlr3402_5

  • Labov, W. (1970) The logic of nonstandard English, in F. Williams (ed) Language and Poverty: Perspectives on a Theme, London: Academic Press, pp 153–89.

  • LanguageTooler GmbH (2021) LanguageTool: Java API. https://languagetool.org/dev.

  • Lawton, D. (1963) Social class differences in language development: a study of some samples of written work, Language and Speech, 6(3): 120–43. doi: 10.1177/002383096300600302

  • Machin, S. and Vignoles, A. (2004) Educational inequality: the widening socio-economic gap, Fiscal Studies, 25(2): 107–28. doi: 10.1111/j.1475-5890.2004.tb00099.x

  • Manning, C., Surdeanu, M., Bauer, J., Finkel, J., Bethard, S. and McClosky, D. (2014) The Stanford CoreNLP natural language processing toolkit, in K. Bontcheva and J. Zhu (eds) Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, Stroudsburg, PA: Association for Computational Linguistics, pp 55–60.

  • McAvinue, L.P. (2018) Oral language and socioeconomic status: the Irish context, Irish Educational Studies, 37(4): 475–503. doi: 10.1080/03323315.2018.1521732

  • McCarthy, P.M. and Jarvis, S. (2010) MTLD, vocd-D, and HD-D: a validation study of sophisticated approaches to lexical diversity assessment, Behavior Research Methods, 42(2): 381–92. doi: 10.3758/BRM.42.2.381

  • McNally, S., McCrory, C., Quigley, J. and Murray, A. (2019) Decomposing the social gradient in children’s vocabulary skills at 3 years of age: a mediation analysis using data from a large representative cohort study, Infant Behavior and Development, 57: 1–13. doi: 10.1016/j.infbeh.2019.04.008

  • Michalke, M. (2019) koRpus: An R Package for Text Analysis. https://cran.r-project.org/web/packages/koRpus/index.html.

  • Mostafa, T., Narayanan, M., Pongiglione, B., Dodgeon, B., Goodman, A., Silverwood, R.J. and Ploubidis, G.B. (2021) Missing at random assumption made more plausible: evidence from the 1958 British birth cohort, Journal of Clinical Epidemiology, 136: 44–54. doi: 10.1016/j.jclinepi.2021.02.019

  • Parsons, S., Schoon, I., Rush, R. and Law, J. (2011) Long-term outcomes for children with early language problems: beating the odds, Children & Society, 25(3): 202–14.

  • Power, C. and Elliott, J. (2006) Cohort profile: 1958 British birth cohort (National Child Development Study), International Journal of Epidemiology, 35(1): 34–41. doi: 10.1093/ije/dyi183

  • Richardson, K., Calnan, M., Essen, J. and Lambert, L. (1976) The linguistic maturity of 11-year-olds: some analysis of the written compositions of children in the National Child Development Study, Journal of Child Language, 3(1): 99–115. doi: 10.1017/S0305000900001331

  • Sahu, S., Vishwakarma, Y.K., Kori, J. and Thakur, J.S. (2020) Evaluating performance of different grammar checking tools, International Journal of Advanced Trends in Computer Science and Engineering, 9(2): 2227–33. doi: 10.30534/ijatcse/2020/201922020

  • Sawyer, D.J. and Joyce, M.T. (2005) Research in spelling: implications for adult basic education, in J. Comings, B. Garner and C. Smith (eds) Review of Adult Learning and Literacy, vol 6, New York: Routledge, pp 71–112.

  • Schuth, E., Köhne, J. and Weinert, S. (2017) The influence of academic vocabulary knowledge on school performance, Learning and Instruction, 49: 157–65. doi: 10.1016/j.learninstruc.2017.01.005

  • Shepherd, P. (2012) Measures of Ability at Ages 7 to 16: 1958 National Child Development Study User Guide, London: University of London, Institute of Education, Centre for Longitudinal Studies.

  • Sirin, S.R. (2005) Socioeconomic status and academic achievement: a meta-analytic review of research, Review of Educational Research, 75(3): 417–53. doi: 10.3102/00346543075003417

  • Spencer, S., Clegg, J. and Stackhouse, J. (2012) Language and disadvantage: a comparison of the language abilities of adolescents from two different socioeconomic areas, International Journal of Language & Communication Disorders, 47(3): 274–84. https://doi.org/10.1111/j.1460-6984.2011.00104.x

  • Spencer, S., Clegg, J., Stackhouse, J. and Rush, R. (2017) Contribution of spoken language and socio-economic background to adolescents’ educational achievement at age 16 years, International Journal of Language & Communication Disorders, 52(2): 184–96. https://doi.org/10.1111/1460-6984.12264

  • Strand, S. (2006) Comparing the predictive validity of reasoning tests and national end of key stage 2 tests: which tests are the ‘best’?, British Educational Research Journal, 32(2): 209–25. doi: 10.1080/01411920600569073

  • Sullivan, A. and Brown, M. (2013) Social Inequalities in Cognitive Scores at Age 16: The Role of Reading, CLS Working Paper 2013/10, London: Centre for Longitudinal Studies, Institute of Education.

  • Sullivan, A., Moulton, V. and Fitzsimons, E. (2017) The Intergenerational Transmission of Vocabulary, CLS Working Paper 2017/14, London: Centre for Longitudinal Studies, Institute of Education.

  • Sullivan, A., Parsons, S., Green, F., Wiggins, R.D. and Ploubidis, G. (2018) The path from social origins to top jobs: social reproduction via education, The British Journal of Sociology, 69(3): 776–98. doi: 10.1111/1468-4446.12314

  • Walker, D., Greenwood, C., Hart, B. and Carta, J. (1994) Prediction of school outcomes based on early language production and socioeconomic factors, Child Development, 65(2): 606–21. doi: 10.2307/1131404

Appendix

Table A1:

Additional descriptive statistics concerning the relationship between social origin, educational qualification achieved (up to age 46–47), gender and written texts at age 11

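| | Grammar and spelling errors (% of the words written): Mean | Grammar and spelling errors: Std. Dev. | MTLD – lexical diversity: Mean | MTLD – lexical diversity: Std. Dev. | Essay length: Mean | Essay length: Std. Dev. |
|---|---|---|---|---|---|---|
| Father’s social class (age 11) | | | | | | |
| Unskilled occupations | 8.08 | 7.08 | 33.78 | 9.74 | 186.41 | 103.27 |
| Partly-skilled occupations | 7.31 | 6.23 | 36.24 | 10.18 | 196.56 | 102.48 |
| Skilled occupations (manual) | 7.05 | 6.42 | 36.27 | 10.64 | 195.56 | 108.41 |
| Skilled occupations (non-manual) | 5.87 | 5.23 | 40.01 | 10.88 | 215.30 | 108.78 |
| Managerial and technical occupations | 5.52 | 4.95 | 39.99 | 10.76 | 206.54 | 101.42 |
| Professional occupations and similar | 5.07 | 4.77 | 42.15 | 10.78 | 224.55 | 116.26 |
| No male head | 7.00 | 5.94 | 35.39 | 10.42 | 193.40 | 111.15 |
| Parents’ education (birth) | | | | | | |
| Compulsory | 7.18 | 6.37 | 35.98 | 10.36 | 194.04 | 105.13 |
| Compulsory + 1–3 | 5.55 | 4.98 | 40.22 | 10.84 | 213.61 | 111.35 |
| Compulsory + 4–5 | 5.14 | 4.79 | 42.66 | 10.50 | 218.80 | 98.46 |
| Compulsory + 6+ | 4.34 | 4.45 | 44.46 | 11.22 | 229.16 | 99.16 |
| Achieved qualification | | | | | | |
| No qualification or sub-secondary | 9.18 | 7.52 | 33.20 | 9.91 | 178.55 | 106.56 |
| Secondary | 6.18 | 5.37 | 37.39 | 10.28 | 201.56 | 104.86 |
| Tertiary | 4.93 | 4.36 | 41.50 | 10.75 | 219.71 | 105.77 |
| Gender | | | | | | |
| Male | 7.87 | 6.79 | 37.26 | 11.25 | 176.68 | 93.83 |
| Female | 5.42 | 4.87 | 37.56 | 10.27 | 224.59 | 113.49 |
| Total | 6.66 | 6.04 | 37.40 | 10.78 | 200.36 | 106.73 |

Table A2:

Further descriptive statistics regarding the sample

| | M / % | Std. Dev. | Min | Max |
|---|---|---|---|---|
| Essay language score (z-std) | 0 | 1 | −5.41 | 3.89 |
| Verbal test scores (z-std) | 0 | 1 | −2.80 | 2.51 |
| Non-verbal test scores (z-std) | 0 | 1 | −2.49 | 2.50 |
| Migration | | | | |
| CM born in GB | 98.43 | | | |
| CM born outside GB | 1.57 | | | |
| English spoken at home | | | | |
| English spoken | 97.18 | | | |
| English not spoken | 2.82 | | | |
| Region | | | | |
| North | 8.95 | | | |
| North West | 6.04 | | | |
| E & W Riding | 9.69 | | | |
| North Midlands | 5.79 | | | |
| Midlands | 9.48 | | | |
| East | 10.79 | | | |
| South East | 20.05 | | | |
| South | 7.06 | | | |
| South West | 6.26 | | | |
| Wales | 4.21 | | | |
| Scotland | 11.68 | | | |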

Note: Table 2 presents the descriptive statistics regarding the variables parental education, social class of the father and qualification attained by the cohort members.

Table A3:

Mean comparison test for language skills derived from essays for all possible combinations of father’s social class categories

| | Unskilled occupations | Partly-skilled occupations | Skilled occupations (manual) | Skilled occupations (non-manual) | Managerial and technical occupations | Professional occupations and similar | No male head |
|---|---|---|---|---|---|---|---|
| Unskilled occupations | 0 | | | | | | |
| Partly-skilled occupations | −0.213*** | 0 | | | | | |
| Skilled occupations (manual) | −0.246*** | −0.033 | 0 | | | | |
| Skilled occupations (non-manual) | −0.589*** | −0.376*** | −0.343*** | 0 | | | |
| Managerial and technical occupations | −0.628*** | −0.415*** | −0.382*** | −0.038 | 0 | | |
| Professional occupations and similar | −0.802*** | −0.589*** | −0.556*** | −0.213*** | −0.174*** | 0 | |
| No male head | −0.197*** | 0.017 | 0.049 | 0.393*** | 0.431*** | 0.606*** | 0 |

Notes: Numbers present the differences in means; t-test significance levels * p < .1; ** p < .05; *** p < .01.

Table A4:

Mean comparison test for language skills derived from essays for all possible combinations of parental education

| | Compulsory | Compulsory + 1–3 | Compulsory + 4–5 | Compulsory + 6+ |
|---|---|---|---|---|
| Compulsory | 0 | | | |
| Compulsory + 1–3 | −0.421*** | 0 | | |
| Compulsory + 4–5 | −0.605*** | −0.184** | 0 | |
| Compulsory + 6+ | −0.796*** | −0.375*** | −0.191** | 0 |

Notes: Numbers present the differences in means; t-test significance levels * p < .1; ** p < .05; *** p < .01.

Table A5:

Linear regression model on language skills

| | Essay: language skills (z-std) | SE |
|---|---|---|
| Gender (ref: Male) | | |
| Female | 0.28*** | (0.02) |
| Father’s social class (age 11) | | |
| Unskilled occupations | ref. | |
| Partly-skilled occupations | 0.19*** | (0.05) |
| Skilled occupations (manual) | 0.23*** | (0.04) |
| Skilled occupations (non-manual) | 0.48*** | (0.05) |
| Managerial and technical occupations | 0.46*** | (0.05) |
| Professional occupations and similar | 0.50*** | (0.07) |
| No male head | 0.16* | (0.07) |
| Parents’ education (birth) | | |
| Compulsory | ref. | |
| Compulsory + 1–3 | 0.33*** | (0.03) |
| Compulsory + 4–5 | 0.44*** | (0.08) |
| Compulsory + 6+ | 0.60*** | (0.06) |
| Migration background | | |
| Born outside GB | −0.18* | (0.09) |
| Household language | | |
| English not spoken | −0.13 | (0.07) |
| Region | | |
| South East | ref. | |
| North | 0.01 | (0.04) |
| North West | 0.04 | (0.05) |
| E & W. Riding | 0.04 | (0.04) |
| North Midlands | −0.00 | (0.05) |
| Midlands | −0.02 | (0.04) |
| East | 0.06 | (0.04) |
| South | 0.02 | (0.05) |
| South West | 0.04 | (0.05) |
| Wales | −0.21*** | (0.06) |
| Scotland | 0.18*** | (0.04) |
| Constant | −0.55*** | (0.05) |
| Observations | 8,019 | |
| R² | 0.097 | |

Notes: Standard errors in parentheses; * p < .05, ** p < .01, *** p < .001.

Table A6:

Correlation of the variables written language skills, verbal and non-verbal test scores

| | Essay language score (z-std) | Verbal test scores (z-std) | Non-verbal test scores (z-std) |
|---|---|---|---|
| Essay language score (z-std) | 1.00 | | |
| Verbal test scores (z-std) | 0.63 | 1.00 | |
| Non-verbal test scores (z-std) | 0.55 | 0.84 | 1.00 |

Table A7:

Ordinal logistic regression for the dependent variable educational attainment

| Model 4 | Log odds | SE |
|---|---|---|
| Gender | | |
| Female | −0.132** | (0.045) |
| Parents’ education (birth) | | |
| Compulsory | ref. | |
| Compulsory + 1–3 | 0.401*** | (0.058) |
| Compulsory + 4–5 | 0.650*** | (0.184) |
| Compulsory + 6+ | 0.907*** | (0.135) |
| Father’s social class (age 11) | | |
| Unskilled occupations | ref. | |
| Partly-skilled occupations | 0.048 | (0.099) |
| Skilled occupations (manual) | 0.177* | (0.088) |
| Skilled occupations (non-manual) | 0.478*** | (0.109) |
| Managerial and technical occupations | 0.474*** | (0.099) |
| Professional occupations and similar | 0.815*** | (0.147) |
| No male head | 0.019 | (0.138) |
| Migration | | |
| CM born outside of GB | 0.426* | (0.185) |
| English usually spoken at home | | |
| Not spoken | 0.067 | (0.136) |
| Essay: language skills (z-std) | 0.156*** | (0.030) |
| Test: verbal skills (z-std) | 0.404*** | (0.045) |
| Test: non-verbal (z-std) | 0.583*** | (0.042) |
| Controlled for region | yes | |
| Cutpoints | | |
| cut1 | −0.708*** | (0.099) |
| cut2 | 1.695*** | (0.101) |
| McFadden’s adj. R² | 0.15 | |
| Cragg and Uhler’s R² | 0.32 | |
| McKelvey and Zavoina’s R² | 0.32 | |
| BIC | 14,828.34 | |
| Observations | 8,019 | |

Notes: KHB corrected log odds; * p < .05, ** p < .01, *** p < .001.
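
To illustrate how log odds and cutpoints of an ordinal logistic regression translate into category probabilities, the sketch below uses the values reported in Table A7. It rests on several assumptions that the table does not confirm: the common parameterisation P(y ≤ j) = logistic(cut_j − xb), three ordered outcome categories (no qualification or sub-secondary, secondary, tertiary), and a region effect of zero because the region coefficients are not reported. As the coefficients are KHB corrected, the resulting numbers are illustrative only; all names in the code are hypothetical.

```python
import math

# Cutpoints and selected log odds copied from Table A7 (model 4).
CUT1, CUT2 = -0.708, 1.695
COEF = {
    "female": -0.132,
    "parent_compulsory_6plus": 0.907,
    "father_professional": 0.815,
    "essay_z": 0.156,
    "verbal_z": 0.404,
    "nonverbal_z": 0.583,
}


def logistic(x):
    """Standard logistic function."""
    return 1 / (1 + math.exp(-x))


def category_probabilities(linear_predictor):
    """Probabilities of the three ordered categories for a given xb.

    Assumes P(y <= j) = logistic(cut_j - xb).
    """
    p_low = logistic(CUT1 - linear_predictor)
    p_low_or_mid = logistic(CUT2 - linear_predictor)
    return p_low, p_low_or_mid - p_low, 1 - p_low_or_mid


# Reference profile: all dummies zero and all z-scores zero, so xb = 0.
reference = category_probabilities(0.0)

# Same profile, but with an essay language score one standard deviation higher.
higher_essay = category_probabilities(COEF["essay_z"] * 1.0)

print([round(p, 3) for p in reference])
print([round(p, 3) for p in higher_essay])
```

Under these assumptions, a higher essay score shifts probability mass from the lowest category towards the tertiary category, in line with the positive coefficient for the essay measure.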

  • View in gallery

    Linear regression model on language skills derived from essays

  • View in gallery

    Disentangled mediators for the effect of social origin on educational attainment

  • Becker, B. (2011) Social disparities in children’s vocabulary in early childhood: does Pre-school education help to close the gap?, The British Journal of Sociology, 62(1): 6988. doi: 10.1111/j.1468-4446.2010.01345.x

    • Search Google Scholar
    • Export Citation
  • Bernstein, B. (1961) Social structure, language and learning, Educational Research, 3(3): 16376. doi: 10.1080/0013188610030301

  • Bernstein, B. (1971) Class, Codes and Control. Volume I: Theoretical Studies Towards a Sociology of Language, London: Routledge & Paul.

  • Bernstein, B. (1975) Class, Codes and Control, Volume III: Towards a Theory of Educational Transmissions, London: Routledge & Paul.

  • Betthäuser, B.A., Bourne, M. and Bukodi, E. (2020) Understanding the mobility chances of children from Working-class backgrounds in Britain: how important are cognitive ability and locus of control?, The British Journal of Sociology, 71(2): 34965. doi: 10.1111/1468-4446.12732

    • Search Google Scholar
    • Export Citation
  • Blau, P.M. and Duncan, O.D. (1967) The American Occupational Structure, New York: Wiley. https://hds.hebis.de/ubffm/Record/HEB034667288. 

    • Search Google Scholar
    • Export Citation
  • Boudon, R. (1974) Education, Opportunity, and Social Inequality: Changing Prospects in Western Society, New York: Wiley.

  • Bourdieu, P. (1977) The economics of linguistic exchanges, Social Science Information, 16(6): 645–68. doi: 10.1177/053901847701600601

  • Bourdieu, P., Passeron, J.C. and de Saint Martin, M. (1994) Academic Discourse, R. Teese (trans) Cambridge: Polity.

  • Bourne, M., Bukodi, E., Betthäuser, B. and Goldthorpe, J.H. (2018) ‘Persistence of the social’: the role of cognitive ability in mediating the effects of social origins on educational attainment in Britain, Research in Social Stratification and Mobility, 58: 11–21. doi: 10.1016/j.rssm.2018.09.001

  • Breen, R. and Karlson, K.B. (2014) Education and social mobility: new analytical approaches, European Sociological Review, 30(1): 107–18. doi: 10.1093/esr/jct025

  • Bukodi, E. (2017) National Child Development Study and 1970 British Cohort Study Educational Qualifications Histories, 1981–2009 [data collection], London: University of London, Institute of Education, Centre for Longitudinal Studies, http://doi.org/10.5255/UKDA-SN-8127-1.

  • Bukodi, E. and Goldthorpe, J.H. (2013) Decomposing ‘social origins’: the effects of parents’ class, status, and education on the educational attainment of their children, European Sociological Review, 29(5): 1024–39. doi: 10.1093/esr/jcs079

  • Bukodi, E., Erikson, R. and Goldthorpe, J.H. (2014) The effects of social origins and cognitive ability on educational attainment: evidence from Britain and Sweden, Acta Sociologica, 57(4): 293–310. doi: 10.1177/0001699314543803

  • Centre for Longitudinal Studies (2018) National Child Development Study: ‘Imagine you are 25’ Essays (Sweep 2, Age 11), 1969 [data collection], UK Data Service SN 8313, London: University of London, Institute of Education, Centre for Longitudinal Studies, http://doi.org/10.5255/UKDA-SN-8313-1.

  • Chávez Juárez, F. (2012) TTESTTABLE: Stata Module to Compute Differences in Means by Groups Including the T-test, Statistical Software Components S457401, Boston, MA: Boston College Department of Economics, revised 15 Feb 2015, https://econpapers.repec.org/software/bocbocode/s457401.htm.

  • Claessens, A., Duncan, G. and Engel, M. (2009) Kindergarten skills and fifth-grade achievement: evidence from the ECLS-K, Economics of Education Review, 28(4): 415–27. doi: 10.1016/j.econedurev.2008.09.003

  • Cohort and Longitudinal Studies Enhancement Resources (2018) CLOSER: National Child Development Study Cross-cohort harmonised data [data collection], UK Data Service, SN: 8342, http://doi.org/10.5255/UKDA-SN-8342-1.

  • Connelly, R. and Gayle, V. (2019) An investigation of social class inequalities in general cognitive ability in two British birth cohorts, The British Journal of Sociology, 70(1): 90–108. doi: 10.1111/1468-4446.12343

  • Dämmrich, J. and Triventi, M. (2018) The dynamics of social inequalities in cognitive-related competencies along the early life course – a comparative study, International Journal of Educational Research, 88: 73–84. doi: 10.1016/j.ijer.2018.01.006

  • Deák, G.O. (2014) Interrelations of language and cognitive development, in P. Brooks and V. Kempe (eds) Encyclopedia of Language Development, Los Angeles: Sage, pp 284–91.

  • Durham, R.E., Farkas, G., Hammer, C.S., Tomblin, J.B. and Catts, H.W. (2007) Kindergarten oral language skill: a key variable in the intergenerational transmission of socioeconomic status, Research in Social Stratification and Mobility, 25(4): 294–305. doi: 10.1016/j.rssm.2007.03.001

  • Erikson, R. (2016) Is it enough to be bright? Parental background, cognitive ability and educational attainment, European Societies, 18(2): 117–35. doi: 10.1080/14616696.2016.1141306

  • Ermisch, J. (2008) Origins of social immobility and inequality: parenting and early child development, National Institute Economic Review, 205(1): 62–71. doi: 10.1177/0027950108096589

  • Fernald, A., Marchman, V.A. and Weisleder, A. (2013) SES differences in language processing skill and vocabulary are evident at 18 months, Developmental Science, 16(2): 234–48. doi: 10.1111/desc.12019

  • Gilkerson, J., Richards, J.A., Warren, S.F., Oller, D.K., Russo, R. and Vohr, B. (2018) Language experience in the second year of life and language outcomes in late childhood, Pediatrics, 142(4): 1–11. doi: 10.1542/peds.2017-4276

  • Gregg, P. (2012) Occupational Coding for the National Child Development Study (1969, 1991–2008) and the 1970 British Cohort Study (1980, 2000–2008) [data collection], UK Data Service SN 7023, London: University of London, Institute of Education, Centre for Longitudinal Studies, http://doi.org/10.5255/UKDA-SN-7023-1.

  • Grenfell, M. (2011) Theory, in M. Grenfell (ed) Bourdieu, Language and Linguistics, London: Continuum, pp 37–63.

  • Hart, B. and Risley, T.R. (1995) Meaningful Differences in the Everyday Experience of Young American Children, Baltimore, MD: Paul H. Brookes Publishing.

  • Hawkes, D. and Plewis, I. (2006) Modelling non-response in the National Child Development Study, Journal of the Royal Statistical Society: Series A (Statistics in Society), 169(3): 479–91. doi: 10.1111/j.1467-985X.2006.00401.x

  • Hoff, E. (2003) The specificity of environmental influence: socioeconomic status affects early vocabulary development via maternal speech, Child Development, 74(5): 1368–78. doi: 10.1111/1467-8624.00612

  • Jackson, M., Erikson, R., Goldthorpe, J.H. and Yaish, M. (2007) Primary and secondary effects in class differentials in educational attainment: the transition to A-level courses in England and Wales, Acta Sociologica, 50(3): 211–29. doi: 10.1177/0001699307080926

  • Jann, B. (2014) Plotting regression coefficients and other estimates, Stata Journal, 14(4): 708–37. doi: 10.1177/1536867X1401400402

  • Karlson, K.B. and Holm, A. (2011) Decomposing primary and secondary effects: a new decomposition method, Research in Social Stratification and Mobility, 29(2): 221–37. doi: 10.1016/j.rssm.2010.12.005

  • Karlson, K.B., Holm, A. and Breen, R. (2012) Comparing regression coefficients between same-sample nested models using logit and probit, Sociological Methodology, 42(1): 286–313. doi: 10.1177/0081175012444861

  • Kohler, U., Karlson, K.B. and Holm, A. (2011) Comparing coefficients of nested nonlinear probability models, Stata Journal, 11(3): 420–38. doi: 10.1177/1536867X1101100306

  • Korat, O. and Levin, I. (2002) Spelling acquisition in two social groups: mother–child interaction, maternal beliefs and child’s spelling, Journal of Literacy Research, 34(2): 209–36. doi: 10.1207/s15548430jlr3402_5

  • Labov, W. (1970) The logic of nonstandard English, in F. Williams (ed) Language and Poverty: Perspectives on a Theme, London: Academic Press, pp 153–89.

  • LanguageTooler GmbH (2021) LanguageTool: Java API, https://languagetool.org/dev.

  • Lawton, D. (1963) Social class differences in language development: a study of some samples of written work, Language and Speech, 6(3): 120–43. doi: 10.1177/002383096300600302

  • Machin, S. and Vignoles, A. (2004) Educational inequality: the widening socio-economic gap, Fiscal Studies, 25(2): 107–28. doi: 10.1111/j.1475-5890.2004.tb00099.x

  • Manning, C., Surdeanu, M., Bauer, J., Finkel, J., Bethard, S. and McClosky, D. (2014) The Stanford CoreNLP natural language processing toolkit, in K. Bontcheva and J. Zhu (eds) Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, Stroudsburg, PA: Association for Computational Linguistics, pp 55–60.

  • McAvinue, L.P. (2018) Oral language and socioeconomic status: the Irish context, Irish Educational Studies, 37(4): 475–503. doi: 10.1080/03323315.2018.1521732

  • McCarthy, P.M. and Jarvis, S. (2010) MTLD, vocd-D, and HD-D: a validation study of sophisticated approaches to lexical diversity assessment, Behavior Research Methods, 42(2): 381–92. doi: 10.3758/BRM.42.2.381

  • McNally, S., McCrory, C., Quigley, J. and Murray, A. (2019) Decomposing the social gradient in children’s vocabulary skills at 3 years of age: a mediation analysis using data from a large representative cohort study, Infant Behavior and Development, 57: 1–13. doi: 10.1016/j.infbeh.2019.04.008

  • Michalke, M. (2019) koRpus: An R Package for Text Analysis, https://cran.r-project.org/web/packages/koRpus/index.html.

  • Mostafa, T., Narayanan, M., Pongiglione, B., Dodgeon, B., Goodman, A., Silverwood, R.J. and Ploubidis, G.B. (2021) Missing at random assumption made more plausible: evidence from the 1958 British birth cohort, Journal of Clinical Epidemiology, 136: 44–54. doi: 10.1016/j.jclinepi.2021.02.019

  • Parsons, S., Schoon, I., Rush, R. and Law, J. (2011) Long‐term outcomes for children with early language problems: beating the odds, Children & Society, 25(3): 202–14.

  • Power, C. and Elliott, J. (2006) Cohort profile: 1958 British birth cohort (National Child Development Study), International Journal of Epidemiology, 35(1): 34–41. doi: 10.1093/ije/dyi183

  • Richardson, K., Calnan, M., Essen, J. and Lambert, L. (1976) The linguistic maturity of 11-year-olds: some analysis of the written compositions of children in the National Child Development Study, Journal of Child Language, 3(1): 99–115. doi: 10.1017/S0305000900001331

  • Sahu, S., Vishwakarma, Y.K., Kori, J. and Thakur, J.S. (2020) Evaluating performance of different grammar checking tools, International Journal of Advanced Trends in Computer Science and Engineering, 9(2): 2227–33. doi: 10.30534/ijatcse/2020/201922020

  • Sawyer, D.J. and Joyce, M.T. (2005) Research in spelling: implications for adult basic education, in J. Comings, B. Garner and C. Smith (eds) Review of Adult Learning and Literacy, vol 6, New York: Routledge, pp 71–112.

  • Schuth, E., Köhne, J. and Weinert, S. (2017) The influence of academic vocabulary knowledge on school performance, Learning and Instruction, 49: 157–65. doi: 10.1016/j.learninstruc.2017.01.005

  • Shepherd, P. (2012) Measures of Ability at Ages 7 to 16: 1958 National Child Development Study User Guide, London: University of London, Institute of Education, Centre for Longitudinal Studies.

  • Sirin, S.R. (2005) Socioeconomic status and academic achievement: a meta-analytic review of research, Review of Educational Research, 75(3): 417–53. doi: 10.3102/00346543075003417

  • Spencer, S., Clegg, J. and Stackhouse, J. (2012) Language and disadvantage: a comparison of the language abilities of adolescents from two different socioeconomic areas, International Journal of Language & Communication Disorders, 47(3): 274–84. doi: 10.1111/j.1460-6984.2011.00104.x

  • Spencer, S., Clegg, J., Stackhouse, J. and Rush, R. (2017) Contribution of spoken language and socio-economic background to adolescents’ educational achievement at age 16 years, International Journal of Language & Communication Disorders, 52(2): 184–96. doi: 10.1111/1460-6984.12264

  • Strand, S. (2006) Comparing the predictive validity of reasoning tests and national end of key stage 2 tests: which tests are the ‘best’?, British Educational Research Journal, 32(2): 209–25. doi: 10.1080/01411920600569073

  • Sullivan, A. and Brown, M. (2013) Social Inequalities in Cognitive Scores at Age 16: The Role of Reading, CLS Working Paper ’13/10, London: Centre for Longitudinal Studies, Institute of Education.

  • Sullivan, A., Moulton, V. and Fitzsimons, E. (2017) The Intergenerational Transmission of Vocabulary, CLS Working Paper 2017/14, London: Centre for Longitudinal Studies, Institute of Education.

  • Sullivan, A., Parsons, S., Green, F., Wiggins, R.D. and Ploubidis, G. (2018) The path from social origins to top jobs: social reproduction via education, The British Journal of Sociology, 69(3): 776–98. doi: 10.1111/1468-4446.12314

  • Walker, D., Greenwood, C., Hart, B. and Carta, J. (1994) Prediction of school outcomes based on early language production and socioeconomic factors, Child Development, 65(2): 606–21. doi: 10.2307/1131404
