New generations of respondents: assessing the representativity of the HILDA Survey’s child sample

Author: Nicole Watson1,2
  • 1 University of Melbourne, Australia
  • 2 University of Queensland, Australia

Abstract

An important aspect of an indefinite life household panel study is to provide a sample of children who become new generations of respondents over time. The representativity of children and young adults in the Household, Income and Labour Dynamics in Australia (HILDA) Survey is assessed after 16 waves. Estimates from the HILDA Survey are compared to official data sources of the Australian Bureau of Statistics (ABS) and include demographic, education, employment, income and residential mobility variables. Both cross-section and longitudinal estimates are assessed. Overall, the HILDA Survey estimates are relatively close to the ABS estimates with the exception of the year of arrival of recent immigrants, having foreign-born parents, having a certificate level qualification, type of relationship in household, having zero income, the main source of income, and residential mobility. Most of these exceptions can be explained by differences in questionnaire design, respondent recall error, linkage error, and differences in the amount of missing data. The estimate of particular concern is the proportion of immigrants arriving in the last five years, which is underestimated in the HILDA Survey due to undercoverage of recent immigrants. This could be addressed by regular refreshment samples of recent immigrants.

Key messages

  • Children who later become respondents are a vital part of an indefinite life panel study.

  • This paper compares the HILDA Survey to official sources cross-sectionally and longitudinally.

  • New generations of children and young adults are found to be representative.

  • Some concern is identified regarding undercoverage of recent immigrants.

Introduction

Household panel studies begin with a representative cross-sectional sample and a set of following rules that determine who is followed and interviewed over time. Most household panel studies have an indefinite life design which means that, subject to funding, they can continue indefinitely (Rose, 2000). A sample of new births is added each wave as babies are born to existing sample members. The dynamics of the sample, when appropriately weighted to account for the effects of attrition and the following rules, should reflect dynamics in the population in terms of household structure changes, births, deaths and emigration.

An important aspect of an indefinite life household panel study is to provide a sample of children who grow up in the study that are, as far as possible, representative of the population. This is not often directly assessed. These children only become eligible to be interviewed when they turn an appropriate age, such as age 15 or 16. The child’s parents may have left the study well before the child can decide for themselves, or if the parents continue to participate faithfully over time, the child may decline to participate when given the opportunity. There are two groups of children of specific interest: (1) the children in responding households in wave 1; and (2) the babies that are born into the sample. The first group of children have a known probability of selection in wave 1 whereas the second group of children have a probability of selection assigned to them via one or both of their parents. Further, after 16 waves, the information available for these two groups is different: children in the first group will have a wide range of data collected during their interviews whereas the second group will have much more limited data available.

To assess the representativeness of these two child cohorts in a household panel study, we must first ask ‘what does it mean to be representative?’ In the 2017 Longitudinal Studies Strategic Review undertaken for the UK Economic and Social Research Council, the authors highlight two questions that need to be asked regarding representativeness: ‘Of what population is the sample representative?’ and ‘For what characteristics of the target population can the sample data provide information?’ (Davis-Kean et al, 2017: 21). The first question requires the target population for the inferences using longitudinal data to be defined. This is the population of individuals that could have been selected into the sample (though not necessarily with equal probability). The second question requires that weighted sample estimates are valid estimates of the characteristics of the population.

There are two common ways researchers demonstrate sample representativeness: by comparing response rates for subpopulations and by comparing survey estimates to external sources (Benzeval et al, 2020). Analysis of initial wave non-response (for example, Nathan, 1999; Wooden et al, 2002; Zimmermann et al, 2003) and subsequent attrition (for example, Fitzgerald et al, 1998; Thomas et al, 2001; Lipps, 2007; Uhrig, 2008; Watson and Wooden, 2009) is relatively routine. However, this type of analysis of response indicates the scope for bias in estimates but does not provide an estimate of the bias once suitable weights are applied. Furthermore, it also focuses on adults so tells us little about the potential biases in the child cohorts other than perhaps how their parents have responded over time. Specific studies of the representativeness of child cohorts in household panel studies are needed to understand potential biases for these new and emerging generations of respondents.

Of particular importance for a longitudinal study is its longitudinal representativeness; however, this can be difficult to measure due to a lack of available longitudinal data sources with which to compare population estimates, especially in the absence of population registers. Fortunately, this is beginning to change with the emergence of linked administrative data that include a time dimension and linked census data. Measuring the cross-sectional representativeness of a longitudinal survey is more straightforward due to the availability of a wide range of cross-sectional survey data. Ideally, both the longitudinal and cross-sectional representativeness should be assessed.

Comparisons of a broad range of population estimates in a longitudinal survey with external data sources exist but often focus on the entire sample (such as Duncan and Hill, 1989; Lynn, 2006; Andreski et al, 2009; Borkowska, 2019). It is relatively uncommon for these comparisons to focus on the children or young adults unless the study is of a particular child/youth cohort. Nevertheless, three such studies exist, all focused on the longest running household panel study, the US Panel Study of Income Dynamics (PSID) which started in 1968. In the first study, Fitzgerald and colleagues (1998) assessed the representativeness of young adults (aged 20–38) in the 1989 PSID (who were children when the study began) against the Current Population Survey (CPS). They found only a few percentage points of difference in education, marital status and region. More importantly, Hispanics were under-represented in the PSID by half (4% in the PSID compared to 8% in the CPS) and mean wages and salaries were 5% higher in the PSID than the CPS. In a subsequent analysis, Fitzgerald (2011) continued to focus on this original child sample and compared them with the National Health Interview Survey (NHIS) at three points in time: 1986, 1999 and 2007. Differences in education and marital status persisted, but differences in income disappeared over time and differences in employment appeared. By 2007 (when they were aged 39–55), the proportion of Hispanics in the PSID was a third of that reported in the NHIS. In contrast to these two studies, Duffy and Sastry (2012) assessed the representativeness of a group of children born after the study began. They compared children (aged 0–17) in the 2007 PSID against the American Community Survey (the replacement for the long-form in the decennial census). They found generally good representation, apart from Asian and Hispanic children and those born to foreign-born parents.

Aside from these broad-ranging comparisons, there are, of course, examples in the literature where specific variables from a longitudinal survey are compared with one or more external sources to demonstrate the validity of the data. Examples of variables examined in this fashion include expenditure (Wilkins and Sun, 2010; Andreski et al, 2014), income (Jenkins, 2010), labour market transitions (Fitzpatrick, 2017), neighbourhood type (Petersen and Rabe, 2013), psychological distress (Wooden, 2009), physical activity (Polidano et al, 2020) and residential mobility (Buck, 2000; Watson, 2020).

In this paper, the representativeness of two child cohorts from the Household, Income and Labour Dynamics in Australia (HILDA) Survey is examined by comparing a range of demographic, education, employment, income and residential mobility estimates to four external sources. The two cohorts are young adults aged 15–29 in 2016 (aged 0–14 in 2001 when the study began) and children aged 0–14 in 2016 (those born since the study began). Four external data sources are used in the comparisons, all collected by the Australian Bureau of Statistics (ABS): the 2016 Census; the Australian Census Longitudinal Dataset (ACLD) (linking the 2006, 2011 and 2016 Census data); the 2014 General Social Survey (GSS); and the 2017 Childhood Education and Care Survey (CEaCS). The GSS and CEaCS are expected to provide good benchmark estimates for the two child cohort populations as the data collection methods are similar to those used in the HILDA Survey. The ACLD is used to provide longitudinal benchmark estimates that are not available elsewhere. The Census is expected to provide good key demographic benchmark estimates. Other Census estimates may not be as good as the GSS or CEaCS estimates due to differences in collection methodologies. All survey estimates are weighted and standard errors account for the respective complex sample designs. The differences between HILDA Survey estimates and ABS estimates are tested for statistical significance, and corresponding effect sizes are calculated.

Data

A summary of the design features of the various data sources used in this comparison of estimates is provided in Table 1. Each of these data sources is described in turn.

Table 1: Design features of different data sources used

| | HILDA Survey | Census 2016 | ACLD 2006–16 | GSS 2014 | CEaCS 2017 |
| --- | --- | --- | --- | --- | --- |
| Type | Longitudinal survey | Census | Probabilistically linked Census file | Cross-sectional survey | Subset (7/8) of Labour Force Survey sample (which has a rotating panel design) |
| Mode of data collection | Interviewer administered. 90% face-to-face, 10% telephone | Household-completed (hardcopy or online) | Household-completed (hardcopy or online) | Interviewer administered. Face-to-face | Interviewer administered. Face-to-face or telephone |
| Who is interviewed | Each person aged 15+ in the household. Adult (usually parent or guardian) reporting on behalf of children in the household | One or more persons answers the questions for each person in the household, so have some proxy reports | One or more persons answers the questions for each person in the household, so have some proxy reports (could be different people answering each Census) | Randomly selected person aged 15+ from the household | Parent or guardian of up to two randomly selected children from the household |
| Population exclusions | People in very remote parts of Australia and those in non-private dwellings | None* | None* | People in very remote parts of Australia, discrete Indigenous communities and those in non-private dwellings | People in Indigenous communities and those in non-private dwellings |
| Primary sources of error | Initial wave non-response; attrition | Census undercount; proxy reporting error | Census undercount; proxy reporting error; linkage error (missing links, false links) | Non-response | Non-response |

Note: * The Place of Usual Residence database is used. Australian residents who are visiting other locations in Australia on Census Night are put back into their usual Mesh-Block (but not to a specific dwelling or family) and overseas visitors are excluded. The other surveys reported here also use the concept of usual residence (so visitors are excluded).

The HILDA Survey is a nationwide household panel study that began in 2001 (Department of Social Services and Melbourne Institute of Applied Economic and Social Research, 2019; Summerfield et al, 2019). Interviews are conducted annually with all household members aged 15 years and older. Most interviews (95%) are completed between August and November and include core questions on families, employment and income. The HILDA Survey sample is representative of people living in private dwellings excluding those living in very remote areas or in institutions. The household response rate in wave 1 was 66%, resulting in 7,682 responding households that contained 19,914 individuals. These individuals (termed continuing sample members, CSMs) form the basis of the sample that is followed and interviewed over time. Other people who join the household of a CSM are temporarily added to the sample. Babies of a CSM, the other parent of these babies and recent immigrants are converted to CSMs. A population-wide top-up sample was added in 2011 which included, importantly, immigrants arriving in Australia after 2001 (Watson and Wooden, 2013). The household response rate for the initial wave of the top-up sample was 69%, resulting in 2,153 responding households and 5,462 individuals being added to the sample. Reinterview rates (that is, the percentage of respondents interviewed in one wave that are reinterviewed in the next, excluding those who have died or moved abroad) are high in both samples, rising from 87% in wave 2 to over 96% from wave 9 for the main sample and rising from 92% in wave 12 to over 95% from wave 15 for the top-up sample. In 2016 (wave 16) there were 9,750 responding households containing 23,505 individuals.

Another way to examine the reinterview rates is to calculate the proportion of the 19,914 wave 1 sample members that are interviewed each wave, excluding people who have died or moved abroad. Figure 1 shows these proportions by five-year age groups. The age cohorts range from 0–4 years to 75 years and over, with age measured at wave 1. In wave 1, 92% of adults in responding households provided an interview. Those aged 15–29 were less likely to provide an interview in wave 1 than the older age groups. These young adults also have much steeper declines in response over time compared to older sample members.1 Of particular interest to this paper are the response rates for the children aged 0–14 in wave 1 who become of an age to be interviewed in later waves (that is, the first three groups). Their initial response rate is lower due to attrition of their parents, but they then show less sample loss after their initial interview than those initially aged 15–29. This suggests that growing up in a household where a child’s parents are regularly interviewed has a positive effect on the participation of younger sample members.

Figure 1: Proportion of wave 1 sample members interviewed over 18 years, by five-year age cohorts

Citation: Longitudinal and Life Course Studies 2022; 10.1332/175795921X16349086588358

Note: The cohort a person belongs to is defined by their age at wave 1 with five-year age groups, ranging from age 0 to 4 (referred to as ‘c0_4’ in the legend) to aged 75 and over (‘c75plus’).

The Australian Census of Population and Housing (referred to as the ‘Census’) occurs every five years on the second Tuesday in August (‘Census Night’) and is run by the ABS. For the purposes of this comparison, the 2016 Census is used. Online completion of the Census was strongly encouraged in 2016, with 59% of households completing the form online compared to 34% in 2011 (Harding et al, 2017). The remaining households completed the paper form. One or more people filled out the form on behalf of all members of the household. The net undercount was 1% and the ABS have adjusted the Census numbers for this (ABS, 2017).

The ABS also produce the ACLD which is a 5% random sample of the 2006 Census linked to subsequent Censuses. For the purposes of this analysis, three waves of the 2006 ACLD panel are used. This panel comprises the 2006 sample linked to the 2011 Census and then to the 2016 Census. A combination of deterministic and probabilistic linkages was used, with 75% of the data linked via a deterministic link on the basis of sex, date of birth, very small geographical areas, country of birth and (for the 2011–16 linkages only) hash codes of first and last name (ABS, 2019b). Probabilistic linkage allows linkages to be made despite missing or inconsistent data if there is enough agreement on other variables. Up to 12 passes were made through the data to identify the best possible links based on a range of characteristics. The ABS estimate a false link rate of approximately 5%. Of the sample records from the 2006 Panel, 77% were linked to the 2011 Census and 80% of these pairs were then linked to the 2016 Census, resulting in 605,618 links (ABS, 2019c). Weights were created to adjust for the sample design, Census net undercount and missed links (Chipperfield et al, 2017).
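The two linkage rates quoted above compound across the three Censuses. As a rough worked example (the variable names are illustrative, and the implied panel size is back-calculated from rounded percentages rather than being a figure reported by the ABS), the overall share of 2006 panel records linked through to 2016 can be sketched as:

```python
# Linkage rates quoted in the text for the 2006 ACLD panel.
rate_2006_to_2011 = 0.77        # share of 2006 panel records linked to the 2011 Census
rate_2011_to_2016 = 0.80        # share of those pairs then linked to the 2016 Census
links_2006_2011_2016 = 605_618  # three-wave links reported in the text

# Share of the original 2006 panel records linked across all three Censuses.
overall_rate = rate_2006_to_2011 * rate_2011_to_2016

# Approximate panel size implied by the rounded rates (a back-calculation
# under the assumption that the rounded rates are exact; not an ABS figure).
implied_panel_size = links_2006_2011_2016 / overall_rate

print(f"Overall three-wave link rate: {overall_rate:.1%}")
print(f"Implied 2006 panel size: about {implied_panel_size:,.0f} records")
```

On these figures, roughly 62% of the 2006 panel records are linked across all three Censuses, which is why the ACLD weights need to adjust for missed links.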

The GSS is a repeated cross-sectional survey conducted every four years by the ABS on a range of social topics similar to those included in the HILDA Survey. The estimates drawn from the GSS relate to 2014 where the fieldwork was conducted from March to June. The GSS sample is representative of people living in private dwellings excluding those living in very remote parts of Australia and discrete Indigenous communities. Smaller states and low socio-economic areas were oversampled. One person aged 15 or older was randomly selected to be interviewed in each household. A response rate of 80% was achieved from in-scope dwellings yielding a total sample of 12,932 responses (ABS, 2015).

The CEaCS is a repeated cross-sectional survey conducted every three years in June by the ABS and asks questions about the care arrangements and early childhood education for children aged 0–12. The CEaCS sample is a subsample of the monthly Labour Force Survey sample. People living in non-private dwellings or Indigenous communities are excluded (ABS, 2018). Parents provided information with respect to up to two randomly selected children aged 0–12. For the 2017 CEaCS, a household response rate of 88% was achieved, resulting in 7,411 child records from 4,813 households.

Methods

To assess the representativeness of the HILDA Survey’s child sample, a range of estimates for two cohorts are compared to ABS estimates. The two cohorts are young adults aged 15–29 in 2016 (aged 0–14 years old in 2001) and children aged 0–14 years in 2016 (those born after the sample was selected in 2001). Most of the estimates compared are cross-sectional (due to the greater availability of cross-sectional data) and some are longitudinal. For the comparisons to CEaCS estimates, the younger cohort age range is further restricted to 0–12 years to match the age range covered by CEaCS.

The four populations of interest in this comparison of HILDA Survey estimates with ABS estimates are:

  • 2016 cross-sectional population. Most of the estimates described later relate to this cross-sectional population and, depending on the variable estimated, are restricted to people aged 0–12 years, 0–14 years or 15–29 years.

  • 2011–16 longitudinal population. For people aged 5–29 years in 2016;

  • 2006–16 longitudinal population. For people aged 15–29 years in 2016; and

  • 2001–16 longitudinal population. For people aged 15–29 years in 2016.

Table 2 provides the list of variables compared between the HILDA Survey and the four ABS data sources. The variables measured cross-sectionally are denoted by ‘C’ and those measured longitudinally are denoted by ‘L’. The cross-sectional estimates for 2016 include:

  • core demographic variables – sex, age, state, remoteness area, whether parents were foreign-born, Indigenous status, relationship in household, marital status and immigrant’s year of arrival to Australia;

  • education variables – year of school, type of school, highest school qualification and highest post-school qualification;

  • employment variables – employment status and hours worked; and

  • income variables – amount of income and principal source of income.

Table 2: Variables compared across data sources

Sources: HILDA Survey, Census, ACLD, GSS, CEaCS

For people aged 0–14 or 15–29 in 2016
 Sex: C, C
 Age: C, C
 State: C, C
 Remoteness area: C, C
 Whether parents were foreign-born: C, C
 Indigenous status: C, C
For people aged 0–12 years in 2016
 Year at school attending: C, C
 Type of school attended: C, C
For people aged 15–29 years in 2016
 Relationship in household: C, C, C
 Marital status: C, C, C
 Year of arrival: C, C, C
 Highest school level completed: C, C, C
 Post-school education: C, C, C
 Employment status: C, C, C
 Hours worked: C, C, C
 Income: C, C
 Main source of income: C, C
 Change in remoteness area in 10 years: L, L
 Moved interstate in 10 years: L, L
 Move out of home: L, L
 Moved in last 15 years: L, L
For people aged 5–29 years in 2016
 Moved in last 5 years: L, C, C

Notes: C=cross-sectional estimate, L=longitudinal estimate.

The variables that measure change over a particular time period include whether moved in the last five years (2011–16); change over a ten-year period in remoteness area, state and leaving home (2006–16); and whether moved in the last 15 years (2001–16). The first longitudinal estimate (moving in the last five years) is compared to cross-sectional estimates as Census respondents recall where they lived five years ago and GSS respondents recall how long they have lived at their current address. For the HILDA Survey, the equivalent measure is calculated from a set of yearly move indicators.
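One way to derive a five-year measure from yearly move indicators is to flag a person as having moved if any indicator in the window is set. The sketch below assumes a hypothetical mapping from wave year to a boolean meaning "moved since the previous wave"; the function name, the data layout and the handling of missing years are all illustrative assumptions and are not the HILDA Survey's actual variables or rules.

```python
from typing import Mapping, Optional


def moved_in_window(yearly_moves: Mapping[int, Optional[bool]],
                    start_year: int, end_year: int) -> Optional[bool]:
    """Return True if any yearly indicator after start_year up to end_year is True.

    yearly_moves maps a wave year to whether the person moved since the
    previous wave (hypothetical layout). Returns None when no move is observed
    but at least one year is missing, because a move cannot be ruled out.
    """
    indicators = [yearly_moves.get(year) for year in range(start_year + 1, end_year + 1)]
    if any(flag is True for flag in indicators):
        return True
    if any(flag is None for flag in indicators):
        return None
    return False


# Hypothetical person who moved once, between the 2012 and 2013 waves.
person = {2012: False, 2013: True, 2014: False, 2015: False, 2016: False}
print(moved_in_window(person, 2011, 2016))  # True
```

The same function covers the 10- and 15-year windows by changing `start_year`. Because the longitudinal weights require a responding household in every year of the window, the missing-year branch would rarely be reached in a weighted analysis.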

Note that the estimates from the GSS and CEaCS are measured in 2014 and 2017 respectively. These estimates are compared to the 2016 HILDA estimates under the assumption that little would have changed in the intervening period (that is, those aged 15–29 in 2016 are very similar to those aged 15–29 in 2014, and so on).

The question wording and the percentage of missing responses for each data source are provided in Tables A1 and A2 respectively in the Appendix.

The set of variables in Table 2 was chosen to reflect the variables commonly used by HILDA Survey data users and to cover a range of different domains. The variables are also constrained by what type of variables were available for the various ABS sources and whether the HILDA Survey collected the same (or similar) concept. With the exception of the key demographic variables (age, sex, state, and remoteness area) where the Census provides the best benchmark, the ABS estimates are obtained from each data source where they are available.

All ABS estimates are extracted via TableBuilder, an online interactive interface that the ABS use to enable registered users to specify customised aggregated tables (available at https://abs.gov.au). Standard errors are provided for the GSS and CEaCS but are not available for the ACLD. While the ACLD is a 5% sample of the Census and standard errors are technically relevant, they would be very small given the particularly large sample involved.

The HILDA Survey estimates are calculated using Stata’s survey commands (svy) that adjust for complex survey designs. The cross-sectional estimates for children aged 0–12 or 0–14 years are weighted by the 2016 cross-sectional enumerated person weight (that is, requiring them to be part of a responding household in 2016). The cross-sectional estimates of sex, age, state, remoteness area and relationship in household for adults aged 15–29 use the 2016 cross-sectional enumerated person weight, whereas the remaining estimates use the 2016 cross-sectional responding person weight as they use information collected in the individual interview. The relevant longitudinal enumerated person weights (for 2001–16, 2006–16 or 2011–16) are applied to produce the estimates of moving house over a 15-, 10- and 5-year period. These longitudinal enumerated person weights require the individual to be part of a responding household each year in the relevant period. Standard errors account for the complex sample design of the HILDA Survey and are calculated using the Taylor series linearisation variance estimation (Wolter, 2007).
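For readers without Stata, the logic of a weighted proportion with a Taylor series linearised standard error can be sketched in Python. This is a simplified illustration and not the HILDA Survey's production code: it assumes primary sampling units (PSUs) are sampled with replacement within strata, ignores finite population corrections, and uses hypothetical names (`weighted_proportion_se`, `strata`, `psus`).

```python
import math
from collections import defaultdict


def weighted_proportion_se(y, weights, strata, psus):
    """Weighted proportion and its linearised standard error.

    y: 0/1 outcomes; weights: survey weights; strata and psus: design
    identifiers per observation (hypothetical inputs). Assumes PSUs are
    sampled with replacement within strata and each stratum has >= 2 PSUs.
    """
    total_weight = sum(weights)
    p = sum(w * v for w, v in zip(weights, y)) / total_weight

    # Linearised (score) variable for a ratio estimator, summed to PSU level.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for v, w, h, c in zip(y, weights, strata, psus):
        psu_totals[h][c] += w * (v - p) / total_weight

    # Between-PSU variance of the linearised totals, accumulated over strata.
    variance = 0.0
    for clusters in psu_totals.values():
        n_h = len(clusters)
        mean_h = sum(clusters.values()) / n_h
        variance += n_h / (n_h - 1) * sum((z - mean_h) ** 2 for z in clusters.values())
    return p, math.sqrt(variance)


# Toy data: one stratum, four single-person PSUs, equal weights.
p, se = weighted_proportion_se([1, 0, 1, 0], [1, 1, 1, 1], [1, 1, 1, 1], [1, 2, 3, 4])
print(round(p, 3), round(se, 3))
```

With equal weights and one person per PSU, the result reduces to the familiar simple random sampling variance p(1 − p)/(n − 1), which is a convenient sanity check on the sketch.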

Differences between the HILDA Survey estimates and the Census or ACLD are statistically significant if the Census or ACLD estimates fall outside the 95% confidence interval for the HILDA Survey estimate. When comparing two survey estimates, x and y, from different samples, the standard error of the estimate of the difference is calculated as:
$$\mathrm{SE}(x - y) = \sqrt{\mathrm{SE}(x)^2 + \mathrm{SE}(y)^2}$$

This standard error is used to calculate a 95% confidence interval of the difference between the HILDA Survey estimate and the GSS or CEaCS estimate. When this confidence interval of the difference does not include zero, then the difference between the two estimates is statistically significant.
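The two significance rules described here can be sketched as follows. The sketch assumes the two survey samples are independent, so that the standard error of the difference is the square root of the sum of the squared standard errors, and it uses a normal critical value of 1.96. The function names are hypothetical and all numbers in the example are made up for illustration.

```python
import math

Z_95 = 1.96  # normal critical value for a 95% confidence interval


def differs_from_benchmark(estimate, se, benchmark):
    """Census/ACLD rule: is the benchmark outside the survey estimate's 95% CI?"""
    return abs(estimate - benchmark) > Z_95 * se


def difference_ci(x, se_x, y, se_y):
    """95% CI for x - y, assuming the two samples are independent."""
    se_diff = math.sqrt(se_x ** 2 + se_y ** 2)
    diff = x - y
    return diff - Z_95 * se_diff, diff + Z_95 * se_diff


def surveys_differ(x, se_x, y, se_y):
    """Survey-to-survey rule: the CI of the difference excludes zero."""
    lower, upper = difference_ci(x, se_x, y, se_y)
    return lower > 0 or upper < 0


# Illustrative (made-up) estimates and standard errors.
print(differs_from_benchmark(0.56, 0.012, 0.50))
print(surveys_differ(0.28, 0.010, 0.20, 0.015))
```

The benchmark rule treats the Census or ACLD figure as fixed, which is consistent with standard errors not being available for those sources.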

Even when estimates are significantly different from each other, it does not mean that the difference is important. To provide a guide to the relative difference between two estimates, the absolute standardised difference (ASD) describes the difference between estimates in units of the standard deviation and is calculated as:
$$\mathrm{ASD} = \frac{|x - y|}{s}$$

where x and y are estimates from the HILDA Survey and ABS data source respectively, and s is the standard deviation of y from the ABS data source (on the premise that the ABS estimate is superior to the HILDA Survey or a pooled version of the standard deviation based on both sources). As all of the estimates compared in this paper are proportions, denoted in the following by $p_x$ and $p_y$, $s = \sqrt{p_y(1 - p_y)}$. It has been suggested that an effect size of 0.1 can be considered negligible (Austin, 2009) and that an effect size of 0.2 can be considered small (Cohen, 1988). That said, effect sizes should be assessed in context (Cohen, 1988; Rosenthal, 1996; Ferguson, 2016). Taking a relatively conservative approach in this paper, differences in proportions that have an ASD of less than 0.1 will be considered very small.
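The ASD for two proportions can be computed in a few lines. The sketch below assumes the benchmark standard deviation is that of a binary indicator with mean p_y, that is sqrt(p_y(1 − p_y)); the function name is illustrative. Under this assumption the calculation reproduces the ASD of 0.12 reported in the Results for people aged 15–29 with both parents born in Australia.

```python
import math


def absolute_standardised_difference(p_x, p_y):
    """ASD between a survey proportion p_x and a benchmark proportion p_y.

    Assumes the benchmark standard deviation is sqrt(p_y * (1 - p_y)),
    the standard deviation of a 0/1 indicator with mean p_y.
    """
    return abs(p_x - p_y) / math.sqrt(p_y * (1 - p_y))


# Both parents born in Australia, people aged 15-29
# (HILDA Survey: 0.561; Census: 0.503, as quoted in the Results).
asd = absolute_standardised_difference(p_x=0.561, p_y=0.503)
print(round(asd, 2))  # 0.12
```

Because the denominator depends only on the benchmark proportion, the same absolute gap produces a larger ASD for rare characteristics than for common ones.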

Results

Demographic variables

The differences between the HILDA Survey and Census estimates for sex, age, state, remoteness area, foreign-born status of parents and Indigenous status for the two cohorts of interest for 2016 are shown in Figure 2. A positive difference indicates the HILDA Survey estimate is higher than the Census estimate and a negative difference indicates it is lower. Almost all of the HILDA Survey estimates are consistent with the Census estimates. This is expected for the first three variables as the HILDA Survey estimates for sex, age and state are calibrated to the Estimated Resident Population (ERP) during the weighting process (Watson, 2012). The ERP is underpinned by the Census figures along with the latest information available on births, deaths, and overseas and interstate migration (ABS, 2019a).

Figure 2: Difference between HILDA Survey and Census estimates of sex, age, state, remoteness area, whether parents are foreign-born, and Indigenous status, by 15-year age group

Note: The Census estimates for state, remoteness area and Indigenous status exclude people living in very remote areas.

The HILDA Survey estimate of the proportion of people aged 15–29 years who have both of their parents born in Australia is significantly higher than the Census (Census: 0.503; HILDA Survey: 0.561; ASD = 0.12) and the estimate of the proportion of people with both parents foreign-born is significantly lower (Census: 0.353; HILDA Survey: 0.270; ASD = 0.17).

The last estimate presented in Figure 2 is the proportion of each cohort who are Indigenous. For people aged 0–14 in the HILDA Survey, it is assumed they are Indigenous if either of their parents report being Indigenous, and for people aged 15–29 their Indigenous status is collected in their first interview. The HILDA Survey estimate for the people aged 0–14 matches the Census estimate but it is higher for the older cohort (Census: 0.034; HILDA Survey: 0.048). Nevertheless, the ASD for this older cohort is 0.07, suggesting this difference is relatively small.

In Figure 3, the differences between the HILDA Survey estimates and the Census and GSS estimates of relationship in household, marital status and year of arrival for people aged 15–29 are provided. There are substantial differences between the Census and GSS estimates for most relationship types and, for the most part, the HILDA estimates tend to be between the two ABS estimates. The exceptions are that the HILDA Survey estimates, compared to those from either the GSS or the Census, are significantly higher for non-dependent children (GSS: 0.201; Census: 0.233; HILDA Survey: 0.283) and lower for unrelated persons (GSS: 0.114; Census: 0.147; HILDA Survey: 0.032). The ASD of the estimates for non-dependent children is 0.20 for the HILDA Survey to GSS comparison and 0.12 for the HILDA Survey to Census comparison, and the ASDs of the estimates for unrelated persons are larger still (0.26 for the GSS to HILDA comparison and 0.33 for the Census to HILDA comparison).

Figure 3: Difference between HILDA Survey estimates and Census and GSS estimates of relationship in household, marital status and year of arrival, persons aged 15–29

The married and de facto estimates from the GSS and the HILDA Survey align with each other but, for those who are not married or de facto, the HILDA Survey estimate is significantly lower with a modest ASD of 0.12. The Census estimates for being in a de facto relationship and being neither de facto nor married are also significantly different to the HILDA Survey estimates, though only the estimate for de facto relationships has an ASD above the 0.1 threshold at 0.14.

The year of arrival estimates for people aged 15–29 align reasonably well between the HILDA Survey and the Census and GSS. An obvious difference is for the proportion of people who arrived one to five years ago (Census: 0.102; GSS: 0.112; HILDA Survey: 0.042; ASD = 0.20 for the Census comparison and 0.22 for the GSS comparison). Only the GSS estimate of the proportion of people aged 15–29 born in Australia is statistically significantly lower than the HILDA Survey estimate (GSS: 0.751; HILDA Survey: 0.812; ASD = 0.14) which corresponds to having fewer people that are foreign-born in the sample.

Education variables

The top half of Figure 4 shows differences between the HILDA Survey and CEaCS estimates for year of school and type of school attended for children aged 0–12. The HILDA Survey information about a child’s education is collected every four years as part of the education module and was included in the 2016 (wave 16) questionnaires. These questions are asked of one of the child’s parents (or guardian). The HILDA Survey estimates align well with the CEaCS estimates for year of school and type of school. While the estimate for the youngest group of school children is statistically significantly higher in the HILDA Survey, the ASD is 0.05, suggesting the difference is very small.

Figure 4: Difference between HILDA Survey estimates and Census, GSS and CEaCS estimates of year of school and school type (for children aged 0–12), and highest year of school completed and highest post-school qualification completed (for people aged 15–29)

The bottom half of Figure 4 shows differences between the HILDA Survey estimates and those from the Census and GSS for the highest year of school completed and the highest post-school qualification completed for people aged 15–29. While not all individuals in this age group have completed all of their education, these education variables measure the highest level they have completed so far. The HILDA Survey estimates align closely with the GSS estimates for the two education variables. The Census estimates are also reasonably close. There are two estimates where the GSS estimate is significantly different from the HILDA Survey estimate but the Census estimate is even further away from the GSS estimate. These are the proportion who have completed a post-school certificate (Census: 0.178; GSS: 0.266; HILDA Survey: 0.232) and the proportion who have not completed any post-school education (Census: 0.566; GSS: 0.453; HILDA Survey: 0.518). The ASDs for these differences are all below 0.14.

Employment and income variables

The next set of estimates for people aged 15–29, presented in Figure 5, relates to employment status and, for those who work, the hours that are worked. The HILDA Survey estimates are very similar to the GSS estimates but show somewhat more people in part-time work. The Census estimates relate to the hours worked in the last week, whereas the HILDA Survey and GSS estimates are for hours usually worked in a week. Also, the questions in the Census are typically answered by one person on behalf of the others in the household, so the responses are probably less reliable than those obtained by asking each person individually (as is done in the GSS and the HILDA Survey). The questions asked in the GSS and HILDA Survey to determine employment status are also much more comprehensive than those in the Census as they seek to tease out some of the nuances. The ASDs for the GSS and HILDA Survey comparisons are 0.11 or less, and those for the Census and HILDA Survey comparisons are 0.13 or less.

Figure 5: Difference between HILDA Survey estimates and Census and GSS estimates of employment status, hours worked, income and principal source of income, persons aged 15–29

Figure 5 also shows, for the 15–29 year age group, the comparison of income categories for the HILDA Survey and the Census. Compared to the Census, the HILDA Survey has fewer people with no income, and more people with low incomes (up to $7,799 per year) or medium incomes ($7,800 to $33,799 per year). The largest ASD is 0.17 for the no income category and it is 0.11 or less for the other differences.

The last set of estimates in Figure 5 shows the comparison of the main source of income for the HILDA Survey and the GSS. Compared to the GSS, the HILDA Survey estimates a higher proportion of people with an undefined income source which is classified as nil, negative or unknown personal income amounts (GSS: 0.176; HILDA Survey: 0.233), and fewer people with government pensions and benefits (GSS: 0.164; HILDA Survey: 0.120) and other sources of income (GSS: 0.056; HILDA Survey: 0.028). The ASD for the comparison of undefined income is 0.15 whereas it is 0.12 for the other two sets of estimates with significant differences.

Residential mobility estimates

The final series of estimates, presented in Figure 6, focuses on residential mobility by five-year age groups (as measured in 2016). The first three sets of estimates examine change between 2006 and 2016 in terms of moving from regional areas to major cities, moving interstate and moving out of the parental home. The estimates are very similar between the HILDA Survey and the ACLD, though there are three statistically significant differences. Compared to the ACLD, the HILDA Survey estimates are higher for the proportion of people aged 25–29 moving from regional to city areas over the ten-year period (ACLD: 0.085; HILDA Survey: 0.110; ASD = 0.09) and for the proportion of people in the same age group moving out of their parental home (ACLD: 0.101; HILDA Survey: 0.122; ASD = 0.13). The ACLD has a higher proportion of people aged 10–14 leaving their parental home than the HILDA Survey (ACLD: 0.021; HILDA Survey: 0.008; ASD = 0.09). Only one of these differences has an ASD above the 0.1 threshold.

Figure 6: Difference between HILDA Survey estimates and ACLD and GSS estimates of moving into cities, interstate, and out of home in last 10 years (2006 to 2016), and moving in the last 5 years (2011 to 2016) and the last 15 years (2001 to 2016), by 5-year age group

Note: The ACLD estimates for moving from regional/​remote areas into major cities exclude people living in very remote areas in 2006.

The five-year mobility rates from the HILDA Survey are almost always significantly higher than those from the Census (with an ASD of 0.17 or higher), but the HILDA Survey and GSS estimates are in alignment. Similarly, the 15-year mobility rates from the HILDA Survey are significantly higher than those from the ACLD (ASD = 0.12 or higher).

Discussion

To assess the representativeness of two cohorts of children in the HILDA Survey, a range of demographic, education, employment, income and residential mobility estimates have been compared to four official data sources from the ABS. Most of the estimates are cross-sectional, with only some longitudinal, because few official longitudinal data are available. The two cohorts examined are those who were under 15 when the initial sample was established (aged 15–29 in 2016) and those born after the study began (aged 0–14 in 2016).

In general, the HILDA Survey estimates for both cohorts align closely with the ABS estimates. While a number of statistically significant differences have been identified, half have an ASD of less than 0.1 suggesting the differences are very small. There are seven variables that have larger deviations from the ABS sources. The reason for each of these differences and potential areas of further exploration are discussed.

Parent’s country of birth. The HILDA Survey estimate of the proportion of people aged 15–29 in 2016 from immigrant families is lower than that reported in the Census. This stems from an under-representation of immigrants from non-English speaking countries in wave 1 (Wooden et al, 2002) and fewer people arriving in the last five years (discussed later). It is not clear why the younger cohort is not similarly affected. The answer may lie in differences in the amount of missing data: 5.3% of children aged 0–14 and 7.0% of young adults aged 15–29 are missing their parent’s country of birth in the Census compared to 16.0% and 0.8% respectively in the HILDA Survey. The reason for the much higher rate of missingness of parent’s country of birth for children aged 0–14 years in the HILDA Survey is that for this variable to be calculated the children need to have lived with both of their parents (not necessarily at the same time) and each parent had to provide their country of birth in their initial interview.

Relationship in household. For those aged 15–29, the proportion of non-dependent children is higher and the proportion of unrelated people is lower in the HILDA Survey than in the Census and GSS. Part of the difference in the household relationship estimates may be explained by how the different data sources collect information on relationships. The Census collects the relationships of one person (assigned as Person 1) to every other person in the household along with the relationship of the children to Person 1’s spouse or partner (assigned as Person 2), whereas in the HILDA Survey the relationships from each person to every other person are collected. It is therefore important in the Census who is chosen as Person 1 and Person 2 in the household. Households are instructed to select Person 1 as the person who ‘has meaningful relationships to the majority of the people in the dwelling’ (ABS, 2017, Online Form, p 1). For the GSS, the interviewer guides the selection of a suitable household reference person, seeks the relationship of that person to every other person in the household and, for those identified as parents of the household reference person, asks how the parents are related to others in the household. With the Census and GSS questions it may be impossible to capture the complexities of some of the relationships in the household. According to the HILDA Survey, in 2016, 17% of people aged 15–29 live in households that are structurally more complex than a couple with children (of any age), a lone parent with children (of any age), or a single-person household. Indeed, rederiving the relationship in household variable using restricted relationship information from a suitably selected Person 1 and 2 results in estimates from the HILDA Survey that are closer to the Census estimates but does not eliminate differences between these two data sources. For example, the proportion of adults that are unrelated increases from 0.032 to 0.043, which is still far from the GSS and Census estimates of 0.114 and 0.147 respectively. A further complication in the Census is that persons away from their usual household on Census Night for reasons other than shift work would not be captured in the relationship information as the Census Form is only meant to capture ‘all people … in this household on Census night’ (ABS, 2017, Online Form, p 1). People who were away on Census Night should have completed a Census Form elsewhere.

Year of arrival. The HILDA Survey underestimates the proportion of people who arrived in Australia in the last five years. Differences between the 2016 HILDA Survey estimates and the ABS estimates are expected for immigrants arriving in the last five years as the HILDA Survey does not have a natural mechanism to incorporate a representative sample of recent immigrants into the sample. The top-up sample that was added in 2011 included a representative sample of immigrants arriving in the first ten years of the study. The chances of adding people to the sample (via the following rules) who have arrived since 2011 are limited. Another refreshment sample is needed to improve the representativity of the immigrants arriving since 2011 (as discussed in Watson and Lynn, 2021).

Post-school qualifications. Some of the differences observed for the proportion of people with certificates or no post-school qualifications may be due to differences in how the questions were asked rather than differences in the sample composition. The GSS question is ‘What is the level of the highest qualification that you have completed?’ whereas the HILDA Survey collects an update each year of the qualifications completed since last interview via ‘Still looking at SHOWCARD A17, what qualification(s) did you complete since [PREVINTERVIEWDATE]?’ The responses to the HILDA Survey question are combined with previous responses to determine the highest qualification completed. It is possible that some certificates have been missed this way if the respondent does not correctly recall when the qualification was completed and places it before the last interview. Indeed, in the HILDA Survey rotating education module included in 2016, a question about the highest post-school qualification was asked in order to collect subsequent information about the highest qualification. Replacing the information in the historical variable with this information changes the HILDA Survey estimate for post-school certificate from 0.232 to 0.257 and for no further post-school education from 0.518 to 0.464. The two new estimates are not significantly different from the GSS estimates. It is suggested that the data producers compare the post-school qualifications collected in the rotating education module to the history questions to identify and correct missing certificate qualifications where possible.
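The recall problem described above can be illustrated with a small sketch. It is not the HILDA Survey's derivation code: the qualification ranking, function names and example respondent are all hypothetical, and it only assumes that the history variable is a running maximum over yearly updates and that the module answer replaces it when available.

```python
# Hypothetical ranking of qualification levels (higher = more advanced).
RANK = {"none": 0, "certificate": 1, "diploma": 2, "bachelor": 3, "postgraduate": 4}


def highest_from_history(initial, yearly_updates):
    """Highest qualification implied by an initial report plus yearly updates.

    `yearly_updates` holds, for each wave, the qualifications the respondent
    reported completing since the previous interview.
    """
    best = initial
    for reported in yearly_updates:
        for qualification in reported:
            if RANK[qualification] > RANK[best]:
                best = qualification
    return best


def reconcile(history_value, module_value):
    """Use the direct module answer when one was given, else the history value."""
    return module_value if module_value is not None else history_value


# A respondent who completed a certificate but, in the next interview, recalled
# it as finished before the previous interview, so it was never reported.
updates = [[], [], []]
from_history = highest_from_history("none", updates)
print(from_history)                             # history misses the certificate
print(reconcile(from_history, "certificate"))  # module answer recovers it
```

In this sketch a single misdated qualification is enough to move a respondent from the certificate category to the no post-school qualification category, which is the direction of the shift described above.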

Income. The HILDA Survey estimate for the proportion of people aged 15–29 with no income is lower than that in the Census. The way the income data are collected in the Census is quite different from the HILDA Survey. In the Census, one person reports on behalf of others in their household and records the income band for each individual (15 years of age and over) in the household. In the HILDA Survey, many questions are asked of each individual about a wide range of income sources. While it would have been better to compare the HILDA Survey’s income estimates with the GSS or the Survey of Income and Housing, the relevant information was not available in TableBuilder. Nevertheless, the comparison with the Census is still useful and the differences observed are consistent with the differences in the method of collection.

Source of income. Differences in the source of income for people aged 15–29 between the HILDA Survey and the GSS are largest for the ‘undefined’ category due to differences in the proportion of missing income in the two surveys (5.6% for the GSS and 9.5% for the HILDA Survey). This will have flow-on effects to differences in estimates for the other sources of income. More detailed comparisons of income to other sources, such as the Australian Taxation Office Longitudinal Information Files (Polidano et al, 2020), would be helpful in understanding differences in income estimates.

Residential mobility. The HILDA Survey estimates for moving house in the last five years are higher than the Census estimates (but are similar to the GSS estimates) and the HILDA Survey estimates for moving in the last 15 years are higher than the ACLD estimates. These findings are consistent with a previous comparison of the HILDA Survey estimates of residential mobility with the GSS, 2011 Census and 2006–11 ACLD. In that analysis it was shown that the five-year mobility rates from the HILDA Survey closely aligned with the GSS but were significantly higher than the Census estimates for people in their 20s and 30s (Watson, 2020). Also, the ten-year mobility rates in the HILDA Survey were significantly higher than the ACLD estimates by between 5 and 10 percentage points across most of the age distribution. The higher estimate for the older age group may reflect under-reporting of moves in the Census or matching difficulties in the ACLD for people who move.

Overall, the HILDA Survey sample of children and young adults in 2016 is representative of the population. To give an indication of the overall level of agreement for estimates where the HILDA Survey is compared with both the Census and the GSS, the intra-class correlation (ICC) between the HILDA Survey estimates and the Census estimates is 0.981 and between the HILDA Survey estimates and the GSS estimates it is 0.986 (see Note 2). Reasons for the differences in estimates have already been discussed, but one area of particular concern is immigrants arriving in Australia since the last top-up sample. This finding is consistent with the three studies assessing the representativeness of two child cohorts in the PSID (Fitzgerald et al, 1998; Fitzgerald, 2011; Duffy and Sastry, 2012). They found that, after 21 and 39 years, the Hispanic population was under-represented in the older cohort (aged 0–16 in 1968) and that Asian and Hispanic children and those born to foreign-born parents were under-represented in the younger cohort (aged 0–17 in 2007). These findings are, of course, not unique to the children of household panel studies. In a comparison of wave 8 of the UK’s Understanding Society to the Census, Borkowska (2019) identified an under-representation of certain ethnic minorities that was greater than that identified in wave 1. Lynn (2006) similarly found an under-representation of ethnic minorities in wave 11 of the British Household Panel Survey that was greater than in the initial wave. While some of these differences may be due to attrition, it is clear that recent immigrants are subject to undercoverage in household panel studies without regular refreshment samples.

This comparison of the two child cohorts from the HILDA Survey to external data sources provides the first comparison across a broad range of estimates, including both cross-sectional and longitudinal estimates. It compares the HILDA Survey estimates against multiple external sources (rather than just one) and delves into the potential reasons for the differences. The findings reinforce the need for longitudinal survey data users to be aware of the potential for differences in population estimates due to differences in questionnaire design, respondent recall, the amount of missing data, linkage errors in linked data, and undercoverage of recent immigrants. A limitation of this study is that it contains only a small number of comparisons of longitudinal estimates due to the lack of available external longitudinal data sources.

Comparisons to a range of data sources, including linked administrative data with a time dimension, and across a variety of variables are encouraged. This will extend the evidence base for the quality of household panel studies, not just for the children and young adults but also for other population subgroups.

Notes

1. Also obvious in this graph is the steep decline in the response rates for the elderly sample members (aged 75 years or older in wave 1) due in part to unreported deaths.

2. The ICC measures agreement between individual estimates (ratings) from different data sources (raters) using a two-way mixed model where the data sources (raters) are fixed. The ICC between the Census estimates and the GSS for the common set of variables is 0.966.
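As a rough illustration of this kind of calculation, the sketch below computes a single-measure consistency ICC from a two-way layout with fixed raters, often written ICC(3,1). Which ICC variant the paper used is not stated here, so the choice of the consistency form is an assumption, as is the function name. The example pairs are the (GSS, HILDA Survey) proportions quoted in the results; because these are only the estimates singled out for discussion, the result should not be expected to match the reported 0.986.

```python
def icc_consistency(ratings):
    """Single-measure consistency ICC for a two-way model with fixed raters.

    `ratings` is a list of rows, one per estimate (target), each holding
    one value per data source (rater).
    """
    n = len(ratings)      # number of estimates being compared
    k = len(ratings[0])   # number of data sources
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


# (GSS, HILDA Survey) proportions quoted in the text for people aged 15-29.
pairs = [
    (0.201, 0.283), (0.114, 0.032), (0.112, 0.042),
    (0.751, 0.812), (0.266, 0.232), (0.453, 0.518),
    (0.176, 0.233), (0.164, 0.120), (0.056, 0.028),
]
print(round(icc_consistency(pairs), 3))
```

The consistency form ignores any constant offset between the two sources; an absolute-agreement form would additionally penalise such an offset.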

Acknowledgement

The author is grateful for the research assistance provided by Yihua Jin who prepared some initial programs. The author also appreciates the helpful feedback provided by Janeen Baxter, Michele Haynes, John Henstridge and the anonymous reviewers on earlier versions of this paper.

Funding

Not applicable.

Data availability

The HILDA Survey was initiated and is funded by the Australian Government Department of Social Services (DSS) and is managed by the Melbourne Institute of Applied Economic and Social Research (Melbourne Institute). The findings and views reported in this paper, however, are those of the author and should not be attributed to the Australian Government, DSS or the Melbourne Institute. The data are available through Dataverse at the Australian Data Archive (https://dataverse.ada.edu.au/).

Experimentation on humans and animals statement

Not applicable.

Conflict of interest

The author declares there is no conflict of interest.

References

  • ABS (Australian Bureau of Statistics) (2015) General Social Survey: summary results, Australia, 2014. Cat. No. 4159.0, https://www.abs.gov.au/statistics/people/people-and-communities/general-social-survey-summary-results-australia/2014.

  • ABS (Australian Bureau of Statistics) (2017) Census of Population and Housing: Understanding the census and census data, Australia, 2016. Cat. No. 2900.0, https://www.abs.gov.au/ausstats/abs@.nsf/mf/2900.0.

  • ABS (Australian Bureau of Statistics) (2018) Childhood education and care, Australia, June 2017. Cat. No. 4402.0, https://www.abs.gov.au/ausstats/abs@.nsf/PrimaryMainFeatures/4402.0?OpenDocument.

  • ABS (Australian Bureau of Statistics) (2019a) Australian demographic statistics, September 2018. Cat. No. 3101.0, https://www.abs.gov.au/AUSSTATS/abs@.nsf/DetailsPage/3101.0Sep%202018?OpenDocument.

  • ABS (Australian Bureau of Statistics) (2019b) Information paper: Australian Census Longitudinal Dataset, methodology and quality assessment, 2006–2016. Cat. No. 2080.5, https://www.abs.gov.au/ausstats/abs@.nsf/Lookup/by%20Subject/2080.5~2006-2016~Main%20Features~Data%20Linking%20Methodology~10.

  • ABS (Australian Bureau of Statistics) (2019c) Microdata: Australian Census Longitudinal Dataset, ACLD. Cat. No. 2080.0, https://www.abs.gov.au/ausstats/abs@.nsf/mf/2080.0.

  • Andreski, P., Li, G., Samancioglu, M.Z. and Schoeni, R. (2014) Estimates of annual consumption expenditures and its major components in the PSID in comparison to the CE, American Economic Review, 104(5): 132–5. doi: 10.1257/aer.104.5.132

  • Andreski, P., McGonagle, K. and Schoeni, R. (2009) An Analysis of the Quality of the Health Data in the Panel Study of Income Dynamics, PSID, Ann Arbor: University of Michigan, Technical Series paper 09-02, https://psidonline.isr.umich.edu/publications/Papers/tsp/2009-02_Quality_Health_Data_PSID_.pdf.

  • Austin, P.C. (2009) Using the standardized difference to compare the prevalence of a binary variable between two groups in observational research, Communications in Statistics – Simulation and Computation, 38(6): 1228–34. doi: 10.1080/03610910902859574

  • Benzeval, M., Bollinger, C.R., Burton, J., Crossley, T.F. and Lynn, P. (2020) The Representativeness of Understanding Society, Understanding Society, Colchester: University of Essex, https://www.understandingsociety.ac.uk/research/publications/526039.

  • Borkowska, M. (2019) Improving Population and Sub-group Coverage: Who Is Missing and What Can Be Done About It?, Understanding Society, Colchester: University of Essex, Working Paper no. 2019-15, https://www.understandingsociety.ac.uk/sites/default/files/downloads/working-papers/2019-15.pdf.

  • Buck, N. (2000) Using panel surveys to study migration and residential mobility, in D. Rose (ed) Researching Social and Economic Change: The Uses of Household Panel Studies, London: Routledge, pp 250–72.

  • Chipperfield, J., Brown, J.J. and Watson, N. (2017) The Australian Census Longitudinal Dataset: using record linkage to create a longitudinal sample from a series of cross-sections, Australian and New Zealand Journal of Statistics, 59(1): 1–16. doi: 10.1111/anzs.12177

  • Cohen, J. (1988) Statistical Power Analysis for the Behavioral Sciences, 2nd edn, New York: Routledge.

  • Davis-Kean, P., Chambers, R.L., Davidson, L.L., Kleinert, C., Ren, Q. and Tang, S. (2017) Longitudinal Studies Strategic Review: 2017 Report to the Economic and Social Research Council, Swindon: Economic and Social Research Council, https://esrc.ukri.org/files/news-events-and-publications/publications/longitudinal-studies-strategic-review-2017/.

  • Department of Social Services, and Melbourne Institute of Applied Economic and Social Research (2019) The Household, Income and Labour Dynamics in Australia (HILDA) Survey, General Release 18 (Waves 1–18), https://dataverse.ada.edu.au/dataset.xhtml?persistentId=doi:10.26193/IYBXHM.

  • Duffy, D. and Sastry, N. (2012) An Assessment of the National Representativeness of Children in the 2007 Panel Study of Income Dynamics, PSID, Ann Arbor: University of Michigan, Technical Series paper 12-01, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.459.5684&rep=rep1&type=pdf.

  • Duncan, G.J. and Hill, D.H. (1989) Assessing the quality of household panel data: the case of the panel study of income dynamics, Journal of Business & Economic Statistics, 7(4): 441–52.

  • Ferguson, C.J. (2016) An effect size primer: a guide for clinicians and researchers, Professional Psychology: Research and Practice, 40(5): 532–8. doi: 10.1037/a0015808

  • Fitzgerald, J.M. (2011) Attrition in models of intergenerational links using the PSID with extensions to health and to sibling models, The BE Journal of Economic Analysis & Policy, 11(3): art 2, doi: 10.2202/1935-1682.2868.

  • Fitzgerald, J., Gottschalk, P. and Moffitt, R. (1998) An analysis of sample attrition in panel data: the Michigan Panel Study of Income Dynamics, Journal of Human Resources, 33(2): 251–99. doi: 10.2307/146433

  • Fitzpatrick, D. (2017) Self-employment dynamics in Australia and the importance of state dependence, Economic Record, 93(S1): 144–70. doi: 10.1111/1475-4932.12337

  • Harding, S., Jackson Pulver, L., McDonald, P., Morrison, P., Trewin, D. and Voss, A. (2017) Report on the Quality of 2016 Census Data, Census Independent Assurance Panel to the Australian Statistician, Canberra: Australian Bureau of Statistics, https://www.abs.gov.au/websitedbs/d3310114.nsf/Home/Independent+Assurance+Panel.

  • Jenkins, S.P. (2010) Comparisons of BHPS and HBAI distributions of net household income: 1994–2006, https://filestore.iser.essex.ac.uk/d/3a16c2/.

  • Lipps, O. (2007) Attrition in the Swiss household panel, Methoden, Daten, Analysen (mda), 1(1): 45–68.

  • Lynn, P. (ed) (2006) Quality Profile: British Household Panel Survey: Version 2.0 Waves 1–13, Institute for Social and Economic Research, Colchester: University of Essex, https://www.iser.essex.ac.uk/files/bhps/quality-profiles/BHPS-QP-01-03-06-v2.pdf.

  • Nathan, G. (1999) A Review of Sample Attrition and Representativeness in Three Longitudinal Surveys, London: Office for National Statistics, https://openlibrary.org/works/OL6986074W/A_review_of_sample_attrition_and_representativeness_in_three_longitudinal_surveys.

  • Petersen, J. and Rabe, B. (2013) Understanding Society: A Geographical Profile of Respondents, Understanding Society, Colchester: University of Essex, Working Paper no. 2013-01, https://www.understandingsociety.ac.uk/sites/default/files/downloads/working-papers/2013-01.pdf.

  • Polidano, C., Carter, A., Chan, M., Chigavazira, A., To, H., Holland, J., Nguyen, S., Vu, H. and Wilkins, R. (2020) The ATO Longitudinal Information Files (ALife): A New Resource for Retirement Policy Research, Canberra: Australian National University, https://taxpolicy.crawford.anu.edu.au/sites/default/files/publication/taxstudies_crawford_anu_edu_au/2020-04/complete_ttpi_submission_cpedit_ato_april_2020_0.pdf.

  • Rose, D. (2000) Household panel studies: an overview, in D. Rose (ed) Researching Social and Economic Change: The Uses of Household Panel Studies, London: Routledge, pp 3–35.

  • Rosenthal, J.A. (1996) Qualitative descriptors of strength of association and effect size, Journal of Social Service Research, 21(4): 37–59. doi: 10.1300/J079v21n04_02

  • Summerfield, M., Bright, S., Hahn, M., La, N., Macalalad, N., Watson, N., Wilkins, R. and Wooden, M. (2019) HILDA User Manual: Release 18, Melbourne: Melbourne Institute, https://melbourneinstitute.unimelb.edu.au/__data/assets/pdf_file/0008/3247289/HILDA-User-Manual-Release-18.0.pdf.

  • Thomas, D., Frankenberg, E. and Smith, J.P. (2001) Lost but not forgotten: attrition and follow-up in the Indonesia Family Life Survey, Journal of Human Resources, 36(3): 556–92. doi: 10.2307/3069630

  • Uhrig, S.C.N. (2008) The Nature and Causes of Attrition in the British Household Panel Study, Institute for Social and Economic Research, Colchester: University of Essex, https://www.iser.essex.ac.uk/research/publications/working-papers/iser/2008-05.pdf.

  • Watson, N. (2012) Longitudinal and Cross-sectional Weighting Methodology for the HILDA Survey, Melbourne: Melbourne Institute, HILDA Project Technical Paper, No. 2/12, https://melbourneinstitute.unimelb.edu.au/assets/documents/hilda-bibliography/hilda-technical-papers/htec212.pdf.

  • Watson, N. (2020) Measuring geographic mobility: comparison of estimates from longitudinal and cross-sectional data, Survey Research Methods, 14(1): 1–18.

  • Watson, N. and Lynn, P. (2021) Refreshment sampling for longitudinal surveys, in P. Lynn (ed) Advances in Longitudinal Survey Methodology, Chichester: Wiley, pp 1–25.

  • Watson, N. and Wooden, M. (2009) Identifying factors affecting longitudinal survey response, in P. Lynn (ed) Methodology of Longitudinal Surveys, Chichester: Wiley, pp 157–81.

  • Watson, N. and Wooden, M. (2013) Adding a top-up sample to the household, income and labour dynamics in Australia survey, Australian Economic Review, 46(4): 489–98. doi: 10.1111/1467-8462.12027

  • Wilkins, R. and Sun, C. (2010) Assessing the Quality of the Expenditure Data Collected in the Self-Completion Questionnaire, Melbourne: Melbourne Institute, HILDA Project Discussion Paper, No. 1/10, https://melbourneinstitute.unimelb.edu.au/assets/documents/hilda-bibliography/hilda-discussion-papers/hdps110.pdf.

  • Wolter, K.M. (2007) Introduction to Variance Estimation, 2nd edn, New York: Springer.

  • Wooden, M. (2009) Use of the Kessler Psychological Distress Scale in the HILDA Survey, Melbourne: Melbourne Institute, HILDA Project Discussion Paper, No. 2/09, https://melbourneinstitute.unimelb.edu.au/assets/documents/hilda-bibliography/hilda-discussion-papers/hdps209.pdf.

  • Wooden, M., Freidin, S. and Watson, N. (2002) The Household, Income and Labour Dynamics in Australia (HILDA) Survey: wave 1, Australian Economic Review, 35(3): 339–48. doi: 10.1111/1467-8462.00252

  • Zimmermann, E., Budowski, M., Gabadinho, A., Scherpenzeel, A., Tillmann, R. and Wernli, B. (2003) Swiss Household Panel 2004–2007, Proposal Submitted to the Swiss National Science Foundation, Working Paper 2_03, Neuchâtel: Swiss Household Panel, https://www.researchgate.net/publication/228554719_The_Swiss_Household_Panel.


Appendix

Table A1 provides the question wording for the variables compared across data sources. The percentage of missing responses for each of these variables is shown in Table A2 where this is available for the different data sources.

Table A1:

Question wording for different data sources

| Variable | Question text |
| --- | --- |
| Sex | HILDA Survey: Interviewer record sex, confirming with respondent. Census: Is … male or female? |
| Age | HILDA Survey: What is …’s date of birth? [If date of birth not known, approximate age is recorded] Census: What is …’s date of birth? If date of birth not known, please give age. |
| State | HILDA Survey: From address of dwelling. Census: From address of dwelling. |
| Remoteness area | HILDA Survey: Coded from address of dwelling. Census: Coded from address of dwelling. |
| Whether parents were foreign-born | HILDA Survey: In which country was your father born? In which country was your mother born? Census: In which country was …’s father born? In which country was …’s mother born? |
| Indigenous status | HILDA Survey: Are you of Aboriginal or Torres Strait Islander origin? Census: Is … of Aboriginal or Torres Strait Islander origin? |
| Relationship in household | HILDA Survey: How are the different members of the household related to each other? <Person j> is …’s? Census: What is …’s relationship to <Person 1>? GSS: What is … relationship to <Person 1>? [For mother or father of Person 1] … to anyone else in the household, for example as a (husband/wife), partner or child? |
| Year at school attending | HILDA Survey: What year of school [is/was] … attending in 2016? CEaCS: What year or grade is … currently enrolled in at school? |
| Type of school attended | HILDA Survey: Looking at SHOWCARD Q22, which of these categories best describes the type of school … [is/was] attending in 2016? CEaCS: Is the school that … attends a government or state school? A catholic non-government school? An independent non-government school? |
| Marital status | HILDA Survey: Looking at SHOWCARD H4, which of these best describes your current marital status? And by ‘married’ we mean in a registered marriage? Census: What is …’s present marital status? GSS: What is … marital status? |
| Year of arrival | HILDA Survey: In which country were you born? In what year did you first come to Australia to live for 6 months or more (even if you have spent time abroad since)? Census: In which country was … born? In what year did … first arrive in Australia to live here for one year or more? GSS: In which country was … born? In which year did … arrive in Australia (for one year or more)? |
| Highest school level completed | HILDA Survey: Looking at SHOWCARD 1, what was the highest year of school you completed? Census: Has … completed any educational qualification (including a trade certificate)? What was the highest year of primary or secondary school … has completed? GSS: Have you completed a trade certificate, diploma, degree or any other educational qualification? What was the highest year of primary or secondary school you completed? |
| Post-school education | HILDA Survey: Looking at SHOWCARD 4, what qualifications have you completed? Census: What is the level of the highest qualification … has completed? GSS: What is the level of the highest qualification that you have completed? |
| Employment status | HILDA Survey: Derived variable from 11 questions to determine technical classification of employment status. Census: Last week, did … have a job of any kind? Did … actively look for work at any time in the last four weeks? If … had found a job, could … have started work last week? GSS: Derived variable from at 11 questions to determine technical classification of employment status. |
| Hours worked | HILDA Survey: Including any paid or unpaid overtime, how many hours per week do you usually work in all your jobs? Census: Last week, how many hours did … work in all jobs? GSS: How many hours do you usually work each week in …? |
| Income | HILDA Survey: Derived from 28 income questions. Census: What is the total of all income … usually receives? |
| Main source of income | HILDA Survey: Source with highest amount reported. GSS: What is your main source of income? |
| Change in remoteness area in 10 years | HILDA Survey: Derived from remoteness area of usual residence collected 10 years apart. ACLD: Derived from remoteness area of usual residence on three matched census records. |
| Moved interstate in 10 years | HILDA Survey: Derived from state of usual residence collected 10 years apart. ACLD: Derived from state of usual residence on three matched census records. |
| Move out of home | HILDA Survey: Derived from whether living with at least one parent or not (based on relationships in household). ACLD: Derived from whether living with at least one parent or not (based on relationships in household). |
| Moved in last 15 years | HILDA Survey: Derived from 14 yearly moved flags which compare the address of usual residence from one year to the next. ACLD: Derived from usual residence five years ago on three matched census records. |
| Moved in last 5 years | HILDA Survey: Derived from 14 yearly moved flags which compare the address of usual residence from one year to the next. Census: Where did … usually live five years ago (at 9 August 2011)? GSS: How long have you lived in this dwelling? |

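
Table A1 describes the HILDA Survey mobility measures as derived from yearly moved flags. The following is a minimal sketch of that kind of derivation, not the HILDA Survey's actual derivation code. It assumes each flag is a boolean that is True when the address of usual residence differs from the previous year's, and that a person counts as having moved if any flag in the window is True. The function name `moved_in_window` and the way the window is taken are hypothetical.

```python
from typing import Sequence


def moved_in_window(yearly_moved_flags: Sequence[bool], window_years: int) -> bool:
    """Return True if any of the most recent `window_years` yearly flags is True.

    Assumption: flags are ordered oldest to newest, one per year-to-year
    comparison of the address of usual residence.
    """
    if window_years <= 0:
        raise ValueError("window_years must be positive")
    recent = yearly_moved_flags[-window_years:]
    return any(recent)


# Illustrative flags only (not survey data): one move, six comparisons ago.
flags = [False, True, False, False, False, False, False]
print(moved_in_window(flags, 5))  # False: no move in the last five comparisons
print(moved_in_window(flags, 7))  # True: the earlier move falls inside the window
```

Missing flags are not handled here; how a gap in the yearly record is treated is a design choice the table does not describe.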
Table A2:

Percentage of missing responses

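| | HILDA Survey | Census | ACLD | GSS | CEaCS |
| --- | --- | --- | --- | --- | --- |
| For people aged 0–14 or 15–29 in 2016 | | | | | |
| Sex | 0.0 | 0.0 | | | |
| Age | 0.0 | 0.0 | | | |
| State | 0.0 | 0.0 | | | |
| Remoteness area | 0.0 | 0.0 | | | |
| Whether parents were foreign-born | C: 16.0 A: 0.8 | C: 5.3 A: 7.0 | | | |
| Indigenous status | C: 1.6 A: 0.1 | C: 5.2 A: 6.0 | | | |
| Relationship in household | C: 0.0 A: 0.0 | C: 4.1 A: 4.3 | | n.a. | |
| For people aged 0–12 years in 2016 | | | | | |
| Year at school attending | 0.5 | | | | n.a. |
| Type of school attended | 0.4 | | | | n.a. |
| For people aged 15–29 years in 2016 | | | | | |
| Marital status | 0.0 | 0.0 | | n.a. | |
| Year of arrival | 0.0 | 7.5 | | n.a. | |
| Highest school level completed | 0.2 | 7.6 | | n.a. | |
| Post-school education | 0.0 | 9.0 | | n.a. | |
| Employment status | 0.1 | 6.2 | | n.a. | |
| Hours worked | 0.1 | 0.7 | | n.a. | |
| Income | 9.5 | 8.8 | | | |
| Main source of income¹ | 0.0 | | | 0.0 | |
| Change in remoteness area in 10 years | 0.0 | | 0.0 | | |
| Moved interstate in 10 years | 0.0 | | 0.0 | | |
| Move out of home | 0.0 | | 0.0 | | |
| Moved in last 15 years | 0.0 | | 1.6 | | |
| For people aged 5–29 years in 2016 | | | | | |
| Moved in last 5 years | 0.0 | 6.4 | | 2.6 | |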

Notes: C=Youngest cohort of children aged 0–14, A=Oldest cohort of adults aged 15–29.

n.a. = not available. For most variables, the tabulations in TableBuilder for the GSS and the CEaCS exclude missing values, so it is not possible to provide the percentage of cases with missing values.

1 = Missing income included in ‘undefined’ category together with no income and negative income in comparison of estimates between HILDA Survey estimate and ABS source.
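
The figures in Table A2 are percentages of cases with a missing response. The sketch below shows one way to compute such a figure. It assumes an unweighted share, missing responses coded as `None`, and rounding to one decimal place as in the table. The function name `percent_missing` and this coding are hypothetical, and the source does not say whether its percentages are weighted.

```python
from typing import Optional, Sequence


def percent_missing(responses: Sequence[Optional[object]]) -> float:
    """Unweighted percentage of responses coded as missing (None), to 1 decimal place."""
    if not responses:
        raise ValueError("responses must not be empty")
    missing = sum(1 for r in responses if r is None)
    return round(100.0 * missing / len(responses), 1)


# Illustrative values only (not survey data): one missing response out of four.
print(percent_missing(["employed", None, "unemployed", "employed"]))  # 25.0
```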

  • Proportion of wave 1 sample members interviewed over 18 years, by five-year age cohorts

  • Difference between HILDA Survey and Census estimates of sex, age, state, remoteness area, whether parents are foreign-born, and Indigenous status, by 15-year age group

  • Difference between HILDA Survey estimates and Census and GSS estimates of relationship in household, marital status and year of arrival, persons aged 15–29

  • Difference between HILDA Survey estimates and Census, GSS and CEaCS estimates of year of school and school type (for children aged 0–12), and highest year of school completed and highest post-school qualification completed (for people aged 15–29)

  • Difference between HILDA Survey estimates and Census and GSS estimates of employment status, hours worked, of income and principal source of income, persons aged 15–29

  • Difference between HILDA Survey estimates and ACLD and GSS estimates of moving into cities, interstate, and out of home in last 10 years (2006 to 2016), and moving in the last 5 years (2011 to 2016) and the last 15 years (2001 to 2016), by 5-year age group

  • ABS (Australian Bureau of Statistics) (2015) General Social Survey: summary results, Australia, 2014. Cat. No. 4159.0, https://www.abs.gov.au/statistics/people/people-and-communities/general-social-survey-summary-results-australia/2014.

    • Search Google Scholar
    • Export Citation
  • ABS (Australian Bureau of Statistics) (2017) Census of Population and Housing: Understanding the census and census data, Australia, 2016. Cat. No. 2900.0, https://www.abs.gov.au/ausstats/abs@.nsf/mf/2900.0.

    • Search Google Scholar
    • Export Citation
  • ABS (Australian Bureau of Statistics) (2018) Childhood education and care, Australia, June 2017. Cat. No. 4402.0, https://www.abs.gov.au/ausstats/abs@.nsf/PrimaryMainFeatures/4402.0?OpenDocument.

    • Search Google Scholar
    • Export Citation
  • ABS (Australian Bureau of Statistics) (2019a) Australian demographic statistics, September 2018. Cat. No. 3101.0, https://www.abs.gov.au/AUSSTATS/abs@.nsf/DetailsPage/3101.0Sep%202018?OpenDocument.

    • Search Google Scholar
    • Export Citation
  • ABS (Australian Bureau of Statistics) (2019b) Information paper: Australian Census Longitudinal Dataset, methodology and quality assessment, 2006–2016. Cat. No. 2080.5, https://www.abs.gov.au/ausstats/abs@.nsf/Lookup/by%20Subject/2080.5~2006-2016~Main%20Features~Data%20Linking%20Methodology~10.

    • Search Google Scholar
    • Export Citation
  • ABS (Australian Bureau of Statistics) (2019c) Microdata: Australian Census Longitudinal Dataset, ACLD. Cat. No. 2080.0, https://www.abs.gov.au/ausstats/abs@.nsf/mf/2080.0.

    • Search Google Scholar
    • Export Citation
  • Andreski, P., Li, G., Samancioglu, M.Z. and Schoeni, R. (2014) Estimates of annual consumption expenditures and its major components in the PSID in comparison to the CE, American Economic Review, 104(5): 1325. doi: 10.1257/aer.104.5.132

    • Search Google Scholar
    • Export Citation
  • Andreski, P., McGonagle, K. and Schoeni, R. (2009) An Analysis of the Quality of the Health Data in the Panel Study of Income Dynamics, PSID, Ann Arbor: University of Michigan, Technical Series paper 09-02, https://psidonline.isr.umich.edu/publications/Papers/tsp/2009-02_Quality_Health_Data_PSID_.pdf.

    • Search Google Scholar
    • Export Citation
  • Austin, P.C. (2009) Using the standardized difference to compare the prevalence of a binary variable between two groups in observational research, Communications in Statistics – Simulation and Computation, 38(6): 122834. doi: 10.1080/03610910902859574

    • Search Google Scholar
    • Export Citation
  • Benzeval, M., Bollinger, C.R., Burton, J., Crossley, T.F. and Lynn, P. (2020) The Representativeness of Understanding Society, Understanding Society, Colchester: University of Essex, https://www.understandingsociety.ac.uk/research/publications/526039.

    • Search Google Scholar
    • Export Citation
  • Borkowska, M. (2019) Improving Population and Sub-group Coverage: Who Is Missing and What Can Be Done About It?, Understanding Society, Colchester: University of Essex, Working Paper no. 2019-15, https://www.understandingsociety.ac.uk/sites/default/files/downloads/working-papers/2019-15.pdf.

    • Search Google Scholar
    • Export Citation
  • Buck, N. (2000) Using panel surveys to study migration and residential mobility, in D. Rose (ed) Researching Social and Economic Change: The Uses of Household Panel Studies, London: Routledge, pp 25072.

    • Search Google Scholar
    • Export Citation
  • Chipperfield, J., Brown, J.J. and Watson, N. (2017) The Australian Census Longitudinal Dataset: using record linkage to create a longitudinal sample form a series of cross-sections, Australian and New Zealand Journal of Statistics, 59(1): 116. doi: 10.1111/anzs.12177

    • Search Google Scholar
    • Export Citation
  • Cohen, J. (1988) Statistical Power Analysis for the Behavioral Sciences, 2nd edn, New York: Routledge.

  • Davis-Kean, P., Chambers, R.L., Davidson, L.L., Kleinert, C., Ren, Q. and Tang, S. (2017) Longitudinal Studies Strategic Review: 2017 Report to the Economic and Social Research Council, Swindon: Economic and Social Research Council, https://esrc.ukri.org/files/news-events-and-publications/publications/longitudinal-studies-strategic-review-2017/.

    • Search Google Scholar
    • Export Citation
  • Department of Social Services, and Melbourne Institute of Applied Economic and Social Research (2019) The Household, Income and Labour Dynamics in Australia (HILDA) Survey, General Release 18 (Waves 1–18), https://dataverse.ada.edu.au/dataset.xhtml?persistentId=doi:10.26193/IYBXHM.

    • Search Google Scholar
    • Export Citation
  • Duffy, D. and Sastry, N. (2012) An Assessment of the National Representativeness of Children in the 2007 Panel Study of Income Dynamics, PSID, Ann Arbor: University of Michigan, Technical Series paper 12-01, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.459.5684&rep=rep1&type=pdf.

    • Search Google Scholar
    • Export Citation
  • Duncan, G.J. and Hill, D.H. (1989) Assessing the quality of household panel data: the case of the panel study of income dynamics, Journal of Business & Economic Statistics, 7(4): 44152.

    • Search Google Scholar
    • Export Citation
  • Ferguson, C.J. (2016) An effect size primer: a guide for clinicians and researchers, Professional Psychology: Research and Practice, 40(5): 5328. doi: 10.1037/a0015808

    • Search Google Scholar
    • Export Citation
  • Fitzgerald, J.M. (2011) Attrition in models of intergenerational links using the PSID with extensions to health and to sibling models, The BE Journal of Economic Analysis & Policy, 11(3): art 2, doi: 10.2202/1935-1682.2868.

    • Search Google Scholar
    • Export Citation
  • Fitzgerald, J., Gottschalk, P. and Moffitt, R. (1998) An analysis of sample attrition in panel data: the Michigan Panel Study of Income Dynamics, Journal of Human Resources, 33(2): 25199. doi: 10.2307/146433

    • Search Google Scholar
    • Export Citation
  • Fitzpatrick, D. (2017) Self-employment dynamics in Australia and the importance of state dependence, Economic Record, 93(S1): 14470. doi: 10.1111/1475-4932.12337

    • Search Google Scholar
    • Export Citation
  • Harding, S., Jackson Pulver, L., McDonald, P., Morrison, P., Trewin, D. and Voss, A. (2017) Report on the Quality of 2016 Census Data, Census Independent Assurance Panel to the Australian Statistician, Canberra: Australian Bureau of Statistics, https://www.abs.gov.au/websitedbs/d3310114.nsf/Home/Independent+Assurance+Panel.

    • Search Google Scholar
    • Export Citation
  • Jenkins, S.P. (2010) Comparisons of BHPS and HBAI distributions of net household income: 1994–2006, https://filestore.iser.essex.ac.uk/d/3a16c2/.

    • Search Google Scholar
    • Export Citation
  • Lipps, O. (2007) Attrition in the Swiss household panel, Methoden, Daten, Analysen (mda), 1(1): 4568.

  • Lynn, P. (ed) (2006) Quality Profile: British Household Panel Survey: Version 2.0 Waves 1–13, Institute for Social and Economic Research, Colchester: University of Essex, https://www.iser.essex.ac.uk/files/bhps/quality-profiles/BHPS-QP-01-03-06-v2.pdf.

    • Search Google Scholar
    • Export Citation
  • Nathan, G. (1999) A Review of Sample Attrition and Representativeness in Three Longitudinal Surveys, London: Office for National Statistics, https://openlibrary.org/works/OL6986074W/A_review_of_sample_attrition_and_representativeness_in_three_longitudinal_surveys.

    • Search Google Scholar
    • Export Citation
  • Petersen, J. and Rabe, B. (2013) Understanding Society: A Geographical Profile of Respondents, Understanding Society, Colchester: University of Essex, Working Paper no. 2013-01, https://www.understandingsociety.ac.uk/sites/default/files/downloads/working-papers/2013-01.pdf.

    • Search Google Scholar
    • Export Citation
  • Polidano, C., Carter, A., Chan, M., Chigavazira, A., To, H., Holland, J., Nguyen, S., Vu, H. and Wilkins, R. (2020) The ATO Longitudinal Information Files (ALife): A New Resource for Retirement Policy Research, Canberra: Australian National University, https://taxpolicy.crawford.anu.edu.au/sites/default/files/publication/taxstudies_crawford_anu_edu_au/2020-04/complete_ttpi_submission_cpedit_ato_april_2020_0.pdf.

    • Search Google Scholar
    • Export Citation
  • Rose, D. (2000) Household panel studies: an overview, in D. Rose (ed) Researching Social and Economic Change: The Uses of Household Panel Studies, London: Routledge, pp 335.

    • Search Google Scholar
    • Export Citation
  • Rosenthal, J.A. (1996) Qualitative descriptors of strength of association and effect size, Journal of Social Service Research, 21(4): 3759. doi: 10.1300/J079v21n04_02

    • Search Google Scholar
    • Export Citation
  • Summerfield, M., Bright, S., Hahn, M., La, N., Macalalad, N., Watson, N., Wilkins, R. and Wooden, M. (2019) HILDA User Manual: Release 18, Melbourne: Melbourne Institute, https://melbourneinstitute.unimelb.edu.au/__data/assets/pdf_file/0008/3247289/HILDA-User-Manual-Release-18.0.pdf.

    • Search Google Scholar
    • Export Citation
  • Thomas, D., Frankenberg, E. and Smith, J.P. (2001) Lost but not forgotten: attrition and follow-up in the Indonesia Family Life Survey, Journal of Human Resources, 36(3): 55692. doi: 10.2307/3069630

    • Search Google Scholar
    • Export Citation
  • Uhrig, S.C.N. (2008) The Nature and Causes of Attrition in the British Household Panel Study, Institute for Social and Economic Research, Colchester: University of Essex, https://www.iser.essex.ac.uk/research/publications/working-papers/iser/2008-05.pdf.

    • Search Google Scholar
    • Export Citation
  • Watson, N. (2012) Longitudinal and Cross-sectional Weighting Methodology for the HILDA Survey, Melbourne: Melbourne Institute, HILDA Project Technical Paper, No. 2/12, https://melbourneinstitute.unimelb.edu.au/assets/documents/hilda-bibliography/hilda-technical-papers/htec212.pdf.

    • Search Google Scholar
    • Export Citation
  • Watson, N. (2020) Measuring geographic mobility: comparison of estimates from longitudinal and cross-sectional data, Survey Research Methods, 14(1): 118.

    • Search Google Scholar
    • Export Citation
  • Watson, N. and Lynn, P. (2021) Refreshment sampling for longitudinal surveys, in P. Lynn (ed) Advances in Longitudinal Survey Methodology, Chichester: Wiley, pp 125.

    • Search Google Scholar
    • Export Citation
  • Watson, N. and Wooden, M. (2009) Identifying factors affecting longitudinal survey response, in P. Lynn (ed) Methodology of Longitudinal Surveys, Chichester: Wiley, pp 15781.

    • Search Google Scholar
    • Export Citation
  • Watson, N. and Wooden, M. (2013) Adding a top-up sample to the household, income and labour dynamics in Australia survey, Australian Economic Review, 46(4): 48998. doi: 10.1111/1467-8462.12027

    • Search Google Scholar
    • Export Citation
  • Wilkins, R. and Sun, C. (2010) Assessing the Quality of the Expenditure Data Collected in the Self-Completion Questionnaire, Melbourne: Melbourne Institute, HILDA Project Discussion Paper, No. 1/10, https://melbourneinstitute.unimelb.edu.au/assets/documents/hilda-bibliography/hilda-discussion-papers/hdps110.pdf.

    • Search Google Scholar
    • Export Citation
  • Wolter, K.M. (2007) Introduction to Variance Estimation, 2nd edn, New York: Springer.

  • Wooden, M. (2009) Use of the Kessler Psychological Distress Scale in the HILDA Survey, Melbourne: Melbourne Institute, HILDA Project Discussion Paper, No. 2/09, https://melbourneinstitute.unimelb.edu.au/assets/documents/hilda-bibliography/hilda-discussion-papers/hdps209.pdf.

    • Search Google Scholar
    • Export Citation
  • Wooden, M., Freidin, S. and Watson, N. (2002) The Household, Income and Labour Dynamics in Australia (HILDA) Survey: wave 1, The Australian Economic Record, 35(3): 33948. doi: 10.1111/1467-8462.00252

    • Search Google Scholar
    • Export Citation
  • Zimmermann, E., Budowski, M., Gabadinho, A., Scherpenzeel, A., Tillmann, R. and Wernli, B. (2003) Swiss Household Panel 2004–2007, Proposal Submitted to the Swiss National Science Foundation, Working Paper 2_03, Neuchâtel: Swiss Household Panel, https://www.researchgate.net/publication/228554719_The_Swiss_Household_Panel.

    • Search Google Scholar
    • Export Citation