Contents
- 'Family Ethnicity': Is it a useful concept and, if so, can we develop meaningful measures?
- The Epidemiology of Serious Non-Fatal Work-Related Traumatic Injury - A Demonstration Project
- Methods for Creating Synthetic Data
- Towards High Quality Administrative Data: A Case Study - New Zealand Police
- New Approaches to Small Area Estimation of Unemployment
- Small-Domain Estimation of Māori Expenditure Patterns
1. 'Family Ethnicity': Is it a useful concept and, if so, can we develop meaningful measures? (PDF, 696KB)
Paul Callister, Robert Didham, Jamie Newell, & Deborah Potter
Abstract: The concept of an ethnic family is commonly used in everyday conversation and in the media. Given both the interest in family ethnicity and the challenges in measuring it, Statistics New Zealand recognised that further exploration of the concept would be useful. The project builds upon, or complements, both reviews of official statistics being undertaken by Statistics New Zealand and research undertaken collaboratively with Statistics New Zealand.
This research was undertaken in two stages. Stage 1 of the research focused only on 2001 data and just considered couples with dependent children. Stage 2 brought in some 2006 data to illustrate how the methods we explored would work using more up-to-date information. The second part of the investigation incorporated feedback from the SPRE conference. In this second stage we held a meeting with end users to discuss our fuller research results. Overall, the analysis demonstrates that each classification system has strengths and weaknesses. However, if we were seeking some ‘gold standard’ for data collection, the diversity of ethnic affiliations within New Zealand families leads us to suggest that, overall, measures of family ethnicity which incorporate the responses of all individuals are likely to be the most suitable for informing research and policy.
Keywords: Family ethnicity; concept; measures; classifying
2. The Epidemiology of Serious Non-Fatal Work-Related Traumatic Injury - A Demonstration Project (PDF, 840KB)
Colin Cryer, Ari Samaranayaka, Daniel Russell, Gabrielle Davie, & John Langley
Abstract: Many government agencies are interested in reliable statistics to describe the size and nature of, and trends in, the work-related traumatic injury problem. There have been problems identifying and describing work-related traumatic injury in New Zealand on an ongoing basis. An aim of this project was to present an accurate picture of the epidemiology of serious (threat-to-life) work-related injuries using a linked dataset. This work also permitted an investigation of the accuracy of key ACC data. The outcomes include the first accurate epidemiological description of serious threat-to-life work-related traumatic injury in New Zealand, as produced from the linked dataset; an assessment of the suitability of ACC data on its own for presenting the epidemiology of serious non-fatal work-related traumatic injury; and a presentation of the epidemiology of serious disabling work-related traumatic injury based on ACC data alone.
Keywords: ACC; NZHIS; serious (threat to life) work-related traumatic injury; linked-data; New Zealand; epidemiology
3. Methods for Creating Synthetic Data (PDF, 1165KB)
Patrick Graham, Jim Young and Richard Penny
Abstract: Synthetic data has been proposed as a method for allowing official statistics agencies to honour confidentiality commitments while facilitating researcher access to data. Because fully synthetic data does not contain records of real individuals, confidentiality concerns are much reduced compared to the release of the data actually collected. Synthetic datasets are draws from the posterior predictive distribution of responses for a new sample, given the data from the observed study sample. Underpinning the generation of synthetic data is a model for the distribution of the observable data and specification of this model is therefore a critical step in creating synthetic databases. In an earlier report we proposed hierarchical Bayesian modelling as a framework for generating (imputing) synthetic data because hierarchical Bayes models provide some protection against model misspecification. In this chapter I evaluate the use of hierarchical Bayes imputation models for creating synthetic data, for the case of categorical data, by means of a simulation study.
When the prior imputation model is correctly specified, both hierarchical Bayes and conventional Poisson log-linear imputation models lead to synthetic-data-based confidence intervals with nominal to conservative coverage, at the cost of slightly inflated interval length compared to intervals based on real (i.e. non-synthetic) data. These results hold whether or not the correctly specified prior imputation model agrees with the analysis model. However, over-coverage appears substantial when a correctly specified prior imputation model is simpler than the analysis model. When the prior imputation model is misspecified, hierarchical Bayes imputation models out-perform conventional Poisson log-linear imputation models in terms of confidence interval coverage and bias of point estimates, by a substantial margin for some estimands. However, the difference between the performance of synthetic data derived from hierarchical Bayes and conventional imputation models decreases as the prior imputation model structure approaches the analysis model structure. An imputer can never be sure that a given imputation model is correctly specified and cannot expect to foresee all analyses that an external analyst may attempt on a given dataset. Consequently, hierarchical Bayesian modelling provides an appealing framework for imputation modelling, because it provides some robustness to model misspecification and to discrepancies between the imputation and analysis models.
Keywords: Bayesian model; multiply imputed synthetic data; synthetic tables
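The idea of synthetic data as draws from a posterior predictive distribution can be illustrated with a minimal sketch for categorical data. This is not the hierarchical Bayes or Poisson log-linear imputation model evaluated in the report: it assumes a much simpler Dirichlet-multinomial model with a flat prior, and the function names, cell labels and counts below are all hypothetical.

```python
import random
from collections import Counter


def draw_dirichlet(rng, alphas):
    """Draw one probability vector from a Dirichlet via normalised gamma variates."""
    gammas = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(gammas)
    return [g / total for g in gammas]


def synthesize_tables(observed_counts, n_synthetic, m=3, prior_alpha=1.0, seed=0):
    """Return m synthetic tables drawn from the posterior predictive distribution.

    Assumed model (illustrative only): cell probabilities have a
    Dirichlet(prior_alpha) prior and the observed counts are multinomial.
    """
    rng = random.Random(seed)
    cells = list(observed_counts)
    # Posterior for the cell probabilities is Dirichlet(count + prior_alpha).
    posterior_alphas = [observed_counts[c] + prior_alpha for c in cells]
    tables = []
    for _ in range(m):
        # Step 1: draw cell probabilities from the posterior.
        probs = draw_dirichlet(rng, posterior_alphas)
        # Step 2: draw a new sample of size n_synthetic given those probabilities.
        sampled = rng.choices(cells, weights=probs, k=n_synthetic)
        counts = Counter(sampled)
        tables.append({c: counts.get(c, 0) for c in cells})
    return tables


# Hypothetical 2x2 table of observed counts, keyed by (row, column) category.
observed = {("a", "x"): 30, ("a", "y"): 10, ("b", "x"): 5, ("b", "y"): 55}
synthetic_tables = synthesize_tables(observed, n_synthetic=100, m=3)
```

Each synthetic table contains no record of a real respondent; releasing several tables (here m = 3) lets an analyst account for the extra uncertainty introduced by the synthesis step.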
4. Towards High Quality Administrative Data: A Case Study - New Zealand Police (PDF, 858KB)
Gavin Knight
Abstract: Much has been written about principles and standards for designing surveys to ensure that good quality statistical information results. Less has been written about standards for administrative data. Whereas it is thought that many of the same principles may apply, the terminology is often different and contextual differences exist that may require changes in the form, if not the substance, of design standards. For example, a survey questionnaire is usually designed to be completed by a sampled respondent just once, whereas a form used to capture data for an operational IT system may be filled out many times a day by the same person in order to record information required by that person to perform their job. Efficiency and relevance may therefore have different implications for the design of such forms. This paper documents a project undertaken as a case study on New Zealand Police that sought to identify principles to assist with designing good quality administrative data. Recommendations are made, based on these principles.
Keywords: Administrative data; quality; form; New Zealand Police
5. New Approaches to Small Area Estimation of Unemployment (PDF, 3MB)
Stephen Haslett, Alasdair Noble, & Felibel Zapala
Abstract: Small area or small domain estimation is a technique which uses a statistical model to improve accuracy in comparison with subpopulation estimates derived directly from survey data. The focus of this research study is small area estimation of unemployment as defined by the International Labour Organisation (ILO). The small area models used are essentially of two types, one involving modelling counts which uses the connection between ILO and register-based unemployment (in the form of numbers receiving an unemployment benefit), and the other involving both these counts and the relative risk of being unemployed (the risk being measured via time series projections of the New Zealand census population). All models are first fitted at a much finer level than the required small area estimates, and then the relevant estimates are aggregated. Models developed and used are based on Markov Chain Monte Carlo (MCMC) methods. The information used for the priors is known very accurately from historical data.
The project has developed considerably improved methods for producing accurate estimates of ILO unemployment at Territorial Local Authority level, which is a finer geographic level than can currently be published from the HLFS by Statistics New Zealand. Small area estimates have been produced for each quarter from the last quarter of 2001 to the first quarter of 2006.
The project has demonstrated the applicability of the generalized linear model (GLM) approach to small area estimation of ILO unemployment for Statistics New Zealand HLFS survey data, and should consequently provide much more reliable information about the impact of changing population characteristics on unemployment in New Zealand. Small area methods, by providing statistics at this finer level through analytical methods rather than additional data collection, can also reduce respondent burden in comparison with using survey data alone and contribute to the range and increased use of Official Statistics.
Keywords: Small area estimation; small domain estimation; unemployment
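The step of fitting at a finer level and then aggregating can be sketched as follows, assuming the fitted model yields MCMC draws of predicted unemployed counts for each fine-level cell. The cell labels, area names and numbers are hypothetical and not taken from the report. The point illustrated is that cells are summed within each draw before summarising, so the area-level uncertainty reflects any dependence between cells.

```python
from collections import defaultdict
from statistics import mean, stdev


def aggregate_draws(draws, cell_to_area):
    """Sum fine-level cell predictions to area totals within each MCMC draw.

    draws: list of dicts mapping cell -> predicted count, one dict per draw.
    cell_to_area: dict mapping each cell to the small area containing it.
    """
    area_draws = defaultdict(list)
    for draw in draws:
        totals = defaultdict(float)
        for cell, value in draw.items():
            totals[cell_to_area[cell]] += value
        for area, total in totals.items():
            area_draws[area].append(total)
    return dict(area_draws)


def summarize(area_draws):
    """Posterior mean and standard deviation of each area total."""
    return {area: (mean(vals), stdev(vals)) for area, vals in area_draws.items()}


# Hypothetical draws for three fine-level cells in two areas.
cell_to_area = {"A-young": "A", "A-old": "A", "B-young": "B"}
draws = [
    {"A-young": 10.0, "A-old": 5.0, "B-young": 2.0},
    {"A-young": 12.0, "A-old": 4.0, "B-young": 3.0},
    {"A-young": 11.0, "A-old": 6.0, "B-young": 1.0},
]
area_summary = summarize(aggregate_draws(draws, cell_to_area))
```

In practice there would be thousands of draws rather than three, and the summary would usually include an interval as well as a mean.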
6. Small-Domain Estimation of Māori Expenditure Patterns (PDF, 1MB)
Stephen Haslett, Dr Geoffrey Jones, & Jamas Enright
Abstract: This report develops a range of statistical models based on the methods of Elbers, Lanjouw and Lanjouw (2003) that provide more accurate estimation of New Zealand expenditure patterns by ethnicity and region, with a particular focus on Māori expenditure patterns. The motivation for this study is that direct estimates using only the survey data from the Household Expenditure Survey (HES) are not sufficiently accurate for publication.
The small-area estimation technique used in this study combines survey data from HES 2001 with auxiliary data derived from the 2001 census, which means that for some categories of expenditure more accurate model-based estimates can be derived. For these CPI categories, and for additional CPI categories in particular regions, this allows the production of regional estimates of Māori expenditures with greater precision than those derived from the survey data alone.
In this study, an existing ELL methodology for small-area estimation has been adapted and extended to adjust for known methodological complications and to allow multivariate modelling of various categories of individual-level expenditure simultaneously. This multivariate extension has the advantage that, on the log expenditure scale at least, predictions for each category of Consumer Price Index (CPI) expenditure can be fitted separately, but nevertheless retain the necessary correlation structure at individual level through careful use of a multivariate bootstrap.
Although the focus of the study is on Māori expenditure patterns, the small-domain technique (because it borrows strength across ethnicities and regions) also produces expenditure pattern information for other ethnic groups. This has allowed New Zealand expenditure patterns both overall and within particular one-digit CPI categories to be estimated by ethnicity and region.
Keywords: Small-domain estimation; small area estimation; Māori; expenditure patterns