Official Statistics Research funded 10 projects in the 2005/06 year.
- Impacts of global recoding to preserve confidentiality
- Optimising confidentiality in administrative data
- Investigation of the noise method for confidentiality protection of outputs
- Improved methodology for constructing, evaluating and analysing synthetic datasets
- Dataset generation
- Small-Domain Estimation for Māori Expenditure Patterns
- New Approaches to Small Area Estimation for Unemployment Data
- Improvements in the validity of ICD-based Injury Severity Scores
- Utilisation of Official Statistics in the Auckland Region
- Investigation of User Requirements to Improve Wage Measures
OS Research will invite the public to scheduled Official Statistics System Seminars throughout 2006 and 2007, and will also coordinate expert workshops. In the workshops, researchers will work with government departments to help them apply the research findings to their data. It is anticipated that all research and reports arising from the projects will be published in the Official Statistics Research Series after successful review.
For any seminar or expert workshop inquiries, please contact OS Research at: osresearch@stats.govt.nz.
1. Impacts of global recoding to preserve confidentiality on information loss and statistical validity of subsequent data analysis
This project is a first stage in a large research agenda. Previous research has shown that tables need average cell counts of the order of 5 or more to avoid unique or sensitive cells. For many data sets this can be achieved by aggregating categories (a global recoding). Such recodes are universal practice in statistical agencies, but little attention has been paid to the quantitative magnitude of their effects, or to devising recodes that have the least impact on observable associations within the data. Basic theoretical conditions for collapsing variables have been known for thirty years, but in practice these conditions are never met perfectly, so it is essential to improve our empirical understanding of the effects of collapsing. The objective of the project is to provide guidance on constructing recodes when they are necessary to preserve confidentiality or for other reasons.
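The project description does not say how recodes will be chosen or how their effects will be measured. As a minimal sketch only, assuming a two-way table of counts whose row categories are ordered (for example, age groups), the code below merges adjacent row categories greedily, always choosing the pair with the smallest combined count, until the average cell count reaches 5. Cramér's V before and after recoding gives one crude indication of how much the observed association has changed. The greedy rule, the use of Cramér's V, the function names and the example counts are all hypothetical.

```python
import math


def mean_cell_count(table):
    """Average count per cell of a two-way table (list of rows)."""
    cells = [c for row in table for c in row]
    return sum(cells) / len(cells)


def cramers_v(table):
    """Cramér's V for a two-way table of counts (0 = no association)."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n
            if exp > 0:
                chi2 += (obs - exp) ** 2 / exp
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0


def collapse_rows(table, labels, threshold=5.0):
    """Hypothetical greedy global recode: merge adjacent row categories
    until the mean cell count reaches the threshold."""
    table = [list(r) for r in table]  # work on a copy
    labels = list(labels)
    while len(table) > 1 and mean_cell_count(table) < threshold:
        # Pick the adjacent pair of categories with the smallest combined count.
        i = min(range(len(table) - 1),
                key=lambda k: sum(table[k]) + sum(table[k + 1]))
        table[i] = [a + b for a, b in zip(table[i], table[i + 1])]
        labels[i] = labels[i] + "+" + labels[i + 1]
        del table[i + 1], labels[i + 1]
    return table, labels


if __name__ == "__main__":
    # Hypothetical counts: age group (rows) by a two-category outcome (columns).
    table = [[3, 1], [4, 2], [5, 1], [2, 0], [1, 1], [0, 2]]
    labels = ["15-24", "25-34", "35-44", "45-54", "55-64", "65+"]
    collapsed, new_labels = collapse_rows(table, labels)
    print(new_labels, collapsed)
    print("V before:", cramers_v(table), "V after:", cramers_v(collapsed))
```

Comparing the association measure across alternative recodes (for example, different merging rules) is one simple way to look for the recode with the least impact on observable associations.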
Team members:
- Masterworks Software, Ltd
- Statistics Research Associates
- Carnegie-Mellon University, United States
- Statistics New Zealand.
2. Optimising confidentiality in administrative data: comparison of methods and techniques
Using real administrative data sets, the Inland Revenue Department will apply and compare confidentiality methods and techniques for their effect on data quality while providing the required confidentiality protection. The candidate methods will ensure that we follow international best practice. The research will focus on administrative tabular data, and specifically on small-area and regional data sets. We will also suggest measures for assessing the quality of the resulting data from an information-loss perspective.
Team members:
- Inland Revenue Department
- Statistics New Zealand.
3. Investigation of the noise method for confidentiality protection of outputs
This project seeks to draw on international practice to develop a method for adding noise to microdata so that the aggregate results produced from it are protected. Experienced New Zealand researchers will then be consulted to agree on a level of modification that is acceptable to them and that Statistics New Zealand also considers adequate for protecting respondents’ values under the Statistics Act.
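The description does not say which noise method the project will adopt. As a minimal sketch, assuming multiplicative noise of the kind described in the international literature for business microdata, each unit below receives a fixed random multiplier between 10% and 20% away from 1, and published cell totals are computed from the perturbed values. The 10-20% range, the seed, the function names and the example records are hypothetical.

```python
import random


def noise_multipliers(unit_ids, lower=0.10, upper=0.20, seed=2006):
    """Give each unit a fixed multiplier 1 + d or 1 - d, with d ~ Uniform[lower, upper].

    The range and seed are illustrative assumptions, not agreed parameters.
    """
    rng = random.Random(seed)
    multipliers = {}
    for uid in unit_ids:
        d = rng.uniform(lower, upper)
        # Direction of the perturbation is chosen at random with equal probability.
        multipliers[uid] = 1 + d if rng.random() < 0.5 else 1 - d
    return multipliers


def noisy_cell_totals(records, multipliers):
    """Aggregate perturbed values; records are (unit_id, cell, value) tuples."""
    totals = {}
    for uid, cell, value in records:
        totals[cell] = totals.get(cell, 0.0) + value * multipliers[uid]
    return totals


if __name__ == "__main__":
    # Hypothetical microdata: a dominant unit in cell "A" and a single unit in cell "B".
    records = [("u1", "A", 1000.0), ("u2", "A", 50.0), ("u3", "B", 200.0)]
    mults = noise_multipliers(["u1", "u2", "u3"])
    print(noisy_cell_totals(records, mults))
```

Because each unit keeps the same multiplier in every table, its contribution is perturbed consistently across outputs, and a cell dominated by one unit cannot be published within 10% of that unit's true value. The "acceptable level of modification" discussed above would correspond to the choice of `lower` and `upper`.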
Team members:
- Treasury
- COVEC
- Cornell University (US)
- Statistics New Zealand.
4. Improved methodology for constructing, evaluating and analysing synthetic datasets
Synthetic datasets provide a promising means for official statistical agencies and other data suppliers to release disaggregated survey and census information to external analysts while respecting ethical and legal obligations regarding the confidentiality of respondents’ data. This project will investigate several methodological issues associated with the construction and analysis of synthetic datasets, particularly synthetic unit record files (SURFs). Recent developments in the synthetic data literature emphasise model-based, multiple imputation approaches to constructing synthetic datasets, but the dimensionality of typical survey datasets and the desirability of adhering closely to observed data challenge conventional statistical modelling strategies. The project will therefore investigate the feasibility of applying modern modelling approaches, within a multiple imputation framework, to the construction of synthetic datasets. The modelling approaches to be considered include hierarchical Bayes, Bayesian networks and Bayesian nonparametrics. The project will also investigate refinements to the multiple imputation framework itself. A third contribution will be the development of quantitative measures of the risk that an individual’s responses could actually be identified from a SURF. These measures will both assist the evaluation of the synthetic data methods developed within the project and provide a means of communicating disclosure risks to users and respondents. The research will proceed through case studies, involving the creation of synthetic versions of survey datasets and their subsequent analysis from the perspective of an external analyst, and through simulation studies to evaluate inferential validity. The project will significantly advance understanding of synthetic data methodology and of the potential for synthetic data methods to improve access to the official statistical system.
The results of the project will provide guidance to data suppliers on constructing synthetic datasets, and guidance to analysts of synthetic data on the validity of analytical strategies. Software developed during the project for creating and analysing synthetic datasets will be made freely available to interested parties.
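The description does not spell out the multiple imputation framework the project will refine. As background only, the sketch below implements the combining rules commonly cited in the synthetic data literature: an analyst computes an estimate q_i and its variance u_i on each of m synthetic datasets, then combines them. For partially synthetic data the variance is u_bar + b_m/m; for fully synthetic data it is (1 + 1/m) b_m - u_bar, where b_m is the between-dataset variance. The function name and example numbers are hypothetical.

```python
from statistics import mean, variance


def combine_synthetic(estimates, variances, fully_synthetic=False):
    """Combine one estimate computed on each of m synthetic datasets.

    estimates: point estimates q_i, one per synthetic dataset
    variances: their within-dataset variance estimates u_i
    Returns (q_bar, T): the combined estimate and its estimated variance.
    """
    m = len(estimates)
    if m < 2:
        raise ValueError("need at least two synthetic datasets")
    q_bar = mean(estimates)
    b_m = variance(estimates)  # between-dataset variance, divisor m - 1
    u_bar = mean(variances)    # average within-dataset variance
    if fully_synthetic:
        t = (1 + 1 / m) * b_m - u_bar  # can be negative when m is small
    else:
        t = u_bar + b_m / m
    return q_bar, t


if __name__ == "__main__":
    # Hypothetical estimates of a mean from m = 3 synthetic datasets.
    print(combine_synthetic([10.0, 12.0, 11.0], [1.0, 1.0, 1.0]))
    print(combine_synthetic([10.0, 12.0, 11.0], [1.0, 1.0, 1.0], fully_synthetic=True))
```

The possibility of a negative variance under the fully synthetic rule is one example of the kind of issue that refinements to the framework may need to address.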
Team members:
- Christchurch School of Medicine and Health Sciences, University of Otago
- University Hospital Basel, Basel, Switzerland
- Statistics New Zealand.
5. Dataset generation: creating synthetic datasets to mimic confidential data sources
This project seeks to extend current work on generating synthetic datasets that mimic existing confidential data by applying advanced statistical modelling. Recent advances in statistical modelling will be combined with iterative proportional fitting to produce a routine method for generating synthetic datasets from any existing dataset, or from detailed information about its distributions. These synthetic datasets would mimic the real datasets in their marginal distributions but would consist entirely of generated unit records, minimising the confidentiality issues associated with accessing the real data. Such synthetic datasets could improve access to and use of data, because they could be distributed to approved users beyond the datalab or other secure facility. They could also be used for training and testing by those seeking access to the real data in the secure datalab facility.
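The description does not say how the modelling and iterative proportional fitting (IPF) will be combined. As a minimal sketch of the IPF step alone, assuming a two-way table: a seed table is rescaled alternately to target row and column totals, and synthetic unit records are then drawn from the fitted table. In practice the seed might come from a statistical model or from the real data's association structure; here the seed, the margins, the labels and the function names are hypothetical.

```python
import random


def ipf(seed, row_targets, col_targets, tol=1e-9, max_iter=1000):
    """Iterative proportional fitting of a seed table to target margins.

    Row and column targets must share the same grand total; zero seed cells stay zero.
    """
    table = [[float(x) for x in row] for row in seed]
    for _ in range(max_iter):
        # Scale each row to its target total.
        for i, target in enumerate(row_targets):
            s = sum(table[i])
            if s > 0:
                table[i] = [x * target / s for x in table[i]]
        # Scale each column to its target total.
        for j, target in enumerate(col_targets):
            s = sum(row[j] for row in table)
            if s > 0:
                for row in table:
                    row[j] *= target / s
        # Columns now match exactly; stop once the rows also match.
        if all(abs(sum(row) - t) < tol for row, t in zip(table, row_targets)):
            break
    return table


def synthesise(table, row_labels, col_labels, n, seed=0):
    """Draw n synthetic unit records with cell probabilities proportional to the table."""
    rng = random.Random(seed)
    cells = [(r, c) for r in row_labels for c in col_labels]
    weights = [w for row in table for w in row]
    return rng.choices(cells, weights=weights, k=n)


if __name__ == "__main__":
    fitted = ipf([[5, 1, 2], [1, 3, 4]], [30, 70], [20, 30, 50])
    records = synthesise(fitted, ["female", "male"], ["low", "mid", "high"], n=100)
    print(fitted)
    print(records[:5])
```

Because IPF only rescales whole rows and columns, the fitted table keeps the seed's cross-product (odds) ratios while matching the target margins; this is what allows a model-based seed to carry association structure that the margins alone would lose.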
Team members:
- University of Auckland
- NATSEM, University of Canberra
- Statistics New Zealand.
6. Small-Domain Estimation for Māori Expenditure Patterns
This project will produce small-area or small-domain estimates, for Māori, of expenditure variables selected by the Māori Statistics Unit. It continues an existing OSRDAC project that established the feasibility of using small-area or small-domain estimation (SAE) techniques to produce more accurate expenditure statistics for Māori without increasing survey sample sizes or over-sampling. The project methodology will use extensions of established SAE techniques. The anticipated outcome is improved, publishable expenditure statistics for Māori at a finer level than is currently possible.
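The description does not name the SAE techniques being extended. For illustration only, the sketch below shows the shrinkage idea behind an area-level model in the style of Fay and Herriot: each domain's direct survey estimate is combined with a model-based (synthetic) prediction, weighting the direct estimate more heavily when its sampling variance is small. It assumes the sampling variances, the model predictions and the model variance are already known; the function name and numbers are hypothetical.

```python
def composite_estimates(direct, direct_var, synthetic, model_var):
    """Area-level composite (shrinkage) estimates.

    direct:     direct survey estimates, one per small domain
    direct_var: their sampling variances (psi_i), assumed known
    synthetic:  model-based predictions for each domain (e.g. from a regression
                on auxiliary data), assumed already computed
    model_var:  between-domain model variance (sigma_v^2), assumed known
    """
    estimates = []
    for y, psi, syn in zip(direct, direct_var, synthetic):
        gamma = model_var / (model_var + psi)  # weight on the direct estimate
        estimates.append(gamma * y + (1 - gamma) * syn)
    return estimates


if __name__ == "__main__":
    # Domain 1 has a precise direct estimate; domain 2 a very noisy one.
    print(composite_estimates([100.0, 50.0], [25.0, 400.0], [90.0, 80.0], 100.0))
```

In the example the well-measured domain stays close to its direct estimate, while the poorly measured domain is pulled towards the model prediction. This is how SAE can improve precision without increasing the survey sample size.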
Team members:
- Massey University
- Statistics New Zealand.
7. New Approaches to Small Area Estimation for Unemployment Data
This research on small area estimation for unemployment data follows on from published research done at Statistics New Zealand by S. Haslett, C. Zingel and A. Green, who used SPREE (Structure Preserving Estimation) techniques to combine survey data from the Household Labour Force Survey (HLFS) with auxiliary data (Ministry of Social Development Work and Income data). These methods improved the accuracy of estimates at levels of detail for which the survey sample sizes are otherwise too small. SPREE can be specified as a Generalised Linear Model, and recent New Zealand-based research on unemployment has shown that there are other useful small area models in this class. The proposed research would build on the work of Haslett, Zingel and Green by extending the range of models researched at Statistics New Zealand and applying them to more finely grained data, using the broader Generalised Linear Model approach. The anticipated outcome is improved, publishable estimates of unemployment at a finer geographic level than is currently possible.
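The link between SPREE and Generalised Linear Models can be sketched as follows. This is the basic SPREE form as it is usually described in the literature, not the specific models of Haslett, Zingel and Green, and the notation is ours. Let x_ac be an auxiliary count (for example, Work and Income data) for small area a and category c, and let the margins Y_a· and Y_·c be known or reliably estimated (for example, from the survey at a higher level of aggregation). Basic SPREE rescales the auxiliary table to these margins; taking logarithms shows that this is a log-linear model with offset log x_ac and main effects only, so the auxiliary table's odds ratios are preserved.

```latex
\hat{Y}_{ac} = x_{ac}\,\alpha_a\,\beta_c,
\qquad
\sum_{c} \hat{Y}_{ac} = Y_{a\cdot},
\qquad
\sum_{a} \hat{Y}_{ac} = Y_{\cdot c}

\log \hat{Y}_{ac} = \log x_{ac} + \lambda_a + \gamma_c,
\qquad
\lambda_a = \log \alpha_a, \quad \gamma_c = \log \beta_c

\frac{\hat{Y}_{ac}\,\hat{Y}_{a'c'}}{\hat{Y}_{ac'}\,\hat{Y}_{a'c}}
  = \frac{x_{ac}\,x_{a'c'}}{x_{ac'}\,x_{a'c}}
```

Writing SPREE in this form is what makes the broader Generalised Linear Model family available: for example, the association structure of the auxiliary table can be allowed to enter with an estimated coefficient rather than being carried over unchanged.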
Team members:
- Massey University
- Ministry of Social Development
- Statistics New Zealand.
8. Improvements in the validity of ICD-based Injury Severity Scores (ICISS) on which NZ Injury Prevention Strategy official statistical indicators are based
Many Government agencies use statistics to measure their performance in reducing injury over time, and Statistics NZ is also establishing a new programme of injury statistics. Research suggests that the indicators most Government departments currently use to monitor trends in non-fatal injury are potentially misleading, because they cannot remove service-provision and service-access effects from the underlying trend data (Langley J, Stephenson S, Cryer C. Measuring road traffic safety performance: monitoring trends in nonfatal injury. Traffic Injury Prevention 2003; 4: 291-296). Valid measurement of injury severity is critical to producing valid indicators, and to producing valid information from the analysis of injury data to inform policy and injury prevention practice. This project aims to investigate improvements in the measurement of injury severity using an enhancement of the International Classification of Diseases (ICD)-based Injury Severity Score (ICISS). ICISS is a threat-to-life severity score that can be calculated from routinely collected data. The project will address two questions: can the predictive ability of ICISS be enhanced by calculating ICISS scores from integrated hospitalisation and mortality data sources, and does taking account of comorbidity improve the predictive ability of ICISS scores? Logistic regression models with ICISS as the predictor variable and survival as the outcome variable will be used to assess the performance of the ICISS severity scores, in terms of both discrimination and calibration. The main outcome of this project will be either a more valid measure of injury severity or assurance that the current method of measuring injury severity is satisfactory.
Addressing the two aims of this research will enable the production of more trustworthy injury statistics for measuring the impact of policy and practice on reducing injury in New Zealand.
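ICISS is usually described as follows. Each ICD injury diagnosis code is given a survival risk ratio (SRR), the proportion of patients with that code who survive, and a patient's ICISS is the product of the SRRs of their injury diagnoses. Some studies instead use only the patient's lowest SRR. The sketch below assumes the multiplicative form and uses hypothetical codes and patients. It also estimates SRRs from the same records it scores, whereas in practice they would come from a reference data set. A logistic regression with ICISS as its only predictor is a monotone function of ICISS when the coefficient is positive, so the model's discrimination (the c-statistic, or area under the ROC curve) can be computed directly from the scores. Calibration needs the fitted probabilities and is not shown.

```python
from collections import defaultdict
from math import prod


def survival_risk_ratios(patients):
    """SRR per diagnosis code: survivors with the code / all patients with the code.

    patients: list of (diagnosis_codes, survived) pairs.
    """
    seen = defaultdict(int)
    alive = defaultdict(int)
    for codes, survived in patients:
        for code in set(codes):
            seen[code] += 1
            alive[code] += int(survived)
    return {code: alive[code] / seen[code] for code in seen}


def iciss(codes, srr):
    """Multiplicative ICISS: product of the SRRs of a patient's distinct diagnoses."""
    return prod(srr[code] for code in set(codes))


def c_statistic(scores, survived):
    """Share of (survivor, death) pairs where the survivor scores higher; ties count 1/2."""
    pos = [s for s, y in zip(scores, survived) if y]
    neg = [s for s, y in zip(scores, survived) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical patients with made-up diagnosis codes "A", "B" and "C".
    patients = [(["A"], False), (["A", "B"], True), (["B"], True), (["B"], True),
                (["A"], True), (["C"], True), (["A", "C"], False)]
    srr = survival_risk_ratios(patients)
    scores = [iciss(codes, srr) for codes, _ in patients]
    print(srr, scores, c_statistic(scores, [s for _, s in patients]))
```

The project's two questions correspond to changing how the SRRs are estimated (from integrated hospitalisation and mortality data) and adding comorbidity to the model. In both cases the c-statistic and calibration would be compared with those of the current scores.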
Team members:
- Injury Prevention Research Unit, University of Otago
- Statistics New Zealand.
9. Utilisation of Official Statistics in the Auckland Region
The conceptual approach adopted is a comparative organisational study of information and knowledge that would seek to trace the spatial and temporal pathways of official social and economic statistics from data collection, processing and aggregation into the different phases of their use in particular user organisations. Empirically, the study would investigate in detail the conceptual and logistical frameworks within which various user communities in the Auckland region visualise and use official statistics and other related information sources. By relating these users’ statistical practices to their goals and aspirations, the study would also derive an understanding of their data needs. The minimum design is to interview a small but adequate sample of people at several levels of each of 4-5 of the 7 territorial local authorities (TLAs) in the Auckland region, and of the Auckland Regional Council (ARC). A subsequent research interest is to round out the regional ‘statistical’ picture by later extending the project to District Health Boards (DHBs), the Auckland regional offices of Ministries, and commercial and voluntary sector users.
Team members:
- Auckland University of Technology
- Statistics New Zealand
10. Investigation of User Requirements to Improve Wage Measures
This project will research and assess user requirements in order to improve the understanding and use of wage measures and, where appropriate, to propose a wider, enhanced series of wage measures for short- and long-term action. Its ultimate aim is to improve the quality and effectiveness of wage measures, ensuring user confidence and the continued relevance of these outputs in policy and decision making.
Team members:
- Infometrics Consulting
- Reserve Bank of NZ
- Department of Labour
- Statistics New Zealand.