Data Processing & Data Analysis
Data Processing Requirements
- A nutrient composition database is required to translate reported foods, beverages, and supplements into nutrient intakes (Learn More about Food Composition Databases for Food Frequency Questionnaires and Screeners). A database such as the Food Patterns Equivalents Database (FPED) is required to translate reported foods and beverages into amounts of guidance-based food groups (e.g., dark-green vegetables, whole grains, and added sugars). For more information, read a factsheet on FPED products and associated data files or its application to dietary analysis.
- Software specific to each FFQ is required to derive nutrient and food group intakes from the respondent-reported information (Learn More about Software for Food Frequency Questionnaires and Screeners).
- Many FFQs include algorithms and software to automate the calculation of estimated daily intakes of nutrients and/or food groups.
- Daily nutrient or FPED estimates should be examined to identify outliers (Learn More about Outliers). Although outliers may indicate coding or processing errors, which should be identified and fixed if possible, it is more likely they indicate reporting errors, which may be addressed in the analysis.
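One way to screen daily estimates for outliers is sketched below. It is an illustration only, not a rule prescribed by any FFQ: the function name `flag_outliers`, the choice of log-scale interquartile-range fences, and the multiplier `k=2.0` are all assumptions made for this example. Flagged values are candidates for review, not automatic exclusions.

```python
import math
import statistics

def flag_outliers(values, k=2.0):
    """Mark values that fall outside IQR fences computed on the log scale.

    values: strictly positive daily intake estimates (e.g., kcal/day).
    k: hypothetical fence multiplier; larger values flag fewer records.
    Returns a list of booleans aligned with `values`.
    """
    if any(v <= 0 for v in values):
        raise ValueError("values must be strictly positive for a log transform")
    logs = [math.log(v) for v in values]
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(logs, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [not (low <= x <= high) for x in logs]

# Made-up daily energy estimates (kcal/day) for ten respondents.
energy = [1800, 2100, 1950, 2300, 2050, 1700, 2200, 9500, 2000, 1900]
flags = flag_outliers(energy)
suspect = [e for e, f in zip(energy, flags) if f]
print(suspect)
```

Working on the log scale reflects the right-skewed shape of intake data, so a high but plausible intake is less likely to be flagged than it would be with fences on the raw scale.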
Data Analysis Considerations
For more details on the following issues when considering whether to use an FFQ to answer a particular research question, see Choosing an Approach for Dietary Assessment.
General Considerations
- Strategies for handling missing information in the frequency and/or portion size questions should be applied consistently.
- Missing frequency information can be handled with several possible imputation methods, including assumption of zero intake, single imputation with the population's mode or median value, and model-based imputation [15-16].
- Missing portion size information is generally handled by imputation of standard mean or median values specific to an FFQ.
- Strategies for handling missing frequency and portion size information have been developed for the NCI Diet History Questionnaire and are implemented in its software.
- Nutrition data often have skewed distributions (truncated at zero and skewed toward high intakes). Therefore, transformation of nutrient data is usually required for statistical testing and modeling. Log transformation is often used, but other transformations that may better approximate normal distributions should be considered.
- Beyond correcting obvious coding or processing errors, extreme values, or outliers, should be examined and accounted for. Outliers can be handled in various ways in analyses.
- Some analyses use energy-adjusted dietary intake variables (e.g., percentage energy from fat). Energy adjustment is beneficial when underreporting of energy is expected, or when the density rather than absolute amount of a food/nutrient is of biological interest. Because of systematic error in FFQs, energy adjustment is particularly useful.
- Depending on the study design, nuisance effects, such as mode of administration and sequence, can be taken into consideration through modeling.
- If the FFQ has been administered more than once, within-person random error can be corrected by statistical modeling.
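Three of the considerations above (single imputation of missing frequencies, log transformation, and energy adjustment as a nutrient density) can be sketched as small helpers. This is a minimal illustration under stated assumptions: the function names, the `offset` added before taking logs, and the sample data are hypothetical, and model-based imputation is not shown. The only fixed constant is the standard factor of 9 kcal per gram of fat.

```python
import math
import statistics

def impute_missing_frequency(freqs, method="median"):
    """Fill None entries in one food item's frequency responses (times/day).

    method: "zero" assumes no intake; "median" and "mode" use the values
    observed among respondents who answered the item.
    """
    observed = [f for f in freqs if f is not None]
    if method == "zero":
        fill = 0.0
    elif method == "median":
        fill = statistics.median(observed)
    elif method == "mode":
        fill = statistics.mode(observed)
    else:
        raise ValueError(f"unknown method: {method}")
    return [fill if f is None else f for f in freqs]

def log_transform(intakes, offset=1.0):
    """Natural log of (intake + offset); the offset keeps zero intakes defined."""
    return [math.log(x + offset) for x in intakes]

def percent_energy_from_fat(fat_g, energy_kcal):
    """Nutrient density: share of total energy supplied by fat (9 kcal/g)."""
    return 100.0 * 9.0 * fat_g / energy_kcal

# Made-up responses for one food item; None marks a skipped question.
freqs = [1.0, None, 0.5, 2.0, None, 0.5]
filled = impute_missing_frequency(freqs, "median")
print(filled)
print(percent_energy_from_fat(fat_g=70, energy_kcal=2100))
```

Whichever imputation rule is chosen, the same rule should be applied to every respondent so that the handling of missing data stays consistent across the dataset.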
Guidance for Specific Research Objectives
- If your research objective is solely to estimate the mean intakes of a group, and you have conducted an internal calibration sub-study using a less biased instrument, statistical adjustment can be performed to reduce bias in data from FFQs. Alternatively, data from an external source (called an external calibration study) can be used. Energy adjustment also should be applied to reduce bias (Learn More about Calibration).
- If your research objective is to estimate the usual dietary intake distributions for a group (for example, for the purpose of examining percentiles or estimating the proportion above or below some threshold), distributions estimated from an FFQ (or a screener) are narrower than true distributions (Learn More about Usual Dietary Intake). Thus, prevalence estimates in the tails of the distribution are biased. However, procedures that may correct for this bias have been developed. They use information from an internal calibration sub-study in which 24-hour recalls (24HR), food records, or recovery biomarkers are collected. Alternatively, data from an external source (called an external calibration study) can be used. More research is needed to test these new methods.
- If your research objective is to analyze the association between diet as an independent variable and another variable (e.g., diet at baseline and later onset of cancer), analysis of energy-adjusted values is thought to mitigate some of the bias inherent in an FFQ (Learn More about Energy Adjustment). If you have conducted an internal calibration sub-study, the resulting regression calibration equations can be applied to FFQ estimates and used in the analyses, which may lead to greater precision in the estimates of the associations (Learn More about Regression Calibration). Alternatively, data from an external source (called an external calibration study) can be used.
- If your research objective is to analyze the association of an independent variable (e.g., socioeconomic status) and diet as the dependent variable, variables known to affect quality of reporting (e.g., body mass index) should be included as covariates in analyses. If you have conducted an internal calibration sub-study using less biased measures, such as 24-hour recalls or recovery biomarkers, statistical techniques can be used to improve FFQ estimates. Alternatively, data from an external calibration study can be used.
- If your research objective is to analyze changes in diet as a result of an intervention (e.g., to evaluate the effectiveness of an educational program to encourage fruit and vegetable intake), objective data alone (e.g., biomarkers) may yield results with the least bias. If less biased data are available from an internal calibration sub-study, regression calibration equations should be estimated for each treatment group and, if relevant, each time period, and then applied to the FFQ estimates. This calibration would yield less bias in the means. However, differential response bias still may be problematic. If social desirability questions have also been collected, the resulting score may be useful to at least partially control for differential response bias (Learn More about Social Desirability).
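The idea of estimating a calibration equation in a sub-study and applying it to FFQ estimates can be sketched as follows. This is a deliberately simplified illustration: it fits a one-predictor least-squares line of the reference measure on the FFQ value, separately for each treatment group, whereas calibration models used in practice typically include covariates and more careful error modeling. The function names, the record layout, and the numbers are all hypothetical.

```python
import statistics

def fit_calibration(ffq, reference):
    """Least-squares fit of reference = intercept + slope * ffq."""
    mean_x = statistics.fmean(ffq)
    mean_y = statistics.fmean(reference)
    sxx = sum((x - mean_x) ** 2 for x in ffq)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ffq, reference))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def fit_by_group(records):
    """Fit one calibration equation per treatment group.

    records: iterable of (group, ffq_value, reference_value) tuples taken
    from the calibration sub-study.
    """
    grouped = {}
    for group, ffq_value, ref_value in records:
        xs, ys = grouped.setdefault(group, ([], []))
        xs.append(ffq_value)
        ys.append(ref_value)
    return {g: fit_calibration(xs, ys) for g, (xs, ys) in grouped.items()}

def apply_calibration(ffq, intercept, slope):
    """Replace each FFQ estimate with its calibrated (predicted) value."""
    return [intercept + slope * x for x in ffq]

# Made-up sub-study data: (group, FFQ estimate, reference measure).
substudy = [
    ("control", 10, 14), ("control", 20, 18),
    ("control", 30, 22), ("control", 40, 26),
    ("intervention", 10, 12), ("intervention", 20, 18),
    ("intervention", 30, 24), ("intervention", 40, 30),
]
equations = fit_by_group(substudy)
intercept, slope = equations["control"]
calibrated = apply_calibration([25, 35], intercept, slope)
print(equations)
print(calibrated)
```

Fitting the groups separately lets the calibration equation differ between arms, which matters when the intervention itself changes how participants report their diet.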