Data Processing & Data Analysis
Data Processing Requirements
- Each food and beverage reported on a 24HR requires coding and data processing. Many 24HR systems have automated coding.
- A nutrient composition database is required to translate foods, beverages, and supplements reported into nutrient intakes (Learn More about Food Composition Databases for 24-hour Dietary Recalls and Food Records). A database, such as the Food Patterns Equivalents Database (FPED), is required to translate foods and beverages reported into amounts of guidance-based food groups (e.g., dark-green vegetables, whole grains, and added sugars).
- Software is required to derive nutrient estimates for each food, beverage, and supplement (Learn More about Software for 24-hour Dietary Recalls and Food Records).
- Daily nutrient and/or FPED estimates should be examined to identify outliers. Outliers sometimes indicate coding or processing errors that should be fixed.
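As one possible way to screen daily totals for review, the sketch below flags values that fall far outside the interquartile range on the log scale. The function name `flag_outliers`, the fence multiplier `k=3.0`, and the energy values are illustrative assumptions, not a recommended standard; flagged days still need to be checked against the original recall before anything is corrected.

```python
import math
import statistics

def flag_outliers(daily_values, k=3.0):
    """Return indices of daily totals that warrant manual review.

    A value is flagged if it is non-positive, or if its natural log lies
    more than k interquartile ranges beyond the quartiles of the logged data.
    The multiplier k is an assumed, adjustable screening threshold.
    """
    logs = [math.log(v) for v in daily_values if v > 0]
    q1, _, q3 = statistics.quantiles(logs, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    flagged = []
    for i, v in enumerate(daily_values):
        if v <= 0 or not (low <= math.log(v) <= high):
            flagged.append(i)
    return flagged

# Hypothetical daily energy intakes (kcal), one value per recall day.
energy_kcal = [1800, 2100, 1950, 2400, 2250, 1700, 2000, 21000, 2300, 1900]
suspect_days = flag_outliers(energy_kcal)
print(suspect_days)
```

Working on the log scale keeps the screen from being dominated by the right skew typical of intake data. A flagged value such as the 21000 kcal day here may reflect a portion-size or coding error, but it may also be a true intake.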
Data Analysis Considerations
For greater detail on the following issues when considering whether to use a 24HR to answer a particular research question, see Choosing an Approach for Dietary Assessment.
General Considerations
- Nutrition data often have skewed distributions (truncated at zero and skewed toward high intakes). Therefore, transformation of nutrient data is usually required for statistical testing and modeling. Log transformation is often used, but other transformations that may better approximate normal distributions should be considered.
- Beyond correcting obvious coding or processing errors, extreme values, or outliers, should be examined and accounted for. Outliers can be handled in various ways in analyses (Learn More about Outliers).
- Some analyses use energy-adjusted dietary intake variables (e.g., percentage of energy from fat). Energy adjustment is particularly useful when overall underreporting of energy is anticipated, or when the density rather than absolute amount of a food/nutrient is of biological interest (Learn More about Energy Adjustment).
- Depending on the study design, nuisance effects, such as day of the week for each intake day (Learn More about Day-of-Week Effect), season in which the 24HR is administered (Learn More about Season Effect), sequence, and mode of administration, can be taken into consideration through modeling.
- Some surveys, particularly cross-sectional surveys, include sampling weights that are meant to allow generalization to the population from which the sample was drawn. Sampling weights account for both sampling design and response rates. Thus, their use is recommended for analyses that estimate the means of the population. Their use in analyses of relationships between diet and other factors is less settled among statistical experts, and, therefore, it may be advisable to compare the consistency of results between weighted and unweighted analyses.
- If recovery biomarkers are collected in a subsample of participants (called an internal calibration sub-study) at relevant time points, they can be used as reference instruments in calibration to correct for measurement error (or bias) for some nutrients. Alternatively, data from an external source (called an external calibration study) can be used (Learn More about Calibration).
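Three of the considerations above (transformation, energy adjustment by nutrient density, and sampling weights) can be illustrated with a small sketch. All function names and data values here are hypothetical, the factor of 9 kcal per gram of fat is the general Atwater factor, and a real survey analysis would also need design-based variance estimation, which this sketch does not attempt.

```python
import math
import statistics

def skewness(values):
    """Population skewness: third central moment over variance to the 3/2 power."""
    mean = statistics.fmean(values)
    m2 = statistics.fmean([(v - mean) ** 2 for v in values])
    m3 = statistics.fmean([(v - mean) ** 3 for v in values])
    return m3 / m2 ** 1.5

def percent_energy_from_fat(fat_g, energy_kcal):
    """Nutrient density: share of energy from fat, assuming 9 kcal per gram."""
    return 100.0 * (fat_g * 9.0) / energy_kcal

def weighted_mean(values, weights):
    """Mean with each respondent weighted by a sampling weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical right-skewed daily intakes of a nutrient (arbitrary units).
intakes = [5, 8, 10, 12, 15, 20, 30, 60, 150]
log_intakes = [math.log(v) for v in intakes]
print("skewness, raw:", round(skewness(intakes), 2))
print("skewness, log:", round(skewness(log_intakes), 2))

# Energy adjustment for one hypothetical intake day.
print("% energy from fat:", percent_energy_from_fat(fat_g=70, energy_kcal=2000))

# Weighted versus unweighted mean for two hypothetical respondents.
print("unweighted mean:", statistics.fmean([10, 20]))
print("weighted mean:", weighted_mean([10, 20], [1, 3]))
```

Comparing skewness before and after transformation is one quick check on whether the log scale is adequate or another transformation should be tried. Comparing weighted with unweighted estimates mirrors the consistency check suggested above for analyses of diet relationships.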
Guidance for Specific Research Objectives
- If your objective is solely to estimate the mean intake of a group, how to define the mean of a proportion or ratio in a way appropriate to the objective requires further consideration. For example, data from multiple repeated administrations of 24HRs can be used to estimate average per-person or population ratios and proportions (Learn More about Ratios and Proportions). In addition, when estimating a group mean from studies that have collected multiple 24HRs, it is unnecessary to adjust for within-person random error.
- If your objective is to estimate usual dietary intake distributions for a group (e.g., to examine percentiles or to estimate the proportion above or below some threshold), clarification of whether the focus is on habitual intake over the long run or intake on a given day (i.e., acute intake) is required (Learn More about Usual Dietary Intake).
  - If your focus is the distribution of habitual intake over the long run, statistical modeling must be conducted to account for day-to-day variation, a source of within-person random error in intake, using the repeat administrations collected from the sample or subsample.
  - If your focus is intake on a given day for a group (i.e., acute intake), then distributions of dietary intake, including the mean, can be estimated without considering day-to-day variation.
Several methods have been developed to appropriately analyze 24HR data to estimate usual distributions of intake, including the National Cancer Institute (NCI) method [16-18].
- If your research objective is to analyze the association between diet as an independent variable and another variable (e.g., between diet at baseline and onset of cancer), and the 24HR is the main instrument, statistical modeling of data from multiple administrations of the 24HRs will account for within-person random error, allowing greater precision in the estimates of associations, and thus increasing statistical power.
- If your research objective is to analyze the association of an independent variable (e.g., socioeconomic status) and diet as the dependent variable, statistical modeling to remove within-person random error is not necessary. However, averaging 24HRs across multiple days may increase the precision of the diet estimate and thus the statistical power to detect associations. In addition, variables known to affect quality of report (e.g., body mass index) should be included as covariates in analyses.
- If your research objective is to analyze a change in diet as a result of an intervention, the potential for differential response bias must be considered. To avoid the effects of this potential bias, an objective measure, such as measurement of serum carotenoids as a marker for fruit and vegetable intake, could be considered. In such a case, a 24HR may be used as a secondary source of information.
- In intervention studies in which an objective instrument, such as a biomarker, has been used as the main instrument and 24HRs have been used as a secondary instrument, the agreement between the objective instrument and the 24HRs should be examined. If there is substantial agreement, the 24HRs may be useful for analyses of additional dietary factors beyond those measured by the objective instrument.
- In intervention studies in which 24HRs are used as the main instrument, variables known to be related to reporting accuracy (e.g., body mass index) and to differential response bias (e.g., social desirability score) should be included as covariates.
- Respondent burden may cause attrition. The extent and nature of this attrition must be considered in the analyses, for example, by comparing the objective measures and the self-reported diets at baseline of those who completed the study to those who dropped out.
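To see why day-to-day variation matters when estimating a usual intake distribution, the following sketch separates within-person from between-person variance using repeat 24HRs and then shrinks each person's mean toward the group mean. This is a deliberately simplified variance-components illustration, not the NCI method; it assumes the same number of repeats per person, data already on an approximately normal scale, and hypothetical values and function names throughout.

```python
import statistics

def variance_components(recalls):
    """Estimate (between, within, observed) variances from balanced repeats.

    recalls: list of per-person lists, each holding k >= 2 repeat values.
    observed is the variance of person means, which is inflated by within / k.
    """
    k = len(recalls[0])
    if k < 2 or any(len(r) != k for r in recalls):
        raise ValueError("each person needs the same number (>= 2) of recalls")
    person_means = [statistics.fmean(r) for r in recalls]
    within = statistics.fmean([statistics.variance(r) for r in recalls])
    observed = statistics.variance(person_means)
    between = max(observed - within / k, 0.0)
    return between, within, observed

def shrink_person_means(recalls):
    """Pull person means toward the group mean so their variance equals between."""
    between, _, observed = variance_components(recalls)
    person_means = [statistics.fmean(r) for r in recalls]
    grand_mean = statistics.fmean(person_means)
    factor = (between / observed) ** 0.5 if observed > 0 else 0.0
    return [grand_mean + factor * (m - grand_mean) for m in person_means]

# Hypothetical energy intakes (kcal): four people, two recall days each.
recalls_kcal = [[1800, 2200], [2500, 2300], [1500, 1900], [3000, 2600]]
between, within, observed = variance_components(recalls_kcal)
adjusted = shrink_person_means(recalls_kcal)
print("variance of 2-day means:", round(observed))
print("estimated between-person variance:", round(between))
print("adjusted person means:", [round(m) for m in adjusted])
```

The group mean is unchanged by the adjustment, which is consistent with the point above that mean intake can be estimated without modeling within-person error. Only the spread of the distribution narrows, and with it the estimated proportion above or below a threshold.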