Data Processing & Data Analysis
Data Processing Requirements
- Each food and beverage reported in a food record requires coding and data processing. Poor handwriting, missing information, and inconsistent reporting from individual to individual are common problems.
- A nutrient composition database is required to translate foods and beverages reported on the food record into nutrient intakes (Learn More about Food Composition Databases for 24-hour Dietary Recalls and Food Records). A database, such as the Food Patterns Equivalents Database (FPED), is required to translate foods and beverages reported into amounts of guidance-based food groups (e.g., dark-green vegetables, whole grains, and added sugars).
- Software is required to derive nutrient estimates for each food, beverage, and supplement (Learn More about Software for 24-hour Dietary Recalls and Food Records).
- Daily nutrient and/or FPED estimates should be examined to identify outliers (Learn More about Outliers). Outliers sometimes indicate coding or processing errors that should be fixed.
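As a rough illustration of the outlier screen described above, the sketch below flags daily values that fall far outside the bulk of the data on the log scale. The fence rule, the multiplier `k=3.0`, the function name `flag_outliers`, and the sample values are all illustrative assumptions, not recommendations from this guidance. Flagged values should be checked against the original record rather than deleted automatically.

```python
import math
import statistics

def flag_outliers(daily_values, k=3.0):
    """Return indices of values outside quartile-based fences on the log scale.

    Assumptions: every value is positive, and k=3.0 is an illustrative
    multiplier rather than a recommended cutoff.
    """
    logs = [math.log(v) for v in daily_values]
    q1, _, q3 = statistics.quantiles(logs, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, x in enumerate(logs) if x < low or x > high]

# Hypothetical daily energy intakes (kcal); 21500 mimics a keying error for 2150.
energy_kcal = [1850, 2100, 1990, 2300, 21500, 1750, 2050, 2200]
flagged = flag_outliers(energy_kcal)
print(flagged)  # index positions to review against the original food record
```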
Data Analysis Considerations
General Considerations
- Nutrition data often have skewed distributions (truncated at zero and skewed toward high intakes). Therefore, transformation of nutrient data is usually required for statistical testing and modeling. Log transformation is often used, but other transformations that may better approximate normal distributions should be considered.
- Beyond correcting obvious coding or processing errors, any remaining extreme values (outliers) should be examined and accounted for. Outliers can be handled in various ways in analyses.
- Some analyses use energy-adjusted dietary intake variables (e.g., percentage of energy from fat). Energy adjustment is particularly useful when overall underreporting of energy is anticipated, or when the density rather than absolute amount of a food/nutrient is of biological interest (Learn More about Energy Adjustment).
- Depending on study design, analyses should use modeling to account for nuisance effects, such as the day of the week of each intake day (Learn More about Day-of-Week Effect) and the season in which the record is completed (Learn More about Season Effect).
- Some surveys, particularly cross-sectional surveys, include sampling weights that are meant to allow generalization to the population from which the sample was drawn. Sampling weights account for both sampling design and response rates, so their use is recommended for analyses that estimate population means. Their use in analyses of relationships between diet and other factors is less settled among statistical experts; it may therefore be advisable to compare results from weighted and unweighted analyses for consistency.
- If recovery biomarkers are collected in a subsample of participants (called an internal calibration sub-study) at relevant time points, they can be used as reference instruments in regression calibration to correct for measurement error (or bias) for some nutrients (Learn More about Regression Calibration). Alternatively, data from an external source (called an external calibration study) can be used (Learn More about Calibration).
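Two of the considerations above, transformation and energy adjustment, can be sketched in a few lines. The example below shows a log transformation, a nutrient density (percentage of energy from fat, using the standard 9 kcal per gram of fat), and the residual method of energy adjustment. The residual method is one common approach and is an assumption here, since this text does not prescribe a specific adjustment. The function names and sample values are hypothetical, and zero intakes would need separate handling before taking logs.

```python
import math
from statistics import fmean

def percent_energy_from_fat(fat_g, energy_kcal):
    """Nutrient density: share of energy from fat (9 kcal per gram of fat)."""
    return 100.0 * 9.0 * fat_g / energy_kcal

def residual_adjust(nutrient, energy):
    """Residual-method energy adjustment using simple least squares.

    Each adjusted value is the regression residual plus the predicted
    nutrient intake at mean energy, which simplifies to
    nutrient - slope * (energy - mean energy).
    """
    mean_e = fmean(energy)
    mean_n = fmean(nutrient)
    sxx = sum((e - mean_e) ** 2 for e in energy)
    sxy = sum((e - mean_e) * (n - mean_n) for e, n in zip(energy, nutrient))
    slope = sxy / sxx
    return [n - slope * (e - mean_e) for e, n in zip(energy, nutrient)]

# Hypothetical daily totals for five participants.
fat_g = [55.0, 70.0, 95.0, 62.0, 120.0]
energy_kcal = [1600.0, 2100.0, 2600.0, 1900.0, 3100.0]

log_fat = [math.log(x) for x in fat_g]  # assumes all intakes are positive
fat_density = [percent_energy_from_fat(f, e) for f, e in zip(fat_g, energy_kcal)]
fat_adjusted = residual_adjust(fat_g, energy_kcal)
```

The adjusted values keep the original mean but are uncorrelated with energy intake in the sample, which is the property that makes the residual method useful when total energy would otherwise confound a comparison.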
Guidance for Specific Research Objectives
- If your research objective is solely to estimate the mean intakes of a group, defining the mean for a proportion or ratio appropriate to the particular objective requires further consideration. For example, data from repeated administrations of food records can be used to estimate either the average of per-person ratios and proportions or population ratios and proportions (Learn More about Ratios and Proportions). In addition, for studies that have collected multiple records, it is unnecessary to adjust for within-person random error.
- If your research objective is to estimate usual dietary intake distributions for a group (e.g., to examine percentiles or to estimate the proportion above or below some threshold), clarification of whether the focus is to assess the distribution of habitual intake over the long run (Learn More about Usual Dietary Intake) or of single-day intake is required.
  - If your focus is the distribution of habitual intake for a group over the long run, statistical modeling must be conducted to account for day-to-day variation in intakes over the various administrations of multiple or n-day records. (Note that the repeat administration(s) can be done on a subsample rather than the entire sample.)
  - If your focus is the distribution of intake on a given day for a group, then distributions of dietary intake, including the mean, can be estimated without considering day-to-day variation.
- If your research objective is to analyze the association between diet as an independent variable and some other variable (e.g., diet at baseline and onset of cancer), and the food record is your main instrument, statistical modeling of data from multiple administrations of the food records will account for day-to-day variation. This allows greater precision in the intake estimates, and thus in their estimated associations with health outcomes, and increases statistical power.
- If your research objective is to analyze the association of an independent variable (e.g., socioeconomic status) and diet as the dependent variable, statistical modeling to remove within-person random error is not necessary. However, averaging food records across administrations may increase the precision of the diet estimate and thus the statistical power to detect associations. In addition, variables known to affect quality of report (e.g., body mass index) should be included as covariates in analyses.
- If your research objective is to analyze a change in diet as a result of an intervention, food records are less desirable because of the reactivity bias inherent in the method (the act of recording can itself change what participants eat). Food records can be useful, however, to motivate and monitor participants trying to adhere to a dietary intervention.
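The distinction drawn in the first objective above, between the average of per-person ratios and a population ratio, is easy to see numerically. The sketch below averages each participant's records across days and then computes percentage of energy from fat both ways. The participant IDs, intake values, and variable names are hypothetical.

```python
from statistics import fmean

# Hypothetical records: (fat in grams, energy in kcal) for each recorded day.
records = {
    "p1": [(60, 1800), (80, 2200)],
    "p2": [(90, 2500), (70, 2300)],
    "p3": [(40, 1500), (50, 1700)],
}

def person_means(days):
    """Average a participant's fat and energy across their recorded days."""
    fat = fmean([d[0] for d in days])
    kcal = fmean([d[1] for d in days])
    return fat, kcal

means = [person_means(days) for days in records.values()]

# Population ratio: group mean fat energy divided by group mean energy.
pop_ratio = 100 * 9 * fmean([f for f, _ in means]) / fmean([k for _, k in means])

# Mean of per-person ratios: compute each person's ratio first, then average.
mean_ratio = fmean([100 * 9 * f / k for f, k in means])

print(round(pop_ratio, 2), round(mean_ratio, 2))
```

The two quantities generally differ, so the one that matches the research question should be chosen before analysis.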
Several methods have been developed to appropriately analyze 24-hour dietary recall (24HR) data to estimate usual distributions of intake, including the National Cancer Institute (NCI) method [16-18]. This method also may be appropriate for multiple n-day food records, in which each n-day record is considered a single administration, although more research is needed to apply the NCI method to food record data and to evaluate its performance.
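To show why day-to-day variation matters for usual intake distributions, the sketch below applies a simple variance-components shrinkage to per-person means. This is explicitly not the NCI method. It is a much simpler stand-in that assumes balanced data (the same number of administrations per person), approximately normal intakes (in practice, after a suitable transformation), and no covariates. The function name and data are hypothetical.

```python
import math
from statistics import fmean, variance

def shrink_person_means(intakes):
    """Shrink per-person means so their spread reflects between-person variance.

    intakes: list of per-person lists, each holding the same number (>= 2)
    of administrations. The variance of raw person means equals
    between-person variance plus within-person variance / n, so raw means
    are too spread out as estimates of usual intake.
    """
    n_admin = len(intakes[0])
    person_means = [fmean(days) for days in intakes]
    grand_mean = fmean(person_means)
    within_var = fmean([variance(days) for days in intakes])
    var_of_means = variance(person_means)
    between_var = max(var_of_means - within_var / n_admin, 0.0)
    factor = math.sqrt(between_var / var_of_means) if var_of_means > 0 else 0.0
    return [grand_mean + factor * (m - grand_mean) for m in person_means]

# Hypothetical fat intakes (g) from two administrations for four participants.
intakes = [[50, 70], [80, 100], [60, 90], [110, 95]]
usual = shrink_person_means(intakes)
```

The shrunken values have the same mean as the raw person means but a narrower spread, so tail percentiles and the proportion above or below a threshold are less exaggerated than they would be with unadjusted means.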