Validation

Validation is a critical concept when determining whether a dietary assessment instrument is suitable for a particular research question. This concept involves several aspects: evaluating the validity of an instrument, understanding characteristics of those who misreport on an instrument, and considering what type of measurement error is present in the instrument.

Evaluating Validity

The validity of screeners has been evaluated in several ways (see Key Concepts About Validation). The strongest class of validity studies is based on objective recovery biomarkers.

  • Salient features of studies with recovery biomarkers:
    • Recovery biomarkers are ideal for validation because the intake of the dietary component is reflected by the biomarker in a relatively constant and known manner. Recovery biomarkers thus provide unbiased estimates of true intake.
    • Known recovery biomarkers are doubly labeled water (DLW) for energy intake, urinary nitrogen for protein intake, urinary potassium for potassium intake, and urinary sodium for sodium intake.
    • Few studies have evaluated screeners against recovery biomarkers, as most screeners assess dietary factors for which biomarkers are not known.
  • Results of evaluation studies using screeners and biomarkers:

A second class, comprising most validation studies, examines screener performance relative to other self-report instruments, such as 24HRs.

  • Salient features of comparative studies:
    • Comparative validation studies administer two or more self-report dietary instruments to the same population, often for the same or overlapping time periods.
    • Comparative validation is imperfect, as no self-report instrument represents true intake. Although individual comparative validity studies may be useful, for example, to learn whether two different instruments produce comparable results, no overall judgment about screener validity can be made from this type of study.
    • Another weakness in comparative validation is that errors in the two instruments are likely to be correlated, which overstates their agreement.
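
The effect of correlated errors on apparent agreement can be illustrated with a small simulation. This is only a sketch under simplifying assumptions that do not come from any of the studies cited here: true intake is standardized, each instrument adds its own independent error, and both instruments may also share a person-specific error term (for example, a tendency to misreport in the same direction on every self-report). The function name `simulate` and all parameter values are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def simulate(shared_error_sd, n=5000, seed=0):
    """Simulate a screener and a 24HR that share a person-specific error.

    Returns (correlation between the two instruments,
             correlation of the screener with true intake).
    All distributions and variances are assumptions for illustration.
    """
    rng = random.Random(seed)
    truth, screener, recall = [], [], []
    for _ in range(n):
        t = rng.gauss(0.0, 1.0)                   # true intake (standardized)
        shared = rng.gauss(0.0, shared_error_sd)  # error common to both instruments
        screener.append(t + shared + rng.gauss(0.0, 1.0))  # plus instrument-specific error
        recall.append(t + shared + rng.gauss(0.0, 1.0))
        truth.append(t)
    return pearson(screener, recall), pearson(screener, truth)


if __name__ == "__main__":
    for sd in (0.0, 1.0):
        agreement, validity = simulate(shared_error_sd=sd)
        print(f"shared error SD={sd}: instruments agree r={agreement:.2f}, "
              f"screener vs. truth r={validity:.2f}")
```

Under these assumptions, the expected correlation between the two instruments rises from 0.50 with independent errors to about 0.67 with a shared error, while the expected correlation of the screener with true intake falls from about 0.71 to about 0.58. Agreement between the instruments improves even as the screener tracks true intake less well.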

Many different screeners have been developed for different purposes and different populations. These vary widely by format, items, and length. Some research has compared the performance of different screeners in the same population.

  • Results of studies comparing different screeners:
    • In one comparative validity study using multiple 24HRs over a year as the reference instrument, two different fruit and vegetable screeners performed similarly to each other and somewhat better than the National Cancer Institute's (NCI) Diet History Questionnaire, an FFQ [7].
    • Another study, using a 24HR and a concentration biomarker as reference instruments, found better performance of a longer 36-item fruit and vegetable screener compared to two shorter screeners [6].
    • To our knowledge, no research has compared different modes of screener administration in the same respondents.

NCI's Register of Validated Short Dietary Assessment Instruments includes descriptive information and links to abstracts for validation studies performed on screeners.

Understanding Misreporting

Misreporting on dietary assessment instruments can occur by either overreporting or underreporting intakes. Knowledge of who is likely to misreport, and in which direction, is useful in interpreting screener results (Learn More about Misreporting).

Only a few studies have examined misreporting on screeners. One comparative validation study, conducted within an intervention study design and using multiple 24HRs as the reference measure, found that the social desirability trait affected reports of percentage energy from fat among women but not among men, and did not affect reports of fruit and vegetable intake among either women or men [8]. Another study similarly found no effect of the social desirability trait on screener-derived fruit and vegetable intake [9] (Learn More about Reactivity and Learn More about Social Desirability).

Considering Measurement Error

Measurement error refers to the difference between the true value of a parameter, such as true sodium intake, and the value obtained from a particular measure, for example, sodium intake estimated from a screener (see Key Concepts About Measurement Error). There are two types of measurement error: random error and systematic error (bias).

The structure of measurement error in screeners has not been formally evaluated. For frequency-type screeners, we assume that the measurement error structure is similar to that of FFQs. The major type of measurement error for an FFQ is systematic error, arising from its limitations, which include incomplete or inappropriate food lists and the difficulty inherent in performing cognitively complex memory and averaging tasks. For behavioral screeners, we have no information about the measurement error structure.
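
The practical difference between systematic and random error can be sketched with a short simulation. It is a minimal illustration, not an evaluation of any actual screener: the true sodium intake, the size of the systematic bias, and the spread of the random error are all hypothetical values chosen for the example, and the function name `summarize_error` is ours.

```python
import random
import statistics


def summarize_error(n_repeats, bias=-300.0, random_sd=600.0,
                    true_intake=3400.0, n_people=2000, seed=1):
    """Average n_repeats reports per person and summarize the remaining error.

    Each report = true intake + systematic bias + random error (mg/day).
    Returns (mean error, SD of error) across people.
    All numeric values are hypothetical.
    """
    rng = random.Random(seed)
    errors = []
    for _ in range(n_people):
        reports = [true_intake + bias + rng.gauss(0.0, random_sd)
                   for _ in range(n_repeats)]
        errors.append(statistics.mean(reports) - true_intake)
    return statistics.mean(errors), statistics.stdev(errors)


if __name__ == "__main__":
    for k in (1, 4, 16):
        mean_err, sd_err = summarize_error(k)
        print(f"{k:2d} administration(s): mean error {mean_err:7.1f} mg/day, "
              f"SD of error {sd_err:6.1f} mg/day")
```

In this sketch, averaging more administrations shrinks the spread of the error, which reflects the random component, but the mean error stays near the assumed bias of -300 mg/day. Repeated measurement alone does not remove systematic error.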