The term "paradata" refers to a broad class of data elements produced during survey data collection, whether or not they are collected with an explicit analytic goal. Case dispositions, mode of data collection, time stamps, and keystroke files are all types of paradata, but this list is not exhaustive.

The term was coined by Michael P. Couper in a 1998 presentation at the Joint Statistical Meetings, although it does not appear in the published proceedings paper, "Measuring survey quality in a CASIC environment." [1]

That presentation and paper were set in the context of computer-assisted survey information collection (CASIC) and the data generated as a byproduct of using such systems. The concept, however, can include process data that are not computerized (e.g., paper contact histories) and that are not byproducts (e.g., intentional interviewer observations).

Paradata can be thought of as a type of auxiliary data that can be used to better understand survey errors and costs. In that sense, paradata are similar to administrative records and other auxiliary data.

Paradata should be distinguished from metadata. Paradata reflect the process of data collection (e.g., the number and type of contacts, miles traveled), while metadata often describe a static product (e.g., variable names and labels for a final data set). The dynamic/static distinction is not perfect, but it helps clarify the difference between paradata and metadata.


