Statistics, Biostatistics & Informatics: Errors in Sample Survey

There is no denying that the whole sampling procedure is liable to varying degrees of error at every stage of its operation. The total error involved in such an operation can broadly be classified into Sampling Error and Non-Sampling Error.

Sampling Error

Sampling error is always assessed with reference to the value of the population parameter. However carefully a sample is selected from a population, there will always be a difference between the population value and its corresponding estimate. When this difference is attributable to sampling, and no other cause can be assigned to it, it is termed sampling error.

A sampling error is usually measured in terms of the standard error of a particular statistic (e.g. mean, proportion, ratio etc.). If the sample of respondents has been selected as a simple random sample (SRS), a straightforward formula is available for computing the standard error; for the sample mean, for example, it is the square root of the sample variance divided by the sample size. On many occasions we use designs more complex than SRS, and consequently the measurement of the standard error also calls for more complex formulas. In this text, we have elaborated the discussion on estimating the standard error in the case of some commonly used designs.
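As a rough illustration of the SRS case, the sketch below estimates the standard error of a sample mean. The function name, the optional finite population correction and the household-size figures are assumptions made for this example only; they are not taken from any particular survey.

```python
import math


def srs_standard_error_of_mean(sample, population_size=None):
    """Estimated standard error of the sample mean under simple random sampling.

    If population_size is given, the finite population correction
    (1 - n/N) for sampling without replacement is applied.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(sample) / n
    # Sample variance with the usual n - 1 divisor
    s2 = sum((y - mean) ** 2 for y in sample) / (n - 1)
    fpc = 1.0
    if population_size is not None:
        fpc = 1.0 - n / population_size
    return math.sqrt(fpc * s2 / n)


# Made-up household sizes from a hypothetical sample of five households
household_sizes = [4, 6, 5, 3, 7]
se_without_fpc = srs_standard_error_of_mean(household_sizes)
se_with_fpc = srs_standard_error_of_mean(household_sizes, population_size=50)
```

The correction matters only when the sample is a sizeable fraction of the population; for small sampling fractions the two results are nearly the same.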

Non-Sampling Error

In practice, every operation of a survey is a potential source of non-sampling error. Such errors result from mistakes made in implementing data collection and data processing, such as failure to locate and interview the correct household, misunderstanding of the questions on the part of either the respondent or the interviewer, and data entry errors. While this suggests a multiplicity of sources of non-sampling error, we can group them into four broad categories as follows:

1. Non-Response Error

2. Measurement Error

3. Randomized Response

4. Processing Error

01. Non-Response Error: Non-response refers to the situation in which no data can be collected for one or more of the elements selected for the survey. An element may be an individual (respondent) or any other unit, such as a household. Non-response may arise because

(i) The respondents could not be contacted,

(ii) They could be contacted but refused to be interviewed, or

(iii) They were contacted and provided data, but the data elicited were of dubious quality and were therefore excluded from data processing.

Broadly speaking, non-response rates are relatively more pronounced in

(i) Mail surveys

(ii) Surveys dealing with sensitive issues and

(iii) Interview surveys with inadequately trained interviewers.

Since non-response can hardly be avoided, our aim should be to keep the non-response rate as low as possible. Here are some measures that are likely to contribute towards achieving a low non-response rate:

• Making the audience survey oriented

• Imparting training to the survey statisticians

• Imparting training to the survey interviewers

• Call backs and reminders

• Sub-sampling the non-respondents.

If people have a positive attitude towards, and an appreciation of, the use of statistics, they are likely to co-operate to a large extent, thereby raising the response rate.
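The last measure in the list above, sub-sampling the non-respondents, can be sketched as follows. This is a minimal illustration in the spirit of the Hansen-Hurwitz double sampling approach, under the assumption that every unit in the follow-up sub-sample responds; the function names and the figures are hypothetical.

```python
def response_rate(n_respondents, n_selected):
    """Share of selected elements for which usable data were obtained."""
    return n_respondents / n_selected


def mean_with_nonrespondent_subsample(respondent_values, followup_values,
                                      n_nonrespondents):
    """Estimate the population mean from first-phase respondents plus a
    followed-up sub-sample of the non-respondents.

    Each group mean is weighted by its share of the original sample, so
    the follow-up mean stands in for all non-respondents.
    """
    n1 = len(respondent_values)
    n2 = n_nonrespondents
    if n1 == 0:
        raise ValueError("need at least one first-phase respondent")
    if n2 > 0 and not followup_values:
        raise ValueError("need follow-up data when there are non-respondents")
    mean1 = sum(respondent_values) / n1
    mean2 = sum(followup_values) / len(followup_values) if n2 > 0 else 0.0
    return (n1 * mean1 + n2 * mean2) / (n1 + n2)


# Hypothetical survey: 5 elements selected, 3 responded at first contact,
# and 1 of the 2 non-respondents was reached by an intensive follow-up.
first_phase = [10, 12, 14]
followup = [20]
rate = response_rate(len(first_phase), 5)
estimate = mean_with_nonrespondent_subsample(first_phase, followup, 2)
```

Using only the first-phase respondents would give a mean of 12 here; weighting in the follow-up result moves the estimate towards the non-respondents, which is the point of the sub-sampling.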

02. Measurement Error: By measurement, we understand determining the ‘true’ value of the variable, or the true category of an attribute, of interest. If we fail to do so, we encounter measurement error.

The potential sources of measurement error include, among others:

(a) Failure to understand the questions by the respondents

(b) The respondents are unaware of the true answer to the question

(c) The questions are biased.

03. Randomized Response: In many cultures, one reason why people do not give a true response, or refuse to respond altogether, is the sensitivity of the question asked. Imagine a survey designed to estimate the proportion of persons who view X-rated videos, are addicted to marijuana, have indulged in immoral activities, have committed a crime or have ever evaded taxes. A person who does not view X-rated videos, say, will probably respond with a ‘No’. The response of a viewer, however, could be ‘Yes’, ‘No’ or an outright refusal to answer the question. The same is true of the other cases. Thus, direct questioning on such matters may introduce bias into the results.

A reasonable precaution is to treat the responses of the individuals confidentially and to assure them that the responses cannot be traced back to the respondents. Such assurance can be given when the data are collected by personal interview or by means of a small questionnaire, but not by any means where the person interviewed may feel alarmed, embarrassed or afraid of revealing the truth to the interviewer. Randomized response methods have been developed to cope with this problem of ‘evasive’ bias. The method, introduced by Warner (1965), aims at encouraging truthful responses by dissociating the question from the response. Warner showed that it is possible to estimate the proportion P of individuals who belong to some sensitive category by means of a survey using personal interviews, without the respondents revealing their personal status with respect to the question asked. The objective is to encourage truthful answers while fully preserving confidentiality.
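A minimal sketch of the estimator for Warner's design may make the idea concrete. It assumes the usual set-up in which a randomizing device, unseen by the interviewer, shows the statement "I belong to the sensitive category" with known probability p and its negation otherwise, and the respondent only answers ‘Yes’ or ‘No’ to whichever statement appears. The function name and the counts below are made up for illustration.

```python
def warner_estimate(yes_count, n, p):
    """Estimate the sensitive proportion P under Warner's randomized response.

    yes_count: number of 'Yes' answers among n respondents
    p: known probability that the device shows the sensitive statement
       (must differ from 0.5, otherwise P cannot be identified)
    Returns (estimate of P, estimated variance of that estimate).
    """
    if n < 2:
        raise ValueError("need at least two respondents")
    if abs(2 * p - 1) < 1e-12:
        raise ValueError("p must differ from 0.5")
    lam = yes_count / n  # observed proportion of 'Yes' answers
    # P('Yes') = p*P + (1 - p)*(1 - P); solve this for P
    estimate = (lam - (1 - p)) / (2 * p - 1)
    variance = lam * (1 - lam) / ((n - 1) * (2 * p - 1) ** 2)
    return estimate, variance


# Hypothetical figures: 380 'Yes' answers from 1000 respondents, p = 0.7
p_hat, var_hat = warner_estimate(380, 1000, 0.7)
```

With these made-up figures the estimate of P is about 0.2. The price of the added privacy is a larger variance than direct questioning would give, and the variance grows as p approaches 0.5.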

04. Processing Error: After careful consideration of the objectives of the study, one must plan for the processing of the data. Such a plan assures the researcher that all the information has indeed been collected in a standardized way. The processing of the data involves several steps, a few of which are sorting, categorizing, coding and compiling. At any stage of these operations, errors may creep in.

Coding is a special kind of measurement operation. Its purpose is to classify the responses to a question into meaningful and mutually exclusive categories, so as to bring out their essential pattern. These responses are then labeled with numerical codes (1, 2 etc.) or letter codes (M, S etc.) for ease of computer entry. The coding operation is typically carried out manually by special coding clerks. Experience has shown that this operation is susceptible to error. Coding errors may be verified in several ways; a lucid description is given by Dalenius (1985). He also presents an automatic coding scheme by computer, which helps to reduce the coding error rate.

Sorting data refers to grouping and categorizing data according to some common characteristics. The sorting operation may be carried out either by computer or manually. Manual sorting is used when the size of the sample is small. Sorting error is more severe in manual sorting. The tallying process is a part of manual sorting and is liable to a larger margin of error.

When using a computer, you may run the risk of compilation error if you fail to:

• Choose an appropriate computer program

• Ensure correct entry of the data

• Choose appropriate verification or validation program

• Use correct programming
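Two of the checks listed above, correct entry of the data and a validation step, can be sketched in a few lines. This is only an illustration under assumed conventions: records are keyed twice by different clerks and compared, and each coded field is checked against a codebook of allowed values. The field names, codes and record layout are hypothetical.

```python
def find_entry_discrepancies(first_pass, second_pass):
    """Compare two independent keyings of the same records.

    Both arguments map record id -> {field: coded value}.
    Returns a list of (record_id, field, first_value, second_value)
    for every field where the two keyings disagree.
    """
    discrepancies = []
    for record_id in sorted(first_pass):
        first = first_pass[record_id]
        second = second_pass.get(record_id, {})
        for field in sorted(first):
            if first[field] != second.get(field):
                discrepancies.append(
                    (record_id, field, first[field], second.get(field))
                )
    return discrepancies


def find_invalid_codes(records, codebook):
    """Return (record_id, field, value) for codes not allowed by the codebook."""
    invalid = []
    for record_id in sorted(records):
        for field, value in sorted(records[record_id].items()):
            if value not in codebook.get(field, set()):
                invalid.append((record_id, field, value))
    return invalid


# Hypothetical codebook and two keyings of the same two questionnaires
codebook = {"sex": {"M", "F"}, "marital": {"M", "S"}}
keying_a = {1: {"sex": "M", "marital": "S"}, 2: {"sex": "F", "marital": "M"}}
keying_b = {1: {"sex": "M", "marital": "S"}, 2: {"sex": "F", "marital": "N"}}

mismatches = find_entry_discrepancies(keying_a, keying_b)
bad_codes = find_invalid_codes(keying_b, codebook)
```

Double keying catches slips that still produce a valid code, while the codebook check catches values that should never appear at all, so the two checks complement each other.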


Thursday, December 17, 2009
