Statistics, Biostatistics & Informatics: 2009

Thursday, December 17, 2009

Steps Involved in Designing a Questionnaire

Once the decision has been made to use a particular technique, the following questions should be considered before designing the questionnaire.

(a) What exactly do we want to measure according to the objectives formulated and variables identified?
(b) Of whom will we ask questions and what techniques will we follow?
(c) Are the respondents mainly illiterate?
(d) How large is the sample that will be interviewed?

The above questions are raised to ensure that the contents of the questionnaire are relevant to (i) the goals of the study and (ii) the individual respondents.

In a sample survey, it is customary to employ structured interviews rather than unstructured ones, since the former lend themselves better to quantitative analysis and the latter create serious data processing difficulties, particularly if the sample is large. A structured interview is one that employs a standard questionnaire (or interview schedule) to ensure that all respondents are asked exactly the same set of questions in the same sequence. The exact wording of each question is fixed in advance and put to the respondent unchanged. This is also true for surveys in which information is sought by mail questionnaire.

The questioning of persons is an imposition and an invasion of privacy, so it should not be surprising that some persons do not respond as we expect. In dealing with this problem, there are several points to remember with respect to the design and wording of the questions. We list a few of these points below.

(a) Use simple language
(b) Start with an interesting and easy question
(c) Use short questions
(d) Avoid double-barrel questions
(e) Avoid ambiguous wording of questions
(f) Avoid leading questions
(g) Avoid questions with vague words
(h) Avoid presuming questions
(i) Avoid hypothetical questions
(j) Avoid questions that involve memory
(k) Avoid sensitive or embarrassing questions
(l) Maintain sequencing of the questions

Questionnaire and its Construction

Interview schedules and self-administered questionnaires are probably the most important and most commonly used research instruments for data collection. Construction of these tools thus occupies a central position in any scientific investigation. Before we discuss this issue, we distinguish between a questionnaire and a schedule.

Questionnaire: A questionnaire is an instrument that is generally mailed or handed over to the respondents and filled in by them with no help from the interviewer or any other person.

Schedule: A schedule, also known as an interview schedule, is an instrument that is not given to the respondents but is filled in by the interviewer, who reads the questions to the respondents and records the answers as provided by them.

Before elaborating the steps involved in designing a questionnaire, we need to know the types of questions used in a questionnaire. Depending on how questions are asked and recorded, we can distinguish two types of questions: open-ended questions and closed questions. A question that is formulated without pre-determined responses is an open-ended question. An open-ended question permits a free response that should be recorded in the respondent's own words. Here the respondents are not provided with any possible answers to choose from. Such questions are useful to obtain information on:

• Facts with which the researcher is not very familiar or which are difficult to recollect;

• Opinions, attitudes and suggestions of informants;

• Sensitive issues.

A closed question, on the other hand, offers a list of possible options or answers from which the respondents must choose. Closed questions are useful if the range of possible responses is known. In practice, a questionnaire usually has a combination of open-ended and closed questions, arranged in such a way that the discussion follows as naturally as possible. Data processing is much easier in terms of time and resources when the interview schedule is structured and closed.

Open-ended questions may provide valuable new insight into the problem, including issues not previously thought of at the planning stage. Closed questions, on the other hand, have the advantage of providing quick answers. The analysis is also easier with closed questions.

The most important disadvantage of open-ended questions is that they may lead to distorted information when the interviewers are unskilled. Analysis is also time-consuming with such questions. Closed questions are unsuitable for face-to-face interviews. The options provided in the questionnaire may lead to bias, and some important information may be missed if it is not asked for.

For open-ended questions, multiple responses are usually allowed. The interviewers in such cases should not be in a hurry to skip to the next question. They should be trained to wait for any additional answers that the respondent may provide. For closed questions too, the interviewer must take care to tick the most appropriate answer(s).

A question may again be either pre-coded or post-coded. A pre-coded scheme may be followed for closed questions, such as the question on sex. Thus, you may designate male by the numerical code “1” and female by “2”. You may do it the other way round, “1” for female and “2” for male, but the former is the more common practice.

When the possible answers cannot be fully anticipated in a pre-coding scheme, the question is kept open. After the responses come in from the field, the answers are arranged in a logical order and a numerical code is assigned to each selected response.
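The difference between pre-coding and post-coding can be illustrated with a minimal Python sketch. The function names, the sample answers and the rule of ordering post-coded answers by frequency are assumptions made for illustration only; the text asks only that the answers be arranged in some logical order before codes are assigned.

```python
from collections import Counter

# Pre-coded closed question: the codes are fixed before fieldwork.
# The 1 = male, 2 = female convention follows the text.
SEX_CODES = {"male": 1, "female": 2}


def precode(answer, codebook):
    """Return the numerical code of a closed-question answer."""
    return codebook[answer.strip().lower()]


def build_postcode_book(responses):
    """Assign codes to open-ended answers after fieldwork.

    Distinct answers are ordered by frequency (ties alphabetically)
    and numbered from 1. This ordering rule is an assumption.
    """
    counts = Counter(r.strip().lower() for r in responses)
    ordered = sorted(counts, key=lambda a: (-counts[a], a))
    return {answer: code for code, answer in enumerate(ordered, start=1)}


# Hypothetical open-ended answers collected in the field.
field_answers = ["Cost", "distance", "cost", "No time", "cost", "Distance"]
codebook = build_postcode_book(field_answers)
coded = [codebook[a.strip().lower()] for a in field_answers]
```

Here `codebook` is built only after the field answers are known, whereas `SEX_CODES` exists before any interview takes place.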

Errors in Sample Survey

There is no denying the fact that the whole sampling procedure is liable to varying degrees of error at all stages of its operation. The total error involved in such an operation can broadly be classified as Sampling Error and Non-Sampling Error.

Sampling Error

The sampling error is always assessed with reference to the value of the population parameter. Whatever the degree of care taken in selecting a sample from a population, there will always be a difference between the population value and its corresponding estimate. This difference, which is attributable to sampling and to which no other cause can be assigned, is termed sampling error.

A sampling error is usually measured in terms of the standard error for a particular statistic (e.g. mean, proportion, ratio etc.). If a sample of respondents has been selected as a simple random sample (SRS), it is possible to use a straightforward formula for computing the standard error as the square root of the sampling variance of the statistic. On many occasions, we use more complex designs than SRS and consequently, measurement of the standard error also warrants more complex formulae. In this text, we have elaborated the discussion on estimating the standard error in the case of some commonly used designs.
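For simple random sampling, the standard textbook estimators can be sketched as follows. This is an illustration under stated assumptions, not a formula quoted from the text: the function names are hypothetical, the finite population correction is applied only when the population size is supplied, and the proportion version ignores that correction altogether.

```python
import math
import statistics


def srs_standard_error_of_mean(sample, population_size=None):
    """Estimated standard error of the sample mean under SRS.

    Computes sqrt((1 - n/N) * s^2 / n), where s^2 is the sample
    variance with divisor n - 1. The factor (1 - n/N) is used only
    when population_size (N) is given.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    s2 = statistics.variance(sample)
    fpc = 1.0 if population_size is None else 1.0 - n / population_size
    return math.sqrt(fpc * s2 / n)


def srs_standard_error_of_proportion(p_hat, n):
    """Estimated standard error of a sample proportion under SRS,
    ignoring the finite population correction."""
    return math.sqrt(p_hat * (1.0 - p_hat) / (n - 1))


# Hypothetical measurements on five sampled units.
se_mean = srs_standard_error_of_mean([12, 15, 11, 18, 14])
se_mean_fpc = srs_standard_error_of_mean([12, 15, 11, 18, 14], population_size=50)
```

Supplying `population_size` shrinks the standard error, which reflects the fact that a sample covering a larger share of a finite population leaves less room for sampling error.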

Non-Sampling Error

In practice, every operation of a survey is a potential source of non-sampling error. These errors are the result of mistakes made in implementing data collection and data processing, such as failure to locate and interview the correct household, misunderstanding of the questions on the part of either the respondent or the interviewer, and data entry errors. While this suggests a multiplicity of sources of non-sampling error, we can group them into four broad categories as follows:

1. Non-Response Error

2. Measurement Error

3. Randomized Response

4. Processing Error

01. Non-Response Error: Non-response refers to the situation in which no data can be collected for one or more of the elements selected for the survey. An element may be an individual (respondent) or any other unit, such as a household. Non-response may arise because

(i) The respondents could not be contacted,

(ii) They could be contacted but refused to be interviewed, or

(iii) They were contacted and provided data, but the elicited data were of dubious quality and were thus excluded from data processing.

Broadly speaking, non-response rates are relatively more pronounced in

(i) Mail surveys

(ii) Surveys dealing with sensitive issues and

(iii) Interview surveys with inadequately trained interviewers.

Since non-response can hardly be avoided, our aim will be to achieve as low a non-response rate as possible. Here are some measures that are likely to contribute towards achieving a low non-response rate:

• Making the audience survey-oriented

• Imparting training to the survey statisticians

• Imparting training to the survey interviewers

• Call backs and reminders

• Sub-sampling the non-respondents.

If the people have a positive attitude and appreciation of the use of statistics, they are likely to co-operate to a large extent, thereby contributing to the response rate.

02. Measurement Error: By measurement, we understand determining the ‘true’ value of the variable or category of an attribute of interest. If we fail to do so, we encounter measurement error.

The potential sources of measurement error are, among others:

(a) The respondents fail to understand the questions

(b) The respondents are unaware of the true answer to the question

(c) The questions are biased.

03. Randomized Response: In many cultures, one reason why people do not provide true responses, or refuse to respond altogether, is their sensitivity to the question asked. Imagine a survey designed to estimate the proportion of persons who view X-rated videos, are addicted to marijuana, have indulged in immoral activities, have committed a crime or have ever evaded taxes. A person who does not view X-rated videos, say, will probably respond with a ‘No’. The response of a viewer, however, could be ‘Yes’, ‘No’ or an outright refusal to answer the question. This is true for the other cases as well. Thus, direct questioning on these issues may introduce bias into the results.

A reasonable precaution is to treat the responses of the individuals confidentially and to assure them that the responses cannot be traced back to the respondents. Such assurance can be given when the data are collected by personal interview or by means of a small questionnaire, but not by any means where the person interviewed may feel alarmed, embarrassed or afraid of revealing the truth to the interviewer. Randomized response methods have been developed to cope with this problem of ‘evasive’ bias. The method, introduced by Warner (1965), aims at encouraging truthful responses by dissociating the question from the response. Warner showed that it is possible to estimate the proportion P of individuals who belong to some sensitive category by means of a survey using personal interviews without the respondent revealing his or her personal status with respect to the question asked. The objective is to encourage truthful answers while fully preserving confidentiality.
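The estimator usually attributed to the 1965 randomized response design can be sketched as below. The set-up is an assumption, since the text does not spell out the mechanics: each respondent privately uses a randomizing device that, with known probability `p_design`, presents the statement "I belong to the sensitive group" and otherwise presents its negation, and the interviewer records only "Yes" or "No". The function name and the figures are hypothetical.

```python
def randomized_response_estimate(yes_count, n, p_design):
    """Estimate the sensitive proportion from randomized responses.

    The expected share of 'Yes' answers is
        lam = p_design * P + (1 - p_design) * (1 - P),
    so P is recovered as (lam - (1 - p_design)) / (2 * p_design - 1).
    Returns the estimate and a plug-in estimate of its variance.
    """
    if abs(p_design - 0.5) < 1e-12:
        raise ValueError("p_design must differ from 0.5")
    lam_hat = yes_count / n
    denom = 2.0 * p_design - 1.0
    p_hat = (lam_hat - (1.0 - p_design)) / denom
    var_hat = lam_hat * (1.0 - lam_hat) / (n * denom ** 2)
    return p_hat, var_hat


# Hypothetical survey: 1000 respondents, 380 'Yes' answers, p_design = 0.7.
p_hat, var_hat = randomized_response_estimate(380, 1000, 0.7)
```

Because the interviewer never learns which statement a respondent answered, a single "Yes" reveals nothing about that individual, yet the proportion for the group can still be estimated. The price is a larger variance than direct questioning would give, which grows as `p_design` approaches 0.5.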

04. Processing Error: After careful consideration of the objectives of the study, one must plan for the processing of the data. Such a plan assures the researcher that all the information has indeed been collected in a standardized way. The processing of the data involves several steps, a few of which are sorting, categorizing, coding and compiling. At any stage of these operations, errors may creep in.

Coding is a special kind of measurement operation. Its purpose is to classify the responses to a question into meaningful and mutually exclusive categories, so as to bring out their essential pattern. These responses are then labeled with numerical codes (1, 2 etc.) or letter codes (M, S etc.) for ease of computer entry. The coding operation is typically carried out manually by special coding clerks. Experience has shown that this operation is susceptible to error. Coding errors may be verified in several ways; a lucid description is given by Dalenius (1985). He also presents an automatic coding scheme by computer, which helps to reduce the coding error rate.

Sorting data refers to grouping and categorizing data according to some common characteristics. The sorting operation may be carried out either by computer or manually. Manual sorting is used when the size of the sample is small. Sorting error is more severe for manual sorting. The tallying process is a part of manual sorting and is liable to a larger margin of error.

When using a computer, you may run the risk of compilation error if you fail to:

• Choose an appropriate computer program

• Ensure correct entry of the data

• Choose appropriate verification or validation program

• Use the right programming

Pilot Survey

A pilot survey is a small-scale replica of the main survey, which goes beyond pre-tests by linking documents and procedures that have already been individually pre-tested. It is a systematic and integrated inquiry in the form of a miniature or preliminary survey. A pilot survey is often compared with a theatrical dress rehearsal, which is held before the final performance is staged. The pilot survey is designed to help the planners clarify many of the problems left unsolved by pre-test operations. A pilot survey nearly always results in considerable improvement to the survey documents, leading to a general increase in the efficiency of the survey. A well-planned pilot survey also offers the researchers an opportunity to decide, on the basis of its results, whether the main survey is still worth carrying out. Many standard designs require some prior or supplementary knowledge of the population elements to allow estimation based on sample observations. A pilot survey is a neat solution to the problem in such instances. In post-stratification, prior knowledge of the stratum weights is needed for estimation purposes. In sample size determination with SRS, we need some guessed values of p and s.

A pilot study might provide us with such values. Since the pilot survey is the researcher's last safeguard against the possibility that the main inquiry may be ineffective, its size and design should be so planned that it fulfills the above functions, and the sample should ideally be of a comparable structure to that of the main survey.
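To show how guessed values from a pilot survey feed into sample size determination, here is a minimal sketch using the common large-sample SRS formulae. It assumes that p denotes a population proportion and s a population standard deviation, uses z = 1.96 for roughly 95% confidence, and ignores the finite population correction. The function names and input figures are hypothetical.

```python
import math


def sample_size_for_proportion(p_guess, margin, z=1.96):
    """Sample size so that a proportion is estimated within +/- margin.

    Uses n = z^2 * p * (1 - p) / margin^2, rounded up.
    """
    return math.ceil(z ** 2 * p_guess * (1.0 - p_guess) / margin ** 2)


def sample_size_for_mean(s_guess, margin, z=1.96):
    """Sample size so that a mean is estimated within +/- margin.

    Uses n = (z * s / margin)^2, rounded up.
    """
    return math.ceil((z * s_guess / margin) ** 2)


# Hypothetical pilot results: p is about 0.5, s is about 10.
n_prop = sample_size_for_proportion(0.5, 0.05)
n_mean = sample_size_for_mean(10, 2)
```

When no pilot value of p is available, 0.5 is often used because it gives the largest, and therefore safest, sample size.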

Acceptance Sampling

Acceptance sampling is an important field of statistical quality control that was popularized by Dodge and Romig (1959) and originally applied by the U.S. military to the testing of bullets during World War II. If every bullet were tested in advance, no bullet would be left to ship. If, on the other hand, none were tested, malfunctions might occur in the field of battle, with potentially disastrous results.

Dodge reasoned that a sample should be picked at random from the lot and on the basis of information that was yielded by the sample, a decision should be made regarding the disposition of the lot. The process is called lot acceptance sampling or just acceptance sampling.

Acceptance sampling is the task of taking samples from a lot and deciding whether the lot is to be accepted or rejected, on the basis of evidence provided by inspection of samples drawn at random. If the sample indicates that the quality level is acceptable, the lot is accepted; if not, the lot is rejected.

The main purpose of acceptance sampling is to decide whether or not the lot is likely to be acceptable, not to estimate the quality of the lot. The plan merely accepts or rejects the lots.

Acceptance sampling is employed when one or several of the following hold:

• Testing is destructive.

• The cost of 100% inspection is very high.

• 100% inspection takes too long.
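The accept-or-reject decision can be sketched for a single sampling plan. The plan parameters are assumptions, because the text does not specify any: n items are inspected and the lot is accepted if at most c defectives are found. The probability of acceptance is computed from the binomial distribution, which treats the lot as large relative to the sample.

```python
from math import comb


def accept_lot(defectives_found, acceptance_number):
    """Decision rule: accept when defectives do not exceed c."""
    return defectives_found <= acceptance_number


def acceptance_probability(n, c, defect_rate):
    """Probability that a lot with the given defect rate is accepted
    under a single sampling plan (n, c), using the binomial model."""
    return sum(
        comb(n, k) * defect_rate ** k * (1.0 - defect_rate) ** (n - k)
        for k in range(c + 1)
    )


# Hypothetical plan: inspect 10 items, accept only if none is defective.
pa_good_lot = acceptance_probability(10, 0, 0.01)
pa_poor_lot = acceptance_probability(10, 0, 0.10)
```

Evaluating `acceptance_probability` over a range of defect rates traces out how sharply a given plan separates good lots from poor ones.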

Snowball Sampling

Snowball sampling is the colorful name for a technique of building up a list or a sample of a special population. Some recent authors have referred to snowball sampling as chain referral sampling. It has seen increased use in recent years in situations where respondents are difficult to identify and are best located by using an initial set of members or informants through a referral network approach (Kish, 1961). For example, consider the selection of beggars, for which no frame is available. This can best be done by asking an initial group of beggars to supply the names of other beggars they come across. Selection of mosque Imams or sex workers can also be made following this network approach, since members of such populations may well know each other, particularly in small areas.

Snowball sampling is a non-probability sampling in which persons initially chosen for the sample are used as informants to locate other persons having necessary characteristics making them eligible for the sample through referral network.

Although snowball sampling is generally considered non-probability sampling, strategies have been developed to draw snowball samples through a probabilistic approach, which allows the computation of sampling errors and the use of statistical tests of significance. If one wishes the snowball sample to be probabilistic, one should sample randomly within each stage.

Snowball sampling, whether probabilistic or non-probabilistic, is conducted in stages. In the first stage, a few persons possessing the requisite characteristics are identified and interviewed. These persons are used as informants to identify others who qualify for the sample and who are interviewed in the second stage; they in turn lead to still more persons, who are interviewed in the third stage, and so on. The term snowball stems from the analogy of a snowball, which begins small but becomes bigger and bigger as it rolls downhill.

Snowball sampling has been particularly used to study drug culture, heroin addiction, teenage gang activities, community relations and other issues where respondents may not be visible or are difficult to identify and contact.
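The stage-by-stage growth of a snowball sample can be sketched as a walk over a referral network. The network below, the function name and the rule of following every referral are hypothetical; a probabilistic variant would instead draw a random subset of the referrals at each stage.

```python
def snowball_sample(referrals, seeds, max_stages):
    """Build a snowball sample in stages.

    referrals maps each person to the people they name. Stage 1
    consists of the seeds; each later stage consists of the people
    newly named by the previous stage.
    """
    sampled = list(dict.fromkeys(seeds))  # drop duplicate seeds, keep order
    seen = set(sampled)
    current = list(sampled)
    for _ in range(max_stages - 1):
        next_stage = []
        for person in current:
            for named in referrals.get(person, []):
                if named not in seen:
                    seen.add(named)
                    next_stage.append(named)
        if not next_stage:
            break  # the referral chain has dried up
        sampled.extend(next_stage)
        current = next_stage
    return sampled


# Hypothetical referral network.
network = {"A": ["B", "C"], "B": ["C", "D"], "C": ["E"], "D": [], "E": ["A"]}
two_stages = snowball_sample(network, ["A"], max_stages=2)
three_stages = snowball_sample(network, ["A"], max_stages=3)
```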

Purposive Sampling

A non-probability sampling method that conforms to certain criteria is called purposive sampling. There are two major types of purposive sampling:

(i) Judgment Sampling (ii) Quota Sampling

(i) Judgment Sampling: Judgment sampling or expert choice is one in which cases are included for investigation through a planned selection procedure.

In the judgment sampling procedure, those units are selected that are considered to be most representative of the population as a whole. It is called judgment sampling because the choice of the individual units depends entirely on the sampler, who, on his or her own judgment, decides which sample conforming to some criteria is to be selected. In a study of labor problems, you may decide to talk only with those who have experienced discrimination while they were in a job. Election results are sometimes predicted from only a few selected persons because of their predictive record in past elections.

(ii) Quota Sampling: Another type of non-probability sampling is quota sampling. This technique is widely used by market researchers, political opinion seekers and many others to avoid the cost of interviewing a pre-selected sample of individuals. In this method, individuals are not pre-selected at all. Once strata are formed (usually based on sex, age, social status, region of residence etc.), the general breakdown of the sample is decided (e.g. how many men and how many women, or how many persons in each age group or social class, are to be included) and quota assignments are allocated to the interviewers. The selection of the individuals to be interviewed within each stratum is left to the interviewers. The factors such as sex, age and social status that are used to form the strata are termed ‘quota controls’.
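The way an interviewer fills a quota assignment can be sketched as follows. The quota figures, the stream of people encountered and the function name are hypothetical. The sketch simply takes people in the order met until each quota is full, which is precisely why the selection within strata is not random.

```python
def fill_quotas(encountered, quotas, stratum_of):
    """Select people in the order encountered until all quotas are met.

    quotas maps each stratum (e.g. 'male') to the number required.
    stratum_of returns the stratum of a person.
    Returns the selected people and the unfilled quota per stratum.
    """
    remaining = dict(quotas)
    selected = []
    for person in encountered:
        if not any(remaining.values()):
            break  # every quota is filled
        stratum = stratum_of(person)
        if remaining.get(stratum, 0) > 0:
            selected.append(person)
            remaining[stratum] -= 1
    return selected, remaining


# Hypothetical assignment: interview two men and one woman.
passers_by = [
    {"id": 1, "sex": "male"},
    {"id": 2, "sex": "male"},
    {"id": 3, "sex": "male"},
    {"id": 4, "sex": "female"},
    {"id": 5, "sex": "female"},
]
selected, remaining = fill_quotas(
    passers_by, {"male": 2, "female": 1}, lambda p: p["sex"]
)
```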

Convenience Sampling and Accidental Sampling

Convenience Sampling

Non-probability samples that are unrestricted are known as convenience samples. Researchers or field workers have the freedom to choose whomever they find; hence the name convenience. The convenience sample may consist of respondents living in an easily accessible locality. Undoubtedly, it is the simplest and least reliable form of non-probability sampling. Its primary virtue is its low cost.

While a convenience sample offers no control to ensure precision, this method is quite frequently used, especially in market research and public opinion surveys. Convenience samples are used because probability sampling is often a time-consuming and expensive procedure and, in fact, may not be feasible in many situations. In the early stages of exploratory research, when one is seeking guidance, this approach is recommended.

Accidental Sampling

Accidental sampling is one in which the cases are selected from whatever happens to be available at the moment.

In such sampling, individuals are selected as they appear in a process. If it is decided that only diabetic patients will be chosen from a queue in front of a hospital counter, the resulting sample will be an accidental sample.

Probability and Non-Probability Sampling

A variety of sample designs is available for drawing a sample from a population. A fundamental question is whether the sample is selected by a probability mechanism or by some other means. A probability sample has the characteristic that each element in the population has a known and non-zero probability of being included in the sample. As a result, selection biases can be avoided and statistical theory can be employed to derive the properties of the estimators. A probability sample is also so designed that statistical inference to the population can be based on measures of variability computed from the sample data. In addition, probability sampling allows us to construct a confidence interval within which the true value of the population parameter is expected to lie.

A non-probability sample, on the other hand, is based on a sampling plan that does not have the above feature. It is a non-random and subjective method of sampling where the selection of the population elements comprising the sample largely depends on the personal judgment or the discretion of the sampler. It is arbitrary and is made on the basis of convenience.

A good number of probability sampling designs are in use. Among the most widely used are simple random sampling, systematic sampling, stratified sampling, multi-stage sampling and probability proportional to size sampling. A detailed exposition of these designs is undertaken in the subsequent chapters.
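Two of the designs named above, simple random sampling and systematic sampling, can be sketched in a few lines. The frame, the seed and the function names are hypothetical. The systematic version uses the simple "every k-th unit after a random start" rule with k = N // n, so when N is not a multiple of n the units beyond position n * k can never be selected.

```python
import random


def simple_random_sample(frame, n, rng):
    """Draw n units without replacement, each subset equally likely."""
    return rng.sample(frame, n)


def systematic_sample(frame, n, rng):
    """Draw every k-th unit after a random start, with k = N // n."""
    k = len(frame) // n
    if k < 1:
        raise ValueError("sample size exceeds frame size")
    start = rng.randrange(k)  # random start in 0 .. k-1
    return [frame[start + i * k] for i in range(n)]


rng = random.Random(2009)      # fixed seed so the draw can be repeated
frame = list(range(1, 21))     # hypothetical frame of 20 numbered units
srs = simple_random_sample(frame, 5, rng)
systematic = systematic_sample(frame, 5, rng)
```

In both cases every unit of this frame has the same known inclusion probability (5/20), which is what qualifies them as probability samples.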

With non-probability sampling, there are several ways to choose cases to include in the sample. Often we allow the choice of subjects to be made by field workers on the spot. When this is the case, there is greater opportunity for bias to enter the sample selection procedure and to distort the findings of the study. The obvious disadvantage of non-probability sampling is that, since it is not based on a probability mechanism, the investigator cannot claim that his or her sample is representative of the larger population. This greatly limits the investigator's ability to generalize the findings beyond the specific sample studied. Further, no confidence interval estimation is possible for non-probability sampling.

Evaluation of a Sample Design

It is almost always desired that a sample design be evaluated for its perfection, and a perfect sample design is expected to meet certain criteria which include, among others, accuracy, reliability, validity and efficiency. We provide below a brief account of these concepts.

Accuracy: The accuracy of a sample estimate refers to its closeness to the true population value. The closer the sample estimate is to the population value, the greater is its accuracy. In our foregoing example with four population values 10, 17, 21 and 24, the population mean is 18, while the sample mean based on the observations 10, 24 and 17 is 17. The difference between these two means is an indication of inaccuracy in the estimate. If the draw results in the selection of the observations 10 and 21, the sample mean is 15.5, which is further away from the true mean, and hence the estimate is less accurate. The accuracy of an estimate is generally assessed on the basis of its mean square error (MSE). The smaller the MSE of an estimator, the greater is its accuracy.

Reliability: If we assume that there is no measurement error in the survey, then the reliability or precision of an estimate can be stated in terms of its sampling variance or, equivalently, its standard error. The standard error measures the precision with which the estimate from a particular sample approximates the hypothetical average result from all possible samples. The smaller the standard error of an estimate, the greater is its reliability. Samples with high precision are regarded as efficient samples.

Validity: If we assume that there is no measurement error in the survey, then the validity of an estimator can be evaluated by examining its bias. The smaller the bias, the greater is the validity. The validity of an estimated population characteristic thus refers to how far the mean of the estimator, taken over repetitions of the process yielding the estimate, differs from the true value of the parameter being estimated.

Efficiency: The criterion of efficiency is related to the cost of sampling. A sampling design is considered to be more efficient than another if the former results in lower costs than the latter, with the same degree of reliability.

The discussion above helps us to set criteria to identify a good sample design. We speak of these criteria with reference to probability sampling methods only, because probability sampling methods are the only sampling plans that allow us to assess the reliability of the estimates to be derived from the sample data. Keeping this in view, we summarize below what a sample design requires to qualify as a good sample design.
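The accuracy, reliability and validity criteria can be checked numerically on the small population 10, 17, 21, 24 used in the accuracy example. The sketch below enumerates every possible simple random sample of a given size drawn without replacement and summarizes the sample mean as an estimator. The function name is hypothetical, and the relation MSE = variance + bias squared is the standard decomposition rather than something stated in the text.

```python
from itertools import combinations
from statistics import mean


def estimator_summary(population, n):
    """Bias, sampling variance and MSE of the sample mean, taken over
    all possible samples of size n drawn without replacement."""
    true_mean = mean(population)
    estimates = [mean(sample) for sample in combinations(population, n)]
    expected = mean(estimates)
    bias = expected - true_mean                               # validity
    variance = mean([(e - expected) ** 2 for e in estimates])  # reliability
    mse = mean([(e - true_mean) ** 2 for e in estimates])      # accuracy
    return bias, variance, mse


population = [10, 17, 21, 24]
bias2, var2, mse2 = estimator_summary(population, 2)
bias3, var3, mse3 = estimator_summary(population, 3)
```

Under this design the sample mean has zero bias, so its MSE equals its sampling variance, and that variance is smaller for samples of size 3 than for samples of size 2.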

(a) A good sample design should be oriented to the research objectives in terms of its selection and estimation of the population values. Furthermore, it must comply with the survey design and suit the survey environment.

(b) A good sample design should allow statistical inferences to be drawn regarding the population values. This is possible only when the sample is a probability sample. A design must allow us to obtain valid estimates of its sampling variability, which is ordinarily expressed in terms of the SE or MSE.

(c) A sample design must be judged in terms of its practicability. This means that a good design is one that permits execution with simplicity, clarity, practicability and completeness.

(d) Economy is another aspect of a sample design. A good design must therefore involve the lowest cost for the fulfillment of the survey objectives, which are commonly stated in terms of the variance of the estimates.

Sample Design and Survey Design

Sample design refers to the methods to be followed in selecting a sample from the target population and to the estimation technique, that is, the formulae for computing the sample statistics. These statistics are the estimates used to infer the population parameters.

Implicit in the concept, the sampling design also includes such issues as the choice of the sampling frame, determination of the size of the sample, estimation of reliability of the estimates, stratification procedure, sampling allocation method, clustering of the sample etc.

Survey design is the process of preparing a complete plan of operations to be followed in conducting a survey and disseminating its intended results. Specifically, it includes, among others, decisions on such factors as the variables to be included in the survey (called survey variables), the method of data collection (whether by direct interview, telephone interview or self-administered questionnaire), construction of the questionnaire, organization of fieldwork, management of non-sampling error, data processing and data analysis. It seems obvious that the survey objectives covered under survey design determine the sample design and, in practice, the sample design must be developed as an integral part of the overall survey design. Survey design and sample design are thus two interrelated concepts, and each is complementary to the other.

Advantages of Sampling over Complete Count

It is now widely agreed that sampling is a most popular and scientific technique of data collection. The following are some of the considerations that dictate the use of sampling in place of a complete count:

Cost: By comparison with a complete enumeration of the same population, a sample may be based on data for only a small number of the units comprising that population. A sample survey may thus be very much less expensive to conduct than a comparable complete enumeration.

Time: Being small in scale, a sample survey is not only less expensive than a census; the desired information is obtained in much less time.

Scope: The smaller scale is likely to permit the collection of a wider range of survey data and allow a wider choice of methods of observation, measurement or questioning than is usually feasible with a complete enumeration.

Respondents' Convenience: The sample survey considerably reduces the overall burden on the respondents, in that only a few, not all, of the individuals in the population are put to the trouble of having to answer questions or provide information.

Labor: Sampling saves labor. A small staff is required both for fieldwork and for tabulation and processing data.

Flexibility: In certain types of investigation, highly skilled and trained personnel or even specialized equipment are needed to collect data. A complete enumeration in such cases is impracticable and hence sample surveys, being more flexible and having greater scope, will be more appropriate for this type of inquiry.

Data Processing: The data processing requirement for a sample survey is likely to be much less than for a complete count. Whereas a complete count may well require a computer to process the data, a sample survey can often be processed manually with fewer people and less logistic support.

Accuracy: A sample survey can employ personnel of higher quality who are given intensive training, and more careful supervision of the fieldwork is possible. As a result, observations, measurements, use of equipment or questioning for a sample survey can often be carried out more carefully and thus yield results subject to smaller non-sampling error than is generally practicable in a more extensive complete enumeration, usually at a much lower cost.

Feasibility: There are situations where complete enumeration is not feasible and thus a sample survey is necessary. There are also instances where it is not practicable to enumerate all the units due to their perishable or fragile nature. The alternative in this situation is to take only a few of the units. For example, consider the problem of checking the quality of mango juice produced by a local company. One way to test the quality is to drink the entire lot, which is impracticable. Testing of electric bulbs, screws, glass and medicine are all examples of this type, where sampling is necessary.

Limitations of Sample Survey

Despite the several advantages of a sample survey over a complete count, it has some disadvantages or limitations too. The results of a sample survey are subject to sampling error, and on that account are less precise than those of an otherwise comparable complete enumeration. Moreover, by chance alone, a sample may seriously over-represent, under-represent or even fail to represent infrequently occurring sub-groups of a population. In such instances, the estimates provided by such surveys are liable to a larger margin of error.

Steps in Planning and Executing a Sample Survey

Sample survey now a days, is the most efficient technique of providing relevant information for drawing inference about a population. From economic point of view, it is the only viable means to study the population. It is therefore essential to describe the main steps involved in executing a sample survey. The following are some of the step:

Objectives: Whenever we plan a sample survey, a clear and concise statement of the objectives should laid down. The objectives must be kept simple enough to be understood by those working on the survey and to be met successfully when the survey is completed. While setting the objectives, keep in mind the hypothesis (if any) being tested and the end points that will be used in the study to examine these hypotheses.

Target Population: The population from which the sample is to be drawn should be defined and identified in clear and unambiguous terms with respect to its content, units, extent and reference time. This is important because the sample must be selected from the population defined at the outset. The ‘target population’ may be modified to a ‘survey population’ to take account of practical constraints.

Data: The data to be collected must be relevant and pertinent to the purpose of the survey. Keeping the objectives in view, a detailed list of variables should be prepared and defined, and how these variables will be measured should be indicated in advance.

Precision Desired: In a sample survey, only a part of the population is measured, so the survey results are almost always subject to error. Errors of measurement are an additional source of distortion in the survey results. These errors can be reduced to some extent by using a larger sample and improved measuring instruments. However, this involves additional cost, time and effort. Consequently, the degree of precision desired in the results must be decided and specified in advance.

Sampling Frame: A sampling frame is an indispensable tool for conducting a sample survey. A complete, accurate and up-to-date sampling frame must be constructed in order to draw a valid sample.

Sample Design: When a number of sampling designs (e.g. simple random, stratified or cluster) are available for a particular sample survey, the one that is most efficient in terms of cost, reliability and appropriateness for meeting the objectives should be employed. An appropriately chosen sampling design is highly desirable to obtain reliable estimates of the population parameters. Many surveys have produced little or no useful information because they were not properly designed.
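
As an illustration of one of the designs named above, the following is a minimal sketch of a stratified sample with proportional allocation, in which each stratum contributes to the sample in proportion to its size. The strata names, sizes and the function name are hypothetical, and a real design would also weigh cost and within-stratum variability.

```python
import random

def proportional_stratified_sample(strata, total_sample_size, seed=None):
    """Draw a simple random sample inside each stratum, with stratum
    sample sizes proportional to stratum sizes."""
    rng = random.Random(seed)
    population_size = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        # Proportional allocation; rounding may shift the total by a unit or two.
        n_h = round(total_sample_size * len(units) / population_size)
        sample[name] = rng.sample(units, min(n_h, len(units)))
    return sample

# Hypothetical frame: 1,000 households split into three strata.
strata = {
    "urban": [f"U{i}" for i in range(600)],
    "semi-urban": [f"S{i}" for i in range(300)],
    "rural": [f"R{i}" for i in range(100)],
}
sample = proportional_stratified_sample(strata, 100, seed=7)
print({name: len(units) for name, units in sample.items()})
```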

Survey Design: The design of a survey involves many interrelated decisions on such factors as the mode of data collection, the construction of questions and the data processing method, as well as the sample design. Field data may be collected in a number of ways. A questionnaire may be constructed and mailed to the respondents, who are instructed to fill it in and send it back to the surveyor. This is sometimes called the self-administered questionnaire method. The survey may also employ a face-to-face interview using an interview schedule. This is referred to as the interview method. Data may also be collected through telephone conversation or through direct observation, in which case the survey is called an observational study.

Sample Size: Determination of sample size is perhaps the most difficult part of a statistical investigation. It is often claimed that a sample should bear some proportional relationship to the size of the population from which it is drawn. This is not true. The absolute size of a sample is much more important than its size relative to the population. The size of a sample is a function of the variation in the population characteristics under study and the precision of the estimate needed by the researcher. A sample of 500 may be appropriate sometimes, while more than 2000 may be required in other circumstances; in other cases, perhaps a sample of only 50 is called for.
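
To make the dependence on variability and precision concrete, here is a hedged sketch for one common special case: estimating a proportion under simple random sampling, using the normal approximation n0 = z^2 p(1 - p) / e^2 with an optional finite population correction. The function name, the default z = 1.96 (roughly 95% confidence) and the example figures are assumptions chosen for illustration; other designs and other kinds of estimates need different formulas.

```python
import math

def sample_size_for_proportion(p, margin, z=1.96, population=None):
    """Approximate sample size needed to estimate a proportion p to
    within +/- margin under simple random sampling."""
    if not 0 < p < 1 or margin <= 0:
        raise ValueError("need 0 < p < 1 and margin > 0")
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    if population is not None:
        # Finite population correction: matters only for small populations.
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)

# Hypothetical comparison: same precision, very different population sizes.
for size in (10_000, 1_000_000, None):
    print(size, sample_size_for_proportion(0.5, 0.05, population=size))
```

Running the comparison shows the required size changing very little once the population is large, while halving the margin roughly quadruples it.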

Preparation of Field Materials: The questionnaire is an important instrument for any scientific study. It is therefore necessary to construct questionnaires relevant to the study, keeping the objectives in mind. The necessary instruction manuals for filling in the questionnaire must also be prepared in advance so that the field workers can collect data without any difficulty.

Pre-Testing: Pre-testing is a trial operation that allows us to test the questionnaire or other measurement instruments in the field, to screen interviewers and to check on the management of field operations. The results of the pre-test usually suggest that some modifications must be made before full-scale sampling is undertaken. Failure to pre-test the concepts and detailed plans for the survey could result in loss of time, costs exceeding the budget limit and even a survey of poor and inadequate quality. Pre-testing provides the means of uncovering deficiencies and the basis for corrective action prior to carrying out the actual survey work. It may also indicate the workload to be assigned to each investigator and give an insight into the data processing operation in advance.

Duration of Study: Once the date of execution of the survey is decided, it remains to set up a work schedule for the completion of the various stages of the study. These include, among others, the time needed for preparatory work, sample selection, pre-testing, questionnaire development, fieldwork (including travelling and subsistence), the tabulation plan, training of field staff, data coding, data entry and processing, and report writing. It is recommended that this information be presented in the form of a detailed timetable with months across the top and activities listed along the left margin. For each activity, mark a cross against the month(s) in which it will occur.
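
A timetable of this kind can be sketched in a few lines. The activities, their months and the function name below are hypothetical placeholders; only the layout (months across the top, activities down the left, a cross where an activity occurs) follows the description above.

```python
def render_timetable(schedule, n_months):
    """Return a plain-text timetable: one row per activity, one column
    per month, with an X in each month where the activity takes place."""
    width = max(len(name) for name in schedule)
    months = range(1, n_months + 1)
    header = " " * width + " | " + " ".join(f"{m:>2}" for m in months)
    lines = [header]
    for name, active in schedule.items():
        cells = " ".join(" X" if m in active else " ." for m in months)
        lines.append(f"{name:<{width}} | {cells}")
    return "\n".join(lines)

# Hypothetical eight-month study plan.
schedule = {
    "Questionnaire development": {1, 2},
    "Pre-test": {3},
    "Fieldwork": {4, 5, 6},
    "Data entry and processing": {6, 7},
    "Report writing": {8},
}
print(render_timetable(schedule, 8))
```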

Fieldwork: Efficient organization of the fieldwork is a prerequisite for the successful completion of a statistical investigation. The personnel involved in the fieldwork should receive adequate training related to the work. Appropriate measures should be taken so that the field personnel are regularly supervised and their work is monitored. The quality of the work should be ensured at a very early stage so that any inconsistencies or shortcomings can be removed well before the work is completed. An instruction manual should be prepared for all categories of persons involved in the field operation. This will help to ensure quality data and maximum accuracy in the estimates.

Data Management: Large surveys generate huge amounts of data. Hence, a well-prepared data management plan is of prime importance. This plan should include the steps for processing data from the very inception of the study until the final analysis is completed. The administrative and computer procedures to be used, the type of staff available and whether any training will be needed to facilitate data management should also be described. A quality control scheme should also be included in the plan in order to check for agreement between the processed data and the data gathered in the field.

Editing and Checking: A detailed plan must be outlined at the outset to check and edit the field data for erroneous and inconsistent entries soon after they are in hand. Both manual and computer checking may be employed to detect inconsistencies in the data. Any erroneous entry that cannot be corrected at this stage should be corrected by re-interviewing the respondent.

Data Processing and Analysis: Once the data are checked, edited and corrected for errors, processing of the data should be undertaken keeping in view the objectives of the survey. This task also needs careful planning. The next step is the statistical analysis, which is carried out to arrive at the desired estimates of the population parameters. Outline the statistical methods that will be used for analysis of the data, including a description of how the information collected will be used to test the stated hypotheses and how any missing data will be dealt with.

Project Management: For a collaborative study involving several organizations, the planning stage should indicate which organization will have overall responsibility, which other organizations will be involved and what their responsibilities will be, and the manner in which the work will be coordinated and monitored.

Report Writing: Finally, a report on the findings of the study should be written, highlighting the policy implications and suggesting possible actions and measures to be taken, including policy recommendations.

Lessons Learned: A survey is a complex undertaking and is liable to a large margin of error if not properly handled. Because of this complexity, things never go exactly as planned. The main obstacles and difficulties that one expects might interfere with the successful completion of the study within the proposed time and cost should therefore be described.

Census and Survey

Survey is a general term that refers to the collection of data by means of interviews, questionnaires or direct observation. The entities surveyed could form a ‘whole’, in which case it is called a census (complete count), or a ‘part’, in which case it is called a sample survey.

A sample survey is a study involving a subset (or sample) of individuals selected from a larger population by accepted statistical methods. It is an alternative to complete count of a population serving as a basis for estimates or inferences for that population.

The term census is generally confined to inquiries that are more or less straightforward counts, such as censuses of population, manufacturing industries, livestock etc., while the term survey is applied to inquiries that go beyond simple counts. From this point of view, it is sensible to speak of a sample census. Traditionally, a sample census is a large sample survey that is undertaken after each population census using a long questionnaire. This sample census is integrated with the population census to obtain detailed data on such events as birth, death, marriage and migration. The population census of almost every country follows this practice.

Many people now regard a census as a special case of sampling in which all the units of the population are included in the sample (UN, 1997). This appears to be a meaningful concept because, except for sampling error, the same sources of error that pertain to a sample also pertain to a census. In fact, non-sampling error can often be more adequately controlled when using a small sample. Therefore, the combined error from all sources is not necessarily less for a census than for a sample.

Sampling frame

The theory of probability sampling, on which sample survey techniques are based, assumes the existence of a list of units from which the sample can be drawn. Such a list is called a frame. A frame is either a complete list of all units of the population or some other basis that provides a selection process such that every element of the population has a known non-zero probability of being included in the sample.

Definition: A sampling frame is a list of units or groups of units of the population to be sampled, organized and arranged in such a manner that every unit occurs once and only once in the list and no unit is excluded from the list.
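
The two requirements in this definition (no unit listed twice, no unit left out) can be checked mechanically when some reference list of the population is available. The sketch below is only illustrative: the function name and the household identifiers are hypothetical, and in practice the reference is usually partial, such as a fresh field listing of a few areas.

```python
from collections import Counter

def check_frame(frame, population):
    """Compare a candidate frame against a reference list of population
    units and report duplicated, missing and out-of-scope entries."""
    counts = Counter(frame)
    return {
        "duplicates": sorted(u for u, c in counts.items() if c > 1),
        "missing": sorted(set(population) - set(frame)),
        "not_in_population": sorted(set(frame) - set(population)),
    }

# Hypothetical household identifiers.
population = ["H01", "H02", "H03", "H04", "H05"]
frame = ["H01", "H02", "H02", "H04", "H09"]
print(check_frame(frame, population))
```

A frame satisfies the definition only when all three lists come back empty.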

In sampling problems, we encounter two types of sampling frames:

(a) Frames for area sampling and

(b) Frames that are lists; for example, lists of households or addresses of housing units

Area frames are usually used to sample geographical areas. With this technique, each element of the population is associated with a particular geographical area containing a group of people or households. A sample of areas is then selected, and either all the elements in the selected areas or a sample of these elements is included in the survey. A list frame, on the other hand, is a complete list of well-defined reporting units. This list should contain relevant information about the individual units, which will enable efficient sampling.

Friday, December 11, 2009

Historical Perspectives of Sampling

Despite the current widespread use of surveys, sampling as a formal field of study did not begin until early in the twentieth century. During the first quarter of the twentieth century, considerable advances were made in all aspects of survey methodology. Initially, statisticians debated whether anything less than a complete count of a population would suffice, given that a complete count was feasible in principle (O’Muirchaiagh and Wong, 1981). Support then gradually grew for samples selected in a purposive fashion, so as to make them ‘representative’ of the populations with regard to certain known variables and attributes. From about 1925 on, theory and practice gradually gained momentum in favor of ‘randomization’ as a method for selecting a sample. This development, in turn, contributed to the growth of a mathematical literature dealing with the properties of various random sampling methods and of estimates based on these methods. Random sampling is now in widespread use and may be viewed as the dominant, efficient and economical method for estimating the unknown characteristics of a population in a variety of practical settings.

Booth, the pioneer in conducting sample surveys, formally inquired into the ‘Life and Labour of the People in London’ (Booth: 1889-1902). This effort gave an impetus to others. At the turn of the century, Booth and Rowntree were conducting large-scale surveys. By the late twenties and early thirties, social surveys were being conducted in London and some neighboring cities. Subsequently, surveys began to be used in conjunction with town planning and various government activities, and the techniques were adapted to the needs of market and public opinion research. Nowadays, government organizations depend heavily on survey findings in such areas as health, social affairs, education and the economy. Social scientists regard social surveys as one of their basic techniques, and courses on survey methodology are given in many universities.

Tuesday, December 08, 2009

Importance of Sampling

Samples are taken to provide statistical data on an extensive range of subjects for both research and administrative purposes. The following examples illustrate the importance of sampling in real life.

01. In an opinion poll, a relatively small number of persons are interviewed and their opinions on current issues are solicited in order to discover the attitude of the community as a whole.

02. Marketing and advertising agencies conduct countless inquiries to determine customers’ expectations, attitudes, buying habits or shopping patterns. This information is useful to the manufacturers of goods for sales promotion.

03. Large lots of manufactured products are accepted or rejected by purchasing departments in business or government following inspection of a relatively small number of items drawn from these lots.

04. At border stations, customs officers enforce the laws by checking the effects of only a small number of travelers crossing the border.

05. Auditors often judge the extent to which the proper accounting procedures have been followed by examining a small number of transactions, selected from a large number of such transactions taking place within a specified period.

06. The concerned authority may need information on how families of different size, composition and social status spend their incomes. A small number of city dwellers may be asked to provide this information.

07. The Ministry of Health and Family Welfare might be interested in knowing the level of knowledge among the adult population of a city about the dangers of environmental pollution, and may find this out by interviewing a few adults of the city.

Countries’ measurements of the economy, health, labor force, contraceptive use, immunization, unemployment, income, exports, imports, industrial products and the like rely on samples rather than on complete enumeration. Numerous surveys are conducted to develop, test and refine hypotheses in such disciplines as sociology, psychology, demography, political science, anthropology, geography, economics, education and public health. Both local and central governments make considerable use of survey data to keep themselves aware of the various population characteristics for planning and development purposes. In every case, a sample is selected because it is impossible, inconvenient, slow or uneconomical to monitor the entire population.

Sunday, December 06, 2009

Concept of Sampling

In the modern world, facts and figures are a sine qua non for balanced development, transparent governance and efficient administration. To achieve these, carefully worked out plans are drawn up and executed as far as possible. To formulate these plans in a scientific manner, it is essential to have basic facts in numerical terms for various groups and regions in the country and for the country as a whole.
It is beyond the resources of most countries, and often impossible, to collect facts on a regular basis from each person, establishment or farm in the country. Fortunately, it is not necessary to enumerate each unit in the population to arrive at an acceptable figure, known as an estimate of the population parameter. A well-designed sample can provide the accurate estimate that a country needs at a cost the country can afford.

During the past 30 years or so, the methods and techniques of sampling have reached a high level of scientific development. As a result, the uses of sampling have been extended into a wide variety of fields. From the standpoint of statistical data collection, sampling is a means of selecting a relatively small number of households, persons or other units for inclusion in a survey of some kind and drawing conclusions based on this limited number of instances. This selection is done because enumeration of all units in the target population (the population for which information is needed) is a large and complex undertaking that is usually affected by limitations of time, budget and availability of experienced personnel. Not only that, complete enumeration is also unnecessary from the standpoint of precision and statistical reliability. Many countries have found, moreover, that sampling can play an important role in an overall census program.
Let us now introduce the concept of sampling by an example.
Example 01: Very frequently, we talk about banning or restricting student politics on the university campus. This is a very sensitive issue. We sometimes wonder whether our views on this issue are shared by the student community, who are directly or indirectly involved in it. We may want to know the actual percentage of students of Dhaka University who do not approve of banning student politics on the campus. This percentage could be obtained by asking every student on the campus whether he/she approves or disapproves of the ban. The procedure, however, would be quite time-consuming, expensive and probably impracticable. To overcome this, we might choose only a small portion of the students and try to infer the attitude of all the students based on the answers received from this portion.
This is a typical example of statistical inference. The procedure of choosing the portion of students from the entire student body is technically known as sampling. The portion of the students referred to above is called sample and the entire student body a population.
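
The idea can be tried out on simulated data. In the sketch below the student body, its size (30,000) and the share who disapprove of the ban (60%) are invented purely for illustration, and a simple random sample without replacement stands in for "choosing a portion of the students"; none of these figures describe any real university.

```python
import random

def estimate_proportion(population, sample_size, seed=None):
    """Draw a simple random sample (without replacement) from a 0/1
    population and return the sample proportion of 1s."""
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)
    return sum(sample) / sample_size

# Simulated student body: 1 = disapproves of the ban, 0 = approves.
population = [1] * 18_000 + [0] * 12_000
true_share = sum(population) / len(population)

estimate = estimate_proportion(population, 400, seed=1)
print(f"true share: {true_share:.3f}, estimate from 400 students: {estimate:.3f}")
```

Changing the seed gives a different sample and a slightly different estimate; that sample-to-sample variation is the sampling error.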

Definition: Sampling is a scientific process of selecting a part from a statistical population and may embrace the derivation of estimates and any inferences derived from them for that population.
