Suggestions for future surveys


This section presents suggestions based on problems found during the data analysis of the 1999 "Homeless in Texas" statewide survey. These suggestions are not general guidelines for the construction of surveys, but specific points that should be corrected in the 1999 survey in order to make it a more useful data collection instrument.

 

The objectives of the survey should be kept in mind when writing each question. If a question asks for information that is not useful or cannot be analyzed, it should be eliminated. Weak questions increase the cost of the survey without returning useful information. For example, in the version of the survey used, "Where did you spend last night?" is a weak question. Although knowing where a homeless individual spent the previous night can be relevant in some cases, it does not show his or her habitual sleeping place. It also has no relationship with most of the other variables in the survey.

When constructing the survey, the researcher should have a clear idea of how the collected data will be analyzed. For example, many questions in the survey allowed the individual to mark as many options as applied. This creates a problem for the analysis, because the statistical program cannot tell which item is the most important for each individual. In this survey, that is what happened with the question "Do you get money or assistance from any of the following? (mark as many as apply)". Individuals were allowed to mark both important and less important sources of income. Consequently, the researcher was not able to identify the most important sources of income. If the researcher is interested in the most important source of income but also wants to know the other ones, the instruction "Mark as many as apply" can stay, but another question should be added: "From the sources of income listed above, which one is the most important?" or "Rank the sources of income you chose above by importance".
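As a rough illustration of why the follow-up question helps, the sketch below tallies the same answers two ways: how often each source was marked at all, and how often it was named as the most important. The respondents, the source names, and the field names `sources` and `most_important` are hypothetical and do not come from the 1999 survey data.

```python
from collections import Counter

# Hypothetical answers: "sources" holds every option the respondent marked,
# "most_important" holds the answer to the added follow-up question.
answers = [
    {"sources": {"food stamps", "day labor"}, "most_important": "day labor"},
    {"sources": {"food stamps"}, "most_important": "food stamps"},
    {"sources": {"family", "day labor", "food stamps"}, "most_important": "day labor"},
]


def tally(answers):
    """Count how often each source was marked and how often it was ranked first."""
    marked = Counter()
    primary = Counter()
    for answer in answers:
        # The most important source must be one of the sources that were marked.
        if answer["most_important"] not in answer["sources"]:
            raise ValueError("most important source was not marked as a source")
        marked.update(answer["sources"])
        primary[answer["most_important"]] += 1
    return marked, primary


marked, primary = tally(answers)
print("Marked at all: ", dict(marked))
print("Most important:", dict(primary))
```

In this made-up example, the source marked most often is not the one most often named as most important, which is exactly the distinction the "mark as many as apply" instruction alone cannot capture.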

The data should be collected and entered into the statistical software in a form that the program can analyze. For example, question 18 of the 1999 survey is: "For each of your children under 18, please tell me their age, sex, what grade they attend in school, and where they are living now." The answers should be entered in a separate database in which each case is a child. If the data from question 18 are entered in the database where the cases are the homeless individuals, each variable will have many missing values. This happens because individuals differ in the number of children, and many do not have any children. Most statistical procedures cannot be executed when there are too many missing values.
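The difference between the two layouts can be sketched as follows. This is only an illustration with made-up respondents. The field names (`respondent_id`, `age`, `sex`) and the helper functions are assumptions, not the layout of the actual 1999 data files, and only two of the four items of question 18 are included to keep the example short.

```python
# Hypothetical respondents; each one lists zero or more children.
respondents = [
    {"respondent_id": 1, "children": []},
    {"respondent_id": 2, "children": [{"age": 4, "sex": "F"}, {"age": 9, "sex": "M"}]},
    {"respondent_id": 3, "children": [{"age": 15, "sex": "F"}]},
]


def to_child_table(respondents):
    """One row per child, linked back to the parent by respondent_id."""
    rows = []
    for r in respondents:
        for child in r["children"]:
            rows.append({"respondent_id": r["respondent_id"], **child})
    return rows


def to_wide_table(respondents, max_children):
    """One row per respondent with child_1_age, child_1_sex, ... columns.

    Slots with no child are filled with None (a missing value).
    """
    rows = []
    for r in respondents:
        row = {"respondent_id": r["respondent_id"]}
        for i in range(max_children):
            child = r["children"][i] if i < len(r["children"]) else {}
            row[f"child_{i + 1}_age"] = child.get("age")
            row[f"child_{i + 1}_sex"] = child.get("sex")
        rows.append(row)
    return rows


def count_missing(rows):
    """Count the None values across all rows."""
    return sum(value is None for row in rows for value in row.values())


child_rows = to_child_table(respondents)
wide_rows = to_wide_table(respondents, max_children=2)

print("Child table rows:", len(child_rows), "missing:", count_missing(child_rows))
print("Wide table rows: ", len(wide_rows), "missing:", count_missing(wide_rows))
```

The child table has no empty cells because a row exists only when a child exists, while the one-row-per-respondent layout must leave a slot empty for every child a respondent does not have. The `respondent_id` column keeps the link to the parent, so the two databases can still be combined when needed.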

Some questions lead to false answers, generating bias in the results. Which answer is falsely chosen depends on the way a question is worded. It also depends on the benefits some individuals think they can get by answering in a certain way. Individuals can also bias the results by always choosing the most socially desirable item. For example, the results of the survey show that the question "Are you unable to work because of disability?" generated an exaggerated number of "Yes" answers. This bias was detected by the researcher. If it had passed undetected, false results would have been reported.

The items of a question should refer to just one variable. For example, in the question "Where is the child living now?", some of the items refer to the residence of the child, others to whom the child lives with, and others to both. Consequently, it is impossible to separate the two things in the analysis.

