If we see data cleaning as a process of detecting, diagnosing, and editing data abnormalities, then we have to use every available tool to find those cases. The traditional, simple approach is to process the data and check each variable, looking for subjects whose values lie far from the center of the distribution. These values have to be verified and, in the worst case, deleted.
But are these subjects deviant in only one response? Why should they be eliminated, shrinking the size of our data set? Could they be valuable representatives of a group that has remained hidden until now?
When we apply multivariate statistical analysis, we can identify those subjects who systematically give out-of-range responses, which are more than occasional mistakes. Multivariate analysis (such as cluster analysis, discriminant analysis, or similar statistical procedures) allows us to reveal small groups that can become a valuable target for our business.
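The difference between an isolated odd value and a subject who deviates systematically can be illustrated with a minimal sketch. It is not cluster or discriminant analysis; it is a much simpler stand-in that flags extreme values variable by variable and then counts, for each subject, how many variables were flagged. The function names, the sample data, and the cutoffs are assumptions made for illustration: the sketch uses the commonly cited modified z-score (median and median absolute deviation, with a cutoff of 3.5) and treats two or more flagged variables as "systematic".

```python
from statistics import median


def modified_z_scores(values):
    """Robust z-scores based on the median and the median absolute deviation (MAD)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread around the median: treat every value as non-deviant.
        return [0.0 for _ in values]
    return [0.6745 * (v - med) / mad for v in values]


def count_flags(rows, threshold=3.5):
    """For each subject (row), count the variables whose value is flagged as extreme."""
    flags = [0] * len(rows)
    for column in zip(*rows):  # iterate over variables
        for i, z in enumerate(modified_z_scores(column)):
            if abs(z) > threshold:
                flags[i] += 1
    return flags


def split_outliers(rows, min_flags=2, threshold=3.5):
    """Separate subjects with isolated odd values from systematically deviant ones."""
    flags = count_flags(rows, threshold)
    isolated = [i for i, n in enumerate(flags) if 0 < n < min_flags]
    systematic = [i for i, n in enumerate(flags) if n >= min_flags]
    return isolated, systematic


# Hypothetical data: 10 subjects, 3 variables each (invented for this example).
rows = [
    (10, 5, 100), (11, 6, 102), (9, 5, 98), (10, 4, 101), (12, 5, 99),
    (10, 6, 100), (11, 5, 103), (9, 4, 97),
    (10, 20, 100),   # subject 8: one odd value, possibly a recording error
    (25, 18, 150),   # subject 9: far from the center on every variable
]

isolated, systematic = split_outliers(rows)
print("isolated:", isolated)
print("systematic:", systematic)
```

With this invented data, subject 8 ends up in the isolated list and subject 9 in the systematic one. The first is a candidate for verification or correction; the second is the kind of case that may deserve a closer look with a proper multivariate procedure before it is deleted.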