Removal of data which are obviously erroneous or irrelevant. This should be done with caution: outliers, or data which are anomalous and do not harmonize with your hypotheses, are perhaps not faulty after all. They may just as well show that your hypothesis is defective.
Normalizing or reducing your data means that you eliminate the influence of some well-known but uninteresting factor. For example, you may remove the effect of inflation by dividing all the prices by the price index of the date of purchase.
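The inflation example can be sketched in a few lines of Python. This is only an illustration: the purchase records, the index values, and the helper name `deflate` are all invented here, and the base year is assumed to have an index of 100.

```python
# Hypothetical purchase records: (year, nominal price). All figures are invented.
purchases = [(2000, 100.0), (2010, 150.0), (2020, 180.0)]

# Hypothetical price index per year, with the base year 2000 set to 100.
price_index = {2000: 100.0, 2010: 125.0, 2020: 150.0}


def deflate(nominal_price, index_value, base_index=100.0):
    """Express a nominal price in base-year money by dividing by the index."""
    return nominal_price * base_index / index_value


# Divide every price by the index of its own purchase date.
real_prices = [(year, deflate(price, price_index[year])) for year, price in purchases]

for year, real in real_prices:
    print(year, round(real, 2))
```

With these made-up numbers the two later purchases both come out at 120 in base-year money, so the apparent rise from 150 to 180 is entirely an effect of inflation.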
In the analysis itself, the target is usually to extract an invariance, an interesting structure in the data. This does not mean feeding data into a computer and expecting the machine to report the structures that can be found in them. Computers are not intelligent enough for that.
Instead, it is usual that, early in the project, the investigator already has a mathematical model that he tries to apply to the data. This model also provides the possible hypotheses, if any, for the investigation project, or at least it functions as an initially inexact working hypothesis that will be refined during the analysis.
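As a rough illustration of applying a model chosen in advance, the sketch below assumes that the working hypothesis is a straight line, y = a + b·x, and fits it by ordinary least squares. The data points, the choice of a linear model, and the function names `fit_line` and `r_squared` are assumptions made for this example only; a real project would use whatever model its theory suggests.

```python
# Hypothetical observations (x, y); the numbers are invented for illustration.
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]


def fit_line(points):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def r_squared(points, a, b):
    """Share of the variance in y that the fitted line accounts for."""
    mean_y = sum(y for _, y in points) / len(points)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in points)
    ss_tot = sum((y - mean_y) ** 2 for _, y in points)
    return 1.0 - ss_res / ss_tot


a, b = fit_line(data)

# Residuals show where the model and the data disagree.
residuals = [y - (a + b * x) for x, y in data]

print(f"intercept={a:.3f} slope={b:.3f} R^2={r_squared(data, a, b):.3f}")
print("residuals:", [round(r, 3) for r in residuals])
```

Small residuals without a visible pattern would suggest that the linear hypothesis is a plausible picture of these data; large or systematic residuals would be a reason to refine the model or look for another one.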
In other words, the investigator first arranges the data into the pattern given by the model, and then assesses whether the model now gives a plausible picture of the data or whether it will still be necessary to search for another, better