Exploratory Data Analysis

Words: 2426
Pages: 10

Crime Classification

Abstract

In the last years, the task of crime prediction has gained significant popularity in the research literature. Although, there exist many procedures and approaches used to carry out investigations to respond to different crimes, predicting where and when a crime is likely to occur leads to develop more efficient strategies either to prevent crimes or to improve the investigation efforts.

It is mostly observed that people tend to follow certain life habits and patterns, and even though it is not always true, the amount of crimes occurred under the same circumstances make these kind of methods work reasonably good. Therefore, one of the main justification of crime prediction is to identify these patterns in order
…show more content…
Project Structure

In general, for developing this project we used two phases. In the first phase, as shown in figure below, we worked with the training dataset by exploring it, finding patterns, cleaning it, extracting the most important features and transforming these features as necessary for our models, afterwards we validated each classification algorithm with a partition cross validation with 0.7 for training and 0.3 for testing and we obtained the log-loss result, finally we selected the model that gave us the best result.
4. Exploratory Data Analysis

In this section, I aimed to analyze the features of the training set, in order to get insights and discover interrelations among the features and the target variable (Category). With this purpose, the first thing I did was to observe the distribution of the crimes in the data set. One interesting fact discovered is that the crimes are not only unevenly distributed but also that their distribution follows the power law or long tail, as shown in figure 5, this fact will make the least common crimes more difficult to predict and further feature engineering must be performed. Nevertheless, the ten most common crimes shown in table 1 cover more than 83% of the whole training