What Factors of a Community Predict That Community’s Crime Rate?
Essay by Joshua Thomas • November 26, 2018 • Essay • 325 Words (2 Pages) • 772 Views
Project Model Reasoning
Our research question is the following: “What factors of a community predict that community’s crime rate?” To answer it, we found two datasets covering Chicago crime rates and education statistics by area. The datasets include fields for district, ward, literacy rate, safety perception score, type of crime, and date of crime, along with various other education and crime data.
To analyze this data, we plan to use a classification model to predict whether an area is at high, medium, or low risk for crime. The first step is to combine our two datasets by matching the “Police District” column in the education dataset with the “District” column in the crime dataset. This will give us a unified view of the data to work with. We will then filter the combined dataset to include only the columns relevant to our analysis. After this, we will handle rows with missing data depending on the nature of that data (for example, by dropping the row or interpolating where possible). We will then run the data through our classification model using RapidMiner. We still need to choose which particular model to use, but we are considering CHAID or Naïve Bayes, because both are widely used classification methods.
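The preparation steps described above (join on district, keep relevant columns, handle missing rows, assign a risk class) could be sketched roughly as follows in plain Python. This is only an illustration under stated assumptions: the rows are made-up toy data, the `safety_score` field name and the tertile rule for high, medium, and low risk are hypothetical, and the model itself (CHAID or Naïve Bayes in RapidMiner) is not shown.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical toy rows. Only the "Police District" and "District" column
# names come from the essay; the other fields and all values are made up.
schools = [
    {"Police District": 1, "safety_score": 62.0},
    {"Police District": 1, "safety_score": 58.0},
    {"Police District": 2, "safety_score": 41.0},
    {"Police District": 3, "safety_score": None},  # row with missing data
    {"Police District": 4, "safety_score": 75.0},
]
crimes = [
    {"District": 1, "type": "THEFT"},
    {"District": 1, "type": "ASSAULT"},
    {"District": 2, "type": "BATTERY"},
    {"District": 2, "type": "THEFT"},
    {"District": 2, "type": "ROBBERY"},
    {"District": 4, "type": "THEFT"},
]


def build_table(schools, crimes):
    """Join per-district school averages with per-district crime counts."""
    crime_counts = Counter(row["District"] for row in crimes)
    scores = defaultdict(list)
    for row in schools:
        # Missing-data handling: here we simply drop the incomplete row.
        if row["safety_score"] is not None:
            scores[row["Police District"]].append(row["safety_score"])
    table = []
    for district in sorted(scores):
        if district not in crime_counts:
            continue  # inner join: keep districts present in both datasets
        table.append({
            "district": district,
            "mean_safety": mean(scores[district]),
            "crime_count": crime_counts[district],
        })
    return table


def label_risk(table):
    """Assign low/medium/high by crime-count tertile (an assumed rule)."""
    ranked = sorted(table, key=lambda r: r["crime_count"])
    labels = ("low", "medium", "high")
    n = len(ranked)
    for i, row in enumerate(ranked):
        # Districts with tied counts may land in different classes.
        row["risk"] = labels[min(i * 3 // n, 2)]
    return table


table = label_risk(build_table(schools, crimes))
for row in table:
    print(row)
```

Dropping incomplete rows is the simplest of the two options mentioned above; interpolation would replace the `None` check with an estimate from neighboring values. The resulting table, with one row per district and a `risk` label, is the kind of input a classification model would be trained on.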
...
...