Police killings
The Guardian dataset contains 1,169 observations, each describing a person killed by the police, with 14 columns per row. The gender column records the gender of the person who was killed; the most frequent value is male, and female victims are rare in the dataset. Column H gives the year of the killing, while columns F and G give the month and day respectively.
The Washington Post dataset contains 1,501 observations and 14 columns per row. Column K records whether the victim showed signs of mental illness, and most entries are False. The main difference between the two datasets is that The Guardian records how the victim was killed, whereas The Washington Post records whether the victim was fleeing, whether they showed signs of mental illness, whether an officer was wearing a body camera, and the level of threat the victim was judged to pose.
To clean The Guardian data, I recommend merging the year column with the month and day columns into a single date column, since a separate year column is repetitive. The rows should also be grouped by race so that the number of cases for each group is visible, starting with Black, then Hispanic, then White victims; this makes it easy to compare the number of deaths by race. The classification column should be renamed “Cause of Death.” For The Washington Post data, I would place the flee and armed columns next to each other and remove the column on signs of mental illness from the dataset. Data on the reason for each killing would shed more light on these deaths.
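The cleaning steps recommended above could be carried out with a short script. The following is a minimal sketch in Python using only the standard library, not a definitive implementation. It assumes the files are read with `csv.DictReader` and that the column headers are `year`, `month`, `day`, `raceethnicity` and `classification` (The Guardian) and `armed`, `flee` and `signs_of_mental_illness` (The Washington Post); it also assumes months are stored as full English names and that the race labels include "Black", "Hispanic/Latino" and "White". These names, and the function names `clean_guardian`, `counts_by_race`, `clean_wapo` and `load`, are hypothetical and should be checked against the actual files.

```python
import csv
from collections import Counter
from datetime import datetime

# Assumed race labels, listed in the order suggested above.
RACE_ORDER = ["Black", "Hispanic/Latino", "White"]


def load(path):
    """Read a CSV file into a list of dicts keyed by column header."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def clean_guardian(rows):
    """Merge year/month/day into one ISO date; rename classification."""
    cleaned = []
    for row in rows:
        row = dict(row)  # copy so the caller's rows are not mutated
        # Assumes month is a full English month name, e.g. "January".
        text = f"{row.pop('year')} {row.pop('month')} {row.pop('day')}"
        row["date"] = datetime.strptime(text, "%Y %B %d").date().isoformat()
        row["Cause of Death"] = row.pop("classification")
        cleaned.append(row)
    return cleaned


def counts_by_race(rows, order=RACE_ORDER):
    """Count cases per race, listing the groups in `order` first."""
    counts = Counter(row["raceethnicity"] for row in rows)
    ordered = [(race, counts[race]) for race in order]
    # Any other groups follow, most frequent first.
    ordered += [(r, n) for r, n in counts.most_common() if r not in order]
    return ordered


def clean_wapo(rows):
    """Drop the mental-illness column and move flee next to armed."""
    cleaned = []
    for row in rows:
        row = {k: v for k, v in row.items() if k != "signs_of_mental_illness"}
        flee = row.pop("flee")
        reordered = {}
        for key, value in row.items():
            reordered[key] = value
            if key == "armed":
                reordered["flee"] = flee
        if "flee" not in reordered:  # no armed column: keep flee at the end
            reordered["flee"] = flee
        cleaned.append(reordered)
    return cleaned
```

For example, `counts_by_race(clean_guardian(load("guardian.csv")))` would return a list of `(race, count)` pairs with the three groups above listed first; the file name here is a placeholder. Working on plain dictionaries keeps the sketch dependency-free; a library such as pandas could perform the same steps with column renaming and grouping operations.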