Data Geek III – Crime and Census Data for England and Wales
This blog is my entry for the Data Geek III: Thrones of Data contest and I have picked the House of Spirits.
I have analysed just over 18 million crimes in England and Wales, together with population data from the 2011 Census. I use my own Lumira custom choropleth maps to help analyse this data. The data is all in the SAP HANA Cloud Platform (SAPHCP – a free account) and I use only the free edition of SAP Lumira to query my data in HANA.
Glossary of Terms used
I use two terms throughout the blog; they are also the base areas I use to map data for England and Wales.
Local authority districts (LAD) – A generic term to describe the ‘district’ level of local government.
Lower Super Output Areas (LSOA) – Taken from the National Stats site: “Super output areas (SOA) are a geographic hierarchy designed to improve the reporting of small area statistics. The 34,753 lower layer SOA in England (32,844) and Wales (1,909) were built from groups of output areas that were created in 2003 and maintained in 2011” reference Link
Background & Objective
My entry to the contest will use custom choropleth maps based on my Lumira extensions here and here to analyse the crime data I uploaded to the SAPHCP that I covered here. I will also use the maps to analyse population data from the 2011 Census for England and Wales link here.
I like the way choropleth maps can represent (or misrepresent) data on a map to tell a story about the data. While looking into the use of choropleths elsewhere, I found statements that they should not represent raw data (it can depend on the exact nature of the data), as this can lead to a misunderstanding of a map’s intention. The recommendation is to normalise raw data so that it can be compared on a map; I like this article on the subject here. So I intend to show how raw and normalised data can affect a choropleth map and its interpretation.
A bit of background to the data I will use.
While researching how to use the data I recently came across this BBC blog, which says that recorded crime data is not a reliable method for spotting crime trends and is more about police activity. There are also inspections of police data here that detail some limitations in how the different police forces in England and Wales collect the data and how accurate it is.
So for this blog I decided to check and compare specific crime types against population age ranges in England and Wales, and not to look for any particular trend in the 3 years of data I have in the SAPHCP. The data ranges from January 2011 to December 2013, and I look across the entire data set for any particular crime.
To get population details I use the population data from the 2011 Census.
I found that the Nomis web site allows the census population data to be downloaded in a format that matches the LSOA areas used by the police data site for its crime data. This allowed me to merge the two data sets on these LSOA codes.
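The merge on LSOA codes can be sketched outside HANA as a simple keyed join. This is only a minimal illustration in plain Python, not the calculation view itself: the field names (lsoa_code, crime_type, population), the LSOA codes and all figures are made up for the example.

```python
from collections import Counter

# Hypothetical sample rows; codes, field names and figures are invented.
crime_rows = [
    {"lsoa_code": "E01000001", "crime_type": "Bicycle theft"},
    {"lsoa_code": "E01000001", "crime_type": "Shoplifting"},
    {"lsoa_code": "E01000002", "crime_type": "Shoplifting"},
    {"lsoa_code": "W01000003", "crime_type": "Burglary"},
]
census_rows = [
    {"lsoa_code": "E01000001", "population": 1500},
    {"lsoa_code": "E01000002", "population": 1800},
]


def merge_by_lsoa(crimes, census):
    """Count crimes per LSOA and attach the census population where one exists."""
    crime_counts = Counter(row["lsoa_code"] for row in crimes)
    population = {row["lsoa_code"]: row["population"] for row in census}
    merged = {}
    for code, count in crime_counts.items():
        # population is None when the census extract has no row for this code
        merged[code] = {"crimes": count, "population": population.get(code)}
    return merged


merged = merge_by_lsoa(crime_rows, census_rows)
for code, values in sorted(merged.items()):
    print(code, values)
```

Keeping the areas that have crimes but no census row (with a population of None) makes any mismatch between the two sources visible instead of silently dropping it.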
SAPHCP calculation view
Previously, in my custom choropleth blog, I had two ways to view the England and Wales crime data on a map: using the raw data, and normalised by the size of the Local Authority Districts. That comparison was unfair to London due to its population size. So my objective was to add the population stats from the 2011 Census to allow a further breakdown of the crime data I had loaded into the SAPHCP. The census is a snapshot in time, and I have taken the step of using this data across the entire date range of the crime data. I did have to spend some time with the HANA calculation view to allow all measures to be used in Lumira, and in the end I combined two analytic views into one calculation view.
Screen shot of my SAPHCP HANA Calculation view
Summary of Analysis
I will focus my blog on the following areas. First I will look at the crime data using the raw data and then normalised by area size and population. Next I break down the census population data into age ranges. I then use these age range groups to select areas for further analysis of the crime data (using the population data as a rough guide to an area’s breakdown of age groups; obviously this changes over time and could even change daily). Finally I pick one crime type for further analysis.
Now on to the actual visualisation of the data, starting where I stopped last time in my custom choropleth blog.
Crime Data – Continue where I stopped last time
Top 10 raw count
The above map and chart indicate Birmingham has the most crimes over the 3 year period. However, Birmingham is the largest Local Authority District and has the highest population (by LADs) in the UK, so it is not surprising that it has the highest raw crime count.
Normalising the crime data by the size of the Local Authority District (in square kilometres).
Top 10 by area
The bar chart and map show London with the most crimes per square kilometre. However, London has a large population as well, so now that I can calculate population against the crime data I can show crimes per 1,000 people.
I have added the total population to the chart as it highlights the special case of London, particularly the City of London and Westminster, which have a high transient population of commuters and tourists. Special care needs to be taken as the population stats can be misleading in these areas, so I took these London areas out to see how it would affect my map.
Now that I have removed City of London and Westminster from the map, the colour is spread more evenly across the map.
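The three views of the crime data (raw count, per square kilometre, per 1,000 people) and the effect of excluding an outlier can be sketched as follows. This is only an illustration: the district names and every figure are invented, and the helper names are hypothetical rather than anything from Lumira or HANA.

```python
# Hypothetical districts; every figure is invented for illustration.
districts = [
    {"name": "District A", "crimes": 120000, "area_km2": 268.0, "population": 1000000},
    {"name": "District B", "crimes": 30000, "area_km2": 3.0, "population": 7500},
    {"name": "District C", "crimes": 20000, "area_km2": 400.0, "population": 250000},
]


def crimes_per_km2(district):
    """Crimes normalised by the district's area."""
    return district["crimes"] / district["area_km2"]


def crimes_per_1000_people(district):
    """Crimes normalised by the resident (census) population."""
    return 1000 * district["crimes"] / district["population"]


def rank(rows, key, exclude=()):
    """Return district names ordered from highest to lowest by the given measure."""
    kept = [row for row in rows if row["name"] not in exclude]
    return [row["name"] for row in sorted(kept, key=key, reverse=True)]


by_raw = rank(districts, key=lambda d: d["crimes"])
by_area = rank(districts, key=crimes_per_km2)
by_people = rank(districts, key=crimes_per_1000_people)
# Drop the small, low-resident district to see how the ranking changes.
by_people_filtered = rank(districts, key=crimes_per_1000_people, exclude={"District B"})

print(by_raw, by_area, by_people, by_people_filtered)
```

In this made-up data the district with few residents and many visitors tops both normalised rankings, which is the kind of distortion that makes it worth looking at the map again without it.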
My chosen age group ranges to focus on are
- Age 0 to 15
- Age 60 and over
I chose these to see if there is a difference in where young and old live in England and Wales.
Move to the South West of England when you’re too old for London
Local authority district (LAD) map by percentage of population.
The above map indicates that the percentage of people aged up to 15 is highest inland, especially in outer London.
Map Age 60 and over by LAD
It seems as if London is not popular with the older generation and that a move south west occurs at an older age. However, the map shows percentages and does not show density.
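The difference between the two measures is easy to see with a small worked example. The sketch below uses two invented areas (the names and numbers are assumptions, not census figures): one has the higher share of people aged 60 and over, the other has more of them per square kilometre.

```python
# Two made-up areas; the figures are illustrative only.
areas = [
    {"name": "Inner city", "total": 10000, "aged_60_plus": 1000, "area_km2": 2.0},
    {"name": "Coastal town", "total": 4000, "aged_60_plus": 1600, "area_km2": 20.0},
]


def share_percent(area):
    """Percentage of the area's population aged 60 and over."""
    return 100 * area["aged_60_plus"] / area["total"]


def density_per_km2(area):
    """People aged 60 and over per square kilometre."""
    return area["aged_60_plus"] / area["area_km2"]


top_by_share = max(areas, key=share_percent)["name"]
top_by_density = max(areas, key=density_per_km2)["name"]

print("Highest share:", top_by_share)
print("Highest density:", top_by_density)
```

A percentage map would shade the coastal town darkest, while a density map would shade the inner city darkest, even though both are drawn from the same counts.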
Living side by side in London?
The map highlights the high population density of London. The percentage map above did, however, lead me to investigate the following idea.
Move to the Coast when you’re too old for London
I have used the lower level of detail for this map by using the LSOA. The map indicates a more even spread of the age range up to 15 by percentage across the country, with peaks at the major cities.
Again the percentage figures on the map for the over 60s indicate a move away from London. However, compare this with a population density map.
60 and over live by the coast or living side by side in London?
I did debate whether to show the density map, as the LSOA areas are small and the screenshot may not make it clear. However, London and coastal places do come out on top for the density map.
Compare population age group areas against types of crime.
I chose to compare the age range 0 to 15 with the over 60s from the 2011 Census. I took the data from the census population maps above and filtered it to the top areas, percentage-wise, for each age range. I took a total population count of approximately 250,000 for the two groups of areas chosen. I can’t link the population to the crime data date range or link age to the crimes; the aim is only to compare a relatively young average population with an old one. In my opinion there should still be such a contrast between young and old in the two grouped areas today, though maybe you can argue over that. I’ll explain further over the following charts and maps.
The population breakdown of each grouped area is as follows: the information on the left is for the age group up to 15 and on the right for the 60 and over age group. I will keep this layout for the next few charts.
These chosen areas are at LSOA level; however, the maps below show the local authority district where each LSOA is located, along with a chart showing some of the areas on the map.
The only crime type that is higher in the over 60 age grouped area is shoplifting, which led me to search on OAP and shoplifting crimes.
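One way to build such a group is to rank the areas by the age group's percentage and keep adding areas until the running population total reaches a target. The sketch below is an assumption about how that could be done, not a record of the actual filter used in Lumira; the codes, field names and figures are invented, and the target is scaled down from 250,000 to keep the example small.

```python
# Hypothetical LSOA rows; codes, populations and percentages are invented.
lsoas = [
    {"code": "A", "population": 1500, "pct_0_15": 35.0},
    {"code": "B", "population": 1600, "pct_0_15": 12.0},
    {"code": "C", "population": 1400, "pct_0_15": 33.0},
    {"code": "D", "population": 1700, "pct_0_15": 30.0},
]


def select_top_areas(areas, share_key, target_population):
    """Take areas in descending order of share until the running
    total population reaches the target."""
    selected = []
    running_total = 0
    for area in sorted(areas, key=lambda a: a[share_key], reverse=True):
        if running_total >= target_population:
            break
        selected.append(area["code"])
        running_total += area["population"]
    return selected, running_total


# Scaled-down target for the example; the blog used roughly 250,000.
young_group, young_total = select_top_areas(lsoas, "pct_0_15", 4000)
print(young_group, young_total)
```

Because whole areas are added, the final total overshoots the target slightly, which is why the group size can only ever be approximate.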
In the final part of my blog I will focus on one crime type.
As I was researching how to present crime data I came across this tweet, which mentioned that students are more likely to be victims of property crime and also mentioned bike theft. Bike theft is a relatively new crime type in my data set: the crime is recorded from May to December 2013. When I listed the top six LSOA areas I was immediately drawn to Oxford and Cambridge, and I made a link to students. So I picked the local authority district of Oxford for further analysis, as one of its LSOA areas was top in a raw count of bike thefts.
Bike theft breakdown by LSOA areas for Local Authority District of Oxford
As described in my custom choropleth map blog, I visited http://overpass-turbo.eu/ to get further information for the map. For the Oxford map I ran a query for amenity=university, and the following map shows the places related to the university that were extracted from the site. Next time I will probably give the university shapes a completely different colour, or just convert them to point data instead of polygon shapes.
I now need to show the maps for bike thefts by LSOA area and population. Of the maps below, the left one shows bike thefts by area (square kilometres) and the right one shows bike thefts per head of population.
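For reference, a query of this kind can be written in Overpass QL and pasted into the overpass-turbo editor. The sketch below only builds the query text in Python and makes no network call. Selecting the search area by name ("Oxford") is an assumption for illustration, it is not necessarily how the original query was scoped, and a name lookup can match more than one place called Oxford.

```python
def build_overpass_query(area_name, key="amenity", value="university", timeout=25):
    """Build an Overpass QL query for nodes, ways and relations carrying
    the given tag inside a named area (the area lookup is an assumption)."""
    tag = f'["{key}"="{value}"]'
    lines = [
        f"[out:json][timeout:{timeout}];",
        f'area["name"="{area_name}"]->.searchArea;',
        "(",
        f"  node{tag}(area.searchArea);",
        f"  way{tag}(area.searchArea);",
        f"  relation{tag}(area.searchArea);",
        ");",
        "out body;",
        ">;",
        "out skel qt;",
    ]
    return "\n".join(lines)


query = build_overpass_query("Oxford")
print(query)
```

The ways and relations in the result are the polygon shapes; keeping only the nodes, or a centre point per shape, would give the point data mentioned above.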
Here is some advice from Oxford University to protect your bike.
Although I did not cover Cambridge in the end I did come across the following headline.