Skip to Content
Technical Articles
Author's profile photo Marc DANIAU

Top Bottom Analysis using R Visualizations (part 3)

As a BI user, a top analysis helps us find in our data, the few important items that have the highest values. But what happens to all of the other values? They simply don’t appear in the visualization. To not completely lose the information about these excluded items, it is common practice to aggregate their values and put the result in an extra category. You will see in part 3 how to add a bar “Other” to your bar plots as well as your stacked bar plots. We will also show a case encountered with our customers that consists of finding the top segments in each bar of a stacked bar chart.

 

We load the R packages required for the coming examples and choose a color palette.

library(dplyr)
library(tidyr)
library(stringr)
library(highcharter)

library("RColorBrewer")
my_colors <- brewer.pal(n = 12, name = "Set3")

 

We will work with the dataset “Import cars” that is available in the R environment.

data(imports85, package = "randomForest")
str(imports85)  

 

 

The data needs some cleaning.

my_data <- imports85 %>% 
  mutate(bodyStyle = str_to_title(bodyStyle)) %>% 
  mutate(make = str_to_title(make)) %>% 
  mutate(make = str_replace(make, "Peugot", "Peugeot")) %>% 
  mutate(make = str_replace(make, "Bmw", "BMW"))

 

Top bars with category Other

We want to retrieve the top 12 brands from the import cars data.

We count the cars by brand and then rank the brands based of the number of cars.

bars <- as.data.frame(table(my_data$make)) %>%
  select(Brand = Var1, Bar_size = Freq) %>%
  mutate(Rank = row_number(-Bar_size))

 

We restrict the brands to the 12 highest.

top_limit = 12
tpn_bars <- bars %>% 
     filter(Rank <= top_limit) %>% 
	 arrange(Rank, Brand) %>% 
	 select(Brand, Bar_size)

 

We add a bar where all the brands that aren’t in the top 12 get aggregated.

oth_bar <- bars %>% 
  anti_join(tpn_bars) %>%
  summarise(Brand = "Other brands", Bar_size=sum(Bar_size))  

df <- rbind(tpn_bars, oth_bar)

 

We plot the data.

title_caption = paste("Top", top_limit, "Import Car Brands")

hchart(df, "bar", hcaes(x = Brand, y = Bar_size)) %>% 
  hc_xAxis(title=list(text = NULL), tickLength = 0, 
           lineWidth = 3, lineColor = "black",
           labels = list(style = list(fontSize = "14px", fontWeight = "bold"))) %>%
  hc_yAxis(title = list(text = "Number of cars"), allowDecimals = FALSE) %>%
  hc_title(text = title_caption, style = list(fontWeight = "bold")) %>% 
  hc_tooltip(headerFormat = "<b> Brand: {point.key}</b><br>",
             pointFormat = "Number of cars: {point.y}")

 

Top-n stacked bars with category Other

We want to segment each bar from the previous chart by the type of cars.

We count the cars by brand and type.

segments <- as.data.frame(table(my_data$make, my_data$bodyStyle)) %>%
  select(Brand = Var1, Type = Var2, Segment_size = Freq) %>%
  spread(key = Type, value = Segment_size)

 

The code below refers to the tpn_bars data frame defined earlier. We obtain the top segmented bars.

tpn_segbars <- segments %>% 
     inner_join(tpn_bars) %>% 
	 arrange(desc(Bar_size), Brand) %>% 
	 select(-Bar_size)

 

We add an extra bar: Other brands.

oth_segbar <- segments %>% 
  anti_join(tpn_segbars) %>%
  gather(Type, Quantity, -Brand) %>% 
  group_by(Type) %>%
  summarise(Brand = "Other brands", Segment_size = sum(Quantity)) %>%
  spread(key = Type, value = Segment_size)

df <- rbind(tpn_segbars, oth_segbar)

 

We plot the data with twelve bars plus one.

title_caption = paste("Top", top_limit, "Import Car Brands")

bar_column <- "Brand"
segments_columns <- df %>% 
 select_if(is.numeric) %>% 
  names()
length_of_strings <- as.integer()
invisible(lapply(segments_columns, function(x){
  length_of_strings <<- append(length_of_strings, nchar(x))
}))

p <- highchart() %>%
  hc_xAxis(categories = df[, bar_column],
	tickLength = 0, lineWidth = 3, lineColor = "black",
     labels = list(style = list(fontSize = "14px", fontWeight = "bold"))) %>%
  hc_yAxis(title = list(text = "Number of cars"), allowDecimals = FALSE, 
           stackLabels = list(color = "black", fontWeight = "bold", enabled = TRUE)) 

invisible(lapply(segments_columns, function(column) {
  p <<-
    hc_add_series(hc = p, name = column,
                  data = df[, column])
}))

p %>%
  hc_chart(type = "bar") %>%
  hc_colors(my_colors) %>%
  hc_plotOptions(series = list(stacking = "normal", 
                 dataLabels = list(enabled = FALSE))) %>%
  hc_title(text = title_caption, 
           margin = 20, align = "center",
           style = list(color = "black", fontWeight = "bold")) %>% 
  hc_tooltip(headerFormat = "<b> Brand: {point.key}</b><br>",
             pointFormat = "Car type: {series.name}<br>Number of cars: {point.y}") %>% 
  hc_legend(align = "center", verticalAlign = "top", 
            reversed = "TRUE", symbolRadius = 0 )

 

 

Top segments

For the last example we must find, brand by brand, the car type having the highest value.

top_limit = 1

df <- as.data.frame(table(my_data$make, my_data$bodyStyle)) %>%
  select(Brand = Var1, Type = Var2, Segment_size = Freq) %>% 
  group_by(Brand) %>%
  top_n(top_limit, Segment_size) %>%
  ungroup %>%
  spread(key = Type, value = Segment_size) %>% 
  arrange(Brand)
  
df <- data.frame(df)

 

We plot the result.

title_caption = paste("Import Cars - The top", top_limit, "type(s) by brand")

# Insert here the code snippet used for the previous chart

 

There are tied results for certain brands: Porsche, Renault and Saab.

 

Continue to part 4

Assigned Tags

      Be the first to leave a comment
      You must be Logged on to comment or reply to a post.