Handling Time Series Data
We often come across data that is time dependent, such as weather readings, stock prices or currency rates. For such data we have several series of values, each belonging to a specific point in time. For example, in a stock market we have an Open, High, Low and Close value for a single scrip on any given day. If such data is simply read from a CSV file, it is treated as an ordinary table and the time dependence is lost, so it cannot be used directly for time-based analysis. Each observation in our dataset has a time tag attached to it, and the main distinguishing feature of this kind of data is that the order of the cases matters because of those time tags. In the case of stock data we have what is usually known as a multivariate time series, because we measure several variables at the same time tags, namely the Open, High, Low and Close (OHLC).
R has several packages to deal with this kind of data. Among the most flexible are zoo and xts. Both have similar power, although xts has more capabilities; technically, xts extends the zoo class. This document covers xts (eXtensible Time Series) and shows by example how easy it is to manipulate and plot data held in an xts object.
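As a minimal sketch of the relationship between the two packages, assuming xts is installed, the snippet below builds a tiny OHLC series and checks that the result is both an xts and a zoo object. The prices and the names `toy_df` and `toy` are invented for illustration and are not taken from the S&P 500 file.

```r
library(xts)  # also loads zoo

# Hypothetical data: three trading days with invented prices
toy_df <- data.frame(
  Date  = c("2000-01-03", "2000-01-04", "2000-01-05"),
  Open  = c(100, 102, 101),
  High  = c(103, 104, 102),
  Low   = c( 99, 100,  98),
  Close = c(102, 101,  99)
)

# read.zoo() treats the first column as the time index;
# as.xts() then converts the zoo object to xts
toy <- as.xts(read.zoo(toy_df, format = "%Y-%m-%d"))

class(toy)            # "xts" "zoo"
inherits(toy, "zoo")  # TRUE, because xts extends zoo
dim(toy)              # 3 rows (days) by 4 columns (Open, High, Low, Close)
```

The same `as.xts(read.zoo(...))` pattern is what the component code in this post applies to the full dataset.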
I used the book “Data Mining with R” as my reference. I have been trying to build the Stock model highlighted there using PA. I will post about it when it is ready.
I have used the S&P 500 Index data from the dataset that came with the book, in its CSV version. I loaded the file in SAP PA and wrote the code below to read it as time series data.
The S&P 500 data and the model built are shown below:
The code for creating the “Time Series Component” and plotting the various graphs is shown below and explained in the ## comments:
mymain <- function(mydata)
{
## The xts package is required for the xts classes and functions, so load it first.
library(xts)
## Read the data (loaded from the CSV file) as time series data and store it in GSPC. Each column is treated as an independent time series, as the summary will show.
GSPC <- as.xts(read.zoo(mydata))
## Set up the device to hold multiple graphs. The original par() call is missing here; a 3 x 2 grid is an assumption that fits the five plots below.
par(mfrow = c(3, 2))
## Getting all Open values between 26-Feb-2000 and 03-Mar-2002
X1 <- GSPC["2000-02-26/2002-03-03", c("Open")]
## Getting all High values for Jan 2003
X2 <- GSPC["2003-01", c("High")]
## Getting all Low values from 1 April 2005 onwards
X3 <- GSPC["2005-04-01/", c("Low")]
## Plotting the different values with appropriate ticks
plot(GSPC$AdjClose, xlab = "Dates", ylab = "Adjusted closing price", type = "l", main = "S&P 500 Adj Close", major.ticks = 'years', minor.ticks = FALSE, col = 3)
plot(X1, xlab = "Dates", ylab = "Open price", type = "l", main = "S&P 500 Open", major.ticks = 'months', minor.ticks = FALSE, col = 2)
plot(X2, xlab = "Dates", ylab = "High price", type = "l", main = "S&P 500 High", major.ticks = 'days', minor.ticks = FALSE, col = 1)
plot(X3, xlab = "Dates", ylab = "Low price", type = "l", main = "S&P 500 Low", major.ticks = 'years', minor.ticks = FALSE, col = 4)
plot(GSPC$Volume, xlab = "Dates", ylab = "Volume", type = "l", main = "S&P 500 Volume", major.ticks = 'years', minor.ticks = FALSE, col = 5)
}
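The date-based subsetting used for X1, X2 and X3 relies on xts accepting date strings inside the square brackets: a "from/to" range, a bare year-month, or an open-ended "from/" range. The sketch below shows the three forms on a small made-up series, so the results can be checked by eye; the object `toy` and its values are hypothetical and stand in for GSPC.

```r
library(xts)

# Hypothetical daily series: 10 consecutive days starting 2005-03-28
days <- seq(as.Date("2005-03-28"), by = "day", length.out = 10)
toy  <- xts(cbind(Open = 1:10, Low = 11:20), order.by = days)

in_range  <- toy["2005-03-30/2005-04-02", "Open"]  # closed date range
one_month <- toy["2005-03", "Low"]                 # every row in March 2005
from_date <- toy["2005-04-01/", "Low"]             # 1 April 2005 onwards

nrow(in_range)   # 4
nrow(one_month)  # 4 (28, 29, 30 and 31 March)
nrow(from_date)  # 6 (1 to 6 April)
```

Note that both ends of a range are inclusive, so "2005-04-01/" includes 1 April itself.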
To get a candlestick chart:
mymain <- function(mydata)
{
## quantmod is required to plot the candlestick chart (it also loads xts).
library(quantmod)
GSPC2 <- as.xts(read.zoo(mydata))
## Convert the data to numeric type to handle "NaN" values.
storage.mode(GSPC2) <- "numeric"
## Plot the last three months; theme and TA are arguments of candleChart(), not of last().
candleChart(last(GSPC2, "3 months"), theme = "white", TA = NULL)
output <- GSPC2$Close
}
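The storage.mode step matters when the values reach R as text rather than numbers, which is one plausible reason for "NaN" entries causing trouble. The sketch below illustrates the conversion on a made-up three-day series; the object `txt` and its values are hypothetical, and whether the real file arrives as text depends on how SAP PA passes it to R.

```r
library(xts)

# Hypothetical series whose values arrived as text, including a "NaN" entry
days <- as.Date("2009-09-01") + 0:2
txt  <- xts(matrix(c("10.5", "NaN", "11.2"), ncol = 1,
                   dimnames = list(NULL, "Close")),
            order.by = days)

is.numeric(coredata(txt))    # FALSE: the values are character strings

# Change the storage mode in place; the time index and column names are kept
storage.mode(txt) <- "numeric"

is.numeric(coredata(txt))    # TRUE
is.nan(coredata(txt)[2, 1])  # TRUE: the "NaN" string became a numeric NaN
```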
You will see that the summary function treats time series data differently. The summary of the raw data is below:
The summary of the GSPC data is below:
As we see above, the raw data has no notion of time, whereas xts assigns a time index to each series, and we can then index on time to plot exactly the graphs we want. The outputs are shown below.
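Before looking at those outputs, the difference between a plain table and a time-indexed series can be sketched on a few invented rows. This is only an illustration under assumed data: `raw` and `ts_data` are hypothetical names, and the values are not from the S&P 500 file.

```r
library(xts)

# Hypothetical raw data, deliberately listed out of date order
raw <- data.frame(
  Date  = c("2001-03-02", "2001-01-15", "2001-02-10"),
  Close = c(30, 10, 20)
)

# A data frame keeps the rows as given and treats Date as an ordinary column
raw$Close                       # 30 10 20

# An xts object is keyed by time, so its rows are sorted by the index
ts_data <- xts(raw$Close, order.by = as.Date(raw$Date))
colnames(ts_data) <- "Close"

as.numeric(ts_data$Close)       # 10 20 30
index(ts_data)                  # "2001-01-15" "2001-02-10" "2001-03-02"
as.numeric(ts_data["2001-02"])  # 20: selecting by month works through the index
```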
Time Series Plot
I could not find a way to plot the candlestick chart alongside the time series plots using the par function. If anyone knows a way, please put it in the comments section.