What will be the price of Apple stock in 2022?
AAPL (Apple Inc.), the stock with the largest market capitalization in the United States, has a huge influence on many different fields.
For this reason, we consider it important to be able to predict its return. To do so, we drew on our interdisciplinary knowledge and combined different technical indicators to see which ones best explained or predicted the return. To identify them, we built a multivariable regression on 24 different technical indicators.
Those indicators are ATR, BB_MA, CMCI, ADXR, EMAVG, FEAR_GREED, BB_UPPER, BB_LOWER, BB_PERCENT, HURST, MM_RETRACEMENT, MOM_MA, MOMENTUM, MACD, MACD_SIGNAL, MAE_MIDDLE, MAOsc, MAO_SIGNAL, PTPS, ROC, TAS_DSS, RSI, SMAVG, and the dummy variable CROSS.
We collected daily data from 8/18/2021 to 12/18/2021 and used RStudio to build the multivariable regression and residual analysis models. We also used statistical distribution tests to refine our independent variables and strategy.
In summary, we collected data on 24 technical indicators, then analyzed the results of the multivariable regression and the residual analysis to test the Gauss-Markov assumptions as well as the normality assumption.
Variables and their business sense
Dependent Variable:
Apple company stock daily return
Independent Variables After Statistics Selection:
EMAVG: The exponential moving average (EMA) is a type of moving average (MA) that places greater weight and significance on the most recent data points.
MOM_MA: Compares the current price with the price a number of periods ago.
MAE_MIDDLE: 100 * (applied price - low) / (high - low)
ROC: A momentum-based technical indicator that measures the percentage change between the current price and the price a certain number of periods ago.
SMAVG: A simple moving average (SMA) averages a range of closing prices by dividing their sum by the number of periods in that range.
CROSS: Cross is the dummy variable. It is based on whether the last price of the AAPL stock is larger than the value of the EMAVG factor. If the last price is larger, the Cross value is 1; otherwise it is 0.
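As an illustration of the definitions above, the following base-R sketch computes a simple moving average, an exponential moving average, a rate of change, and the CROSS dummy. The price series, the 3-period window, and the EMA smoothing factor 2 / (n + 1) are assumptions chosen for the example; the report does not state the parameters behind its own indicator series.

```r
# Hypothetical closing prices; the report's real data lives in clean.xlsx.
close <- c(150, 152, 151, 155, 157, 156, 160)
n <- 3  # look-back window (an assumption, not stated in the report)

# Simple moving average: mean of the last n closes (NA until n values exist).
sma <- as.numeric(stats::filter(close, rep(1 / n, n), sides = 1))

# Exponential moving average with smoothing factor 2 / (n + 1),
# seeded with the first close (a common convention, assumed here).
alpha <- 2 / (n + 1)
ema <- Reduce(function(prev, p) alpha * p + (1 - alpha) * prev,
              close, accumulate = TRUE)

# Rate of change: percent change versus the close n periods earlier.
roc <- c(rep(NA, n),
         100 * (close[-(1:n)] / close[1:(length(close) - n)] - 1))

# CROSS dummy: 1 if the last price is above the EMA, otherwise 0.
cross <- as.integer(close > ema)

data.frame(close, sma, ema, roc, cross)
```

Seeding the EMA with the first close is only one convention; seeding with an initial SMA is also common and gives slightly different early values.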
Model Building
First Model: Multivariable Regression
We collected 24 factors but eliminated 6 of them because they did not have continuous time series data, so we built our model using the remaining 18 variables. The output of the model from R is shown in the figure below:
We can see the summary shows six variables are significant at a significance level of alpha = 0.05. These variables are EMAVG, MOM_MA, MAE_MIDDLE, ROC, SMAVG, CROSS. Therefore, since the other variables are not statistically significant, we will remove them before continuing our analysis.
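The selection step described above can be sketched as follows: fit the full model, read the p-values from the coefficient table, and refit with only the regressors below the chosen significance level. The data here are simulated and the variable names x1, x2, x3 are placeholders, not the report's indicators.

```r
set.seed(1)
n <- 80
# Simulated stand-ins for indicators: x1 and x2 drive y, x3 is pure noise.
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- 0.5 * d$x1 - 0.4 * d$x2 + rnorm(n, sd = 0.3)

full <- lm(y ~ x1 + x2 + x3, data = d)

# The "Pr(>|t|)" column of the coefficient table holds the two-sided
# t-test p-values; the first row (intercept) is dropped.
pvals <- coef(summary(full))[-1, "Pr(>|t|)"]
keep <- names(pvals)[pvals < 0.05]

# Refit using only the retained regressors.
reduced <- lm(reformulate(keep, response = "y"), data = d)
summary(reduced)
```

Dropping variables by p-value in a single pass is simple, but the p-values of the remaining variables can change after the refit, so it is worth checking the reduced model's summary again.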
Second Model: Multivariable Regression with Significant Variables
After removing the statistically insignificant variables, we rebuilt the regression model. Below is the output of the updated multivariable regression:
Below are the confidence intervals of the 6 variables we tested:
Variable | 2.5 % | 97.5 % |
(Intercept) | -0.046484637 | 0.0969558627 |
EMAVG | 0.009891665 | 0.0347252591 |
MOM_MA | -0.006660177 | -0.0019121426 |
MAE_MIDDLE | -0.017729746 | -0.0062241277 |
ROC | -0.007301806 | -0.0007536999 |
SMAVG | -0.019513697 | -0.0013970389 |
cross | -0.017785074 | -0.0071492796 |
For each coefficient, we tested the following hypotheses:
For i = 1, ..., 6:
H0: $\beta_i = 0$ (the coefficient of the i-th variable equals 0)
H1: $\beta_i \neq 0$
From the results of the multivariable regression above, we can see that the p-value of each independent variable is less than the significance level α = 0.05. Therefore, we reject the null hypothesis (H0) for each of them. There is enough evidence that each independent variable in the model is significant.
The adjusted R-squared is 44.1%, which means that, after adjusting for the number of regressors, about 44.1% of the variation in Apple's daily stock return is explained by these independent variables.
For the joint test of the model, we refer to the F-statistic, which is 12.17 with a p-value of 1.99e-09. At significance level α = 0.01, there is enough evidence that at least one of the coefficients is different from zero.
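The quantities reported above can be reproduced by hand from a fitted model. The sketch below, on simulated data with placeholder names, rebuilds the 95% confidence intervals as estimate ± t-quantile × standard error and recovers the overall F-test p-value from the F-statistic and its degrees of freedom.

```r
set.seed(2)
n <- 80
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- 0.02 + 0.5 * d$x1 - 0.3 * d$x2 + rnorm(n, sd = 0.4)
fit <- lm(y ~ x1 + x2, data = d)
s <- summary(fit)

# 95% confidence interval: estimate +/- t quantile * standard error.
est <- coef(s)[, "Estimate"]
se  <- coef(s)[, "Std. Error"]
tq  <- qt(0.975, df = fit$df.residual)
ci_manual <- cbind(lower = est - tq * se, upper = est + tq * se)

# Overall F-test p-value from the statistic and its degrees of freedom.
f <- s$fstatistic  # named vector: value, numdf, dendf
f_p <- pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)

ci_manual      # agrees with confint(fit)
confint(fit)
f_p
```

Because each interval is symmetric around its estimate, the midpoint of every row in a `confint()` table is the fitted coefficient itself.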
Below is the correlation between these variables:
Autocorrelation of the Error Terms and Conditional Heteroskedasticity Tests
To test for autocorrelation of the error terms of model 2, we first plotted the error terms as shown below:
As we can see in the graph, the error terms appear to be random, but because the points are so dense, there could be a pattern that is not easily observable. Therefore, we conducted a Durbin-Watson test for first-order (lag 1) autocorrelation.
Durbin-Watson Test:
We assume $\varepsilon_t = a\,\varepsilon_{t-1} + u_t$,
H0: a=0 (there is no relation between errors)
H1: a>0
Output:
data: model2
DW = 2.4641, p-value = 0.9559
alternative hypothesis: true autocorrelation is greater than 0
The p-value is 0.9559, so we do not reject the null hypothesis (H0) at significance level α = 0.05, which means that there is not enough evidence that the error terms in the model are positively correlated at lag 1. However, failing to reject the null hypothesis does not show that the errors are independent. In particular, a DW statistic above 2 points toward negative rather than positive autocorrelation, which this one-sided test does not assess.
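For reference, the Durbin-Watson statistic itself is easy to compute from the residuals. The sketch below uses simulated data (not the report's) and shows the relation DW ≈ 2(1 − r), where r is the lag-1 autocorrelation of the residuals. Under that approximation, the reported DW of 2.4641 corresponds to r of roughly −0.23.

```r
set.seed(3)
n <- 80
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
fit <- lm(y ~ x)
e <- resid(fit)

# Durbin-Watson statistic: sum of squared successive differences of the
# residuals divided by the residual sum of squares.
dw <- sum(diff(e)^2) / sum(e^2)

# Lag-1 residual autocorrelation and the approximation DW ~ 2 * (1 - r).
r1 <- sum(e[-1] * e[-n]) / sum(e^2)
c(DW = dw, approx = 2 * (1 - r1))
```

The two numbers differ only by an end-effect term, (e_1^2 + e_n^2) / sum(e^2), which shrinks as the sample grows.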
QQ-Plot
The QQ-plot in R is used to check the normality assumption of the error terms. From the graph above, we can see that not all points lie along the straight dotted line. This means that the residuals do not strictly follow a normal distribution, so the t and F statistics only approximately follow their assumed distributions and inference based on them may be unreliable.
The graphs above show a fairly random spread along the plot with no obvious pattern, which is consistent with independent residuals. Visually, this suggests that the homoskedasticity assumption holds, which we check formally with the Breusch-Pagan test below.
Breusch-Pagan Test:
We set up the hypotheses as follows:
H0: there is no conditional heteroskedasticity
H1: there is conditional heteroskedasticity
The result from R BP-test:
BP = 22.45, df = 6, p-value = 0.001003
The BP p-value is 0.001003, well below the significance level of 0.05 that we set, so we reject H0: there is evidence of conditional heteroskedasticity. Heteroskedasticity does not bias the OLS estimates of the coefficients, but it invalidates the usual standard errors, so the test statistics no longer follow their assumed distributions and the statistical conclusions drawn from them may be unreliable.
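To make the test statistic concrete, here is a minimal base-R sketch of the non-studentized Breusch-Pagan statistic (the variant selected by `studentize = FALSE` in the appendix). The data are simulated so that the error spread grows with x; none of the numbers relate to the report's results.

```r
set.seed(4)
n <- 200
x <- runif(n, 1, 5)
# Simulated data whose error spread grows with x (heteroskedastic by design).
y <- 1 + 0.5 * x + rnorm(n, sd = 0.3 * x)
fit <- lm(y ~ x)

e <- resid(fit)
sigma2 <- sum(e^2) / n    # maximum-likelihood estimate of the error variance
g <- e^2 / sigma2 - 1     # scaled, centred squared residuals

# Auxiliary regression of g on the regressors. BP is half its explained
# sum of squares, compared with a chi-squared with one df per regressor.
aux <- lm(g ~ x)
bp <- 0.5 * sum(fitted(aux)^2)
p_value <- pchisq(bp, df = 1, lower.tail = FALSE)
c(BP = bp, p.value = p_value)
```

With six regressors, as in the report's model, the reference distribution has 6 degrees of freedom, which matches the `df = 6` in the output above.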
To correct for the heteroskedasticity, one option is to replace the original standard errors with robust standard errors; another is to use the generalized least-squares method.
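The first option can be sketched in base R with White's (HC0) sandwich estimator, which keeps the OLS coefficients and replaces only their covariance matrix. This is an illustration on simulated data, not a re-analysis of the report's model. In practice, `vcovHC()` from the sandwich package combined with `coeftest()` from lmtest does the same job.

```r
set.seed(5)
n <- 200
x <- runif(n, 1, 5)
y <- 1 + 0.5 * x + rnorm(n, sd = 0.3 * x)  # heteroskedastic by design
fit <- lm(y ~ x)

X <- model.matrix(fit)
e <- resid(fit)
bread <- solve(crossprod(X))          # (X'X)^-1
meat  <- crossprod(X * e)             # X' diag(e^2) X
vcov_hc0 <- bread %*% meat %*% bread  # White (HC0) sandwich estimator

robust_se  <- sqrt(diag(vcov_hc0))
classic_se <- sqrt(diag(vcov(fit)))
cbind(classic_se, robust_se, robust_t = coef(fit) / robust_se)
```

The coefficient estimates are unchanged; only the standard errors, and therefore the t-statistics and p-values, are recomputed.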
Conclusion
Our final multivariable linear regression model is
$$\text{return} = 0.025236 + 0.022308\,\text{EMAVG} - 0.004286\,\text{MOM\_MA} - 0.011977\,\text{MAE\_MIDDLE} - 0.004028\,\text{ROC} - 0.010455\,\text{SMAVG} - 0.012467\,\text{CROSS}$$
Based on the residual analysis and the hypothesis tests, we conclude that our multivariable linear regression model is a valid model, with the caveat that its standard errors should be corrected for the heteroskedasticity found above. Holding the other variables constant, this model means that:
∙ A 1 unit increase in EMAVG is associated with a 0.022308% increase in Apple Inc.’s stock return
∙ A 1 unit increase in MOM_MA is associated with a 0.004286% decrease in Apple Inc.’s stock return
∙ A 1 unit increase in MAE_MIDDLE is associated with a 0.011977% decrease in Apple Inc.’s stock return
∙ A 1 unit increase in ROC is associated with a 0.004028% decrease in Apple Inc.’s stock return
∙ A 1 point increase in the SMAVG index is associated with a 0.010455% decrease in Apple Inc.’s stock return
∙ If cross = 1, Apple Inc.’s stock return is 0.012467% lower than if cross = 0.
Finally, the adjusted R-squared equals 0.441, which means that these 6 technical indicators explain about 44.1% of the variation in Apple’s stock returns.
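As a small worked example of how the fitted equation is applied, the sketch below plugs one set of indicator readings into the reported coefficients. The coefficients are the ones from the final model; the indicator values are invented purely for illustration and are not taken from the data.

```r
# Coefficients as reported for the final model.
beta <- c(intercept = 0.025236, EMAVG = 0.022308, MOM_MA = -0.004286,
          MAE_MIDDLE = -0.011977, ROC = -0.004028, SMAVG = -0.010455,
          cross = -0.012467)

# Hypothetical indicator readings for one trading day (illustrative only).
x <- c(intercept = 1, EMAVG = 150, MOM_MA = 2, MAE_MIDDLE = 148,
       ROC = 1.5, SMAVG = 149, cross = 1)

# Fitted value: the sum of coefficient * regressor over all terms.
predicted_return <- sum(beta * x[names(beta)])

# Effect of the dummy alone: switching cross from 0 to 1 shifts the
# fitted value by exactly the cross coefficient.
x0 <- replace(x, "cross", 0)
cross_effect <- predicted_return - sum(beta * x0[names(beta)])
cross_effect
```

The same pattern gives the partial effect of any regressor: change one input, hold the others fixed, and the fitted value moves by the coefficient times the change.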
APPENDIX:
library(readxl)
library(tidyverse)
library(psych)
library(dplyr)
library(ggplot2)
library(GGally)
library(skimr)
library(foreign)
library(lmtest)
# Read the first and second sheets of the workbook
data <- read_excel("clean.xlsx")
data1 <- read_excel("clean.xlsx", sheet = 2)
ggpairs(data1)
warnings()
data
#skim(data)
#ggpairs(data)
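# Full model: daily return regressed on the 18 indicators with continuous data
model <- lm(return ~ ATR + BB_MA + CMCI + ADXR + EMAVG + FEAR_GREED + HURST +
              MM_RETRACEMENT + MOM_MA + MOMENTUM + MACD + MAE_MIDDLE + MAOsc +
              PTPS + ROC + RSI + SMAVG + cross, data = data)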
coef(model)
summary(model)
e <- resid(model)  # residuals of the full model
plot(e)
dwtest(model)
# model1: the six significant variables plus ATR
model1 <- lm(return ~ ATR + EMAVG + MOM_MA + MAE_MIDDLE + ROC + SMAVG + cross,
             data = data)
summary(model1)
e1 <- resid(model1)
plot(e1)
dwtest(model1)
# model2: delete ATR
model2 <- lm(return ~ EMAVG + MOM_MA + MAE_MIDDLE + ROC + SMAVG + cross,
             data = data)
summary(model2)
ggpairs
confint(model2)
e <- resid(model2)  # residuals of model2
plot(e)
dwtest(model2)
#model3 fear
model3 <- lm(return ~ EMAVG + cross + MOM_MA + MAE_MIDDLE + ROC + SMAVG,
             data = data)
summary(model3)
#working with logarithmic data
library(memisc)
library(dplyr)
library("psych")
library(lmtest)
library(sjPlot)
library(sgof)
library(ggplot2)
library(foreign)
library(car)
library(hexbin)
library(GGally)
library(vcd)
library(devtoolbox)
library(pander)
library(knitr)
library(devtools)
library(sessioninfo)
library(gdata)
library(readxl)
a <- read_excel("./clean.xlsx", sheet = 1, na = "NA")
glimpse(a)
#residual analysis
library(ggplot2)
linmodel <- lm(return ~ EMAVG + cross + MOM_MA + MAE_MIDDLE + ROC + SMAVG, data = a)
bptest(linmodel, studentize = FALSE)  # Breusch-Pagan test (non-studentized)
summary(linmodel)
plot(linmodel, which = 1, col = "darkblue")   # Residuals vs fitted values
plot(linmodel, which = 2, col = "red")        # Normal probability plot
plot(linmodel, which = 3, col = "darkgreen")  # Scale-location plot
#more random spread along the plot it shows that the residuals are independent and then we assume homoskedasticity holds
#equal variance along the regression line
Written by
Yinan Chen
Jiahui Zhou
Neel Shah
Julian Moreno