# What will be the price of Apple stock in 2022?

#### AAPL (Apple Inc.), the stock with the largest market capitalization in the United States, has a huge influence on many different fields.

For this reason, we believe that being able to predict its return is paramount. Drawing on our interdisciplinary knowledge, we combined different technical indicators to see which ones best explained and predicted Apple Inc.’s return. To identify those indicators, we developed a multivariable regression including 24 different technical indicators.

Those indicators include ATR, BB_MA, CMCI, ADXR, EMAVG, FEAR_GREED, BB_UPPER, BB_LOWER, BB_PERCENT, HURST, MM_RETRACEMENT, MOM_MA, MOMENTUM, MACD, MACD_SIGNAL, MAE_MIDDLE, MAOsc, MAO_SIGNAL, PTPS, ROC, TAS_DSS, RSI, SMAVG, and the dummy variable CROSS.

We collected daily data from 8/18/2021 to 12/18/2021. We used RStudio to build the multivariable regression and residual analysis models, and we used statistical distribution tests to refine our independent variables and strategy.

In summary, we collected data on 24 technical indicators and analyzed the results of the multivariable regression and residual analysis to test the Gauss-Markov assumptions as well as the normality assumption.

Dependent Variable:

Apple Inc.’s daily stock return

Independent Variables After Statistics Selection:

EMAVG: an exponential moving average (EMA) is a type of moving average (MA) that places greater weight and significance on the most recent data points.

MOM_MA: the moving average of momentum; it compares the current price with the price from a number of periods ago.

MAE_MIDDLE: 100 × (applied price − low) / (high − low)

ROC: a momentum-based technical indicator that measures the percentage change in price  between the current price and the price a certain number of periods ago.

SMAVG: a simple moving average (SMA) is the unweighted average of closing prices over a given number of periods.

CROSS: the dummy variable. It is based on whether the last price of the AAPL stock is larger than the value of the EMAVG factor. If the last price is larger, the CROSS value is 1; otherwise it is 0.
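As a sketch of how a few of these indicators could be computed from a daily closing-price series using the `TTR` package (the prices and window lengths here are illustrative, not necessarily the ones used in the report):

```r
library(TTR)

# Hypothetical daily closing prices for AAPL
price <- c(146.2, 147.0, 148.3, 147.5, 149.1,
           150.0, 148.8, 149.9, 151.2, 150.6)

emavg <- EMA(price, n = 5)                        # exponential moving average
smavg <- SMA(price, n = 5)                        # simple moving average
roc   <- ROC(price, n = 1, type = "discrete") * 100  # percent rate of change
mom   <- momentum(price, n = 3)                   # price minus price 3 periods ago

# CROSS dummy: 1 when the price is above its EMAVG, else 0
cross <- as.integer(price > emavg)
```

Each indicator series would then be aligned with the daily return series before being entered into the regression.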

Model Building

First Model: Multivariable Regression

We collected 24 factors but eliminated 6 of them because they did not have continuous time-series data. We therefore built our model using the remaining 18 variables. The output of the model from R is shown in the figure below:

The summary shows that six variables are significant at a significance level of α = 0.05: EMAVG, MOM_MA, MAE_MIDDLE, ROC, SMAVG, and CROSS. Since the other variables are not statistically significant, we removed them before continuing our analysis.
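This selection step can be sketched in R as follows, assuming `model` is the 18-variable fit from the appendix:

```r
# Coefficient matrix: estimate, std. error, t value, Pr(>|t|)
coefs <- summary(model)$coefficients

# Keep predictors whose p-value (column 4) is below 0.05
sig <- rownames(coefs)[coefs[, 4] < 0.05]
sig <- setdiff(sig, "(Intercept)")
sig
```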

Second Model: Multivariable Regression with Significant Variables

After removing the statistically insignificant variables, we rebuilt the regression model. Below is  the output of the updated multivariable regression:

Below are the confidence intervals of the 6 variables we tested:

For each parameter, we tested the following hypotheses:

For i = 1, …, 6:

H0: β_i = 0 (the coefficient of the parameter equals 0)

H1: β_i ≠ 0

From the results of the multivariable regression above, we can see that the p-value of each independent variable is less than the significance level α = 0.05. Therefore, we reject the null hypothesis (H0). There is enough evidence that each independent variable in the model is significant.

As for the Adjusted R-squared, we get 44.1%, which means that 44.1% of the variation in Apple’s daily stock return can be explained by these independent variables.

Regarding the joint test for the model, we can refer to the F-statistic, which is 12.17 with a p-value of 1.99e-09. This tells us that at significance level α = 0.01, there is enough evidence that at least one of the parameters is significant.
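These summary statistics can be pulled directly from the fitted object; a small sketch, again assuming `model2` is the six-variable fit from the appendix:

```r
s <- summary(model2)

s$adj.r.squared          # adjusted R-squared (0.441 in the report)

# Overall F-test: f is a named vector (value, numdf, dendf)
f <- s$fstatistic
pf(f[1], f[2], f[3], lower.tail = FALSE)   # joint-test p-value
```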

Below is the correlation between these variables:

Autocorrelation of the Error Terms and Conditional Heteroskedasticity Tests

To test for autocorrelation of the error terms of the model 2, we plotted the error terms on a  graph first as shown below:

As we can see in the graph, the error terms appear to be random, but because the data points are dense, a pattern could be hidden. Therefore, we conducted a Durbin-Watson test for simple linear autocorrelation at lag 1.

Durbin-Watson Test:

We assume ε_t = a·ε_(t−1) + u_t

H0: a = 0 (there is no relation between the errors)

H1: a > 0

Output:

data: model2

DW = 2.4641, p-value = 0.9559

alternative hypothesis: true autocorrelation is greater than 0

The p-value is 0.9559, so we do not reject the null hypothesis (H0) at significance level α = 0.05, which means there is not enough evidence that the error terms in the model are positively correlated at lag 1. However, failing to reject the null hypothesis does not imply that the errors are independent.

QQ-Plot

A QQ-plot in R is used to test the normality assumption of the error terms. From the graph above, we can see that not all points lie along the straight dotted line. This means that the residuals do not strictly follow the normal distribution, which biases the test statistics.
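A QQ-plot of the residuals can be produced directly from the fitted model; as an additional check not in the original analysis, a Shapiro-Wilk test gives a formal normality test:

```r
res <- resid(model2)

# QQ-plot: points should fall near the dashed reference line under normality
qqnorm(res)
qqline(res, lty = 2)

# Formal normality test: a small p-value suggests non-normal residuals
shapiro.test(res)
```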

The graphs above show a fairly random spread of residuals, which suggests that the residuals are independent. Based on this visual inspection alone, the homoscedasticity assumption appears to hold.

Breusch-Pagan Test:

We set up hypothesis as follows:

H0: there is no conditional heteroskedasticity

H1: there is conditional heteroskedasticity

The result from R BP-test:

BP = 22.45, df = 6, p-value = 0.001003

The BP p-value is 0.001003, well below the significance level of 0.05 we set. This means that there is conditional heteroskedasticity. Heteroskedasticity does not bias the estimator of β, but it distorts the assumed distribution of the test statistics, so the statistical conclusions may be biased.

To correct for the heteroskedasticity, one option is to use robust standard errors in place of the original standard errors; another is to use the generalized least-squares method.
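The robust-standard-error route can be sketched with the `sandwich` and `lmtest` packages (this correction was not run in the original appendix; the HC1 estimator is one common choice):

```r
library(lmtest)
library(sandwich)

# Re-test the coefficients of model2 using White/HC1 robust standard errors
coeftest(model2, vcov = vcovHC(model2, type = "HC1"))
```

The coefficient estimates are unchanged; only the standard errors, t-statistics, and p-values are recomputed.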

Conclusion

Our final multivariable linear regression model is

Return = 0.025236 + 0.022308·EMAVG − 0.004286·MOM_MA − 0.011977·MAE_MIDDLE − 0.004028·ROC − 0.010455·SMAVG − 0.012467·CROSS

Based on the residual analysis and the hypothesis tests, we conclude that our multivariable linear regression model is a valid model. This model means that:

∙ A 1 unit increase in EMAVG would cause a 0.022308% increase in Apple Inc.’s stock return

∙ A 1 unit increase in MOM_MA would cause a 0.004286% decrease in Apple Inc.’s stock return

∙ A 1 unit increase in MAE_MIDDLE would cause a 0.011977% decrease in Apple Inc.’s stock return

∙ A 1 unit increase in ROC would cause a 0.004028% decrease in Apple Inc.’s stock return

∙ A 1 point increase in the SMAVG index would cause Apple Inc.’s stock return to decrease by 0.010455%

∙ If CROSS = 1, Apple Inc.’s stock return will be 0.012467% lower than if CROSS = 0.

Finally, the Adjusted R-squared equals 0.441, which means that these 6 technical indicators explain 44.1% of the variation in Apple’s stock returns.

APPENDIX:

library(tidyverse)

library(psych)

library(dplyr)

library(ggplot2)

library(GGally)

library(skimr)

library(foreign)

library(lmtest)

ggpairs(data1)

warnings()

data

#skim(data)

#ggpairs(data)

model <- lm(data=data, return~ATR+BB_MA+CMCI+ADXR+EMAVG+FEAR_GREED+HURST+MM_RETRACEMENT+MOM_MA+MOMENTUM+MACD+MAE_MIDDLE+MAOsc+PTPS+ROC+RSI+SMAVG+cross)

coef(model)

summary(model)

e <- summary(model)$resid

plot(e)

dwtest(model)

#model1

model1 <- lm(data=data, return~ATR+EMAVG+MOM_MA+MAE_MIDDLE+ROC+SMAVG+cross)

summary(model1)

e1 <- summary(model1)$resid

plot(e1)

dwtest(model1)

#model2 delete ATR

model2 <- lm(data=data, return~EMAVG+MOM_MA+MAE_MIDDLE+ROC+SMAVG+cross)

summary(model2)

ggpairs

confint(model2)

e <- summary(model2)$resid

plot(e)

dwtest(model2)

#model3 fear

model3 <- lm(data=data, return~EMAVG+cross+MOM_MA+MAE_MIDDLE+ROC+SMAVG)

summary(model3)

#working with logarithmic data

library(memisc)

library(dplyr)

library(psych)

library(lmtest)

library(sjPlot)

library(sgof)

library(ggplot2)

library(foreign)

library(car)

library(hexbin)

library(GGally)

library(vcd)

library(devtoolbox)

library(pander)

library(knitr)

library(devtools)

library(sessioninfo)

library(gdata)

library(readxl)

a <- read_excel('./clean.xlsx', sheet=1, na='NA')

glimpse(a)

#residual analysis

library(ggplot2)

linmodel <- lm(return~EMAVG+cross+MOM_MA+MAE_MIDDLE+ROC+SMAVG, data=a)

bptest(linmodel, studentize=FALSE)

summary(linmodel)

plot(linmodel, which=1, col=c("darkblue")) #Residual plot vs fitted values

plot(linmodel, which=2, col=c("red")) #Normal probability plot

plot(linmodel, which=3, col=c("darkgreen")) #Scale-location plot

#A more random spread along the plot suggests that the residuals are independent, so we assume homoskedasticity holds

#equal variance along the regression line

Yinan Chen

Jiahui Zhou

Neel Shah

Julian Moreno