
Suppose I have a single time series $y_{t}$ (e.g. a stock price) and a few hundred independent variables $x_{it}$ with data available for the same time frame as $y_{t}$ (e.g. company revenue, total market sales, interest rates, the lagged value $y_{t-1}$, etc.). I want to identify a model that uses these data to forecast $y_{t}$.

  1. How do I know which of the independent variables to include?

  2. What is the problem with including all the variables?

My superficial ideas for identification are either to compare AIC/BIC/$R^2$ across every single combination of variables in a simple ARIMA model and build on that (which would be thousands of model fits), or to run a Granger causality test for every $(y_t, x_{it})$ pair. Surely there must be an easier way?

asked Nov 14, 2023 at 19:11
  • Re "every single combination": there are $2^{\text{several hundred}}$ such combinations! Commented Nov 14, 2023 at 19:17
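The comment's arithmetic can be checked directly: each candidate is either in or out of the model, so $p$ candidates give $2^p$ subsets. A minimal sketch; the helper `n_subsets` and the value $p = 300$ are illustrative assumptions, not taken from the question.

```python
import math


def n_subsets(p: int) -> int:
    """Number of distinct subsets of p candidate predictors, counting the empty model."""
    return 2 ** p


# "Thousands of models" only covers about ten candidates...
print(n_subsets(10))  # 1024
# ...whereas p = 300 (an illustrative "few hundred") gives roughly 10^90 subsets
print(math.floor(math.log10(n_subsets(300))))  # 90
```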

1 Answer


There has been work on selecting variables using the lasso and similar penalized-regression methods in a time series context; see here for some pointers to the literature.
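A minimal sketch of lasso-based selection, assuming simulated data in which only the lagged target and two of 50 lagged predictors matter. The helpers `soft_threshold` and `lasso_cd` are hypothetical names for a plain coordinate-descent lasso, not the method of any specific paper in the literature above. Two time-series choices matter: the penalty $\lambda$ is picked on a later hold-out block rather than by shuffled cross-validation, and standardization uses training-period statistics only, so no future information leaks into the fit.

```python
import numpy as np


def soft_threshold(z, gamma):
    """Soft-thresholding operator used by the lasso coordinate update."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)


def lasso_cd(X, y, lam, b0=None, n_iter=1000, tol=1e-8):
    """Coordinate-descent lasso minimising (1/(2n))*||y - X b||^2 + lam*||b||_1.
    Assumes y is centred, so no intercept is fitted."""
    n, p = X.shape
    b = np.zeros(p) if b0 is None else b0.astype(float).copy()
    col_sq = (X ** 2).sum(axis=0) / n
    r = y - X @ b  # current residual
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            old = b[j]
            # Correlation of column j with the residual that excludes column j
            rho = X[:, j] @ r / n + col_sq[j] * old
            new = soft_threshold(rho, lam) / col_sq[j]
            if new != old:
                r -= X[:, j] * (new - old)
                b[j] = new
                max_step = max(max_step, abs(new - old))
        if max_step < tol:
            break
    return b


# Simulated data (assumption): y depends on its own lag and on x0, x3 at lag 1
rng = np.random.default_rng(0)
T, p_x = 400, 50
X_raw = rng.standard_normal((T, p_x))
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.5 * y[t - 1] + X_raw[t - 1, 0] - 0.8 * X_raw[t - 1, 3] + 0.5 * rng.standard_normal()

# Forecasting design: predict y[t] from information available at t-1
names = ["y_lag1"] + [f"x{i}_lag1" for i in range(p_x)]
Z = np.column_stack([y[:-1], X_raw[:-1]])
target = y[1:]

# Chronological split; scaling uses the training block only
n_train = int(0.7 * len(target))
mu, sd = Z[:n_train].mean(axis=0), Z[:n_train].std(axis=0)
Zs = (Z - mu) / sd
yc = target - target[:n_train].mean()
Ztr, ytr, Zva, yva = Zs[:n_train], yc[:n_train], Zs[n_train:], yc[n_train:]

# Decreasing lambda path with warm starts; keep the best hold-out MSE
lam_max = np.max(np.abs(Ztr.T @ ytr)) / n_train  # smallest lambda giving all zeros
best = None
b = np.zeros(Z.shape[1])
for lam in lam_max * np.logspace(0, -3, 30):
    b = lasso_cd(Ztr, ytr, lam, b0=b)
    mse = np.mean((yva - Zva @ b) ** 2)
    if best is None or mse < best[0]:
        best = (mse, lam, b.copy())

val_mse, best_lam, best_b = best
selected = [names[j] for j in np.flatnonzero(best_b)]
print(f"lambda={best_lam:.4f}, hold-out MSE={val_mse:.3f}")
print("selected:", selected)
```

In practice a rolling-origin scheme with several folds is more stable than one hold-out block; with scikit-learn, a comparable setup is `LassoCV` with `cv=TimeSeriesSplit(...)`.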

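Alternatively, you could consider principal component analysis (PCA), which compresses the many predictors into a few components that can then be used as regressors.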
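A minimal sketch of the PCA route (principal-components regression), assuming simulated data in which three latent factors drive 200 noisy predictors and the target. The helpers `fit_pcr` and `predict_pcr` are hypothetical names; the loadings and scaling are estimated on the training periods only and then applied to later periods.

```python
import numpy as np


def fit_pcr(X_train, y_train, k):
    """PCA on standardised X, then OLS of y on the top-k component scores."""
    mu, sd = X_train.mean(axis=0), X_train.std(axis=0)
    Xs = (X_train - mu) / sd
    # Rows of Vt are the principal directions (loadings) of the standardised data
    _, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    W = Vt[:k].T                      # p x k loadings
    F = Xs @ W                        # n x k component scores
    A = np.column_stack([np.ones(len(F)), F])
    beta, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    explained = (s[:k] ** 2).sum() / (s ** 2).sum()
    return {"mu": mu, "sd": sd, "W": W, "beta": beta, "explained": explained}


def predict_pcr(model, X_new):
    """Project new predictor rows onto the training loadings and apply the regression."""
    F = ((X_new - model["mu"]) / model["sd"]) @ model["W"]
    return model["beta"][0] + F @ model["beta"][1:]


# Simulated data (assumption): y_next[t] is the value one step after X[t] is observed
rng = np.random.default_rng(1)
T, p, r = 300, 200, 3
factors = rng.standard_normal((T, r))
X = factors @ rng.standard_normal((r, p)) + rng.standard_normal((T, p))
y_next = factors @ np.array([1.0, -0.5, 0.25]) + 0.3 * rng.standard_normal(T)

n_train = 200  # chronological split: fit on the first 200 periods only
model = fit_pcr(X[:n_train], y_next[:n_train], k=3)
pred = predict_pcr(model, X[n_train:])
oos_mse = np.mean((y_next[n_train:] - pred) ** 2)
print(f"variance explained by 3 components: {model['explained']:.2f}")
print(f"out-of-sample MSE: {oos_mse:.3f} vs. variance of y: {np.var(y_next[n_train:]):.3f}")
```

The number of components $k$ would itself need choosing, e.g. on a later hold-out block. Note that PCA is unsupervised: the components summarize the variation in $X$ without looking at $y$, so the highest-variance components are not guaranteed to be the most predictive ones.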

answered Nov 14, 2023 at 19:21