TL;DR: The answer will be different for each factor or predictive model that uses the data. Factors modeling intraday behavior generally need intraday data. Factors modeling behavior over the next 1-10 days generally need daily samples at a minimum. A rule of thumb: if you want to predict how the world works over the next time step with resolution R, you need at least 30-60 samples at frequency 1/R. For a given factor, test which data works better by computing predictions on both data sets, comparing the residuals between each set of predictions and the actual outcomes, picking the data set whose predictions are closer, and then running an out-of-sample test to validate the choice.
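To make that last test concrete, here is a minimal sketch in plain Python. It assumes you already have predictions from the same factor computed on two data sets, aligned with the realized outcomes. The names `mean_abs_residual` and `compare_sources`, the source labels, and the numbers are all made up for illustration, and mean absolute residual is just one reasonable way to measure "closer".

```python
def mean_abs_residual(predictions, actuals):
    """Average absolute gap between predictions and realized outcomes."""
    if len(predictions) != len(actuals) or len(actuals) == 0:
        raise ValueError("predictions and actuals must be non-empty and aligned")
    return sum(abs(p - a) for p, a in zip(predictions, actuals)) / len(actuals)


def compare_sources(preds_by_source, actuals, split):
    """Pick the data source with the smaller in-sample residual,
    then check whether it is also the best one out of sample.

    preds_by_source: dict mapping a source name to its list of predictions.
    split: index separating in-sample (before) from out-of-sample (after).
    """
    if not 0 < split < len(actuals):
        raise ValueError("split must leave data on both sides")
    in_sample = {
        name: mean_abs_residual(preds[:split], actuals[:split])
        for name, preds in preds_by_source.items()
    }
    out_sample = {
        name: mean_abs_residual(preds[split:], actuals[split:])
        for name, preds in preds_by_source.items()
    }
    winner = min(in_sample, key=in_sample.get)
    # The choice is "validated" only if the in-sample winner also has
    # the smallest residual on the data it was not selected on.
    validated = out_sample[winner] == min(out_sample.values())
    return winner, in_sample, out_sample, validated


# Made-up numbers, purely to show the call shape.
actuals = [0.010, -0.004, 0.007, 0.002, -0.006, 0.009]
preds = {
    "daily_bars": [0.008, -0.002, 0.005, 0.003, -0.004, 0.007],
    "minute_bars": [0.002, 0.004, 0.001, -0.003, 0.002, 0.001],
}
winner, in_res, out_res, validated = compare_sources(preds, actuals, split=4)
print(winner, validated)
```

If the in-sample winner loses out of sample, I would treat the comparison as inconclusive rather than switch sides.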
More info:
Apologies, I missed that specific question. It depends on what you're trying to model. If you're trying to determine what the average state on a day might be, then the average state on previous days may certainly be informative. If you're trying to determine the close price, then previous close prices may be informative. In general, if you're trying to make daily trades, you want enough history to build statistical confidence in your predictions; the rule of thumb is at least 30-60 samples. So if you were trying to estimate a trend in daily closes, you'd want to look at the last 30 daily closes to see if a linear regression showed any consistent slope. If you were trying to estimate a trend over the next couple of minutes, you'd need the last 30-60 minutes of data, and so on.
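As a rough illustration of the "last 30 daily closes" check, here is a small sketch that fits an ordinary least-squares line through the most recent closes and reports the slope along with a t-statistic. The function name `trend_slope`, the default window, and the synthetic prices are my own assumptions for the example; the t-statistic is only a crude gauge of whether the slope is consistent.

```python
import math


def trend_slope(closes, window=30):
    """OLS slope of the last `window` closes against time, plus its t-statistic."""
    y = list(closes)[-window:]
    n = len(y)
    if n < 3:
        raise ValueError("need at least 3 samples to estimate a slope and its error")
    x_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    sxy = sum((i - x_mean) * (yi - y_mean) for i, yi in enumerate(y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # Residual sum of squares around the fitted line.
    sse = sum((yi - (intercept + slope * i)) ** 2 for i, yi in enumerate(y))
    stderr = math.sqrt(sse / (n - 2) / sxx)
    if stderr == 0:
        # Perfect fit: a flat line has no trend, anything else is "infinitely" clean.
        t_stat = 0.0 if slope == 0 else math.copysign(math.inf, slope)
    else:
        t_stat = slope / stderr
    return slope, t_stat


# Synthetic closes (not real data): a steady drift plus an alternating wiggle.
closes = [100 + 0.3 * i + (0.5 if i % 2 else -0.5) for i in range(45)]
slope, t_stat = trend_slope(closes, window=30)
print(f"slope per day: {slope:.4f}, t-stat: {t_stat:.2f}")
```

The same function works for minute bars if you pass the last 30-60 minute closes instead of daily ones.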
More data is almost always better, so having the VWAP in addition to the OHLCV would most likely improve any model trying to forecast price trends. My systematic approach is usually the following:
1. Agnostic of what data is available, do I have any ideas for models that could be used to forecast future returns?
2. Given this idea, what data would I need to collect to validate that the model works?
3. If this data is available, collect it. If not, is there a very close substitute that we've already shown behaves similarly? If not, then the model is untestable and cannot currently be used.
Again, different models will be more or less sensitive to the choice of, say, OHLCV vs. VWAP. I believe, but have not tested, that the two will generally behave fairly similarly, especially when zoomed out to 30-60 days, but OHLCV alone may not be good enough for every model. If your holding period were, say, 1 hour, then you would almost definitely need minute-level data to make effective forecasts. If your holding period is a few days, then you can look at the last 30-60 days to get a sense of what's going on in the world. Looking at the behavior of the last several hours may be helpful for some factors and not for others.
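Since I haven't tested the "behave fairly similarly" claim, here is one hedged way you could check it yourself: compute a VWAP per day from intraday bars, then see how closely daily VWAP returns track daily close returns. The helper names (`vwap`, `simple_returns`, `pearson`) and every number below are invented for illustration, and a high correlation would only suggest the two are interchangeable for a given factor, not prove it.

```python
import math


def vwap(prices, volumes):
    """Volume-weighted average price over one period's bars."""
    total_volume = sum(volumes)
    if len(prices) != len(volumes) or total_volume <= 0:
        raise ValueError("need aligned prices/volumes with positive total volume")
    return sum(p * v for p, v in zip(prices, volumes)) / total_volume


def simple_returns(series):
    """Period-over-period simple returns."""
    return [b / a - 1 for a, b in zip(series, series[1:])]


def pearson(xs, ys):
    """Pearson correlation between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two aligned sequences with at least 2 points")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(vx * vy)


# One day's VWAP from a few hypothetical intraday bars.
day_vwap = vwap(prices=[100.0, 100.4, 100.1], volumes=[500, 1200, 800])

# Hypothetical daily closes and daily VWAPs; in practice use 30-60 days.
closes = [100.0, 101.2, 100.7, 102.3, 103.1, 102.6]
vwaps = [99.8, 100.9, 100.9, 101.8, 102.9, 102.8]
corr = pearson(simple_returns(closes), simple_returns(vwaps))
print(f"day VWAP: {day_vwap:.3f}, return correlation: {corr:.3f}")
```

For a short holding period you would run the same comparison on minute-level series rather than daily ones.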
Is this helpful? If not, I apologize; it's Friday evening here and I'm still a bit jet-lagged, so my brain is fairly burnt out.