Example: https://www.quantopian.com/posts/new-challenge-build-smart-beta-factors
I probably don't understand something fundamental about how challenge submissions are evaluated. Naively, I'd think that if all the data for a past period is already known, almost any algorithm can be backfitted to look good over that period, on just about any metric.
How can the jury differentiate between truly valuable factors and backfitted ones?
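
To make the concern concrete, here is a minimal Python sketch using purely random data. It is not how Quantopian evaluates anything; `backfit_demo`, the number of factors, and the 50/50 in-sample/out-of-sample split are all my own assumptions for illustration. It picks the best-looking of many noise "factors" on the first half of the data and then checks that same factor on the second half.

```python
import random
import statistics


def sharpe(returns):
    """Per-period Sharpe ratio: mean return divided by its standard deviation."""
    return statistics.mean(returns) / statistics.stdev(returns)


def backfit_demo(n_factors=200, n_days=500, seed=0):
    """Select the best of many pure-noise factors in-sample, then score it out-of-sample.

    Hypothetical illustration only: every factor has zero expected return
    by construction, so any in-sample "edge" comes from selection alone.
    """
    rng = random.Random(seed)
    split = n_days // 2

    # Each factor is a series of random daily returns with mean 0.
    factors = [
        [rng.gauss(0.0, 0.01) for _ in range(n_days)]
        for _ in range(n_factors)
    ]

    in_sample = [sharpe(f[:split]) for f in factors]
    out_sample = [sharpe(f[split:]) for f in factors]

    # "Backfitting": keep whichever factor looked best on the known period.
    best = max(range(n_factors), key=lambda i: in_sample[i])

    return in_sample[best], out_sample[best], statistics.mean(out_sample)


if __name__ == "__main__":
    best_in, best_out, avg_out = backfit_demo()
    print(f"selected factor, in-sample Sharpe:     {best_in:.3f}")
    print(f"selected factor, out-of-sample Sharpe: {best_out:.3f}")
    print(f"all factors, mean out-of-sample Sharpe: {avg_out:.3f}")
```

If the selected factor's Sharpe ratio falls back toward zero on the held-out half, the in-sample number said more about the selection than about the factor, which is the effect I would expect a jury to have to guard against.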