What is the expected size of data sets submitted to the repository?