Insights from Publishing Open Data in Industry-Academia Collaboration is a recently submitted paper by me (P E Strandberg), P Peterseil, J Karoliny, J Kallio, and J Peltola.

If and when it is accepted, I will provide a link to it.

Tentative Abstract: Data is more valuable than oil and a critical success factor in industry-academia collaboration. To learn more about publishing open datasets from such collaborations, this paper explores the motivations and lessons learned when publishing open datasets. Through a survey of participants in a European research project that published 13 datasets, and an analysis of metadata from almost 281 thousand datasets in Zenodo, we collected qualitative and quantitative results on motivations, achievements, research questions, licences and file types. Through inductive reasoning and statistical analysis, we found that planning the data collection is essential, and that only a few datasets (2.4%) had accompanying scripts for improved reuse. We also found that authors are not well aware of the importance of licences or of which licence to choose. Finally, we found that data with a synthetic origin, collected with simulations and potentially mixed with real measurements, can be very meaningful, as predicted by Gartner and illustrated by many datasets collected in our research project.

