Among the obstacles to a more robust culture of research data sharing and preservation, one that stands out to me is the current lack of participation by most researchers. Research data are unlikely to be re-used or repurposed unless they are published (with a lower-case “p”), that is, made available to a wide audience via posts on websites or deposits into online data repositories. Unfortunately, several studies of researcher behavior show that this is not common practice. But perhaps the dissemination of research data could become more common if data were Published (with an upper-case “p”), that is, made available to a wide audience with an assurance of their quality, which would mark datasets as important objects worthy of attention from other researchers, funding agencies, and tenure and promotion committees.
The recent emergence of “data papers” and “data journals” allows for this possibility. A data paper resembles a traditional journal article except that instead of forming an argument or drawing conclusions from data, it provides a detailed description of a dataset, including how the data were collected, processed, and/or analyzed. By my count, there are now at least 60 journals that either exclusively publish data papers (e.g., Geoscience Data Journal, GigaScience) or publish data papers as one article type (e.g., Ecology, BMC Research Notes). These journals include those that publish descriptions of large databases or data collections (e.g., BMC Bioinformatics), computer programs created for research purposes (e.g., Journal of Open Research Software), or computational models (e.g., Geoscientific Model Development). Nearly all data journals are aimed at the basic and medical sciences, although some cover scientific disciplines that cross into the social sciences and humanities (e.g., Journal of Open Public Health Data, Journal of Open Archaeology Data). Adding credence to this new publishing format is the fact that the prestigious Nature Publishing Group is launching a new data journal, Scientific Data, next year (although other Nature forays into new publication formats, such as Nature Precedings, were ultimately not successful).
At the CLIR Postdoctoral Fellow Winter 2013 meeting in Washington, DC, data journals were one of the discussion topics in a seminar on data publication. Although some fellows viewed data journals as a promising development in research data sharing and preservation, others expressed concerns about turning data over to publishers and about whether the structure of scientific data journals can be transformed to accommodate humanities data.
My opinion is that because data papers fit so nicely into the current scholarly publication-based system of research advancement and career development, they stand a significant chance of being adopted by researchers. Researchers are motivated to get papers published, not to create well-annotated datasets that can be understood by others. There is little incentive for researchers to spend time organizing and describing their datasets for use beyond their own analysis and publication purposes. Data papers can provide this incentive. The publication of data papers necessitates description of the context and content of datasets, which increases the likelihood of data re-use. All existing data journals enforce some degree of peer review of the methodology of data collection, the completeness of the data and accompanying metadata, and/or the potential re-use value of the data. After their publication, data papers can be listed on CVs and cited by other publications in the same way as traditional journal articles, thus serving to enhance both the productivity of researchers and the impact of their work.
Importantly, data papers need not replace the deposit of data into national/international, disciplinary, or institutional repositories. Rather, data papers can complement data repositories. That is, data papers can be viewed as descriptive documents that overlay data files housed in data repositories, providing the metadata or documentation that is often lacking within many data repositories or sharing tools that employ little to no behind-the-scenes curation. Although there are differences among data journals in where the underlying data files are housed, many data journals require that the data be deposited in a trustworthy open repository. This means that when publishing a data paper, researchers do not necessarily need to turn data over to the publisher. Furthermore, nearly all data journals are open access, and most allow authors to retain copyright to their data papers. If in some cases data journals request that authors transfer copyright to the publisher, this copyright would apply only to the data paper and not the underlying data. If researchers make informed decisions about where to submit their data papers and where to deposit their data, there should be no fear of publishers taking control of the data. Instead, the act of publishing data papers can encourage the safekeeping and open dissemination of research data.
Data journals are neither the only nor the best solution to the growing need for research data sharing and preservation. In particular, peer review is a slow process that cannot keep up with the accelerating pace of research data generation. For now, however, data papers could serve to incentivize the sharing of research data and thus catalyze a cultural change toward the open dissemination of research results.