Big data is a widely used term referring to the massive accumulation of data that cannot be analyzed using traditional tools. It is an increasingly widespread phenomenon affecting many fields, such as health care, employment, economic productivity, crime, security, and resource management. Managing all these data is not easy: the process involves several steps, including capture, storage, search, sharing, analysis and visualization.
The problem of managing huge amounts of data also appears in many fields of scientific research. High-performance computing is one of the pillars for accelerating materials discovery and development in many fields of science and engineering, most prominently chemistry, physics and related areas. The volume of information generated daily by scientific calculations is increasing exponentially, and researchers need tools to manage it. Accessing the data generated is also a problem. Most of the time, even if the work has been published, the results of scientific research projects are difficult or even impossible to process because of the great diversity of data formats, which calls for the definition of standards.
In the search for solutions to this problem, researchers at ICIQ and URV have developed the ioChem-BD platform, a multi-faceted tool aimed at managing large volumes of computational chemistry results from a diverse group of widely used simulation packages. The platform automates the extraction of relevant data and its conversion into fully tagged information in a distributed database. It provides tools for researchers to validate, enrich, publish and share information, as well as cloud-based tools to access and view it.
“We have developed a useful tool with the intention of it becoming a standard. We already have a data collection, but our idea is that other research groups, once they have published their work, incorporate their data into the platform. Thus, results of published research projects will be available almost immediately,” says Professor Carles Bo, lead researcher on this project.
The platform was initially designed to meet the demands of computational chemistry, but once it is established, it could be applied to any field in which large amounts of data are generated.
Managing the Computational Chemistry Big Data Problem: The ioChem-BD Platform
M. Álvarez-Moreno, C. de Graaf, N. López, F. Maseras, J. M. Poblet, C. Bo
J. Chem. Inf. Model., DOI: 10.1021/ci500593j