Big Data is one of the key assets of the future. However, the cost and effort required for introducing Big Data technology in a value chain are significant.
HOBBIT aims at abolishing the barriers in the adoption and deployment of Big Linked Data by European companies, by means of open benchmarking reports that allow them to assess the fitness of existing solutions for their purposes. These benchmarks are based on data that reflects reality and measure industry-relevant Key Performance Indicators (KPIs), with comparable results obtained on standardized hardware.
Mastering the creation of value from Big Data will enhance European competitiveness, will result in economic growth and jobs and will deliver societal benefit. A key step towards abolishing the barriers to the adoption and deployment of Big Data is to provide European companies with open benchmarking reports that allow them to assess the fitness of existing solutions for their purposes.
However, achieving this goal demands:
- The deployment of benchmarks on data that reflects reality within realistic settings.
- The provision of corresponding industry-relevant key performance indicators (KPIs).
- The computation of comparable results on standardized hardware.
- The institution of an independent and thus bias-free organization to conduct regular benchmarks and provide the European industry with up-to-date performance results.
In one of its key tasks, HOBBIT will continuously collect various datasets (i.e., not limited to specific domains) as the base for benchmarks. These data will initially be provided by the project's industrial partners, and later on by members of the HOBBIT community. A data management plan will be used as a guideline when handling the data submitted by members of the HOBBIT community to the benchmarks.
Each year, we will update the data management plan, to reflect the wishes and needs of the community. In this initial plan, we discuss the envisioned data management lifecycle (how can data be added to the platform, how can it be accessed, and how long will it be kept?), as well as the details of the data management plan as they have been agreed upon by the consortium at this time.
To make the data discoverable and accessible, HOBBIT will provide the generated benchmarks as dump files that can be loaded from the project repository, as well as a SPARQL endpoint that will serve all the benchmark datasets. The HOBBIT SPARQL endpoint will enable platform users to run their own queries against one or more benchmarks to obtain tailored benchmarks that fit each user's needs exactly.
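As a rough illustration of how a user might query such an endpoint, the sketch below builds a SPARQL query over one or more benchmark datasets and encodes it as a GET request in the style of the SPARQL 1.1 Protocol. The endpoint address and graph IRIs are placeholders, and the assumption that each benchmark dataset lives in its own named graph is ours, not something stated by the project. The request is only constructed, not sent.

```python
from urllib.parse import urlencode

# Placeholder endpoint address (assumption): the real HOBBIT endpoint URL
# is not given in this text.
ENDPOINT = "http://example.org/hobbit/sparql"


def build_query(graph_iris, limit=100):
    """Build a SELECT query over one or more named graphs.

    Assumption: each benchmark dataset is stored in its own named graph.
    Multiple FROM clauses merge the listed graphs into the default graph.
    """
    if not graph_iris:
        raise ValueError("at least one graph IRI is required")
    from_clauses = "\n".join(f"FROM <{iri}>" for iri in graph_iris)
    return (
        "SELECT ?s ?p ?o\n"
        f"{from_clauses}\n"
        "WHERE { ?s ?p ?o }\n"
        f"LIMIT {int(limit)}"
    )


def build_request_url(endpoint, query):
    """Encode the query as a GET request using the 'query' parameter."""
    return f"{endpoint}?{urlencode({'query': query})}"


# Hypothetical graph IRIs for two benchmark datasets.
query = build_query(
    ["http://example.org/hobbit/benchmark/a",
     "http://example.org/hobbit/benchmark/b"],
    limit=10,
)
print(build_request_url(ENDPOINT, query))
```

The resulting URL could be fetched with any HTTP client. A real tailored benchmark would replace the generic triple pattern with patterns that select only the data of interest.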
Figure 1. Data Management Lifecycle Overview
To keep the dataset submission process manageable, we host an instance of the CKAN open source data portal software, extended with custom metadata fields for the HOBBIT project. For the time being, this instance is hosted at http://hobbit.iminds.be/. When the benchmarking platform goes online, the CKAN instance will be moved there to provide more space for datasets. Users who want to add a dataset of their own first need to request to be added to an organization on the CKAN instance (see http://project-hobbit.eu/contacts/), after which they can add datasets to this organization.
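For users who prefer scripting over the web form, a dataset could in principle also be registered through CKAN's Action API. The following is a minimal sketch under several assumptions: the API key, organization name, dataset name, and metadata keys are hypothetical, and the custom HOBBIT metadata fields are represented here through CKAN's generic `extras` list, which may not match how the project's instance defines them. The request is built but not sent.

```python
import json
from urllib.request import Request


def build_package_create(base_url, api_key, name, owner_org, title, extras):
    """Build a POST request for CKAN's package_create action.

    'extras' is a dict of additional metadata; it is converted to CKAN's
    generic list of {"key": ..., "value": ...} entries.
    """
    payload = {
        "name": name,            # URL-safe dataset identifier
        "owner_org": owner_org,  # organization the user was added to
        "title": title,
        "extras": [{"key": k, "value": v} for k, v in extras.items()],
    }
    return Request(
        url=f"{base_url.rstrip('/')}/api/3/action/package_create",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": api_key,
                 "Content-Type": "application/json"},
        method="POST",
    )


# All values below are hypothetical and for illustration only.
request = build_package_create(
    base_url="http://hobbit.iminds.be/",
    api_key="YOUR-API-KEY",
    name="example-benchmark-dataset",
    owner_org="example-organization",
    title="Example benchmark dataset",
    extras={"domain": "transport"},  # hypothetical metadata field
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would submit it; whether the instance accepts API submissions, and which metadata fields it requires, would need to be confirmed with the HOBBIT team.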
Datasets will be kept available on the HOBBIT platform for at least the lifetime of the project unless they are removed by their owners. After the project, the HOBBIT platform will be maintained by the HOBBIT Association, and so will the datasets. Owners may add or remove a dataset at any time.
We invite all stakeholders to publish datasets that are suitable for benchmarking via the HOBBIT CKAN site.
More about the initial HOBBIT data management plan can be found here.
Other HOBBIT resources and links:
A blog post concerning our project’s Preliminary Survey Results:
A blog post concerning our mini survey on Benchmarking RDF Query Engines:
A blog post concerning the Call for Papers of the Semantic Web Journal (http://www.semantic-web-journal.net/) special issue on Benchmarking Linked Data:
A blog post concerning Versioning for Big Linked Data: approaches and benchmarks
Two tutorials and one workshop at ISWC 2016, Kobe, Japan:
a) Tutorial on Link Discovery – Algorithms, Approaches, and Benchmarks:
b) Tutorial on SPARQL Querying Benchmarks
c) Workshop on Benchmarking Linked Data (BLINK)