An introduction to Triplestores and the SPARQL Query Language for RSEs

12:00 - 12:30 Tuesday, 3rd September, 2024

G.06 (Theatre with grouped seating)

Presentation type Walkthrough

Camilla Eldridge


Chris Wood
University of Edinburgh, United Kingdom

Abstract

Triplestores are a form of graph database designed to store highly structured web-accessible data. They are widely used to store data that can be described in a consistent manner. The underlying data structure used in triplestores is also used to describe controlled vocabularies, such as DCAT, a vocabulary used to describe data catalogues. Knowledge of the structure of triplestores, and of the related use of controlled vocabularies to describe datasets, expands the set of tools an RSE can use to publish, access, and understand data that follows recognised standards. Triplestores and controlled vocabularies are key technologies in helping data become more Findable, Accessible, Interoperable, and Reusable (FAIR).
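As a rough illustration of the data structure described above, the sketch below models a triplestore as a plain Python set of (subject, predicate, object) tuples and matches simple patterns against it. This is a toy in-memory stand-in, not how a real triplestore is implemented. The "ex:" names and the `match` helper are hypothetical and exist only for this example; `dcat:Catalog`, `dcat:Dataset`, `dcat:dataset`, `dcat:keyword` and `dct:title` are terms from the DCAT and Dublin Core vocabularies.

```python
from typing import Iterable, Iterator, Optional, Tuple

# A triple is (subject, predicate, object), written here as prefixed names.
Triple = Tuple[str, str, str]

# Hypothetical example data: the "ex:" resources are invented for illustration.
TRIPLES = {
    ("ex:catalogue", "rdf:type", "dcat:Catalog"),
    ("ex:catalogue", "dcat:dataset", "ex:rainfall"),
    ("ex:rainfall", "rdf:type", "dcat:Dataset"),
    ("ex:rainfall", "dct:title", "Rainfall observations"),
    ("ex:rainfall", "dcat:keyword", "weather"),
}


def match(
    triples: Iterable[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield every triple that fits the pattern; None acts as a wildcard."""
    for triple in triples:
        subj, pred, obj = triple
        if s is not None and subj != s:
            continue
        if p is not None and pred != p:
            continue
        if o is not None and obj != o:
            continue
        yield triple


# "Which resources are datasets?" expressed as a single triple pattern.
datasets = sorted(subj for subj, _, _ in match(TRIPLES, p="rdf:type", o="dcat:Dataset"))
print(datasets)
```

A SPARQL query is, loosely speaking, a set of such patterns with named variables in place of the wildcards.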

This walkthrough will briefly introduce the technology stack generally used for hosting triplestores, as well as the underlying data structure required to store data in a triplestore. I will then demonstrate the power of SPARQL, the query language used for triplestores. I will exemplify the use of SPARQL via a range of SPARQL endpoints that exist in the wild across several subject domains, including the triplestore representation of the structured elements of Wikipedia, as well as scientific and geospatial examples providing access to large, diverse datasets. I will show some of the more advanced features of SPARQL, and how these compare with other APIs which may be more familiar to the audience. Finally, I will discuss the idea of federated querying, where a single query can be used to retrieve data from different databases distributed over a network.
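To give a flavour of what querying a public endpoint and writing a federated query might look like, here is a minimal sketch. It assumes Wikidata's public endpoint as the example (the walkthrough itself may use different endpoints), and it only builds the request URL rather than sending it, so it runs without network access. The `build_request_url` and `extract_query` helpers and the `https://example.org/sparql` endpoint are hypothetical; the `SERVICE` keyword is the SPARQL 1.1 mechanism for federated queries.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Wikidata's public SPARQL endpoint (assumed here as the example endpoint).
WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"

# A single triple pattern: ?item is an instance of (P31) house cat (Q146).
# The wd: and wdt: prefixes are predefined by the Wikidata endpoint.
SIMPLE_QUERY = """
SELECT ?item WHERE {
  ?item wdt:P31 wd:Q146 .
}
LIMIT 5
""".strip()

# A federated query: the SERVICE block is evaluated by a second endpoint.
# https://example.org/sparql is a placeholder, not a real service.
FEDERATED_QUERY = """
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset ?title WHERE {
  SERVICE <https://example.org/sparql> {
    ?dataset a dcat:Dataset ;
             dct:title ?title .
  }
}
LIMIT 5
""".strip()


def build_request_url(endpoint: str, query: str) -> str:
    """Encode a query as a GET request URL using the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


def extract_query(url: str) -> str:
    """Decode the 'query' parameter back out of a request URL."""
    return parse_qs(urlsplit(url).query)["query"][0]


url = build_request_url(WIKIDATA_ENDPOINT, SIMPLE_QUERY)
print(url)
```

Sending the URL with any HTTP client, together with an `Accept: application/sparql-results+json` header, would be one way to ask the endpoint for JSON results.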

Prerequisites

This walkthrough will be designed for the Novice level; as such, no prerequisite technical or domain knowledge will be assumed. The walkthrough will include a short introduction to the underlying theoretical concepts.

Outcomes

Attendees will acquire knowledge that will be useful to any RSE who is either involved in publishing data, or who needs to interact with published datasets. Many data catalogues adhere to the DCAT standard, and so a general understanding of the principles of controlled vocabularies will help attendees query such catalogues more effectively, particularly where a catalogue has a publicly accessible SPARQL endpoint. For RSEs who need to publish data, the walkthrough will provide an appreciation of the standards around the FAIRness of data, as both DCAT and the SPARQL Query Language are World Wide Web Consortium (W3C) standards.

In-Person or Online Delivery

In-Person