School of Informatics

Text mining wish list for the culture sector

Research Fellow Dr Beatrice Alex has produced a wish list for galleries, libraries, archives and museums that are interested in sharing their data for text mining.

Issues such as access, licensing and the ability of computer systems in different organisations to exchange and make use of information can limit analysis of cross-sector trends.

Blogging for Europeana Research, which aims to liberate cultural heritage for use in research, Dr Alex says:

“Although galleries, libraries, archives and museums - the GLAM sector - may be interested in making their data available for text mining, they might be worried that the data they carefully curated over many years will be copied and misused for things that they did not anticipate.

“Assuring them that their data is safe and that we will not release it to the public is very important. Our aim is merely to identify patterns in the data, usually ones related to a particular hypothesis and domain in question.”

Funding implications

“Applications for research funding are always much stronger if we can provide evidence for being able to work with a given dataset. If GLAMs are interested in sharing their available datasets for text mining, they need to be proactive in publicising and explaining how to get hold of them.”

While making the data available is more important than determining its optimal format, it is helpful if GLAMs describe what a collection or dataset contains (metadata, content, size, format) and provide a mechanism for sharing the data easily.

The full list of tips for potential GLAM data sharers may be found in Dr Alex’s blog post.

Useful links

Blog post

Dr Alex’s web page