Metadata for Language Data

Introduction


Last updated 10 months ago

This material aims to provide a user's guide to the language data ontology being developed for use in the Language Data Commons of Australia (LDaCA) project, which includes the Australian Text Analytics Platform (ATAP).

Metadata is often defined as ‘data about data’. High-quality metadata is important in making data FAIR:

  • Findable: Metadata is the starting point for searching data collections. For example, if we want to find data in a particular language, this will only be possible for data that has a language recorded in its metadata.

  • Accessible: Access conditions that apply to data should be part of the associated metadata.

  • Interoperable: Information about the format of data and whether it requires specific software to be usable should be part of the associated metadata.

  • Reusable: All of the aspects of metadata mentioned above contribute to making data reusable. The more we know about some data, the easier it is to know whether it will be useful to us or not.
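
The Findable point can be illustrated with a minimal sketch. It assumes metadata records are plain dictionaries and that the language is stored under the inLanguage property as a short code; the records, identifiers, and the helper name `find_by_language` are invented for illustration and are not part of the ontology.

```python
# Hypothetical metadata records: identifiers, names and language codes are
# invented for illustration only.
records = [
    {"@id": "#rec1", "name": "Interview recording", "inLanguage": "en"},
    {"@id": "#rec2", "name": "Word list", "inLanguage": "fr"},
    {"@id": "#rec3", "name": "Untitled recording"},  # no language recorded
]


def find_by_language(records, code):
    """Return the records whose metadata records the given language code."""
    return [r for r in records if r.get("inLanguage") == code]


english = find_by_language(records, "en")
```

The third record can never be returned by a language search, whatever language it actually contains, because nothing about its language was recorded.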

RO-Crate Profiles

RO-Crates in general have basic metadata requirements, but it is possible to specify a profile for crates for specific purposes. LDaCA is developing such a profile for our data; we are basing this largely on previous work in the area. An important aspect of the RO-Crate approach is that it uses the principles of Linked Open Data. This means that terms used in our metadata will (whenever possible) link to an openly available definition. In developing the profile, we are drawing on existing attempts to provide vocabularies for describing data, particularly language data.

schema.org

Our general approach is informed by the various kinds of entities recognised in the ontology documented at schema.org, which is at least partly based on the RDF framework. In particular, we have adopted high-level entities which are part of the schema.org vocabulary, for example CreativeWork and Person.

Open Language Archives Community (OLAC)

OLAC is an international partnership of institutions and individuals. One of their activities is developing consensus on best current practice for the digital archiving of language resources, and this includes making recommendations for metadata. The OLAC metadata scheme is based on Dublin Core (DC), a widely used general metadata schema. OLAC have suggested refinements and extensions of the DC base which make it more useful for describing language resources.

Entities in the ontology

Classes (rdfs:Class) are used to classify resources. An instance of an rdfs:Class is defined using the predicate rdf:type. For example, we have defined CollectionProtocol as a class, and Man and Tree & Space Games is an instance of this class. Properties (rdf:Property) are used to add attributes to classes. Similar to how we define classes, we can define instances of properties to add attributes to statements. In the example from earlier, we can add the property collectionProtocolType and give it the value ElicitationTask.

ElicitationTask is a DefinedTerm. A DefinedTerm is a 'word, name, acronym, phrase, etc. with a formal definition' and they are 'often used in the context of category or subject classification.' DefinedTerms allow us a) to have accurate definitions of the values we want to give to properties and b) to group such definitions in DefinedTermSets, which can function as controlled vocabularies. In our example, there is a DefinedTermSet CollectionProtocolTypeTerms which includes the DefinedTerm ElicitationTask.
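
As a rough sketch of how these pieces fit together, the example can be written as three JSON-LD style entities held in Python dictionaries. The "@id" values and the `resolve` helper are hypothetical, and the exact shape LDaCA uses in its crates is not confirmed by this page; inDefinedTermSet is the schema.org property that links a DefinedTerm to its set.

```python
import json

# Hypothetical identifiers: every "@id" below is an illustrative placeholder.
term_set = {
    "@id": "#CollectionProtocolTypeTerms",
    "@type": "DefinedTermSet",
    "name": "CollectionProtocolTypeTerms",
}

elicitation_task = {
    "@id": "#ElicitationTask",
    "@type": "DefinedTerm",
    "name": "ElicitationTask",
    # Groups this term into its controlled vocabulary.
    "inDefinedTermSet": {"@id": term_set["@id"]},
}

protocol = {
    "@id": "#man-and-tree-space-games",
    "@type": "CollectionProtocol",  # the class this entity is an instance of
    "name": "Man and Tree & Space Games",
    # The property whose value is a DefinedTerm.
    "collectionProtocolType": {"@id": elicitation_task["@id"]},
}

graph = [protocol, elicitation_task, term_set]


def resolve(graph, ref):
    """Follow an {"@id": ...} reference to the entity it points to."""
    return next(e for e in graph if e["@id"] == ref["@id"])


term = resolve(graph, protocol["collectionProtocolType"])
vocabulary = resolve(graph, term["inDefinedTermSet"])
serialised = json.dumps({"@graph": graph}, indent=2)
```

Following the references from the protocol leads first to the term and then to the set that defines it, which is the chain of class, property, term, and term set described above.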

Here is another example of the relationship between each of these entities:

Level               Example
Class               Annotation
↓                   ↓
Property            annotationType
↓                   ↓
Defined Term Set    AnnotationTypeTerms
↓                   ↓
Defined Terms       Gestural, Phonemic, Phonetic, Phonological, Prosodic, Semantic, Syntactic, Transcription, Translation
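
One use of a DefinedTermSet as a controlled vocabulary is checking that a property only takes values from the set. The sketch below assumes an Annotation is represented as a dictionary whose annotationType holds one term name or a list of them; the helper name `unknown_annotation_types` and the sample entity are hypothetical.

```python
# The terms listed for AnnotationTypeTerms in the example above.
ANNOTATION_TYPE_TERMS = {
    "Gestural", "Phonemic", "Phonetic", "Phonological", "Prosodic",
    "Semantic", "Syntactic", "Transcription", "Translation",
}


def unknown_annotation_types(entity):
    """Return the annotationType values that are not in the controlled vocabulary."""
    values = entity.get("annotationType", [])
    if isinstance(values, str):
        values = [values]  # accept a single term as well as a list
    return [v for v in values if v not in ANNOTATION_TYPE_TERMS]


# Hypothetical Annotation entity, invented for illustration.
annotation = {
    "@id": "#annotation-1",
    "@type": "Annotation",
    "annotationType": ["Transcription", "Translation"],
}
problems = unknown_annotation_types(annotation)
```

An empty result means every value comes from the vocabulary; anything returned is a value that would need a definition before it could be used.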

Contributors to the current work include: Peter Sefton, Simon Musgrave, Nick Thieberger, Marco La Rosa, River Tae Smith, Maria Weaver, Rosanna Smith
