EWIG Data Model

Living Standard

This version:
https://ewig.zib.de/datamodel.html
Editors:
(Zuse Institute Berlin (ZIB))

Abstract

Definition of the data model and vocabularies used within the EWIG digital preservation system.

1. Data Model for EWIG

1.1. Introduction

This document describes the data model for the EWIG long-term preservation system. The data model follows the Information Model described in the Reference Model for an Open Archival Information System (OAIS; ISO 14721:2012). To distinguish EWIG terms from terms and entities used and defined in the OAIS, the latter will be in italics.

The purpose of the project is to preserve usability and utility of information over the long term.

The following is the proposed data model for the relationship between Information Packages, Information Objects and Digital Files, along with a data model for storing the Concept Schemes (term lists and subject headings) and Ontologies which the project will generate. Literal strings of cardinality 1 can be repeated only for language variants (@lang).

At present, this model does not make any use of the LDP basic, direct or indirect containers.

1.1.1. Assumptions and Motivations

The model does not include RDF triples automatically added by Fedora.

1.1.2. Namespaces

| Namespace Prefix | Namespace URI |
|---|---|
| ewig | http://ewig.zib.de/ontologies/ewig# |
| ewigvocab | http://ewig.zib.de/ontologies/vocab/ |
| rdf | http://www.w3.org/1999/02/22-rdf-syntax-ns# |
| rdfs | http://www.w3.org/2000/01/rdf-schema# |
| owl | http://www.w3.org/2002/07/owl# |
| skos | http://www.w3.org/2004/02/skos/core# |
| dcterms | http://purl.org/dc/terms/ |
| schema | http://schema.org/ |
| pcdm | http://pcdm.org/models# |
| pcdmrights | http://pcdm.org/rights# |
| ore | http://www.openarchives.org/ore/terms/ |
| edm | http://www.europeana.eu/schemas/edm/ |
| premis | http://www.loc.gov/premis/rdf/v1 |
| premis3 | http://www.loc.gov/premis/rdf/v3 |
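
As an illustration of how the prefixes above are meant to be read, the following minimal sketch expands a prefixed name into a full URI. The function name `expand` and the reduced prefix table are assumptions made for this example; they are not part of the EWIG system.

```python
# Subset of the namespace table above; the remaining prefixes work the same way.
NAMESPACES = {
    "ewig": "http://ewig.zib.de/ontologies/ewig#",
    "ewigvocab": "http://ewig.zib.de/ontologies/vocab/",
    "dcterms": "http://purl.org/dc/terms/",
    "pcdm": "http://pcdm.org/models#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}


def expand(prefixed_name: str) -> str:
    """Expand a prefixed name such as 'pcdm:hasMember' into a full URI."""
    prefix, sep, local = prefixed_name.partition(":")
    if not sep or prefix not in NAMESPACES:
        raise ValueError(f"unknown or missing prefix in {prefixed_name!r}")
    return NAMESPACES[prefix] + local
```

For example, `expand("ewig:InformationPackage")` yields `http://ewig.zib.de/ontologies/ewig#InformationPackage`.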

1.1.3. Identifiers

Identifiers will be opaque.

2. Classes

2.1. Information Packages (ewig:InformationPackage)

The Information Package in the data model follows definition 4.2.2.1 in the OAIS: "The conceptual structure for supporting Long Term Preservation of information is the Information Package. An Information Package is a container that contains two types of Information Objects, the Content Information and the Preservation Description Information (PDI); [...]". The OAIS Package Description is modelled through the aggregation of data from the SubmissionManifest and one or more Information Objects (which ones?).

Information Packages aggregate Information Objects and serve as containers during the different stages within the preservation workflow.

Information Packages of type TransferAggregation can aggregate *Submission or Archival* Information Packages. Submission or Archival Information Packages aggregate Information Objects comprising Content and Preservation Description Information.
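
The aggregation rule in the paragraph above can be sketched as a small lookup. This is only an illustration: the names `ALLOWED_MEMBER` and `may_aggregate` are hypothetical, and treating AIC like TA is an assumption taken from the scope notes of pcdm:hasMember in the property usage table.

```python
# Member kind each package type may aggregate via pcdm:hasMember.
# TA, SIP and AIP follow the paragraph above; AIC is assumed to behave
# like TA, based on the pcdm:hasMember scope notes.
ALLOWED_MEMBER = {
    "TA": "InformationPackage",
    "AIC": "InformationPackage",
    "SIP": "InformationObject",
    "AIP": "InformationObject",
}


def may_aggregate(package_type: str, member_class: str) -> bool:
    """Return True if a package of this type may hold a member of this class."""
    if package_type not in ALLOWED_MEMBER:
        raise ValueError(f"unknown package type: {package_type!r}")
    return ALLOWED_MEMBER[package_type] == member_class
```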

Information Packages MUST include a RightsStatement as a fallback statement for the Access Functional Entity and the SubmissionManifest as a record for the Administration Functional Entity.

Information Packages also include status messages for API access.

2.1.1. Property Usage

| Property | Expected Object Type (Range) | Cardinality | Scope Note |
|---|---|---|---|
| rdf:type | pcdm:Collection, ewig:InformationPackage | 2..n | Subclass of ore:Aggregation. |
| ewig:submissionManifest | ewig:SubmissionManifest | 1 | Submission metadata is mandatory for all IPs. |
| dcterms:identifier | Literal (String) | 0..1 | Auto-generated by the API. For Transfer Aggregations: <contractId>,<submissionName>. For Information Packages: <contractId>,<submissionName>,<ieName>. |
| ewig:type | ewigvocab:packagetype# | 1 | TransferAggregation (TA), Submission (SIP), Archival (AIP), ArchivalCollection (AIC). |
| pcdm:hasMember | ewig:InformationPackage | 0..n | TAs/AICs contain other InformationPackages. SIPs/AIPs do not. |
| pcdm:hasMember | ewig:InformationObject | 0..n | SIPs/AIPs contain InformationObjects. TAs/AICs do not. |
| pcdm:memberOf | ewig:InformationPackage | 0..1 | If SIP/AIP/AIC. |
| ewig:status | ewigvocab:status# | 1 | Status for API. |
| ewig:statusMessage | Literal (String) | 0..1 | Optional message contextualizing API status. |
| ewig:stage | ewigvocab:stage# | 1 | Workflow stage. |
| skos:prefLabel | Literal (String) | 0..1 | Optional label/title for package for Dashboard. Identifier will be used if absent. |
| dcterms:description | Literal (String) | 0..1 | Optional description. |
| dcterms:rights | ewig:RightsStatement | 1 | Mandatory rights information in every package. |
| dcterms:dateAccepted | W3CDTF | 0..1 | Package processing has finished successfully. |
| dcterms:dateSubmitted | W3CDTF | 0..1 | Package received. |
| ewig:submissionSize | Literal | 0..1 | Size of InformationPackage at time of ingest. Value is a human-readable version of the byte count in SI-prefix form (e.g. 14MB or 2 GB). |
| premis3:size | Literal (xs:long) | 0..1 | Size of InformationPackage at the time of ingest in bytes. This is mandatory after ingest. |
| ewig:ieName | Literal (String) | 0 (TA); 1 (SIP/AIP) | Name of Intellectual Entity. |
| ewig:callbackStatus | Literal (String) | 0..1 | Notification status (date / response). |
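
The identifier pattern and the two size properties in the table above can be illustrated with a short sketch. It is not the API's actual code: the function names, the example values and the exact spacing and rounding of the human-readable size are assumptions.

```python
from typing import Optional


def package_identifier(contract_id: str, submission_name: str,
                       ie_name: Optional[str] = None) -> str:
    """Join the parts with commas; ie_name is omitted for Transfer Aggregations."""
    parts = [contract_id, submission_name]
    if ie_name is not None:
        parts.append(ie_name)
    return ",".join(parts)


def human_size(num_bytes: int) -> str:
    """Render a byte count (premis3:size) with decimal SI prefixes."""
    units = ["B", "kB", "MB", "GB", "TB", "PB"]
    value = float(num_bytes)
    for unit in units:
        # Stop at the first unit that keeps the number below 1000,
        # or at the largest unit available.
        if value < 1000 or unit == units[-1]:
            return f"{round(value, 1):g} {unit}"
        value /= 1000
    raise AssertionError("unreachable")
```

With the hypothetical values `"C-001"`, `"sub1"` and `"ie1"`, the first function returns `C-001,sub1` for a Transfer Aggregation and `C-001,sub1,ie1` for an Information Package; `human_size(14_000_000)` returns `14 MB`.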

2.2. Information Objects (ewig:InformationObject)

Information Objects follow the definition in 4.2.1.1 of the OAIS Reference Model. The Physical Object specialization of the Data Object is modelled as edm:aggregatedCHO.

Information Objects are categorized (ewig:use) according to a vocabulary (ewigvocab:use#) including IE (PREMIS Intellectual Entity), PDI, SubmissionDocumentation, Transcripts (OCR/TEI), Service/Intermediate files and so on.

Information Objects MUST be memberOf a single Information Package within EWIG and MAY contain one or more files (Digital Data Objects).

ObjectPreservationTypes define sets of significant properties through another EWIG ontology for IEs which might come into being in the future.

2.2.1. Property Usage

| Property | Expected Object Type (Range) | Cardinality | Scope Note |
|---|---|---|---|
| rdf:type | pcdm:Object, ewig:InformationObject | 2..n | Subclass of ore:Aggregation. |
| pcdm:memberOf | ewig:InformationPackage | 1 | Every InformationObject MUST belong to one package. |
| pcdm:hasFile | pcdm:File | 0..n | |
| ewig:objectPreservationType | Literal (String) | 0..n | Signifies significant properties to be preserved. Not used yet. |
| skos:prefLabel | Literal (String) | 0..1 | Optional label/title for object for Dashboard. The ewigvocab:use# value will be used if absent. |
| dcterms:description | Literal (String) | 0..1 | Optional description. |
| edm:aggregatedCHO | edm:ProvidedCHO | 0..1 | Description of IE. See Europeana Mapping Guidelines 2.3: http://pro.europeana.eu/files/Europeana_Professional/Share_your_data/Technical_requirements/EDM_Documentation/EDM_Mapping_Guidelines_v2.3_112016.pdf |
| ewig:structure | ore:Proxy | 0..n | Optional structuring information of object. |
| dcterms:rights | ewig:RightsStatement | 0..1 | Rights statement overrides package rights. |
| ewig:use | ewigvocab:use# | 1 | Categorizes files contained in object according to usage in LTDPS. |
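
The cardinalities in the table above lend themselves to a mechanical check. The following sketch, under the assumption that a resource is represented as a plain dictionary from property name to a list of values, reports every property whose number of values falls outside its range. The representation and the function name are hypothetical, and the sketch ignores the rule that literals of cardinality 1 may repeat for language variants.

```python
# (min, max) occurrences per property; None stands for the unbounded "n".
INFORMATION_OBJECT_CARDINALITY = {
    "rdf:type": (2, None),
    "pcdm:memberOf": (1, 1),
    "pcdm:hasFile": (0, None),
    "ewig:objectPreservationType": (0, None),
    "skos:prefLabel": (0, 1),
    "dcterms:description": (0, 1),
    "edm:aggregatedCHO": (0, 1),
    "ewig:structure": (0, None),
    "dcterms:rights": (0, 1),
    "ewig:use": (1, 1),
}


def cardinality_violations(resource: dict) -> list:
    """Return one message per property whose value count is out of range."""
    problems = []
    for prop, (low, high) in INFORMATION_OBJECT_CARDINALITY.items():
        count = len(resource.get(prop, []))
        if count < low or (high is not None and count > high):
            bound = f"{low}..{'n' if high is None else high}"
            problems.append(f"{prop}: found {count}, expected {bound}")
    return problems
```

An object with two rdf:type values, one pcdm:memberOf and one ewig:use passes with an empty list; dropping ewig:use produces a single message for that property.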

2.3. SubmissionManifest (ewig:SubmissionManifest)

The SubmissionManifest contains the administrative (including rights) information for the Administration Functional Entity. Semantics are according to the submission agreement...

Every SubmissionManifest MUST include a reference to a Contract.

2.3.1. Property Usage

| Property | Expected Object Type (Range) | Cardinality | Scope Note |
|---|---|---|---|
| rdf:type | ewig:SubmissionManifest | 1 | |
| dcterms:identifier | Literal (String) | 1 | SubmissionName |
| dcterms:publisher | Literal (String) | 1 | SubmittingOrganization as given at time of delivery |
| ewig:publisherUri | URI to ewig:Agent (Organization) | 1 | EWIG-URI of SubmittingOrganization |
| dcterms:accrualPolicy | ewig:Contract | 1 | Contract |
| dcterms:creator | Literal (String) | 1 | Contact (Responsible Person) as given at time of delivery |
| ewig:creatorUri | URI to ewig:Agent (Person) | 1 | EWIG-URI of Contact |
| dcterms:contributor | Literal (String) | 1 | TransferCurator as given at time of delivery |
| ewig:contributorUri | URI to ewig:Agent (Person) | 1 | EWIG-URI of TransferCurator Resource |
| ewig:metadataFile | Literal (String) | 1 | Metadata File |
| ewig:metadataFileFormat | URI | 1 | Metadata File Format |
| ewig:dataSourceSystem | Literal (String) | 0..1 | System where data originates from. |
| skos:prefLabel | Literal (String) | 0..1 | Optional label/title. |
| dcterms:description | Literal (String) | 0..1 | Optional SubmissionDescription |
| ewig:callbackParams | Literal (String) | 0..1 | Parameters for Transfer-/Ingest-Status Responses via callback URLs |

URL-Template: http://host/api.endpoint/?param1=<params>

Placeholders in < > will be replaced with the following parameters:

<code>: short success or error code

<message>: longer explanation of success or error condition

<ewig_id>: EWIG identifier of the information package
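
How such a template might be filled in can be sketched as follows. This is a hedged illustration only: the function name, the example template with three query parameters and the choice to percent-encode the values are assumptions, since the text above does not specify how the API performs the substitution.

```python
from urllib.parse import quote


def fill_callback_url(template: str, code: str, message: str, ewig_id: str) -> str:
    """Replace the <code>, <message> and <ewig_id> placeholders in a URL template."""
    values = {"code": code, "message": message, "ewig_id": ewig_id}
    url = template
    for name, value in values.items():
        # Percent-encode each value so spaces and reserved characters stay valid.
        url = url.replace(f"<{name}>", quote(value, safe=""))
    return url
```

With the hypothetical template `http://host/api.endpoint/?status=<code>&msg=<message>&id=<ewig_id>` and the values `200`, `ingest ok` and `abc123`, the result is `http://host/api.endpoint/?status=200&msg=ingest%20ok&id=abc123`.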

2.4. Agent (ewig:Agent)

The Agent contains information about Persons or Organizations relevant to administrative workflows. Agents do not act as premis:Agent within the PDI. An Agent cannot be a Person and an Organization simultaneously.

2.4.1. Property Usage

| Property | Expected Object Type (Range) | Cardinality | Scope Note |
|---|---|---|---|
| rdf:type | dcterms:Agent, premis:Agent, schema:Person or schema:Organization | 1..n | Premis or dcterms agents allow software agents. How to record tbd. |
| dcterms:identifier | ISIL... ORCID? | 1 (Organization), 0..1 (Person) | Unique identifier for agent within EWIG. Use ISIL or ORCID if available. |
| schema:name | Literal (String) | 1 (Organization), 0 (Person) | Name of organization. |
| schema:alternateName | Literal (String) | 0..1 | Optional alternative name or abbreviation. Will be displayed in parentheses in the dashboard. |
| schema:email | Literal (String) | 1 | Personal or functional email address. |
| schema:familyName | Literal (String) | 0 (Organization), 1 (Person) | |
| schema:givenName | Literal (String) | 0 (Organization), 1 (Person) | |
| schema:honorificPrefix | Literal (String) | 0 (Organization), 1 (Person) | Prof./Dr./... |
| schema:honorificSuffix | Literal (String) | 0 (Organization), 1 (Person) | Phd./MA/MDB/... |
| schema:affiliation | ewig:Agent (Organization), Literal (String) | 0..1 (Organization), 0..1 (Person) | Parent organization or organization the person is loosely affiliated with at the time of recording. For work relation use worksFor. |
| schema:jobTitle | Literal (String) | 0 (Organization), 1 (Person) | Administrative or functional role within organization regarding data submissions. |
| schema:worksFor | ewig:Agent (Organization) | 0 (Organization), 1 (Person) | Employer (Organization). |
| skos:prefLabel | Literal (String) | 0..1 | Optional label/title. |
| dcterms:description | Literal (String) | 0..1 | Optional description/comments. |
| ewig:login | Literal (String) | 0..1 (Organization), 0 (Person) | Transfer-Server login of Organization. |

2.5. RightsStatement (ewig:RightsStatement)

RightsStatements MUST include a rights declaration, a rights holder if applicable, and licensing information (including PublicDomainMark). The semantics of accessRights are according to the submission agreement. If certain reuse restrictions cannot be expressed through rights and license alone, a human-readable legal note can be used in description. Embargoes are modelled through pcdmrights:rightsOverride.

If not explicitly stated, RightsStatements are inherited through the hierarchy.

RightsStatements take precedence over each other through the hierarchy bottom up (pcdm:File, InformationObject, InformationPackage) except for pcdmrights:rightsOverride.
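
The bottom-up precedence can be read as "use the most specific statement that exists". The sketch below illustrates that reading only. The function name and the use of plain strings as stand-ins for RightsStatement resources are assumptions, and the special handling of the rights override is deliberately left out.

```python
from typing import Optional


def effective_rights(file_rights: Optional[str],
                     object_rights: Optional[str],
                     package_rights: Optional[str]) -> str:
    """Return the most specific rights statement: file, then object, then package."""
    for candidate in (file_rights, object_rights, package_rights):
        if candidate is not None:
            return candidate
    # dcterms:rights has cardinality 1 on packages, so this should not happen.
    raise ValueError("an Information Package must carry a RightsStatement")
```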

2.5.1. Property Usage

| Property | Expected Object Type (Range) | Cardinality | Scope Note |
|---|---|---|---|
| rdf:type | dcterms:RightsStatement | 1 | |
| dcterms:rights | URI (rightsstatements.org) | 1 | In Copyright, PublicDomain etc. |
| dcterms:license | URI (creativecommons.org ...) | 1 | Permission to use, restrictions/license information, or rights reserved. |
| dcterms:accessRights | ewigvocab:rightsScope# | 1 | Potentially available to the public or restricted to access by the submitting institution. |
| dcterms:rightsHolder | Literal (String) or ewig:Agent | 0(..n?) | Owner (legal body) of intellectual property/data that is entitled to select license. |
| pcdmrights:rightsOverride | ewigvocab:rightsScope# | 0..1 | Embargo scope, e.g. access rights "institution" until expiration. |
| pcdmrights:rightsOverrideExpiration | W3CDTF | 0..1 if rightsOverride | Embargo end. |
| skos:prefLabel | Literal (String) | 0..1 | Human readable (not necessarily understandable) expression for display. |
| dcterms:description | Literal (String) | 0..1 if rights NoC-CR | Can be used to express restrictions in case of "No Copyright - Contractual Restrictions" or information/instructions how to get permission in case of rights reserved. |
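
One possible reading of the embargo rows is sketched below: while an override exists and has not yet expired, its scope replaces the regular access rights. The function name is hypothetical, and two points are assumptions rather than statements of the model: an override without an expiration date is treated as open-ended, and the embargo is treated as lifted on the expiration date itself.

```python
from datetime import date
from typing import Optional


def access_scope(access_rights: str,
                 override: Optional[str],
                 override_expiration: Optional[date],
                 today: date) -> str:
    """Return the scope in force today, honouring an unexpired override."""
    if override is not None:
        # Assumption: no expiration date means the override never lapses.
        if override_expiration is None or today < override_expiration:
            return override
    return access_rights
```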

2.6. Contract (ewig:Contract)

A Contract MUST contain an identifier and information about contract length and size of storage (in bytes).

2.6.1. Property Usage

| Property | Expected Object Type (Range) | Cardinality | Scope Note |
|---|---|---|---|
| rdf:type | dcterms:Policy | 1 | |
| dcterms:identifier | Literal (String) | 1 | Contract Number |
| dcterms:contributor | ewig:Agent | 1..n | Contracting party other than ZIB. |
| schema:startDate | W3CDTF | 1 | |
| schema:endDate | W3CDTF | 0..1 | Optional if open-ended. |
| premis3:size | Literal (xs:long) | 1 | Net storage allowance in bytes. |
| skos:prefLabel | Literal (String) | 0..1 | |
| dcterms:description | Literal (String) | 0..1 | |
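
To show how the date and size properties of a Contract could be used together, here is a small sketch. Both helpers are hypothetical. That the end date is inclusive, and that the allowance is compared against the summed premis3:size of the packages, are assumptions not stated in the model.

```python
from datetime import date
from typing import Optional


def contract_is_active(start: date, end: Optional[date], today: date) -> bool:
    """An open-ended contract (no schema:endDate) stays active once started."""
    return start <= today and (end is None or today <= end)


def remaining_allowance(allowance_bytes: int, package_sizes: list) -> int:
    """Bytes left under the contract's premis3:size after the given packages."""
    return allowance_bytes - sum(package_sizes)
```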

2.7. File (pcdm:File)

2.7.1. Property Usage

| Property | Expected Object Type (Range) | Cardinality | Scope Note |
|---|---|---|---|
| rdf:type | pcdm:File, pcdm:use#, ewigvocab:use# | 2..n(?) | Usage per pcdm:use# or ewigvocab:use# subclasses. |
| skos:prefLabel | Literal (String) | 0..1 | |
| dcterms:description | Literal (String) | 0..1 | |
| premis: ... | | | |

2.8. Vocabularies

An ontology has been developed in RDF, RDFS and OWL to provide us with terms where no suitable existing vocabulary term existed.

2.8.1. ewigvocab:packagetype#

There are four types of Information Packages within EWIG: TransferAggregations, SIP, AIP, AIC. DIPs are not relevant for this data model.

| Label | Scope Note |
|---|---|
| TA | Transfer Aggregation |
| SIP | Submission Information Package |
| AIP | Archival Information Package |
| AIC | Archival Information Collection |

2.8.2. ewigvocab:rightsScope#

| Label | Scope Note |
|---|---|
| public | Everyone/the public is allowed to access. |
| institution | Only submitting institution is allowed to access. |
| license | License determines access (open/closed). |

2.8.3. ewigvocab:stage#

Different stages an Information Package can pass through. Will be reported by the API.

| Label | Scope Note |
|---|---|
| quarantine | Information Package (TransferAggregation) has been (logically) created and is in the process of transferring to a storage area. Archive hasn’t done anything yet. |
| pre-ingest | IP has been transferred successfully and is in the process of being prepared for ingest into the Archive. |
| backlog | An SIP has been prepared for ingest and is waiting for Ingest. |
| ingest | SIP is going through the ingest workflow. |
| storage | An AIP has been created and stored. |
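
Read top to bottom, the scope notes above suggest a linear progression. The sketch below encodes that reading. The strict ordering and the helper name `next_stage` are assumptions, since the vocabulary itself only lists the stages.

```python
from typing import Optional

# Stage labels in the order they appear in the vocabulary table above.
STAGES = ["quarantine", "pre-ingest", "backlog", "ingest", "storage"]


def next_stage(stage: str) -> Optional[str]:
    """Return the stage that follows, or None once the package is in storage."""
    index = STAGES.index(stage)  # raises ValueError for an unknown label
    if index + 1 < len(STAGES):
        return STAGES[index + 1]
    return None
```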

2.8.4. ewigvocab:status#

Status of Information Packages within the different stages. Semantics depend on stage. Will be reported by the API.

| Label | Scope Note |
|---|---|
| incomplete | Stage is unable to proceed due to incomplete data. |
| success | Stage has been completed without errors. |
| failed | Stage has been terminated due to unrecoverable errors. |
| processing | Stage is processing data. |
| deleted | IP has been deleted. |

2.8.5. ewigvocab:use#

| Label | Scope Note |
|---|---|
| submissionDocumentation | Contextual information from the Producer. Not actively monitored within the LTDPS. |
| intellectualEntity | Primary Content Information. Focus of Preservation Actions. |
| preservationDescription | Preservation Description Information enabling Management and Preservation Watch and Actions. |
| preservationDerivative | Normalized/migrated derivative as new preservation master file. |
| metadataContainer | Metadata container files. |