EWIG Data Model

Living Standard

This version:
https://ewig.zib.de/static/datamodel.html
Editor:
(Zuse Institute Berlin (ZIB))

Abstract

Definition of the data model and vocabularies used within the EWIG digital preservation system.

1. Data Model for EWIG

1.1. Introduction

This document describes the data model for the EWIG long-term preservation system. The data model follows the Information Model described in the reference model for an Open Archival Information System (OAIS; ISO 14721:2012). To distinguish EWIG terms from terms and entities used and defined in the OAIS, the latter will be in italics.

The purpose of the project is to preserve usability and utility of information over the long term.

Publisher: Digital Preservation Working Group (Zuse Institute Berlin)

License: CC0

1.1.1. Namespaces

Namespace Prefix Namespace URI
ewig http://ewig.zib.de/ontologies/ewig#
ewigvocab http://ewig.zib.de/ontologies/vocab/
rdf http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfs http://www.w3.org/2000/01/rdf-schema#
owl http://www.w3.org/2002/07/owl#
skos http://www.w3.org/2004/02/skos/core#
dct http://purl.org/dc/terms/
schema http://schema.org/
pcdm http://pcdm.org/models#
pcdmrights http://pcdm.org/rights#
ore http://www.openarchives.org/ore/terms/
edm http://www.europeana.eu/schemas/edm/
premis http://www.loc.gov/premis/rdf/v1
premis3 http://www.loc.gov/premis/rdf/v3
sh http://www.w3.org/ns/shacl#
xsd http://www.w3.org/2001/XMLSchema#
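
The following non-normative sketch shows how the prefixes above can be used to expand a prefixed name into a full URI. The function name `expand` and the choice of a plain dictionary are assumptions made for illustration; only a subset of the table is copied.

```python
# Prefix map copied from the namespace table above (subset only).
NAMESPACES = {
    "ewig": "http://ewig.zib.de/ontologies/ewig#",
    "ewigvocab": "http://ewig.zib.de/ontologies/vocab/",
    "dct": "http://purl.org/dc/terms/",
    "pcdm": "http://pcdm.org/models#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def expand(curie: str) -> str:
    """Expand a prefixed name such as 'dct:identifier' to a full URI.

    Hypothetical helper, not part of any EWIG API.
    """
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in NAMESPACES:
        raise ValueError(f"unknown or missing prefix in {curie!r}")
    return NAMESPACES[prefix] + local


print(expand("ewig:InformationPackage"))
```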

2. Classes

2.1. Information Packages (ewig:InformationPackage)

The Information Package in the data model follows definition 4.2.2.1 in OAIS: "The conceptual structure for supporting Long Term Preservation of information is the Information Package. An Information Package is a container that contains two types of Information Objects, the Content Information and the Preservation Description Information (PDI); [...]". The OAIS Package Description is modelled through the aggregation of data from the SubmissionManifest and one or more Information Objects (which ones?).

Information Packages aggregate Information Objects and serve as containers during the different stages within the preservation workflow.

Information Packages of type TransferAggregation can aggregate *Submission or Archival* Information Packages. Submission or Archival Information Packages aggregate Information Objects comprising Content and Preservation Description Information.

Information Packages MUST include a RightsStatement as a fallback statement for the Access Functional Entity and the SubmissionManifest as a record for the Administration Functional Entity.

Information Packages also include status messages for API access.

2.1.1. Property Usage

Property Expected Object Type (Range) Cardinality Scope Note
rdf:type pcdm:Collection, ewig:InformationPackage 0..n Subclass of ore:Aggregation.
ewig:submissionManifest ewig:SubmissionManifest 0..1 Submission metadata is mandatory for all IPs.
dct:identifier Literal (String) 0..1 Auto-generated by the API. For Transfer Aggregations: <contractId>,<submissionName>. For Information Packages: <contractId>,<submissionName>,<ieName>.
ewig:type ewigvocab:packagetype# 1 TransferAggregation (TA), Submission (SIP), Archival (AIP), ArchivalCollection (AIC).
ewig:memberOfAIC ewig:InformationPackage 0..1 TAs/SIPs/AIPs can be a member of an AIC.
ewig:memberOfTA ewig:InformationPackage 0..1 SIPs/AIPs can be a member of a TA.
owl:sameAs URI to Fedora Resource 0..1 Links to Package equivalent in Fedora.
ewig:status ewigvocab:status# 0..1 Status for API.
ewig:statusMessage Literal (String) 0..1 Optional message contextualizing API Status.
ewig:stage ewigvocab:stage# 0..1 Workflow stage.
skos:prefLabel Literal (String) 0..1 Optional label/title for package for Dashboard. Identifier will be used if absent.
dct:description Literal (String) 0..1 Optional description
dct:rights ewig:RightsStatement 1 Mandatory rights information in every package.
dct:dateAccepted xsd:dateTime 0..1 Package processing has finished successfully.
dct:dateSubmitted xsd:dateTime 0..1 Package received.
ewig:archivematicaUuid Literal 0..1 Archivematica UUID of an AIP.
premis3:size Literal (xsd:decimal) 0..1 Size of InformationPackage in bytes at the time of ingest. This is mandatory after ingest.
ewig:callbackStatus Literal (String) 0..1 Notification Status (Date / Response to external API)
ewig:publisherUri URI to ewig:Agent (Organization) 1 EWIG-URI of SubmittingOrganization
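
As a non-normative illustration of the dct:identifier patterns in the table above, the sketch below joins the parts with commas. The function name and the rule that parts must be non-empty and free of commas are assumptions; the actual API may behave differently.

```python
from typing import Optional


def package_identifier(contract_id: str, submission_name: str,
                       ie_name: Optional[str] = None) -> str:
    """Build an identifier following the two patterns in the table.

    Two parts for a Transfer Aggregation, three for an Information Package.
    Hypothetical helper, not the EWIG API itself.
    """
    parts = [contract_id, submission_name]
    if ie_name is not None:
        parts.append(ie_name)
    for part in parts:
        # Assumption: a part may not be empty or contain the separator.
        if not part or "," in part:
            raise ValueError(f"invalid identifier part: {part!r}")
    return ",".join(parts)


print(package_identifier("C-001", "batch1"))          # Transfer Aggregation
print(package_identifier("C-001", "batch1", "ie42"))  # Information Package
```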

2.2. Information Objects (ewig:InformationObject)

Information Objects follow the definition in 4.2.1.1 of the OAIS Reference Model. The Physical Object specialization of the Data Object is modelled as edm:aggregatedCHO.

Information Objects are categorized (ewig:use) according to a vocabulary (ewigvocab:use#) including IE (PREMIS Intellectual Entity), PDI, SubmissionDocumentation, Transcripts (OCR/TEI), Service/Intermediate files and so on.

Information Objects MUST be memberOf a single Information Package within EWIG and MAY contain one or more files (Digital Data Objects).

ObjectPreservationTypes define sets of significant properties for IEs. They are intended to be specified in a separate EWIG ontology, which might be developed in the future.

2.2.1. Property Usage

Property Expected Object Type (Range) Cardinality Scope Note
rdf:type pcdm:Object, ewig:InformationObject 2..n Subclass of ore:Aggregation.
pcdm:memberOf ewig:InformationPackage 1 Every InformationObject MUST belong to one package.
pcdm:hasFile pcdm:File 0..n
ewig:objectPreservationType Literal (String) 0..n Signifies significant properties to be preserved. Not used yet.
skos:prefLabel Literal (String) 0..1 Optional label/title for object for Dashboard. Use# will be used if absent
dct:description Literal (String) 0..1 Optional description.
edm:aggregatedCHO edm:ProvidedCHO 1 Description of IE. See Europeana Mapping Guidelines 2.3: http://pro.europeana.eu/files/Europeana_Professional/Share_your_data/Technical_requirements/EDM_Documentation/EDM_Mapping_Guidelines_v2.3_112016.pdf
ewig:structure ore:Proxy 0..n Optional structuring information of object.
dct:rights ewig:RightsStatement 0..1 Rights statement overrides package rights.
ewig:use ewigvocab:use# 1 Categorizes files contained in object according to usage in LTDPS.
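
The cardinality notation used in the property tables (1, 0..1, 0..n, 1..n) can be checked mechanically. The sketch below is non-normative: the rule set is a subset of the table above, and the function names and the dictionary representation of a resource are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple


def parse_cardinality(spec: str) -> Tuple[int, Optional[int]]:
    """Turn '1', '0..1' or '0..n' into (minimum, maximum); None means unbounded."""
    spec = spec.strip()
    if ".." in spec:
        low, high = spec.split("..", 1)
        return int(low), (None if high == "n" else int(high))
    return int(spec), int(spec)


def satisfies(spec: str, count: int) -> bool:
    """Return True if `count` values are allowed by the cardinality `spec`."""
    low, high = parse_cardinality(spec)
    return count >= low and (high is None or count <= high)


# Subset of the Information Object property table.
INFORMATION_OBJECT_RULES = {
    "pcdm:memberOf": "1",
    "pcdm:hasFile": "0..n",
    "edm:aggregatedCHO": "1",
    "dct:rights": "0..1",
    "ewig:use": "1",
}


def violations(resource: Dict[str, List[str]]) -> List[str]:
    """List the properties whose number of values breaks the table."""
    return [prop for prop, spec in INFORMATION_OBJECT_RULES.items()
            if not satisfies(spec, len(resource.get(prop, [])))]


print(violations({"pcdm:memberOf": ["pkg-1"], "ewig:use": ["intellectualEntity"]}))
```

A SHACL shape (the sh prefix is declared in the namespace table) would be the more natural place for such constraints; the Python version is only meant to make the notation concrete.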

2.3. SubmissionManifest (ewig:SubmissionManifest)

The SubmissionManifest contains the administrative (including rights) information for the Administration Functional Entity. Semantics are according to the submission agreement...

Every SubmissionManifest MUST include a reference to a Contract.

2.3.1. Property Usage

Property Expected Object Type (Range) Cardinality Scope Note
rdf:type ewig:SubmissionManifest 1
dct:identifier Literal (String) 0..1 SubmissionName
ewig:submissionManifestVersion Literal (String) 1 Version of submission-manifest as given at time of delivery
dct:isPartOf Literal (String) 1 Name of SubmissionSet (AIC)
dct:accrualPolicy ewig:Contract 1 URI of Contract
dct:publisher Literal (String) 1 SubmittingOrganization as given at time of delivery
ewig:publisherUri URI to ewig:Agent (Organization) 1 EWIG-URI of SubmittingOrganization
dct:creator Literal (String) 1 Contact (Responsible Person) as given at time of delivery
ewig:creatorUri URI to ewig:Agent (Person) 1 EWIG-URI of Contact
dct:contributor Literal (String) 1 TransferCurator as given at time of delivery
ewig:contributorUri URI to ewig:Agent (Person) 1 EWIG-URI of TransferCurator Resource
ewig:metadataFile Literal (String) 1 Metadata File
ewig:metadataFileFormat URI 1 Metadata File Format
ewig:dataSourceSystem Literal (String) 0..1 System where data originates from.
skos:prefLabel Literal (String) 0..1 Optional label/title.
dct:description Literal (String) 0..1 Optional SubmissionDescription
ewig:callbackParams Literal (String) 0..1 Parameters for Transfer-/Ingest-Status responses via callback URLs.

URL template: http://host/api.endpoint/?param1=<params>

Placeholders in angle brackets (< >) will be replaced with the following parameters:

<code>: short success or error code

<message>: longer explanation of success or error condition

<ewig_id>: EWIG identifier of the Information Package
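
The placeholder substitution for callback URLs can be sketched as follows. This is a non-normative illustration: it assumes the placeholders are written as <code>, <message> and <ewig_id>, that values are percent-encoded before insertion, and the function name `fill_callback` and the example query parameter names are hypothetical.

```python
from urllib.parse import quote


def fill_callback(template: str, code: str, message: str, ewig_id: str) -> str:
    """Replace the three placeholders in a callback URL template.

    Assumption: values are percent-encoded so that commas, spaces and
    other reserved characters cannot break the query string.
    """
    values = {"<code>": code, "<message>": message, "<ewig_id>": ewig_id}
    for placeholder, value in values.items():
        template = template.replace(placeholder, quote(value, safe=""))
    return template


url = fill_callback(
    "http://host/api.endpoint/?status=<code>&info=<message>&id=<ewig_id>",
    code="ok",
    message="Ingest finished",
    ewig_id="C-001,batch1,ie42",
)
print(url)
```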

2.4. Agent (ewig:Agent)

The Agent contains information about Persons or Organizations relevant to administrative workflows. Agents do not act as premis:Agent within the PDI. An Agent cannot be a Person and an Organization simultaneously.

2.4.1. Property Usage

Property Expected Object Type (Range) Cardinality Scope Note
rdf:type dct:Agent, premis:Agent, schema:Person|Organization 1..n Premis or dct agents allow software agents.
dct:identifier ISIL... ORCID? 1 (Organization), 0..1 (Person) Unique identifier for agent within EWIG. Use ISIL or ORCID if available.
schema:name Literal (String) 1 (Organization), 0 (Person) Name of organization.
schema:alternateName Literal (String) 0..1 Optional alternative or abbreviation. Will be displayed within ()s in dashboard.
schema:email Literal (String) 0..1 Personal or functional email-address.
schema:familyName Literal (String) 0 (Organization), 1 (Person)
schema:givenName Literal (String) 0 (Organization), 1 (Person)
schema:honorificPrefix Literal (String) 0 (Organization), 1 (Person) Prof./Dr./...
schema:honorificSuffix Literal (String) 0 (Organization), 1 (Person) Phd./MA/MDB/...
schema:affiliation ewig:Agent (Organization) or Literal (String) 0..1 Parent organization or organization the person is loosely affiliated with at the time of recording. For work relation use worksFor.
schema:jobTitle Literal (String) 0 (Organization), 1 (Person) Administrative or functional role within organization regarding data submissions.
schema:worksFor ewig:Agent (Organization) 0 (Organization), 1 (Person) Employer (Organization).
skos:prefLabel Literal (String) 0..1 Optional label/title.
dct:description Literal (String) 0..1 Optional description/comments.
ewig:login Literal (String) 0..1 (Organization), 0 (Person) Transfer-Server login of Organization.
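
The rule that an Agent cannot be a Person and an Organization at the same time can be expressed as a simple check on the rdf:type values. The sketch is non-normative; the function name and the return value "Other" for agents that are neither (for example software agents typed only as premis:Agent or dct:Agent) are assumptions.

```python
from typing import Iterable


def agent_kind(rdf_types: Iterable[str]) -> str:
    """Classify an agent from its rdf:type values.

    Raises ValueError if both schema:Person and schema:Organization are
    present, which the data model forbids.
    """
    types = set(rdf_types)
    is_person = "schema:Person" in types
    is_organization = "schema:Organization" in types
    if is_person and is_organization:
        raise ValueError("an Agent cannot be a Person and an Organization")
    if is_person:
        return "Person"
    if is_organization:
        return "Organization"
    return "Other"  # assumption: e.g. a software agent


print(agent_kind(["dct:Agent", "schema:Organization"]))
```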

2.5. RightsStatement (ewig:RightsStatement)

RightsStatements MUST include a rights declaration, a rights holder if applicable, and licensing information (including PublicDomainMark). The semantics of accessRights are according to the submission agreement. If certain reuse restrictions cannot be expressed through rights and license alone, a human-readable legal note can be used in description. Embargoes are modelled through pcdmrights:rightsOverride.

If not explicitly stated, RightsStatements will be inherited through the hierarchy.

RightsStatements take precedence over each other through the hierarchy bottom-up (pcdm:File, InformationObject, InformationPackage), except for pcdmrights:rightsOverride.
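
The inheritance and bottom-up precedence described above can be sketched as a lookup that returns the most specific statement present. This is non-normative; the function name and the use of plain strings for statements are assumptions, and the exception for the rights override is deliberately not modelled because its exact semantics are not stated here.

```python
from typing import Optional


def effective_rights(file_rights: Optional[str],
                     object_rights: Optional[str],
                     package_rights: str) -> str:
    """Pick the applicable RightsStatement, most specific level first.

    A statement on the file wins over one on the Information Object,
    which wins over the mandatory statement on the Information Package.
    """
    for candidate in (file_rights, object_rights):
        if candidate is not None:
            return candidate
    return package_rights  # fallback: every package carries a statement


print(effective_rights(None, "object-statement", "package-statement"))
```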

2.5.1. Property Usage

Property Expected Object Type (Range) Cardinality Scope Note
rdf:type dct:RightsStatement 1
dct:rights URI (rightsstatements.org) 1 In Copyright, PublicDomain etc.
dct:license URI (creativecommons.org ...) 0..1 Permission to use restrictions/license information or rights reserved.
dct:accessRights ewigvocab:rightsScope# 1 Potentially available to the public or restricted to access by the submitting institution.
dct:rightsHolder Literal (String) or ewig:Agent 0..n Owner (legal body) of intellectual property/data that is entitled to select license.
pcdmrights:rightsOverride ewigvocab:rightsScope# 0..1 Embargo scope. I.e. access rights "Institution" until expiration.
pcdmrights:rightsOverrideExpiration xsd:dateTime 0..1 if rightsOverride Embargo end.
skos:prefLabel Literal (String) 0..1 Human readable (not necessarily understandable) expression for display.
dct:description Literal (String) 0..1 if rights NoC-CR Can be used to express restrictions in case of "Out of copyright - Contractual restrictions" or information/instructions how to get permission in case of rights reserved.
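
One possible reading of the embargo properties is that the override scope replaces the regular access scope until the expiration date has passed. The sketch below is non-normative and rests on assumptions: the function name is hypothetical, timestamps are timezone-aware, and an override without an expiration date is treated as still in effect.

```python
from datetime import datetime, timezone
from typing import Optional


def access_scope(access_rights: str,
                 rights_override: Optional[str] = None,
                 override_expiration: Optional[datetime] = None,
                 now: Optional[datetime] = None) -> str:
    """Return the rightsScope label that applies at the moment `now`."""
    if rights_override is None:
        return access_rights
    now = now or datetime.now(timezone.utc)
    # Assumption: a missing expiration means the embargo has not ended.
    if override_expiration is None or now < override_expiration:
        return rights_override
    return access_rights


embargo_end = datetime(2030, 1, 1, tzinfo=timezone.utc)
print(access_scope("public", "institution", embargo_end,
                   now=datetime(2029, 6, 1, tzinfo=timezone.utc)))
```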

2.6. Contract (ewig:Contract)

Contract MUST contain an identifier and information about contract length and size of storage (in Bytes).

2.6.1. Property Usage

Property Expected Object Type (Range) Cardinality Scope Note
rdf:type dct:Policy 1
dct:identifier Literal (String) 1 Contract Number
dct:contributor ewig:Agent (Organization) 1..n Contracting party other than ZIB.
schema:startDate xsd:dateTime 1
schema:endDate xsd:dateTime 0..1 Optional if open-ended.
premis3:size Literal (xsd:decimal) 1 Net storage allowance in bytes. -1 if unlimited.
skos:prefLabel Literal (String) 0..1
dct:description Literal (String) 0..1
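
The convention that a size of -1 means an unlimited allowance needs special handling in any quota check. The following non-normative sketch shows one way to do that; the function name and the idea of comparing used plus incoming bytes against the allowance are assumptions, since this document does not define how the allowance is enforced.

```python
UNLIMITED = -1  # sentinel value defined for premis3:size on a Contract


def fits_allowance(allowance_bytes: int, used_bytes: int,
                   incoming_bytes: int) -> bool:
    """Check whether an incoming package fits the contract's allowance."""
    if allowance_bytes == UNLIMITED:
        return True
    return used_bytes + incoming_bytes <= allowance_bytes


print(fits_allowance(10_000, used_bytes=9_000, incoming_bytes=2_000))
print(fits_allowance(UNLIMITED, used_bytes=9_000, incoming_bytes=2_000))
```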

2.7. File (pcdm:File)

2.7.1. Property Usage

Property Expected Object Type (Range) Cardinality Scope Note
rdf:type pcdm:File
ewig:use ewigvocab:use# 1..n Usage per pcdm:use# or ewigvocab:use# subclasses.
skos:prefLabel Literal (String) 0..1
dct:description Literal (String) 0..1
premis: ...

2.8. Vocabularies

An ontology has been developed in RDF, RDFS and OWL to provide terms where no suitable term existed in an existing vocabulary.

2.8.1. ewigvocab:packagetype#

There are four types of Information Packages within EWIG: TransferAggregations, SIP, AIP, AIC. DIPs are not relevant for this data model.

Label Scope Note
TA Transfer Aggregation
SIP Submission Information Package
AIP Archival Information Package
AIC Archival Information Collection

2.8.2. ewigvocab:rightsScope#

Label Scope Note
public Everyone/the public is allowed to access.
institution Only submitting institution is allowed to access.
license License determines access (open/closed).

2.8.3. ewigvocab:stage#

Different stages an Information Package can pass through. Will be reported by the API.

Label Scope Note
quarantine Information Package (TransferAggregation) has been (logically) created and is in the process of transferring to a storage area. The Archive has not acted on it yet.
pre-ingest IP has been transferred successfully and is in the process of being prepared for ingest into the Archive.
backlog An SIP has been prepared for ingest and is waiting for Ingest.
ingest SIP is going through the ingest workflow.
storage An AIP has been created and stored.
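
If the stages are read as a linear sequence in the order listed, the progression of a package can be sketched as below. This is non-normative: the strictly linear order and the function name `next_stage` are assumptions, and the actual workflow may skip or repeat stages.

```python
from typing import Optional

# Stage labels in the order they are listed in the table above.
STAGES = ["quarantine", "pre-ingest", "backlog", "ingest", "storage"]


def next_stage(stage: str) -> Optional[str]:
    """Return the stage after `stage`, or None once storage is reached.

    Raises ValueError for a label that is not in the vocabulary.
    """
    index = STAGES.index(stage)
    return STAGES[index + 1] if index + 1 < len(STAGES) else None


print(next_stage("backlog"))
```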

2.8.4. ewigvocab:status#

Status of Information Packages within the different stages. Semantics depend on stage. Will be reported by the API.

Label Scope Note
incomplete Stage is unable to proceed due to incomplete data.
success Stage has been completed without errors.
failed Stage has been terminated due to unrecoverable errors.
interrupted Stage is halted for (manual) data checks.
deleted IP has been deleted.

2.8.5. ewigvocab:use#

Label Scope Note
submissionDocumentation Contextual information from the Producer. Not actively monitored within the LTDPS.
intellectualEntity Primary Content Information. Focus of Preservation Actions.
preservationDescription Preservation Description Information enabling Management and Preservation Watch and Actions.
preservationDerivative Normalized/migrated derivative as new preservation master file
metadataContainer Metadata container files