Documentation

Resources

Corpus Schema

Each inscription record contains the following fields. The corpus is distributed as a static JSON file (corpus.json) and as RDF/Turtle (corpus.ttl).

Field            | Type    | Description
id               | string  | Unique identifier (e.g. Cr 2.20, ETP_001)
canonical        | string  | Standardized philological transcription
old_italic       | string? | Old Italic Unicode (U+10300 block)
phonetic         | string? | IPA pronunciation
findspot         | string? | Modern provenance name
findspot_lat     | number? | Latitude (WGS 84)
findspot_lon     | number? | Longitude (WGS 84)
date_approx      | number? | Approximate date (negative = BCE)
date_uncertainty | number? | Date uncertainty (± years)
classification   | string? | Epigraphic type (funerary, votive, ownership, …)
medium           | string? | Inscription medium (stone, bronze, ceramic)
object_type      | string? | Object typology
source           | string? | Bibliographic source reference
pleiades_id      | string? | Pleiades gazetteer ID
geonames_id      | string? | GeoNames gazetteer ID
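
As a rough illustration of how a record might be consumed, the Python sketch below mirrors the field table in a dataclass and renders the two date fields as a readable label. It rests on two assumptions that are not stated above: that corpus.json holds a top-level JSON array of record objects, and that a trailing ? in the Type column marks an optional field. The names InscriptionRecord, record_from_dict, load_corpus, and format_date are hypothetical and are not part of the project.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class InscriptionRecord:
    """Mirror of the schema table; optional fields default to None."""
    id: str
    canonical: str
    old_italic: Optional[str] = None
    phonetic: Optional[str] = None
    findspot: Optional[str] = None
    findspot_lat: Optional[float] = None
    findspot_lon: Optional[float] = None
    date_approx: Optional[float] = None
    date_uncertainty: Optional[float] = None
    classification: Optional[str] = None
    medium: Optional[str] = None
    object_type: Optional[str] = None
    source: Optional[str] = None
    pleiades_id: Optional[str] = None
    geonames_id: Optional[str] = None


def record_from_dict(raw: dict) -> InscriptionRecord:
    """Build a record, ignoring any keys not listed in the schema table."""
    known = {f.name for f in fields(InscriptionRecord)}
    return InscriptionRecord(**{k: v for k, v in raw.items() if k in known})


def load_corpus(path: str) -> list[InscriptionRecord]:
    """Assumption: the file is a JSON array of record objects."""
    with open(path, encoding="utf-8") as fh:
        return [record_from_dict(item) for item in json.load(fh)]


def format_date(rec: InscriptionRecord) -> str:
    """Render date_approx (negative = BCE) with its ± uncertainty, if any."""
    if rec.date_approx is None:
        return "undated"
    year = int(rec.date_approx)
    label = f"{abs(year)} BCE" if year < 0 else f"{year} CE"
    if rec.date_uncertainty is not None:
        label += f" ± {int(rec.date_uncertainty)} years"
    return label
```

With this sketch, a record whose date_approx is -550 and whose date_uncertainty is 25 would be labelled "550 BCE ± 25 years", and a record without date_approx would be labelled "undated".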

Script Systems

The normalizer converts between five transcription systems used in Etruscan philology. Source-system detection is automatic.

System       | Example       | Description
CIE Standard | MI AVILES     | Uppercase, unaccented. Used in the Corpus Inscriptionum Etruscarum.
Philological | mi avile·s    | Lowercase with diacritics (θ, φ, χ, ś). Standard in modern Etruscology.
Old Italic   | 𐌌𐌉 𐌀𐌅𐌉𐌋𐌄𐌔     | Unicode U+10300 block. Faithful to original script direction.
IPA          | /mi aviles/   | International Phonetic Alphabet rendering.
Web-safe     | mi aviles     | ASCII-only approximation for contexts lacking Unicode support.
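
The normalizer's actual detection rules are not documented here. The sketch below is only one plausible heuristic, written in Python, that separates the five systems using the surface features listed in the table: the Old Italic code point range, enclosing slashes for IPA, the philological diacritics and interpunct, and all-uppercase ASCII for CIE. The function names, the returned labels, and the web-safe substitutions (θ to th, φ to ph, χ to ch, ś to s) are all assumptions made for illustration.

```python
import unicodedata

# Old Italic Unicode block (U+10300 to U+1032F).
OLD_ITALIC_START, OLD_ITALIC_END = 0x10300, 0x1032F

# Diacritic letters named in the table, plus the interpunct (U+00B7).
PHILOLOGICAL_MARKS = set("θφχś\u00b7")

# Hypothetical ASCII substitutions; the project's real mapping may differ.
WEB_SAFE_MAP = str.maketrans(
    {"θ": "th", "φ": "ph", "χ": "ch", "ś": "s", "\u00b7": ""}
)


def detect_system(text: str) -> str:
    """Guess the transcription system from surface features (heuristic)."""
    text = unicodedata.normalize("NFC", text.strip())
    if any(OLD_ITALIC_START <= ord(ch) <= OLD_ITALIC_END for ch in text):
        return "old_italic"
    if len(text) >= 2 and text.startswith("/") and text.endswith("/"):
        return "ipa"
    if any(ch in PHILOLOGICAL_MARKS for ch in text):
        return "philological"
    letters = [ch for ch in text if ch.isalpha()]
    if letters and all(ch.isascii() and ch.isupper() for ch in letters):
        return "cie"
    # Plain lowercase ASCII: treated as web-safe by default.
    return "web_safe"


def to_web_safe(philological: str) -> str:
    """Replace the philological marks with ASCII approximations."""
    return unicodedata.normalize("NFC", philological).translate(WEB_SAFE_MAP)
```

One limitation of any heuristic of this kind is that a philological transcription containing no diacritics or interpunct is indistinguishable from web-safe text, so the sketch falls back to web_safe in that case.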

Classifier Architecture

Two neural architectures are available for epigraphic classification. Both operate at the character level (no tokenizer required) and classify inscriptions into 7 epigraphic types. Models are exported as ONNX and run client-side via WebAssembly.

Model       | Parameters | Size   | Architecture
CharCNN     | ~28K       | 111 KB | 1D convolution → max-pool → dense. Fast inference (~5 ms).
Transformer | ~300K      | 1.2 MB | Character embedding → 2-layer Transformer encoder → classifier head. Higher accuracy on long texts.
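
To make "character level (no tokenizer required)" concrete, the Python sketch below shows one common way of turning an inscription into the fixed-length integer sequence that a character-level model consumes. It is not the project's preprocessing code: the vocabulary construction, the reserved padding and unknown indices, and the max_len default of 64 are all assumptions, and the real models' input length and character set are not given here.

```python
PAD, UNK = 0, 1  # reserved indices (assumed convention)


def build_vocab(texts):
    """Assign an integer index to every distinct character seen in texts."""
    chars = sorted({ch for text in texts for ch in text})
    return {ch: i + 2 for i, ch in enumerate(chars)}


def encode(text, vocab, max_len=64):
    """Map characters to indices, truncating or right-padding to max_len."""
    ids = [vocab.get(ch, UNK) for ch in text[:max_len]]
    return ids + [PAD] * (max_len - len(ids))
```

Each character, including spaces, becomes one input position, so no word segmentation is needed. A sequence produced this way would then be passed to the exported ONNX model as an integer tensor.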

Linked Open Data

The corpus is published as Linked Open Data following W3C standards. Each inscription is modelled as a lawd:WrittenWork with spatial anchoring via geo:SpatialThing (GeoSPARQL).

  • Ontologies: LAWD, Dublin Core, GeoSPARQL, SKOS
  • Gazetteers: 41 findspots aligned to Pleiades, 17 to GeoNames
  • Format: RDF/Turtle (corpus.ttl, 1.6 MB)
  • Endpoint: Apache Jena Fuseki (SPARQL 1.1)
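
As a hedged example of querying the data, the Python sketch below builds a SPARQL 1.1 Protocol GET request that lists ten resources typed as lawd:WrittenWork. The endpoint URL is hypothetical (a local Fuseki instance with a dataset named corpus), and the LAWD namespace URI should be checked against the prefixes declared in corpus.ttl. The helper name build_request is an assumption for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint: local Fuseki, dataset name "corpus".
ENDPOINT = "http://localhost:3030/corpus/sparql"

# LAWD namespace as commonly published; verify against corpus.ttl.
LAWD = "http://lawd.info/ontology/"

QUERY = f"""
PREFIX lawd: <{LAWD}>
SELECT ?inscription WHERE {{
  ?inscription a lawd:WrittenWork .
}}
LIMIT 10
"""


def build_request(endpoint: str, query: str) -> Request:
    """Encode the query as a SPARQL 1.1 Protocol GET request."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    # Ask for the standard JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request(ENDPOINT, QUERY)
# To run it against a live endpoint:
#   import json, urllib.request
#   with urllib.request.urlopen(request) as resp:
#       rows = json.load(resp)["results"]["bindings"]
```

The request is only constructed here, not sent, so the sketch runs without a server. The commented lines show how the JSON result bindings could be read once an endpoint is available.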

Licence

  • Code: MIT License
  • Data: CC0 1.0 Universal (Public Domain)
  • Models: Apache 2.0