Full monorepo: Python backend, Next.js frontend, data pipeline scripts, and model training code.
Pre-trained CharCNN and Transformer models for Etruscan inscription classification. ONNX format, client-side inference.
Complete corpus exported as RDF/Turtle using LAWD, Dublin Core, and GeoSPARQL ontologies (1.6 MB, 4,728 inscriptions).
Apache Jena Fuseki endpoint for querying the RDF corpus. Supports federated queries with other Linked Open Data sources.
Python library for programmatic access to the corpus: normalization, classification, and data export utilities.
Each inscription record contains the following fields. The corpus is distributed as a static JSON file (corpus.json) and as RDF/Turtle (corpus.ttl).
| Field | Type | Description |
|---|---|---|
| id | string | Unique identifier (e.g. Cr 2.20, ETP_001) |
| canonical | string | Standardized philological transcription |
| old_italic | string? | Old Italic Unicode (U+10300 block) |
| phonetic | string? | IPA pronunciation |
| findspot | string? | Modern provenance name |
| findspot_lat | number? | Latitude (WGS 84) |
| findspot_lon | number? | Longitude (WGS 84) |
| date_approx | number? | Approximate date (negative = BCE) |
| date_uncertainty | number? | Date uncertainty (± years) |
| classification | string? | Epigraphic type (funerary, votive, ownership, …) |
| medium | string? | Inscription medium (stone, bronze, ceramic) |
| object_type | string? | Object typology |
| source | string? | Bibliographic source reference |
| pleiades_id | string? | Pleiades gazetteer ID |
| geonames_id | string? | GeoNames gazetteer ID |
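
As an illustration of how these fields might be consumed, the following is a minimal sketch of a typed record and loader. It assumes that corpus.json is a top-level JSON array of objects keyed by the field names above; the actual file layout may differ. The names `Inscription`, `from_dict`, `date_label`, and `load_corpus` are hypothetical and are not part of the project's library.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Inscription:
    # Required fields (no "?" in the schema table).
    id: str
    canonical: str
    # Optional fields ("?" in the schema table) default to None.
    old_italic: Optional[str] = None
    phonetic: Optional[str] = None
    findspot: Optional[str] = None
    findspot_lat: Optional[float] = None
    findspot_lon: Optional[float] = None
    date_approx: Optional[float] = None
    date_uncertainty: Optional[float] = None
    classification: Optional[str] = None
    medium: Optional[str] = None
    object_type: Optional[str] = None
    source: Optional[str] = None
    pleiades_id: Optional[str] = None
    geonames_id: Optional[str] = None

    @classmethod
    def from_dict(cls, raw: dict) -> "Inscription":
        # Ignore keys that are not in the documented schema.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in raw.items() if k in known})

    def date_label(self) -> Optional[str]:
        # Negative years are BCE, as stated in the schema table.
        if self.date_approx is None:
            return None
        year = int(self.date_approx)
        label = f"{abs(year)} {'BCE' if year < 0 else 'CE'}"
        if self.date_uncertainty is not None:
            label += f" ± {int(self.date_uncertainty)} years"
        return label


def load_corpus(path: str) -> list[Inscription]:
    # Assumption: the file is a JSON array of inscription objects.
    with open(path, encoding="utf-8") as fh:
        return [Inscription.from_dict(item) for item in json.load(fh)]


# Made-up values, for illustration only.
example = Inscription.from_dict(
    {"id": "ETP_001", "canonical": "mi aviles", "date_approx": -550, "date_uncertainty": 25}
)
print(example.date_label())
```

Keeping every optional field as `None` rather than an empty string makes it straightforward to distinguish "not recorded" from "recorded as empty" when exporting.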
The normalizer converts between five transcription systems used in Etruscan philology. Source-system detection is automatic.
| System | Example | Description |
|---|---|---|
| CIE Standard | MI AVILES | Uppercase, unaccented. Used in the Corpus Inscriptionum Etruscarum. |
| Philological | mi avile·s | Lowercase with diacritics (θ, φ, χ, ś). Standard in modern Etruscology. |
| Old Italic | 𐌌𐌉 𐌀𐌅𐌉𐌋𐌄𐌔 | Unicode U+10300 block. Faithful to original script direction. |
| IPA | /mi aviles/ | International Phonetic Alphabet rendering. |
| Web-safe | mi aviles | ASCII-only approximation for contexts lacking Unicode support. |
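
To make the idea of automatic source-system detection concrete, here is a rough sketch of how detection and one conversion direction could work. It is not the project's normalizer: the function names, the detection heuristics, and the ASCII fallbacks for the diacritics (θ to th, φ to ph, χ to ch, ś to s) are all assumptions chosen for illustration.

```python
import unicodedata

# Assumed ASCII fallbacks; the real normalizer may use different mappings.
ASCII_FALLBACK = {"θ": "th", "φ": "ph", "χ": "ch", "ś": "s", "·": ""}
PHILOLOGICAL_MARKS = set(ASCII_FALLBACK)


def detect_system(text: str) -> str:
    """Guess the transcription system with simple, hypothetical heuristics."""
    stripped = unicodedata.normalize("NFC", text.strip())
    # Old Italic code points live in U+10300..U+1032F.
    if any(0x10300 <= ord(ch) <= 0x1032F for ch in stripped):
        return "old_italic"
    # IPA renderings in the table are wrapped in slashes.
    if len(stripped) >= 2 and stripped.startswith("/") and stripped.endswith("/"):
        return "ipa"
    if any(ch in PHILOLOGICAL_MARKS for ch in stripped):
        return "philological"
    if stripped.isupper():
        return "cie"
    return "web_safe"


def to_web_safe(philological: str) -> str:
    """Convert a philological transcription to an ASCII-only approximation."""
    text = unicodedata.normalize("NFC", philological.lower())
    mapped = "".join(ASCII_FALLBACK.get(ch, ch) for ch in text)
    # Drop anything that still falls outside ASCII.
    return mapped.encode("ascii", "ignore").decode("ascii")


def to_cie(philological: str) -> str:
    """Uppercase, unaccented form in the style of the CIE row above."""
    return to_web_safe(philological).upper()


print(detect_system("mi avile·s"), to_web_safe("mi avile·s"), to_cie("mi avile·s"))
```

Note that a lowercase philological string with no diacritics or interpuncts is indistinguishable from web-safe input under these heuristics, which is one reason a production detector would likely need more context.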
Two neural architectures are available for epigraphic classification. Both operate at the character level (no tokenizer required) and classify inscriptions into 7 epigraphic types. Models are exported as ONNX and run client-side via WebAssembly.
| Model | Parameters | Size | Architecture |
|---|---|---|---|
| CharCNN | ~28K | 111 KB | 1D convolution → max-pool → dense. Fast inference (~5 ms). |
| Transformer | ~300K | 1.2 MB | Character embedding → 2-layer Transformer encoder → classifier head. Higher accuracy on long texts. |
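
Because both models work at the character level, preprocessing reduces to mapping each character to an integer id and padding to a fixed length. The sketch below shows that step only. The vocabulary construction, the reserved ids for padding and unknown characters, and the `max_len` value are assumptions; the published models ship with their own vocabulary and input shape, which this example does not reproduce.

```python
PAD_ID = 0  # assumed id reserved for padding
UNK_ID = 1  # assumed id reserved for characters outside the vocabulary


def build_vocab(texts: list[str]) -> dict[str, int]:
    """Assign a stable integer id to every character seen in the texts."""
    chars = sorted({ch for text in texts for ch in text})
    return {ch: index + 2 for index, ch in enumerate(chars)}


def encode(text: str, vocab: dict[str, int], max_len: int = 64) -> list[int]:
    """Turn a string into a fixed-length list of character ids."""
    ids = [vocab.get(ch, UNK_ID) for ch in text[:max_len]]
    return ids + [PAD_ID] * (max_len - len(ids))


vocab = build_vocab(["mi aviles"])
print(encode("mi", vocab, max_len=4))  # -> [7, 5, 0, 0]
```

The resulting id sequence is what a character-level classifier would receive as input; running the exported ONNX files additionally requires their actual input name and tensor shape, which are not documented here.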
The corpus is published as Linked Open Data following W3C standards. Each inscription is modelled as a lawd:WrittenWork with spatial anchoring via geo:SpatialThing (GeoSPARQL).
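
As a sketch of how the RDF corpus might be queried, the following builds a SPARQL query that lists resources typed as lawd:WrittenWork and prepares a SPARQL Protocol request using only the standard library. The endpoint URL is a placeholder (Fuseki's default port with a hypothetical dataset name), the LAWD namespace IRI is assumed to be `http://lawd.info/ontology/`, and the helper names are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

# Placeholder: substitute the real Fuseki endpoint for the corpus.
ENDPOINT = "http://localhost:3030/corpus/sparql"


def build_query(limit: int = 10) -> str:
    """Select inscriptions modelled as lawd:WrittenWork."""
    return (
        "PREFIX lawd: <http://lawd.info/ontology/>\n"
        "SELECT ?inscription WHERE {\n"
        "  ?inscription a lawd:WrittenWork .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


def build_request(query: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Prepare a form-encoded POST as defined by the SPARQL Protocol."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Accept": "application/sparql-results+json"},
    )


def run_query(query: str, endpoint: str = ENDPOINT) -> list[str]:
    """Send the query and return the IRIs bound to ?inscription."""
    with urllib.request.urlopen(build_request(query, endpoint)) as response:
        results = json.load(response)
    return [row["inscription"]["value"] for row in results["results"]["bindings"]]


print(build_query(5))
```

Calling `run_query(build_query())` only works against a live endpoint; the property names used for coordinates, dates, and identifiers would need to be taken from corpus.ttl itself before extending the query.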