Manifesto

The Etruscan language is one of the least-documented languages of the ancient Mediterranean, yet it left behind thousands of inscriptions scattered across museums, publications, and archaeological sites. We believe these inscriptions belong to everyone.

Why this project exists

Existing corpora of Etruscan epigraphy, notably the Corpus Inscriptionum Etruscarum (CIE), the Etruskische Texte (ET), and the Testimonia Linguae Etruscae (TLE), are indispensable, but they were born in the age of print. Access depends on institutional subscriptions, physical library holdings, or scanned PDFs that resist machine analysis.

OpenEtruscan is a response to this situation. It provides a fully open, computationally accessible version of the Etruscan epigraphic record, published under permissive licences (MIT for code, CC0 for data) so that any scholar, student, or enthusiast can use, modify, and redistribute the material without restriction.

Principles

  1. Open by default. All data, code, and models are published openly. If something is closed, it is because we have not yet been able to open it, not by design.
  2. Interoperability over isolation. We align our identifiers with established gazetteers (Pleiades, GeoNames, Trismegistos) and publish as Linked Open Data so that the corpus participates in the wider ecosystem of ancient-world information, rather than standing apart.
  3. Computational methods as a complement to philology. Neural classifiers, normalizers, and statistical analyses are tools, not replacements for close reading. Their value lies in surfacing patterns across a corpus too large for one scholar to hold in memory.
  4. Provenance and attribution. Every inscription carries its bibliographic source. When we disagree with a reading, we note the alternative. Scholarly consensus is tracked, not overridden.
  5. Low-barrier access. The entire platform runs in a web browser. Neural models execute client-side; no data leaves the user's machine. There are no accounts, no paywalls, and no tracking beyond anonymised performance telemetry.

Scope

The current corpus contains 4,728 inscriptions in Etruscan and related Italic scripts (Faliscan, Lemnian, Oscan, Umbrian), georeferenced to 45 archaeological sites across Italy. We aim to extend coverage as new publications appear and as OCR extraction from the CIE fascicles matures.

We welcome corrections, additions, and alternative readings. Contributions can be submitted via the project's GitHub repository.

Scholarly context

OpenEtruscan follows the FAIR data principles (Findable, Accessible, Interoperable, Reusable). The Linked Open Data layer uses the Linking Ancient World Data (LAWD) ontology, Dublin Core, and GeoSPARQL.

The classifier models are described in a forthcoming technical note. Training data, evaluation metrics, and model weights are available on Hugging Face.

Invitation

Etruscology is a small field. The epigraphic record is fragmentary, the language only partially understood, and the community of specialists distributed across continents. We believe that open tools and open data can lower the barrier to entry, connect researchers who would otherwise work in isolation, and preserve a body of evidence that deserves wider attention.

This project is an invitation to participate.