cityjson-corpus
Shared test and benchmark corpus for CityJSON data handling software.
This repository keeps the corpus contract in one place.
Where does the corpus come from?
- Painstakingly curated and handwritten CityJSON spec conformance test cases in the cases/conformance/v2_0 folder.
- Synthetic data generated by cityjson-fake for benchmarking particular workloads (e.g. attribute-heavy, geometry-heavy).
- Real-world data downloaded from the 3DBAG and Basisvoorziening 3D projects.
Read In The Docs Site
If you are reading the repository through the docs site, start with:
- ../index.md
- ../shared-corpus.md
- ../cases/index.md
- ../contributing.md
- ../independent-use.md
- ../licensing.md
Repository Layout
- cases/: source of truth. Each case folder contains the case metadata, the expected result, and the source or instructions for the artifact.
- catalog/: derived machine-readable index built from cases/.
- schemas/: JSON Schemas and the short glossary for controlled values.
- scripts/: validation, catalog rendering, docs generation, and acquisition helpers.
- pipelines/: notes about how derived benchmark outputs are built.
- artifacts/: generated files, acquired files, and derived indexes.
- docs/: hand-written docs and architecture notes.
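As a quick sanity check of a checkout against the layout above, a minimal sketch follows. The helper name `missing_layout_dirs` is hypothetical and is not part of the repository's scripts. It only tests that the listed directories exist. A missing derived directory such as artifacts/ may simply mean nothing has been generated yet (an assumption, not something this README states).

```python
from pathlib import Path

# Top-level directories named in the Repository Layout list.
EXPECTED_DIRS = (
    "cases",
    "catalog",
    "schemas",
    "scripts",
    "pipelines",
    "artifacts",
    "docs",
)


def missing_layout_dirs(repo_root):
    """Return the expected top-level directories that are absent from repo_root.

    Hypothetical helper for illustration only; it does not inspect contents.
    """
    root = Path(repo_root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    # Run from the repository root; prints nothing when every directory exists.
    for name in missing_layout_dirs("."):
        print(f"missing: {name}/")
```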
Working Rules
- cases/ is the source of truth.
- catalog/ and artifacts/ are derived outputs.
- Do not edit derived files by hand when a source file or build command owns them.
- If you remove a case, run just clean before rebuilding docs so stale generated case pages do not remain in the site output.
Main Commands
- just fmt: format Python files with ruff.
- just lint: validate the repo.
- just sync-catalog: rebuild ../reference/cases.md and artifacts/correctness-index.json.
- just generate-data: materialize generated workload data and refresh artifacts/benchmark-index.json (requires cityjson-fake).
- just acquire-3dbag: materialize the pinned 3DBAG workload artifacts.
- just acquire-basisvoorziening-3d: materialize the pinned Basisvoorziening 3D workload artifacts via the PDOK OGC API.
- just clean: remove generated outputs and generated docs pages.
- just docs-build: build the ProperDocs site.
- just docs-serve: serve the ProperDocs site locally.
just lint and just docs-build use the checked-in
schemas/cityjson-fake-manifest.schema.json. Only just generate-data
requires access to cityjson-fake.
Licensing
This repository now uses a dual-license model for repository-authored content:
- LICENSE: Apache-2.0 for repository-authored code, scripts, schemas, and build logic.
- LICENSE-DATA: CC BY 4.0 for repository-authored docs, metadata, and synthetic corpus content.
- Acquired third-party data keeps the upstream license named in its acquisition.json.
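To see which upstream licenses apply to acquired data in a checkout, one could collect them from the acquisition.json files. The sketch below is illustrative only: the function name `upstream_licenses` and the `license` field name are assumptions, since this README does not describe the structure of acquisition.json. Check the JSON Schemas in schemas/ for the actual field.

```python
import json
from pathlib import Path


def upstream_licenses(root, license_key="license"):
    """Map each acquisition.json found under root to the license it names.

    license_key is a hypothetical field name; the real one is defined by the
    repository's schemas. Files without that key map to None.
    """
    found = {}
    for path in sorted(Path(root).rglob("acquisition.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        found[path.relative_to(root).as_posix()] = data.get(license_key)
    return found


if __name__ == "__main__":
    # Run from the repository root to list every acquired artifact's license.
    for rel_path, license_name in upstream_licenses(".").items():
        print(f"{rel_path}: {license_name}")
```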
Contributing
Contributions are welcome in all forms. You are welcome to create, refine, and delete cases by submitting a pull request that explains the changes. For a detailed guide on how to contribute, see the online documentation.
Use of AI in this project
ChatGPT 5.4 was used to scaffold the repository, develop the schemas and structure of the corpus, and write the documentation. LLMs do not generate the actual data files that are used in tests and benchmarks by the consuming software.