cityjson-corpus

Shared test and benchmark corpus for CityJSON data handling software.

This repository keeps the corpus contract in one place.

Where does the corpus come from?

  • Painstakingly curated and handwritten CityJSON spec conformance test cases in the cases/conformance/v2_0 folder.
  • Synthetic data generated by cityjson-fake for benchmarking particular workloads (e.g. attribute-heavy or geometry-heavy).
  • Real-world data downloaded from the 3DBAG and Basisvoorziening 3D projects.

Read In The Docs Site

If you are reading the repository through the docs site, start with:

  • ../index.md
  • ../shared-corpus.md
  • ../cases/index.md
  • ../contributing.md
  • ../independent-use.md
  • ../licensing.md

Repository Layout

  • cases/: source of truth. Each case folder contains the case metadata, the expected result, and the source or instructions for the artifact.
  • catalog/: derived machine-readable index built from cases/.
  • schemas/: JSON Schemas and the short glossary for controlled values.
  • scripts/: validation, catalog rendering, docs generation, and acquisition helpers.
  • pipelines/: notes about how derived benchmark outputs are built.
  • artifacts/: generated files, acquired files, and derived indexes.
  • docs/: hand-written docs and architecture notes.

Working Rules

  • cases/ is the source of truth.
  • catalog/ and artifacts/ are derived outputs.
  • Do not edit derived files by hand when a source file or build command owns them.
  • If you remove a case, run just clean before rebuilding docs so stale generated case pages do not remain in the site output.
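The last rule can be illustrated with a minimal Python sketch of how stale generated pages arise. The function name `find_stale_pages`, the directory arguments, and the one-page-per-case naming are assumptions made for illustration; they are not the repository's actual scripts or layout.

```python
from pathlib import Path


def find_stale_pages(cases_dir: Path, generated_dir: Path) -> list[str]:
    """Return names of generated pages that no longer have a case folder.

    Assumption (hypothetical layout): each case folder
    <cases_dir>/<case_id>/ yields one page <generated_dir>/<case_id>.md.
    """
    # Case IDs are taken from the folder names under the source of truth.
    case_ids = {p.name for p in cases_dir.iterdir() if p.is_dir()}
    # Page names are the Markdown file stems in the generated output.
    pages = {p.stem for p in generated_dir.glob("*.md")}
    # Anything generated without a matching case is stale.
    return sorted(pages - case_ids)
```

A page listed by this check would stay in the site output after its case was removed, which is the situation just clean is meant to prevent.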

Main Commands

  • just fmt: format Python files with ruff.
  • just lint: validate the repo.
  • just sync-catalog: rebuild ../reference/cases.md and artifacts/correctness-index.json.
  • just generate-data: materialize generated workload data and refresh artifacts/benchmark-index.json (requires cityjson-fake).
  • just acquire-3dbag: materialize the pinned 3DBAG workload artifacts.
  • just acquire-basisvoorziening-3d: materialize the pinned Basisvoorziening 3D workload artifacts via the PDOK OGC API.
  • just clean: remove generated outputs and generated docs pages.
  • just docs-build: build the ProperDocs site.
  • just docs-serve: serve the ProperDocs site locally.

just lint and just docs-build use the checked-in schemas/cityjson-fake-manifest.schema.json. Only just generate-data requires access to cityjson-fake.

Licensing

This repository now uses a dual-license model for repository-authored content:

  • LICENSE: Apache-2.0 for repository-authored code, scripts, schemas, and build logic.
  • LICENSE-DATA: CC BY 4.0 for repository-authored docs, metadata, and synthetic corpus content.
  • Acquired third-party data keeps the upstream license named in its acquisition.json.
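As a rough sketch of how a consumer might look up the license of an acquired artifact, the Python snippet below reads acquisition.json from a case folder. The helper name `upstream_license`, the location of the file inside the case folder, and the top-level "license" key are all assumptions; check the files under schemas/ for the actual field names.

```python
import json
from pathlib import Path
from typing import Optional


def upstream_license(case_dir: Path) -> Optional[str]:
    """Return the upstream license recorded for an acquired case, if any.

    Assumptions (hypothetical): acquisition.json sits directly in the case
    folder and is a JSON object with a top-level "license" key.
    """
    acquisition = case_dir / "acquisition.json"
    if not acquisition.is_file():
        # No acquisition record found in this folder.
        return None
    data = json.loads(acquisition.read_text(encoding="utf-8"))
    # Returns None when the assumed key is absent.
    return data.get("license")
```

A result of None here only means that no acquisition record or license field was found under these assumptions; it does not by itself establish which license applies.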

Contributing

Contributions are welcome in all forms. You can create, refine, and delete cases by submitting a pull request that explains the changes. For a detailed guide on how to contribute, see the online documentation.

Use of AI in this project

ChatGPT 5.4 was used to scaffold the repository, develop the schemas and structure of the corpus, and write the documentation. LLMs do not generate the actual data files that the consuming software uses in tests and benchmarks.