Making Korean government open data accessible worldwide with a single line of code.
from datasets import load_dataset
ds = load_dataset("kpubdata/seoul-apartment-trades")
df = ds["train"].to_pandas()
Korean public data (data.go.kr) is valuable but hard to access: complex API authentication, XML responses, Korean-only documentation, and no standard formats like Parquet or HuggingFace Datasets.
We bridge that gap by cleaning raw public data and publishing it as HuggingFace Datasets. No feature engineering, no opinions. Just honest, well-documented government data, ready to use.
| Dataset | Records | Period | Source | Description |
|---|---|---|---|---|
| seoul-apartment-trades | ~234k | 2020–2024 | MOLIT via data.go.kr | Apartment sale transactions in Seoul, all 25 districts |
More datasets are planned, including air quality, weather, and transit.
[data.go.kr API] → [kpubdata SDK] → [kpubdata-builder pipeline] → [HuggingFace Dataset]
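To make the middle of that pipeline concrete, here is a minimal sketch of the kind of step it implies: turning an XML API response into typed records. It is not the actual kpubdata SDK or kpubdata-builder code. The sample payload, the tag names (`district`, `dealAmount`, `dealYear`, `areaM2`), and the helper functions are all hypothetical and chosen only for illustration; real data.go.kr responses use their own schemas.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample payload. Tag names are illustrative assumptions,
# not the real data.go.kr schema.
SAMPLE_XML = """<response>
  <body>
    <items>
      <item>
        <district>Gangnam-gu</district>
        <dealAmount> 182,500</dealAmount>
        <dealYear>2023</dealYear>
        <areaM2>84.97</areaM2>
      </item>
      <item>
        <district>Mapo-gu</district>
        <dealAmount>97,000 </dealAmount>
        <dealYear>2021</dealYear>
        <areaM2>59.92</areaM2>
      </item>
    </items>
  </body>
</response>"""


def parse_items(xml_text):
    """Return one dict of stripped strings per <item> element."""
    root = ET.fromstring(xml_text)
    return [
        {child.tag: (child.text or "").strip() for child in item}
        for item in root.iter("item")
    ]


def clean_record(raw):
    """Convert string fields to proper types without deriving new features."""
    return {
        "district": raw["district"],
        # Amounts arrive as strings with thousands separators.
        "deal_amount": int(raw["dealAmount"].replace(",", "")),
        "deal_year": int(raw["dealYear"]),
        "area_m2": float(raw["areaM2"]),
    }


records = [clean_record(r) for r in parse_items(SAMPLE_XML)]
for record in records:
    print(record)
```

A list of dicts like `records` could then be handed to `datasets.Dataset.from_list(records)` for publishing, though whether the real builder works this way is an assumption.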
We welcome contributions! If there is a Korean public dataset you would like to see on HuggingFace:
Each dataset is published under a license compatible with the license of its original government source. Most Korean public data uses 공공누리 (Korea Open Government License, KOGL), which we map to CC-BY-4.0.
See individual dataset cards for specific licensing details.