Getting Started with the Eurostat API

First pull, dataset codes, and what to expect when building pipelines on the Eurostat JSON-UI and SDMX APIs.

Source on EconIndx: Eurostat (free, no registration, 7,000+ datasets, 27 EU member states + EEA).

Access & Pricing

Fully free, no registration, no API key. Eurostat is the European Union’s statistical office, and all of its data is publicly available for commercial and non-commercial use with attribution. The JSON-UI API at ec.europa.eu/eurostat/api/dissemination/statistics/1.0/data/ is the easiest entry point.

Your First Data Pull

Eurostat organizes data by dataset code (e.g., namq_10_gdp for quarterly national accounts). Each dataset has dimensions you filter with query parameters:

📌 Note: Eurostat dataset codes look opaque but follow a pattern: une_rt_m = unemployment (une) rate (rt) monthly (m). Browse the Eurostat Data Browser to find codes visually. The URL of any dataset page contains the code you need for the API.
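As a small illustration of reading the code out of a dataset page URL, the sketch below takes the path segment that follows "view". The URL shape used here is an assumption based on how Data Browser links commonly look, and dataset_code_from_url is a hypothetical helper, so compare it against the address bar before relying on it.

```python
from urllib.parse import urlparse

def dataset_code_from_url(url: str) -> str:
    """Return the path segment that follows 'view' in a Data Browser URL."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if "view" not in parts or parts.index("view") + 1 >= len(parts):
        raise ValueError(f"no dataset code found in {url!r}")
    return parts[parts.index("view") + 1]

# Assumed URL shape; the query string and trailing segments are ignored
url = "https://ec.europa.eu/eurostat/databrowser/view/une_rt_m/default/table?lang=en"
code = dataset_code_from_url(url)
print(code)  # une_rt_m
```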

import requests
import pandas as pd

EUROSTAT_BASE = "https://ec.europa.eu/eurostat/api/dissemination/statistics/1.0/data"

def fetch_eurostat(dataset: str, **filters) -> pd.DataFrame:
    """Fetch a Eurostat dataset with dimension filters."""
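    # The API takes repeated parameters for multiple values (geo=DE&geo=FR),
    # so split comma-separated filters into lists; requests repeats the key.
    params = {"format": "JSON", "lang": "EN"}
    params.update({k: str(v).split(",") for k, v in filters.items()})
    r = requests.get(f"{EUROSTAT_BASE}/{dataset}", params=params, timeout=60)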
    r.raise_for_status()
    data = r.json()

    # Unpack the JSON-stat structure: dimension metadata plus a flat value map
    dims = data["dimension"]
    dim_order = data["id"]
    values = data["value"]
    size = data["size"]

    # Build position -> code maps for each dimension. In this layout
    # category["index"] maps each code to its position along the dimension.
    label_maps = {
        dim: {str(pos): code for code, pos in dims[dim]["category"]["index"].items()}
        for dim in dim_order
    }

    # Flatten the multi-dimensional array
    rows = []
    for flat_idx, obs_val in values.items():
        idx = int(flat_idx)
        coords = []
        for s in reversed(size):
            coords.append(idx % s)
            idx //= s
        coords.reverse()

        row = {dim: label_maps[dim].get(str(c), c)
               for dim, c in zip(dim_order, coords)}
        row["value"] = obs_val
        rows.append(row)

    return pd.DataFrame(rows)

# Quarterly GDP for Germany and France
gdp = fetch_eurostat(
    "namq_10_gdp",
    geo="DE,FR",
    unit="CP_MEUR",     # current prices, millions EUR
    na_item="B1GQ",     # GDP
    s_adj="NSA",        # not seasonally adjusted
    freq="Q"
)

print(f"Rows: {len(gdp)}")
print(gdp.tail(5)[["geo", "time", "value"]])
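The flattening step is easier to follow on a tiny payload. The sketch below uses a hand-written response in the same layout (the numbers are made up, not Eurostat data) and shows how a flat key in value maps back to one code per dimension, assuming row-major ordering as in the function above. The names unravel and decode are illustrative only.

```python
# Hand-written payload in the JSON-stat layout (illustrative numbers only)
payload = {
    "id": ["geo", "time"],
    "size": [2, 3],
    "dimension": {
        "geo": {"category": {"index": {"DE": 0, "FR": 1}}},
        "time": {"category": {"index": {"2023-Q1": 0, "2023-Q2": 1, "2023-Q3": 2}}},
    },
    # Sparse map: flat position 4 (FR, 2023-Q2) has no observation
    "value": {"0": 10.0, "1": 11.0, "2": 12.0, "3": 20.0, "5": 22.0},
}

def unravel(flat_idx, size):
    """Convert a row-major flat index into one coordinate per dimension."""
    coords = []
    for s in reversed(size):
        coords.append(flat_idx % s)
        flat_idx //= s
    return coords[::-1]

def decode(payload):
    """Turn the flat value map into a list of row dicts keyed by dimension."""
    codes = {
        dim: {pos: code
              for code, pos in payload["dimension"][dim]["category"]["index"].items()}
        for dim in payload["id"]
    }
    rows = []
    for key, val in payload["value"].items():
        coords = unravel(int(key), payload["size"])
        row = {dim: codes[dim][c] for dim, c in zip(payload["id"], coords)}
        row["value"] = val
        rows.append(row)
    return rows

rows = decode(payload)
print(unravel(5, [2, 3]))  # [1, 2] -> second geo, third period
print(rows[-1])            # {'geo': 'FR', 'time': '2023-Q3', 'value': 22.0}
```

Because the last dimension varies fastest, flat index 5 in a 2 x 3 cube is row 1, column 2. Missing positions simply never appear as keys, which is why the row count can be smaller than the product of the sizes.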

First Pull: What to Expect

| Dataset | Description | Filter example | Rows (2 countries, 20 yr) |
|---|---|---|---|
| namq_10_gdp | Quarterly national accounts | geo=DE,FR | ~200–400 per unit |
| une_rt_m | Monthly unemployment rate | geo=EU27_2020,DE,FR | ~600 |
| prc_hicp_midx | HICP monthly index | geo=EU,DE,FR | ~800 |
| ext_lt_maineu | Trade with main partners | (none) | ~50,000+ (large) |
| demo_pjan | Population on 1 Jan, annual | geo=EU27_2020 | ~300 |

Without geo filtering, a dataset like namq_10_gdp can return 100,000+ rows (all countries, all dimensions). Always filter by geo and time for your first pull.
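To make the advice about geo and time filtering concrete, here is a sketch that only builds the request URL, so it runs offline. It assumes the API accepts repeated parameters for multiple values and a sinceTimePeriod parameter for a start period; that is my reading of the Eurostat API documentation and worth verifying. build_url is a hypothetical helper.

```python
from urllib.parse import urlencode

BASE = "https://ec.europa.eu/eurostat/api/dissemination/statistics/1.0/data"

def build_url(dataset, since=None, **filters):
    """Build a filtered query URL; list values become repeated parameters."""
    params = {"format": "JSON", "lang": "EN"}
    for key, value in filters.items():
        params[key] = list(value) if isinstance(value, (list, tuple)) else [value]
    if since is not None:
        params["sinceTimePeriod"] = since  # assumed parameter name, verify in the docs
    return f"{BASE}/{dataset}?{urlencode(params, doseq=True)}"

url = build_url("namq_10_gdp", since="2015-Q1", geo=["DE", "FR"], na_item="B1GQ")
print(url)
```

Printing the URL before sending it is a cheap way to confirm that the filters you intended are the ones actually going over the wire.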

Flags embedded in TSV: In the bulk TSV format (a different endpoint), a missing value is written as : and flags such as b (break in series) or e (estimated) are appended to the value cell (e.g., "1234.5 e"). The JSON-UI API keeps value numeric and reports flags in a separate status field, so it is the easier format for pipeline work.
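If you do end up reading TSV cells, the value and the flag have to be separated before casting to a number. A minimal sketch, assuming cells look like the "1234.5 e" example above and that a lone : marks a missing value; parse_cell is a hypothetical helper.

```python
def parse_cell(cell: str):
    """Split a bulk-TSV cell such as '1234.5 e' into (value, flags)."""
    value, flags = None, []
    for token in cell.split():
        if token == ":":
            continue  # missing-value marker, not a flag
        try:
            value = float(token)
        except ValueError:
            flags.append(token)  # anything non-numeric is treated as a flag
    return value, "".join(flags)

print(parse_cell("1234.5 e"))  # (1234.5, 'e')
print(parse_cell(": "))        # (None, '')
print(parse_cell("98.2 b"))    # (98.2, 'b')
```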

Key Datasets to Start With

National accounts:

  • namq_10_gdp: quarterly GDP by expenditure approach
  • nama_10_gdp: annual GDP, broader coverage
  • nama_10_pc: GDP per capita, annual

Labor market:

  • une_rt_m: monthly unemployment rate, by sex and age
  • lfsq_urgan: quarterly unemployment rates by sex, age and citizenship

Prices:

  • prc_hicp_midx: HICP monthly price index (EU inflation measure)
  • prc_hicp_aind: HICP annual average index

Trade:

  • ext_lt_maineu: extra-EU trade by main partners (large dataset)
  • tet00002: exports/imports summary

Population:

  • demo_pjan: population on 1 January
  • demo_gind: population change indicators

# Monthly unemployment for all EU countries (one call)
unemp = fetch_eurostat(
    "une_rt_m",
    sex="T",      # total (M/F/T)
    age="TOTAL",
    unit="PC_ACT", # % of active population
    s_adj="SA",   # seasonally adjusted
    freq="M"
)
print(f"Countries available: {unemp['geo'].nunique()}")
print(f"Total rows: {len(unemp)}")
# Expected: ~30 countries × ~300 months = ~9,000 rows

Data Tolerance & Validation

What’s normal:

  • Eurostat harmonizes national data, so coverage depends on member state reporting. Newer EU members (e.g., Bulgaria, Romania) have shorter series, and some indicators only go back to the year of EU accession.
  • Quarterly GDP (namq_10_gdp) is revised for 2+ years after each release. Download timestamps matter, so store them.
  • NUTS geography levels add complexity: regional datasets (e.g., nama_10r_2gdp for NUTS 2 GDP) are much sparser than national (NUTS 0) data.
  • The time dimension format is YYYY-QN for quarterly (2023-Q4), YYYY-MM for monthly, YYYY for annual.
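Because published figures are revised, it helps to treat every pull as a vintage. The sketch below is one way to attach a retrieval timestamp to each row and later pick the most recently retrieved observation per series and period. The function names and the retrieved_at column are assumptions for illustration, and the values are made up.

```python
from datetime import datetime, timezone

def stamp_vintage(rows, dataset, retrieved_at=None):
    """Return copies of the rows with dataset and UTC retrieval time attached."""
    ts = (retrieved_at or datetime.now(timezone.utc)).isoformat(timespec="seconds")
    return [{**row, "dataset": dataset, "retrieved_at": ts} for row in rows]

def latest_vintage(stamped):
    """Keep, per (dataset, geo, time), the row retrieved most recently."""
    latest = {}
    for row in stamped:
        key = (row["dataset"], row["geo"], row["time"])
        # ISO timestamps in the same timezone compare correctly as strings
        if key not in latest or row["retrieved_at"] > latest[key]["retrieved_at"]:
            latest[key] = row
    return list(latest.values())

# Two pulls of the same quarter, three months apart (illustrative values)
first = stamp_vintage([{"geo": "DE", "time": "2023-Q4", "value": 100.0}],
                      "namq_10_gdp", datetime(2024, 2, 1, tzinfo=timezone.utc))
second = stamp_vintage([{"geo": "DE", "time": "2023-Q4", "value": 101.5}],
                       "namq_10_gdp", datetime(2024, 5, 1, tzinfo=timezone.utc))
current = latest_vintage(first + second)
print(current[0]["value"], current[0]["retrieved_at"])
```

Keeping both rows rather than overwriting lets you reproduce what a model saw at any past date.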

⚠️ Flag codes: Eurostat publishes observation flags alongside values. A p flag means provisional, e means estimated, b means break in series. In JSON-UI responses the flags arrive in a status object keyed by the same positions as value. Always parse it alongside value and store flags in your schema, because they affect how the data should be used in models.
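A sketch of pairing flags with values, assuming the response carries a status object keyed by the same flat positions as value (the payload below is hand-written, not real data). It walks the union of both key sets defensively, in case a position has a flag but no value; observations_with_flags is a hypothetical helper.

```python
# Hand-written fragment of a response (illustrative numbers and flags)
payload = {
    "value": {"0": 3.1, "1": 3.0, "2": 2.9},
    "status": {"1": "e", "2": "p", "3": "p"},  # position 3 has a flag but no value
}

def observations_with_flags(payload):
    """Join value and status on their shared flat-index keys."""
    values = payload.get("value", {})
    status = payload.get("status", {})
    keys = sorted(set(values) | set(status), key=int)
    return [
        {"flat_index": int(k), "value": values.get(k), "flag": status.get(k, "")}
        for k in keys
    ]

obs = observations_with_flags(payload)
for row in obs:
    print(row)
```

The flat_index can then be unravelled into dimension codes exactly as for value, so flags end up as one extra column next to each observation.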

Validation checks:

def validate_eurostat_pull(df: pd.DataFrame, dataset: str) -> dict:
    country_count = df["geo"].nunique() if "geo" in df.columns else None
    row_count = len(df)

    # Period labels ("2023-Q4", "2023-12", "2023") sort chronologically as
    # strings within one frequency, so max() finds the latest period
    if "time" in df.columns:
        times = df["time"].dropna().unique()
        latest = max(times) if len(times) else None
    else:
        latest = None

    null_count = df["value"].isna().sum() if "value" in df.columns else None

    return {
        "dataset": dataset,
        "row_count": row_count,
        "country_count": country_count,
        "null_count": null_count,
        "latest_period": latest,
        "alert": row_count == 0,  # empty response = bad filter or dataset moved
    }

report = validate_eurostat_pull(gdp, "namq_10_gdp")
print(report)
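The latest_period in the report is still a label such as "2023-Q4". To compare it with today's date, it can be converted to the last calendar day of the period. A minimal standard-library sketch covering the three time formats listed earlier; period_end is a hypothetical helper.

```python
import calendar
import re
from datetime import date

def period_end(label: str) -> date:
    """Map '2023-Q4', '2023-12' or '2023' to the last day of that period."""
    m = re.fullmatch(r"(\d{4})(?:-(Q[1-4]|\d{2}))?", label)
    if not m:
        raise ValueError(f"unrecognized period label: {label!r}")
    year, sub = int(m.group(1)), m.group(2)
    if sub is None:
        month = 12                 # annual: period ends in December
    elif sub.startswith("Q"):
        month = int(sub[1]) * 3    # quarter N ends in month 3 * N
    else:
        month = int(sub)           # monthly
    return date(year, month, calendar.monthrange(year, month)[1])

print(period_end("2023-Q4"))  # 2023-12-31
print(period_end("2024-02"))  # 2024-02-29
print(period_end("2023"))     # 2023-12-31
```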

Alert thresholds:

  • Zero rows returned: dataset code may have changed (Eurostat reorganizes periodically; check the catalog)
  • Country count drops more than 20% from last pull: investigate API or dimension filter change
  • Latest quarterly period more than 3 months behind current calendar quarter: data is stale
  • HICP for euro area more than 45 days old: data is stale (published ~2 weeks after month end)
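These thresholds can be expressed as a small check that runs after every pull. The sketch below is one possible encoding, not a prescribed one: max_age_days is an assumption that the caller sets per dataset (for example 45 for monthly HICP), and the age is measured from the end date of the latest period. check_alerts is a hypothetical helper.

```python
from datetime import date

def check_alerts(row_count, country_count, previous_country_count,
                 latest_period_end, today, max_age_days):
    """Return a list of human-readable alerts for one pull."""
    alerts = []
    if row_count == 0:
        alerts.append("zero rows: check dataset code and filters")
    # A drop of more than 20% means fewer than 80% of the previous count
    if previous_country_count and country_count < 0.8 * previous_country_count:
        alerts.append("country count dropped more than 20%")
    if latest_period_end is not None:
        age = (today - latest_period_end).days
        if age > max_age_days:
            alerts.append(f"latest period is {age} days old (limit {max_age_days})")
    return alerts

# Healthy pull: same country count, latest month ended 32 days ago
print(check_alerts(500, 27, 27, date(2024, 4, 30), date(2024, 6, 1), 45))  # []
# Degraded pull: 20 of 27 countries, latest month ended 62 days ago
print(check_alerts(500, 20, 27, date(2024, 3, 31), date(2024, 6, 1), 45))
```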

Bulk TSV for Large Initial Loads

For datasets with millions of rows, use the bulk TSV endpoint:

import gzip
import io

def fetch_eurostat_bulk(dataset: str) -> pd.DataFrame:
    url = (f"https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/data/"
           f"{dataset}?format=TSV&compressed=true")
    r = requests.get(url, timeout=300)
    r.raise_for_status()  # fail loudly instead of trying to gunzip an error page
    with gzip.open(io.BytesIO(r.content)) as f:
        df = pd.read_csv(f, sep="\t", dtype=str)
    return df

# Example: full HICP dataset (~50MB compressed)
# hicp_bulk = fetch_eurostat_bulk("prc_hicp_midx")
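The bulk file is wide: one column per period, with the dimension codes packed into the first column. The sketch below reshapes a small sample into long rows using only the standard library. The layout (a first header cell like "freq,unit,geo\TIME_PERIOD" and padded cells) is an assumption based on how these files commonly look, and the numbers are made up; tsv_to_long is a hypothetical helper.

```python
import csv
import io

# Small sample in the assumed bulk layout (illustrative values only)
SAMPLE = (
    "freq,unit,geo\\TIME_PERIOD\t2023-01 \t2023-02 \n"
    "M,I15,DE\t116.6 \t117.5 p\n"
    "M,I15,FR\t: \t115.2 \n"
)

def tsv_to_long(text):
    """Reshape wide bulk TSV text into one dict per (series, period)."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    # First header cell holds the dimension names, then a backslash, then the time name
    dim_part, _, _ = header[0].partition("\\")
    dims = dim_part.split(",")
    periods = [h.strip() for h in header[1:]]
    rows = []
    for line in reader:
        keys = dict(zip(dims, line[0].split(",")))
        for period, cell in zip(periods, line[1:]):
            rows.append({**keys, "time": period, "cell": cell.strip()})
    return rows

long_rows = tsv_to_long(SAMPLE)
print(long_rows[1])
```

The cell column is left as raw text on purpose, so that value and flag can be separated in a later step instead of being lost in a numeric cast.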

Schema Stability

Dataset codes are stable but Eurostat reorganizes its catalog every few years. Track your dataset codes in a registry and add a health-check that calls the catalog endpoint to confirm they still exist. Dimension codes (geo, unit, na_item, etc.) follow SDMX codelists and are very stable. Geographic codes follow Eurostat conventions (e.g., EU27_2020 for current EU, DE for Germany).
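A registry plus health check can be as simple as a dictionary and a function that reports which codes no longer resolve. In this sketch the exists callable is a hypothetical stand-in for whatever catalog lookup you use; no endpoint is named here, so an in-memory set plays that role.

```python
from typing import Callable, Dict, List

# Registry of the dataset codes a pipeline depends on
REGISTRY: Dict[str, str] = {
    "namq_10_gdp": "quarterly GDP",
    "une_rt_m": "monthly unemployment rate",
    "prc_hicp_midx": "HICP monthly index",
}

def missing_datasets(registry: Dict[str, str],
                     exists: Callable[[str], bool]) -> List[str]:
    """Return registry codes for which the catalog lookup reports no dataset."""
    return [code for code in registry if not exists(code)]

# Stand-in catalog for the example; a real check would query Eurostat instead
fake_catalog = {"namq_10_gdp", "une_rt_m"}
missing = missing_datasets(REGISTRY, lambda code: code in fake_catalog)
print(missing)  # ['prc_hicp_midx']
```

Injecting the lookup keeps the check testable offline and lets you swap the catalog source without touching the registry.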
