This repo provides notebooks to (i) identify green/CCMT patent families via CPC codes (Y02/Y04), (ii) match them to non-green patents using text embeddings and approximate nearest-neighbour (ANN) search with HNSW, (iii) cluster and name green technology groups (HDBSCAN/UMAP), (iv) aggregate to firms, cities, and countries, and (v) reproduce the analyses in the paper "Green building blocks reveal the complex anatomy of climate change mitigation technologies".
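As a rough illustration of the matching step, a minimal HNSW nearest-neighbour query with `nmslib` could look like the sketch below. This is a sketch only: the embeddings, index parameters, and `k` are placeholders, not the values used in the notebooks.

```python
import numpy as np
import nmslib

# Toy stand-ins for patent-text embeddings (hypothetical shapes/values);
# the notebooks build real embeddings from patent text.
green_vecs = np.random.rand(1000, 384).astype(np.float32)
candidate_vecs = np.random.rand(5000, 384).astype(np.float32)

# Build an HNSW index over the non-green candidate pool (cosine similarity).
index = nmslib.init(method="hnsw", space="cosinesimil")
index.addDataPointBatch(candidate_vecs)
index.createIndex({"M": 16, "efConstruction": 200}, print_progress=False)

# For each green patent, retrieve its nearest non-green neighbours.
neighbours = index.knnQueryBatch(green_vecs, k=5, num_threads=4)
ids, dists = neighbours[0]  # matches for the first green patent
```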
- OS: Linux / macOS / Windows (WSL2 recommended on Windows)
- Python: 3.11 (see `requirements.txt` for pinned packages)
- R: 4.4.3 (for `5.3_entry_reg_R.ipynb`, `7.3_complement_reg_R.ipynb`)
- Optional: Stata (for `5.4_marginplot_stata.ipynb`)
Install with conda:
```bash
conda create -n gbb python=3.11 -y
conda activate gbb
conda install r=4.4.3 r-irkernel
pip install -r requirements.txt
```
Typical install time on a normal desktop: 10–25 minutes (most of the time goes to Python wheel downloads; `nmslib` may compile from source on some platforms).
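After installing, a quick import check confirms the heavier compiled packages are usable. A minimal sketch, assuming the clustering stack is provided by the `umap-learn` and `hdbscan` packages (check `requirements.txt` for the exact pinned names):

```python
# Post-install sanity check. nmslib and pyarrow are pinned in
# requirements.txt; umap and hdbscan are assumed from the pipeline
# description (UMAP/HDBSCAN).
import nmslib
import pyarrow
import umap
import hdbscan

print("pyarrow", pyarrow.__version__)
```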
Open the notebooks to see the processing code and the results on the whole dataset.
A tiny demo dataset is provided in `sampledata/`; it contains only a few sampled rows to show the data structure, not the full data used in the paper. Running the notebooks on these samples will not reproduce the published results.
Paths are now configured via an auto-inserted setup cell. You can also set an environment variable to relocate the project:
```bash
export GBB_PROJECT_ROOT=/path/to/this/repo
```

Downloads of the PatentsView dataset and PATSTAT (license required) are also necessary; we recommend storing the raw data as Parquet files for faster loading.
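For reference, the auto-inserted setup cell presumably resolves paths along these lines; this is a sketch under that assumption, not the actual cell (`sampledata/` is the shipped demo directory):

```python
import os
from pathlib import Path

# GBB_PROJECT_ROOT, when set, relocates the project root; the fallback
# to the current working directory is an assumption for illustration.
PROJECT_ROOT = Path(os.environ.get("GBB_PROJECT_ROOT", Path.cwd()))
SAMPLE_DIR = PROJECT_ROOT / "sampledata"
assert SAMPLE_DIR.exists(), f"demo data not found under {PROJECT_ROOT}"
```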
Run in order:
1. `0_get_familiy_id.ipynb`
2. `1_ccmt_matching.ipynb`
3. `2_cpc.ipynb`
4. `3.1_cluster_GBB.ipynb` (see the clustering sketch after this list)
5. `3.2_cluster_sourcefields.ipynb`
6. `3.3_name_clusters.ipynb`
7. `4_assocition_source_gbb.ipynb`
8. `5.1_firm_agg.ipynb`
9. `5.2_data4reg.ipynb`
10. `5.3_entry_reg_R.ipynb` (R)
11. `5.4_marginplot_stata.ipynb` (Stata)
12. `6_viz_firm_cities.ipynb`
13. `7.1_firm_primary_ctry.ipynb`
14. `7.2_complement_firm_ctry.ipynb`
15. `7.3_complement_reg_R.ipynb` (R)
16. `7.4_fig_complement.ipynb`
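Steps 3.1–3.2 reduce the embeddings with UMAP and cluster them with HDBSCAN. A minimal sketch of that combination, with illustrative hyperparameters (not the paper's):

```python
import numpy as np
import umap
import hdbscan

X = np.random.rand(2000, 384).astype(np.float32)  # toy stand-in embeddings

# Reduce to a low-dimensional space before density-based clustering.
reducer = umap.UMAP(n_components=5, metric="cosine", random_state=42)
reduced = reducer.fit_transform(X)

# HDBSCAN assigns dense groups; label -1 marks noise points.
labels = hdbscan.HDBSCAN(min_cluster_size=25).fit_predict(reduced)
print(f"{labels.max() + 1} clusters, {(labels == -1).sum()} noise points")
```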
- Parquet IO requires `pyarrow` (already pinned); see the sketch after these notes.
- R notebooks require `IRkernel` if run in Jupyter.
- The Stata notebook can be run in Stata directly or through a Jupyter kernel configured for Stata.
- The matching and clustering steps are memory-intensive: processing the whole patent dataset typically requires 64 GB of RAM or more.
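Converting the raw tables to Parquet once and reloading them afterwards is straightforward with pandas and the pinned `pyarrow` engine; the filename below is hypothetical:

```python
import pandas as pd

# One-off conversion of a raw tab-separated table (hypothetical name),
# then fast reloads from Parquet on subsequent runs.
df = pd.read_csv("g_patent.tsv", sep="\t")
df.to_parquet("g_patent.parquet")  # uses the pinned pyarrow engine
df = pd.read_parquet("g_patent.parquet")
```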
See LICENSE.