A collaboratively engineered 200-sample × 48-feature dataset of refractory ceramics and composites,
built by a 4-member team using physics-informed feature engineering across 8 material property domains.
Website: https://imi-project.vercel.app/
- Project Overview
- Dataset Summary
- Repository Structure
- Data Pipeline
- Team Contributions
- Feature Groups at a Glance
- Material Classes
- Target Variables
- Setup & Usage
- Output Files
This project constructs a structured, ML-ready dataset for Ultra-High Temperature Materials (UHTMs): refractory ceramics used in hypersonic re-entry vehicles, thermal protection systems (TPS), and aerospace leading-edge components.
The dataset covers 10 material families (carbides, borides, nitrides) and their composites/doped variants across 8 physical property domains, with 3 supervised regression targets. All features are derived from 5 literature-anchored material properties using validated physical relations plus realistic Gaussian noise (np.random.seed(42)).
Key design principles:
- Each of the 4 team members owns a distinct physical domain, giving a clean separation of concerns
- All features trace back to 5 shared anchors (Tm, ρ, ve, ΔEN, Ps), which keeps the dataset internally consistent
- Reproducible: fixed seed, shared base file, merge validation enforced programmatically
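
As a rough illustration of the anchor-plus-noise idea, the sketch below derives one feature (the F02 Debye temperature relation listed later in this README) from the Tm anchor and perturbs it with seeded Gaussian noise. The multiplicative noise form, the 2% level, and the helper name `with_noise` are assumptions made for illustration; the member scripts may apply noise differently.

```python
import numpy as np

np.random.seed(42)  # the fixed seed this README states for all scripts

# Tm anchors (K) for HfC, ZrC, TaC, copied from the material-class table.
tm = np.array([3900.0, 3420.0, 3880.0])


def with_noise(values, rel_sigma):
    """Scale each value by (1 + eps), eps ~ N(0, rel_sigma).

    Hypothetical helper: the noise model is an assumption, not the
    project's confirmed implementation.
    """
    eps = np.random.normal(0.0, rel_sigma, size=values.shape)
    return values * (1.0 + eps)


# F02: Debye temperature, theta_D ~ 300 + 0.15 * Tm
debye_clean = 300.0 + 0.15 * tm
debye_noisy = with_noise(debye_clean, rel_sigma=0.02)  # 2% is illustrative

for clean, noisy in zip(debye_clean, debye_noisy):
    print(f"clean = {clean:.1f} K, noisy = {noisy:.1f} K")
```

Because the seed is set once before any draw, rerunning the script from the top reproduces the same noisy values.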
| Property | Value |
|---|---|
| Total Samples | 200 (100 experimental + 100 synthetic) |
| Feature Columns | 48 (F01–F48) |
| Target Columns | 3 (T1–T3) |
| Metadata Columns | 5 (Sample_ID, Material_System, Source_Type, Synthesis_Method, Crystal_Structure) |
| Total Columns | 56 |
| Material Families | 10 base + composites + doped variants |
| Random Seed | 42 (all scripts) |
| Final Output | UHTM_final_200x48.xlsx + UHTM_final_200x48.csv |

    IMI-Project-main/
    │
    ├── Aadi-Dev/                          # Member 1: Thermodynamic + Electronic
    │   ├── dataset.py                     # Generates F01–F12
    │   ├── Aadi.xlsx                      # Output: 200 × 17 (meta + 12 features)
    │   └── UHTM_base_200.xlsx             # Base reference copy
    │
    ├── Krish/                             # Member 2: Mechanical + Thermal + Infrastructure
    │   ├── Intro.py                       # Branch onboarding note
    │   └── LAB EVALUATION/
    │       ├── Krishh.py                  # Generates F13–F24
    │       ├── Krishh.xlsx                # Output: 200 × 17
    │       ├── UHTM_base_200.xlsx
    │       └── Merge/
    │           ├── Base.py                # Generates UHTM_base_200.xlsx (run first)
    │           ├── mergeAll.py            # Final merge of all 4 member files
    │           ├── AadiDev.xlsx           # Member 1 snapshot for merge
    │           ├── Krishh.xlsx            # Member 2 snapshot for merge
    │           ├── Niranjan.xlsx          # Member 4 snapshot for merge
    │           ├── Salan.xlsx             # Member 3 snapshot for merge
    │           └── UHTM_final_200x48.xlsx # FINAL MERGED DATASET
    │
    ├── Niranjan/                          # Member 4: Phase + ML Descriptors + Targets
    │   ├── Niranjan.py                    # Generates F37–F48 + T1, T2, T3
    │   ├── Niranjan.xlsx                  # Output: 200 × 20
    │   └── UHTM_base_200.xlsx
    │
    ├── Salan/                             # Member 3: Oxidation + Microstructural
    │   ├── lab evaluation/
    │   │   ├── Salan.py                   # Generates F25–F36
    │   │   ├── Salan_member3.xlsx         # Output: 200 × 17
    │   │   └── UHTM_base_200.xlsx
    │   └── Backup_datasets/
    │       ├── UHTM_Complete.csv
    │       ├── UHTM_Complete.xlsx
    │       └── completefile.py
    │
    ├── UHTM_final_200x48.csv              # ML-ready CSV (root copy)
    └── UHTM_final_200x48.xlsx             # Final annotated Excel (root copy)


    ┌──────────────────────────────────────────────────────────────────────────┐
    │                                                                          │
    │  STEP 1: Base.py                                                         │
    │  ───────────────                                                         │
    │  Krish generates UHTM_base_200.xlsx                                      │
    │  200 rows × 10 cols (5 meta + 5 hidden anchors: Tm, ρ, ve, ΔEN, Ps)      │
    │                                │                                         │
    │      ┌─────────────────┬───────┴────────┬───────────────────┐            │
    │      ▼                 ▼                ▼                   ▼            │
    │  STEP 2a: Aadi     2b: Krish        2c: Salan           2d: Niranjan     │
    │  dataset.py        Krishh.py        Salan.py            Niranjan.py      │
    │  F01–F12           F13–F24          F25–F36             F37–F48 + T1–T3  │
    │  Aadi.xlsx         Krishh.xlsx      Salan_member3.xlsx  Niranjan.xlsx    │
    │      │                 │                │                   │            │
    │      └─────────────────┴───────┬────────┴───────────────────┘            │
    │                                │                                         │
    │                                ▼                                         │
    │  STEP 3: mergeAll.py                                                     │
    │  ───────────────────                                                     │
    │  Validates 200 rows + Sample_ID match across all 4 files                 │
    │  Horizontally joins all feature groups                                   │
    │                                │                                         │
    │                                ▼                                         │
    │  STEP 4: UHTM_final_200x48.xlsx / .csv                                   │
    │  200 rows × 56 cols (5 meta + 48 features + 3 targets)                   │
    │  + Summary Stats sheet + Feature Legend sheet                            │
    │                                                                          │
    └──────────────────────────────────────────────────────────────────────────┘

Script: Aadi-Dev/dataset.py | Output: Aadi.xlsx | Features: F01–F12

Aadi is responsible for the foundational material properties spanning thermodynamic stability and quantum electronic structure, the two groups that most directly determine a material's suitability as a UHTM candidate.

| Feature | Name | Formula / Physics | Unit |
|---|---|---|---|
| F01 | Melting Point | Literature anchor Tm + 2% noise | K |
| F02 | Debye Temperature | θ_D ≈ 300 + 0.15·Tm (Debye-Grüneisen scaling) | K |
| F03 | Cohesive Energy | E_coh = ve·0.82 + ΔEN·1.1 (metallic + ionic/covalent) | eV/atom |
| F04 | Formation Enthalpy | ΔHf = -(ΔEN·45 + ve·8) (exothermic = stable) | kJ/mol |
| F05 | Lattice Parameter a | a ≈ 3.2 + ρ^(-0.3)·0.5 (inverse power law with density) | Å |
| F06 | Grüneisen Parameter | γ ≈ 0.4 + ΔEN·0.3 + ve·0.05 (anharmonicity index) | – |

| Feature | Name | Formula / Physics | Unit |
|---|---|---|---|
| F07 | Band Gap | max(0, 0.5 - ve·0.05); metallic carbides/nitrides ≈ 0 | eV |
| F08 | DOS at Fermi Level | N(Ef) ≈ ve·0.55 + 1.2 (more valence e⁻ → higher DOS) | states/eV |
| F09 | Bader Charge Transfer | Δq ≈ ΔEN·0.6 (Pauling-type charge transfer) | e⁻ |
| F10 | Fermi Velocity | v_F = ×10⁶ · (ve/8.5) (free-electron model) | m/s |
| F11 | Valence Electron Density | n = ρ·ve×10²² (electrons per unit volume) | ×10²²/cm³ |
| F12 | Work Function | φ ≈ 3.5 + ΔEN·0.8 - ve·0.05 (surface escape energy) | eV |

Physical significance: F01 is the primary UHTM selection criterion. F07 distinguishes metallic from semiconducting behaviour at high T. F11 governs metallic bonding strength and shear modulus. F12 controls thermionic emission.
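
To make the Group A and B relations concrete, here is a minimal sketch that evaluates a few of them for single materials using anchors from the material-class table. It computes noise-free values only; the function name and dictionary keys are hypothetical, and the real `dataset.py` may organise the calculation differently.

```python
def electronic_thermo_features(ve, d_en):
    """Noise-free Group A/B relations that depend only on ve and ΔEN.

    Hypothetical helper for illustration; coefficients are copied from
    the feature tables in this README.
    """
    return {
        "F03_cohesive_energy_eV": ve * 0.82 + d_en * 1.1,
        "F04_formation_enthalpy_kJ_mol": -(d_en * 45 + ve * 8),
        "F07_band_gap_eV": max(0.0, 0.5 - ve * 0.05),
        "F09_bader_charge_e": d_en * 0.6,
        "F12_work_function_eV": 3.5 + d_en * 0.8 - ve * 0.05,
    }


# HfC anchors from the material-class table: ve = 8, ΔEN = 1.3
hfc = electronic_thermo_features(ve=8, d_en=1.3)

# TaN has ve = 10, so the max(0, ...) clamp in F07 is reached
tan = electronic_thermo_features(ve=10, d_en=1.4)

for name, value in hfc.items():
    print(f"HfC {name}: {value:.3f}")
print(f"TaN F07_band_gap_eV: {tan['F07_band_gap_eV']:.3f}")
```

The `max(0, ...)` clamp in F07 is what keeps high-ve systems at a zero gap instead of letting the linear term go negative.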
Scripts: Krishh.py, Base.py, mergeAll.py | Output: Krishh.xlsx | Features: F13–F24

Krish owns both the mechanical integrity and thermal transport features, the two most critical property groups for structural aerospace applications. Krish also authored the base dataset generator (Base.py) and the final merge script (mergeAll.py), serving as the project's data infrastructure lead.

| Feature | Name | Formula / Physics | Unit |
|---|---|---|---|
| F13 | Young's Modulus | E ≈ 200 + Tm·0.04 + ve·15 (Gilman-Cohen stiffness relation) | GPa |
| F14 | Vickers Hardness | H_v ≈ 15 + ΔEN·8 + ve·1.2 (bond character + electron count) | GPa |
| F15 | Fracture Toughness K_Ic | K_Ic ≈ 2.5 + 1.5/ΔEN (ionic bonds = more brittle) | MPa√m |
| F16 | Compressive Strength | σ_c ≈ 0.6·E (empirical ratio for dense ceramics) | GPa |
| F17 | Poisson's Ratio | ν ≈ 0.18 + ΔEN·0.02 (covalent ≈ 0.18; ionic → higher) | – |
| F18 | Flexural Strength | σ_f = 300 + E·0.8 + H·12 - porosity·15 | MPa |

| Feature | Name | Formula / Physics | Unit |
|---|---|---|---|
| F19 | Thermal Conductivity | κ = 20 + ve·4 - ΔEN·3 (electronic + ionic scattering) | W/m·K |
| F20 | Coeff. Thermal Expansion | CTE = 6.5 + ΔEN·0.8 - ve·0.3 | ×10⁻⁶/K |
| F21 | Specific Heat Capacity | Cp ≈ 180 + 800/ρ (inverse density, Dulong-Petit) | J/kg·K |
| F22 | Thermal Diffusivity | α = κ / (ρ·Cp) (exact thermodynamic definition) | m²/s |
| F23 | Max Service Temperature | T_max ≈ 0.75·Tm (standard refractory engineering rule) | K |
| F24 | Thermal Shock Resistance | R = σ_f·κ / (E·CTE) (Hasselman R-parameter) | W/m |

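
Several Group C and D features feed into one another (F18 needs F13 and F14; F24 needs F18, F19, F13 and F20), so a worked example may help. The sketch below chains the noise-free relations for HfC. The function name, the 2% porosity input, and the g/cm³ to kg/m³ conversion inside the diffusivity term are assumptions for illustration, not confirmed details of `Krishh.py`.

```python
def mech_thermal_features(tm, rho, ve, d_en, porosity_pct):
    """Noise-free Group C/D relations in the units listed in the tables.

    Hypothetical helper; rho is the tabulated density in g/cm³.
    """
    e_mod = 200 + tm * 0.04 + ve * 15                 # F13, GPa
    hardness = 15 + d_en * 8 + ve * 1.2               # F14, GPa
    sigma_f = 300 + e_mod * 0.8 + hardness * 12 - porosity_pct * 15  # F18, MPa
    kappa = 20 + ve * 4 - d_en * 3                    # F19, W/m·K
    cte = 6.5 + d_en * 0.8 - ve * 0.3                 # F20, ×10⁻⁶/K
    cp = 180 + 800 / rho                              # F21, J/kg·K
    # F22: converting rho to kg/m³ so alpha comes out in m²/s (assumption)
    alpha = kappa / (rho * 1000.0 * cp)
    t_max = 0.75 * tm                                 # F23, K
    # F24: evaluated directly on the tabulated units, without conversion
    r_shock = sigma_f * kappa / (e_mod * cte)
    return {
        "E_GPa": e_mod, "Hv_GPa": hardness, "sigma_f_MPa": sigma_f,
        "kappa": kappa, "CTE": cte, "Cp": cp,
        "alpha_m2_s": alpha, "T_max_K": t_max, "R": r_shock,
    }


# HfC anchors from the material-class table; porosity value is illustrative
hfc = mech_thermal_features(tm=3900, rho=12.20, ve=8, d_en=1.3, porosity_pct=2.0)
for name, value in hfc.items():
    print(f"{name}: {value:.4g}")
```

Keeping the unit conversion explicit in one place makes it easier to check whether a derived column such as F22 is on the scale its unit label claims.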
Infrastructure contributions:
- Base.py defines all 10 material anchors from literature (Materials Project, JARVIS-DFT, Fahrenholtz & Hilmas 2012).
- mergeAll.py validates row counts and Sample_ID alignment before joining, preventing silent merge errors.
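
The validate-then-join step described above can be sketched roughly as follows. This is not the code in `mergeAll.py`; the function name `merge_member_frames`, the `meta_cols` parameter, and the toy three-row frames with made-up IDs are assumptions used only to show the idea of failing loudly before a horizontal join.

```python
import pandas as pd

KEY = "Sample_ID"


def merge_member_frames(frames, meta_cols, expected_rows=200):
    """Check row counts and Sample_ID order, then join columns side by side.

    Hypothetical sketch of the validation described in this README.
    """
    reference_ids = frames[0][KEY].tolist()
    for i, df in enumerate(frames):
        if len(df) != expected_rows:
            raise ValueError(f"frame {i}: expected {expected_rows} rows, got {len(df)}")
        if df[KEY].tolist() != reference_ids:
            raise ValueError(f"frame {i}: Sample_ID values do not match frame 0")
    # Keep the metadata once, then append each member's feature columns.
    parts = [frames[0][meta_cols]] + [df.drop(columns=meta_cols) for df in frames]
    return pd.concat(parts, axis=1)


# Toy frames: 3 rows instead of 200, with illustrative IDs and values.
meta = ["Sample_ID", "Material_System"]
a = pd.DataFrame({"Sample_ID": ["S001", "S002", "S003"],
                  "Material_System": ["HfC", "ZrC", "TaC"],
                  "F01": [3900.0, 3420.0, 3880.0]})
b = pd.DataFrame({"Sample_ID": ["S001", "S002", "S003"],
                  "Material_System": ["HfC", "ZrC", "TaC"],
                  "F13": [476.0, 456.8, 490.2]})

merged = merge_member_frames([a, b], meta_cols=meta, expected_rows=3)
print(merged.columns.tolist())
```

Comparing the ID lists element by element catches reordered rows as well as missing ones, which a plain row-count check would miss.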
Script: Salan/lab evaluation/Salan.py | Output: Salan_member3.xlsx | Features: F25–F36

Salan handles the chemical stability and process-microstructure features: the groups that determine how a UHTM behaves over time in oxidising environments and how its properties are influenced by the synthesis route.

| Feature | Name | Formula / Physics | Unit |
|---|---|---|---|
| F25 | Oxidation Onset Temperature | T_ox = 800 + ΔEN·150 + Tm·0.05 | K |
| F26 | Parabolic Rate Constant k_p | k_p = ×10⁻¹² · exp(ΔEN·0.8) (Arrhenius) | kg²/m⁴·s |
| F27 | Oxidation Activation Energy | Ea = 120 + ΔEN·30 (diffusion barrier) | kJ/mol |
| F28 | Gravimetric Parabolic Rate | k_p_grav = ×10⁻¹⁰ · ΔEN (TGA standard unit) | g²/cm⁴·s |
| F29 | Oxide Layer Stability Index | OLS = ΔEN·0.7 + ve·0.15 (protective scale adherence) | – |
| F30 | Oxygen Diffusivity in Oxide | D_O = ×10⁻¹⁴ · exp(-ΔEN·1.2) | m²/s |

| Feature | Name | Formula / Physics | Unit |
|---|---|---|---|
| F31 | Average Grain Size | d = 2 + (Ps^-0.3)·10 (Hall-Petch: smaller grains → harder) | μm |
| F32 | Relative Density | ρ_rel = min(99.9, 92 + Ps·0.18) | % |
| F33 | Porosity | P = max(0.1, 100 - ρ_rel) (complement of relative density) | % |
| F34 | Crystallite Size (XRD) | D_xrd = grain_size_μm · 1000 · 0.08 (Scherrer ~8% of grain) | nm |
| F35 | Dislocation Density | ρ_disl = ×10¹²/d^1.5 (inverse power law with grain size) | ×10¹²/m² |
| F36 | Grain Boundary Energy | γ_gb = 0.3 + ΔEN·0.25 + ve·0.02 | J/m² |

Physical significance: F25 is the primary chemical stability criterion. F26 determines the oxide scale growth rate (smaller k_p = better protection). F31–F33 directly link sintering pressure (the Ps anchor) to microstructure, closing the process–property loop.
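
The link from sintering pressure to microstructure (F31 to F34) can be traced with a short sketch. The README does not list Ps values or their unit, so the two inputs below (30 and 100) are purely illustrative, and the function name is hypothetical.

```python
def microstructure_features(ps):
    """Noise-free F31–F34 relations as a function of the Ps anchor.

    Hypothetical helper; coefficients are copied from the table above.
    """
    grain_um = 2 + (ps ** -0.3) * 10              # F31, μm
    rel_density = min(99.9, 92 + ps * 0.18)       # F32, %, capped at 99.9
    porosity = max(0.1, 100 - rel_density)        # F33, %, floored at 0.1
    crystallite_nm = grain_um * 1000 * 0.08       # F34, nm
    return {
        "grain_um": grain_um,
        "rel_density_pct": rel_density,
        "porosity_pct": porosity,
        "crystallite_nm": crystallite_nm,
    }


low = microstructure_features(30.0)    # illustrative lower pressure
high = microstructure_features(100.0)  # illustrative higher pressure

for label, feats in (("Ps=30", low), ("Ps=100", high)):
    print(label, {k: round(v, 3) for k, v in feats.items()})
```

At the higher input both clamps become active: relative density saturates at 99.9% and porosity sits at its 0.1% floor, so further pressure only changes grain and crystallite size.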
Script: Niranjan/Niranjan.py | Output: Niranjan.xlsx | Features: F37–F48 + Targets: T1–T3

Niranjan handles the highest-level features: composite system descriptors, dimensionless ML merit indices designed for Pareto optimisation, and all three supervised learning targets. This is the most interdependent feature block: many features in Groups G and H re-derive quantities from Groups A–F using the same seed to ensure cross-member consistency.

| Feature | Name | Formula / Physics | Unit |
|---|---|---|---|
| F37 | Phase Stability Index | PSI = Tm/4000 + E_coh/12 (normalised thermodynamic stability) | – |
| F38 | Secondary Phase Vol. Fraction | 0% monolithic; 5–20% composites (index-dependent) | % |
| F39 | Interfacial Energy | γ_int = 0.5 + ΔEN·0.3 (ionic mismatch → delamination risk) | J/m² |
| F40 | CTE Mismatch Index | `ΔCTE = \|CTE - 5.0\|` | ×10⁻⁶/K |
| F41 | Solid Solution Distortion δ | δ = ΔEN·0.12 + (ve%3)·0.05 (Hume-Rothery lattice distortion) | – |
| F42 | Wettability Index | W = 1 - f_ion = ve·0.15/(ΔEN + ve·0.15) (covalent fraction) | – |

| Feature | Name | Formula / Physics | Unit |
|---|---|---|---|
| F43 | Thermal Merit Index | TMI = κ/(CTE·ρ) (Ashby figure of merit for TPS panels) | W/kg |
| F44 | Toughness-Stiffness Index | TSI = K_Ic·√E (combined crack resistance and stiffness) | GPa·√GPa |
| F45 | Oxidation Merit Score | OMS = T_ox/(k_p_norm + 1) (Bayesian optimisation reward signal) | – |
| F46 | Bond Ionicity Fraction | f_ion = ΔEN/(ΔEN + ve·0.15) (Pauling ionicity) | – |
| F47 | Structural Stability Index | SSI = (Tm/4000)·(E_coh/10)·(1 - P/100) | – |
| F48 | Creep Resistance Parameter | CR = (Tm/3000)·(E/400)·(1/d)^0.3 (grain size + stiffness) | – |

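
Two of the descriptors above are complements of each other (F42 is defined as 1 - f_ion, with f_ion being F46), and F37 re-derives the cohesive energy from Group A. A minimal sketch of that interdependence, using HfC anchors and a hypothetical function name, could look like this:

```python
def bonding_descriptors(tm, ve, d_en):
    """Noise-free F37, F42 and F46 from the shared anchors.

    Hypothetical helper; E_coh is re-derived with the F03 relation.
    """
    covalent_term = ve * 0.15
    f_ion = d_en / (d_en + covalent_term)                  # F46
    wettability = covalent_term / (d_en + covalent_term)   # F42 = 1 - f_ion
    e_coh = ve * 0.82 + d_en * 1.1                         # F03, eV/atom
    psi = tm / 4000 + e_coh / 12                           # F37
    return {"f_ion": f_ion, "wettability": wettability, "psi": psi}


# HfC anchors from the material-class table
hfc = bonding_descriptors(tm=3900, ve=8, d_en=1.3)
print({k: round(v, 4) for k, v in hfc.items()})

# The two fractions share a denominator, so they sum to one.
print(round(hfc["f_ion"] + hfc["wettability"], 12))
```

Since F42 and F46 carry the same information, a downstream model may want to keep only one of them to avoid a perfectly collinear pair.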
All three targets are defined and computed by Niranjan (Member 4).
| Target | Name | Type | Description |
|---|---|---|---|
| T1 | Flexural Strength | Continuous (MPa) | Multi-factor structural target. Driven by Young's Modulus, Vickers Hardness, and Porosity. Primary design metric for load-bearing structures. |
| T2 | Oxidation Resistance Score | Continuous (0–10) | Composite weighted score: onset temperature (×3) + rate constant (×2) + stability index (×2) + CTE match (×1). |
| T3 | Thermal Shock Cycles | Integer (cycles) | Predicted cycles-to-failure under rapid thermal cycling. Driven by K_Ic, CTE, and composite mismatch index F40. |
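
The README gives the T2 weights (3, 2, 2, 1) and the 0–10 range but not how each component is scaled or combined, so the following is only one plausible reading: each component is assumed to be pre-scaled to [0, 1] with higher meaning better, and the weighted mean is stretched to 0–10. The function name, the component keys, and the scaling are all assumptions, not the formula used in `Niranjan.py`.

```python
# Weights as stated in the target table.
WEIGHTS = {"onset": 3, "rate": 2, "stability": 2, "cte_match": 1}


def oxidation_score(components):
    """Weighted 0–10 score from components already scaled to [0, 1].

    Hypothetical sketch: the normalisation is an assumption.
    """
    for name in WEIGHTS:
        value = components[name]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    weighted = sum(WEIGHTS[name] * components[name] for name in WEIGHTS)
    return 10.0 * weighted / sum(WEIGHTS.values())


best = oxidation_score({"onset": 1.0, "rate": 1.0, "stability": 1.0, "cte_match": 1.0})
onset_only = oxidation_score({"onset": 1.0, "rate": 0.0, "stability": 0.0, "cte_match": 0.0})
print(best, onset_only)
```

Dividing by the weight sum (8) is what pins the maximum at 10; with this reading, onset temperature alone can contribute at most 3.75 points.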
| Group | Features | Member | Domain | Colour (in Excel) |
|---|---|---|---|---|
| A: Thermodynamic | F01–F06 | Aadi | Phase stability, bonding energetics | Teal |
| B: Electronic | F07–F12 | Aadi | DFT / quantum properties | Blue |
| C: Mechanical | F13–F18 | Krish | Structural integrity | Purple |
| D: Thermal Transport | F19–F24 | Krish | Heat transport & diffusion | Orange |
| E: Oxidation | F25–F30 | Salan | Chemical stability | Red |
| F: Microstructural | F31–F36 | Salan | Process–structure–property | Green |
| G: Phase/Composite | F37–F42 | Niranjan | Multi-phase composite systems | Indigo |
| H: ML Descriptors | F43–F48 | Niranjan | Pareto / reward signals for inverse design | Teal-dark |
| Targets | T1–T3 | Niranjan | Supervised regression targets | Dark Red |

Base anchor values sourced from: Materials Project (mp-*), JARVIS-DFT, Fahrenholtz & Hilmas (2012), Cedillos-Barraza et al. (2016), Opeka et al. J. Eur. Ceram. Soc.
| Material | Crystal | T_m (K) | ρ (g/cm³) | v_e | ΔEN |
|---|---|---|---|---|---|
| HfC | FCC | 3900 | 12.20 | 8 | 1.3 |
| ZrC | FCC | 3420 | 6.73 | 8 | 1.3 |
| TaC | FCC | 3880 | 14.30 | 9 | 1.1 |
| HfB2 | HEX | 3380 | 10.50 | 6 | 0.9 |
| ZrB2 | HEX | 3245 | 6.09 | 6 | 0.9 |
| TiC | FCC | 3160 | 4.93 | 8 | 1.5 |
| NbC | FCC | 3600 | 7.79 | 9 | 1.2 |
| HfN | FCC | 3385 | 13.80 | 9 | 1.6 |
| ZrN | FCC | 2980 | 7.09 | 9 | 1.6 |
| TaN | HEX | 3090 | 16.30 | 10 | 1.4 |

Composite variants (experimental, index 50–99): HfC-SiC, ZrB2-SiC, HfB2-SiC, TaC-HfC, ZrC-TiC, HfC-TaC, ZrB2-ZrC, HfB2-MoSi2, TiC-TiB2, NbC-HfC

Doped variants (synthetic, index 150–199): HfC:Y, ZrC:La, TaC:W, HfB2:Al, ZrB2:Y, TiC:Nb, NbC:Ta, HfN:Zr, ZrN:Hf, TaN:Nb
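
The index ranges above suggest a simple layout for the 200 rows. The sketch below maps a row index to a source type and variant class. It assumes 0-based indices, and that the ranges not mentioned (0–49 and 100–149) hold the base materials as experimental and synthetic rows respectively; both points are inferred from the 100 + 100 split rather than stated in this README.

```python
def classify_sample(index):
    """Map a row index (assumed 0-based, 0–199) to (source_type, variant).

    Hypothetical helper; only the 50–99 and 150–199 ranges are stated
    in the README, the rest is inferred.
    """
    if not 0 <= index <= 199:
        raise ValueError(f"index must be in 0..199, got {index}")
    source = "experimental" if index < 100 else "synthetic"
    if 50 <= index <= 99:
        variant = "composite"
    elif 150 <= index <= 199:
        variant = "doped"
    else:
        variant = "base"
    return source, variant


for i in (0, 50, 99, 100, 150, 199):
    print(i, classify_sample(i))
```

A check like this can be compared against the Source_Type metadata column to confirm that the merged file follows the intended layout.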

    # Install dependencies
    pip install pandas numpy openpyxl

    # Step 1: generate the base file (run first)
    # From Krish/LAB EVALUATION/Merge/
    python Base.py
    # Output: UHTM_base_200.xlsx
    # Contains: 200 rows × 10 cols (5 metadata + 5 physical anchors)

    # Step 2: run the member scripts (each cd is relative to the repository root)
    # Member 1: Aadi
    cd Aadi-Dev/
    python dataset.py
    # Output: Aadi.xlsx (200 × 17)

    # Member 2: Krish
    cd "Krish/LAB EVALUATION/"
    python Krishh.py
    # Output: Krishh.xlsx (200 × 17)

    # Member 3: Salan
    cd "Salan/lab evaluation/"
    python Salan.py
    # Output: Salan_member3.xlsx (200 × 17)

    # Member 4: Niranjan
    cd Niranjan/
    python Niranjan.py
    # Output: Niranjan.xlsx (200 × 20: 12 features + 3 targets + 5 meta)

    # Step 3: merge
    # Copy all member xlsx files into Krish/LAB EVALUATION/Merge/
    # Rename if needed: AadiDev.xlsx, Krishh.xlsx, Salan.xlsx, Niranjan.xlsx
    cd "Krish/LAB EVALUATION/Merge/"
    python mergeAll.py
    # Validates: 200 rows + Sample_ID match across all 4 files
    # Output: UHTM_final_200x48.xlsx
    #   Sheet 1: UHTM_Full_Dataset (200 × 56, colour-coded by group)
    #   Sheet 2: Summary Stats (describe() for all F and T columns)
    #   Sheet 3: Feature Legend (group → member → domain mapping)

All scripts use np.random.seed(42) at the top level. As long as the base file is generated first and member scripts are run with the same seed, all outputs are deterministic.
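
The determinism claim rests on two things: the seed value and the order in which random draws are made. The sketch below illustrates both with NumPy's global generator; the helper name `draw_noise` is hypothetical and the draws stand in for whatever noise the member scripts generate.

```python
import numpy as np


def draw_noise(n, seed=42):
    """Reseed the global generator, then draw n standard-normal values."""
    np.random.seed(seed)
    return np.random.normal(0.0, 1.0, size=n)


# Same seed, same call order: identical draws.
first = draw_noise(5)
second = draw_noise(5)
print(np.array_equal(first, second))

# Same seed, but an extra draw happens first: the stream is shifted,
# so the five values that follow differ from `first`.
np.random.seed(42)
_ = np.random.normal(size=3)
shifted = np.random.normal(0.0, 1.0, size=5)
print(np.array_equal(first, shifted))
```

In practice this means that adding, removing, or reordering a noisy feature inside a script changes every value drawn after it, even though the seed is unchanged.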

| File | Location | Description |
|---|---|---|
| UHTM_base_200.xlsx | Krish/LAB EVALUATION/Merge/ | Base dataset with material anchors. Input to all 4 member scripts. |
| Aadi.xlsx | Aadi-Dev/ | Member 1 output: F01–F12 |
| Krishh.xlsx | Krish/LAB EVALUATION/ | Member 2 output: F13–F24 |
| Salan_member3.xlsx | Salan/lab evaluation/ | Member 3 output: F25–F36 |
| Niranjan.xlsx | Niranjan/ | Member 4 output: F37–F48 + T1–T3 |
| UHTM_final_200x48.xlsx | Root + Merge/ | Final merged dataset with summary and legend sheets |
| UHTM_final_200x48.csv | Root | ML-ready flat CSV export |


| Member | Features | Scripts |
|---|---|---|
| Aadi | F01–F12 (Thermodynamic + Electronic) | Aadi-Dev/dataset.py |
| Krish | F13–F24 (Mechanical + Thermal) + Base + Merge | Krishh.py, Base.py, mergeAll.py |
| Salan | F25–F36 (Oxidation + Microstructural) | Salan/lab evaluation/Salan.py |
| Niranjan | F37–F48 (Phase + ML Descriptors) + T1–T3 | Niranjan/Niranjan.py |

IMI Project · 2026 · Ultra-High Temperature Materials Dataset