Commit 03c5c2f

notes updates

1 parent 1a297b5 commit 03c5c2f

File tree: 8 files changed, +101 −55 lines changed

‎_blog/misc/25_data_science_benchmarks.md

Lines changed: 7 additions & 7 deletions
@@ -12,30 +12,30 @@ Some benchmarks focusing on getting insight directly from data using LLMs / LLM
 - target output for every task is a self-contained Python file
 - each task has (a) task instruction, (b) dataset info, (c) expert-provided info and (d) a groundtruth annotated program
 
-<img src="{{ site.baseurl }}/notes/assets/Screenshot%202025-06-19%20at%202.19.17%E2%80%AFPM.png" class="noninverted full_image"/>
+<img src="{{ site.baseurl }}/notes/assets/Screenshot%202025-06-19%20at%202.19.17%E2%80%AFPM.png" class="big_image"/>
 
 **AutoSDT**: Scaling Data-Driven Discovery Tasks Toward Open Co-Scientists ([li...huan sun, 2025](https://arxiv.org/abs/2506.08140)) - 5k scientific coding tasks automatically scraped from github repos for papers (as a sanity check, they manually verified that a subset were reasonable)
 
-<img src="{{ site.baseurl }}/notes/assets/Screenshot%202025-06-19%20at%202.22.52%E2%80%AFPM.png" class="noninverted full_image"/>
+<img src="{{ site.baseurl }}/notes/assets/Screenshot%202025-06-19%20at%202.22.52%E2%80%AFPM.png" class="big_image"/>
 
 **DiscoveryBench**: Towards Data-Driven Discovery with Large Language Models ([majumder...clark, 2024](https://arxiv.org/abs/2407.01725)) - 264 tasks collected across 6 diverse domains, such as sociology and engineering, by manually deriving discovery workflows from papers
 - each task has datasets, metadata, natural-language discovery goal
 
-<img src="{{ site.baseurl }}/notes/assets/Screenshot%202025-06-19%20at%202.18.31%E2%80%AFPM.png" class="noninverted full_image"/>
+<img src="{{ site.baseurl }}/notes/assets/Screenshot%202025-06-19%20at%202.18.31%E2%80%AFPM.png" class="big_image"/>
 
 **BLADE**: Benchmarking Language Model Agents for Data-Driven Science ([gu...althoff, 2024](https://arxiv.org/pdf/2408.09667)) - 12 tasks, each has a (fairly open-ended) research question, dataset, and groundtruth expert-conducted analysis
 
-<img src="{{ site.baseurl }}/notes/assets/Screenshot%202025-06-19%20at%204.22.04%E2%80%AFPM.png" class="noninverted full_image"/>
+<img src="{{ site.baseurl }}/notes/assets/Screenshot%202025-06-19%20at%204.22.04%E2%80%AFPM.png" class="big_image"/>
 
 **MLAgentBench**: Benchmarking LLMs As AI Research Agents ([huang, vora, liang, & leskovec, 2023](https://arxiv.org/abs/2310.03302v2)) - 13 prediction tasks, e.g. CIFAR-10, BabyLM, kaggle (evaluate via test prediction perf.)
 
-<img src="{{ site.baseurl }}/notes/assets/Screenshot%202025-06-19%20at%204.02.49%E2%80%AFPM.png" class="noninverted full_image"/>
+<img src="{{ site.baseurl }}/notes/assets/Screenshot%202025-06-19%20at%204.02.49%E2%80%AFPM.png" class="big_image"/>
 
 **IDA-Bench**: Evaluating LLMs on Interactive Guided Data Analysis ([li...jordan, 2025](https://arxiv.org/pdf/2505.18223)) - scraped 25 notebooks from recent kaggle competitions, parsed into goal + reference insights that incorporate domain knowledge
 - paper emphasizes interactive setting: evaluates by using the instruction materials to build a knowledgeable user simulator and then tests data science agents' ability to help the user simulator improve predictive performance
 
-<img src="{{ site.baseurl }}/notes/assets/Screenshot%202025-06-19%20at%204.39.46%E2%80%AFPM.png" class="noninverted full_image"/>
+<img src="{{ site.baseurl }}/notes/assets/Screenshot%202025-06-19%20at%204.39.46%E2%80%AFPM.png" class="big_image"/>
 
 **InfiAgent-DABench**: Evaluating Agents on Data Analysis Tasks ([hu...wu, 2024](https://arxiv.org/abs/2401.05507)) - 257 precise (relatively easy) questions that can be answered from 1 of 52 csv datasets
 
-<img src="{{ site.baseurl }}/notes/assets/Screenshot%202025-06-19%20at%203.53.53%E2%80%AFPM.png" class="noninverted full_image"/>
+<img src="{{ site.baseurl }}/notes/assets/Screenshot%202025-06-19%20at%203.53.53%E2%80%AFPM.png" class="big_image"/>

‎_blog/research/23_data_explanation.md

Lines changed: 5 additions & 4 deletions
@@ -17,7 +17,7 @@ Many interpretable models have been proposed to interpret data involved in predi
 | :----------------------------------------------------------: | :-----------------------------------------------------: | :-----------------------------------------------------: | :----------------------------------------------------------: |
 | <img src="https://csinva.io/imodels/img/rule_set.jpg" class="full_image"> | <img src="https://csinva.io/imodels/img/rule_list.jpg" class="full_image"> | <img src="https://csinva.io/imodels/img/rule_tree.jpg" class="full_image"> | <img src="https://csinva.io/imodels/img/algebraic_models.jpg" class="full_image"> |
 
-<p align="center"><b>Figure 1. </b>Different types of interpretable models. See scikit-learn friendly implementations [here](https://github.com/csinva/imodels),</p>
+<p align="center"><b>Figure 1. </b>Different types of interpretable models. See scikit-learn friendly implementations in the <a href="https://github.com/csinva/imodels">imodels package</a>.</p>
 
 # Adding LMs to interpretable models
 
@@ -26,12 +26,13 @@ Fig 2 shows some newer model forms that seek data explanations using LMs/ interp
 In the most direct case, an LM is fed data corresponding to 2 groups (binary classification) and prompted to directly produce a description of the difference between the groups ([D3](https://proceedings.mlr.press/v162/zhong22a.html)/[D5](https://arxiv.org/abs/2302.14233)).
 Alternatively, given a dataset and a pre-trained LM, [iPrompt](https://arxiv.org/abs/2210.01848) searches for a natural-language prompt that works well to predict on the dataset, which serves as a description of the data. This is more general than D3, as it is not restricted to binary groups, but is also more computationally intensive, as finding a good prompt often requires iterative LM calls.
 Either of these approaches can also be applied recursively ([TreePrompt](https://arxiv.org/abs/2310.14034)), resulting in a hierarchical natural-language description of the data.
+Alternatively, many LLM answers to different questions can be concatenated into an embedding ([QA-Emb](https://arxiv.org/abs/2405.16714)), potentially incorporating bayesian iteration ([BC-LLM](https://arxiv.org/abs/2410.15555)), which can then be used to train a fully interpretable model, e.g. a linear model.
 
 <img src="assets/interpretable_models.svg" class="full_image">
-<p align="center" style="margin-top:-20px"><b>Figure 2. </b>Different types of interpretable models, with text-specific approaches in bold.</p>
+<p align="center" style="margin-top:-20px"><b>Figure 2. </b>Different types of interpretable models, with text-specific approaches in bold. See scikit-learn friendly implementations in the <a href="https://github.com/csinva/imodelsX">imodelsX package</a>.</p>
 
 In parallel to these methods, [Aug-imodels](https://arxiv.org/abs/2209.11799) use LMs to improve fully interpretable models directly.
 For example, Aug-Linear uses an LM to augment a linear model, resulting in a more accurate model that is still completely interpretable.
-Aug-Tree uses an LM to augment the keyphrases used in a decision tree split, resulting in a more accurate but still fully interpretable decsion tree.
+Aug-Tree uses an LM to augment the keyphrases used in a decision tree split, resulting in a more accurate but still fully interpretable decision tree.
 
-This line of research is still in its infancy, but there is great potential in combining LMs and interpretable models!
+This line of research is still in its infancy -- there's a lot to be done in combining LMs and interpretable models!
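The QA-Emb recipe added in this diff -- concatenate an LLM's answers to yes/no questions into an interpretable embedding, then fit a linear model on it -- can be sketched as follows. This is a minimal illustration, not the paper's implementation: `answer_question` here is a hypothetical toy stand-in for a real LLM call, and the question list is invented.

```python
import numpy as np

# Hypothetical stand-in for an LLM call: answers a yes/no question about a text.
def answer_question(text: str, question: str) -> int:
    # Toy rules; a real QA-Emb implementation would prompt an LLM here.
    if question == "Does the text mention a number?":
        return int(any(ch.isdigit() for ch in text))
    if question == "Is the text longer than 5 words?":
        return int(len(text.split()) > 5)
    return 0

QUESTIONS = [
    "Does the text mention a number?",
    "Is the text longer than 5 words?",
]

def qa_embed(text: str) -> np.ndarray:
    """Concatenate answers into an embedding: one human-readable dimension per question."""
    return np.array([answer_question(text, q) for q in QUESTIONS], dtype=float)

texts = ["I ran 3 miles", "a short note", "this sentence has more than five words in it"]
X = np.stack([qa_embed(t) for t in texts])   # (n_texts, n_questions)
y = np.array([1.0, 0.0, 1.0])                # toy labels

# Because each column of X corresponds to a named question, any linear model
# fit on X (least squares here) is fully interpretable: its weights say how
# much each question matters.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
```

The same embedding could of course feed any interpretable estimator (e.g. a sparse logistic regression); least squares just keeps the sketch dependency-free.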

‎_blog/research/assets/interpretable_models.svg

Lines changed: 1 addition & 1 deletion

‎_includes/02_notes_main.html

Lines changed: 5 additions & 6 deletions
@@ -42,17 +42,16 @@ <h3 align="center">research posts</h3>
             benchmarks
           </a> '25
         </li>
+        <li>
+          <a href="{{ site.baseurl }}/blog/research/23_data_explanation"
+            style="font-size:medium; font-weight: bolder;"> interpretable text models (with LLMs)
+          </a> '24 ⭐
+        </li>
         <li>
           <a href="{{ site.baseurl }}/blog/misc/24_tensor_product_repr" style="font-size:medium"> tensor product
             representations
           </a> '24
         </li>
-        <li>
-          <a href="{{ site.baseurl }}/blog/research/23_data_explanation" style="font-size:medium"> explaining text
-            data
-            with LLMs
-          </a> '23
-        </li>
         <li><a href="{{ site.baseurl }}/blog/misc/23_paper_writing_tips" style="font-size:medium"> paper-writing tips
           </a> '23
         </li>

‎_includes/03_experience.html

Lines changed: 2 additions & 2 deletions
@@ -140,8 +140,8 @@ <h2 style="text-align: center;margin-top: -150px;"> experience </h2>
             href="https://arxiv.org/abs/2411.00066">The generalized induction head</a>]</li>
         <li><a href="https://robinwu218.github.io/">Ziyang Wu</a> ('24) [<a
             href="https://arxiv.org/abs/2502.10385">Simplifying DINO</a>]</li>
-        <li><a href="https://drogozhang.github.io/">Kai Zhang</a> ('24) [<a
-            href="https://arxiv.org/abs/2503.10857">Evaluating LMM graphical perception</a>]</li>
+        <!-- <li><a href="https://drogozhang.github.io/">Kai Zhang</a> ('24) [<a -->
+        <!-- href="https://arxiv.org/abs/2503.10857">Evaluating LMM graphical perception</a>]</li> -->
         <li><a href="https://vsahil.github.io/">Sahil Verma</a> ('24) [<a
             href="https://arxiv.org/abs/2505.23856">OmniGuard</a>]</li>
         <li><a href="https://www.linkedin.com/in/yufan-zhuang/">Yufan Zhuang</a> ('23) [<a

‎_notes/assets/mlebench.png

132 KB

‎_notes/neuro/comp_neuro.md

Lines changed: 18 additions & 14 deletions
@@ -933,15 +933,20 @@ subtitle: Diverse notes on various topics in computational neuro, data-driven ne
 - 345 subjects, 891 functional scans, and 27 diverse stories of varying duration totaling ~4.6 hours of unique stimuli (~43,000 words) and total collection time is ~6.4 days
 - Preprocessed short datasets used in [AlKhamissi et al. 2025](https://arxiv.org/pdf/2503.01830) and available through [brain-score-language](https://github.com/brain-score/language/tree/main?tab=readme-ov-file)
 - [Schoffelen et al. 2019](https://www.nature.com/articles/s41597-019-0020-y): 100 subjects recorded with fMRI and MEG, listening to de-contextualised sentences and word lists, no repeated session
+- Le Petit Prince multilingual naturalistic fMRI corpus ([li...hale, 2022](https://www.nature.com/articles/s41597-022-01625-7)) - 49 English speakers, 35 Chinese speakers and 28 French speakers listened to the same audiobook *The Little Prince* in their native language while fMRI was recorded
 - [Huth et al. 2016](https://www.nature.com/articles/nature17637) released data from [one subject](https://github.com/HuthLab/speechmodeltutorial)
 - Visual and linguistic semantic representations are aligned at the border of human visual cortex ([popham, huth et al. 2021](https://www.nature.com/articles/s41593-021-00921-6#data-availability)) - compared semantic maps obtained from two functional magnetic resonance imaging experiments in the same participants: one that used silent movies as stimuli and another that used narrative stories ([data link](https://berkeley.app.box.com/s/l95gie5xtv56zocsgugmb7fs12nujpog))
 - MEG datasets
   - MEG-MASC ([gwilliams...king, 2023](https://www.nature.com/articles/s41597-023-02752-5)) - 27 English-speaking subjects MEG, each ~2 hours of story listening, punctuated by random word lists and comprehension questions in the MEG scanner. Usually each subject listened to four distinct fictional stories twice
   - WU-Minn human connectome project ([van Essen et al. 2013](https://www.nature.com/articles/s41597-022-01382-7)) - 72 subjects recorded with fMRI and MEG as part of the Human Connectome Project, listening to 10 minutes of short stories, no repeated session
   - [Armeni et al. 2022](https://www.nature.com/articles/s41597-022-01382-7): 3 subjects recorded with MEG, listening to 10 hours of Sherlock Holmes, no repeated session
+  - [LibriBrain 2025](https://neural-processing-lab.github.io/2025-libribrain-competition/participate/) - 50+ hours of listening data for a single subject
 - EEG
   - [Brennan & Hale, 2019](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0207741): 33 subjects recorded with EEG, listening to 12 min of a book chapter, no repeated session
   - [Broderick et al. 2018](https://www.cell.com/current-biology/pdf/S0960-9822(18)30146-5.pdf): 9–33 subjects recorded with EEG, conducting different speech tasks, no repeated sessions
+  - DEAP: A Database for Emotion Analysis Using Physiological Signals ([koelstra...ebrahimi, 2012](https://ieeexplore.ieee.org/abstract/document/5871728)) - 32-channel system
+  - SEED: Investigating Critical Frequency Bands and Channels for EEG-Based Emotion Recognition with Deep Neural Networks ([zheng & lu, 2015](https://ieeexplore.ieee.org/abstract/document/7104132)) - 64-channel system
+  - HBN-EEG dataset ([shirazi...makeig, 2024](https://www.biorxiv.org/content/10.1101/2024.10.03.615261v2)) - EEG recordings from over 3,000 participants across six distinct cognitive tasks [used in the eeg2025 NeurIPS competition]
 - ECoG
   - The "Podcast" ECoG dataset for modeling neural activity during natural language comprehension ([zada...hasson, 2025](https://www.biorxiv.org/content/10.1101/2025.02.14.638352v1.full.pdf)) - 9 subjects listening to the same story

@@ -1018,19 +1023,6 @@ subtitle: Diverse notes on various topics in computational neuro, data-driven ne
 - joint prediction of different input/output relationships
 - joint prediction of neurons from other areas
 
-## eeg
-
-- directly model time series
-  - BENDR: using transformers and a contrastive self-supervised learning task to learn from massive amounts of EEG data ([kostas...rudzicz, 2021](https://arxiv.org/abs/2101.12037))
-  - Neuro-GPT: Developing A Foundation Model for EEG ([cui...leahy, 2023](https://arxiv.org/abs/2311.03764))
-- model frequency bands
-  - EEG foundation model: Learning Topology-Agnostic EEG Representations with Geometry-Aware Modeling ([yi...dongsheng li, 2023](https://openreview.net/pdf?id=hiOUySN0ub))
-- Strong Prediction: Language Model Surprisal Explains Multiple N400 Effects ([michaelov...coulson, 2024](https://direct.mit.edu/nol/article/5/1/107/115605/Strong-Prediction-Language-Model-Surprisal))
-- datasets
-  - DEAP: A Database for Emotion Analysis Using Physiological Signals ([koelstra...ebrahimi, 2012](https://ieeexplore.ieee.org/abstract/document/5871728)) - 32-channel system
-  - SEED: Investigating Critical Frequency Bands and Channels for EEG-Based Emotion Recognition with Deep Neural Networks ([zheng & lu, 2015](https://ieeexplore.ieee.org/abstract/document/7104132)) - 64-channel system
-  - HBN-EEG dataset ([shirazi...makeig, 2024](https://www.biorxiv.org/content/10.1101/2024.10.03.615261v2)) - EEG recordings from over 3,000 participants across six distinct cognitive tasks
-
 
@@ -1039,7 +1031,7 @@ subtitle: Diverse notes on various topics in computational neuro, data-driven ne
 - hyperalignment techniques have been developed in fMRI research to aggregate information across subjects into a unified information space while overcoming the misalignment of functional topographies across subjects ([Haxby et al., 2011](https://www.cell.com/neuron/fulltext/S0896-6273(15)00933-2); shared response model [Chen et al., 2015](https://proceedings.neurips.cc/paper/2015/hash/b3967a0e938dc2a6340e258630febd5a-Abstract.html); [Guntupalli...Haxby, 2016](https://academic.oup.com/cercor/article/26/6/2919/1754308); [Haxby et al., 2020](https://elifesciences.org/articles/56601); [Feilong et al., 2023](https://direct.mit.edu/imag/article/doi/10.1162/imag_a_00032/117980))
 - shared response model [Chen et al., 2015](https://proceedings.neurips.cc/paper/2015/hash/b3967a0e938dc2a6340e258630febd5a-Abstract.html) - learns orthonormal, linear subject-specific transformations that map from each subject’s response space to a shared space based on a subset of training data, then uses these learned transformations to map a subset of test data into the shared space
 
-# fMRI
+# language (mostly fMRI)
 
 ## language
 
@@ -1085,6 +1077,18 @@ subtitle: Diverse notes on various topics in computational neuro, data-driven ne
 - Lexical-Semantic Content, Not Syntactic Structure, Is the Main Contributor to ANN-Brain Similarity of fMRI Responses in the Language Network ([kauf...andreas, fedorenko, 2024](https://direct.mit.edu/nol/article/5/1/7/116784/Lexical-Semantic-Content-Not-Syntactic-Structure)) - lexical semantic sentence content, not syntax, drives alignment.
 - Artificial Neural Network Language Models Predict Human Brain Responses to Language Even After a Developmentally Realistic Amount of Training ([hosseini...fedorenko, 2024](https://direct.mit.edu/nol/article/5/1/43/119156/Artificial-Neural-Network-Language-Models-Predict)) - models trained on a developmentally plausible amount of data (100M tokens) already align closely with human benchmarks
 - Improving semantic understanding in speech language models via brain-tuning ([moussa...toneva, 2024](https://arxiv.org/abs/2410.09230))
+
+- eeg models
+  - directly model time series
+    - BENDR: using transformers and a contrastive self-supervised learning task to learn from massive amounts of EEG data ([kostas...rudzicz, 2021](https://arxiv.org/abs/2101.12037))
+    - Neuro-GPT: Developing A Foundation Model for EEG ([cui...leahy, 2023](https://arxiv.org/abs/2311.03764))
+  - model frequency bands
+    - EEG foundation model: Learning Topology-Agnostic EEG Representations with Geometry-Aware Modeling ([yi...dongsheng li, 2023](https://openreview.net/pdf?id=hiOUySN0ub))
+  - Strong Prediction: Language Model Surprisal Explains Multiple N400 Effects ([michaelov...coulson, 2024](https://direct.mit.edu/nol/article/5/1/107/115605/Strong-Prediction-Language-Model-Surprisal))
+
 - changing experimental design
   - Semantic representations during language comprehension are affected by context (i.e. how language is presented) ([deniz...gallant, 2021](https://www.biorxiv.org/content/10.1101/2021.12.15.472839v1.full.pdf)) - stimuli with more context (stories, sentences) evoke better responses than stimuli with little context (Semantic Blocks, Single Words)
   - Combining computational controls with natural text reveals new aspects of meaning composition ([toneva, mitchell, & wehbe, 2022](https://www.biorxiv.org/content/biorxiv/early/2022/08/09/2020.09.28.316935.full.pdf)) - study word interactions by using encoding vector emb(phrase) - emb(word1) - emb(word2)...
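The shared response model entry in the diff above describes learning orthonormal, linear subject-specific transformations into a shared space. As a rough illustration of the per-subject building block (the orthogonal Procrustes solution, not the full SRM algorithm from the paper), here is a minimal sketch on synthetic random data; all variable names are hypothetical:

```python
import numpy as np

def orthonormal_map(X: np.ndarray, S: np.ndarray) -> np.ndarray:
    """Best orthonormal W minimizing ||X @ W - S||_F (orthogonal Procrustes).

    X: one subject's responses (time x voxels), S: shared responses (time x features).
    """
    U, _, Vt = np.linalg.svd(X.T @ S)
    return U @ Vt

rng = np.random.default_rng(0)
S = rng.standard_normal((100, 10))                   # shared timecourses (time x features)
Q, _ = np.linalg.qr(rng.standard_normal((10, 10)))   # ground-truth orthonormal "topography"
X = S @ Q                                            # this subject sees a rotated shared space

# Learn the subject-specific transform from "training" data; it should
# recover Q's inverse (Q.T, since Q is orthonormal) and map X back onto S.
W = orthonormal_map(X, S)
```

In the actual SRM setting, the shared space `S` is itself unknown and is estimated jointly with all subjects' transforms by alternating steps like this one; held-out test data are then mapped through the learned `W`.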

0 commit comments
