Commit 03c5c2f

notes updates

1 parent 1a297b5 commit 03c5c2f

File tree: 8 files changed, +101 −55 lines changed

‎_blog/misc/25_data_science_benchmarks.md

Lines changed: 7 additions & 7 deletions
@@ -12,30 +12,30 @@ Some benchmarks focusing on getting insight directly from data using LLMs / LLM
 - target output for every task is a self-contained Python file
 - each task has (a) task instruction, (b) dataset info, (c) expert-provided info and (d) a groundtruth annotated program
 
-<img src="{{ site.baseurl }}/notes/assets/Screenshot%202025-06-19%20at%202.19.17%E2%80%AFPM.png" class="noninverted full_image"/>
+<img src="{{ site.baseurl }}/notes/assets/Screenshot%202025-06-19%20at%202.19.17%E2%80%AFPM.png" class="big_image"/>
 
 **AutoSDT**: Scaling Data-Driven Discovery Tasks Toward Open Co-Scientists ([li...huan sun, 2025](https://arxiv.org/abs/2506.08140)) - 5k scientific coding tasks automatically scraped from github repos for papers (as a sanity check, they manually verified that a subset were reasonable)
 
-<img src="{{ site.baseurl }}/notes/assets/Screenshot%202025-06-19%20at%202.22.52%E2%80%AFPM.png" class="noninverted full_image"/>
+<img src="{{ site.baseurl }}/notes/assets/Screenshot%202025-06-19%20at%202.22.52%E2%80%AFPM.png" class="big_image"/>
 
 **DiscoveryBench**: Towards Data-Driven Discovery with Large Language Models ([majumder...clark, 2024](https://arxiv.org/abs/2407.01725)) - 264 tasks collected across 6 diverse domains, such as sociology and engineering, by manually deriving discovery workflows from papers
 - each task has datasets, metadata, natural-language discovery goal
 
-<img src="{{ site.baseurl }}/notes/assets/Screenshot%202025-06-19%20at%202.18.31%E2%80%AFPM.png" class="noninverted full_image"/>
+<img src="{{ site.baseurl }}/notes/assets/Screenshot%202025-06-19%20at%202.18.31%E2%80%AFPM.png" class="big_image"/>
 
 **BLADE**: Benchmarking Language Model Agents for Data-Driven Science ([gu...althoff, 2024](https://arxiv.org/pdf/2408.09667)) - 12 tasks, each has a (fairly open-ended) research question, dataset, and groundtruth expert-conducted analysis
 
-<img src="{{ site.baseurl }}/notes/assets/Screenshot%202025-06-19%20at%204.22.04%E2%80%AFPM.png" class="noninverted full_image"/>
+<img src="{{ site.baseurl }}/notes/assets/Screenshot%202025-06-19%20at%204.22.04%E2%80%AFPM.png" class="big_image"/>
 
 **MLAgentBench**: Benchmarking LLMs As AI Research Agents ([huang, vora, liang, & leskovec, 2023](https://arxiv.org/abs/2310.03302v2)) - 13 prediction tasks, e.g. CIFAR-10, BabyLM, kaggle (evaluate via test prediction perf.)
 
-<img src="{{ site.baseurl }}/notes/assets/Screenshot%202025-06-19%20at%204.02.49%E2%80%AFPM.png" class="noninverted full_image"/>
+<img src="{{ site.baseurl }}/notes/assets/Screenshot%202025-06-19%20at%204.02.49%E2%80%AFPM.png" class="big_image"/>
 
 **IDA-Bench**: Evaluating LLMs on Interactive Guided Data Analysis ([li...jordan, 2025](https://arxiv.org/pdf/2505.18223)) - scraped 25 notebooks from recent kaggle competitions, parsed into goal + reference insights that incorporate domain knowledge
 - paper emphasizes interactive setting: evaluates by using the instruction materials to build a knowledgeable user simulator and then tests data science agents' ability to help the user simulator improve predictive performance
 
-<img src="{{ site.baseurl }}/notes/assets/Screenshot%202025-06-19%20at%204.39.46%E2%80%AFPM.png" class="noninverted full_image"/>
+<img src="{{ site.baseurl }}/notes/assets/Screenshot%202025-06-19%20at%204.39.46%E2%80%AFPM.png" class="big_image"/>
 
 **InfiAgent-DABench**: Evaluating Agents on Data Analysis Tasks ([hu...wu, 2024](https://arxiv.org/abs/2401.05507)) - 257 precise (relatively easy) questions that can be answered from 1 of 52 csv datasets
 
-<img src="{{ site.baseurl }}/notes/assets/Screenshot%202025-06-19%20at%203.53.53%E2%80%AFPM.png" class="noninverted full_image"/>
+<img src="{{ site.baseurl }}/notes/assets/Screenshot%202025-06-19%20at%203.53.53%E2%80%AFPM.png" class="big_image"/>

‎_blog/research/23_data_explanation.md

Lines changed: 5 additions & 4 deletions
@@ -17,7 +17,7 @@ Many interpretable models have been proposed to interpret data involved in predi
 | :----------------------------------------------------------: | :-----------------------------------------------------: | :-----------------------------------------------------: | :----------------------------------------------------------: |
 | <img src="https://csinva.io/imodels/img/rule_set.jpg" class="full_image"> | <img src="https://csinva.io/imodels/img/rule_list.jpg" class="full_image"> | <img src="https://csinva.io/imodels/img/rule_tree.jpg" class="full_image"> | <img src="https://csinva.io/imodels/img/algebraic_models.jpg" class="full_image"> |
 
-<p align="center"><b>Figure 1. </b>Different types of interpretable models. See scikit-learn friendly implementations [here](https://github.com/csinva/imodels),</p>
+<p align="center"><b>Figure 1. </b>Different types of interpretable models. See scikit-learn friendly implementations in the <a href="https://github.com/csinva/imodels">imodels package</a>.</p>
 
 # Adding LMs to interpretable models
 
@@ -26,12 +26,13 @@ Fig 2 shows some newer model forms that seek data explanations using LMs/ interp
 In the most direct case, an LM is fed data corresponding to 2 groups (binary classification) and prompted to directly produce a description of the difference between the groups ([D3](https://proceedings.mlr.press/v162/zhong22a.html)/[D5](https://arxiv.org/abs/2302.14233)).
 Alternatively, given a dataset and a pre-trained LM, [iPrompt](https://arxiv.org/abs/2210.01848) searches for a natural-language prompt that works well to predict on the dataset, which serves as a description of the data. This is more general than D3, as it is not restricted to binary groups, but is also more computationally intensive, as finding a good prompt often requires iterative LM calls.
 Either of these approaches can also be applied recursively ([TreePrompt](https://arxiv.org/abs/2310.14034)), resulting in a hierarchical natural-language description of the data.
+Alternatively, many LLM answers to different questions can be concatenated into an embedding ([QA-Emb](https://arxiv.org/abs/2405.16714)), potentially incorporating bayesian iteration ([BC-LLM](https://arxiv.org/abs/2410.15555)), which can then be used to train a fully interpretable model, e.g. a linear model.
 
 <img src="assets/interpretable_models.svg" class="full_image">
-<p align="center" style="margin-top:-20px"><b>Figure 2. </b>Different types of interpretable models, with text-specific approaches in bold.</p>
+<p align="center" style="margin-top:-20px"><b>Figure 2. </b>Different types of interpretable models, with text-specific approaches in bold. See scikit-learn friendly implementations in the <a href="https://github.com/csinva/imodelsX">imodelsX package</a>.</p>
 
 In parallel to these methods, [Aug-imodels](https://arxiv.org/abs/2209.11799) use LMs to improve fully interpretable models directly.
 For example, Aug-Linear uses an LM to augment a linear model, resulting in a more accurate model that is still completely interpretable.
-Aug-Tree uses an LM to augment the keyphrases used in a decision tree split, resulting in a more accurate but still fully interpretable decsion tree.
+Aug-Tree uses an LM to augment the keyphrases used in a decision tree split, resulting in a more accurate but still fully interpretable decision tree.
 
-This line of research is still in its infancy, but there is great potential in combining LMs and interpretable models!
+This line of research is still in its infancy -- there's a lot to be done in combining LMs and interpretable models!
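The QA-Emb recipe added in this diff -- concatenate an LLM's answers to yes/no questions into an interpretable embedding, then fit a linear model on it -- can be sketched as follows. This is a minimal illustration, not the paper's implementation: `answer_question` here is a hypothetical toy stand-in for a real LLM call, and the question list is invented.

```python
import numpy as np

# Hypothetical stand-in for an LLM call: answers a yes/no question about a text.
def answer_question(text: str, question: str) -> int:
    # Toy rules; a real QA-Emb implementation would prompt an LLM here.
    if question == "Does the text mention a number?":
        return int(any(ch.isdigit() for ch in text))
    if question == "Is the text longer than 5 words?":
        return int(len(text.split()) > 5)
    return 0

QUESTIONS = [
    "Does the text mention a number?",
    "Is the text longer than 5 words?",
]

def qa_embed(text: str) -> np.ndarray:
    """Concatenate answers into an embedding: one human-readable dimension per question."""
    return np.array([answer_question(text, q) for q in QUESTIONS], dtype=float)

texts = ["I ran 3 miles", "a short note", "this sentence has more than five words in it"]
X = np.stack([qa_embed(t) for t in texts])   # (n_texts, n_questions)
y = np.array([1.0, 0.0, 1.0])                # toy labels

# Because each column of X corresponds to a named question, any linear model
# fit on X (least squares here) is fully interpretable: its weights say how
# much each question matters.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
```

The same embedding could of course feed any interpretable estimator (e.g. a sparse logistic regression); least squares just keeps the sketch dependency-free.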

‎_blog/research/assets/interpretable_models.svg

Lines changed: 1 addition & 1 deletion

‎_includes/02_notes_main.html

Lines changed: 5 additions & 6 deletions
@@ -42,17 +42,16 @@ <h3 align="center">research posts</h3>
             benchmarks
           </a> '25
         </li>
+        <li>
+          <a href="{{ site.baseurl }}/blog/research/23_data_explanation"
+            style="font-size:medium; font-weight: bolder;"> interpretable text models (with LLMs)
+          </a> '24 ⭐
+        </li>
         <li>
           <a href="{{ site.baseurl }}/blog/misc/24_tensor_product_repr" style="font-size:medium"> tensor product
             representations
           </a> '24
         </li>
-        <li>
-          <a href="{{ site.baseurl }}/blog/research/23_data_explanation" style="font-size:medium"> explaining text
-            data
-            with LLMs
-          </a> '23
-        </li>
         <li><a href="{{ site.baseurl }}/blog/misc/23_paper_writing_tips" style="font-size:medium"> paper-writing tips
           </a> '23
         </li>

‎_includes/03_experience.html

Lines changed: 2 additions & 2 deletions
@@ -140,8 +140,8 @@ <h2 style="text-align: center;margin-top: -150px;"> experience </h2>
             href="https://arxiv.org/abs/2411.00066">The generalized induction head</a>]</li>
         <li><a href="https://robinwu218.github.io/">Ziyang Wu</a> ('24) [<a
             href="https://arxiv.org/abs/2502.10385">Simplifying DINO</a>]</li>
-        <li><a href="https://drogozhang.github.io/">Kai Zhang</a> ('24) [<a
-            href="https://arxiv.org/abs/2503.10857">Evaluating LMM graphical perception</a>]</li>
+        <!-- <li><a href="https://drogozhang.github.io/">Kai Zhang</a> ('24) [<a -->
+        <!-- href="https://arxiv.org/abs/2503.10857">Evaluating LMM graphical perception</a>]</li> -->
         <li><a href="https://vsahil.github.io/">Sahil Verma</a> ('24) [<a
             href="https://arxiv.org/abs/2505.23856">OmniGuard</a>]</li>
         <li><a href="https://www.linkedin.com/in/yufan-zhuang/">Yufan Zhuang</a> ('23) [<a

‎_notes/assets/mlebench.png

132 KB

‎_notes/neuro/comp_neuro.md

Lines changed: 18 additions & 14 deletions
@@ -933,15 +933,20 @@ subtitle: Diverse notes on various topics in computational neuro, data-driven ne
 - 345 subjects, 891 functional scans, and 27 diverse stories of varying duration totaling ~4.6 hours of unique stimuli (~43,000 words) and total collection time is ~6.4 days
 - Preprocessed short datasets used in [AlKhamissi et al. 2025](https://arxiv.org/pdf/2503.01830) and available through [brain-score-language](https://github.com/brain-score/language/tree/main?tab=readme-ov-file)
 - [Schoffelen et al. 2019](https://www.nature.com/articles/s41597-019-0020-y): 100 subjects recorded with fMRI and MEG, listening to de-contextualised sentences and word lists, no repeated session
+- Le Petit Prince multilingual naturalistic fMRI corpus ([li...hale, 2022](https://www.nature.com/articles/s41597-022-01625-7)) - 49 English speakers, 35 Chinese speakers and 28 French speakers listened to the same audiobook *The Little Prince* in their native language while fMRI was recorded
 - [Huth et al. 2016](https://www.nature.com/articles/nature17637) released data from [one subject](https://github.com/HuthLab/speechmodeltutorial)
 - Visual and linguistic semantic representations are aligned at the border of human visual cortex ([popham, huth et al. 2021](https://www.nature.com/articles/s41593-021-00921-6#data-availability)) - compared semantic maps obtained from two functional magnetic resonance imaging experiments in the same participants: one that used silent movies as stimuli and another that used narrative stories ([data link](https://berkeley.app.box.com/s/l95gie5xtv56zocsgugmb7fs12nujpog))
 - MEG datasets
   - MEG-MASC ([gwilliams...king, 2023](https://www.nature.com/articles/s41597-023-02752-5)) - 27 English-speaking subjects MEG, each ~2 hours of story listening, punctuated by random word lists and comprehension questions in the MEG scanner. Usually each subject listened to four distinct fictional stories twice
   - WU-Minn human connectome project ([van Essen et al. 2013](https://www.nature.com/articles/s41597-022-01382-7)) - 72 subjects recorded with fMRI and MEG as part of the Human Connectome Project, listening to 10 minutes of short stories, no repeated session
   - [Armeni et al. 2022](https://www.nature.com/articles/s41597-022-01382-7): 3 subjects recorded with MEG, listening to 10 hours of Sherlock Holmes, no repeated session
+  - [LibriBrain 2025](https://neural-processing-lab.github.io/2025-libribrain-competition/participate/) - 50+ hours of listening data for a single subject
 - EEG
   - [Brennan & Hale, 2019](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0207741): 33 subjects recorded with EEG, listening to 12 min of a book chapter, no repeated session
   - [Broderick et al. 2018](https://www.cell.com/current-biology/pdf/S0960-9822(18)30146-5.pdf): 9–33 subjects recorded with EEG, conducting different speech tasks, no repeated sessions
+  - DEAP: A Database for Emotion Analysis Using Physiological Signals ([koelstra...ebrahimi, 2012](https://ieeexplore.ieee.org/abstract/document/5871728)) - 32-channel system
+  - SEED: Investigating Critical Frequency Bands and Channels for EEG-Based Emotion Recognition with Deep Neural Networks ([zheng & lu, 2015](https://ieeexplore.ieee.org/abstract/document/7104132)) - 64-channel system
+  - HBN-EEG dataset ([shirazi...makeig, 2024](https://www.biorxiv.org/content/10.1101/2024.10.03.615261v2)) - EEG recordings from over 3,000 participants across six distinct cognitive tasks [used in the eeg2025 NeurIPS competition]
 - ECoG
   - The "Podcast" ECoG dataset for modeling neural activity during natural language comprehension ([zada...hasson, 2025](https://www.biorxiv.org/content/10.1101/2025.02.14.638352v1.full.pdf)) - 9 subjects listening to the same story

@@ -1018,19 +1023,6 @@ subtitle: Diverse notes on various topics in computational neuro, data-driven ne
 - joint prediction of different input/output relationships
 - joint prediction of neurons from other areas
 
-## eeg
-
-- directly model time series
-  - BENDR: using transformers and a contrastive self-supervised learning task to learn from massive amounts of EEG data ([kostas...rudzicz, 2021](https://arxiv.org/abs/2101.12037))
-  - Neuro-GPT: Developing A Foundation Model for EEG ([cui...leahy, 2023](https://arxiv.org/abs/2311.03764))
-- model frequency bands
-  - EEG foundation model: Learning Topology-Agnostic EEG Representations with Geometry-Aware Modeling ([yi...dongsheng li, 2023](https://openreview.net/pdf?id=hiOUySN0ub))
-- Strong Prediction: Language Model Surprisal Explains Multiple N400 Effects ([michaelov...coulson, 2024](https://direct.mit.edu/nol/article/5/1/107/115605/Strong-Prediction-Language-Model-Surprisal))
-- datasets
-  - DEAP: A Database for Emotion Analysis Using Physiological Signals ([koelstra...ebrahimi, 2012](https://ieeexplore.ieee.org/abstract/document/5871728)) - 32-channel system
-  - SEED: Investigating Critical Frequency Bands and Channels for EEG-Based Emotion Recognition with Deep Neural Networks ([zheng & lu, 2015](https://ieeexplore.ieee.org/abstract/document/7104132)) - 64-channel system
-  - HBN-EEG dataset ([shirazi...makeig, 2024](https://www.biorxiv.org/content/10.1101/2024.10.03.615261v2)) - EEG recordings from over 3,000 participants across six distinct cognitive tasks
-
 
@@ -1039,7 +1031,7 @@ subtitle: Diverse notes on various topics in computational neuro, data-driven ne
 - hyperalignment techniques have been developed in fMRI research to aggregate information across subjects into a unified information space while overcoming the misalignment of functional topographies across subjects ([Haxby et al., 2011](https://www.cell.com/neuron/fulltext/S0896-6273(15)00933-2); shared response model [Chen et al., 2015](https://proceedings.neurips.cc/paper/2015/hash/b3967a0e938dc2a6340e258630febd5a-Abstract.html); [Guntupalli...Haxby, 2016](https://academic.oup.com/cercor/article/26/6/2919/1754308); [Haxby et al., 2020](https://elifesciences.org/articles/56601); [Feilong et al., 2023](https://direct.mit.edu/imag/article/doi/10.1162/imag_a_00032/117980))
 - shared response model [Chen et al., 2015](https://proceedings.neurips.cc/paper/2015/hash/b3967a0e938dc2a6340e258630febd5a-Abstract.html) - learns orthonormal, linear subject-specific transformations that map from each subject’s response space to a shared space based on a subset of training data, then uses these learned transformations to map a subset of test data into the shared space
 
-# fMRI
+# language (mostly fMRI)
 
 ## language
 
@@ -1085,6 +1077,18 @@ subtitle: Diverse notes on various topics in computational neuro, data-driven ne
 - Lexical-Semantic Content, Not Syntactic Structure, Is the Main Contributor to ANN-Brain Similarity of fMRI Responses in the Language Network ([kauf...andreas, fedorenko, 2024](https://direct.mit.edu/nol/article/5/1/7/116784/Lexical-Semantic-Content-Not-Syntactic-Structure)) - lexical semantic sentence content, not syntax, drives alignment.
 - Artificial Neural Network Language Models Predict Human Brain Responses to Language Even After a Developmentally Realistic Amount of Training ([hosseini...fedorenko, 2024](https://direct.mit.edu/nol/article/5/1/43/119156/Artificial-Neural-Network-Language-Models-Predict)) - models trained on a developmentally plausible amount of data (100M tokens) already align closely with human benchmarks
 - Improving semantic understanding in speech language models via brain-tuning ([moussa...toneva, 2024](https://arxiv.org/abs/2410.09230))
+
+- eeg models
+  - directly model time series
+    - BENDR: using transformers and a contrastive self-supervised learning task to learn from massive amounts of EEG data ([kostas...rudzicz, 2021](https://arxiv.org/abs/2101.12037))
+    - Neuro-GPT: Developing A Foundation Model for EEG ([cui...leahy, 2023](https://arxiv.org/abs/2311.03764))
+  - model frequency bands
+    - EEG foundation model: Learning Topology-Agnostic EEG Representations with Geometry-Aware Modeling ([yi...dongsheng li, 2023](https://openreview.net/pdf?id=hiOUySN0ub))
+  - Strong Prediction: Language Model Surprisal Explains Multiple N400 Effects ([michaelov...coulson, 2024](https://direct.mit.edu/nol/article/5/1/107/115605/Strong-Prediction-Language-Model-Surprisal))
+
 - changing experimental design
   - Semantic representations during language comprehension are affected by context (i.e. how language is presented) ([deniz...gallant, 2021](https://www.biorxiv.org/content/10.1101/2021.12.15.472839v1.full.pdf)) - stimuli with more context (stories, sentences) evoke better responses than stimuli with little context (Semantic Blocks, Single Words)
   - Combining computational controls with natural text reveals new aspects of meaning composition ([toneva, mitchell, & wehbe, 2022](https://www.biorxiv.org/content/biorxiv/early/2022/08/09/2020.09.28.316935.full.pdf)) - study word interactions by using encoding vector emb(phrase) - emb(word1) - emb(word2)...
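The shared response model entry in the diff above describes learning orthonormal, linear subject-specific transformations into a shared space. As a rough illustration of the per-subject building block (the orthogonal Procrustes solution, not the full SRM algorithm from the paper), here is a minimal sketch on synthetic random data; all variable names are hypothetical:

```python
import numpy as np

def orthonormal_map(X: np.ndarray, S: np.ndarray) -> np.ndarray:
    """Best orthonormal W minimizing ||X @ W - S||_F (orthogonal Procrustes).

    X: one subject's responses (time x voxels), S: shared responses (time x features).
    """
    U, _, Vt = np.linalg.svd(X.T @ S)
    return U @ Vt

rng = np.random.default_rng(0)
S = rng.standard_normal((100, 10))                   # shared timecourses (time x features)
Q, _ = np.linalg.qr(rng.standard_normal((10, 10)))   # ground-truth orthonormal "topography"
X = S @ Q                                            # this subject sees a rotated shared space

# Learn the subject-specific transform from "training" data; it should
# recover Q's inverse (Q.T, since Q is orthonormal) and map X back onto S.
W = orthonormal_map(X, S)
```

In the actual SRM setting, the shared space `S` is itself unknown and is estimated jointly with all subjects' transforms by alternating steps like this one; held-out test data are then mapped through the learned `W`.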

0 commit comments
