_blog/misc/20_ml_coding_tips.md
Lines changed: 2 additions & 2 deletions
@@ -87,8 +87,7 @@ displays
87
87
### environment
88
88
89
89
- [vscode](https://code.visualstudio.com) (with jupyter support) is the best ide for data science
90
-
-~~it's hard to pick a good ide for data science. [jupyter](https://jupyter.org/) notebooks are great for exploratory analysis, while more fully built ides like [pycharm](https://www.jetbrains.com/pycharm/) or [vscode](https://code.visualstudio.com) are better for large-scale projects~~
91
-
-~~using [atom](https://atom.io/) with the [hydrogen](https://atom.io/packages/hydrogen) plugin often strikes a nice balance~~ (sadly no longer maintained 😢)
90
+
- it's often easier to build with interactive cells (`#%%`) rather than jupyter notebooks for non-visualization tasks, so that they are easier to convert to scripts later on
92
91
93
92
- [github copilot](https://github.com/features/copilot) is a ~~nice~~ critical add-in
94
93
- [jupytext](https://github.com/mwouts/jupytext) offers a nice way to use version control with jupyter
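
a minimal sketch of the interactive-cell style mentioned in the list above, assuming an editor (e.g. vscode with its python/jupyter extensions) that treats `# %%` comments as cell boundaries. the toy data and variable names are hypothetical, not from the original notes. because the file is plain python, it also runs top to bottom as an ordinary script:

```python
# %% load data (hypothetical toy values standing in for a real dataset)
import statistics

values = [1.0, 2.0, 3.0, 4.0]

# %% compute summary statistics (each cell can be re-run on its own in the editor)
mean_value = statistics.mean(values)
spread = statistics.pstdev(values)  # population standard deviation

# %% report
print(f"mean={mean_value:.2f}, std={spread:.2f}")
```

since the cell markers are just comments, the same file can be run with `python` directly and diffs cleanly under version control.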
@@ -166,6 +165,7 @@ displays
166
165
167
166
### data
168
167
168
+
- [skrub](https://www.youtube.com/watch?v=hdWWhwmRpbA) is an awesome data cleaning package
169
169
- cool analysis / data from BuzzFeed [here](https://github.com/BuzzFeedNews/everything)
_notes/neuro/comp_neuro.md
Lines changed: 25 additions & 12 deletions
@@ -927,6 +927,7 @@ subtitle: Diverse notes on various topics in computational neuro, data-driven ne
927
927
- 3 of the participants have ~20 hours (~95 stories, 33k timepoints)
928
928
- Narratives Dataset ([Nastase et al. 2019](http://fcon_1000.projects.nitrc.org/indi/retro/Narratives.html)) - more subjects, less data per subject
929
929
- 345 subjects, 891 functional scans, and 27 diverse stories of varying duration totaling ~4.6 hours of unique stimuli (~43,000 words) and total collection time is ~6.4 days
930
+
- Preprocessed short datasets used in [AlKhamissi et al. 2025](https://arxiv.org/pdf/2503.01830) and available through [brain-score-language](https://github.com/brain-score/language/tree/main?tab=readme-ov-file)
930
931
- [Schoffelen et al. 2019](https://www.nature.com/articles/s41597-019-0020-y): 100 subjects recorded with fMRI and MEG, listening to de-contextualised sentences and word lists, no repeated session
931
932
- [Huth et al. 2016](https://www.nature.com/articles/nature17637) released data from [one subject](https://github.com/HuthLab/speechmodeltutorial)
932
933
- Visual and linguistic semantic representations are aligned at the border of human visual cortex ([popham, huth et al. 2021](https://www.nature.com/articles/s41593-021-00921-6#data-availability)) - compared semantic maps obtained from two functional magnetic resonance imaging experiments in the same participants: one that used silent movies as stimuli and another that used narrative stories ([data link](https://berkeley.app.box.com/s/l95gie5xtv56zocsgugmb7fs12nujpog))
@@ -1050,23 +1051,23 @@ subtitle: Diverse notes on various topics in computational neuro, data-driven ne
1050
1051
- test "compositionality" of features
1051
1052
- [Tracking the online construction of linguistic meaning through negation](https://www.biorxiv.org/content/10.1101/2022.10.14.512299.abstract) (zuanazzi, ..., remi-king, poeppel, 2022)
1052
1053
- [Blackbox meets blackbox: Representational Similarity and Stability Analysis of Neural Language Models and Brains](https://arxiv.org/abs/1906.01539) (abnar, ... zuidema, emnlp workshop, 2019) - use RSA to compare representations from language models with fMRI data from Wehbe et al. 2014
1053
-
- [Evidence of a predictive coding hierarchy in the human brain listening to speech](https://www.nature.com/articles/s41562-022-01516-2) (caucheteux, gramfort, & king, 2023)
1054
+
- Evidence of a predictive coding hierarchy in the human brain listening to speech ([caucheteux, gramfort, & king, 2023](https://www.nature.com/articles/s41562-022-01516-2))
1054
1055
- encoding models
1055
1056
1056
1057
- Seminal language-semantics fMRI study ([huth...gallant, 2016](https://www.nature.com/articles/nature17637)) - build mapping of semantic concepts across cortex using word vecs
1057
1058
- Crafting Interpretable Embeddings for Language Neuroscience by Asking LLMs Questions ([benara et al. 2024](https://openreview.net/pdf?id=mxMvWwyBWe))
1058
1059
- A generative framework to bridge data-driven models and scientific theories in language neuroscience ([antonello et al. 2024](https://arxiv.org/abs/2410.00812))
1059
1060
- Explanations of Deep Language Models Explain Language
1060
1061
Representations in the Brain ([rahimi...daliri, 2025](https://arxiv.org/pdf/2502.14671)) - build features using attribution methods and find some small perf. improvements in early language areas
1061
-
- Deep language algorithms predict semantic comprehension from brain activity [(caucheteux, gramfort, & king, facebook, 2022)](https://www.nature.com/articles/s41598-022-20460-9) - predicts fMRI with gpt-2 on the narratives dataset
1062
+
- Deep language algorithms predict semantic comprehension from brain activity ([caucheteux, gramfort, & king, facebook, 2022](https://www.nature.com/articles/s41598-022-20460-9)) - predicts fMRI with gpt-2 on the narratives dataset
1062
1063
- GPT‐2 representations predict fMRI response + extent to which subjects understand corresponding narratives
1063
1064
- compared different encoding features: phoneme, word, gpt-2 layers, gpt-2 attention sizes
1064
1065
- brain mapping finding: auditory cortices integrate information over short time windows, and the fronto-parietal areas combine supra-lexical information over long time windows
- [Disentangling syntax and semantics in the brain with deep networks](https://proceedings.mlr.press/v139/caucheteux21a.html) (caucheteux, gramfort, & king, 2021) - identify which brain networks are involved in syntax, semantics, compositionality
1067
-
- [Incorporating Context into Language Encoding Models for fMRI](https://proceedings.neurips.cc/paper/2018/hash/f471223d1a1614b58a7dc45c9d01df19-Abstract.html) (jain & huth, 2018) - LSTMs improve encoding model
1068
-
- [The neural architecture of language: Integrative modeling converges on predictive processing](https://www.pnas.org/doi/abs/10.1073/pnas.2105646118) (schrimpf, .., tenenbaum, fedorenko, 2021) - transformers better predict brain responses to natural language (and larger transformers predict better)
1069
-
- [Predictive Coding or Just Feature Discovery? An Alternative Account of Why Language Models Fit Brain Data | Neurobiology of Language](https://direct.mit.edu/nol/article/doi/10.1162/nol_a_00087/113632/Predictive-Coding-or-Just-Feature-Discovery-An) (antonello & huth, 2022)
- Disentangling syntax and semantics in the brain with deep networks ([caucheteux, gramfort, & king, 2021](https://proceedings.mlr.press/v139/caucheteux21a.html)) - identify which brain networks are involved in syntax, semantics, compositionality
1068
+
- Incorporating Context into Language Encoding Models for fMRI ([jain & huth, 2018](https://proceedings.neurips.cc/paper/2018/hash/f471223d1a1614b58a7dc45c9d01df19-Abstract.html)) - LSTMs improve encoding model
1069
+
- The neural architecture of language: Integrative modeling converges on predictive processing ([schrimpf, .., tenenbaum, fedorenko, 2021](https://www.pnas.org/doi/abs/10.1073/pnas.2105646118)) - transformers better predict brain responses to natural language (and larger transformers predict better)
1070
+
- Predictive Coding or Just Feature Discovery? An Alternative Account of Why Language Models Fit Brain Data ([antonello & huth, 2022](https://direct.mit.edu/nol/article/doi/10.1162/nol_a_00087/113632/Predictive-Coding-or-Just-Feature-Discovery-An))
1070
1071
- LLM brain encoding performance correlates not only with their perplexity, but also generality (skill at many different tasks) and translation performance
1071
1072
- Prediction with RNN beats ngram models on individual-sentence fMRI prediction ([anderson...lalor, 2021](https://www.jneurosci.org/content/41/18/4100))
1072
1073
- Interpret transformer-based models and find top predictions in specific regions, like left middle temporal gyrus (LMTG) and left occipital complex (LOC) ([sun et al. 2021](https://ieeexplore.ieee.org/document/9223750/))
@@ -1086,7 +1087,7 @@ subtitle: Diverse notes on various topics in computational neuro, data-driven ne
1086
1087
- Multilingual Computational Models Reveal Shared Brain Responses to 21 Languages ([gregor de varda, malik-moraleda...tuckute, fedorenko, 2025](https://www.biorxiv.org/content/10.1101/2025.02.01.636044v1))
1087
1088
- Constructed languages are processed by the same brain mechanisms as natural languages ([malik-moraleda...fedorenko, 2023](https://www.biorxiv.org/content/10.1101/2023.07.28.550667v2))
1088
1089
1089
-
## semantic decoding
1090
+
## semantic decoding / bmi
1090
1091
1091
1092
- duality between encoding and decoding (e.g. for probing smth like syntax in LLM)
1092
1093
- esp. when things are localized like in fMRI
@@ -1099,7 +1100,7 @@ subtitle: Diverse notes on various topics in computational neuro, data-driven ne
1099
1100
- Semantic reconstruction of continuous language from non-invasive brain recordings ([tang, lebel, jain, & huth, 2023](https://www.nature.com/articles/s41593-023-01304-9)) - reconstruct continuous natural language from fMRI, including for imagined speech
1100
1101
- Brain-to-Text Decoding: A Non-invasive Approach via Typing ([levy...king, 2025](https://scontent.fphl1-1.fna.fbcdn.net/v/t39.2365-6/475464888_600710912891423_9108680259802499048_n.pdf?_nc_cat=102&ccb=1-7&_nc_sid=3c67a6&_nc_ohc=EryvneL7DMcQ7kNvgFI6M7D&_nc_oc=Adi15_Ln_aPZ_nUY7RyiXzmEzdKu0opFDIwv3J7P55siQ-yn-FUdKQ6_H6PZBKiwBiY&_nc_zt=14&_nc_ht=scontent.fphl1-1.fna&_nc_gid=A441zcs56M0HTpo4ZEEWBSk&oh=00_AYAZ7fX4RhYWqMu2aMria3GoOB6uMNIiIciUQzU0vXy3Tw&oe=67AC0C96)) - decode characters typed from MEG/EEG
1101
1102
- From Thought to Action: How a Hierarchy of Neural Dynamics Supports Language Production ([zhang, levy, ...king, 2025](https://ai.meta.com/research/publications/from-thought-to-action-how-a-hierarchy-of-neural-dynamics-supports-language-production/)) - when decoding during typing, first decode phrase, then word, then syllable, then letter
- Decoding the Semantic Content of Natural Movies from Human Brain Activity ([huth...gallant, 2016](https://www.frontiersin.org/journals/systems-neuroscience/articles/10.3389/fnsys.2016.00081/full)) - direct decoding of concepts from movies using hierarchical logistic regression
@@ -1111,6 +1112,9 @@ subtitle: Diverse notes on various topics in computational neuro, data-driven ne
1111
1112
- Aligning brain functions boosts the decoding of visual semantics in novel subjects ([thual...king, 2023](https://arxiv.org/abs/2312.06467)) - align across subjects before doing decoding
1112
1113
- A variational autoencoder provides novel, data-driven features that explain functional brain representations in a naturalistic navigation task ([cho, zhang, & gallant, 2023](https://jov.arvojournals.org/article.aspx?articleid=2792546))
1113
1114
- What's the Opposite of a Face? Finding Shared Decodable Concepts and their Negations in the Brain ([efird...fyshe, 2024](https://arxiv.org/abs/2405.17663)) - build clustering shared across subjects in CLIP space
1115
+
- bmi
1116
+
- Accelerated learning of a noninvasive human brain-computer interface via manifold geometry ([busch...turk-browne, 2025](https://www.biorxiv.org/content/10.1101/2025.03.29.646109v1)) - train subjects to control avatar navigation through fMRI, then perturb environment and evaluate decoder
1117
+
1114
1118
1115
1119
1116
1120
## theories of explanation
@@ -1124,12 +1128,21 @@ subtitle: Diverse notes on various topics in computational neuro, data-driven ne
1124
1128
1125
1129
## speech / ECoG
1126
1130
1131
+
- A streaming brain-to-voice neuroprosthesis to restore naturalistic communication ([littlejohn...chang, anumanchipalli, 2025](https://www.nature.com/articles/s41593-025-01905-6)) - nearly realtime ECoG decoding of text production
1132
+
1127
1133
- Improving semantic understanding in speech language models via brain-tuning ([moussa, klakow, & toneva, 2024](https://arxiv.org/abs/2410.09230))
1128
1134
- BrainWavLM: Fine-tuning Speech Representations with Brain Responses to Language ([vattikonda, vaidya, antonello, & huth, 2025](https://arxiv.org/abs/2502.08866))
1129
1135
1130
-
- A shared model-based linguistic space for transmitting our thoughts from brain to brain in natural conversations ([zada...hasson, 2024](https://www.cell.com/neuron/fulltext/S0896-6273(24)00460-4))
1131
-
- previous inter-subject correlation analyses directly map between speaker’s brain activity & listener’s brain activity during communication
1132
-
- this work adds a semantic feature space to predict speaker/listener activity & partitions predicting the other person’s brain activity from these
1136
+
- see hasson lab + google overview blog post [here](https://research.google/blog/deciphering-language-processing-in-the-human-brain-through-llm-representations/)
1137
+
- A unified acoustic-to-speech-to-language embedding space captures the neural basis of natural language processing in everyday conversations ([goldstein...hasson, 2025](https://www.nature.com/articles/s41562-025-02105-9))
1138
+
- predict ECoG during both comprehension & production using speech embeddings & text embeddings - shows which areas are involved, and when, between language and motor stuff
1139
+
1140
+
- A shared model-based linguistic space for transmitting our thoughts from brain to brain in natural conversations ([zada...hasson, 2024](https://www.cell.com/neuron/fulltext/S0896-6273(24)00460-4))
1141
+
- previous inter-subject correlation analyses directly map between speaker’s brain activity & listener’s brain activity during communication
1142
+
- this work adds a semantic feature space to predict speaker/listener activity & partitions predicting the other person’s brain activity from these
1143
+
1144
+
- Shared computational principles for language processing in humans and deep language models ([goldstein...hasson, 2022](https://www.nature.com/articles/s41593-022-01026-4)) - predict ECoG responses to podcasts from DL embeddings