Commit 89fca1b
update reasoning notes
1 parent d38eddd

1 file changed: _notes/research_ovws/ovw_llms.md
Lines changed: 69 additions & 39 deletions
@@ -84,34 +84,6 @@ over time, ML has bounced from *feature-engineering* -> *architecture engineerin
 ## chain-of-thought
 
 - [optimizing CoT papers](https://www.aussieai.com/research/cot-optimization#concise)
-- understanding chain-of-thought and its faithfulness
-  - Faithful Chain-of-Thought Reasoning ([yu et al. 2023](https://arxiv.org/abs/2301.13379))
-  - Contrastive Chain-of-Thought Prompting ([chia...bing, 2023](https://arxiv.org/abs/2311.09277))
-  - Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks ([chen et al. 2022](https://arxiv.org/abs/2211.12588))
-  - Towards Consistent Natural-Language Explanations via Explanation-Consistency Finetuning ([chen...gao, 2024](https://arxiv.org/abs/2401.13986))
-  - How Interpretable are Reasoning Explanations from Prompting Large Language Models? ([yeo...cambria, 2024](https://arxiv.org/abs/2402.11863))
-  - Critiques
-    - Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations ([yanda chen, zhong, ..., steinhardt, yu, mckeown, 2023](https://arxiv.org/abs/2307.08678))
-    - Benchmarking and Improving Generator-Validator Consistency of Language Models ([lisa li...liang, 2023](https://arxiv.org/abs/2310.01846))
-    - The Unreliability of Explanations in Few-shot Prompting for Textual Reasoning ([ye & durrett, 2022](https://proceedings.neurips.cc/paper_files/paper/2022/file/c402501846f9fe03e2cac015b3f0e6b1-Paper-Conference.pdf))
-    - Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting ([turpin, ..., bowman, 2023](https://arxiv.org/abs/2305.04388))
-      - CoT explanations can be heavily influenced by biasing the model towards certain answers, thereby yielding invalid explanations
-      - try biasing in 2 ways: answer is always (A), or setting where prompt suggests a certain answer
-    - Two Failures of Self-Consistency in the Multi-Step Reasoning of LLMs ([chen, ..., bowman, cho, 2023](https://arxiv.org/abs/2305.14279)) - models fail at these 2 tasks:
-      - hypothetical consistency (the ability for a model to predict what its output would be in a hypothetical other context)
-      - compositional consistency (consistency of a model's outputs for a compositional task even when an intermediate step is replaced with the model's output for that step)
-    - Measuring the Faithfulness of Thinking Drafts in Large Reasoning Models ([xiong...lakkaraju, 2025](https://arxiv.org/abs/2505.13774))
-      - faithfulness metric = model sensitivity to removing some of the explanation
-    - Question Decomposition Improves the Faithfulness of Model-Generated Reasoning ([anthropic, 2023](https://www-files.anthropic.com/production/files/question-decomposition-improves-the-faithfulness-of-model-generated-reasoning.pdf)) - introduce factored decomposition to improve faithfulness metric
-    - Measuring Faithfulness in Chain-of-Thought Reasoning ([anthropic, 2023](https://www-files.anthropic.com/production/files/measuring-faithfulness-in-chain-of-thought-reasoning.pdf)) - in addition to just removing some of the explanation, also add mistakes to it / paraphrase it
-      - larger models become less faithful by this metric
-    - Logical Satisfiability of Counterfactuals for Faithful Explanations in NLI ([sia...zettlemoyer, mathias, 2023](https://ojs.aaai.org/index.php/AAAI/article/view/26174))
-  - Measuring and Improving Attentiveness to Partial Inputs with Counterfactuals ([elazar...sameer singh, noah smith, 2023](https://arxiv.org/pdf/2311.09605.pdf))
-  - Faithful Explanations of Black-box NLP Models Using LLM-generated Counterfactuals ([gat...reichart, 2023](https://arxiv.org/abs/2310.00603))
-  - Counterfactually Aware Fair Text Generation ([banerjee...bhatia, 2023](https://arxiv.org/abs/2311.05451))
-  - Causal Proxy Models for Concept-based Model Explanations ([wu...potts, 2023](https://proceedings.mlr.press/v202/wu23b.html))
-  - Evaluating Models' Local Decision Boundaries via Contrast Sets ([gardner...zhou, 2020](https://arxiv.org/abs/2004.02709))
-  - Are LLMs Post Hoc Explainers? ([kroeger...lakkaraju, 2023](https://arxiv.org/abs/2310.05797))
 - Chain-of-Thought Prompting ([wei et al. 2022](https://arxiv.org/abs/2201.11903)): in few-shot prompts, don't just provide the answer but also the reasoning (sketch below)
   - model outputs reasoning + answer, leading to improved performance
 - Self-Discover: LLMs Self-Compose Reasoning Structures ([zhou...le...zheng, 2024](https://arxiv.org/abs/2402.03620)) - LLMs come up with their own step-by-step structure for a task
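
a minimal sketch of the few-shot CoT recipe from wei et al. above: the demonstration shows its reasoning before the answer and the model is expected to imitate that format; `complete` is a hypothetical stand-in for any text-completion API, and the demonstration is the tennis-ball example from the paper

```python
# few-shot chain-of-thought prompting (wei et al. 2022), minimal sketch
def complete(prompt: str) -> str:
    raise NotImplementedError("call an LLM completion API here (hypothetical helper)")

# demonstration includes the reasoning, not just the answer (example from the paper)
COT_DEMO = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis "
    "balls. 5 + 6 = 11. The answer is 11.\n"
)

def cot_answer(question: str) -> str:
    generation = complete(f"{COT_DEMO}\nQ: {question}\nA:")  # model emits reasoning + answer
    return generation.split("The answer is")[-1].strip(" .")
```

parsing on the phrase "The answer is" is just one common convention, not part of the method itself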
@@ -124,7 +96,6 @@ over time, ML has bounced from *feature-engineering* -> *architecture engineerin
 - SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning ([miao, teh, & rainforth, 2023](https://arxiv.org/abs/2308.00436))
 - EchoPrompt: Instructing the Model to Rephrase Queries for Improved In-context Learning ([mekala...sameer singh, 2023](https://arxiv.org/pdf/2309.10687.pdf)) - replace *let's think step by step* with *Let's repeat the question and also think step by step*
 - Let's Think Dot by Dot: Hidden Computation in Transformer Language Models ([pfau, merrill, & bowman, 2024](https://arxiv.org/abs/2404.15758))
-- Coconut: Training Large Language Models to Reason in a Continuous Latent Space ([hao...weston, tian, 2024](https://arxiv.org/abs/2412.06769)) - requires some extra finetuning
 - Show Your Work: Scratchpads for Intermediate Computation with Language Models ([nye et al. 2021](https://arxiv.org/abs/2112.00114))
 - selection inference ([creswell et al. 2022](https://arxiv.org/abs/2205.09712)) - generate set of facts, then iteratively generate inferences from the facts to yield the final answer
 - least-to-most prompting ([zhou...quoc le et al. 2022](https://arxiv.org/abs/2205.10625)) - prompt LLM with context showing how to reduce into subproblems; then LLM sequentially solves the subproblems, using the previous answers (sketch below)
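
a minimal sketch of the least-to-most recipe above, assuming the same hypothetical `complete` helper: decompose first, then solve subproblems in order, appending each answer back into the context

```python
# least-to-most prompting (zhou et al. 2022), minimal sketch
def complete(prompt: str) -> str:
    raise NotImplementedError("call an LLM completion API here (hypothetical helper)")

def least_to_most(question: str) -> str:
    decomp = complete(
        f"Break this problem into simpler subproblems, one per line:\n{question}\n"
    )
    subproblems = [s.strip() for s in decomp.splitlines() if s.strip()]

    context, answer = question, ""
    for sub in subproblems:
        answer = complete(f"{context}\n\nQ: {sub}\nA:")
        context += f"\n\nQ: {sub}\nA: {answer}"  # later subproblems see earlier answers
    return answer
```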
@@ -177,8 +148,7 @@ over time, ML has bounced from *feature-engineering* -> *architecture engineerin
 - Calibrate Before Use: Improving Few-Shot Performance of Language Models ([zhao, ..., dan klein, sameer singh, 2021](https://arxiv.org/abs/2102.09690)) - to make prompting easier, first calibrate the output distribution by making it uniform when given null inputs, e.g. "N/A" (sketch below)
 - Minimum Bayes Risk Decoding ([suzgun, ..., jurafsky, 2022](https://arxiv.org/abs/2211.07634)) or ([freitag et al. 2022](https://arxiv.org/pdf/2111.09388.pdf))
 - A Frustratingly Simple Decoding Method for Neural Text Generation ([yang, ..., shi, 2023](https://arxiv.org/abs/2305.12675)) - build an anti-LM based on previously generated text and use this anti-LM to penalize future generation of what has been generated
-- Mixture of Inputs: Text Generation Beyond Discrete Token Sampling ([zhuang, liu, singh, shang, & gao, 2025](https://arxiv.org/abs/2505.14827))
-- Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space ([zhang...shen, xin eric wang, 2025](https://arxiv.org/abs/2505.15778))
+- Mixture of Inputs: Text Generation Beyond Discrete Token Sampling ([zhuang, liu, singh, shang, & gao, 2025](https://arxiv.org/abs/2505.14827)) - post-hoc (requires no finetuning)
 
 ## prompt chaining / ensembling
 
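a minimal sketch of the contextual calibration idea from calibrate-before-use above, assuming a hypothetical `label_probs` helper that returns the model's probabilities over the label set: estimate the model's bias on a content-free input and rescale so that input maps to uniform

```python
import numpy as np

def label_probs(prompt: str, x: str, labels: list[str]) -> np.ndarray:
    raise NotImplementedError("return P(label | prompt + x) per label (hypothetical helper)")

def calibrated_predict(prompt: str, x: str, labels: list[str]) -> str:
    p_cf = label_probs(prompt, "N/A", labels)  # bias on a content-free ("null") input
    p = label_probs(prompt, x, labels)
    q = p / (p_cf + 1e-9)                      # W = diag(p_cf)^-1, as in the paper
    q = q / q.sum()                            # renormalize to a distribution
    return labels[int(np.argmax(q))]
```
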
@@ -298,18 +268,20 @@ over time, ML has bounced from *feature-engineering* -> *architecture engineerin
 - Scalable MatMul-free Language Modeling ([zhu...eshraghian, 2024](https://arxiv.org/abs/2406.02528)) - LM architecture that doesn't use matmuls, builds on GRU, and shows improved efficiency on FPGAs
 - The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits ([ma...wei, 2024](https://arxiv.org/abs/2402.17764))
 - BitNet: Scaling 1-bit Transformers for Large Language Models ([wang...wei, 2023](https://arxiv.org/abs/2310.11453))
-
 - Misc
   - Tree Transformer: Integrating Tree Structures into Self-Attention ([wang, .., chen, 2019](https://arxiv.org/pdf/1909.06639.pdf))
   - Waveformer: Linear-Time Attention with Forward and Backward Wavelet Transform ([zhuang...shang, 2022](https://arxiv.org/abs/2210.01989))
   - White-Box Transformers via Sparse Rate Reduction: Compression Is All There Is? ([yaodong yu...yi ma, 2023](https://arxiv.org/abs/2311.13110))
 
-- Diffusion models
-  - Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution ([lou, meng, & ermon, 2024](https://arxiv.org/abs/2310.16834)) - model $p(\text{altered text}) / p(\text{orig text}),ドル and make alterations using word swaps at individual locations
-  - From Denoising Diffusions to Denoising Markov Models ([benton...doucet, 2024](https://arxiv.org/abs/2211.03595))
-    - Not clear that these are better than just iteratively masking/replacing a word with BERT
-  - Energy-Based Diffusion Language Models for Text Generation ([xu...leskovec, ermon, & vahdat, 2024](https://arxiv.org/abs/2410.21357))
-  - LLaDA: Large Language Diffusion Models ([nie, ..., li, 2025](https://arxiv.org/abs/2502.09992))
+## diffusion models
+
+- Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution ([lou, meng, & ermon, 2024](https://arxiv.org/abs/2310.16834)) - model $p(\text{altered text}) / p(\text{orig text}),ドル and make alterations using word swaps at individual locations
+- From Denoising Diffusions to Denoising Markov Models ([benton...doucet, 2024](https://arxiv.org/abs/2211.03595))
+  - Not clear that these are better than just iteratively masking/replacing a word with BERT
+- Energy-Based Diffusion Language Models for Text Generation ([xu...leskovec, ermon, & vahdat, 2024](https://arxiv.org/abs/2410.21357))
+- LLaDA: Large Language Diffusion Models ([nie, ..., li, 2025](https://arxiv.org/abs/2502.09992)) (sketch below)
+- Esoteric Language Models ([sahoo...vahdat, 2025](https://arxiv.org/abs/2506.01928)) - bridge the AR and masked diffusion model (MDM) paradigms + introduce KV-caching for MDMs
+- Accelerating Diffusion LLMs via Adaptive Parallel Decoding ([israel, van den broeck, & grover, 2025](https://arxiv.org/abs/2506.00413)) - dynamically adjusts the number of tokens sampled in parallel, using a small autoregressive model to help (kind of like the opposite of speculative decoding)
 
 ## mixture of experts (MoE) / routing
 
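a rough sketch of how masked-diffusion LMs in the spirit of LLaDA generate text; assumptions not from the papers: a bidirectional `denoiser` returning per-position logits, a `mask_id` token, and a simple confidence-based unmasking schedule (the actual papers use more careful schedules)

```python
import torch

def diffusion_sample(denoiser, seq_len: int, mask_id: int, steps: int = 8):
    # start from an all-[MASK] sequence and iteratively commit tokens
    x = torch.full((seq_len,), mask_id, dtype=torch.long)
    per_step = max(1, seq_len // steps)
    for _ in range(steps):
        masked = x == mask_id
        if not masked.any():
            break
        logits = denoiser(x.unsqueeze(0))[0]      # (seq_len, vocab_size), hypothetical model
        conf, preds = logits.softmax(-1).max(-1)  # per-position confidence + argmax token
        conf[~masked] = -1.0                      # never overwrite already-filled slots
        k = min(per_step, int(masked.sum()))
        idx = conf.topk(k).indices                # unmask the most-confident positions
        x[idx] = preds[idx]
    return x
```
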
@@ -995,6 +967,51 @@ Editing is generally very similar to just adaptation/finetuning. One distinction
 - Ravel: Evaluating Interpretability Methods on Disentangling Language Model Representations ([huang, wu, potts, geva, & geiger, 2024](https://arxiv.org/pdf/2402.17700v1.pdf))
 
 
+## natural-language explanations: chain-of-thought faithfulness & reasoning faithfulness
+
+- prompting-based methods
+  - Faithful Chain-of-Thought Reasoning ([yu et al. 2023](https://arxiv.org/abs/2301.13379))
+  - Contrastive Chain-of-Thought Prompting ([chia...bing, 2023](https://arxiv.org/abs/2311.09277))
+  - Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks ([chen et al. 2022](https://arxiv.org/abs/2211.12588))
+  - Chain of Code: Reasoning with a Language Model-Augmented Code Emulator ([li...levine, fei-fei, xia, ichter, 2024](https://arxiv.org/abs/2312.04474)) - attempts to write and evaluate variables using code, otherwise evaluates them using the LLM
+- finetuning-based methods
+  - Towards Consistent Natural-Language Explanations via Explanation-Consistency Finetuning ([chen...gao, 2024](https://arxiv.org/abs/2401.13986)) - measure consistency of NL explanations and finetune on consistent examples
+  - Benchmarking and Improving Generator-Validator Consistency of Language Models ([lisa li...liang, 2023](https://arxiv.org/abs/2310.01846)) - measure generator-validator consistency and finetune on consistent examples
+- measurements
+  - Counterfactual Simulatability of Natural Language Explanations ([yanda chen, zhong, ..., steinhardt, yu, mckeown, 2023](https://arxiv.org/abs/2307.08678)) - metric evaluates LLM performance on counterfactuals given explanations
+  - Faithfulness Tests for Natural Language Explanations ([atanasova...augenstein, 2023](https://arxiv.org/abs/2305.18029))
+    - propose a counterfactual input editor for inserting reasons that lead to counterfactual predictions but are not reflected by the explanation
+    - reconstruct inputs from the reasons stated in the generated explanations and check how often they lead to the same prediction
+  - How Interpretable are Reasoning Explanations from Prompting Large Language Models? ([yeo...cambria, 2024](https://arxiv.org/abs/2402.11863)) - evaluate different methods using paraphrases, counterfactuals, adding mistakes, and simulatability
+- reasoning models
+  - Measuring the Faithfulness of Thinking Drafts in Large Reasoning Models ([xiong...lakkaraju, 2025](https://arxiv.org/abs/2505.13774))
+    - Intra-Draft Faithfulness - uses counterfactual step insertions to assess whether individual reasoning steps causally influence subsequent steps and the final draft conclusion
+    - Draft-to-Answer Faithfulness - perturbs the draft's concluding logic to assess whether final answers follow from the thinking draft
+  - Reasoning Models Don't Always Say What They Think ([yanda chen...bowman, leike, kaplan, & perez, 2025](https://arxiv.org/abs/2505.05410)) - prompt models to answer a multiple-choice question & the same question with a hint inserted; in cases where the model produces non-hint answers without the hint and the hint answer with the hint, measure whether the model acknowledges the hint when solving the question with the hint
+  - Stop Anthropomorphizing Intermediate Tokens as Reasoning/Thinking Traces! ([kambhampati...biswas, 2025](https://arxiv.org/abs/2504.09762))
+  - Beyond Semantics: The Unreasonable Effectiveness of Reasonless Intermediate Tokens ([stechly...kambhampati, 2025](https://arxiv.org/abs/2505.13775))
+  - Interpretable Traces, Unexpected Outcomes: Investigating the Disconnect in Trace-Based Knowledge Distillation ([bhambri...kambhampati, 2025](https://arxiv.org/abs/2505.13792))
+- Critiques
+  - The Unreliability of Explanations in Few-shot Prompting for Textual Reasoning ([ye & durrett, 2022](https://proceedings.neurips.cc/paper_files/paper/2022/file/c402501846f9fe03e2cac015b3f0e6b1-Paper-Conference.pdf))
+  - Unfaithful Explanations in Chain-of-Thought Prompting ([turpin, ..., bowman, 2023](https://arxiv.org/abs/2305.04388))
+    - CoT explanations can be heavily influenced by biasing the model towards certain answers, thereby yielding invalid explanations
+    - try biasing in 2 ways: answer is always (A), or setting where prompt suggests a certain answer
+  - Two Failures of Self-Consistency in the Multi-Step Reasoning of LLMs ([chen, ..., bowman, cho, 2023](https://arxiv.org/abs/2305.14279)) - models fail at these 2 tasks:
+    - hypothetical consistency (the ability for a model to predict what its output would be in a hypothetical other context)
+    - compositional consistency (consistency of a model's outputs for a compositional task even when an intermediate step is replaced with the model's output for that step)
+  - faithfulness metric = model sensitivity to removing some of the explanation (sketch below)
+    - Question Decomposition Improves the Faithfulness of Model-Generated Reasoning ([anthropic, 2023](https://www-files.anthropic.com/production/files/question-decomposition-improves-the-faithfulness-of-model-generated-reasoning.pdf)) - introduce factored decomposition to improve faithfulness metric
+    - Measuring Faithfulness in Chain-of-Thought Reasoning ([anthropic, 2023](https://www-files.anthropic.com/production/files/measuring-faithfulness-in-chain-of-thought-reasoning.pdf)) - in addition to just removing some of the explanation, also add mistakes to it / paraphrase it
+      - larger models become less faithful by this metric
+  - Logical Satisfiability of Counterfactuals for Faithful Explanations in NLI ([sia...zettlemoyer, mathias, 2023](https://ojs.aaai.org/index.php/AAAI/article/view/26174))
+- loosely related
+  - Measuring and Improving Attentiveness to Partial Inputs with Counterfactuals ([elazar...sameer singh, noah smith, 2023](https://arxiv.org/pdf/2311.09605.pdf))
+  - Faithful Explanations of Black-box NLP Models Using LLM-generated Counterfactuals ([gat...reichart, 2023](https://arxiv.org/abs/2310.00603))
+  - Counterfactually Aware Fair Text Generation ([banerjee...bhatia, 2023](https://arxiv.org/abs/2311.05451))
+  - Causal Proxy Models for Concept-based Model Explanations ([wu...potts, 2023](https://proceedings.mlr.press/v202/wu23b.html))
+  - Evaluating Models' Local Decision Boundaries via Contrast Sets ([gardner...zhou, 2020](https://arxiv.org/abs/2004.02709))
+  - Are LLMs Post Hoc Explainers? ([kroeger...lakkaraju, 2023](https://arxiv.org/abs/2310.05797))
+
 ## directly learning algorithms
 
 - Empirical results
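
a minimal sketch of the truncation-based faithfulness metric referenced in the hunk above (the early-answering variant from the anthropic papers), assuming a hypothetical `answer_with_cot` helper; the idea: if answers from truncated CoTs already match the final answer, the reasoning had little causal effect

```python
def answer_with_cot(question: str, cot: str) -> str:
    raise NotImplementedError("prompt: question + partial CoT + 'So the answer is' (hypothetical)")

def truncation_sensitivity(question: str, full_cot: str) -> float:
    final = answer_with_cot(question, full_cot)
    sents = [s for s in full_cot.split(". ") if s]
    changed = sum(
        answer_with_cot(question, ". ".join(sents[:k])) != final
        for k in range(len(sents))  # k=0 is the no-CoT baseline
    )
    # near 0 => answer never depends on the CoT, a red flag for faithfulness
    return changed / max(1, len(sents))
```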
@@ -1418,7 +1435,6 @@ Editing is generally very similar to just adaptation/finetuning. One distinction
 - Localizing Paragraph Memorization in Language Models ([stoehr, ..., lewis, 2024](https://arxiv.org/abs/2403.19851))
 - Detecting Personal Information in Training Corpora: an Analysis ([subramani, luccioni, dodge, & mitchell, 2023](https://trustnlpworkshop.github.io/papers/28.pdf))
 
-
 ## symbolic reasoning
 
 *See also notes on [📌 comp neuro](https://csinva.io/notes/research_ovws/ovw_comp_neuro.html).*
@@ -1455,6 +1471,20 @@ Editing is generally very similar to just adaptation/finetuning. One distinction
 - Logical Transformers: Infusing Logical Structures into Pre-Trained Language Models ([wang, huang, ..., gao, 2023](https://aclanthology.org/2023.findings-acl.111/)) - use logical model to alter embeddings before feeding to LLM
 - Implicit Chain of Thought Reasoning via Knowledge Distillation ([deng...smolensky..., 2023](https://arxiv.org/abs/2311.01460))
 
+
+
+## reasoning models
+
+- Coconut: Training Large Language Models to Reason in a Continuous Latent Space ([hao...weston, tian, 2024](https://arxiv.org/abs/2412.06769)) - requires some extra finetuning; reasons directly in a continuous latent space, feeding final hidden states back in as embeddings to reason without explicit CoT
+- Pretraining Language Models to Ponder in Continuous Space ([zeng...lin, 2025](https://arxiv.org/abs/2505.20674)) - reason by recycling embeddings derived from the LLM's predicted probabilities
+- Looped Transformers as Programmable Computers ([giannou...papailiopoulos, 2023](https://proceedings.mlr.press/v202/giannou23a.html)) - recycle output hidden states back into input embeddings for algorithmic tasks
+- Training-free continuous latent reasoning
+  - Mixture of Inputs: Text Generation Beyond Discrete Token Sampling ([zhuang, liu, singh, shang, & gao, 2025](https://arxiv.org/abs/2505.14827)) - post-hoc (requires no finetuning)
+  - Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space ([zhang...shen, xin eric wang, 2025](https://arxiv.org/abs/2505.15778)) - post-hoc (requires no finetuning; outperformed by mixture of inputs)
+
+- Reasoning Activation in LLMs via Small Model Transfer ([ouyang...jiawei han, 2025](https://ozyyshr.github.io/RAST/)) - perform RL finetuning on a small model, then take the difference between the RL-finetuned small model's logits and the original small model's logits and add that difference to the big model's logits (sketch below)
+- reasoning gym: https://github.com/open-thought/reasoning-gym
+
 ## tool use / agents
 
 - private
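
a minimal sketch of the RAST logit-transfer idea above, assuming HuggingFace-style causal LMs that share a tokenizer; argument names are illustrative, not from the paper's release

```python
import torch

@torch.no_grad()
def rast_next_token(big, small_base, small_rl, input_ids):
    # big-model logits for the next token
    logits = big(input_ids).logits[:, -1, :]
    # reasoning "delta" induced by RL finetuning on the small model
    delta = small_rl(input_ids).logits[:, -1, :] - small_base(input_ids).logits[:, -1, :]
    return torch.argmax(logits + delta, dim=-1)  # greedy here; sampling also works
```

called in a loop, appending each returned token to `input_ids`, this decodes from the big model while steering it with the small model's RL-induced shift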
