gentaiscool/code-switching-papers

Code-switching Research Resources

This is a list of tutorials, workshops, papers, and resources on computational linguistic approaches to code-switching research. The list will be updated over time. You are welcome to send a pull request to update the list and become one of the contributors!

📌 I plan to collect theses and books on code-switching and list them here. If you have one, don't hesitate to contact me or create a pull request!

🚀 Highlights

  • We will be organizing the code-switching workshop at NAACL 2025! We will soon update the website! [Website]
  • If you are new to code-switching or looking for a new research direction, we have written a comprehensive survey paper on code-switching: The Decades Progress on Code-Switching Research in NLP: A Systematic Survey on Trends and Challenges [Paper]. Feel free to read it and let us know if you have any suggestions! Thanks to Alham Fikri Aji, Zheng-Xin Yong, and Thamar Solorio for making this possible 😊
  • We organized the code-switching workshop at EMNLP 2023! [Website]
  • We (Marina Zhukova, Sudipta Kar, and I) organized a birds-of-a-feather session at EMNLP 2022 in Abu Dhabi. Around 30 people joined (in person and online). Thanks for coming!
  • 📔 Microsoft Research (Monojit Choudhury, Kalika Bali, Anirudh Srinivasan, and Sandipan Dandapat) gave a comprehensive tutorial on code-mixing at EMNLP 2019; you can check the following link.

🏫 Workshops

This is the list of the code-switching workshop series:

  • First Workshop on Computational Approaches to Code-switching, EMNLP 2014 [Website]
  • Second Workshop on Computational Approaches to Code-switching, EMNLP 2016
  • Third Workshop on Computational Approaches to Linguistic Code-switching, ACL 2018 [Website]
  • Fourth Workshop on Computational Approaches to Linguistic Code-switching, LREC 2020 [Website]
  • First Workshop on Speech Technologies for Code-switching in Multilingual Communities, Interspeech 2020 [Website]
  • Fifth Workshop on Computational Approaches to Linguistic Code-switching, NAACL 2021 [Website]
  • Sixth Workshop on Computational Approaches to Linguistic Code-switching, EMNLP 2023 [Website]
  • Seventh Workshop on Computational Approaches to Linguistic Code-switching, NAACL 2025 [Website (will open soon)]

📑 Research Papers

Survey Paper

  • Winata, et al. (2023) The Decades Progress on Code-Switching Research in NLP: A Systematic Survey on Trends and Challenges. ACL Findings [Paper]
  • Doğruöz, et al. (2021) A Survey of Code-switching: Linguistic and Social Perspectives for Language Technologies. ACL [Paper]
  • Jose, et al. (2020) A Survey of Current Datasets for Code-Switching Research. International Conference on Advanced Computing and Communication Systems (ICACCS) [Paper]
  • Sitaram, et al. (2019) A Survey of Code-switched Speech and Language Processing. Arxiv [Paper]

Large Language Models

  • Igor Sterner and Simone Teufel (2025) Minimal Pair-Based Evaluation of Code-Switching. ACL [Paper] [Code]
  • Winata, et al. (2024) MINERS: Multilingual Language Models as Semantic Retrievers. EMNLP Findings [Paper] [Code]
  • Yoo, et al. (2024) Code-Switching Red-Teaming: LLM Evaluation for Safety and Multilingual Understanding. Arxiv [Paper]
  • Leon, et al. (2024) Code-Mixed Probes Show How Pre-Trained Models Generalise On Code-Switched Text. LREC [Paper] [Code]
  • Huzaifah, et al. (2024) Evaluating Code-Switching Translation with Large Language Models. LREC-COLING [Paper]
  • Yong, et al. (2023) Prompting Large Language Models to Generate Code-Mixed Texts: The Case of South East Asian Languages. CALCS, EMNLP [Paper]

Language Identification and POS Tagging

  • Igor Sterner (2024) Multilingual Identification of English Code-Switching. VarDial, NAACL [Paper] [Code]
  • Burchell, et al. (2024) Code-Switched Language Identification is Harder Than You Think. EACL [Paper]
  • Igor Sterner and Simone Teufel (2023) TongueSwitcher: Fine-Grained Identification of German-English Code-Switching. CALCS, EMNLP [Paper] [Code]
  • Ostapenko, et al. (2022) Speaker Information Can Guide Models to Better Inductive Biases: A Case Study On Predicting Code-Switching. ACL [Paper]
  • Nguyen, et al. (2021) Automatic Language Identification in Code-Switched Hindi-English Social Media Text. Journal of Open Humanities Data [Paper]
  • Tarunesh, et al. (2021) From Machine Translation to Code-Switching: Generating High-Quality Code-Switched Text. ACL [Paper]
  • Gustavo Aguilar and Thamar Solorio. (2020) From English to Code-Switching: Transfer Learning with Strong Morphological Clues. ACL [Paper] [Code]
  • Mager, et al. (2019) Subword-Level Language Identification for Intra-Word Code-Switching. NAACL [Paper]
  • Zhang, et al. (2018) A Fast, Compact, Accurate Model for Language Identification of Codemixed Text. EMNLP [Paper]
  • Kelsey Ball and Dan Garrette. (2018) Part-of-Speech Tagging for Code-Switched, Transliterated Texts without Explicit Language Identification. EMNLP [Paper]
  • Zeynep Yirmibesoglu and Gulsen Eryigit. (2018) Detecting Code-Switching between Turkish-English Language Pair. Workshop W-NUT, EMNLP [Paper]
  • Mave, et al. (2018) Language Identification and Analysis of Code-Switched Social Media Text. 3rd Workshop of Computational Approaches to Linguistic Code-switching, ACL [Paper]
  • Victor Soto and Julia Hirschberg. (2018) Joint Part-of-Speech and Language ID Tagging for Code-Switched Data. 3rd Workshop of Computational Approaches to Linguistic Code-switching, ACL [Paper]
  • Bullock, et al. (2018) Predicting the presence of a Matrix Language in code-switching. 3rd Workshop of Computational Approaches to Linguistic Code-switching, ACL [Paper]
  • Soto, et al. (2018) The Role of Cognate Words, POS Tags, and Entrainment in Code-Switching. Interspeech [Paper]
  • Barman, et al. (2016) Part-of-speech Tagging of Code-mixed Social Media Content: Pipeline, Stacking and Joint Modelling. 2nd Workshop on Computational Approaches to Code-Switching, EMNLP [Paper]
  • Vyas, et al. (2014) POS Tagging of English-Hindi Code-Mixed Social Media Content. EMNLP [Paper]
  • Heba Elfardy and Mona Diab. (2012) Token Level Identification of Linguistic Code Switching. COLING [Paper]
  • Thamar Solorio and Yang Liu. (2008) Learning to Predict Code-Switching Points. EMNLP [Paper]
  • Dau-Cheng Lyu and Ren-Yuan Lyu. (2008) Language Identification on Code-Switching Utterances Using Multiple Cues. Interspeech [Paper]

Corpus

  • Winata, et al. (2026) Can Large Language Models Understand, Reason About, and Generate Code-Switched Text?. Arxiv [Paper] [Code] [Dataset]
  • Farhansyah, et al. (2026) PingPong: A Natural Benchmark for Multi-Turn Code-Switching Dialogues. Arxiv [Paper]
  • Kuwanto, et al. (2024) Linguistics Theory Meets LLM: Code-Switched Text Generation via Equivalence Constrained Large Language Models. Arxiv [Paper] [Code] [Dataset]
  • Ruochen Zhang and Carsten Eickhoff (2024) CroCosum: A Benchmark Dataset for Cross-Lingual Code-switched Summarization. LREC [Paper] [Dataset]
  • Whitehouse, et al. (2022) EntityCS: Improving Zero-Shot Cross-lingual Transfer with Entity-Centric Code Switching. EMNLP [Paper] [Code]
  • Lovenia, et al. (2022) ASCEND: A Spontaneous Chinese-English Dataset for Code-switching in Multi-turn Conversation. LREC [Paper] [Dataset]
  • Nguyen, et al. (2020) CanVEC-the Canberra Vietnamese-English Code-switching Natural Speech Corpus. LREC [Paper]
  • Umapathy, et al. (2020) Investigating Modelling Techniques for Natural Language Inference on Code-Switched Dialogues in Bollywood Movies. First Workshop on Speech Technologies for Code-switching in Multilingual Communities, Interspeech 2020 [Dataset]
  • Xiang, et al. (2020) Sina Mandarin Alphabetical Words: A Web-driven Code-mixing Lexical Resource. AACL-IJCNLP [TBC]
  • Chakravarthi, et al. (2020) Corpus Creation for Sentiment Analysis in Code-Mixed Tamil-English Text. SLTU (Spoken Language Technologies for Under-resourced languages) and CCURL (Collaboration and Computing for Under-Resourced Languages) Workshop, LREC [Paper]
  • Khanuja, et al. (2020) A New Dataset for Natural Language Inference from Code-mixed Conversations. 4th Workshop of Computational Approaches to Linguistic Code-switching, LREC [Paper]
  • Barik, et al. (2019) Normalization of Indonesian-English Code-Mixed Twitter Data. W-NUT, EMNLP [Paper] [Dataset]
  • Singh, et al. (2018) A Twitter Corpus for Hindi-English Code Mixed POS Tagging. Sixth International Workshop on Natural Language Processing for Social Media, ACL [Paper]
  • Li, et al. (2012) A Mandarin-English Code-Switching Corpus. LREC [Paper]
  • Lyu, et al. (2010) SEAME: A Mandarin-English Code-Switching Speech Corpus in South-East Asia. Interspeech [Paper]
  • Lyu, et al. (2010) An Analysis of a Mandarin-English Code-switching Speech Corpus: SEAME. Age [Paper]

Language Modeling and Speech Recognition

  • Yu, et al. (2023) Code-Switching Text Generation and Injection in Mandarin-English ASR. ICASSP [Paper]
  • Tolúlopé, et al. (2023) Multilingual self-supervised speech representations improve the speech recognition of low-resource African languages with codeswitching. Sixth Workshop on Computational Approaches to Linguistic Code-Switching. [Paper]
  • Kumar, et al. (2020) Machine Learning based Language Modelling of Code Switched Data. International Conference on Electronics and Sustainable Communication Systems (ICESC) [Paper]
  • Madhumani, et al. (2020) Learning not to Discriminate: Task Agnostic Learning for Improving Monolingual and Code-switched Speech Recognition. Arxiv [Paper]
  • Shah, et al. (2020) Learning to Recognize Code-switched Speech Without Forgetting Monolingual Speech Recognition. Arxiv [Paper]
  • Winata, et al. (2020) Meta-Transfer Learning for Code-Switched Speech Recognition. ACL [Paper] [Code]
  • Chandu, et al. (2020) Style Variation as a Vantage Point for Code-Switching. Arxiv [Paper]
  • Ganji Sreeram and Rohit Sinha (2020) Exploration of End-to-End Framework for Code-Switching Speech Recognition Task: Challenges and Enhancements. IEEE Access [Paper]
  • Winata, et al. (2019) Code-Switched Language Models Using Neural Based Synthetic Data from Parallel Sentences. CoNLL [Paper]
  • Hila Gonen and Yoav Goldberg (2019) Language Modeling for Code-Switching: Evaluation, Integration of Monolingual Data, and Discriminative Training. EMNLP [Paper]
  • Lee, et al. (2019) Linguistically Motivated Parallel Data Augmentation for Code-switch Language Modeling. Interspeech [Paper]
  • Victor Soto and Julia Hirschberg (2019) Improving Code-Switched Language Modeling Performance Using Cognate Features. Interspeech [Paper]
  • Chang, et al. (2019) Code-switching Sentence Generation by Generative Adversarial Networks and its Application to Data Augmentation. Interspeech [Paper]
  • Zeng, et al. (2019) On the End-to-End Solution to Mandarin-English Code-switching Speech Recognition. Interspeech [Paper]
  • Taneja, et al. (2019) Exploiting Monolingual Speech Corpora for Code-mixed Speech Recognition. Interspeech [Paper]
  • Shan, et al. (2019) Investigating End-to-end Speech Recognition for Mandarin-english Code-switching. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) [Paper]
  • Grandee Lee, Haizhou Li. (2019) Word and Class Common Space Embedding for Code-switch Language Modelling. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) [Paper]
  • Hamed, et al. (2019) Code-Switching Language Modeling with Bilingual Word Embeddings: A Case Study for Egyptian Arabic-English. International Conference on Speech and Computer [Paper]
  • Winata, et al. (2018) Learn to Code-Switch: Data Augmentation using Copy Mechanism on Language Modeling. Arxiv [Paper]
  • Winata, et al. (2018) Towards End-to-end Automatic Code-Switching Speech Recognition. Arxiv [Paper]
  • Nakayama, et al. (2018) Speech Chain for Semi-Supervised Learning of Japanese-English Code-Switching ASR and TTS. IEEE Spoken Language Technology Workshop (SLT) [Paper]
  • Jesse Emond, Bhuwana Ramabhadran, Brian Roark, Pedro Moreno, and Min Ma. (2018) Transliteration Based Approaches to Improve Code-Switched Speech Recognition Performance. IEEE Spoken Language Technology Workshop (SLT) [Paper]
  • Ganji Sreeram and Rohit Sinha. (2018) Exploiting Parts-of-Speech for Improved Textual Modeling of Code-Switching Data. 2018 Twenty Fourth National Conference on Communications (NCC) [Paper]
  • Garg, et al. (2018) Code-switched Language Models Using Dual RNNs and Same-Source Pretraining. EMNLP [Paper]
  • Ewald van der Westhuizen and Thomas R. Niesler. (2018) Synthesised bigrams using word embeddings for code-switched ASR of four South African language pairs. Computer Speech and Language [Paper]
  • Biswal, et al. (2018) Multilingual Neural Network Acoustic Modelling for ASR of Under-Resourced English-isiZulu Code-Switched Speech. Interspeech [Paper]
  • Winata, et al. (2018) Code-Switching Language Modeling using Syntax-Aware Multi-Task Learning. 3rd Workshop of Computational Approaches to Linguistic Code-switching, ACL [Paper] [Code]
  • Chandu, et al. (2018) Language Informed Modeling of Code-Switched Text. 3rd Workshop of Computational Approaches to Linguistic Code-switching, ACL [Paper]
  • Pratapa, et al. (2018) Language Modeling for Code-Mixing: The Role of Linguistic Theory based Synthetic Data. ACL [Paper]
  • Sivasankaran, et al. (2018) Phone Merging For Code-Switched Speech Recognition. 3rd Workshop of Computational Approaches to Linguistic Code-switching, ACL [Paper]
  • Garg, et al. (2018) Dual Language Models for Code Switched Speech Recognition. Interspeech [Paper]
  • Baheti, et al. (2017) Curriculum Design for Code-switching: Experiments with Language Identification and Language Modeling with Deep Neural Networks. ICON [Paper]
  • Adel, et al. (2015) Syntactic and Semantic Features For Code-Switching Factored Language Models. IEEE Transactions on Audio, Speech, and Language Processing [Paper]
  • Ying Li and Pascale Fung. (2014) Code switch language modeling with Functional Head Constraint. ICASSP [Paper]
  • Ying Li and Pascale Fung. (2014) Language Modeling with Functional Head Constraint for Code Switching Speech Recognition. EMNLP [Paper]
  • Adel, et al. (2013) Combination of Recurrent Neural Networks and Factored Language Models for Code-Switching Language Modeling. ACL [Paper]
  • Adel, et al. (2013) Recurrent neural network language modeling for code switching conversational speech. ICASSP [Paper]
  • Vu, et al. (2012) A First Speech Recognition System for Mandarin-English Code-Switch Conversational Speech. ICASSP [Paper]
  • Ying Li and Pascale Fung. (2012) Code-switch Language Model with Inversion Constraints for Mixed Language Speech Recognition. COLING [Paper]
  • Li, et al. (2011) Asymmetric acoustic modeling of mixed language speech. ICASSP [Paper]

Discourse

  • Sravani, et al. (2021) Political Discourse Analysis: A Case Study of Code Mixing and Code Switching in Political Speeches. Proceedings of the 5th Workshop on Computational Approaches to Code Switching (CALCS), NAACL [Paper]

Generation

  • Gupta, et al. (2020) A Semi-supervised Approach to Generate the Code-Mixed Text using Pre-trained Encoder and Transfer Learning. Findings of EMNLP [Paper]
  • Bryan Gregorius and Takeshi Okadome (2022) Generating Code-Switched Text from Monolingual Text with Dependency Tree. The 20th Annual Workshop of the Australasian Language Technology Association [Paper] [Code]

Speech Synthesis

  • Sai Krishna Rallabandi and Alan W Black (2019) Variational Attention using Articulatory Priors for generating Code Mixed Speech using Monolingual Corpora. Interspeech [Paper]
  • Sai Krishna Rallabandi and Alan W Black (2017) On Building Mixed Lingual Speech Synthesis Systems. Interspeech [Paper]
  • Chandu, et al. (2017) Speech Synthesis for Mixed-Language Navigation Instructions. Interspeech [Paper]

Metric

  • Guzman, et al. (2017) Metrics for modeling code-switching across corpora. Interspeech [Paper]

Representation Learning

  • Adilazuarda, et al. (2023) IndoRobusta: Towards Robustness Against Diverse Code-Mixed Indonesian Local Languages. Proceedings of the First Workshop on Scaling Up Multilingual Evaluation, AACL [Paper] [Code]
  • Prasad, et al. (2021) The Effectiveness of Intermediate-Task Training for Code-Switched Natural Language Understanding. Proceedings of the 1st Workshop on Multilingual Representation Learning, EMNLP [Paper]
  • Winata, et al. (2021) Are Multilingual Models Effective in Code-Switching?. Proceedings of the 5th Workshop on Computational Approaches to Code Switching (CALCS), NAACL [Paper]
  • Rizal, et al. (2020) Evaluating Word Embeddings for Indonesian–English Code-Mixed Text Based on Synthetic Data. Proceedings of the 4th Workshop on Computational Approaches to Code Switching (CALCS), LREC [Paper]
  • Winata, et al. (2019) Hierarchical Meta-Embeddings for Code-Switching Named Entity Recognition. EMNLP [Paper] [Code]
  • Pratapa, et al. (2018) Word Embeddings for Code-Mixed Language Processing. EMNLP [Paper]

Machine Translation

  • Pengpun, et al. (2024) On Creating an English-Thai Code-switched Machine Translation in Medical Domain. EMNLP [Paper]
  • Gaser, et al. (2023) Exploring Segmentation Approaches for Neural Machine Translation of Code-Switched Egyptian Arabic-English Text. EACL [Paper]
  • Kuwanto, et al. (2021) Low-Resource Machine Translation Training Curriculum Fit for Low-Resource Languages. Arxiv [Paper]
  • Vivek Srivastava and Mayank Singh (2020) PHINC: A Parallel Hinglish Social Media Code-Mixed Corpus for Machine Translation. W-NUT, EMNLP [Paper] [Dataset]
  • Thoudam Doren Singh and Thamar Solorio. (2017) Towards Translating Mixed-Code Comments from Social Media. CICLing [Paper]

Speech Translation

  • Alastruey, et al. (2023) Towards Real-World Streaming Speech Translation for Code-Switched Speech. CALCS, EMNLP [Paper]

Natural Language Understanding

  • Krishnan, et al. (2021) Multilingual Code-Switching for Zero-Shot Cross-Lingual Intent Prediction and Slot Filling. MRL, EMNLP [Paper]

Named Entity Recognition

  • Priyadharshini, et al. (2020) Named Entity Recognition for Code-Mixed Indian Corpus using Meta Embedding. 6th International Conference on Advanced Computing and Communication Systems (ICACCS) [Paper]
  • Winata, et al. (2019) Learning Multilingual Meta-Embeddings for Code-Switching Named Entity Recognition. RepL4NLP, ACL [Paper] [Code]
  • Aguilar, et al. (2018) Named Entity Recognition on Code-Switched Data: Overview of the CALCS 2018 Shared Task. 3rd Workshop of Computational Approaches to Linguistic Code-switching, ACL [Paper]
  • Wang, et al. (2018) Code-Switched Named Entity Recognition with Embedding Attention. 3rd Workshop of Computational Approaches to Linguistic Code-switching, ACL [Paper]
  • Winata, et al. (2018) Bilingual Character Representation for Efficiently Addressing Out-of-Vocabulary Words in Code-Switching Named Entity Recognition. 3rd Workshop of Computational Approaches to Linguistic Code-switching, ACL [Paper]
  • Aguilar, et al. (2017) A Multi-task Approach for Named Entity Recognition in Social Media Data. 3rd Workshop on Noisy User-generated Text, EMNLP [Paper]

Linguistics

  • Li Nguyen. (2018) Borrowing or Code-switching? Traces of community norms in Vietnamese-English speech. Australian Journal of Linguistics 38.4 (2018): 443-466. [Paper]
  • Fairchild, Sarah, and Janet G. Van Hell. (2017) Determiner-noun code-switching in Spanish heritage speakers. Bilingualism: Language and Cognition 20.1 (2017): 150-161. [Paper]
  • Bhatt, Rakesh M., and Agnes Bolonyai. (2011) Code-switching and the optimal grammar of bilingual language use. Bilingualism: Language and Cognition 14.4 (2011): 522-546. [Paper]
  • Lipski (2005) Code-switching or Borrowing? No sé so no puedo decir, you know. Second Workshop on Spanish Sociolinguistics [Paper]
  • Roberto R. Heredia and Jeanette Altarriba (2001) Bilingual Language Mixing: Why Do Bilinguals Code-Switch? SAGE Publications [Paper]
  • Belazi, et al. (1994) Code switching and X-bar theory: The functional head constraint. Linguistic inquiry Vol 25 No.2 Spring [Paper]
  • Shana Poplack (1980) Sometimes I’ll start a sentence in Spanish y termino en español: toward a typology of code-switching. Linguistics 18(7-8) [Paper]
  • Pfaff, Carol W. (1979) Constraints on language mixing: intrasentential code-switching and borrowing in Spanish/English. Language: 291-318. [Paper]
  • Shana Poplack (1978) Syntactic structure and social function of code-switching. Vol. 2. Centro de Estudios Puertorriqueños, City University of New York [Paper]
  • Gumperz, J. J., & Hernandez, E. (1969) Cognitive aspects of bilingual communication. Institute of International Studies, University of California [Paper]

Affective Computing

  • Chakravarthi, et al. (2021) DravidianCodeMix: Sentiment Analysis and Offensive Language Identification Dataset for Dravidian Languages in Code-Mixed Text. Arxiv [Paper] [Code and Dataset]
  • Siddharth Yadav (2020) Unsupervised Sentiment Analysis for Code-mixed Data. Arxiv [Paper] [Code]
  • Wang, et al. (2017) Emotion Analysis in Code-Switching Text With Joint Factor Graph Model. IEEE/ACM Transactions on Audio, Speech, and Language Processing [Paper]
  • Wang, et al. (2016) A Bilingual Attention Network for Code-switched Emotion Prediction. COLING [Paper]
  • Sophia Lee and Zhongqing Wang (2015) Emotion in Code-switching Texts: Corpus Construction and Analysis. Proceedings of the Eighth SIGHAN Workshop on Chinese Language Processing [Paper]
  • Wang, et al. (2015) Emotion Detection in Code-switching Texts via Bilingual and Sentimental Information. ACL [Paper]

Dialog and Conversational System

  • Gupta, et al. (2018) Uncovering Code-Mixed Challenges: A Framework for Linguistically Driven Question Generation and Neural based Question Answering. CoNLL [Paper]

Syntax

  • Igor Sterner and Simone Teufel (2025) Code-Switching and Syntax: A Large-Scale Experiment. ACL Findings [Paper] [Code]
  • Kodali, et al. (2022) SyMCoM - Syntactic Measure of Code Mixing A Study Of English-Hindi Code-Mixing. Findings of ACL [Paper]
  • Özlem Çetinoglu and Çagrı Çöltekin (2019) Challenges of Annotating a Code-Switching Treebank. SyntaxFest [Paper]

Adversarial Attack

  • Samson Tan and Shafiq Joty (2021) Code-Mixing on Sesame Street: Dawn of the Adversarial Polyglots. NAACL [Paper]

Social Linguistics

  • Bolock, et al. (2020) Who, When and Why: The 3 Ws of Code-Switching. International Conference on Practical Applications of Agents and Multi-Agent Systems [Paper]
  • Yoder, et al. (2017) Code-Switching as a Social Act: The Case of Arabic Wikipedia Talk Pages. Proceedings of the Second Workshop on Natural Language Processing and Computational Social Science, ACL [Paper]
  • Agarwal, et al. (2017) I may talk in English but gaali toh Hindi mein hi denge: A study of English-Hindi code-switching and swearing pattern on social networks. International Conference on Communication Systems and Networks (COMSNETS) [Paper]

Benchmark

  • Khanuja, et al. (2020) GLUECoS: An Evaluation Benchmark for Code-Switched NLP. ACL [Paper]
  • Aguilar, et al. (2020) LinCE: A Centralized Benchmark for Linguistic Code-switching Evaluation. LREC [Paper]

Social Media

  • Bali, et al. (2014) "I am borrowing ya mixing ?" An Analysis of English-Hindi Code Mixing in Facebook. Proceedings of The First Workshop on Computational Approaches to Code Switching [Paper]

Text Normalization

  • Dwija Parikh and Thamar Solorio (2021) Normalization and Back-Transliteration for Code-Switched Data. Proceedings of the 5th Workshop on Computational Approaches to Code Switching (CALCS), NAACL [Paper]

Toolkit

Sentence Segmentation

  • Frohmann, et al. (2024) Segment Any Text: A Universal Approach for Robust, Efficient and Adaptable Sentence Segmentation. EMNLP [Paper] [Code]

Synthetic Data Generation Toolkit

  • Jayanthi, et al. (2021) CodemixedNLP: An Extensible and Open NLP Toolkit for Code-Mixing. Proceedings of the 5th Workshop on Computational Approaches to Code Switching (CALCS), NAACL [Paper] [Code]
  • Rizvi, et al. (2021) GCM: A Toolkit for Generating Synthetic Code-mixed Text. EACL (System Demonstrations) [Paper] [Code]

Annotation Toolkit

  • Shah, et al. (2019) CoSSAT: Code-Switched Speech Annotation Tool. Proceedings of the First Workshop on Aggregating and Analysing Crowdsourced Annotations for NLP [Paper]

Summarization

  • Mehnaz, et al. (2021) GupShup: Summarizing Open-Domain Code-Switched Conversations. EMNLP

Question Answering

  • Gupta, et al. (2020) A Unified Framework for Multilingual and Code-Mixed Visual Question Answering. AACL-IJCNLP [TBA]

Dialog and Conversational System

  • Bawa, et al. (2020) Do Multilingual Users Prefer Chat-bots that Code-mix? Let's Nudge and Find Out!. ACM on Human-Computer Interaction [Paper]
  • Banerjee, et al. (2018) A Dataset for Building Code-Mixed Goal Oriented Conversation Systems. COLING [Paper]

Position Paper

  • Nguyen, et al. (2022) Building Educational Technologies for Code-Switching: Current Practices, Difficulties and Future Directions. Languages [Paper]

Books

  • Torres Cacoullos and Travis (2018) Bilingualism in the Community. Cambridge University Press

Theses

  • Genta Indra Winata (2021) Multilingual Transfer Learning for Code-Switched Language and Speech Neural Modeling. [Thesis]
  • Gustavo Aguilar (2020) Neural Sequence Labeling on Social Media Text. [Thesis]
  • Victor Soto Martinez (2020) Identifying and Modeling Code-Switched Language. [Thesis]
