Newest 'linguistics' Questions

318 questions

2 votes

1 answer

62 views

Vespa indexing anomaly on `exact`-indexed field with diacritical variants and non-latin Scripts

I’m using the Vespa Python client (pyvespa 0.54.0) to query a Vespa index, and I’m running into an issue where Vespa doesn't find a document it has just returned in a previous query. I have this field ...

Stephen Gadd

asked Mar 7, 2025 at 9:55

0 votes

1 answer

126 views

geom_smooth() producing a linear fit

I'm using R to model the tone contour (pitch) of words in a language and I have two main questions. Note that I am new to R and don't have a data science background, so any help is really appreciated. ...

SanguineEpitaph

asked Nov 15, 2024 at 0:06

-1 votes

1 answer

65 views

Automatic Word Boundary Detection for German

I want to rephrase that: I need a corpus of German words so that I can check if a segment is a word. My solution so far is to take the string, check if it's in the dictionary and if not, delete the ...

kiwi-123

asked Mar 27, 2024 at 11:24

1 vote

0 answers

100 views

Query Wikidata via SPARQL to get specific word etymology from Wiktionary

I'm trying to get a specific word's etymology from Wikidata. For example, this query gets the word "exact" from Wikidata, but I wasn't able to get the etymology part for this word. SELECT ...

Omar Al-Howeiti

asked Sep 8, 2023 at 19:45

0 votes

1 answer

957 views

What does "assign A to B" mean?

If I say "assign A to B", does it mean (a) A ← B or (b) B ← A? In other words, is it (a) A or (b) B that is being modified? (a) makes sense because A has responsibility over B, so A is ...

glibg10b

asked Aug 3, 2023 at 4:41

0 votes

1 answer

87 views

Problems with reproducing the training of the spaCy pipeline

I'm trying to reproduce the training of one of the spaCy pipelines for the Italian language, it_core_news_sm. This pipeline is trained on 2 datasets: UD_Italian-ISDT for the CoNLL-U tasks, WikiNer for NET ...

Andrea Lavista

asked Aug 1, 2023 at 14:34

1 vote

0 answers

157 views

In NLTK, how to generate a sample of sentences from PCFG, respecting the probabilities

NLTK has a generate method which enumerates sentences for a given CFG. It also has a PCFG class for probabilistic context-free grammars. Is it possible to generate a sample of sentences with respect to ...

Albert Gevorgyan

asked Jul 22, 2023 at 13:54

0 votes

1 answer

307 views

Weighted Distance Matrix for QWERTZ Keyboard for Levenshtein Distance Algorithm

I have a weight matrix for a Levenshtein distance algorithm which looks like this: int[,] weights = new int[6, 6] { { 0, 1, 2, 1, 1, 2 }, { 1, 0, 1, 2, 1, 2 }, { 2, 1, 0,...

Marco-rm-f

asked Apr 12, 2023 at 13:02

2 votes

1 answer

786 views

How to develop a corpus (corpus analysis)

I am going to build a linguistic corpus, but I don't understand which technologies I should use for it. Is it true that for developing a corpus for any language I necessarily have to use IMS Corpus ...

Murad Mammadzada

asked Mar 3, 2023 at 9:02

0 votes

1 answer

138 views

Tool for detecting differences between text passages from two different groups

I have text data from two different groups. In total I have around 4000 text passages with around 300 words. I am searching for a tool that allows me to analyze the difference between these two groups....

Irazall

asked Jan 22, 2023 at 18:01

0 votes

0 answers

33 views

R - readtext and list of .xml files

I'm trying to create a corpus and a VCorpus from a bulk of .xml files for quantitative linguistics. With .txt files I usually write library(tm) library(stopwords) library(magrittr) library(dplyr) ...

SubotnikOne

asked Dec 12, 2022 at 15:34

2 votes

0 answers

66 views

How can I determine if a word is part of an English word or is a portmanteau (a word created by combining parts of valid English words)?

I am trying to create a validator that takes in words and tries to determine whether the word is one of the following: it is a valid English word; it is part of an English word; it is an abbreviation; it ...

user3188603

asked Dec 8, 2022 at 21:00

1 vote

0 answers

211 views

Customization of Wav2Vec2CTCTokenizer with rules

My goal is to fine-tune an ASR model, WavLM, that relies on the pretrained tokenizer Wav2Vec2CTCTokenizer. I want to fine-tune this ASR model with another language and to perform the tokenization ...

Sara Picciau

asked Aug 24, 2022 at 16:28

1 vote

0 answers

340 views

spaCy custom tokenizer to separate word with underscore and also to include the whole word

After reading the question "How to tokenize word with hyphen in Spacy", I learned how to tokenize by separating words containing a hyphen/underscore, but my requirement is to tokenize by separating it ...

Shakkir Moulana

asked Aug 17, 2022 at 14:50

1 vote

1 answer

2k views

How to do the post hoc test in the linear mixed model if I have three predictors (two factor variables and one numeric variable)

I'm using a linear mixed-effects model to analyze the reaction time of learners of English as a second language. I have two factor variables - grammaticality (grammatical vs. ungrammatical) and ...

Yang Cao

asked Aug 12, 2022 at 11:13

