Stack Overflow
2 votes
1 answer
62 views

I’m using the Vespa Python client (pyvespa 0.54.0) to query a Vespa index, and I’m running into an issue where Vespa doesn't find a document it has just returned in a previous query. I have this field ...
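As a starting point (not taken from the truncated question), a minimal pyvespa query sketch might look like this; the endpoint, schema, and the field name "my_field" are placeholders:

```python
# Minimal sketch, assuming a local Vespa endpoint and a hypothetical schema
# with a queryable string field "my_field"; the real field name in the
# question is truncated, so everything here is a placeholder.
from vespa.application import Vespa

app = Vespa(url="http://localhost", port=8080)

response = app.query(body={
    "yql": 'select * from sources * where my_field contains "example"',
    "hits": 10,
})

for hit in response.hits:
    print(hit["id"], hit["fields"])
```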
0 votes
1 answer
126 views

I'm using R to model the tone contour (pitch) of words in a language and I have two main questions. Note that I am new to R and don't have a data science background, so any help is really appreciated. ...
-1 votes
1 answer
65 views

I want to rephrase that: I need a corpus of German words so that I can check if a segment is a word. My solution so far is to take the string, check if it's in the dictionary and if not, delete the ...
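One simple way to do the dictionary check, assuming the German word list is a plain-text file with one word per line (the actual corpus source is an assumption):

```python
# Minimal sketch, assuming a plain-text word list (one German word per line),
# e.g. a hypothetical file "german_words.txt".
def load_lexicon(path: str) -> set[str]:
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}

lexicon = load_lexicon("german_words.txt")

def is_word(segment: str) -> bool:
    # Case-insensitive membership test against the word list.
    return segment.lower() in lexicon

print(is_word("Haus"))   # True if "haus" is in the list
print(is_word("xyzzy"))  # most likely False
```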
1 vote
0 answers
100 views

I'm trying to get the etymology of a specific word from Wikidata. For example, I wrote this query to get the word "exact" from Wikidata, but I wasn't able to get the etymology part for this word. SELECT ...
0 votes
1 answer
957 views

If I say "assign A to B", does it mean (a) A ← B or (b) B ← A? In other words, is it (a) A or (b) B that is being modified? (a) makes sense because A has responsibility over B, so A is ...
0 votes
1 answer
87 views

I'm trying to reproduce the training of one of the spaCy pipelines for Italian: it_core_news_sm. This pipeline is trained on 2 datasets: UD_Italian-ISDT for the CoNLL-U tasks and WikiNER for NER ...
1 vote
0 answers
157 views

NLTK has a generate method which enumerates sentences for a given CFG. It also has a PCFG class for probabilistic context-free grammars. Is it possible to generate a sample of sentences with respect to ...
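As far as I know, NLTK's generate() only enumerates CFG sentences, so one option is a small top-down sampler that draws productions by their probabilities; the toy grammar below is an assumption:

```python
# Hedged sketch: sample sentences from an NLTK PCFG by choosing productions
# according to their probabilities. The toy grammar is a placeholder.
import random
from nltk import PCFG
from nltk.grammar import Nonterminal

grammar = PCFG.fromstring("""
S -> NP VP [1.0]
NP -> 'dogs' [0.5] | 'cats' [0.5]
VP -> V NP [0.4] | V [0.6]
V -> 'chase' [0.7] | 'sleep' [0.3]
""")

def sample(symbol):
    if not isinstance(symbol, Nonterminal):
        return [symbol]                       # terminal: emit the word
    productions = grammar.productions(lhs=symbol)
    weights = [p.prob() for p in productions]
    chosen = random.choices(productions, weights=weights, k=1)[0]
    words = []
    for sym in chosen.rhs():
        words.extend(sample(sym))
    return words

for _ in range(3):
    print(" ".join(sample(grammar.start())))
```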
0 votes
1 answer
307 views

I have a weight matrix for a Levenshtein distance algorithm which looks like this: int[,] weights = new int[6, 6] { { 0, 1, 2, 1, 1, 2 }, { 1, 0, 1, 2, 1, 2 }, { 2, 1, 0,...
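To illustrate how such a substitution-cost matrix plugs into the standard Levenshtein DP (in Python rather than C#): the alphabet, the character-to-index mapping, and the rows beyond those quoted above are placeholders.

```python
# Hedged sketch of a weighted Levenshtein distance using a substitution-cost
# matrix. The alphabet and the last few matrix rows are placeholders; only the
# first rows are quoted in the question.
alphabet = "abcdef"                      # assumed: one index per character
index = {ch: i for i, ch in enumerate(alphabet)}

weights = [                              # substitution cost between characters
    [0, 1, 2, 1, 1, 2],
    [1, 0, 1, 2, 1, 2],
    [2, 1, 0, 1, 2, 1],
    [1, 2, 1, 0, 1, 2],
    [1, 1, 2, 1, 0, 1],
    [2, 2, 1, 2, 1, 0],
]

def weighted_levenshtein(a: str, b: str, ins_cost: int = 1, del_cost: int = 1) -> int:
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = dp[i - 1][0] + del_cost
    for j in range(1, n + 1):
        dp[0][j] = dp[0][j - 1] + ins_cost
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = weights[index[a[i - 1]]][index[b[j - 1]]]
            dp[i][j] = min(
                dp[i - 1][j] + del_cost,      # deletion
                dp[i][j - 1] + ins_cost,      # insertion
                dp[i - 1][j - 1] + sub,       # (weighted) substitution
            )
    return dp[m][n]

print(weighted_levenshtein("abc", "adc"))
```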
2 votes
1 answer
786 views

I am going to build a linguistic corpus, but I don't understand which technologies I should use for it. Is it true that for developing a corpus for any language I necessarily have to use IMS Corpus ...
0 votes
1 answer
138 views

I have text data from two different groups. In total I have around 4000 text passages with around 300 words. I am searching for a tool that allows me to analyze the difference between these two groups....
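One lightweight option (a sketch, not a specific tool recommendation) is a keyword/keyness comparison, e.g. a log-likelihood score over word frequencies in the two groups; the toy passages are placeholders:

```python
# Hedged sketch: compare word frequencies between two groups of passages with
# a simple log-likelihood (G2) keyness score. The example texts are placeholders.
import math
from collections import Counter

group_a = ["the cat sat on the mat", "the cat slept"]   # placeholder texts
group_b = ["the dog barked loudly", "the dog ran"]

def counts(texts):
    c = Counter()
    for t in texts:
        c.update(t.lower().split())
    return c

ca, cb = counts(group_a), counts(group_b)
na, nb = sum(ca.values()), sum(cb.values())

def log_likelihood(word):
    a, b = ca[word], cb[word]
    ea = na * (a + b) / (na + nb)        # expected count in group A
    eb = nb * (a + b) / (na + nb)        # expected count in group B
    ll = 0.0
    if a:
        ll += 2 * a * math.log(a / ea)
    if b:
        ll += 2 * b * math.log(b / eb)
    return ll

vocab = set(ca) | set(cb)
for word in sorted(vocab, key=log_likelihood, reverse=True)[:5]:
    print(word, round(log_likelihood(word), 2))
```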
0 votes
0 answers
33 views

I'm trying to create a Corpus and a VCorpus from a bulk of .xml files for quantitative linguistics. With .txt files I usually write library(tm) library(stopwords) library(magrittr) library(dplyr) ...
2 votes
0 answers
66 views

I am trying to create a validator that takes in words and tries to determine if the word is one of the following: it is a valid English word; it is a part of an English word; it is an abbreviation; it ...
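A minimal sketch of such a three-way check, assuming a word list and an abbreviation list are available (both are placeholders here, and the question's remaining categories are truncated):

```python
# Hedged sketch: classify a token as an exact word, a substring of a word, or a
# known abbreviation. Lexicon and abbreviation set are placeholders.
WORDS = {"cat", "catalogue", "dog", "doghouse"}          # placeholder lexicon
ABBREVIATIONS = {"etc", "e.g", "dr", "mr"}               # placeholder list

def classify(token: str) -> str:
    t = token.lower().rstrip(".")
    if t in WORDS:
        return "valid English word"
    if t in ABBREVIATIONS:
        return "abbreviation"
    if any(t in w for w in WORDS):
        return "part of an English word"
    return "unknown"

for token in ["cat", "cata", "Dr.", "zzz"]:
    print(token, "->", classify(token))
```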
1 vote
0 answers
211 views

My goal is to fine-tune an ASR model, WavLM, that relies on the pretrained tokenizer Wav2Vec2CTCTokenizer. I want to fine-tune this ASR model on another language and to perform the tokenization ...
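The usual Hugging Face recipe for a new language is to build a character-level vocab.json and load it with Wav2Vec2CTCTokenizer; the character set below is an assumed placeholder, not the question's actual alphabet:

```python
# Hedged sketch: build a CTC tokenizer for a new language from a custom
# character vocabulary. The character set here is a placeholder.
import json
from transformers import Wav2Vec2CTCTokenizer

chars = list("abcdefghijklmnopqrstuvwxyz") + ["|"]   # "|" as word delimiter
vocab = {c: i for i, c in enumerate(chars)}
vocab["[UNK]"] = len(vocab)
vocab["[PAD]"] = len(vocab)

with open("vocab.json", "w", encoding="utf-8") as f:
    json.dump(vocab, f, ensure_ascii=False)

tokenizer = Wav2Vec2CTCTokenizer(
    "vocab.json",
    unk_token="[UNK]",
    pad_token="[PAD]",
    word_delimiter_token="|",
)

print(tokenizer("hello world").input_ids)
```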
1 vote
0 answers
340 views

After referring to the link "How to tokenize word with hyphen in Spacy", I learned how to tokenize by splitting words containing a hyphen/underscore, but my requirement is to tokenize by separating it ...
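Since the exact splitting behaviour wanted here is truncated, one possible starting point is to extend spaCy's infix patterns so hyphens and underscores between word characters become token boundaries:

```python
# Hedged sketch: add "-" and "_" between word characters to spaCy's infix
# rules so hyphenated/underscored compounds are split into separate tokens.
import spacy
from spacy.util import compile_infix_regex

nlp = spacy.blank("en")  # blank English pipeline, no model download needed

infixes = list(nlp.Defaults.infixes) + [r"(?<=[\w])[-_](?=[\w])"]
nlp.tokenizer.infix_finditer = compile_infix_regex(infixes).finditer

doc = nlp("state_of_the_art spell-checker")
print([t.text for t in doc])
```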
1 vote
1 answer
2k views

I'm using a linear mixed effects model to analyze the reaction times of learners of English as a second language. I have two factor variables - grammaticality (grammatical vs. ungrammatical) and ...
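For reference, a minimal mixed-effects sketch in Python with statsmodels (the question may well be using R's lme4 instead); the column names "rt", "grammaticality", "proficiency", "subject" and the CSV file are placeholders, since the second factor is truncated:

```python
# Hedged sketch: linear mixed-effects model with a random intercept per subject.
# Data file and column names are placeholders.
import pandas as pd
import statsmodels.formula.api as smf

data = pd.read_csv("reaction_times.csv")

model = smf.mixedlm(
    "rt ~ grammaticality * proficiency",   # fixed effects and interaction (assumed)
    data=data,
    groups=data["subject"],                # random intercept per participant
)
result = model.fit()
print(result.summary())
```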
