Questions tagged [machine-learning]
Machine learning algorithms build a model of the training data. The term "machine learning" is vaguely defined; it includes what is also called statistical learning, reinforcement learning, unsupervised learning, etc. ALWAYS ADD A MORE SPECIFIC TAG.
20,439 questions
2
votes
0
answers
9
views
Help matching an ordered list of events (no timestamps) to a noisy timestamped time-series
I’m stuck and this is starting to feel pretty convoluted, so I’ll try to be clear.
What I have:
A timestamped stochastic time-series (e.g. market prices). It’s noisy but when an event happens the ...
3
votes
0
answers
19
views
Why do "good" loss functions in ML need both Lipschitz continuity and smoothness?
I’m trying to understand the common assumptions in machine-learning optimization theory, where a "well-behaved" loss function is often required to be both L-Lipschitz and β-smooth (i.e., have β-...
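For reference, the two properties named in the excerpt are usually stated as follows. This is the standard textbook formulation, not necessarily the exact wording the truncated question goes on to use; the function $f$ and the norm $\|\cdot\|$ are notation assumed here.

```latex
% f is L-Lipschitz: function values cannot change faster than L per unit distance
|f(x) - f(y)| \le L \,\|x - y\| \quad \text{for all } x, y

% f is \beta-smooth: the gradient itself is \beta-Lipschitz
\|\nabla f(x) - \nabla f(y)\| \le \beta \,\|x - y\| \quad \text{for all } x, y
```

The first condition bounds the size of the gradient, the second bounds how quickly the gradient can change, so neither implies the other in general.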
0
votes
0
answers
24
views
What data mining freeware is available that replicates SAS EMiner's interactive Decision Tree node?
It's 2025, and yes, I'm still using SAS EMiner's Decision Tree... If anyone knows a modern freeware version that replicates the Interactive mode effectively (with controlling split cutoff values, a ...
0
votes
0
answers
23
views
Modeling recurring monthly transactions with weekend-shift effects: DBSCAN vs rule-based temporal detection?
I have 3 months of categorized bank transaction data and need to identify recurring cash inflows and outflows for lending risk modeling.
Complications:
1. Income dates shift earlier when payday falls ...
2
votes
0
answers
38
views
Restrict training data to only rows with values for most important variable? [closed]
My training data is mostly missing values for the feature that I know will be the most important variable. This missingness is semi-random. For example, I know the value is missing for this feature ...
0
votes
0
answers
23
views
Is the figure showing margin violation for the support vector machine correct?
I am listening to a lecture on soft margin SVM https://youtu.be/XUj5JbQihlU?si=b66SblRnw9mmczVU&t=2969
The lecturer says that the blue dot represents a violation of the margin.
I don't really ...
3
votes
1
answer
102
views
+50
Accuracy in Machine Learning vs. Accuracy in Statistics vs. pass@1,1 in Generative Modeling: What's the Difference?
I've encountered the term "accuracy" used differently across several evaluation contexts, and I want to clearly understand their mathematical and conceptual distinctions using consistent ...
1
vote
0
answers
29
views
Is the strong duality of the hard-margin SVM really trivially satisfied all the time?
It is widely known that if you were to calculate the maximizer of the dual SVM program (denoted $\alpha^*$), then the primal minimizer of the hard-margin SVM program,
\begin{aligned}&{\underset {...
0
votes
1
answer
51
views
Guidance for communicating insights to inform breakdown companies how to assess breakdown risk [closed]
I come from a machine learning background; however, I am trying to learn more traditional data science. I have a dataset of vehicles and the target is the Breakdown Likelihood (1 to 3, 1 being lowest), ...
0
votes
0
answers
26
views
Time-based regression: is it leakage if training includes snapshots closer to the event than those used at prediction?
I’m building a regression model that predicts the final number of vehicles booked for a ferry trip.
Each training row represents the state of bookings for a given trip N days before departure.
Example ...
0
votes
0
answers
41
views
Definition(s) of "data augmentation"
The first paragraph of the Wikipedia page for "data augmentation" seems to conflate two different meanings of the term.
The more classical definition comes from Bayesian computation: ...
0
votes
0
answers
59
views
Extending the TVD-MI mechanism beyond information-based questions for scalable oversight [closed]
TVD-MI (Total Variation Distance–Mutual Information) has been proposed as a mechanism for evaluating the trustworthiness of judges (such as LLMs scoring code correctness or theorem validity) without ...
0
votes
0
answers
16
views
Clarifying notation for agent/item indices in TVD-MI mechanism
In the context of the TVD-MI (Total Variation Distance–Mutual Information) mechanism described by Zachary Robertson et al., what precisely do the indices (i, j) represent? Specifically, are (i, j) ...
1
vote
0
answers
37
views
Designing a demand forecasting model with a dynamic daily update and a final horizon prediction — best practices to avoid leakage?
I am working on a demand forecasting problem for ferry vehicle capacity.
For each voyage, I have daily snapshots of the cumulative reservations from the opening date until departure day.
So each ...
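One common safeguard for snapshot data like this, sketched under assumptions, is to split by voyage so that every snapshot of a given voyage lands on the same side of the train/test boundary. The field names (`voyage_id`, `days_before`, `booked_so_far`, `final_booked`) and the toy values are hypothetical, and this is only one possible approach, not necessarily the one the question settles on.

```python
import random

def split_by_voyage(rows, test_fraction=0.25, seed=0):
    """Split snapshot rows so all snapshots of a voyage stay together."""
    voyage_ids = sorted({row["voyage_id"] for row in rows})
    rng = random.Random(seed)
    rng.shuffle(voyage_ids)
    # Hold out whole voyages, at least one.
    n_test = max(1, round(len(voyage_ids) * test_fraction))
    test_voyages = set(voyage_ids[:n_test])
    train = [r for r in rows if r["voyage_id"] not in test_voyages]
    test = [r for r in rows if r["voyage_id"] in test_voyages]
    return train, test

# Hypothetical toy data: three daily snapshots for each of four voyages.
rows = [
    {"voyage_id": v, "days_before": d, "booked_so_far": b, "final_booked": f}
    for v, f in [("A", 40), ("B", 55), ("C", 30), ("D", 62)]
    for d, b in [(14, f // 4), (7, f // 2), (1, f - 3)]
]

train, test = split_by_voyage(rows)
```

Splitting on rows instead of voyages would let a late snapshot of a voyage sit in training while an early snapshot of the same voyage sits in the test set, which effectively reveals the target.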
3
votes
2
answers
67
views
Should the minimum and maximum of each feature be contained in the train set for machine learning?
When using machine learning algorithms for regression, I know that the prediction of the final model will be best when the features are within the ranges used for training, to avoid extrapolation. ...
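A minimal sketch of the range check this concern implies: flag any feature of a new observation that falls outside the minimum and maximum seen in training. The function name `out_of_range_features` and the feature names are hypothetical, chosen only for illustration.

```python
def out_of_range_features(train_rows, new_row):
    """Return names of features in new_row outside the [min, max] seen in training."""
    flagged = []
    for name, value in new_row.items():
        seen = [row[name] for row in train_rows]
        if value < min(seen) or value > max(seen):
            flagged.append(name)
    return flagged

# Hypothetical training rows and one new observation.
train_rows = [
    {"temp": 10.0, "load": 1.2},
    {"temp": 25.0, "load": 3.4},
    {"temp": 18.0, "load": 2.0},
]
flagged = out_of_range_features(train_rows, {"temp": 31.0, "load": 2.5})
```

A per-feature check like this only catches extrapolation along single axes; a point can sit inside every individual range and still be far from all training points in the joint feature space.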