Computer Science

Questions tagged [gradient-descent]


1 vote
0 answers
33 views

Conditions on LR in Gradient Descent

In Introductory Lectures on Convex Optimization by Yurii Nesterov, Section 1.2.3 shows that gradient descent is guaranteed to converge if the step size is chosen either as a fixed value or ...
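
For reference, the standard argument sketched from $L$-smoothness (not a quote of Nesterov's exact constants): a step $x^+ = x - h\nabla f(x)$ satisfies

$$f(x^+) \le f(x) - h\left(1 - \frac{Lh}{2}\right)\|\nabla f(x)\|^2,$$

so any fixed step size $h \in (0, 2/L)$ guarantees monotone decrease, and $h = 1/L$ gives a per-step decrease of at least $\frac{1}{2L}\|\nabla f(x)\|^2$.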
0 votes
2 answers
382 views

Find minimum of a function only knowing the ordering of a set of input points

Suppose I have a function $f: \mathbb{R}^n\rightarrow\mathbb{R}$. All I know about the function is a set of pairs of vectors $(\vec{v}_a, \vec{v}_b)$ for which I know which one is greater (i....
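
One generic comparison-only approach (a sketch, not tailored to the asker's fixed set of pairs): direct search accepts a move only when a comparison oracle certifies improvement. The oracle `less` below is a hypothetical stand-in for lookups against the known ordered pairs.

```python
# Compass (direct) search: only ever asks "is f(u) < f(v)?", never f's value.
import numpy as np

def less(u, v):
    # Hypothetical comparison oracle; here simulated with a known function.
    f = lambda x: np.sum((x - 3.0) ** 2)
    return f(u) < f(v)

def compass_search(x, step=1.0, tol=1e-6, max_iter=10_000):
    n = len(x)
    for _ in range(max_iter):
        improved = False
        for d in np.vstack([np.eye(n), -np.eye(n)]):  # +/- coordinate moves
            cand = x + step * d
            if less(cand, x):        # accept comparison-certified improvement
                x, improved = cand, True
        if not improved:
            step *= 0.5              # nothing improved: shrink the step
            if step < tol:
                break
    return x

print(compass_search(np.zeros(2)))   # converges to roughly [3, 3]
```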
1 vote
1 answer
174 views

What does RSGD stand for?

I'm reading a paper that involves an algorithm for RSGD. It's clearly a form of stochastic gradient descent, but I can't find what the R stands for. The authors provide their own implementation of it, ...
1 vote
0 answers
55 views

Understanding gradient flow of a linearized wide neural network

I've been trying to fully understand the paper "Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent" (available here), but I'm stuck on the linearization part, ...
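
The linearization in question is a first-order Taylor expansion of the network output in its parameters around the initialization $\theta_0$: $f_{\mathrm{lin}}(x;\theta) = f(x;\theta_0) + \nabla_\theta f(x;\theta_0)^\top(\theta-\theta_0)$. A toy numpy sketch (the model `f` is an illustrative stand-in, not the paper's wide-network setting):

```python
import numpy as np

def f(theta, x):
    # Toy scalar model standing in for a network's output on input x.
    return theta[2] * np.tanh(theta[0] * x + theta[1])

def grad_theta(theta, x, eps=1e-6):
    # Finite-difference gradient of f with respect to the parameters.
    g = np.zeros_like(theta)
    for i in range(len(theta)):
        t = theta.copy(); t[i] += eps
        g[i] = (f(t, x) - f(theta, x)) / eps
    return g

theta0 = np.array([0.5, -0.2, 1.5])

def f_lin(theta, x):
    # f_lin(x; theta) = f(x; theta0) + grad f(x; theta0) . (theta - theta0)
    return f(theta0, x) + grad_theta(theta0, x) @ (theta - theta0)

theta = theta0 + 1e-3 * np.array([1.0, -1.0, 0.5])
print(f(theta, 0.7), f_lin(theta, 0.7))  # nearly identical close to theta0
```

The paper's claim is that for sufficiently wide networks the parameters stay so close to $\theta_0$ during training that this linear model describes the whole gradient-descent trajectory.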
0 votes
0 answers
179 views

Create a simple neural network of $n$ layers in Python from scratch with numpy to solve the XOR example problem using batch gradient descent

I'm a young programmer who got interested in machine learning. I watched videos and read articles about the theory behind simple neural networks. However, I can't manage to set one up correctly. I've ...
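
A minimal sketch of the kind of setup being described, assuming a 2-4-1 sigmoid network trained with batch gradient descent on squared error (hyperparameters are illustrative; tiny networks can still get stuck on XOR, so the random seed matters):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # input -> 4 hidden units
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # hidden -> 1 output
lr = 1.0

for _ in range(5000):
    h = sigmoid(X @ W1 + b1)            # forward pass over the whole batch
    p = sigmoid(h @ W2 + b2)
    dp = (p - y) * p * (1 - p)          # backprop of 0.5 * sum((p - y)^2)
    dW2, db2 = h.T @ dp, dp.sum(0)
    dh = (dp @ W2.T) * h * (1 - h)
    dW1, db1 = X.T @ dh, dh.sum(0)
    W1 -= lr * dW1; b1 -= lr * db1      # batch gradient descent update
    W2 -= lr * dW2; b2 -= lr * db2

print(np.round(p.ravel(), 2))           # should approach [0, 1, 1, 0]
```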
0 votes
0 answers
122 views

How to calculate the upper bound of the gradient of a multi-layer ReLU neural network

Layers: We shall denote in the following the layer number by the upper script $\ell$. We have $\ell=0$ for the input layer, $\ell=1$ for the first hidden layer, and $\ell=L$ for the output layer. The ...
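
One standard bound, sketched in this notation: ReLU's derivative lies in $\{0,1\},ドル so layer $\ell$'s input-output Jacobian is $D^{(\ell)} W^{(\ell)}$ with $D^{(\ell)}$ diagonal and $\|D^{(\ell)}\|_2 \le 1$. The chain rule then gives

$$\left\|\nabla_x f(x)\right\|_2 = \left\|\prod_{\ell=L}^{1} D^{(\ell)} W^{(\ell)}\right\|_2 \le \prod_{\ell=1}^{L} \left\|W^{(\ell)}\right\|_2,$$

i.e. the product of the layers' spectral norms upper-bounds the gradient everywhere the network is differentiable.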
2 votes
0 answers
47 views

Convergence rate of quasi-Newton method for a non-convex objective function

Consider a real-valued $L$-smooth and non-convex objective function $f: \mathbb{R}^n \rightarrow \mathbb{R}$. There exists a bound on the number of iterations needed to find a (local) minimum using ordinary ...
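
For comparison, the non-convex baseline the question builds on: gradient descent with step size $1/L$ on an $L$-smooth $f$ satisfies

$$\min_{0 \le k < K} \|\nabla f(x_k)\|^2 \le \frac{2L\,\big(f(x_0) - f^\star\big)}{K},$$

so $O(\epsilon^{-2})$ iterations suffice to reach an $\epsilon$-stationary point (note the guarantee is stationarity, not a certified local minimum).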
1 vote
1 answer
213 views

Why, when a function is quadratic, is the approximation by Newton's method exact, and why does the algorithm converge to the global minimum in a single step?

Suppose we want to find the value of $x$ that minimizes $$ f(x)=\frac{1}{2}\|A x-b\|_{2}^{2} . $$ Specialized linear algebra algorithms can solve this problem efficiently; however, we can also explore ...
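
Concretely, the Newton step can be written out for this objective: $\nabla f(x) = A^{\top}(Ax - b)$ and $\nabla^2 f(x) = A^{\top}A,ドル so from any starting point $x_0$ (assuming $A^{\top}A$ is invertible)

$$x_1 = x_0 - (A^{\top}A)^{-1}A^{\top}(Ax_0 - b) = (A^{\top}A)^{-1}A^{\top}b,$$

which is exactly the normal-equations solution. The quadratic equals its own second-order Taylor model, so the single step lands on the global minimizer.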
1 vote
1 answer
110 views

The preliminary of the Bandit Gradient Algorithm

In the papers introducing the Bandit Gradient Algorithm as Stochastic Gradient Ascent, the following relationship is always treated as a preliminary, without proof. Does anyone know how ...
1 vote
0 answers
124 views

RMSProp Momentum and Decay

I'm building an application on top of MobileNetV2, and according to its paper: We train our models using TensorFlow. We use the standard RMSPropOptimizer with both decay and momentum set to 0.9. We use ...
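
For intuition, one common formulation of an RMSProp update with both a "decay" (here `rho`) and a momentum term, roughly in the spirit of TensorFlow's RMSPropOptimizer; treat the exact form as an assumption, since implementations differ in details:

```python
import numpy as np

def rmsprop_step(w, g, ms, mom, lr=0.01, rho=0.9, momentum=0.9, eps=1e-7):
    """One RMSProp-with-momentum update for weights w and gradient g."""
    ms = rho * ms + (1 - rho) * g**2                   # decayed mean of squared grads
    mom = momentum * mom + lr * g / np.sqrt(ms + eps)  # momentum on the scaled grad
    return w - mom, ms, mom
```

Here "decay" is the moving-average coefficient on the squared gradients, while momentum accumulates the already-rescaled steps; setting both to 0.9 matches the quoted passage.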
1 vote
0 answers
35 views

Reinforcement learning with 0 rewards and costs

Suppose we have a hallway environment, i.e., $N$ nodes from left to right, and we can move either left or right. Moving left at the leftmost node does nothing, and reaching the rightmost node gives you ...
1 vote
0 answers
51 views

Searching for the underlying affine transformation in a ridge function

Quoting from Wikipedia: A ridge function is any function $f:\mathbb{R}^d\rightarrow\mathbb{R}$ that can be written as the composition of a univariate function with an affine transformation, that is: $...
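
For reference, the quoted Wikipedia definition continues (up to notation) as $f(\boldsymbol{x}) = g(\boldsymbol{a} \cdot \boldsymbol{x})$ for some univariate $g: \mathbb{R} \rightarrow \mathbb{R}$ and direction $\boldsymbol{a} \in \mathbb{R}^d$; the title's problem is recovering that affine part from access to $f$ alone.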
0 votes
0 answers
347 views

Coordinate descent for Lasso: question about the algorithm

I'm not sure why the algorithm computes $c_k$ with $\sum_{j \neq k} w_j x_{i, j}$. Why does one need to ignore the $k^{th}$ feature here? I'm not sure how this is derived. Is this the result of taking ...
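
The reason, sketched (up to constant-factor conventions in the Lasso objective): coordinate descent re-optimizes $w_k$ with all other weights held fixed, so the relevant target is the partial residual $r_i^{(k)} = y_i - \sum_{j \ne k} w_j x_{i,j}$. With $c_k = \sum_i x_{i,k}\, r_i^{(k)}$ and $a_k = \sum_i x_{i,k}^2,ドル the subgradient optimality condition for the univariate problem $\min_{w_k} \frac{1}{2}\sum_i \big(r_i^{(k)} - w_k x_{i,k}\big)^2 + \lambda |w_k|$ yields the soft-thresholding update

$$w_k \leftarrow \frac{S(c_k, \lambda)}{a_k}, \qquad S(c, \lambda) = \operatorname{sign}(c)\,\max(|c| - \lambda,\, 0).$$

The $k$-th feature is left out of the sum precisely because $w_k$ is the variable being solved for, not because it is ignored.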
1 vote
1 answer
330 views

How does Gradient Descent treat multiple features?

As far as I know, when you reach the step in a gradient descent algorithm where you calculate step_size, you calculate ...
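
A minimal sketch of the multi-feature case, assuming linear regression (names are illustrative): the gradient is a vector with one component per feature, and a single learning rate scales every component of the simultaneous update.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))               # 100 samples, 3 features
true_theta = np.array([2.0, -1.0, 0.5])
y = X @ true_theta + 0.01 * rng.normal(size=100)

theta, lr = np.zeros(3), 0.1
for _ in range(500):
    grad = X.T @ (X @ theta - y) / len(y)   # dJ/dtheta_j for every feature at once
    theta -= lr * grad                      # per-feature step = lr * grad_j
print(theta)                                # recovers roughly [2, -1, 0.5]
```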
2 votes
1 answer
63 views

SGD statistical guarantee

I have a question regarding online learning with SGD. Is there a way to give a statistical guarantee that the value obtained after $n$ samples deviates by at most $\epsilon$ from the true value?
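
One standard route, under assumptions the question leaves open (convex $f,ドル bounded stochastic gradients, averaged iterate $\bar{x}_n$): SGD gives $\mathbb{E}\big[f(\bar{x}_n) - f^\star\big] \le C/\sqrt{n},ドル and Markov's inequality turns this into $\Pr\big[f(\bar{x}_n) - f^\star \ge \epsilon\big] \le C/(\epsilon\sqrt{n})$; sharper high-probability bounds are available under stronger noise assumptions.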
