Questions tagged [gradient-descent]
The gradient-descent tag has no summary.
47 questions
1 vote · 0 answers · 33 views
Conditions on LR in Gradient Descent
In Introductory Lectures on Convex Optimization by Yurii Nesterov, Section 1.2.3 shows that gradient descent is guaranteed to converge if the step size is chosen either as a fixed constant or ...
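For context, the usual form of a fixed-step guarantee for an $L$-smooth function (a sketch of the standard descent lemma, not necessarily Nesterov's exact statement):
$$
f\bigl(x - h\nabla f(x)\bigr) \le f(x) - h\Bigl(1 - \frac{hL}{2}\Bigr)\|\nabla f(x)\|^{2},
$$
so any fixed step size $h \in (0, 2/L)$ decreases the objective at every iteration, and $h = 1/L$ maximizes the guaranteed per-step decrease.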
0 votes · 2 answers · 382 views
Find minimum of a function only knowing the ordering of a set of input points
Suppose I have a function $f: \mathbb{R}^n\rightarrow\mathbb{R}$. All I know about the function is that I have a set of pairs of vectors ($\vec{v}_a$, $\vec{v}_b$) for which I know which one is greater (i....
1 vote · 1 answer · 174 views
What does RSGD stand for?
I'm reading a paper that involves an algorithm for RSGD. It's clearly a form of stochastic gradient descent, but I can't find what the R stands for. The authors provide their own implementation of it, ...
1 vote · 0 answers · 55 views
Understanding gradient flow of a linearized wide neural network
I've been trying to fully understand the paper "Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent" (available here), but I'm stuck on the linearization part, ...
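A compact way to state the linearization the question refers to (a sketch; $\theta_0$ denotes the parameters at initialization):
$$
f_{\mathrm{lin}}(x;\theta) = f(x;\theta_0) + \nabla_{\theta} f(x;\theta_0)^{\top}(\theta - \theta_0),
$$
i.e. a first-order Taylor expansion of the network output in its parameters; the paper argues that, in the infinite-width limit, gradient-descent training of the full network closely tracks training of this linear-in-$\theta$ model.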
0 votes · 0 answers · 179 views
Create a simple neural network of n layers in Python from scratch with NumPy to solve the XOR example problem using batch gradient descent
I'm a young programmer who became interested in machine learning. I watched videos and read articles about the theory behind simple neural networks. However, I can't manage to set one up correctly. I've ...
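For readers after a reference point, here is a minimal sketch of a two-layer network trained on XOR with full-batch gradient descent in NumPy; the hidden size, learning rate, and epoch count are illustrative choices, not taken from the question.

import numpy as np

# XOR data: 4 examples, 2 features
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
W1 = rng.normal(0.0, 1.0, (2, 4))  # input -> hidden weights
b1 = np.zeros((1, 4))
W2 = rng.normal(0.0, 1.0, (4, 1))  # hidden -> output weights
b2 = np.zeros((1, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for epoch in range(5000):
    # forward pass on the whole batch
    h = sigmoid(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)
    # mean squared error gradient w.r.t. the predictions
    dp = 2.0 * (p - y) / len(X)
    # backward pass (chain rule through both sigmoid layers)
    dz2 = dp * p * (1 - p)
    dW2, db2 = h.T @ dz2, dz2.sum(axis=0, keepdims=True)
    dz1 = (dz2 @ W2.T) * h * (1 - h)
    dW1, db1 = X.T @ dz1, dz1.sum(axis=0, keepdims=True)
    # batch gradient descent update
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(np.round(p, 2))  # should approach [[0], [1], [1], [0]]; another seed may need more epochs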
0 votes · 0 answers · 122 views
How to calculate the upper bound of the gradient of a multi-layer ReLU neural network
Layers: In the following we shall denote the layer number by the superscript $\ell$. We have $\ell=0$ for the input layer, $\ell=1$ for the first hidden layer, and $\ell=L$ for the output layer. The ...
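One common line of reasoning for such a bound (a sketch for the input gradient, in the question's layer notation, assuming weight matrices $W^{\ell}$ and the 1-Lipschitz ReLU activation):
$$
\|\nabla_{x} f(x)\| \le \prod_{\ell=1}^{L} \|W^{\ell}\|_{2},
$$
since the Jacobian of layer $\ell$ is $D^{\ell} W^{\ell}$ with $D^{\ell}$ a diagonal matrix of ReLU derivatives in $\{0,1\},ドル so $\|D^{\ell}\|_{2} \le 1$.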
2 votes · 0 answers · 47 views
Convergence rate of quasi-newton method for non-convex objective function
Consider a real-valued $L$-smooth and non-convex objective function $f: \mathbb{R}^n \rightarrow \mathbb{R}$. There exists a bound on the number of iterations needed to find a (local) minimum using ordinary ...
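For reference, the bound the excerpt likely alludes to for ordinary gradient descent on an $L$-smooth (possibly non-convex) $f$ with step size $1/L$ is, in sketch form,
$$
\min_{0 \le k < K} \|\nabla f(x_k)\|^{2} \le \frac{2L\bigl(f(x_0) - f^{\star}\bigr)}{K},
$$
so driving the gradient norm below $\varepsilon$ takes $O(1/\varepsilon^{2})$ iterations; the question is whether quasi-Newton methods admit an analogous worst-case rate.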
1 vote · 1 answer · 213 views
Why, when a function is quadratic, is the approximation by Newton's method exact, so that the algorithm converges to the global minimum in a single step?
Suppose we want to find the value of $x$ that minimizes
$$
f(x)=\frac{1}{2}\|A x-b\|_{2}^{2} .
$$
Specialized linear algebra algorithms can solve this problem efficiently; however, we can also explore ...
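A short worked check of the single-step claim for this objective (assuming $A^{\top}A$ is invertible):
$$
\nabla f(x) = A^{\top}(Ax - b), \qquad \nabla^{2} f(x) = A^{\top}A,
$$
so from any starting point $x_{0}$ one Newton step gives
$$
x_{1} = x_{0} - (A^{\top}A)^{-1}A^{\top}(Ax_{0} - b) = (A^{\top}A)^{-1}A^{\top}b,
$$
the least-squares minimizer: a quadratic equals its own second-order Taylor expansion, so Newton's model of the function is exact.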
1 vote · 1 answer · 110 views
The preliminary of the Bandit Gradient Algorithm
In the papers introducing The Bandit Gradient Algorithm as Stochastic Gradient Ascent, the following relationship (not visible in this excerpt) is always treated as a preliminary and stated without proof. Does anyone know how ...
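The relationship itself is not visible in the excerpt; if the paper follows the usual gradient-bandit derivation (e.g. Sutton &amp; Barto), it is presumably of the form
$$
\frac{\partial\, \mathbb{E}[R_{t}]}{\partial H_{t}(a)} = \mathbb{E}\Bigl[\bigl(R_{t} - B_{t}\bigr)\bigl(\mathbf{1}\{A_{t} = a\} - \pi_{t}(a)\bigr)\Bigr],
$$
with action preferences $H_{t}(a),ドル softmax policy $\pi_{t},ドル and an arbitrary baseline $B_{t}$; this is an assumption about the omitted content, not a quotation from the paper.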
1 vote · 0 answers · 124 views
RMSProp Momentum and Decay
I'm making an application of MobileNetV2 and according to their article:
"We train our models using TensorFlow. We use the standard RMSPropOptimizer with both decay and momentum set to 0.9. We use ..."
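A minimal sketch of how that setting is often reproduced with the Keras RMSprop optimizer, assuming the article's "decay" refers to the squared-gradient discounting factor (the rho argument) rather than a learning-rate schedule, which is part of what the question is asking; the learning rate below is a placeholder, not a value from the article.

import tensorflow as tf

# RMSprop with the moving-average discounting factor ("decay") and momentum both 0.9,
# as in the MobileNetV2 excerpt; learning_rate is illustrative only.
optimizer = tf.keras.optimizers.RMSprop(learning_rate=0.045, rho=0.9, momentum=0.9)

# Example usage with a stand-in Keras model:
model = tf.keras.Sequential([tf.keras.layers.Dense(10, activation="softmax")])
model.compile(optimizer=optimizer, loss="categorical_crossentropy")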
1 vote · 0 answers · 35 views
Reinforcement learning with 0 rewards and costs
Suppose we have a hallway environment, i.e., $N$ nodes from left to right, and we can either move left or right. Moving left at the leftmost node does nothing, and reaching the rightmost node gives you ...
1 vote · 0 answers · 51 views
Searching for the underlying affine transformation in a ridge function
Quoting from Wikipedia:
A ridge function is any function $f:\mathbb{R}^d\rightarrow\mathbb{R}$ that can be written as the composition of a univariate function with an affine transformation, that is: $...
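The truncated definition presumably continues in the standard form (a sketch of the usual statement, following the excerpt's wording):
$$
f(\vec{x}) = g\bigl(\vec{a}\cdot\vec{x} + b\bigr), \qquad g:\mathbb{R}\rightarrow\mathbb{R},\ \vec{a}\in\mathbb{R}^{d},\ b\in\mathbb{R},
$$
and the question is then about recovering the underlying $\vec{a}$ (and $b$) given access to $f$.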
0 votes · 0 answers · 347 views
Coordinate descent for Lasso, Question about algorithm
I'm not sure why the algorithm computes $c_k$ with $\sum_{j \neq k} w_j x_{i, j}$. Why does one need to ignore the $k^{th}$ feature here? I'm not sure how this is derived. Is this the result of taking ...
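A sketch of where that term comes from, in the common notation (features $x_{i,j},ドル targets $y_{i},ドル coordinate $w_{k}$ updated with all other coordinates held fixed): differentiating only the quadratic part with respect to $w_{k}$ gives
$$
\frac{\partial}{\partial w_{k}} \frac{1}{2}\sum_{i}\Bigl(y_{i} - \sum_{j} w_{j} x_{i,j}\Bigr)^{2}
= -\sum_{i} x_{i,k}\Bigl(y_{i} - \sum_{j \neq k} w_{j} x_{i,j}\Bigr) + w_{k}\sum_{i} x_{i,k}^{2},
$$
so $c_{k} = \sum_{i} x_{i,k}\bigl(y_{i} - \sum_{j \neq k} w_{j} x_{i,j}\bigr)$ is the correlation of feature $k$ with the partial residual, i.e. with what the other features have not yet explained; the soft-thresholding step for $w_{k}$ is then applied to this $c_{k}$.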
1 vote · 1 answer · 330 views
How does Gradient Descent treat multiple features?
As far as I know, when you reach the step in a gradient descent algorithm where you calculate step_size, you calculate ...
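A minimal sketch of the usual vectorized treatment, using linear regression as a stand-in model (the data and learning rate below are illustrative): every feature gets its own partial derivative, and all weights are updated simultaneously with the same learning rate.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))            # 100 samples, 3 features
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=100)

w = np.zeros(3)                          # one weight per feature
lr = 0.1                                 # single step size shared by all features
for _ in range(500):
    residual = X @ w - y                 # shape (100,)
    grad = X.T @ residual / len(y)       # shape (3,): one partial derivative per feature
    w -= lr * grad                       # all features updated in the same step

print(np.round(w, 2))                    # close to [ 2.  -1.   0.5]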
2 votes · 1 answer · 63 views
SGD statistical guarantee
I have a question regarding online learning with SGD. Is there a way to give a statistical guarantee that the value obtained after $n$ samples deviates by at most $\epsilon$ from the true value?
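One standard result in this direction (a sketch; it assumes a convex objective, stochastic gradients bounded by $G,ドル a feasible set of diameter $D,ドル a suitably chosen step size, and averaged iterates, which may or may not match the intended setting):
$$
\mathbb{E}\bigl[f(\bar{x}_{n})\bigr] - f(x^{\star}) \le \frac{D G}{\sqrt{n}},
$$
where $\bar{x}_{n}$ is the average of the first $n$ iterates; high-probability versions of the same $O(1/\sqrt{n})$ rate exist, so an accuracy of $\epsilon$ needs on the order of $1/\epsilon^{2}$ samples.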