
# Book Notes: Optimization Algorithms

## 8.3 Basic Algorithms

Most deep learning algorithms involve optimization of some sort. Optimization refers to the task of either minimizing or maximizing some function $f(x)$ by altering $x$. The function we want to minimize or maximize is called the objective function, or criterion. When we are minimizing it, we may also call it the cost function, loss function, or error function.
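A minimal sketch of what "minimizing $f(x)$ by altering $x$" means in practice: the objective below and the grid-search approach are illustrative choices of mine, not something from the book.

```python
import numpy as np

# A toy objective (cost) function: f(x) = (x - 3)^2, minimized at x = 3.
# The function and its name are illustrative, not from the book.
def f(x):
    return (x - 3.0) ** 2

# "Minimizing f(x) by altering x": evaluate f over a grid of candidate
# x values and keep the one with the smallest cost.
xs = np.linspace(-10.0, 10.0, 2001)
best_x = xs[np.argmin(f(xs))]
print(best_x, f(best_x))  # approximately 3.0 and 0.0
```

Grid search only works for a one-dimensional toy problem; the algorithms in this chapter instead use derivative information to decide how to alter $x$.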

Suppose we have a function $y = f(x)$, where both $x$ and $y$ are real numbers. The derivative of this function is denoted as $f'(x)$ or as $\frac{dy}{dx}$. The derivative $f'(x)$ gives the slope of $f(x)$ at the point $x$. In other words, it specifies how to scale a small change in the input to obtain the corresponding change in the output: $f(x + \epsilon) \approx f(x) + \epsilon f'(x)$.
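A small sketch, under my own illustrative choices (the quadratic $f$, the step size `lr`, and the starting point are assumptions, not from the book): it numerically checks the first-order approximation and then uses the derivative to step $x$ downhill, which is the idea behind the gradient descent variants of this section.

```python
import numpy as np

def f(x):
    return (x - 3.0) ** 2

def f_prime(x):
    # Derivative of the toy objective: f'(x) = 2(x - 3)
    return 2.0 * (x - 3.0)

# Check the first-order approximation f(x + eps) ~= f(x) + eps * f'(x)
# at an arbitrary point x for a small eps.
x, eps = 0.5, 1e-3
print(f(x + eps))                   # 6.245001
print(f(x) + eps * f_prime(x))      # 6.245  (nearly identical)

# The derivative also says which way to move x to reduce f: stepping
# against the sign of f'(x) (a gradient descent update) lowers the cost.
lr = 0.1  # illustrative step size (learning rate)
for _ in range(50):
    x = x - lr * f_prime(x)
print(x, f(x))  # x has moved close to the minimizer 3.0, f(x) close to 0
```

The update `x - lr * f_prime(x)` is exactly the "scale a small change in the input" idea: moving $x$ by $-\epsilon f'(x)$ changes the output by roughly $-\epsilon f'(x)^2 \le 0$, so the cost cannot increase for a small enough step.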