Abstract

Sometimes you have a data set in which the data points are not all equally relevant. Here I will walk through a variant of the standard derivation for least squares regression that incorporates an arbitrary weight vector, allowing individual data points to have a greater or lesser effect on the model's fit.

Method

Essentially this is the standard derivation of closed-form least squares regression (LSR), with the added complication of a weight vector $w$. Begin by stating the model we want to fit,
$$
\newcommand{\mysum}{\sum}
\newcommand{\ew}{\epsilon_w}
\newcommand{\w}{\sum w_i}
\newcommand{\wx}{\sum w_i X_i}
\newcommand{\wy}{\sum w_i Y_i}
\newcommand{\wxy}{\sum w_i X_i Y_i}
\newcommand{\wxx}{\sum w_i X_i^2}
\newcommand{\linesp}{\\[16pt]}
\hat{Y}_i = a + bX_i \label{a} \tag{1}
$$
for $i = 1, \dots, n$, where $n$ is the total number of observations. Our goal is to find $a$ and $b$ that minimize the total **weighted** square error $\epsilon_w$,
$$ \epsilon_w = \sum_{i=1}^n w_i (Y_i - \hat{Y}_i)^2 \label{b} \tag{2}$$
Notice that each observation's contribution to the total error is scaled by the corresponding $w_i$, so we can dial the influence of any individual observation up or down simply by changing $w$. Combining ($\ref{a}$) and ($\ref{b}$) gives
$$ \epsilon_w = \sum_{i=1}^n w_i (Y_i - a - bX_i)^2 \label{c} \tag{3}$$
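Equation (3) is straightforward to evaluate directly. The following is a minimal sketch in Python with NumPy; the data values and the names `x`, `y`, `w`, and `weighted_sq_error` are illustrative, not from the original.

```python
import numpy as np

# Illustrative toy data; the third point is down-weighted.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])
w = np.array([1.0, 1.0, 0.1, 1.0])

def weighted_sq_error(a, b, x, y, w):
    """Total weighted square error, equation (3): sum of w_i * (Y_i - a - b*X_i)^2."""
    residuals = y - a - b * x
    return float(np.sum(w * residuals ** 2))
```

With all weights equal to 1 this reduces to the usual unweighted sum of squared residuals, and setting a point's weight to 0 removes it from the error entirely.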
So now that we have the error in terms of $a$ and $b$, we just need to minimize it using some basic multivariable calculus. To find the minimizing values, take the partial derivative of ($\ref{c}$) with respect to each parameter and set it to zero:
$$ \frac{\partial \ew}{\partial a} = -2 \sum_{i=1}^n w_i (Y_i - a - bX_i) = 0 \linesp \frac{\partial \ew}{\partial b} = -2 \sum_{i=1}^n w_i X_i (Y_i - a - bX_i) = 0 $$
Dividing out the $-2$ and distributing the sums yields the two normal equations,
$$ \wy = a \w + b \wx \linesp \wxy = a \wx + b \wxx $$
Solving the first equation for $a$ and substituting the result into the second gives $b$.

So we have our final closed-form solution, where $b$ is computed first and then substituted into the expression for $a$: $$ a = \frac{\sum w_i Y_i - b \sum w_i X_i}{\sum w_i} \linesp b = \frac{\w \wxy - \wx \wy}{\w \wxx - \wx^2} $$
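The closed-form solution translates directly into code. This is a minimal NumPy sketch; the function name and variable names are mine, not from the original.

```python
import numpy as np

def weighted_least_squares(x, y, w):
    """Closed-form weighted least squares fit returning (a, b),
    using the five weighted sums from the derivation above."""
    Sw   = np.sum(w)           # sum of w_i
    Swx  = np.sum(w * x)       # sum of w_i * X_i
    Swy  = np.sum(w * y)       # sum of w_i * Y_i
    Swxy = np.sum(w * x * y)   # sum of w_i * X_i * Y_i
    Swxx = np.sum(w * x ** 2)  # sum of w_i * X_i^2
    b = (Sw * Swxy - Swx * Swy) / (Sw * Swxx - Swx ** 2)
    a = (Swy - b * Swx) / Sw
    return a, b

# Sanity check: data lying exactly on y = 1 + 2x is recovered
# exactly for any positive weights.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x
a, b = weighted_least_squares(x, y, np.array([1.0, 2.0, 3.0, 4.0]))
# a ≈ 1.0, b ≈ 2.0
```

As a cross-check, `np.polyfit(x, y, 1, w=np.sqrt(w))` should agree, since `polyfit` multiplies the residuals by its weights *before* squaring, so passing $\sqrt{w_i}$ minimizes the same $\sum w_i (Y_i - \hat{Y}_i)^2$.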
