Abstract

Sometimes the points in a data set are not all equally relevant. Here I walk through a variant of the standard least squares regression derivation that incorporates an arbitrary weight vector, allowing individual data points to have a greater or lesser effect on the model's fit.

Method

This is essentially the standard closed-form derivation of least squares regression (LSR), with the added complication of a weight vector $w$. Begin by stating the model we want to fit,
$$\newcommand{\mysum}{\sum} \newcommand{\ew}{\epsilon_w} \newcommand{\w}{\sum w_i} \newcommand{\wx}{\sum w_i X_i} \newcommand{\wy}{\sum w_i Y_i} \newcommand{\wxy}{\sum w_i X_i Y_i} \newcommand{\wxx}{\sum w_i X_i^2} \newcommand{\linesp}{\\[16pt]} \hat{Y}_i = a + bX_i \label{a} \tag{1}$$
for $i \in \{1, \dots, n\}$, where $n$ is the total number of observations. Our goal is to find the $a$ and $b$ that minimize the total weighted squared error $\epsilon_w$,
$$\epsilon_w = \sum_{i=1}^n w_i (Y_i - \hat{Y}_i)^2 \label{b} \tag{2}$$
Notice that each observation's contribution to the total error is scaled by the corresponding $w_i$. This means that we can increase or lessen the impact of individual observations by changing $w$. Combining ($\ref{a}$) and ($\ref{b}$) we have
$$\epsilon_w = \sum_{i=1}^n w_i (Y_i - a - bX_i)^2 \label{c} \tag{3}$$
Now that we have the error in terms of $a$ and $b$, we just need to minimize it using some basic multivariable calculus. At the minimum, both partial derivatives of $\epsilon_w$ must equal $0$, so we look for the $a$ and $b$ that satisfy both conditions. We begin by taking the partial derivative with respect to $a$.
Note: for clarity, I will omit the bounds on all sums from here on; every sum runs over $i = 1, \dots, n$.
\begin{align}
\frac{\partial \ew}{\partial a} & = \mysum -2w_i (Y_i - a - bX_i) \linesp
& = 2 \mysum (w_i a + w_i b X_i - w_i Y_i) \linesp
& = 2 (a \mysum w_i + b \mysum w_i X_i - \mysum w_i Y_i) \label{d} \tag{4}
\end{align}
Recalling that we want this derivative to equal $0$,
\begin{align}
0 & = 2 (a \mysum w_i + b \mysum w_i X_i - \mysum w_i Y_i) \linesp
a \mysum w_i & = \mysum w_i Y_i - b \mysum w_i X_i \linesp
a & = \frac{\sum w_i Y_i - b \sum w_i X_i}{\sum w_i} \label{e} \tag{5}
\end{align}
That gives us our formula for the intercept $a$, which leaves only the slope $b$. For it we follow a similar, though slightly more involved, strategy. Setting the partial derivative with respect to $b$ to $0$ and dividing through by $2$,
\begin{align}
\frac{\partial \ew}{\partial b} & = \mysum -2 w_i X_i (Y_i - a - bX_i) = 0 \linesp
0 & = a \mysum w_i X_i + b \mysum w_i X_i^2 - \mysum w_i X_i Y_i \label{f} \tag{6}
\end{align}
From here we substitute our formula for $a$ from ($\ref{e}$),
\begin{align}
\frac{\wx \wy - b(\wx)^2}{\w} + b\wxx - \wxy & = 0 \linesp
\frac{\wx \wy + b (\w \wxx - (\wx)^2)}{\w} - \wxy & = 0 \linesp
b \frac{\w \wxx - (\wx)^2}{\w} & = \wxy - \frac{\wx \wy}{\w}
\end{align}
Finally, multiplying both sides by $\sum w_i$ and dividing to isolate $b$,
$$b = \frac{\w \wxy - \wx \wy}{\w \wxx - (\wx)^2} \label{g} \tag{7}$$
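
As a quick sanity check of equations (5) and (7), here is a small worked example. The data are made up purely for illustration and do not come from the reference: take $X = (0, 1, 2)$, $Y = (1, 2, 4)$ and $w = (1, 2, 1)$, so the middle observation counts twice as much as the other two. The weighted sums are
$$\sum w_i = 4, \quad \sum w_i X_i = 4, \quad \sum w_i Y_i = 9, \quad \sum w_i X_i Y_i = 12, \quad \sum w_i X_i^2 = 6$$
Plugging these into (7) and then (5),
$$b = \frac{4 \cdot 12 - 4 \cdot 9}{4 \cdot 6 - 4^2} = \frac{12}{8} = 1.5, \qquad a = \frac{9 - 1.5 \cdot 4}{4} = 0.75$$
Substituting these values back into (4) and (6) gives $2(0.75 \cdot 4 + 1.5 \cdot 4 - 9) = 0$ and $0.75 \cdot 4 + 1.5 \cdot 6 - 12 = 0$, so both partial derivatives vanish as required.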

So we have our final closed-form solution, $$a = \frac{\sum w_i Y_i - b \sum w_i X_i}{\sum w_i} \linesp b = \frac{\w \wxy - \wx \wy}{\w \wxx - (\wx)^2}$$
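
The closed-form solution translates almost directly into code. The following is a minimal sketch in plain Python, not a reference implementation: the function name `weighted_linear_fit`, its argument names, and the sample data are my own choices for illustration. It assumes the weights are non-negative with a positive sum, and it raises an error when the denominator of the slope formula is zero (for example, when every $X_i$ is the same).

```python
from typing import Sequence, Tuple


def weighted_linear_fit(
    x: Sequence[float], y: Sequence[float], w: Sequence[float]
) -> Tuple[float, float]:
    """Return (a, b) minimizing sum(w_i * (y_i - a - b * x_i) ** 2).

    Hypothetical helper written for this post; it simply evaluates the
    closed-form expressions for the slope b and the intercept a.
    """
    if not (len(x) == len(y) == len(w)):
        raise ValueError("x, y, and w must have the same length")

    # The five weighted sums that appear in the closed-form solution.
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))

    if sw == 0:
        raise ValueError("the weights must not sum to zero")

    # Denominator of the slope formula: sum(w) * sum(w x^2) - (sum(w x))^2.
    denom = sw * swxx - swx ** 2
    if denom == 0:
        raise ValueError("slope is undefined (e.g. all x values are equal)")

    b = (sw * swxy - swx * swy) / denom  # slope
    a = (swy - b * swx) / sw             # intercept, using the slope
    return a, b


if __name__ == "__main__":
    # Made-up data: the middle point is weighted twice as heavily.
    a, b = weighted_linear_fit([0.0, 1.0, 2.0], [1.0, 2.0, 4.0], [1.0, 2.0, 1.0])
    print(a, b)  # prints: 0.75 1.5
```

With all weights equal, the function reduces to ordinary least squares, which is a convenient way to check it against any existing unweighted fit.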

References

http://seismo.berkeley.edu/~kirchner/eps_120/Toolkits/Toolkit_10.pdf