What is the difference between sklearn’s LinearRegression and TweedieRegressor?

Tracyrenee
5 min read · Nov 5, 2021

I have been interested in sklearn’s TweedieRegressor for quite some time, so when I had a little time on my hands I decided to explore this regressor and see what it does. The first thing I noticed when I looked up the estimator in the sklearn documentation is that it is a linear model, just like LinearRegression. I had a good feeling about this estimator.

The sklearn documentation describes the TweedieRegressor as a generalised linear model with a Tweedie distribution. Tweedie distributions are a family of probability distributions that includes the purely continuous normal, gamma and inverse Gaussian distributions, the purely discrete scaled Poisson distribution, and the class of compound Poisson-gamma distributions, which have positive mass at zero but are otherwise continuous. Tweedie distributions are a special case of exponential dispersion models and are often used as the response distribution in generalised linear models.
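To make the link to LinearRegression concrete, here is a minimal sketch on synthetic data (not the 1000 companies dataset). In sklearn the member of the Tweedie family is chosen with the `power` parameter: 0 is normal, 1 is Poisson, values between 1 and 2 are compound Poisson-gamma, 2 is gamma and 3 is inverse Gaussian. The sketch assumes `power=0`, no regularisation (`alpha=0`) and an identity link, in which case the fit should land very close to ordinary least squares. The data, the seed and the coefficients are all made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, TweedieRegressor

# Synthetic data: three features and a noisy linear target (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_coef = np.array([1.5, -2.0, 0.5])
y = X @ true_coef + 3.0 + rng.normal(scale=0.1, size=200)

# Ordinary least squares, solved in closed form.
ols = LinearRegression().fit(X, y)

# Tweedie GLM with a normal distribution (power=0), identity link and
# no penalty (alpha=0). It is fitted iteratively, so the result is an
# approximation of the least-squares solution, not an exact copy.
glm = TweedieRegressor(
    power=0, alpha=0.0, link="identity", tol=1e-8, max_iter=1000
).fit(X, y)

print("LinearRegression :", ols.coef_, ols.intercept_)
print("TweedieRegressor :", glm.coef_, glm.intercept_)
```

Note that the default `alpha` of TweedieRegressor is 1.0, which adds an L2 penalty, so with default settings the two estimators will generally not agree even when `power=0`.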

Knowing that the TweedieRegressor is a linear model, I decided to try it out on a dataset to see how it works. The dataset I chose is the 1000 companies dataset, which can be found here: https://raw.githubusercontent.com/boosuro/profit_estimation_of_companies/master/1000_Companies.csv
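As a rough sketch, the file can be read straight from that URL with pandas. The helper name `load_companies` is hypothetical, and nothing is assumed here about the column names, which should be checked after loading.

```python
import pandas as pd

# URL of the 1000 companies dataset, as given in the article.
DATA_URL = (
    "https://raw.githubusercontent.com/boosuro/"
    "profit_estimation_of_companies/master/1000_Companies.csv"
)


def load_companies(source=DATA_URL):
    """Read the CSV into a dataframe.

    `source` can be a URL, a local path or a file-like object,
    because it is passed straight to pandas.read_csv.
    """
    return pd.read_csv(source)
```

In a notebook, `df = load_companies()` followed by `df.head()` and `df.dtypes` gives a first look at the columns before any modelling.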

I wrote the script in Google Colab, which is a free online Jupyter Notebook hosted by Google. Google Colab is my Jupyter Notebook of choice, but the website does have a few niggles that must be worked around before I can achieve maximum efficiency with this platform.

For reasons that I don’t understand, Google does not keep the most current versions of libraries on their platform, so I had to install the most current version of sklearn onto Google Colab before I could use the TweedieRegressor.
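The exact command used is not shown here, so the following is only a sketch of how the installed version could be checked before upgrading. TweedieRegressor was added in scikit-learn 0.23, so anything older needs an upgrade. The helper name `supports_tweedie` is hypothetical.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# TweedieRegressor first appeared in scikit-learn 0.23.
MIN_VERSION = (0, 23)


def supports_tweedie(version_string):
    """Return True if a version string such as '0.24.2' is at least 0.23."""
    match = re.match(r"(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"Unrecognised version string: {version_string!r}")
    return (int(match.group(1)), int(match.group(2))) >= MIN_VERSION


try:
    installed = version("scikit-learn")
except PackageNotFoundError:
    installed = None

if installed is None:
    print("scikit-learn is not installed")
elif supports_tweedie(installed):
    print(f"scikit-learn {installed} includes TweedieRegressor")
else:
    # In a notebook cell an upgrade is typically run as:
    #   !pip install --upgrade scikit-learn
    print(f"scikit-learn {installed} is too old, upgrade needed")
```

After an upgrade in Colab the runtime usually has to be restarted before the new version is picked up.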

I then imported the libraries that I would need to use to run the script. I normally only import libraries as I need them, but on this occasion I have imported numpy, pandas, sklearn, matplotlib and seaborn. Numpy creates arrays and performs algebraic computations, pandas creates and manipulates dataframes, sklearn houses the numerous…
