Welcome to symbolic-learn’s documentation!
symbolic-learn is a sklearn-compatible package that implements a symbolic regression model.
What is symbolic regression?
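Symbolic regression is a type of regression model that combines mathematical building blocks (operators, variables and constants) to find the function that best fits the data. Each candidate function is represented as a binary tree whose internal nodes are operators and whose leaves are variables or constants.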
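To make the tree representation concrete, here is a minimal sketch in plain Python. It is not symbolic-learn's internal data structure: the nested-tuple encoding, the `OPS` table and the `evaluate` helper are assumptions made up for illustration.

```python
import operator

# Hypothetical set of binary building blocks (not the package's actual list).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(node, x0):
    """Evaluate a tree at x0.

    A leaf is either the string "x0" (the input variable) or a number.
    An internal node is a tuple (operator_symbol, left_subtree, right_subtree).
    """
    if node == "x0":
        return x0
    if isinstance(node, (int, float)):
        return node
    op, left, right = node
    return OPS[op](evaluate(left, x0), evaluate(right, x0))


# The function 2*x0 + 3 written as a binary tree.
tree = ("+", ("*", 2.0, "x0"), 3.0)
print(evaluate(tree, 5.0))  # 13.0
```

Because every internal node has exactly two children, any subtree can be swapped for another one and the result is still a valid function.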
The model initially generates a random population of such functions. It then applies genetic programming techniques to this population to find the function that best fits the dataset. Because the model is built on scikit-learn’s base estimator, it can be used like any other sklearn model. For example, you can use it in pipelines or tune its hyperparameters with tools such as GridSearchCV.
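The following toy loop sketches the general idea of evolving a random population of trees. It is an assumption-laden illustration, not the algorithm symbolic-learn implements: it uses mutation and truncation selection only (no crossover), and every name in it (`random_tree`, `mutate`, `evolve`, the population size and so on) is hypothetical.

```python
import math
import operator
import random

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def random_tree(rng, depth=2):
    """Build a random tree: leaves are "x0" or a constant, nodes are (op, left, right)."""
    if depth == 0 or rng.random() < 0.3:
        return "x0" if rng.random() < 0.5 else rng.uniform(-5.0, 5.0)
    op = rng.choice(sorted(OPS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def evaluate(node, x0):
    if node == "x0":
        return x0
    if isinstance(node, (int, float)):
        return node
    op, left, right = node
    return OPS[op](evaluate(left, x0), evaluate(right, x0))


def mse(tree, xs, ys):
    """Mean squared error; overflowing or undefined results count as infinitely bad."""
    total = 0.0
    for x, y in zip(xs, ys):
        diff = evaluate(tree, x) - y
        total += diff * diff
    err = total / len(xs)
    return err if math.isfinite(err) else math.inf


def mutate(tree, rng):
    """Replace one randomly chosen subtree with a fresh random subtree."""
    if not isinstance(tree, tuple) or rng.random() < 0.3:
        return random_tree(rng)
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, mutate(left, rng), right)
    return (op, left, mutate(right, rng))


def evolve(xs, ys, rng, pop_size=50, generations=20):
    """Keep the better half each generation and refill with mutated copies."""
    population = [random_tree(rng) for _ in range(pop_size)]
    history = []  # best error seen at the start of each generation
    for _ in range(generations):
        population.sort(key=lambda t: mse(t, xs, ys))
        history.append(mse(population[0], xs, ys))
        survivors = population[: pop_size // 2]
        children = [mutate(rng.choice(survivors), rng)
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    best = min(population, key=lambda t: mse(t, xs, ys))
    return best, history


rng = random.Random(0)
xs = [float(i) for i in range(-5, 6)]
ys = [2.0 * x + 1.0 for x in xs]  # toy target: y = 2*x0 + 1
best, history = evolve(xs, ys, rng)
print("best tree:", best)
print("error per generation:", history)
```

Since the best trees survive unchanged from one generation to the next, the recorded error can never increase. Whether the search reaches the exact target depends on the random seed and the settings.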
Symbolic regression is best used when you want to take a naive approach to solving a regression problem. Unlike most existing models, it does not require you to specify the form of the model a priori. It is therefore a good choice when you want to discover and understand the mathematical structures in your data.
Example
Here is how to instantiate and train a symbolic regression model:
>>> from sblearn.models import SymbolicRegressor
>>> model = SymbolicRegressor()
>>> model.fit(X_train, y_train)
After training your model, you can access the fitted functions’ simplified formulas and full function trees through the model’s formulas and trees attributes. Here is an example with a toy dataset where there is a linear relation between the features and our target:
>>> print(model.formulas)
['y0 = 21.227012634277344*x0 + 49.040491104125977']
>>> print(model.trees[0])
y0_tree:
(*)
|
| ─────── (/)
|         |
|         | ─────── x0
|         |
|         | ─────── x0
|
| ─────── (+)
          |
          | ─────── (*)
          |         |
          |         | ─────── x0
          |         |
          |         | ─────── 21.227012634277344
          |
          | ─────── (+)
                    |
                    | ─────── 15.254457473754883
                    |
                    | ─────── 33.786033630371094
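The printed tree and the simplified formula describe the same function. The short check below transcribes both by hand into plain Python (it does not call the package) and compares them at a few arbitrarily chosen sample points.

```python
import math


def tree_value(x0):
    # Root (*): the left child is (/) with leaves x0 and x0, the right child is (+).
    left = x0 / x0
    right = (x0 * 21.227012634277344) + (15.254457473754883 + 33.786033630371094)
    return left * right


def formula_value(x0):
    # Simplified formula as printed by model.formulas.
    return 21.227012634277344 * x0 + 49.040491104125977


# x0 = 0 is left out because the x0 / x0 node is undefined there.
for x0 in [-2.0, 0.5, 1.0, 3.0]:
    print(x0, math.isclose(tree_value(x0), formula_value(x0), rel_tol=1e-12))
```

The x0 / x0 branch reduces to 1 and the two constants add up to 49.040491104125977, which is why the formula is much shorter than the tree.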
Installation
To install the package, run this command:
pip install symbolic-learn
Note for Windows users: Microsoft Visual C++ 14.0 or higher is required!