pandas-ml-utils-0.0.15.tar.gz resource - CSDN download

Points required: 1 · 160 views · Uploaded 2024-03-07 12:45:08 · 755KB GZ

78 files in total

py: 51

png: 7

rst: 4


pandas-ml-utils-0.0.15.tar.gz (78 files)

pandas-ml-utils-0.0.15

setup.py 1KB

LICENSE 1KB

PKG-INFO 227B

pandas_ml_utils

utils.py 2KB

__init__.py 3KB

train_test_data.py 5KB

reinforcement

__init__.py 0B

agent.py 2KB

summary.py 734B

gym.py 2KB

pandas_utils_extension.py 1KB

error

__init__.py 0B

functions.py 126B

analysis

__init__.py 0B

correlation_analysis.py 1KB

selection.py 6KB

datafetching

__init__.py 0B

fetch_yahoo.py 2KB

style.html 209B

regression

__init__.py 0B

regressor.py 2KB

summary.py 378B

model

fitter.py 10KB

__init__.py 0B

fit.py 2KB

models.py 12KB

summary.py 122B

features_and_Labels.py 9KB

fit.py.html 618B

classification

__init__.py 0B

classification_plots.py 674B

summary.py.html 3KB

summary.py 4KB

classifier.py 4KB

constants.py 186B

wrappers

__init__.py 0B

lazy_dataframe.py 2KB

hashable_dataframe.py 532B

extern

__init__.py 0B

loss_functions.py 3KB

.readthedocs.yml 447B

docs

__init__.py 0B

feature_analysis.rst 490B

index.rst 1KB

make.bat 760B

Makefile 634B

apply_models_on_dataframes.rst 1KB

conf.py 2KB

requirements.txt 57B

_static

burritos.csv 66KB

api.rst 2KB

pyproject.toml 830B

requirements.txt 201B

deploy.sh 279B

test

test__fitter.py 7KB

test__features_and_labels.py 3KB

test__utils.py 1KB

z_make_doc_test.py 533B

test__fetch_yahoo.py 1KB

test__feature_selection.py 574B

test__classification_summary.py 3KB

z_component_test.csv 463KB

test__hashable_dataframe.py 507B

test__model.py 3KB

z_component_test.py 14KB

test__save_load.py 2KB

test__classifier.py 1KB

test__lazy_dataframe.py 827B

test__make_train_test_data.py 11KB

.gitignore 72B

README.md 9KB

Readme_files

Readme_2_5.png 11KB

Readme_2_7.png 10KB

Readme_2_0.png 51KB

Readme_2_9.png 10KB

Readme_2_4.png 12KB

fit_burritos.png 469KB

Readme_2_2.png 11KB

![PyPI - Downloads](https://img.shields.io/pypi/dw/pandas-ml-utils) ![pandas-ml-utils.readthedocs.io](https://readthedocs.org/projects/pandas-ml-utils/badge/?version=latest&style=plastic)

# Pandas ML Utils

Pandas ML Utils is intended to help you through your journey of applying statistical or machine learning models to data while you never need to leave the world of pandas.

1. install
1. analyze your features
1. find a model
1. save and reuse your model

Or [read the docs](https://pandas-ml-utils.readthedocs.io/en/latest/).

## Install

```bash
pip install pandas-ml-utils
```

## Analyze your Features

The feature_selection functionality helps you to analyze your features, filter out highly correlated ones and focus on the most important features. This function also applies an auto regression and embeds an ACF plot.

```python
import pandas_ml_utils as pmu
import pandas as pd

df = pd.read_csv('burritos.csv')[["Tortilla", "Temp", "Meat", "Fillings", "Meat:filling",
                                  "Uniformity", "Salsa", "Synergy", "Wrap", "overall"]]
df.feature_selection(label_column="overall")
```

![png](Readme_files/Readme_2_0.png)

              Tortilla   overall   Synergy  Fillings      Temp     Salsa  \
    Tortilla       1.0  0.403981  0.367575  0.345613  0.290702  0.267212

                  Meat  Uniformity  Meat:filling      Wrap
    Tortilla  0.260194    0.208666      0.207518  0.160831
    label is continuous: True

![png](Readme_files/Readme_2_2.png)

    Feature ranking:
    ['Synergy', 'Meat', 'Fillings', 'Meat:filling', 'Wrap', 'Tortilla', 'Uniformity', 'Salsa', 'Temp']

    TOP 5 features
             Synergy      Meat  Fillings  Meat:filling     Wrap
    Synergy      1.0  0.601545  0.663328      0.428505  0.08685

    filtered features with correlation < 0.5
               Synergy  Meat:filling      Wrap
    Tortilla  0.367575      0.207518  0.160831

![png](Readme_files/Readme_2_4.png)

![png](Readme_files/Readme_2_5.png)

    Synergy       1.000000
    Synergy_0     1.000000
    Synergy_1     0.147495
    Synergy_56    0.128449
    Synergy_78    0.119272
    Synergy_55    0.111832
    Synergy_79    0.086466
    Synergy_47    0.085117
    Synergy_53    0.084786
    Synergy_37    0.084312
    Name: Synergy, dtype: float64

![png](Readme_files/Readme_2_7.png)

    Meat:filling       1.000000
    Meat:filling_0     1.000000
    Meat:filling_15    0.185946
    Meat:filling_35    0.175837
    Meat:filling_1     0.122546
    Meat:filling_87    0.118597
    Meat:filling_33    0.112875
    Meat:filling_73    0.103090
    Meat:filling_72    0.103054
    Meat:filling_71    0.089437
    Name: Meat:filling, dtype: float64

![png](Readme_files/Readme_2_9.png)

    Wrap       1.000000
    Wrap_0     1.000000
    Wrap_63    0.210823
    Wrap_88    0.189735
    Wrap_1     0.169132
    Wrap_87    0.166502
    Wrap_66    0.146689
    Wrap_89    0.141822
    Wrap_74    0.120047
    Wrap_11    0.115095
    Name: Wrap, dtype: float64
    best lags are
    [(1, '-1.00'), (2, '-0.15'), (88, '-0.10'), (64, '-0.07'), (19, '-0.07'), (89, '-0.06'), (36, '-0.05'), (43, '-0.05'), (16, '-0.05'), (68, '-0.04'), (90, '-0.04'), (87, '-0.04'), (3, '-0.03'), (20, '-0.03'), (59, '-0.03'), (75, '-0.03'), (91, '-0.03'), (57, '-0.03'), (46, '-0.02'), (48, '-0.02'), (54, '-0.02'), (73, '-0.02'), (25, '-0.02'), (79, '-0.02'), (76, '-0.02'), (37, '-0.02'), (71, '-0.02'), (15, '-0.02'), (49, '-0.02'), (12, '-0.02'), (65, '-0.02'), (40, '-0.02'), (24, '-0.02'), (78, '-0.02'), (53, '-0.02'), (8, '-0.02'), (44, '-0.01'), (45, '0.01'), (56, '0.01'), (26, '0.01'), (82, '0.01'), (77, '0.02'), (22, '0.02'), (83, '0.02'), (11, '0.02'), (66, '0.02'), (31, '0.02'), (80, '0.02'), (92, '0.02'), (39, '0.03'), (27, '0.03'), (70, '0.04'), (41, '0.04'), (51, '0.04'), (4, '0.04'), (7, '0.05'), (13, '0.05'), (97, '0.06'), (60, '0.06'), (42, '0.06'), (96, '0.06'), (95, '0.06'), (30, '0.07'), (81, '0.07'), (52, '0.07'), (9, '0.07'), (61, '0.07'), (84, '0.07'), (29, '0.08'), (94, '0.08'), (28, '0.11')]

## Fit a Model

Once you know your features, you can start trying out different models, e.g. a very basic logistic regression. It is also possible to search through a set of hyperparameters.
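As a rough illustration of what searching through a set of hyperparameters means, here is a minimal sketch in plain scikit-learn rather than pandas-ml-utils. The synthetic data from `make_classification`, the candidate `C` values and the 3-fold cross-validation are all assumptions made for this example and do not come from the README.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for a real feature matrix (assumption, not the burrito data).
X, y = make_classification(n_samples=200, n_features=8, random_state=42)

# Hypothetical grid: candidate values for the inverse regularization strength C.
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}

# Fit one logistic regression per candidate and score each with 3-fold cross-validation.
search = GridSearchCV(
    LogisticRegression(solver="lbfgs", max_iter=1000),
    param_grid,
    cv=3,
)
search.fit(X, y)

best_params = search.best_params_  # grid entry with the highest mean CV accuracy
best_score = search.best_score_    # mean cross-validated accuracy of that entry
print(best_params, round(best_score, 3))
```

With pandas-ml-utils itself, the README fits a model directly on the DataFrame: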
```python
import pandas as pd
import pandas_ml_utils as pmu
from sklearn.linear_model import LogisticRegression

df = pd.read_csv('burritos.csv')
# Binary label: True where the "Fries" column is marked with an "x"
df["with_fires"] = df["Fries"].apply(lambda x: str(x).lower() == "x")
# Target: the negated cost
df["price"] = df["Cost"] * -1
df = df[["Tortilla", "Temp", "Meat", "Fillings", "Meat:filling", "Uniformity", "Salsa",
         "Synergy", "Wrap", "overall", "with_fires", "price"]].dropna()

fit = df.fit_classifier(pmu.SkitModel(LogisticRegression(solver='lbfgs'),
                                      pmu.FeaturesAndLabels(["Tortilla", "Temp", "Meat", "Fillings",
                                                             "Meat:filling", "Uniformity", "Salsa",
                                                             "Synergy", "Wrap", "overall"],
                                                            ["with_fires"],
                                                            targets=("price", "price"))))

fit
```

![png](Readme_files/fit_burritos.png)

## Save and use your model

Once you are happy with your model, you can save it and apply it to any DataFrame that provides the columns your features need.

```python
fit.save_model("/tmp/burrito.model")
```

```python
import pandas as pd
import pandas_ml_utils as pmu

df = pd.read_csv('burritos.csv')
df["price"] = df["Cost"] * -1
df = df[["Tortilla", "Temp", "Meat", "Fillings", "Meat:filling", "Uniformity", "Salsa",
         "Synergy", "Wrap", "overall", "price"]].dropna()

# Load the saved model and apply it to the new DataFrame
df.classify(pmu.Model.load("/tmp/burrito.model")).tail()
```

<div>
<table border="1" class="dataframe">
  <thead>
    <tr>
      <th></th>
      <th colspan="3" halign="left">price</th>
    </tr>
    <tr>
      <th></th>
      <th colspan="2" halign="left">prediction</th>
      <th>target</th>
    </tr>
    <tr>
      <th></th>
      <th>value</th>
      <th>value_proba</th>
      <th>value</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>380</th>
      <td>False</td>
      <td>0.251311</td>
      <td>-6.85</td>
    </tr>
    <tr>
      <th>381</th>
      <td>False</td>
      <td>0.328659</td>
      <td>-6.85</td>
    </tr>
    <tr>
      <th>382</th>
      <td>False</td>
      <td>0.064751</td>
      <td>-11.50</td>
    </tr>
    <tr>
      <th>383</th>
      <td>False</td>
      <td>0.428745</td>
      <td>-7.89</td>
    </tr>
    <tr>
      <th>384</th>
      <td>False</td>
      <td>0.265546</td>
      <td>-7.89</td>
    </tr>
  </tbody>
</table>
</div>

## TODO

* allow multiple classes for classification
* replace hard-coded summary objects with a summary provider function
* add more tests
* add Proximity https://stats.stackexchange.com/questions/270201/pooling-levels-of-categorical-variables-for-regression-trees/275867#275867

## Wanna help?

* currently I only need binary classification
    * maybe you want to add a feature for multiple classes
* for non-classification problems you might want to augment the `Summary`
* write some tests
* add more and different charts for a better understanding/interpretation of the models
* add whatever you need for yourself and share it with us

## Change Log

### 0.0.12

* added sphinx documentation
* added multi model as a regular model, which has quite a big impact
    * features and labels signature changed
    * multiple targets now mean that a lot of things return a dict
    * everything now uses DataFrames instead of arrays after plain model invocation
* added some tests
* fixed some bugs along the way

### 0.0.11

* Added hyperparameter tuning

```python
from hyperopt import hp

fit = df.fit_classifier(
    pdu.SkitModel(MLPClassifier(activation='tanh', hidden_layer_sizes=(60, 50), random_state=42),
                  pdu.FeaturesAndLabels(features=['vix_Close'], labels=['label'],
                                        targets=("vix_Open", "spy_Volume"))),
    test_size=0.4,
```
