{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": { "id": "8bPV9aEwTKC8" }, "outputs": [], "source": [ "import numpy as np\n", "from matplotlib import pyplot as plt\n", "import sklearn\n", "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "id": "jFHJbjkfeepf" }, "outputs": [], "source": [ "RANDOM_SEED = 0x0" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "ykbI8UnR6PsU" }, "source": [ "# TASK 1 (2 Points): \n", "\n", "We work with the \"Wine Recognition\" dataset. You can read more about this dataset at [https://scikit-learn.org/stable/datasets/toy_dataset.html#wine-recognition-dataset](https://scikit-learn.org/stable/datasets/toy_dataset.html#wine-recognition-dataset).\n", "\n", "The dataset contains the results of a chemical analysis of wines grown in the same region of Italy by three different cultivators.\n", "The dataset is loaded below and split into `data` and `target`. `data` is a pandas `DataFrame` that contains the results of the chemical analysis, while `target` contains an integer identifying the wine cultivator." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "id": "em6VCOuE6MRU" }, "outputs": [], "source": [ "from sklearn.datasets import load_wine\n", "(data, target) = load_wine(return_X_y=True, as_frame=True)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "id": "HJoAuMNR6MgM" }, "outputs": [ { "data": { "text/html": [ "
| | alcohol | malic_acid | ash | alcalinity_of_ash | magnesium | total_phenols | flavanoids | nonflavanoid_phenols | proanthocyanins | color_intensity | hue | od280/od315_of_diluted_wines | proline |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 14.23 | 1.71 | 2.43 | 15.6 | 127.0 | 2.80 | 3.06 | 0.28 | 2.29 | 5.64 | 1.04 | 3.92 | 1065.0 |
| 1 | 13.20 | 1.78 | 2.14 | 11.2 | 100.0 | 2.65 | 2.76 | 0.26 | 1.28 | 4.38 | 1.05 | 3.40 | 1050.0 |
| 2 | 13.16 | 2.36 | 2.67 | 18.6 | 101.0 | 2.80 | 3.24 | 0.30 | 2.81 | 5.68 | 1.03 | 3.17 | 1185.0 |
| 3 | 14.37 | 1.95 | 2.50 | 16.8 | 113.0 | 3.85 | 3.49 | 0.24 | 2.18 | 7.80 | 0.86 | 3.45 | 1480.0 |
| 4 | 13.24 | 2.59 | 2.87 | 21.0 | 118.0 | 2.80 | 2.69 | 0.39 | 1.82 | 4.32 | 1.04 | 2.93 | 735.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 173 | 13.71 | 5.65 | 2.45 | 20.5 | 95.0 | 1.68 | 0.61 | 0.52 | 1.06 | 7.70 | 0.64 | 1.74 | 740.0 |
| 174 | 13.40 | 3.91 | 2.48 | 23.0 | 102.0 | 1.80 | 0.75 | 0.43 | 1.41 | 7.30 | 0.70 | 1.56 | 750.0 |
| 175 | 13.27 | 4.28 | 2.26 | 20.0 | 120.0 | 1.59 | 0.69 | 0.43 | 1.35 | 10.20 | 0.59 | 1.56 | 835.0 |
| 176 | 13.17 | 2.59 | 2.37 | 20.0 | 120.0 | 1.65 | 0.68 | 0.53 | 1.46 | 9.30 | 0.60 | 1.62 | 840.0 |
| 177 | 14.13 | 4.10 | 2.74 | 24.5 | 96.0 | 2.05 | 0.76 | 0.56 | 1.35 | 9.20 | 0.61 | 1.60 | 560.0 |

178 rows × 13 columns

| | Average | Standard Deviation | Minimum | Maximum | Range |
|---|---|---|---|---|---|
| Un-Normalized | 1.297101e+01 | 0.851975 | 11.030000 | 14.830000 | 3.800000 |
| Standard Deviation | -1.160603e-15 | 1.000000 | -2.278245 | 2.181979 | 4.460224 |
| Min/Max Normalized | 5.107917e-01 | 0.224204 | 0.000000 | 1.000000 | 1.000000 |
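
The summary table above compares one feature before and after two rescalings. The following is a minimal sketch of how statistics of this kind could be computed. It does not reproduce the notebook's actual code: the helper names `summarize`, `standardize` and `min_max_normalize` are hypothetical, the population standard deviation (`ddof=0`) is an assumption, and the input is only the first five `alcohol` values from the data preview, so the numbers will differ from the table. The row the table labels "Standard Deviation" is assumed here to mean z-score standardization.

```python
import numpy as np


def summarize(values):
    """Return the five statistics used as columns in the summary table."""
    values = np.asarray(values, dtype=float)
    return {
        "Average": values.mean(),
        # Assumption: population standard deviation (ddof=0).
        "Standard Deviation": values.std(),
        "Minimum": values.min(),
        "Maximum": values.max(),
        "Range": values.max() - values.min(),
    }


def standardize(values):
    """Z-score: subtract the mean, divide by the standard deviation."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std()


def min_max_normalize(values):
    """Rescale linearly so the minimum maps to 0 and the maximum to 1."""
    values = np.asarray(values, dtype=float)
    return (values - values.min()) / (values.max() - values.min())


# First five `alcohol` values from the data preview; illustration only.
alcohol = np.array([14.23, 13.20, 13.16, 14.37, 13.24])

summary = {
    "Un-Normalized": summarize(alcohol),
    "Standardized (z-score)": summarize(standardize(alcohol)),
    "Min/Max Normalized": summarize(min_max_normalize(alcohol)),
}

for name, stats in summary.items():
    print(name, {key: round(float(val), 6) for key, val in stats.items()})
```

Whatever the input, standardization should give an average of about 0 and a standard deviation of 1, and min/max normalization should give a minimum of 0 and a maximum of 1. Both patterns match the corresponding rows of the table.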