REGRESSION, SEGMENTATION, CLUSTERING, AND PREDICTION PROJECTS WITH PYTHON

Rating: 4 (2 reviews)

Vivian Siahaan · Rismon Hasiholan Sianipar

Feb. 2022 · BALIGE PUBLISHING


Ebook · 623 pages


About this ebook

PROJECT 1: TIME-SERIES WEATHER: FORECASTING AND PREDICTION WITH PYTHON

Weather is described and quantified by variables of Earth's atmosphere such as temperature, air pressure, and humidity, together with their variations, their interactions, and how they change over time. Different spatial scales are used to describe and predict weather at local, regional, and global levels.

The dataset used in this project contains weather data for New Delhi, India, taken from wunderground. It contains features such as temperature, pressure, humidity, rain, and precipitation. The main goal is to develop a model accurate enough to forecast temperature and to predict the target variable (condition).

Time-series weather forecasting will be done using ARIMA models. The machine learning models used in this project to predict the target variable (condition) are K-Nearest Neighbor, Random Forest, Naive Bayes, Logistic Regression, Decision Tree, Support Vector Machine, LGBM classifier, Gradient Boosting, XGB classifier, and MLP classifier. Finally, you will plot decision boundaries, the distribution of features, feature importance, cross-validation scores, predicted values versus true values, the confusion matrix, the learning curve, the performance of the model, the scalability of the model, training loss, and training accuracy.

PROJECT 2: HOUSE PRICE: ANALYSIS AND PREDICTION USING MACHINE LEARNING WITH PYTHON

The dataset used in this project is taken from the second chapter of Aurélien Géron's book 'Hands-On Machine Learning with Scikit-Learn and TensorFlow'. It serves as an excellent introduction to implementing machine learning algorithms because it requires rudimentary data cleaning, has an easily understandable list of variables, and sits at an optimal size between being too toyish and too cumbersome.

The data contain information from the 1990 California census. Although they may not help you predict current housing prices the way the Zillow Zestimate dataset does, they provide an accessible introductory dataset for teaching the basics of machine learning. The data describe the houses found in a given California district, with summary statistics based on the 1990 census. Be warned: the data aren't cleaned, so some preprocessing steps are required. The columns are as follows: longitude, latitude, housing_median_age, total_rooms, total_bedrooms, population, households, median_income, median_house_value, and ocean_proximity.
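The preprocessing mentioned above might look like the following minimal sketch. It assumes the gaps sit in a numeric column such as total_bedrooms and that ocean_proximity is the only categorical column; the four rows are a hand-made sample for illustration, not the real file.

```python
import numpy as np
import pandas as pd

# Tiny hand-made sample using the listed column names; values are illustrative only.
df = pd.DataFrame({
    "longitude": [-122.23, -122.22, -118.30, -117.90],
    "latitude": [37.88, 37.86, 34.05, 33.70],
    "housing_median_age": [41, 21, 35, 15],
    "total_rooms": [880, 7099, 1500, 3200],
    "total_bedrooms": [129, np.nan, 310, 640],
    "population": [322, 2401, 900, 1800],
    "households": [126, 1138, 290, 600],
    "median_income": [8.3252, 8.3014, 3.2, 5.1],
    "median_house_value": [452600, 358500, 210000, 275000],
    "ocean_proximity": ["NEAR BAY", "NEAR BAY", "<1H OCEAN", "<1H OCEAN"],
})

# Fill missing numeric values with the column median (one common choice among several).
df["total_bedrooms"] = df["total_bedrooms"].fillna(df["total_bedrooms"].median())

# One-hot encode the categorical column so models receive numeric inputs only.
clean = pd.get_dummies(df, columns=["ocean_proximity"])

print(clean.columns.tolist())
```

Median imputation is only one option; dropping incomplete rows or using a model-based imputer are reasonable alternatives depending on how much data is missing.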

The machine learning models used in this project to perform regression on median_house_value and to predict it as the target variable are K-Nearest Neighbor, Random Forest, Naive Bayes, Logistic Regression, Decision Tree, Support Vector Machine, LGBM classifier, Gradient Boosting, XGB classifier, and MLP classifier. Finally, you will plot decision boundaries, the distribution of features, feature importance, cross-validation scores, predicted values versus true values, the confusion matrix, the learning curve, the performance of the model, the scalability of the model, training loss, and training accuracy.
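A regression on median_house_value could be sketched as below, using the regressor form of Random Forest (one of the model families named above). The features and target are synthetic stand-ins generated for illustration, so the scores say nothing about the real dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-ins for two of the listed columns (assumption, not real census data).
rng = np.random.default_rng(42)
n = 500
median_income = rng.uniform(1, 10, n)
housing_median_age = rng.uniform(1, 52, n)
X = np.column_stack([median_income, housing_median_age])

# Synthetic target: value rises with income and age, plus noise.
y = 40000 * median_income + 500 * housing_median_age + rng.normal(0, 20000, n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Hold-out R^2 plus 5-fold cross-validation scores on the training split.
r2 = r2_score(y_test, y_pred)
cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring="r2")
print(f"test R^2: {r2:.3f}, CV mean R^2: {cv_scores.mean():.3f}")
```

The `y_test` and `y_pred` arrays are what a predicted-versus-true scatter plot would be drawn from.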

PROJECT 3: CUSTOMER PERSONALITY ANALYSIS AND PREDICTION USING MACHINE LEARNING WITH PYTHON

Customer Personality Analysis is a detailed analysis of a company's ideal customers. It helps a business better understand its customers and makes it easier to modify products according to the specific needs, behaviors, and concerns of different customer segments. For example, instead of spending money to market a new product to every customer in the company's database, a company can analyze which customer segment is most likely to buy the product and then market the product only to that segment.

The dataset contains the following features:
ID = Customer's unique identifier
Year_Birth = Customer's birth year
Education = Customer's education level
Marital_Status = Customer's marital status
Income = Customer's yearly household income
Kidhome = Number of children in customer's household
Teenhome = Number of teenagers in customer's household
Dt_Customer = Date of customer's enrollment with the company
Recency = Number of days since customer's last purchase
MntWines = Amount spent on wine in the last 2 years
MntFruits = Amount spent on fruits in the last 2 years
MntMeatProducts = Amount spent on meat in the last 2 years
MntFishProducts = Amount spent on fish in the last 2 years
MntSweetProducts = Amount spent on sweets in the last 2 years
MntGoldProds = Amount spent on gold in the last 2 years
NumDealsPurchases = Number of purchases made with a discount
NumWebPurchases = Number of purchases made through the company's web site
NumCatalogPurchases = Number of purchases made using a catalogue
NumStorePurchases = Number of purchases made directly in stores
NumWebVisitsMonth = Number of visits to company's web site in the last month
AcceptedCmp3 = 1 if customer accepted the offer in the 3rd campaign, 0 otherwise
AcceptedCmp4 = 1 if customer accepted the offer in the 4th campaign, 0 otherwise
AcceptedCmp5 = 1 if customer accepted the offer in the 5th campaign, 0 otherwise
AcceptedCmp1 = 1 if customer accepted the offer in the 1st campaign, 0 otherwise
AcceptedCmp2 = 1 if customer accepted the offer in the 2nd campaign, 0 otherwise
Response = 1 if customer accepted the offer in the last campaign, 0 otherwise
Complain = 1 if customer complained in the last 2 years, 0 otherwise

The goal of this project is to perform clustering and prediction in order to summarize customer segments.

In this project, you will perform clustering using KMeans to get 4 clusters. The machine learning models used in this project to perform regression on the total number of purchases and to predict clusters as the target variable are K-Nearest Neighbor, Random Forest, Naive Bayes, Logistic Regression, Decision Tree, Support Vector Machine, LGBM, Gradient Boosting, XGB, and MLP. Finally, you will plot decision boundaries, the distribution of features, feature importance, cross-validation scores, predicted values versus true values, the confusion matrix, the learning curve, the performance of the model, the scalability of the model, training loss, and training accuracy.
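The clustering step can be sketched as follows: standardize a few of the listed features, fit KMeans with 4 clusters, and summarize each segment. The customer rows are randomly generated for illustration, and defining the total number of purchases as the sum of web, catalogue, and store purchases is an assumption rather than the book's stated definition.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Randomly generated customers using a subset of the listed columns (assumption).
rng = np.random.default_rng(1)
n = 400
df = pd.DataFrame({
    "Income": np.clip(rng.normal(52000, 20000, n), 5000, None),
    "Recency": rng.integers(0, 100, n),
    "MntWines": rng.gamma(2.0, 150.0, n),
    "NumWebPurchases": rng.integers(0, 12, n),
    "NumCatalogPurchases": rng.integers(0, 10, n),
    "NumStorePurchases": rng.integers(0, 13, n),
})

# Hypothetical definition of the total number of purchases.
purchase_cols = ["NumWebPurchases", "NumCatalogPurchases", "NumStorePurchases"]
df["TotalPurchases"] = df[purchase_cols].sum(axis=1)

# KMeans is distance-based, so put all features on a comparable scale first.
X = StandardScaler().fit_transform(df)

kmeans = KMeans(n_clusters=4, n_init=10, random_state=1)
df["Cluster"] = kmeans.fit_predict(X)

# Per-cluster averages give a first summary of each customer segment.
print(df.groupby("Cluster")[["Income", "MntWines", "TotalPurchases"]].mean().round(1))
```

On real data, the number of clusters is often checked with an elbow plot of inertia or with silhouette scores before settling on a value such as 4.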

PROJECT 4: CUSTOMER SEGMENTATION, CLUSTERING, AND PREDICTION WITH PYTHON

In this project, you will develop customer segmentation, clustering, and prediction models to define a marketing strategy. The sample dataset summarizes the usage behavior of about 9000 active credit card holders during the last 6 months. The file is at the customer level with 18 behavioral variables.

The data dictionary for the credit card dataset is as follows:
CUSTID: Identification of the credit card holder (categorical)
BALANCE: Balance amount left in the account to make purchases
BALANCEFREQUENCY: How frequently the balance is updated, score between 0 and 1 (1 = frequently updated, 0 = not frequently updated)
PURCHASES: Amount of purchases made from the account
ONEOFFPURCHASES: Maximum purchase amount done in one go
INSTALLMENTSPURCHASES: Amount of purchases done in installments
CASHADVANCE: Cash in advance given by the user
PURCHASESFREQUENCY: How frequently purchases are being made, score between 0 and 1 (1 = frequently purchased, 0 = not frequently purchased)
ONEOFFPURCHASESFREQUENCY: How frequently purchases are happening in one go (1 = frequently purchased, 0 = not frequently purchased)
PURCHASESINSTALLMENTSFREQUENCY: How frequently purchases in installments are being done (1 = frequently done, 0 = not frequently done)
CASHADVANCEFREQUENCY: How frequently cash in advance is being paid
CASHADVANCETRX: Number of transactions made with "Cash in Advance"
PURCHASESTRX: Number of purchase transactions made
CREDITLIMIT: Credit card limit for the user
PAYMENTS: Amount of payments made by the user
MINIMUM_PAYMENTS: Minimum amount of payments made by the user
PRCFULLPAYMENT: Percent of full payment paid by the user
TENURE: Tenure of credit card service for the user

In this project, you will perform clustering using KMeans to get 5 clusters. The machine learning models used in this project to perform regression on the total number of purchases and to predict clusters as the target variable are K-Nearest Neighbor, Random Forest, Naive Bayes, Logistic Regression, Decision Tree, Support Vector Machine, LGBM, Gradient Boosting, XGB, and MLP. Finally, you will plot decision boundaries, the distribution of features, feature importance, cross-validation scores, predicted values versus true values, the confusion matrix, the learning curve, the performance of the model, the scalability of the model, training loss, and training accuracy.
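Predicting clusters as the target variable can be sketched like this: KMeans assigns each card holder to one of 5 clusters, and a classifier is then trained to reproduce those labels. The four columns are synthetic stand-ins for BALANCE, PURCHASES, CASHADVANCE, and CREDITLIMIT (an assumption), and Logistic Regression is just one of the models listed above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins for BALANCE, PURCHASES, CASHADVANCE, CREDITLIMIT (assumption).
rng = np.random.default_rng(7)
n = 600
X_raw = np.column_stack([
    rng.gamma(2.0, 800.0, n),
    rng.gamma(1.5, 600.0, n),
    rng.gamma(1.2, 700.0, n),
    rng.uniform(500, 15000, n),
])
X = StandardScaler().fit_transform(X_raw)

# Step 1: unsupervised clustering produces the labels.
labels = KMeans(n_clusters=5, n_init=10, random_state=7).fit_predict(X)

# Step 2: supervised model learns to predict the cluster from the features.
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, random_state=7
)
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Accuracy and a 5x5 confusion matrix (rows: true cluster, columns: predicted).
acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred, labels=list(range(5)))
print(f"accuracy: {acc:.3f}")
print(cm)
```

Because the labels come from the same features the classifier sees, high accuracy is expected here; the classifier is mainly useful for assigning new customers to existing segments.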


About the author

Vivian Siahaan is a fast learner who likes to try new things. She was born and raised in Hinalang Bagasan, Balige, on the banks of Lake Toba, and completed her high school education at SMAN 1 Balige. She began teaching herself Java, Android, JavaScript, CSS, C++, Python, R, Visual Basic, Visual C#, MATLAB, Mathematica, PHP, JSP, MySQL, SQL Server, Oracle, Access, and other programming languages. She studied programming from scratch, starting with the most basic syntax and logic, by building several simple and practical GUI applications. Animation and games are the fields of programming she is most interested in developing. Besides studying mathematical logic and programming, the author also enjoys reading novels. Vivian Siahaan has written dozens of ebooks published by Sparta Publisher: Data Structure with Java; Java Programming: Cookbook; C++ Programming: Cookbook; C Programming For High Schools / Vocational Schools and Students; Java Programming for SMA / SMK; Java Tutorial: GUI, Graphics and Animation; Visual Basic Programming: From A to Z; Java Programming for Animation and Games; C# Programming for SMA / SMK and Students; MATLAB For Students and Researchers; Graphics in JavaScript: Quick Learning Series; JavaScript Image Processing Methods: From A to Z; Java GUI Case Study: AWT & Swing; Basic CSS and JavaScript; PHP / MySQL Programming: Cookbook; Visual Basic: Cookbook; C++ Programming for High Schools / Vocational Schools and Students; Concepts and Practices of C++; PHP / MySQL For Students; C# Programming: From A to Z; Visual Basic for SMA / SMK and Students; C# .NET and SQL Server for High School / Vocational School and Students.
With the publisher ANDI Yogyakarta, Vivian Siahaan has also written a number of books, including: Python Programming Theory and Practice; Python GUI Programming; Python GUI and Database; Build From Zero School Database Management System In Python / MySQL; Database Management System in Python / MySQL; Python / MySQL For Management Systems of Criminal Track Record Database; Java / MySQL For Management Systems of Criminal Track Records Database; Database and Cryptography Using Java / MySQL; Build From Zero School Database Management System With Java / MySQL.

Rismon Hasiholan Sianipar was born in Pematang Siantar, in 1994. After graduating from SMAN 3 Pematang Siantar, the writer traveled to the city of Jogjakarta. In 1998 and 2001 the author completed his Bachelor of Engineering (S.T) and Master of Engineering (M.T) degrees in Electrical Engineering at Gadjah Mada University, under the guidance of Prof. Dr. Adhi Soesanto and Prof. Dr. Thomas Sri Widodo, focusing on research on non-stationary signals by analyzing their energy using time-frequency maps. Because of a signal's non-stationary nature, the distribution of its energy becomes very dynamic on a time-frequency map. By mapping the distribution of energy in the time-frequency field using discrete wavelet transformations, one can design non-linear filters to analyze the patterns in the data. In 2003, the author received a Monbukagakusho scholarship from the Japanese Government. In 2005 and 2008, he completed his Master of Engineering (M.Eng) and Doctor of Engineering (Dr.Eng) degrees at Yamaguchi University, under the guidance of Prof. Dr. Hidetoshi Miike. In both his master's thesis and his doctoral thesis, R.H. Sianipar combined the strength of the SR-FHN (Stochastic Resonance Fitzhugh-Nagumo) filter with a 4096-bit ECC (elliptic curve cryptography) cryptosystem, both to suppress noise in digital images and digital video and to maintain their authenticity. The results of this study have been documented in international scientific journals and officially patented in Japan. One of the patents was published in Japan with registration number 2008-009549. He is active in collaborating with several universities and research institutions in Japan, particularly in the fields of cryptography, cryptanalysis, and audio / image / video digital forensics. R.H. Sianipar also has experience in applying code-breaking methods (cryptanalysis) to a number of intelligence data sets that are the object of research studies in Japan.
R.H. Sianipar holds a number of Japanese patents and has written a number of national / international scientific articles and dozens of national books. He has also participated in a number of workshops related to cryptography, cryptanalysis, digital watermarking, and digital forensics. In a number of these workshops, R.H. Sianipar helped Prof. Hidetoshi Miike create applications related to digital image / video processing, steganography, cryptography, watermarking, non-linear screening, intelligent descriptor-based computer vision, and others, which are used as training materials. R.H. Sianipar's fields of interest are multimedia security, signal / digital image / video processing, cryptography, digital communication, digital forensics, and data compression / coding. To this day, R.H. Sianipar continues to develop applications for the analysis of signals, images, and digital video, for both research and commercial purposes, based on Python, MATLAB, C++, C, VB.NET, C# .NET, R, and Java.
