
Multiple Linear Regression for a Heart Disease Risk Prediction System


Step 1: Import Required Libraries

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt
import seaborn as sns

Step 2: Load and Prepare the Dataset

For this example, I'll create a synthetic dataset. In a real scenario, you would load your dataset from a file.

# Creating a synthetic dataset
np.random.seed(42)
data_size = 200

age = np.random.randint(30, 70, data_size)
cholesterol = np.random.randint(150, 300, data_size)
blood_pressure = np.random.randint(80, 180, data_size)
smoking = np.random.randint(0, 2, data_size)   # 0 for non-smoker, 1 for smoker
diabetes = np.random.randint(0, 2, data_size)  # 0 for no diabetes, 1 for diabetes

# Risk score (synthetic target variable)
risk_score = (
    0.3 * age
    + 0.2 * cholesterol
    + 0.3 * blood_pressure
    + 10 * smoking
    + 8 * diabetes
    + np.random.normal(0, 10, data_size)
)

# Creating a DataFrame
df = pd.DataFrame({
    'Age': age,
    'Cholesterol': cholesterol,
    'Blood Pressure': blood_pressure,
    'Smoking': smoking,
    'Diabetes': diabetes,
    'Risk Score': risk_score
})

# Display the first few rows of the dataset
print(df.head())

Step 3: Exploratory Data Analysis (EDA)

# Pairplot to visualize relationships between features and target
sns.pairplot(df)
plt.show()

# Correlation matrix to check relationships between features
corr_matrix = df.corr()
sns.heatmap(corr_matrix, annot=True, cmap="coolwarm")
plt.show()
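
The heatmap can be hard to read at a glance. As a small sketch that is not part of the original walkthrough, the helper below ranks features by the absolute value of their Pearson correlation with the target. The function name rank_by_target_correlation and the toy values are made up for illustration; with the dataset above you would pass df and 'Risk Score' instead.

```python
import pandas as pd


def rank_by_target_correlation(frame, target):
    """Return each feature's correlation with `target`, strongest first."""
    corr_with_target = frame.corr()[target].drop(target)
    order = corr_with_target.abs().sort_values(ascending=False).index
    return corr_with_target.reindex(order)


# Toy data (hypothetical values, only to show the call)
toy = pd.DataFrame({
    "Age": [1.0, 2.0, 3.0, 4.0],
    "Cholesterol": [4.0, 1.0, 3.0, 2.0],
    "Risk Score": [2.0, 4.0, 6.0, 8.5],
})

ranked = rank_by_target_correlation(toy, "Risk Score")
print(ranked)
```

A high correlation here only describes a pairwise linear relationship; the regression coefficients in the later steps account for all features at once.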

Step 4: Split the Dataset into Training and Testing Sets


# Features and target variable
X = df[['Age', 'Cholesterol', 'Blood Pressure', 'Smoking', 'Diabetes']]
y = df['Risk Score']

# Splitting the dataset into training and testing sets
# (the split arguments were cut off in the original; test_size=0.2 and
# random_state=42 are assumed values, adjust them as needed)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

Step 5: Train the Linear Regression Model

# Creating and training the model
model = LinearRegression()
model.fit(X_train, y_train)

# Model coefficients
print("Coefficients:", model.coef_)
print("Intercept:", model.intercept_)

Step 6: Make Predictions and Evaluate the Model

# Making predictions on the test set
y_pred = model.predict(X_test)

# Evaluating the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Mean Squared Error: {mse}")
print(f"R-squared: {r2}")
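
To make the two metrics less of a black box, here is a minimal pure-Python sketch of what they compute: MSE is the average squared error, and R-squared is one minus the ratio of the residual sum of squares to the total sum of squares around the mean. The helper names and the four toy values are made up for illustration; in practice you would keep using the scikit-learn functions above.

```python
def mse_manual(y_true, y_hat):
    """Average of the squared differences between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_hat)) / len(y_true)


def r2_manual(y_true, y_hat):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_hat))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Toy values (hypothetical), chosen so the arithmetic is easy to follow
actual = [100.0, 110.0, 120.0, 130.0]
predicted = [102.0, 108.0, 123.0, 127.0]

toy_mse = mse_manual(actual, predicted)
toy_r2 = r2_manual(actual, predicted)

print(f"MSE: {toy_mse:.3f}")       # MSE: 6.500
print(f"R-squared: {toy_r2:.3f}")  # R-squared: 0.948
```

MSE is in squared units of the risk score, so its square root is often easier to interpret. R-squared close to 1 means the model explains most of the variance in the test targets.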

Step 7: Visualize the Results
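
The plotting code for this step did not survive in the post, so what follows is only a sketch of one common way to do it, not the original code. It draws actual vs. predicted values next to a residual plot. The function name plot_fit_diagnostics is hypothetical; it accepts any two equal-length sequences, such as y_test and y_pred from Step 6.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_fit_diagnostics(y_true, y_hat):
    """Build an actual-vs-predicted plot and a residual plot side by side."""
    y_true = np.asarray(y_true, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    residuals = y_true - y_hat

    fig, (ax_fit, ax_res) = plt.subplots(1, 2, figsize=(12, 5))

    # Left panel: points on the dashed diagonal are perfect predictions
    ax_fit.scatter(y_true, y_hat, alpha=0.7)
    low = min(y_true.min(), y_hat.min())
    high = max(y_true.max(), y_hat.max())
    ax_fit.plot([low, high], [low, high], "r--", label="Perfect prediction")
    ax_fit.set_xlabel("Actual Risk Score")
    ax_fit.set_ylabel("Predicted Risk Score")
    ax_fit.set_title("Actual vs. Predicted")
    ax_fit.legend()

    # Right panel: residuals should scatter evenly around zero
    ax_res.scatter(y_hat, residuals, alpha=0.7)
    ax_res.axhline(0, color="r", linestyle="--")
    ax_res.set_xlabel("Predicted Risk Score")
    ax_res.set_ylabel("Residual (actual - predicted)")
    ax_res.set_title("Residuals")

    fig.tight_layout()
    return fig, residuals
```

With the variables from the earlier steps, calling plot_fit_diagnostics(y_test, y_pred) followed by plt.show() displays both panels. A funnel or curve in the residual plot would suggest that a plain linear model is missing something.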


Explanation:
  1. Data Generation: A synthetic dataset is created with features like Age, Cholesterol, Blood Pressure, Smoking, and Diabetes to predict a synthetic Risk Score.

  2. EDA: Exploratory Data Analysis helps understand the relationships between the features and the target variable.

  3. Model Training: The multiple linear regression model is trained on the training set. Each coefficient estimates how much the predicted risk score changes for a one-unit increase in that feature, with the other features held constant.

  4. Evaluation: The model's performance is evaluated using Mean Squared Error (MSE) and R-squared values.

  5. Visualization: Visualizing actual vs. predicted values and residuals helps in assessing the model's fit.
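
Because the risk score here is generated from known weights, it is possible to check that the fitted coefficients land near them. The sketch below is an illustration under a few assumptions: it solves the same ordinary least squares problem with np.linalg.lstsq instead of LinearRegression, uses its own random generator and seed, and uses a larger sample (n = 2000) than the 200 rows above so the estimates are more stable.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumed seed, independent of the post's seed
n = 2000                        # assumed sample size, larger than the post's 200

feature_names = ["Age", "Cholesterol", "Blood Pressure", "Smoking", "Diabetes"]
true_weights = np.array([0.3, 0.2, 0.3, 10.0, 8.0])  # weights used in Step 2

X_demo = np.column_stack([
    rng.integers(30, 70, n),    # age
    rng.integers(150, 300, n),  # cholesterol
    rng.integers(80, 180, n),   # blood pressure
    rng.integers(0, 2, n),      # smoking (0/1)
    rng.integers(0, 2, n),      # diabetes (0/1)
]).astype(float)
y_demo = X_demo @ true_weights + rng.normal(0, 10, n)

# Add a column of ones so the first fitted value is the intercept
design = np.column_stack([np.ones(n), X_demo])
beta, *_ = np.linalg.lstsq(design, y_demo, rcond=None)
intercept_hat, weights_hat = beta[0], beta[1:]

for name, true_w, est_w in zip(feature_names, true_weights, weights_hat):
    print(f"{name:15s} true={true_w:6.2f}  estimated={est_w:7.3f}")
```

The estimates will not match exactly because of the added noise, and the binary features (Smoking, Diabetes) are usually estimated less precisely than the continuous ones. On real data there are no true weights to compare against, so the coefficients should be read together with the evaluation metrics.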

Real Dataset Consideration:

Replace the synthetic data generation part with your actual dataset, ensuring that your data is clean and well-preprocessed. You might need to handle missing values, normalize/standardize features, and encode categorical variables depending on your dataset's characteristics.
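
As a rough sketch of those preprocessing steps, assuming a CSV with the same column names as the synthetic dataset: the rows below are made up and read from an in-memory string so the example runs on its own. With a real file you would pass its path to pd.read_csv instead. The yes/no encoding and median imputation are illustrative choices, not requirements.

```python
import io
import pandas as pd

# Made-up rows standing in for a real file; note the two missing values
csv_text = """Age,Cholesterol,Blood Pressure,Smoking,Diabetes,Risk Score
45,210,120,yes,no,118.5
52,,135,no,no,121.0
61,250,150,yes,yes,152.3
38,180,,no,no,99.8
"""

raw = pd.read_csv(io.StringIO(csv_text))
clean = raw.copy()

numeric_cols = ["Age", "Cholesterol", "Blood Pressure"]
binary_cols = ["Smoking", "Diabetes"]

# 1. Handle missing values: fill each numeric column with its median
for col in numeric_cols:
    clean[col] = clean[col].fillna(clean[col].median())

# 2. Encode categorical variables: map yes/no to 1/0
for col in binary_cols:
    clean[col] = clean[col].map({"no": 0, "yes": 1})

# 3. Standardize numeric features to zero mean and unit variance
for col in numeric_cols:
    clean[col] = (clean[col] - clean[col].mean()) / clean[col].std(ddof=0)

print(clean)
```

In a real pipeline the medians, means, and standard deviations would be computed on the training split only and then reused on the test split, to avoid leaking test information. scikit-learn's StandardScaler performs the same standardization. For plain linear regression, scaling does not change the predictions, but it puts the coefficients on a comparable scale.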

This code provides a foundation for building a heart disease risk prediction system using multiple linear regression. Let me know if you need further assistance with your specific dataset or model improvements!
