
Multiple Linear Regression for a Heart Disease Risk Prediction System


Step 1: Import Required Libraries

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt
import seaborn as sns

Step 2: Load and Prepare the Dataset

For this example, I'll create a synthetic dataset. In a real scenario, you would load your dataset from a file.

# Creating a synthetic dataset
np.random.seed(42)
data_size = 200
age = np.random.randint(30, 70, data_size)
cholesterol = np.random.randint(150, 300, data_size)
blood_pressure = np.random.randint(80, 180, data_size)
smoking = np.random.randint(0, 2, data_size)   # 0 for non-smoker, 1 for smoker
diabetes = np.random.randint(0, 2, data_size)  # 0 for no diabetes, 1 for diabetes

# Risk score (synthetic target variable)
risk_score = (
    0.3 * age
    + 0.2 * cholesterol
    + 0.3 * blood_pressure
    + 10 * smoking
    + 8 * diabetes
    + np.random.normal(0, 10, data_size)
)

# Creating a DataFrame
df = pd.DataFrame({
    'Age': age,
    'Cholesterol': cholesterol,
    'Blood Pressure': blood_pressure,
    'Smoking': smoking,
    'Diabetes': diabetes,
    'Risk Score': risk_score
})

# Display the first few rows of the dataset
print(df.head())

Step 3: Exploratory Data Analysis (EDA)

# Pairplot to visualize relationships between features and target
sns.pairplot(df)
plt.show()

# Correlation matrix to check relationships between features
corr_matrix = df.corr()
sns.heatmap(corr_matrix, annot=True, cmap="coolwarm")
plt.show()
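The heatmap is easier to read alongside a plain ranking of how strongly each feature correlates with the target. Here is a minimal sketch of that, regenerating the same kind of synthetic data with NumPy's `default_rng` (an assumption; the exact numbers will differ from the `np.random.seed` version above).

```python
import numpy as np
import pandas as pd

# Regenerate synthetic data with the same ranges and weights as the post
rng = np.random.default_rng(42)
n = 200
df = pd.DataFrame({
    "Age": rng.integers(30, 70, n),
    "Cholesterol": rng.integers(150, 300, n),
    "Blood Pressure": rng.integers(80, 180, n),
    "Smoking": rng.integers(0, 2, n),
    "Diabetes": rng.integers(0, 2, n),
})
df["Risk Score"] = (
    0.3 * df["Age"]
    + 0.2 * df["Cholesterol"]
    + 0.3 * df["Blood Pressure"]
    + 10 * df["Smoking"]
    + 8 * df["Diabetes"]
    + rng.normal(0, 10, n)
)

# Pearson correlation of every feature with the target, strongest first
target_corr = (
    df.corr()["Risk Score"]
    .drop("Risk Score")
    .sort_values(ascending=False)
)
print(target_corr.round(2))
```

Correlation here measures only the pairwise linear relationship, so a feature with a large coefficient but a narrow range (such as the 0/1 flags) can still rank below a feature with a small coefficient and a wide range.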

Step 4: Split the Dataset into Training and Testing Sets


# Features and target variable
X = df[['Age', 'Cholesterol', 'Blood Pressure', 'Smoking', 'Diabetes']]
y = df['Risk Score']

# Splitting the dataset into training and testing sets
# (the remaining arguments are an assumption: an 80/20 split with a fixed seed)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

Step 5: Train the Linear Regression Model

# Creating and training the model
model = LinearRegression()
model.fit(X_train, y_train)

# Model coefficients
print("Coefficients:", model.coef_)
print("Intercept:", model.intercept_)

Step 6: Make Predictions and Evaluate the Model

# Making predictions on the test set
y_pred = model.predict(X_test)

# Evaluating the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Mean Squared Error: {mse}")
print(f"R-squared: {r2}")
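To make the two metrics less of a black box, here is a small sketch that computes them by hand on four made-up values and checks the result against scikit-learn. The arrays `y_true` and `y_hat` are illustrative only and are not taken from the dataset above.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Four made-up actual and predicted values
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.5, 7.0, 10.0])

# Residual sum of squares and total sum of squares
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)

# MSE is the average squared error; R-squared is the share of the
# target's variance that the predictions account for
mse_manual = ss_res / len(y_true)
r2_manual = 1 - ss_res / ss_tot

print(f"MSE: {mse_manual:.3f}")       # MSE: 0.375
print(f"R-squared: {r2_manual:.3f}")  # R-squared: 0.925

# The library functions should give the same numbers
print(np.isclose(mse_manual, mean_squared_error(y_true, y_hat)))
print(np.isclose(r2_manual, r2_score(y_true, y_hat)))
```

MSE is in squared units of the risk score, so its square root is often easier to interpret. R-squared is unitless and equals 1 for a perfect fit.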

Step 7: Visualize the Results
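A minimal sketch of this step, assuming the two plots meant here are actual vs. predicted values and residuals, as named in the explanation further down. The block regenerates the synthetic data so it runs on its own. The `Agg` backend, the 80/20 split, and the output file name `actual_vs_predicted.png` are assumptions chosen for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data with the same ranges and weights as the post
rng = np.random.default_rng(42)
n = 200
X = np.column_stack([
    rng.integers(30, 70, n),    # Age
    rng.integers(150, 300, n),  # Cholesterol
    rng.integers(80, 180, n),   # Blood Pressure
    rng.integers(0, 2, n),      # Smoking
    rng.integers(0, 2, n),      # Diabetes
])
y = X @ np.array([0.3, 0.2, 0.3, 10.0, 8.0]) + rng.normal(0, 10, n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)
residuals = y_test - y_pred

fig, (ax_fit, ax_res) = plt.subplots(1, 2, figsize=(12, 5))

# Left: actual vs. predicted; points on the dashed line are perfect predictions
ax_fit.scatter(y_test, y_pred, alpha=0.7)
lims = [min(y_test.min(), y_pred.min()), max(y_test.max(), y_pred.max())]
ax_fit.plot(lims, lims, "r--")
ax_fit.set_xlabel("Actual Risk Score")
ax_fit.set_ylabel("Predicted Risk Score")
ax_fit.set_title("Actual vs. Predicted")

# Right: residuals vs. predicted; a patternless band around zero suggests a good fit
ax_res.scatter(y_pred, residuals, alpha=0.7)
ax_res.axhline(0, color="red", linestyle="--")
ax_res.set_xlabel("Predicted Risk Score")
ax_res.set_ylabel("Residual (actual - predicted)")
ax_res.set_title("Residuals")

fig.tight_layout()
fig.savefig("actual_vs_predicted.png")
```

In an interactive session, `plt.show()` can replace the `savefig` call.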


Explanation:
  1. Data Generation: A synthetic dataset is created with features like Age, Cholesterol, Blood Pressure, Smoking, and Diabetes to predict a synthetic Risk Score.

  2. EDA: The pair plot and the correlation heatmap show how each feature relates to the others and to the target variable before any model is fitted.

  3. Model Training: The multiple linear regression model is trained on the training split. Each coefficient is the estimated change in the risk score for a one-unit increase in that feature, with the other features held fixed.

  4. Evaluation: The model's performance on the test set is evaluated using Mean Squared Error (MSE) and R-squared values.

  5. Visualization: Visualizing actual vs. predicted values and residuals helps in assessing the model's fit.
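Because the risk score is synthetic, the fitted coefficients can be compared directly with the weights used to generate it. The sketch below does that. The helper `fit_synthetic` is a hypothetical name introduced for illustration, and the data is regenerated with `default_rng`, so the numbers will not match the run above exactly.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

FEATURES = ["Age", "Cholesterol", "Blood Pressure", "Smoking", "Diabetes"]
# Weights used in the post's risk score formula
TRUE_WEIGHTS = np.array([0.3, 0.2, 0.3, 10.0, 8.0])


def fit_synthetic(data_size=200, seed=42):
    """Generate synthetic data and fit a linear regression on all of it."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([
        rng.integers(30, 70, data_size),    # Age
        rng.integers(150, 300, data_size),  # Cholesterol
        rng.integers(80, 180, data_size),   # Blood Pressure
        rng.integers(0, 2, data_size),      # Smoking
        rng.integers(0, 2, data_size),      # Diabetes
    ])
    y = X @ TRUE_WEIGHTS + rng.normal(0, 10, data_size)
    return LinearRegression().fit(X, y)


model = fit_synthetic()
for name, true_w, est_w in zip(FEATURES, TRUE_WEIGHTS, model.coef_):
    print(f"{name:<15} true={true_w:>5.1f}  estimated={est_w:>7.3f}")
```

With only 200 rows and noise of standard deviation 10, the estimates for the 0/1 features can be off by a point or two. Passing a larger `data_size` should pull them closer to the true weights.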

Real Dataset Consideration:

Replace the synthetic data generation part with your actual dataset, ensuring that your data is clean and well-preprocessed. You might need to handle missing values, normalize/standardize features, and encode categorical variables depending on your dataset's characteristics.
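One way to bundle those preprocessing steps with the model is a scikit-learn `Pipeline`. The sketch below is only an illustration: the tiny `raw` table, its column names, and the choice of median imputation are assumptions, not part of any real heart disease dataset.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical raw data with missing values and a text column
raw = pd.DataFrame({
    "Age": [45, 52, np.nan, 61, 38, 57],
    "Cholesterol": [210, np.nan, 180, 260, 195, 240],
    "Smoking": ["yes", "no", "no", "yes", "no", "yes"],
    "Risk Score": [75.0, 68.0, 55.0, 92.0, 52.0, 88.0],
})
numeric_cols = ["Age", "Cholesterol"]
categorical_cols = ["Smoking"]

# Numeric columns: fill gaps with the median, then standardize
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Route each column group through its own preprocessing
preprocess = ColumnTransformer([
    ("num", numeric_steps, numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

pipeline = Pipeline([
    ("preprocess", preprocess),
    ("regress", LinearRegression()),
])

features = raw[numeric_cols + categorical_cols]
pipeline.fit(features, raw["Risk Score"])
predictions = pipeline.predict(features)
print(predictions.round(1))
```

Keeping the preprocessing inside the pipeline means the imputation and scaling statistics are learned from the training data only, which avoids leaking information from the test set when the pipeline is fitted after a train/test split.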

This code provides a foundation for building a heart disease risk prediction system using multiple linear regression. Let me know if you need further assistance with your specific dataset or model improvements!
