Fake News Detection Project in Python

Here's a simple example of fake news detection using machine learning in Python. We'll use the TfidfVectorizer and PassiveAggressiveClassifier from the scikit-learn library.

First, let's import the required libraries:


import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.linear_model import PassiveAggressiveClassifier

Next, let's load the dataset (a CSV file with text and label columns) into a pandas DataFrame and split it into training (80%) and testing (20%) sets:


# Load dataset into a Pandas DataFrame
df = pd.read_csv('news.csv')

# Split the data into training and testing data
X_train, X_test, y_train, y_test = train_test_split(df['text'], df['label'], test_size=0.2, random_state=42)
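
The split above can be tried without news.csv. The following is a minimal sketch on a small in-memory DataFrame; the rows and the FAKE/REAL label values are made up for illustration, and the stratify argument is an optional addition (not part of the code above) that keeps the class ratio equal in both splits.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for news.csv: a 'text' column and a 'label' column.
# The FAKE/REAL label values are an assumption made for this example.
toy_df = pd.DataFrame({
    "text": [f"article number {i}" for i in range(10)],
    "label": ["FAKE"] * 5 + ["REAL"] * 5,
})

# Inspect the class balance before splitting.
print(toy_df["label"].value_counts())

# stratify keeps the FAKE/REAL ratio the same in the train and test splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    toy_df["text"],
    toy_df["label"],
    test_size=0.2,
    random_state=42,
    stratify=toy_df["label"],
)

print(len(X_tr), len(X_te))          # 8 training rows, 2 testing rows
print(y_te.value_counts().to_dict())  # one row of each class
```

On a real dataset it is worth checking the class balance this way first, since accuracy is hard to interpret when one class dominates.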


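After splitting the data, we'll convert the text into numerical TF-IDF features using TfidfVectorizer. The vectorizer is fitted on the training data only and then reused to transform the testing data: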


# Initialize TfidfVectorizer
tfidf_vectorizer = TfidfVectorizer(stop_words='english', max_df=0.7)

# Fit and transform training data
tfidf_train = tfidf_vectorizer.fit_transform(X_train)

# Transform testing data
tfidf_test = tfidf_vectorizer.transform(X_test)
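
To see what these settings do, here is a small sketch on a toy corpus. The sentences are invented for illustration. It shows that English stop words are dropped, that a word appearing in more than 70% of the training documents is dropped by max_df=0.7, and that words seen only at test time are ignored.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Invented toy corpus; "budget" appears in all three training documents.
train_docs = [
    "the senate passed the budget",
    "aliens endorse the budget",
    "the senate denies budget rumors",
]
test_docs = ["senate debates unicorn budget"]

vec = TfidfVectorizer(stop_words="english", max_df=0.7)

# Learn the vocabulary and IDF weights from the training documents only.
train_matrix = vec.fit_transform(train_docs)

# Reuse the learned vocabulary; unseen words such as "unicorn" are ignored.
test_matrix = vec.transform(test_docs)

print(sorted(vec.vocabulary_))                # learned terms
print(train_matrix.shape, test_matrix.shape)  # same number of columns
print("budget" in vec.vocabulary_)            # False: removed by max_df
```

Both matrices have the same number of columns because the test data is projected onto the vocabulary learned from the training data.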


Then, we'll train our model using the PassiveAggressiveClassifier algorithm:


# Initialize PassiveAggressiveClassifier
pac = PassiveAggressiveClassifier(max_iter=50)

# Fit the model
pac.fit(tfidf_train, y_train)

# Predict on the testing data
y_pred = pac.predict(tfidf_test)
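
As an optional variation, the vectorizer and classifier can be chained with scikit-learn's make_pipeline so that raw text goes in and a label comes out. This is a sketch under assumptions, not the approach used above: the training sentences and FAKE/REAL labels are made up, and random_state is added only to make the run repeatable.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.pipeline import make_pipeline

# Invented training examples; labels are hypothetical.
texts = [
    "scientists confirm miracle cure hidden by doctors",
    "shocking secret celebrities do not want revealed",
    "city council approves new transit funding",
    "central bank holds interest rates steady",
]
labels = ["FAKE", "FAKE", "REAL", "REAL"]

# The pipeline fits the vectorizer and then the classifier in one call.
model = make_pipeline(
    TfidfVectorizer(stop_words="english"),
    PassiveAggressiveClassifier(max_iter=50, random_state=42),
)
model.fit(texts, labels)

# Predict directly on raw text; the pipeline applies the same TF-IDF transform.
pred = model.predict(["council debates transit funding plan"])
print(pred[0])
```

Bundling both steps means the vectorizer can never be accidentally refitted on test data, and the whole model can be saved or reused as a single object.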


Finally, we'll evaluate the model's performance by calculating the accuracy score and the confusion matrix:



# Calculate the accuracy score
score = accuracy_score(y_test, y_pred)
print(f'Accuracy: {round(score*100,2)}%')

# Calculate and print the confusion matrix
# (a bare expression shows nothing when run as a script)
print(confusion_matrix(y_test, y_pred))
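
Reading the confusion matrix is easier with a fixed label order. The sketch below uses hand-written true and predicted labels (hypothetical FAKE/REAL values) and the labels argument of confusion_matrix to pin the row and column order.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hand-written labels for illustration only.
y_true = ["FAKE", "FAKE", "FAKE", "REAL", "REAL", "REAL"]
y_hat = ["FAKE", "FAKE", "REAL", "REAL", "REAL", "FAKE"]

# Rows are true labels, columns are predicted labels, in the order given.
cm = confusion_matrix(y_true, y_hat, labels=["FAKE", "REAL"])
print(cm)
# [[2 1]   2 FAKE articles caught, 1 FAKE article predicted as REAL
#  [1 2]]  1 REAL article predicted as FAKE, 2 REAL articles recognized

# Accuracy is the diagonal sum divided by the total: (2 + 2) / 6.
acc = accuracy_score(y_true, y_hat)
print(f"Accuracy: {round(acc * 100, 2)}%")  # Accuracy: 66.67%
```

The off-diagonal cells show which kind of mistake the model makes, which the single accuracy number hides.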


That's it! Here's the complete source code:


import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.linear_model import PassiveAggressiveClassifier

# Load dataset into a Pandas DataFrame
df = pd.read_csv('news.csv')

# Split the data into training and testing data
X_train, X_test, y_train, y_test = train_test_split(df['text'], df['label'], test_size=0.2, random_state=42)

# Initialize TfidfVectorizer
tfidf_vectorizer = TfidfVectorizer(stop_words='english', max_df=0.7)

# Fit and transform training data
tfidf_train = tfidf_vectorizer.fit_transform(X_train)

# Transform testing data
tfidf_test = tfidf_vectorizer.transform(X_test)

# Initialize PassiveAggressiveClassifier
pac = PassiveAggressiveClassifier(max_iter=50)

# Fit the model
pac.fit(tfidf_train, y_train)

# Predict on the testing data
y_pred = pac.predict(tfidf_test)

# Calculate the accuracy score
score = accuracy_score(y_test, y_pred)
print(f'Accuracy: {round(score*100,2)}%')

# Calculate and print the confusion matrix
print(confusion_matrix(y_test, y_pred))
