Here's a simple example of fake news detection using machine learning in Python.
We'll use TfidfVectorizer and PassiveAggressiveClassifier from the scikit-learn library.
First, let's import the required libraries:
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.linear_model import PassiveAggressiveClassifier
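Next, let's load the dataset into a pandas DataFrame and split it into training and testing data. The code assumes news.csv has a text column with the article text and a label column with the class: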
# Load dataset into a Pandas DataFrame
df = pd.read_csv('news.csv')
# Split the data into training and testing data
X_train, X_test, y_train, y_test = train_test_split(df['text'], df['label'], test_size=0.2, random_state=42)
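If the classes in the label column are unbalanced, a plain random split can leave the test set with a different class ratio than the training set. Here is a minimal sketch of a stratified split, assuming a small in-memory DataFrame in place of news.csv; the FAKE/REAL label values and the toy rows are hypothetical, only the column names come from the code above.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for news.csv (same column names as above).
df = pd.DataFrame({
    "text": [f"article {i}" for i in range(20)],
    "label": ["FAKE"] * 10 + ["REAL"] * 10,  # assumed label values
})

# Drop rows with missing text or label before splitting.
df = df.dropna(subset=["text", "label"])

# stratify keeps the class ratio the same in the training and testing splits.
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, random_state=42, stratify=df["label"]
)

print(y_train.value_counts().to_dict())
print(y_test.value_counts().to_dict())
```

With 20 rows and test_size=0.2, 4 rows go to the test set and 16 to the training set, each half FAKE and half REAL.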
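After splitting the data, we'll convert the text into TF-IDF feature vectors using TfidfVectorizer. The vectorizer is fitted on the training data only, and the testing data is transformed with the vocabulary learned there: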
# Initialize TfidfVectorizer: drop English stop words and ignore terms
# that appear in more than 70% of the documents
tfidf_vectorizer = TfidfVectorizer(stop_words='english', max_df=0.7)
# Fit and transform training data
tfidf_train = tfidf_vectorizer.fit_transform(X_train)
# Transform testing data
tfidf_test = tfidf_vectorizer.transform(X_test)
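To see what fit_transform and transform do differently, here is a small sketch on a made-up corpus. The sentences are hypothetical; the point is that words seen only in the testing data (here "unicorn") are not in the learned vocabulary and are simply ignored.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy documents, not taken from news.csv.
train_docs = ["the senate passed the budget", "aliens endorse the budget"]
test_docs = ["senate aliens unicorn"]

vectorizer = TfidfVectorizer(stop_words="english")

# fit_transform learns the vocabulary and IDF weights from the training docs.
tfidf_train = vectorizer.fit_transform(train_docs)

# transform reuses that vocabulary; nothing is refitted on the test docs.
tfidf_test = vectorizer.transform(test_docs)

print(sorted(vectorizer.vocabulary_))
print(tfidf_train.shape, tfidf_test.shape)
```

Both matrices have one column per vocabulary term, so a model trained on tfidf_train can be applied to tfidf_test directly.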
Then, we'll train a PassiveAggressiveClassifier on the training vectors and predict labels for the testing data:
# Initialize PassiveAggressiveClassifier (at most 50 passes over the training data)
pac = PassiveAggressiveClassifier(max_iter=50)
# Fit the model
pac.fit(tfidf_train, y_train)
# Predict on the testing data
y_pred = pac.predict(tfidf_test)
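If you're curious where the name comes from: the algorithm leaves the weights alone when an example is classified with enough margin (passive) and otherwise moves them just far enough to fix the mistake (aggressive). Below is a rough sketch of a single update step in NumPy, following the PA-I variant of the published algorithm. The function name pa_update is made up for illustration, and scikit-learn's implementation differs in details such as the intercept, sparse inputs, and shuffling.

```python
import numpy as np

def pa_update(w, x, y, C=1.0):
    """One passive-aggressive (PA-I) step for a single example.

    w: current weight vector, x: feature vector (assumed not all zeros),
    y: true label encoded as +1 or -1, C: cap on the step size.
    """
    loss = max(0.0, 1.0 - y * np.dot(w, x))  # hinge loss on this example
    if loss == 0.0:
        return w                             # passive: margin is already satisfied
    tau = min(C, loss / np.dot(x, x))        # aggressive: step size, capped at C
    return w + tau * y * x

w = np.zeros(2)
x = np.array([1.0, 0.0])

w = pa_update(w, x, y=1)   # first call has to correct the weights
print(w)
w_again = pa_update(w, x, y=1)  # second call finds nothing to fix
print(w_again is w)
```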
Finally, we'll evaluate the model's performance by calculating the accuracy score and the confusion matrix:
# Calculate the accuracy score
score = accuracy_score(y_test, y_pred)
print(f'Accuracy: {round(score*100,2)}%')
# Calculate and print the confusion matrix
# (rows are true labels, columns are predicted labels)
print(confusion_matrix(y_test, y_pred))
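Here is a small worked example of how to read these two metrics, using hand-written labels instead of real model output. The FAKE/REAL label values are an assumption about the dataset; passing labels= fixes the row and column order so the matrix is easy to interpret.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical true and predicted labels for five articles.
y_true = ["FAKE", "FAKE", "REAL", "REAL", "REAL"]
y_hat = ["FAKE", "REAL", "REAL", "REAL", "FAKE"]

# Fix the order: row/column 0 is FAKE, row/column 1 is REAL.
labels = ["FAKE", "REAL"]
cm = confusion_matrix(y_true, y_hat, labels=labels)
acc = accuracy_score(y_true, y_hat)

print(cm)
# [[1 1]
#  [1 2]]
print(acc)  # 0.6
```

The first row says that of the two truly FAKE articles, one was predicted FAKE and one REAL. The diagonal (1 + 2 = 3) holds the correct predictions, which is where the accuracy of 3/5 comes from.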
That's it! Here's the complete source code:
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.linear_model import PassiveAggressiveClassifier
# Load dataset into a Pandas DataFrame
df = pd.read_csv('news.csv')
# Split the data into training and testing data
X_train, X_test, y_train, y_test = train_test_split(df['text'], df['label'], test_size=0.2, random_state=42)
# Initialize TfidfVectorizer: drop English stop words and ignore terms
# that appear in more than 70% of the documents
tfidf_vectorizer = TfidfVectorizer(stop_words='english', max_df=0.7)
# Fit and transform training data
tfidf_train = tfidf_vectorizer.fit_transform(X_train)
# Transform testing data
tfidf_test = tfidf_vectorizer.transform(X_test)
# Initialize PassiveAggressiveClassifier (at most 50 passes over the training data)
pac = PassiveAggressiveClassifier(max_iter=50)
# Fit the model
pac.fit(tfidf_train, y_train)
# Predict on the testing data
y_pred = pac.predict(tfidf_test)
# Calculate the accuracy score
score = accuracy_score(y_test, y_pred)
print(f'Accuracy: {round(score*100,2)}%')
# Calculate and print the confusion matrix
# (rows are true labels, columns are predicted labels)
print(confusion_matrix(y_test, y_pred))
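Once the model is trained, you'll probably want to classify new text. One way to do that is to wrap the vectorizer and classifier in a scikit-learn pipeline so raw strings go in and labels come out. This is only a sketch: the training sentences and the FAKE/REAL labels are made up, and max_df is left at its default because the toy corpus is so small.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.pipeline import make_pipeline

# Hypothetical toy training data; the label values are an assumption.
texts = [
    "senate passes annual budget after long debate",
    "city council approves new transit plan",
    "aliens secretly control the stock market",
    "miracle fruit cures every known disease overnight",
]
labels = ["REAL", "REAL", "FAKE", "FAKE"]

# The pipeline applies TF-IDF and then the classifier in one call.
model = make_pipeline(
    TfidfVectorizer(stop_words="english"),
    PassiveAggressiveClassifier(max_iter=50, random_state=42),
)
model.fit(texts, labels)

# Raw text in, predicted label out; no manual transform step needed.
prediction = model.predict(["council debates transit budget"])
print(prediction[0])
```

Keeping both steps in one object also means the vectorizer can never be accidentally refitted on new data at prediction time.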