Detecting Fake News with Deep Learning: A Keras-Based Text Classification Approach

TensorFlow
Pandas
NumPy
NLTK
Author

Emil Blaignan

Published

March 15, 2025

Introduction

In this project, we’ll classify news articles as real or fake using deep learning in Keras, based on their titles and content. The pipeline includes text preprocessing with NLTK, text vectorization, and word embeddings to extract meaningful linguistic patterns. We’ll build and train three neural network architectures: one using only article titles, another using only article text, and a third combining both for improved accuracy.

Preprocessing

In this section, we’ll import and process the text to prepare the data for our models.

Setup & Data Acquisition

First, we’ll need to import the necessary libraries and load our dataset:

# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import re
import string
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.layers import TextVectorization
from nltk.corpus import stopwords
import nltk

# Download NLTK stopwords
nltk.download('stopwords', quiet = True)

# Define our training data URL
train_url = "https://raw.githubusercontent.com/pic16b-ucla/25W/refs/heads/main/datasets/fake_news_train.csv"

# Read the data
train = pd.read_csv(train_url)

# Display the first few rows to understand the data
train.head()
Unnamed: 0 title text fake
0 17366 Merkel: Strong result for Austria's FPO 'big c... German Chancellor Angela Merkel said on Monday... 0
1 5634 Trump says Pence will lead voter fraud panel WEST PALM BEACH, Fla.President Donald Trump sa... 0
2 17487 JUST IN: SUSPECTED LEAKER and “Close Confidant... On December 5, 2017, Circa s Sara Carter warne... 1
3 12217 Thyssenkrupp has offered help to Argentina ove... Germany s Thyssenkrupp, has offered assistance... 0
4 5535 Trump say appeals court decision on travel ban... President Donald Trump on Thursday called the ... 0

Our dataset contains article titles, the full text content, and a binary label indicating whether each article is fake news (1) or real news (0).
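
As a quick sanity check (using the column names shown in the preview above), we can confirm the dataset’s size and look for missing titles or text before preprocessing:

# Basic sanity checks: dataset size and missing values per column
print(train.shape)
print(train[['title', 'text', 'fake']].isna().sum())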

Data Processing

Next, we’ll create a function, make_dataset(), to process our data by removing stopwords and creating a TensorFlow dataset. The function will do three things:

  1. Convert the title and text to lowercase with .str.lower().
  2. Remove stopwords (e.g., “the,” “and,” “but”) from the text and title using NLTK.
  3. Construct and return a tf.data.Dataset with two inputs (title, text) and one output (fake), batched with size 100.
def make_dataset(df):
    # Convert to lowercase
    df['title'] = df['title'].str.lower()
    df['text'] = df['text'].str.lower()
    
    # Get stopwords from NLTK
    stop_words = set(stopwords.words('english'))
    
    # Function to remove stopwords
    def remove_stopwords(text):
        if isinstance(text, str):
            words = text.split()
            filtered_words = [word for word in words if word not in stop_words]
            return ' '.join(filtered_words)
        return ""
    
    # Apply stopword removal
    df['title'] = df['title'].apply(remove_stopwords)
    df['text'] = df['text'].apply(remove_stopwords)
    
    # Create dataset with multiple inputs
    titles = df['title'].values
    texts = df['text'].values
    labels = df['fake'].values
    
    # Create dictionary datasets for multiple inputs
    dataset = tf.data.Dataset.from_tensor_slices(
        ({"title": titles, "text": texts}, labels)
    )
    
    return dataset.batch(100)  # Batch size of 100

# Process the training data
dataset = make_dataset(train)
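
Before moving on, it can help to peek at a single batch to confirm the structure the models will receive: a dictionary with “title” and “text” string tensors, plus a label tensor. A quick check:

# Inspect one batch: inputs are a dict of string tensors, labels are 0/1
for inputs, labels in dataset.take(1):
    print(inputs["title"].shape, inputs["text"].shape, labels.shape)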

Split Training and Validation Data

Next, we’ll split our data into training and validation sets using an 80/20 split:

# Calculate the size of the training set (80% of the data)
train_size = int(0.8 * len(train))
val_size = len(train) - train_size

# Split the dataset
train_dataset = dataset.take(train_size//100)  # Divide by batch size
val_dataset = dataset.skip(train_size//100)

# Calculate base rate
base_rate = train['fake'].mean() * 100
print(f"Base rate of fake news: {base_rate:.2f}%")
Base rate of fake news: 52.30%

The base rate tells us what percentage of the dataset consists of fake news: a model that always predicted “fake” would be about 52% accurate, so this is the baseline our models need to beat.
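
One caveat: take() and skip() split the batches in file order, so the validation set is simply the last 20% of rows. If the CSV were not already shuffled, a sketch like the following (shuffling the rows with a fixed seed before rebuilding the dataset) would give a more representative split:

# Optional: shuffle rows before splitting (sketch; reuses make_dataset and train_size)
shuffled = train.sample(frac=1.0, random_state=42).reset_index(drop=True)
dataset = make_dataset(shuffled)
train_dataset = dataset.take(train_size // 100)
val_dataset = dataset.skip(train_size // 100)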

Text Vectorization Layer

Now we’ll create a text vectorization layer that will convert our text data into numerical vectors:

# Preparing text vectorization layer for tf model
size_vocabulary = 2000

def standardization(input_data):
    lowercase = tf.strings.lower(input_data)
    no_punctuation = tf.strings.regex_replace(lowercase, 
                                 '[%s]' % re.escape(string.punctuation), '')
    return no_punctuation 

# Create a shared vectorization layer for both title and text
vectorize_layer = TextVectorization(
    standardize=standardization,
    max_tokens=size_vocabulary,  # only consider this many words (2000)
    output_mode='int',
    output_sequence_length=500) 

# Create text-only dataset for adaptation
text_dataset = train_dataset.unbatch().map(lambda x, y: x["text"])
# Create title-only dataset for adaptation
title_dataset = train_dataset.unbatch().map(lambda x, y: x["title"])

# Combine title and text datasets for adaptation
combined_texts = text_dataset.concatenate(title_dataset)
vectorize_layer.adapt(combined_texts)

The text vectorization layer transforms our text data into numerical representations that our neural network can process.
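
To see what the layer produces, we can run a short example string through it; the exact token ids depend on the adapted vocabulary, and any word outside the 2,000-term vocabulary maps to the [UNK] id:

# Example: map a short string to a padded sequence of integer token ids
sample = tf.constant(["breaking news officials confirm the report"])
print(vectorize_layer(sample))  # shape (1, 500), padded with zeros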

Creating Models

We’ll create three different models (title only, text only, both) to compare their performance:

Title-Only Model

We’ll start with the title-only model, which classifies articles as Real or Fake based on the words in headlines:

# Title-only model
def create_title_model():
    # Input layer for title
    title_input = keras.Input(shape=(1,), dtype=tf.string, name="title")
    
    # Vectorize the title
    title_vectorized = vectorize_layer(title_input)
    
    # Embedding layer
    embedding = layers.Embedding(size_vocabulary, 10, name="embedding")(title_vectorized)
    
    # Add dropout for regularization
    x = layers.Dropout(0.2)(embedding)
    
    # Global pooling layer
    x = layers.GlobalAveragePooling1D()(x)
    
    # Add another dropout layer
    x = layers.Dropout(0.2)(x)
    
    # Output layer
    output = layers.Dense(1, activation="sigmoid")(x)
    
    # Create the model
    model = keras.Model(inputs=title_input, outputs=output, name="title_model")
    
    # Compile the model
    model.compile(
        loss="binary_crossentropy",
        optimizer="adam",
        metrics=["accuracy"]
    )
    
    return model

Let’s create the model and display its summary:

# Create title model
title_model = create_title_model()

title_model.summary()
Model: "title_model"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                     Output Shape                  Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ title (InputLayer)              │ (None, 1)              │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ text_vectorization              │ (None, 500)            │             0 │
│ (TextVectorization)             │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ embedding (Embedding)           │ (None, 500, 10)        │        20,000 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout (Dropout)               │ (None, 500, 10)        │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ global_average_pooling1d        │ (None, 10)             │             0 │
│ (GlobalAveragePooling1D)        │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_1 (Dropout)             │ (None, 10)             │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense (Dense)                   │ (None, 1)              │            11 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 20,011 (78.17 KB)
 Trainable params: 20,011 (78.17 KB)
 Non-trainable params: 0 (0.00 B)
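
The parameter count checks out: the embedding layer contributes 2,000 vocabulary entries × 10 dimensions = 20,000 weights, and the output layer adds 10 weights plus 1 bias, for 20,011 trainable parameters in total.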

Now, we’re ready to train the title-only model:

# Train title model
title_history = title_model.fit(
    train_dataset,
    validation_data=val_dataset,
    epochs=10,
    verbose=1
)
Epoch 1/10
179/179 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.7679 - loss: 0.5436 - val_accuracy: 0.7857 - val_loss: 0.5237
Epoch 2/10
179/179 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.7705 - loss: 0.5262 - val_accuracy: 0.7791 - val_loss: 0.5069
Epoch 3/10
179/179 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.7736 - loss: 0.5103 - val_accuracy: 0.7887 - val_loss: 0.4933
Epoch 4/10
179/179 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.7833 - loss: 0.4962 - val_accuracy: 0.7991 - val_loss: 0.4821
Epoch 5/10
179/179 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.7843 - loss: 0.4834 - val_accuracy: 0.7927 - val_loss: 0.4692
Epoch 6/10
179/179 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.7882 - loss: 0.4755 - val_accuracy: 0.7918 - val_loss: 0.4593
Epoch 7/10
179/179 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.7915 - loss: 0.4645 - val_accuracy: 0.8022 - val_loss: 0.4503
Epoch 8/10
179/179 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.7971 - loss: 0.4549 - val_accuracy: 0.8033 - val_loss: 0.4417
Epoch 9/10
179/179 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8015 - loss: 0.4461 - val_accuracy: 0.8079 - val_loss: 0.4338
Epoch 10/10
179/179 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8031 - loss: 0.4395 - val_accuracy: 0.8074 - val_loss: 0.4267
# Visualize the model architecture and save it to a PNG
from tensorflow.keras.utils import plot_model
plot_model(title_model, "title_model.png",
           show_shapes=True,
           show_layer_names=True)

Text-Only Model

Next, we’ll build and run the text-only model:

# Text-only model
def create_text_model():
    # Input layer for text
    text_input = keras.Input(shape=(1,), dtype=tf.string, name="text")
    
    # Vectorize the text
    text_vectorized = vectorize_layer(text_input)
    
    # Embedding layer
    embedding = layers.Embedding(size_vocabulary, 10, name="embedding")(text_vectorized)
    
    # Add dropout for regularization
    x = layers.Dropout(0.2)(embedding)
    
    # Global pooling layer
    x = layers.GlobalAveragePooling1D()(x)
    
    # Add another dropout layer
    x = layers.Dropout(0.2)(x)
    
    # Output layer
    output = layers.Dense(1, activation="sigmoid")(x)
    
    # Create the model
    model = keras.Model(inputs=text_input, outputs=output, name="text_model")
    
    # Compile the model
    model.compile(
        loss="binary_crossentropy",
        optimizer="adam",
        metrics=["accuracy"]
    )
    
    return model

Let’s create the model and display its summary:

# Create text model
text_model = create_text_model()

text_model.summary()
Model: "text_model"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                     Output Shape                  Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ text (InputLayer)               │ (None, 1)              │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ text_vectorization_8            │ (None, 500)            │             0 │
│ (TextVectorization)             │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ embedding (Embedding)           │ (None, 500, 10)        │        20,000 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_2 (Dropout)             │ (None, 500, 10)        │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ global_average_pooling1d_1      │ (None, 10)             │             0 │
│ (GlobalAveragePooling1D)        │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_3 (Dropout)             │ (None, 10)             │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_1 (Dense)                 │ (None, 1)              │            11 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 20,011 (78.17 KB)
 Trainable params: 20,011 (78.17 KB)
 Non-trainable params: 0 (0.00 B)

Now we’re ready to train the model:

# Train text model
text_history = text_model.fit(
    train_dataset,
    validation_data=val_dataset,
    epochs=10,
    verbose=1
)
Epoch 1/10
179/179 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - accuracy: 0.5584 - loss: 0.6785 - val_accuracy: 0.6753 - val_loss: 0.6213
Epoch 2/10
179/179 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - accuracy: 0.8073 - loss: 0.5964 - val_accuracy: 0.8617 - val_loss: 0.5128
Epoch 3/10
179/179 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - accuracy: 0.8938 - loss: 0.4895 - val_accuracy: 0.9235 - val_loss: 0.4160
Epoch 4/10
179/179 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - accuracy: 0.9166 - loss: 0.4013 - val_accuracy: 0.9453 - val_loss: 0.3466
Epoch 5/10
179/179 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - accuracy: 0.9281 - loss: 0.3396 - val_accuracy: 0.9486 - val_loss: 0.2986
Epoch 6/10
179/179 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - accuracy: 0.9356 - loss: 0.2969 - val_accuracy: 0.9523 - val_loss: 0.2640
Epoch 7/10
179/179 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - accuracy: 0.9412 - loss: 0.2661 - val_accuracy: 0.9536 - val_loss: 0.2383
Epoch 8/10
179/179 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - accuracy: 0.9448 - loss: 0.2430 - val_accuracy: 0.9587 - val_loss: 0.2182
Epoch 9/10
179/179 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - accuracy: 0.9478 - loss: 0.2254 - val_accuracy: 0.9628 - val_loss: 0.2024
Epoch 10/10
179/179 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.9510 - loss: 0.2091 - val_accuracy: 0.9644 - val_loss: 0.1893
# Visualize the model architecture and save it to a PNG
plot_model(text_model, "text_model.png",
           show_shapes=True,
           show_layer_names=True)

Combined Model (Title and Text)

Finally, we’ll build and run the combined model:

# Combined model (title and text)
def create_combined_model():
    # Input layers
    title_input = keras.Input(shape=(1,), dtype=tf.string, name="title")
    text_input = keras.Input(shape=(1,), dtype=tf.string, name="text")
    
    # Vectorize inputs using the same vectorization layer
    title_vectorized = vectorize_layer(title_input)
    text_vectorized = vectorize_layer(text_input)
    
    # Shared embedding layer
    embedding_layer = layers.Embedding(size_vocabulary, 10, name="embedding")
    
    # Apply embedding to both inputs
    title_embedding = embedding_layer(title_vectorized)
    text_embedding = embedding_layer(text_vectorized)
    
    # Process title branch
    title_x = layers.Dropout(0.2)(title_embedding)
    title_x = layers.GlobalAveragePooling1D()(title_x)
    title_x = layers.Dropout(0.2)(title_x)
    
    # Process text branch
    text_x = layers.Dropout(0.2)(text_embedding)
    text_x = layers.GlobalAveragePooling1D()(text_x)
    text_x = layers.Dropout(0.2)(text_x)
    
    # Concatenate the two branches
    concatenated = layers.concatenate([title_x, text_x], name="concatenate")
    
    # Add a dense layer
    x = layers.Dense(32, activation="relu")(concatenated)
    x = layers.Dropout(0.2)(x)
    
    # Output layer
    output = layers.Dense(1, activation="sigmoid")(x)
    
    # Create the model
    model = keras.Model(inputs=[title_input, text_input], outputs=output, name="combined_model")
    
    # Compile the model
    model.compile(
        loss="binary_crossentropy",
        optimizer="adam",
        metrics=["accuracy"]
    )
    
    return model

Let’s create the model and display its summary:

# Create combined model
combined_model = create_combined_model()

combined_model.summary()
Model: "combined_model"
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ Layer (type)         Output Shape          Param #  Connected to      ┃
┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ title (InputLayer)  │ (None, 1)         │          0 │ -                 │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ text (InputLayer)   │ (None, 1)         │          0 │ -                 │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ text_vectorization… │ (None, 500)       │          0 │ title[0][0],      │
│ (TextVectorization) │                   │            │ text[0][0]        │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ embedding           │ (None, 500, 10)   │     20,000 │ text_vectorizati… │
│ (Embedding)         │                   │            │ text_vectorizati… │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ dropout_4 (Dropout) │ (None, 500, 10)   │          0 │ embedding[0][0]   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ dropout_6 (Dropout) │ (None, 500, 10)   │          0 │ embedding[1][0]   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ global_average_poo… │ (None, 10)        │          0 │ dropout_4[0][0]   │
│ (GlobalAveragePool… │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ global_average_poo… │ (None, 10)        │          0 │ dropout_6[0][0]   │
│ (GlobalAveragePool… │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ dropout_5 (Dropout) │ (None, 10)        │          0 │ global_average_p… │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ dropout_7 (Dropout) │ (None, 10)        │          0 │ global_average_p… │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ concatenate         │ (None, 20)        │          0 │ dropout_5[0][0],  │
│ (Concatenate)       │                   │            │ dropout_7[0][0]   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ dense_2 (Dense)     │ (None, 32)        │        672 │ concatenate[0][0] │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ dropout_8 (Dropout) │ (None, 32)        │          0 │ dense_2[0][0]     │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ dense_3 (Dense)     │ (None, 1)         │         33 │ dropout_8[0][0]   │
└─────────────────────┴───────────────────┴────────────┴───────────────────┘
 Total params: 20,705 (80.88 KB)
 Trainable params: 20,705 (80.88 KB)
 Non-trainable params: 0 (0.00 B)
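
Again the parameter count is easy to verify: the shared embedding contributes 20,000 weights, the dense layer maps the 20-dimensional concatenation to 32 units (20 × 32 + 32 = 672), and the output layer adds 32 weights plus 1 bias (33), for 20,705 in total.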

Now, we’re ready to train the model:

# Train combined model
combined_history = combined_model.fit(
    train_dataset,
    validation_data=val_dataset,
    epochs=10,
    verbose=1
)
Epoch 1/10
179/179 ━━━━━━━━━━━━━━━━━━━━ 2s 8ms/step - accuracy: 0.5307 - loss: 0.6842 - val_accuracy: 0.9292 - val_loss: 0.5420
Epoch 2/10
179/179 ━━━━━━━━━━━━━━━━━━━━ 1s 8ms/step - accuracy: 0.8621 - loss: 0.4532 - val_accuracy: 0.9222 - val_loss: 0.2334
Epoch 3/10
179/179 ━━━━━━━━━━━━━━━━━━━━ 1s 8ms/step - accuracy: 0.9280 - loss: 0.2307 - val_accuracy: 0.9132 - val_loss: 0.1904
Epoch 4/10
179/179 ━━━━━━━━━━━━━━━━━━━━ 1s 8ms/step - accuracy: 0.9359 - loss: 0.1826 - val_accuracy: 0.9668 - val_loss: 0.1332
Epoch 5/10
179/179 ━━━━━━━━━━━━━━━━━━━━ 1s 8ms/step - accuracy: 0.9479 - loss: 0.1565 - val_accuracy: 0.9692 - val_loss: 0.1210
Epoch 6/10
179/179 ━━━━━━━━━━━━━━━━━━━━ 1s 8ms/step - accuracy: 0.9591 - loss: 0.1327 - val_accuracy: 0.9716 - val_loss: 0.1081
Epoch 7/10
179/179 ━━━━━━━━━━━━━━━━━━━━ 1s 8ms/step - accuracy: 0.9612 - loss: 0.1217 - val_accuracy: 0.9725 - val_loss: 0.1061
Epoch 8/10
179/179 ━━━━━━━━━━━━━━━━━━━━ 1s 8ms/step - accuracy: 0.9635 - loss: 0.1136 - val_accuracy: 0.9725 - val_loss: 0.0965
Epoch 9/10
179/179 ━━━━━━━━━━━━━━━━━━━━ 1s 8ms/step - accuracy: 0.9666 - loss: 0.1065 - val_accuracy: 0.9736 - val_loss: 0.0920
Epoch 10/10
179/179 ━━━━━━━━━━━━━━━━━━━━ 1s 8ms/step - accuracy: 0.9703 - loss: 0.0974 - val_accuracy: 0.9734 - val_loss: 0.0917
# Visualize the model architecture and save it to a PNG
plot_model(combined_model, "combined_model.png",
           show_shapes=True,
           show_layer_names=True)

Training History Visualization

Let’s visualize the training history to compare the performance of our three models:

def plot_history(title_history, text_history, combined_history):
    plt.figure(figsize=(12, 5))
    
    # Accuracy plot
    plt.subplot(1, 2, 1)
    plt.plot(title_history.history['accuracy'], label='Title Model (Train)')
    plt.plot(title_history.history['val_accuracy'], label='Title Model (Val)')
    plt.plot(text_history.history['accuracy'], label='Text Model (Train)')
    plt.plot(text_history.history['val_accuracy'], label='Text Model (Val)')
    plt.plot(combined_history.history['accuracy'], label='Combined Model (Train)')
    plt.plot(combined_history.history['val_accuracy'], label='Combined Model (Val)')
    plt.xlabel('Epoch')
    plt.ylabel('Accuracy')
    plt.title('Model Accuracy')
    plt.legend()
    
    # Loss plot
    plt.subplot(1, 2, 2)
    plt.plot(title_history.history['loss'], label='Title Model (Train)')
    plt.plot(title_history.history['val_loss'], label='Title Model (Val)')
    plt.plot(text_history.history['loss'], label='Text Model (Train)')
    plt.plot(text_history.history['val_loss'], label='Text Model (Val)')
    plt.plot(combined_history.history['loss'], label='Combined Model (Train)')
    plt.plot(combined_history.history['val_loss'], label='Combined Model (Val)')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.title('Model Loss')
    plt.legend()
    
    plt.tight_layout()
    plt.savefig('training_history.png')
    plt.show()

# Plot the training history
plot_history(title_history, text_history, combined_history)

Based on the training history, we can observe:

  • The Combined Model achieves the highest validation accuracy, at approximately 97%.
  • The Text Model also performs well, reaching around 96% validation accuracy.
  • The Title Model lags noticeably, plateauing around 80% validation accuracy.

This indicates that while article titles provide some signal for fake news detection, the full text contains much more useful information, and using both together delivers the best performance.

Evaluation on Test Data

Let’s evaluate our best-performing model, the combined model, on the test dataset:

# Load test data
test_url = "https://raw.githubusercontent.com/pic16b-ucla/25W/refs/heads/main/datasets/fake_news_test.csv"
test = pd.read_csv(test_url)

# Create test dataset
test_dataset = make_dataset(test)

# Evaluate model on the test dataset
combined_result = combined_model.evaluate(test_dataset)

# Print the results
print("Combined Model Test Accuracy: {:.2f}%".format(combined_result[1] * 100))
225/225 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9695 - loss: 0.0996
Combined Model Test Accuracy: 96.96%
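
Accuracy alone can hide asymmetric errors, so a confusion matrix is a useful follow-up. A minimal sketch using scikit-learn (an extra dependency), thresholding the sigmoid outputs at 0.5:

from sklearn.metrics import confusion_matrix

# Collect true labels and predicted labels batch by batch (dataset order is preserved)
y_true = np.concatenate([y.numpy() for _, y in test_dataset])
y_prob = combined_model.predict(test_dataset)
y_pred = (y_prob.ravel() >= 0.5).astype(int)

print(confusion_matrix(y_true, y_pred))  # rows: true real/fake, columns: predicted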

Embedding Visualization

Now let’s visualize the word embeddings learned by our best model:

import plotly.graph_objects as go
from sklearn.decomposition import PCA

# Get the embedding layer weights
embedding_layer = combined_model.get_layer("embedding")
weights = embedding_layer.get_weights()[0]

# Get the vocabulary from the vectorization layer
vocab = vectorize_layer.get_vocabulary()

# Use PCA to reduce dimensions for visualization (3D)
pca = PCA(n_components=3)
embeddings_3d = pca.fit_transform(weights)

# Create a DataFrame for plotting
df_embeddings = pd.DataFrame({
    'word': vocab,
    'x': embeddings_3d[:, 0],
    'y': embeddings_3d[:, 1],
    'z': embeddings_3d[:, 2]
})

# Filter out padding and out-of-vocabulary terms
df_plot = df_embeddings[~df_embeddings['word'].isin(['', '[UNK]'])].copy()

# Compute distance from origin for highlighting interesting words
df_plot['distance'] = np.sqrt(df_plot['x']**2 + df_plot['y']**2 + df_plot['z']**2)

# Selected words to highlight, including political, news-related and sentiment words
highlight_words = [
    'trump', 'clinton', 'hillary', 'obama', 'democrats', 'republicans',
    'fake', 'real', 'true', 'false', 'hoax', 'conspiracy', 'news', 'media',
    'war', 'military', 'government'
]

# Create a new column to determine text visibility and point size
df_plot['show_label'] = df_plot['word'].isin(highlight_words)
df_plot['point_size'] = 3  # Default size
df_plot.loc[df_plot['show_label'], 'point_size'] = 6  # Larger for highlighted words

# Create text colors - visible only for highlighted words
text_colors = ['rgba(0,0,0,1)' if show else 'rgba(0,0,0,0)' for show in df_plot['show_label']]

# Calculate the range of each axis for better scaling
x_range = [df_plot['x'].min(), df_plot['x'].max()]
y_range = [df_plot['y'].min(), df_plot['y'].max()]
z_range = [df_plot['z'].min(), df_plot['z'].max()]

# Calculate axis scaling factors to make the plot more compact
# Specifically reduce the y-axis (component 2) range
scale_factor_y = 0.5  # Reduce y-axis scale by 50%

# Create the 3D scatter plot
fig = go.Figure(data=[
    go.Scatter3d(
        x=df_plot['x'],
        y=df_plot['y'] * scale_factor_y,  # Scale down component 2
        z=df_plot['z'],
        mode='markers+text',
        text=df_plot['word'],
        textposition="top center",
        textfont=dict(
            size=10,
            color=text_colors
        ),
        marker=dict(
            size=df_plot['point_size'],
            color=df_plot['distance'],
            colorscale='Viridis',
            opacity=0.8,
        ),
        hoverinfo='text',
        hovertext=df_plot['word']
    )
])

# Update layout with tighter ranges
fig.update_layout(
    title="3D Word Embedding Visualization for Fake News Classification",
    width=900,
    height=700,
    scene=dict(
        xaxis=dict(
            title='PCA Component 1',
            range=[x_range[0] * 1.1, x_range[1] * 1.1]  # Add small margin
        ),
        yaxis=dict(
            title='PCA Component 2',
            range=[y_range[0] * scale_factor_y * 1.1, y_range[1] * scale_factor_y * 1.1]
        ),
        zaxis=dict(
            title='PCA Component 3',
            range=[z_range[0] * 1.1, z_range[1] * 1.1]
        ),
        aspectmode='manual',  # Manual aspect ratio
        aspectratio=dict(x=1, y=0.5, z=1)  # Reduce y-axis aspect ratio
    )
)

# Show the plot
fig.show()

Analysis of Word Embeddings

Hovering over the points in the word-embedding plot, we can see some trends. Here are a few examples:

  • Political Polarization: In the embedding space, “trump” and “clinton” appear in distinct regions, while “hillary” and “obama” remain close together. This might suggest that the model has learned different contextual patterns for how these names appear in news articles. The separation likely reflects the polarized nature of political reporting, where certain linguistic patterns are associated with coverage of different political figures.
  • War: “war” sits near words that naturally come up in the same discussions and contexts, such as “targeted,” “campaign,” and “missile” (the nearest-neighbor sketch after this list shows one way to probe clusters like this).
  • Media References: The word “media” appears in a different region from specific news outlet names, suggesting that generic references to “the media” might be used differently in fake news (often as a collective entity to criticize) compared to specific outlet citations which might be more common in legitimate reporting.
  • Conspiracy-Related Terms: Words like “hoax”, “conspiracy” and related terms form their own cluster, indicating the model has learned to associate these concepts with questionable content. This semantic grouping helps the model identify articles that use language patterns typical of conspiracy theories.
  • Political Parties: Words like “democrats” and “republicans” are positioned close together, as they are likely discussed together, along with words such as “washington” and “administration.”
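
One way to probe clusters like these more quantitatively is to look up a word’s nearest neighbors by cosine similarity in the embedding space. A small sketch, reusing the weights and vocab arrays from the visualization code (and assuming the query word made it into the 2,000-term vocabulary):

def nearest_words(query, top_n=10):
    # Cosine similarity between the query word's embedding and every row of the matrix
    idx = vocab.index(query)
    sims = weights @ weights[idx] / (
        np.linalg.norm(weights, axis=1) * np.linalg.norm(weights[idx]) + 1e-10)
    return [vocab[i] for i in np.argsort(-sims)[1:top_n + 1]]

print(nearest_words("war"))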

These embedding patterns help explain why our combined model performs better than either title-only or text-only models. By learning the complex relationships between words in both titles and full text, the model captures rich semantic information about how language is used differently in fake versus real news articles.