How to Categorize Social Media Posts Locally with GLiClass
Run zero-shot text classification on your local machine to categorize social media posts by topic, sentiment, and intent without sending data to external APIs.
Overview
This cookbook demonstrates using the knowledgator/gliclass-edge-v3.0 model for local, privacy-preserving classification of social media content. The edge model is optimized for fast inference on consumer hardware while maintaining high accuracy.
What You'll Learn
- Set up GLiClass for local inference
- Define category taxonomies for social media content
- Classify posts by topic, sentiment, and engagement potential
- Handle multi-platform content (Twitter/X, Instagram, LinkedIn, TikTok)
- Optimize performance for batch processing
- Build a complete local classification pipeline
Prerequisites
- Python 3.10+
- 4GB+ RAM (8GB recommended)
- GPU optional but recommended for batch processing
- Sample social media posts for testing
Use Cases
- Content moderation and filtering
- Social media analytics dashboards
- Trend detection and monitoring
- Influencer content analysis
- Brand mention categorization
- Competitor content tracking
Why Run Locally?
Running classification locally offers several advantages:
| Benefit | Description |
|---|---|
| Privacy | Sensitive social data never leaves your infrastructure |
| Cost | No per-request API fees for high-volume processing |
| Latency | Sub-100ms inference without network round-trips |
| Offline | Works without internet connectivity |
| Control | Full control over model versions and updates |
The GLiClass Edge Model
The knowledgator/gliclass-edge-v3.0 model is a compact zero-shot classifier optimized for edge deployment:
| Specification | Value |
|---|---|
| Model size | ~200MB |
| Inference speed | ~50ms per text (CPU) |
| Max sequence length | 512 tokens |
| Zero-shot | Yes (no training required) |
| Multilingual | English primary, partial multilingual support |
Installation
Install Dependencies
pip install gliclass torch transformers
For GPU Acceleration (Optional)
# CUDA 11.8
pip install torch --index-url https://download.pytorch.org/whl/cu118
# CUDA 12.1
pip install torch --index-url https://download.pytorch.org/whl/cu121
Verify Installation
from gliclass import GLiClassModel, ZeroShotClassificationPipeline
from transformers import AutoTokenizer
# Load model and tokenizer (downloads on first run)
model = GLiClassModel.from_pretrained("knowledgator/gliclass-edge-v3.0")
tokenizer = AutoTokenizer.from_pretrained("knowledgator/gliclass-edge-v3.0")
# Create classification pipeline
pipeline = ZeroShotClassificationPipeline(
model, tokenizer,
classification_type='multi-label',
device='cpu'
)
print("Model loaded successfully!")
# Quick test
text = "Just shipped a new feature! So excited to share it with everyone."
labels = ["announcement", "question", "complaint", "casual conversation"]
results = pipeline(text, labels, threshold=0.5)[0]
for result in results:
print(f"{result['label']} => {result['score']:.3f}")
Step 1: Define Social Media Categories
Topic Categories
TOPIC_LABELS = [
"technology and software",
"business and entrepreneurship",
"health and fitness",
"food and cooking",
"travel and adventure",
"fashion and beauty",
"entertainment and pop culture",
"sports",
"politics and news",
"education and learning",
"art and creativity",
"personal life update",
"humor and memes",
"motivational and inspirational"
]
Content Type Categories
CONTENT_TYPE_LABELS = [
"product announcement or launch",
"promotional content or advertisement",
"educational tutorial or how-to",
"opinion or commentary",
"question asking for advice",
"personal story or experience",
"news or current events",
"behind-the-scenes content",
"user-generated testimonial",
"engagement bait or poll",
"meme or humorous content",
"inspirational quote or message"
]
Sentiment Categories
SENTIMENT_LABELS = [
"very positive and enthusiastic",
"positive and satisfied",
"neutral or informational",
"negative or disappointed",
"angry or frustrated",
"sarcastic or ironic"
]
Engagement Intent Categories
ENGAGEMENT_LABELS = [
"seeking likes and shares",
"starting a discussion",
"asking for help or advice",
"sharing information",
"promoting a product or service",
"building personal brand",
"networking and connecting",
"entertainment only"
]
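If you plan to run every dimension over each post (as the pipeline in Step 5 does), it helps to collect the label sets in one mapping so code can iterate over them. A minimal sketch; the TAXONOMIES name is a convention of this cookbook, not part of GLiClass:

# Group the label sets so downstream code can loop over every dimension at once
TAXONOMIES = {
    "topic": TOPIC_LABELS,
    "content_type": CONTENT_TYPE_LABELS,
    "sentiment": SENTIMENT_LABELS,
    "engagement_intent": ENGAGEMENT_LABELS,
}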
Step 2: Basic Classification
from gliclass import GLiClassModel, ZeroShotClassificationPipeline
from transformers import AutoTokenizer
# Initialize model and pipeline
model = GLiClassModel.from_pretrained("knowledgator/gliclass-edge-v3.0")
tokenizer = AutoTokenizer.from_pretrained("knowledgator/gliclass-edge-v3.0")
pipeline = ZeroShotClassificationPipeline(
model, tokenizer,
classification_type='multi-label',
device='cpu'
)
def classify_post(text: str, labels: list[str], threshold: float = 0.3) -> dict:
"""
Classify a social media post against given labels.
Args:
text: The post content
labels: List of category labels
threshold: Minimum confidence score
Returns:
Dictionary of label -> score for labels above threshold
"""
results = pipeline(text, labels, threshold=threshold)[0]
return {r['label']: r['score'] for r in results}
# Example post
post = """
🚀 Finally launched my new SaaS product after 6 months of building!
It helps small businesses automate their invoicing.
Would love your feedback - link in bio!
#startup #entrepreneurship #buildinpublic
"""
# Classify by topic
topic = classify_post(post, TOPIC_LABELS)
print("Topic:", topic)
# Output: {'business and entrepreneurship': 0.92, 'technology and software': 0.71}
# Classify by content type
content_type = classify_post(post, CONTENT_TYPE_LABELS)
print("Content Type:", content_type)
# Output: {'product announcement or launch': 0.95, 'promotional content or advertisement': 0.68}
# Classify sentiment
sentiment = classify_post(post, SENTIMENT_LABELS)
print("Sentiment:", sentiment)
# Output: {'very positive and enthusiastic': 0.89}
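The threshold argument decides how many labels survive, and the right value depends on your labels and content. If you are unsure, a quick sweep over a sample post shows how the cutoff changes the result; a small sketch reusing classify_post from above:

# Sweep thresholds on one post to see how many labels survive at each cutoff
for t in (0.2, 0.3, 0.4, 0.5, 0.6):
    matched = classify_post(post, TOPIC_LABELS, threshold=t)
    print(f"threshold={t}: {len(matched)} label(s) -> {sorted(matched)}")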
Step 3: Multi-Label Classification
Social media posts often belong to multiple categories:
def classify_multi_label(
text: str,
labels: list[str],
threshold: float = 0.4,
max_labels: int = 3
) -> list[tuple[str, float]]:
"""
Return multiple matching labels sorted by confidence.
"""
results = pipeline(text, labels, threshold=threshold)[0]
# Sort by score descending
sorted_results = sorted(
results,
key=lambda x: x['score'],
reverse=True
)
return [(r['label'], r['score']) for r in sorted_results[:max_labels]]
# Example: Post with multiple topics
post = """
Made this amazing avocado toast while watching the game.
Perfect Sunday vibes! Recipe in comments 🥑⚽
"""
labels = classify_multi_label(post, TOPIC_LABELS)
for label, score in labels:
print(f" {label}: {score:.2f}")
# Output:
# food and cooking: 0.87
# sports: 0.62
# personal life update: 0.58
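Multi-label mode fits topics, where several labels can apply at once, but for mutually exclusive dimensions such as sentiment you may want exactly one winner. GLiClass pipelines also accept classification_type='single-label'; a sketch, assuming your installed version supports that mode:

# A second pipeline in single-label mode: one dominant label expected per text
single_pipeline = ZeroShotClassificationPipeline(
    model, tokenizer,
    classification_type='single-label',
    device='cpu'
)
results = single_pipeline(post, SENTIMENT_LABELS, threshold=0.0)[0]
best = max(results, key=lambda r: r['score'])
print(f"Sentiment: {best['label']} ({best['score']:.3f})")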
Step 4: Platform-Specific Classification
Different platforms have different content styles:
from dataclasses import dataclass, field
from enum import Enum
class Platform(Enum):
TWITTER = "twitter"
INSTAGRAM = "instagram"
LINKEDIN = "linkedin"
TIKTOK = "tiktok"
FACEBOOK = "facebook"
# Platform-specific labels
PLATFORM_LABELS = {
Platform.TWITTER: [
"hot take or opinion",
"thread or long-form content",
"news commentary",
"viral moment reaction",
"community engagement",
"self-promotion",
"humor or shitpost"
],
Platform.INSTAGRAM: [
"lifestyle showcase",
"product feature",
"behind-the-scenes",
"aesthetic or mood post",
"story highlight",
"collaboration or partnership",
"user-generated content repost"
],
Platform.LINKEDIN: [
"career update or announcement",
"thought leadership",
"industry insight",
"job opportunity",
"company news",
"professional achievement",
"networking request",
"motivational content"
],
Platform.TIKTOK: [
"trend participation",
"tutorial or how-to",
"comedy skit",
"storytime",
"product review",
"dance or music content",
"day in my life",
"duet or stitch response"
]
}
@dataclass
class SocialPost:
"""Represents a social media post with metadata."""
text: str
platform: Platform
author: str = ""
    hashtags: list[str] = field(default_factory=list)
    mentions: list[str] = field(default_factory=list)
def classify_by_platform(post: SocialPost, threshold: float = 0.4) -> dict:
"""Classify post using platform-specific categories."""
platform_labels = PLATFORM_LABELS.get(post.platform, TOPIC_LABELS)
results = pipeline(post.text, platform_labels, threshold=threshold)[0]
categories = {r['label']: r['score'] for r in results}
return {
"platform": post.platform.value,
"categories": categories,
"primary_category": max(categories.items(), key=lambda x: x[1])[0] if categories else "uncategorized"
}
# Example usage
linkedin_post = SocialPost(
text="""
Excited to announce I've joined Acme Corp as Senior Engineer!
After 5 years at my previous role, I'm ready for this new challenge.
Grateful for everyone who supported me on this journey.
#newjob #career #grateful
""",
platform=Platform.LINKEDIN,
author="jane_doe",
hashtags=["newjob", "career", "grateful"]
)
result = classify_by_platform(linkedin_post)
print(f"Platform: {result['platform']}")
print(f"Primary: {result['primary_category']}")
print(f"All categories: {result['categories']}")
Step 5: Complete Classification Pipeline
from gliclass import GLiClassModel, ZeroShotClassificationPipeline
from transformers import AutoTokenizer
from dataclasses import dataclass, field
from datetime import datetime, timezone
import json
@dataclass
class ClassificationResult:
"""Complete classification output for a social media post."""
post_id: str
text: str
platform: str
topic: dict
content_type: dict
sentiment: dict
engagement_intent: dict
primary_topic: str
primary_sentiment: str
is_promotional: bool
    classified_at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())
def to_dict(self) -> dict:
return {
"post_id": self.post_id,
"text": self.text[:100] + "..." if len(self.text) > 100 else self.text,
"platform": self.platform,
"primary_topic": self.primary_topic,
"primary_sentiment": self.primary_sentiment,
"is_promotional": self.is_promotional,
"topic_scores": self.topic,
"sentiment_scores": self.sentiment,
"content_type_scores": self.content_type,
"engagement_intent_scores": self.engagement_intent,
"classified_at": self.classified_at
}
class SocialMediaClassifier:
"""
Complete local classification pipeline for social media posts.
"""
def __init__(
self,
model_name: str = "knowledgator/gliclass-edge-v3.0",
device: str = "cpu",
threshold: float = 0.4,
classification_type: str = "multi-label"
):
self.model = GLiClassModel.from_pretrained(model_name)
self.tokenizer = AutoTokenizer.from_pretrained(model_name)
self.threshold = threshold
self.pipeline = ZeroShotClassificationPipeline(
self.model,
self.tokenizer,
classification_type=classification_type,
device=device
)
def _classify(self, text: str, labels: list[str]) -> dict:
"""Internal classification method."""
results = self.pipeline(text, labels, threshold=self.threshold)[0]
return {r['label']: r['score'] for r in results}
def classify(
self,
post_id: str,
text: str,
platform: str = "unknown"
) -> ClassificationResult:
"""Perform complete classification on a single post."""
# Topic classification
topic_result = self._classify(text, TOPIC_LABELS)
# Content type classification
content_type_result = self._classify(text, CONTENT_TYPE_LABELS)
# Sentiment classification
sentiment_result = self._classify(text, SENTIMENT_LABELS)
# Engagement intent classification
engagement_result = self._classify(text, ENGAGEMENT_LABELS)
# Determine primary classifications
primary_topic = max(topic_result.items(), key=lambda x: x[1])[0] if topic_result else "uncategorized"
primary_sentiment = max(sentiment_result.items(), key=lambda x: x[1])[0] if sentiment_result else "neutral"
# Check if promotional
promotional_indicators = ["promotional content", "promoting a product", "product announcement"]
is_promotional = any(
any(ind in label.lower() for ind in promotional_indicators)
for label in list(content_type_result.keys()) + list(engagement_result.keys())
)
return ClassificationResult(
post_id=post_id,
text=text,
platform=platform,
topic=topic_result,
content_type=content_type_result,
sentiment=sentiment_result,
engagement_intent=engagement_result,
primary_topic=primary_topic,
primary_sentiment=primary_sentiment,
is_promotional=is_promotional
)
def classify_batch(
self,
posts: list[dict],
text_field: str = "text",
id_field: str = "id",
platform_field: str = "platform"
) -> list[ClassificationResult]:
"""Classify multiple posts."""
results = []
for post in posts:
result = self.classify(
post_id=str(post.get(id_field, "")),
text=post[text_field],
platform=post.get(platform_field, "unknown")
)
results.append(result)
return results
def get_summary(self, results: list[ClassificationResult]) -> dict:
"""Generate summary statistics from classification results."""
topic_counts = {}
sentiment_counts = {}
promotional_count = 0
for result in results:
# Count topics
topic_counts[result.primary_topic] = topic_counts.get(result.primary_topic, 0) + 1
# Count sentiments
sentiment_counts[result.primary_sentiment] = sentiment_counts.get(result.primary_sentiment, 0) + 1
# Count promotional
if result.is_promotional:
promotional_count += 1
return {
"total_posts": len(results),
"topic_distribution": topic_counts,
"sentiment_distribution": sentiment_counts,
"promotional_percentage": round(promotional_count / len(results) * 100, 1) if results else 0
}
Step 6: Usage Examples
Basic Usage
# Initialize classifier
classifier = SocialMediaClassifier(device="cpu")
# Single post classification
post_text = """
Just discovered this amazing coffee shop in downtown!
The latte art is incredible and the vibes are immaculate ☕✨
Definitely my new favorite spot. Who else loves finding hidden gems?
"""
result = classifier.classify(
post_id="post_001",
text=post_text,
platform="instagram"
)
print(f"Topic: {result.primary_topic}")
print(f"Sentiment: {result.primary_sentiment}")
print(f"Is Promotional: {result.is_promotional}")
print(f"All topics: {result.topic}")
Batch Processing
# Sample posts from different platforms
posts = [
{
"id": "tw_001",
"text": "Hot take: tabs are better than spaces. Fight me.",
"platform": "twitter"
},
{
"id": "ig_001",
"text": "Morning routine 🌅 5am wake up, meditation, cold shower, gym. Discipline = freedom",
"platform": "instagram"
},
{
"id": "li_001",
"text": "Excited to share that our team just closed a $10M Series A! Thank you to all our investors and supporters.",
"platform": "linkedin"
},
{
"id": "tw_002",
"text": "Anyone else's code working perfectly locally but failing in production? Just me? 🙃",
"platform": "twitter"
},
{
"id": "ig_002",
"text": "Ad: Use code SUMMER20 for 20% off my new ebook! Link in bio 📚",
"platform": "instagram"
}
]
# Classify all posts
results = classifier.classify_batch(posts)
# Print results
for result in results:
print(f"\n{result.post_id} ({result.platform}):")
print(f" Topic: {result.primary_topic}")
print(f" Sentiment: {result.primary_sentiment}")
print(f" Promotional: {'Yes' if result.is_promotional else 'No'}")
# Get summary statistics
summary = classifier.get_summary(results)
print(f"\n--- Summary ---")
print(f"Total posts: {summary['total_posts']}")
print(f"Promotional: {summary['promotional_percentage']}%")
print(f"Topics: {summary['topic_distribution']}")
Step 7: Performance Optimization
GPU Acceleration
import torch
# Check GPU availability
if torch.cuda.is_available():
print(f"GPU available: {torch.cuda.get_device_name(0)}")
classifier = SocialMediaClassifier(device="cuda:0")
else:
print("Running on CPU")
classifier = SocialMediaClassifier(device="cpu")
Batch Processing with Progress
from tqdm import tqdm
def classify_with_progress(
classifier: SocialMediaClassifier,
posts: list[dict],
batch_size: int = 100
) -> list[ClassificationResult]:
"""Classify posts with progress bar."""
results = []
for i in tqdm(range(0, len(posts), batch_size), desc="Classifying"):
batch = posts[i:i + batch_size]
batch_results = classifier.classify_batch(batch)
results.extend(batch_results)
return results
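Note that classify_batch still calls the pipeline once per post, so batch_size above only chunks the progress bar. The pipeline itself appears to accept a list of texts in one call (the [0] indexing in earlier snippets reflects its list-shaped output); if your installed version supports that, a sketch of true batched inference:

def classify_texts_batched(texts: list[str], labels: list[str], threshold: float = 0.4) -> list[dict]:
    """One pipeline call for many texts; returns one label -> score dict per text."""
    all_results = classifier.pipeline(texts, labels, threshold=threshold)
    return [{r['label']: r['score'] for r in per_text} for per_text in all_results]

# Example: topic scores for the Step 6 sample posts in a single call
topic_dicts = classify_texts_batched([p["text"] for p in posts], TOPIC_LABELS)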
Memory-Efficient Processing
import json

def classify_stream(
classifier: SocialMediaClassifier,
post_iterator,
output_file: str
):
"""
Process posts as a stream, writing results immediately.
Memory-efficient for very large datasets.
"""
with open(output_file, 'w') as f:
f.write('[\n')
first = True
for post in post_iterator:
result = classifier.classify(
post_id=post['id'],
text=post['text'],
platform=post.get('platform', 'unknown')
)
if not first:
f.write(',\n')
first = False
json.dump(result.to_dict(), f, indent=2)
f.write('\n]')
# Usage with file-based iteration
def read_posts_from_file(filepath: str):
"""Generator to read posts line by line."""
with open(filepath, 'r') as f:
for line in f:
yield json.loads(line)
# Process large dataset
# classify_stream(classifier, read_posts_from_file("posts.jsonl"), "results.json")
Caching for Repeated Classifications
import hashlib
class CachedClassifier(SocialMediaClassifier):
"""Classifier with result caching for repeated texts."""
def __init__(self, cache_size: int = 10000, **kwargs):
super().__init__(**kwargs)
self.cache_size = cache_size
self._cache = {}
def _get_cache_key(self, text: str, labels: tuple) -> str:
"""Generate cache key from text and labels."""
content = f"{text}:{':'.join(sorted(labels))}"
return hashlib.md5(content.encode()).hexdigest()
    def _classify(
        self,
        text: str,
        labels: list[str]
    ) -> dict:
        """Classify with caching; overrides the base method so classify() uses the cache."""
        cache_key = self._get_cache_key(text, tuple(labels))
        if cache_key in self._cache:
            return self._cache[cache_key]
        result = super()._classify(text, labels)
        # Maintain cache size: evict the oldest entry (dicts preserve insertion order)
        if len(self._cache) >= self.cache_size:
            self._cache.pop(next(iter(self._cache)))
        self._cache[cache_key] = result
        return result
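Because classify() routes through _classify internally, the override above means repeated texts are served from the cache automatically. A quick check:

cached = CachedClassifier(cache_size=1000, device="cpu")
text = "Flash sale! 50% off everything this weekend only!"
cached.classify(post_id="a", text=text)  # computed: one cache entry per label set
cached.classify(post_id="b", text=text)  # identical text: all four lookups hit the cache
print(f"Cache entries: {len(cached._cache)}")  # 4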
Step 8: Content Moderation Example
MODERATION_LABELS = [
"spam or scam content",
"hate speech or discrimination",
"harassment or bullying",
"misinformation or fake news",
"adult or explicit content",
"violence or threats",
"self-harm or dangerous content",
"safe and appropriate content"
]
def moderate_content(
classifier: SocialMediaClassifier,
text: str,
threshold: float = 0.5
) -> dict:
"""
Check post for policy violations.
"""
results = classifier.pipeline(text, MODERATION_LABELS, threshold=threshold)[0]
result_dict = {r['label']: r['score'] for r in results}
# Determine if content is safe
safe_label = "safe and appropriate content"
is_safe = safe_label in result_dict and result_dict[safe_label] > 0.6
# Get violations (excluding safe label)
violations = {
k: v for k, v in result_dict.items()
if k != safe_label and v > threshold
}
return {
"is_safe": is_safe and not violations,
"violations": violations,
"requires_review": bool(violations),
"confidence": result_dict.get(safe_label, 0.0)
}
# Example
classifier = SocialMediaClassifier(device="cpu")
suspicious_post = "Make $10,000 from home! DM me for details! 💰🔥"
moderation_result = moderate_content(classifier, suspicious_post)
print(f"Safe: {moderation_result['is_safe']}")
print(f"Violations: {moderation_result['violations']}")
# Output:
# Safe: False
# Violations: {'spam or scam content': 0.89}
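In production you will usually route posts by severity rather than just flag them. A minimal sketch of one possible policy; the tiers and cutoffs below are illustrative choices, not recommendations:

def route_post(moderation: dict) -> str:
    """Map a moderate_content() result to an action tier."""
    if moderation["is_safe"]:
        return "publish"
    # Any high-confidence violation triggers an automatic block
    if any(score >= 0.8 for score in moderation["violations"].values()):
        return "block"
    return "human_review"

print(route_post(moderation_result))  # e.g. "block" for the spam example above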
Step 9: Export and Integration
Export to CSV
import csv
def export_to_csv(results: list[ClassificationResult], filepath: str):
"""Export classification results to CSV."""
with open(filepath, 'w', newline='') as f:
writer = csv.writer(f)
writer.writerow([
'post_id', 'platform', 'primary_topic', 'primary_sentiment',
'is_promotional', 'text_preview', 'classified_at'
])
for result in results:
writer.writerow([
result.post_id,
result.platform,
result.primary_topic,
result.primary_sentiment,
result.is_promotional,
result.text[:50] + "..." if len(result.text) > 50 else result.text,
result.classified_at
])
Export to JSON Lines
def export_to_jsonl(results: list[ClassificationResult], filepath: str):
"""Export results to JSON Lines format."""
with open(filepath, 'w') as f:
for result in results:
f.write(json.dumps(result.to_dict()) + '\n')
Webhook Integration
import requests
def send_to_webhook(
result: ClassificationResult,
webhook_url: str,
filter_promotional: bool = False
):
"""Send classification result to webhook."""
if filter_promotional and not result.is_promotional:
return None
payload = {
"event": "post_classified",
"data": result.to_dict()
}
    response = requests.post(webhook_url, json=payload, timeout=10)
return response.status_code == 200
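Webhook endpoints fail transiently, so a retry with backoff around send_to_webhook is worth having. A sketch; the retry count and delays are arbitrary choices:

import time

def send_with_retry(result: ClassificationResult, webhook_url: str, retries: int = 3) -> bool:
    """Retry the webhook with exponential backoff on failure."""
    for attempt in range(retries):
        try:
            if send_to_webhook(result, webhook_url):
                return True
        except requests.RequestException:
            pass  # network error: fall through to backoff and retry
        time.sleep(2 ** attempt)  # 1s, 2s, 4s between attempts
    return False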
Best Practices
- Choose appropriate thresholds: Start with 0.4 for multi-label scenarios; increase to 0.6+ when you need single-label precision.
- Use descriptive labels: "product announcement or launch" works better than just "announcement".
- Preprocess text: Remove URLs, excessive emojis, and hashtags if they add noise:

import re

def clean_post(text: str) -> str:
    text = re.sub(r'http\S+', '', text)  # Remove URLs
    text = re.sub(r'#\w+', '', text)  # Remove hashtags
    return text.strip()

- Batch for throughput: Process posts in batches of 50-100 for optimal GPU utilization.
- Cache repeated content: Social media often contains duplicate or near-duplicate posts.
- Monitor model drift: Periodically validate classifications against human labels.
- Handle edge cases: Very short posts (fewer than 10 words) may have lower accuracy; consider flagging them for review.
Troubleshooting
| Issue | Solution |
|---|---|
| Out of memory | Reduce batch size, use CPU, or truncate long inputs |
| Slow inference | Use GPU, reduce labels per request, enable caching |
| Low accuracy | Use more descriptive labels, lower threshold, preprocess text |
| Model download fails | Check internet connection, set HF_HOME for custom cache location |
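For the cache-location fix in the last row: HF_HOME must be set before any library reads it, either in the shell or at the top of your script. For example (the path below is a placeholder):

import os

# Set before importing gliclass/transformers so the cache location is picked up
os.environ["HF_HOME"] = "/data/hf-cache"

from gliclass import GLiClassModel  # model files now cache under /data/hf-cache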
Next Steps
- PII Detection and Redaction — Remove personal data before analysis
- Customer Intent Classification — Apply similar techniques to support tickets
- Financial Spam Detection — Train custom classifiers with GLiClass