Description
The AI Text Scraper is a browser extension for developers, vibe coders, product managers, and researchers who need to build clean text datasets for Retrieval-Augmented Generation (RAG) systems.
Tired of manually cleaning up ads, headers, and other clutter from web articles? This extension automates the entire process, allowing you to turn a list of URLs into clean, ready-to-use .txt files with just one click.
KEY FEATURES
**Save Time: Bulk & Single Page Scraping**
No more copying each page into a separate Word document, cleaning the text by hand, and saving it as a .txt file.
**Save More Time: Intelligent Content Extraction**
Powered by Mozilla's Readability.js library, the extension intelligently removes ads, banners, and navigation menus to isolate the core article content.
**Save the Most Time: AI-Powered Cleaning**
Take your data quality to the next level. Connect your own API key to use powerful language models (Google Gemini, OpenAI GPT, or Anthropic Claude) to fix paragraphing, remove duplicate sentences, and eliminate any remaining artifacts.
---
WHO IS THIS FOR?
AI Developers & Vibe Coders: Quickly build and expand knowledge bases for your RAG applications.
Data Scientists: Efficiently gather and preprocess large text corpora for analysis and model training.
Product Managers: Rapidly create a proof-of-concept or MVP for an AI feature by sourcing a clean, initial dataset without needing an engineering team.
Researchers & Students: Collect and archive articles and online sources for academic work without the noise.
---
**HOW IT WORKS**
The extension uses a two-stage process:
1. **Extraction:** It first uses Readability.js to find the main content of a webpage.
2. **AI Cleaning (Optional):** If you enable the AI feature, the extracted text is sent to your chosen AI provider, using your own API key, with a cleanup prompt that fixes paragraphing, removes duplicate sentences, and strips remaining artifacts, so the output is better suited for ingestion into a vector database.
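The two stages above can be sketched as a small pipeline. This is a minimal illustration, not the extension's actual code: the names `processPage`, `Extractor`, `Cleaner`, `naiveExtract`, and `dedupeSentences` are hypothetical, the tag-stripping extractor is only a stand-in for Readability.js, and the local duplicate-sentence filter is only a stand-in for the call to an AI provider.

```typescript
// Hypothetical types and names; none of these come from the extension itself.
type Extractor = (html: string) => string | null;
type Cleaner = (text: string) => Promise<string>;

interface PipelineOptions {
  extract: Extractor; // stage 1, e.g. a wrapper around Readability.js
  clean?: Cleaner;    // stage 2, optional: e.g. a request to the chosen AI provider
}

async function processPage(
  html: string,
  opts: PipelineOptions
): Promise<string | null> {
  // Stage 1: isolate the main text of the page.
  const extracted = opts.extract(html);
  if (extracted === null) return null; // nothing readable was found

  // Stage 2 (optional): run the cleaner only if one was configured.
  if (!opts.clean) return extracted;
  return opts.clean(extracted);
}

// Stand-in extractor for this sketch only: strips tags naively.
const naiveExtract: Extractor = (html) => {
  const text = html.replace(/<[^>]*>/g, " ").replace(/\s+/g, " ").trim();
  return text.length > 0 ? text : null;
};

// Stand-in cleaner for this sketch only: drops exact duplicate sentences locally.
const dedupeSentences: Cleaner = async (text) => {
  const sentences = text.match(/[^.!?]+[.!?]*/g) ?? [];
  const seen = new Set<string>();
  const kept: string[] = [];
  for (const raw of sentences) {
    const sentence = raw.trim();
    if (sentence.length > 0 && !seen.has(sentence)) {
      seen.add(sentence);
      kept.push(sentence);
    }
  }
  return kept.join(" ");
};

// Usage: extraction only, then extraction plus cleaning.
const sampleHtml = "<p>Hello world. Hello world. Bye.</p>";
processPage(sampleHtml, { extract: naiveExtract }).then(console.log);
// prints "Hello world. Hello world. Bye."
processPage(sampleHtml, { extract: naiveExtract, clean: dedupeSentences }).then(console.log);
// prints "Hello world. Bye."
```

Keeping the cleaner optional mirrors the description: extraction always runs, and the AI pass is applied only when it is enabled.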
Get started in seconds. Configure your settings, paste your URLs, and start building your dataset today!
Permissions (4)
downloads: Can manage and monitor downloads
scripting: Can inject scripts into web pages
storage: Can store data locally in your browser
tabs: Can see your open tabs and their URLs
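For readers auditing the extension, the four permissions above would correspond to a manifest entry along these lines. This is a sketch written as a TypeScript object for illustration (a real extension declares it in `manifest.json`). Only the four permission names come from the listing; the constant name `manifestFragment` is hypothetical and `manifest_version: 3` is an assumption, based on the `scripting` permission being a Manifest V3 API.

```typescript
// Sketch of the manifest fields implied by the listing.
// Only the permission names are taken from the listing; the rest is assumed.
const manifestFragment = {
  manifest_version: 3, // assumption: the "scripting" permission belongs to Manifest V3
  permissions: ["downloads", "scripting", "storage", "tabs"],
} as const;

console.log(manifestFragment.permissions.join(", "));
```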
Details
| Field | Value |
| --- | --- |
| Version | 1.0 |
| Updated | Oct 30, 2025 |
| Size | 316KiB |
| First Seen | Mar 29, 2026 |
| Developer | oscarcraven |