pdf Skill for AI Agents
by Anthropic
Comprehensive PDF manipulation toolkit for extracting text and tables, creating new PDFs, merging/splitting documents, and handling forms. Use when processing, generating, or analyzing PDF documents.
55 downloads
About this Skill
The pdf skill gives AI agents such as Claude and ChatGPT a comprehensive PDF manipulation toolkit for extracting text and tables, creating new PDFs, merging and splitting documents, and handling forms. Use it when processing, generating, or analyzing PDF documents.
SKILLS.md Content
# PDF Processing
Essential PDF processing operations using Python libraries and command-line tools.
## Python Libraries
### pypdf
Basic reading and writing operations:
```python
from pypdf import PdfReader, PdfWriter

# Read PDF: extract the text of every page
reader = PdfReader("input.pdf")
for page in reader.pages:
    text = page.extract_text()

# Write/merge PDFs: copy every page into a new document
writer = PdfWriter()
for page in reader.pages:
    writer.add_page(page)
writer.write("output.pdf")
```
### pdfplumber
Advanced text and table extraction:
```python
import pdfplumber

with pdfplumber.open("document.pdf") as pdf:
    for page in pdf.pages:
        text = page.extract_text()      # page text as a single string
        tables = page.extract_tables()  # one list of rows per detected table
```
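`extract_tables()` returns each table as a list of rows, and each row as a list of cell strings (or `None` for empty cells). As a minimal sketch, assuming the first row of a table is its header, a hypothetical helper `table_to_records` (not part of pdfplumber) could turn one table into a list of dictionaries:

```python
def table_to_records(table):
    """Convert one table (a list of rows) into dicts keyed by the header row.

    Assumption: the first row holds the column names. Empty cells (None)
    become empty strings, and surrounding whitespace is stripped.
    """
    if not table:
        return []
    header = [(cell or "").strip() for cell in table[0]]
    records = []
    for row in table[1:]:
        cells = [(cell or "").strip() for cell in row]
        records.append(dict(zip(header, cells)))
    return records
```

With the loop above, `[table_to_records(t) for t in page.extract_tables()]` would give one list of records per table on the page. Tables without a header row need different handling.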
### reportlab
Create new PDFs from scratch:
```python
from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas

c = canvas.Canvas("new.pdf", pagesize=letter)
c.drawString(100, 750, "Hello World")  # x, y in points from the bottom-left corner
c.save()
```
## Command-Line Tools
### pdftotext
Extract text from PDFs:
```bash
pdftotext input.pdf output.txt
```
### qpdf
Manipulate PDF structure:
```bash
qpdf --split-pages input.pdf output-%d.pdf
```
### pdftk
Merge, split, rotate:
```bash
pdftk file1.pdf file2.pdf cat output merged.pdf
```
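When these tools are driven from Python rather than a shell, one possible approach is to build each command as an argument list and run it with `subprocess`. The sketch below mirrors the three commands shown above. The function names are hypothetical, and the tools are assumed to be installed and on `PATH`.

```python
import shutil
import subprocess


def pdftotext_cmd(src, dst):
    """Argument list for: pdftotext input.pdf output.txt"""
    return ["pdftotext", src, dst]


def qpdf_split_cmd(src, pattern):
    """Argument list for: qpdf --split-pages input.pdf output-%d.pdf"""
    return ["qpdf", "--split-pages", src, pattern]


def pdftk_merge_cmd(inputs, dst):
    """Argument list for: pdftk file1.pdf file2.pdf cat output merged.pdf"""
    return ["pdftk", *inputs, "cat", "output", dst]


def run_tool(cmd):
    """Run a command list, failing early if the tool is missing."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} is not installed or not on PATH")
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

For example, `run_tool(pdftk_merge_cmd(["file1.pdf", "file2.pdf"], "merged.pdf"))`. Passing a list instead of a shell string avoids quoting problems with file names that contain spaces.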
## Common Tasks
### OCR for Scanned Documents
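Scanned PDFs have no text layer, so plain text extraction returns little or nothing. Render each page to an image with pdf2image, then run pytesseract on the images.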
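A minimal sketch of that OCR pipeline, assuming pdf2image (which needs the poppler utilities) and pytesseract (which needs the Tesseract binary) are installed. The function names, the 300 dpi default, and the page marker format are illustrative choices, not part of either library.

```python
def ocr_pdf(pdf_path, dpi=300, lang="eng"):
    """Return one OCR'd text string per page of a scanned PDF."""
    # Imported here so the module still loads where the OCR stack is absent.
    from pdf2image import convert_from_path
    import pytesseract

    images = convert_from_path(pdf_path, dpi=dpi)  # one PIL image per page
    return [pytesseract.image_to_string(image, lang=lang) for image in images]


def join_pages(page_texts):
    """Join per-page text, marking where each page starts."""
    return "\n".join(
        f"--- page {number} ---\n{text.strip()}"
        for number, text in enumerate(page_texts, start=1)
    )
```

A typical call would be `print(join_pages(ocr_pdf("scanned.pdf")))`. Higher `dpi` values generally improve recognition at the cost of speed and memory.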
### Watermarking
Overlay text or images on existing PDFs.
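One way to do this with the libraries already listed is to draw the watermark on a one-page stamp PDF with reportlab, then merge that stamp onto every page with pypdf. The sketch below assumes unrotated pages whose media box starts at the origin. The function names, font, gray level, and 45 degree angle are illustrative.

```python
import io


def page_center(width, height):
    """Center point of a page, in points."""
    return (width / 2, height / 2)


def make_stamp(text, width, height):
    """Build an in-memory one-page PDF with diagonal, semi-transparent text."""
    from reportlab.pdfgen import canvas  # imported lazily; optional dependency

    buffer = io.BytesIO()
    c = canvas.Canvas(buffer, pagesize=(width, height))
    c.setFont("Helvetica", 48)
    c.setFillGray(0.5, 0.3)               # mid gray at 30% opacity
    c.translate(*page_center(width, height))
    c.rotate(45)
    c.drawCentredString(0, 0, text)
    c.save()
    buffer.seek(0)
    return buffer


def watermark(src, dst, text):
    """Stamp `text` on every page of `src` and write the result to `dst`."""
    from pypdf import PdfReader, PdfWriter  # imported lazily; optional dependency

    reader = PdfReader(src)
    writer = PdfWriter()
    for page in reader.pages:
        width = float(page.mediabox.width)
        height = float(page.mediabox.height)
        stamp = PdfReader(make_stamp(text, width, height)).pages[0]
        page.merge_page(stamp)            # draws the stamp over the page content
        writer.add_page(page)
    writer.write(dst)
```

For example, `watermark("input.pdf", "watermarked.pdf", "DRAFT")`. An image watermark could follow the same pattern, drawing the image on the stamp page instead of text.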
### Image Extraction
Extract embedded images from PDF pages.
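A minimal sketch using pypdf, assuming a recent version in which each page exposes an `images` collection whose items carry a `name` and raw `data` (some image formats also require Pillow). The function names and the file naming scheme are hypothetical.

```python
from pathlib import Path


def image_filename(page_number, name):
    """File name that records which page an image came from."""
    return f"page{page_number:03d}-{name}"


def extract_images(pdf_path, out_dir):
    """Save every embedded image to `out_dir` and return the written paths."""
    from pypdf import PdfReader  # imported lazily; optional dependency

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    reader = PdfReader(pdf_path)
    for page_number, page in enumerate(reader.pages, start=1):
        for image in page.images:
            target = out / image_filename(page_number, image.name)
            target.write_bytes(image.data)
            saved.append(target)
    return saved
```

For example, `extract_images("input.pdf", "images")` would write files such as `images/page001-Im0.png`, depending on how the images are named inside the PDF.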
### Password Protection
Add or remove PDF encryption.
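Both directions can be sketched with pypdf: `PdfWriter.encrypt` adds a password, and `PdfReader.decrypt` unlocks a file so its pages can be copied to an unencrypted output. The function names are hypothetical, and the sketch assumes the caller knows the password.

```python
def encrypt_pdf(src, dst, password):
    """Copy `src` to `dst`, protected with `password`."""
    from pypdf import PdfReader, PdfWriter  # imported lazily; optional dependency

    reader = PdfReader(src)
    writer = PdfWriter()
    for page in reader.pages:
        writer.add_page(page)
    writer.encrypt(password)  # uses pypdf's default encryption algorithm
    writer.write(dst)


def decrypt_pdf(src, dst, password):
    """Copy `src` to `dst` without encryption, given the correct password."""
    from pypdf import PdfReader, PdfWriter  # imported lazily; optional dependency

    reader = PdfReader(src)
    if reader.is_encrypted and not reader.decrypt(password):
        raise ValueError("Incorrect password")
    writer = PdfWriter()
    for page in reader.pages:
        writer.add_page(page)
    writer.write(dst)
```

Recent pypdf versions also appear to accept an `algorithm` argument to `encrypt` (for example `"AES-256"`, which needs an extra cryptography dependency); check the installed version's documentation before relying on it.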
## Quick Reference
| Task | Tool |
|------|------|
| Extract text | pdfplumber, pdftotext |
| Extract tables | pdfplumber |
| Merge PDFs | pypdf, pdftk |
| Split PDFs | qpdf, pdftk |
| Create PDFs | reportlab |
| Rotate pages | pypdf, pdftk |
| Fill forms | pypdf |
| OCR | pytesseract (with pdf2image) |
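The "Rotate pages" and "Fill forms" rows have no example above, so here is a hedged sketch of both with pypdf. It assumes a recent pypdf release (for `PdfWriter.append` and the `auto_regenerate` argument). The function names are hypothetical, and the keys of `values` must match the field names in the actual form.

```python
def normalize_rotation(degrees):
    """Validate a rotation and reduce it to 0, 90, 180, or 270."""
    if degrees % 90 != 0:
        raise ValueError("Rotation must be a multiple of 90 degrees")
    return degrees % 360


def rotate_pages(src, dst, degrees=90):
    """Rotate every page of `src` clockwise and write the result to `dst`."""
    from pypdf import PdfReader, PdfWriter  # imported lazily; optional dependency

    angle = normalize_rotation(degrees)
    reader = PdfReader(src)
    writer = PdfWriter()
    for page in reader.pages:
        page.rotate(angle)
        writer.add_page(page)
    writer.write(dst)


def fill_form(src, dst, values):
    """Fill form fields in `src` from a {field_name: value} dict."""
    from pypdf import PdfReader, PdfWriter  # imported lazily; optional dependency

    reader = PdfReader(src)
    writer = PdfWriter()
    writer.append(reader)  # copies the pages together with the form definition
    for page in writer.pages:
        if "/Annots" in page:  # skip pages that carry no form widgets
            writer.update_page_form_field_values(
                page, values, auto_regenerate=False
            )
    writer.write(dst)
```

`PdfReader(src).get_fields()` can be used first to list the field names a form actually contains.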