Preprocessing Scripts
Three Python scripts, coordinated by preprocess.py, run before Eleventy to generate JSON data files.
Overview
preprocess.py (orchestrator)
├── extract_metadata.py → metadata.json
├── generate_indices.py → hierarchy.json
└── aggregate_tasks.py → tasks.json
Location: uu_framework/scripts/
1. extract_metadata.py
Parses all markdown files and extracts metadata.
Input
- All .md files in clase/
- Excluding paths that match the exclude patterns in site.yaml
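File discovery can be pictured as a recursive glob followed by an exclude filter. The sketch below is hypothetical: the function names are invented here, and it assumes the site.yaml exclude patterns are shell-style globs matched against paths relative to clase/. The real pattern format may differ.

```python
from fnmatch import fnmatchcase
from pathlib import Path


def filter_excluded(rel_paths, exclude_patterns):
    """Drop relative paths that match any glob-style exclude pattern (assumed format)."""
    return [
        p for p in rel_paths
        if not any(fnmatchcase(p, pat) for pat in exclude_patterns)
    ]


def find_markdown_files(content_dir, exclude_patterns):
    """Hypothetical helper: list .md files under content_dir, relative and POSIX-style."""
    root = Path(content_dir)
    rel_paths = sorted(p.relative_to(root).as_posix() for p in root.rglob("*.md"))
    return filter_excluded(rel_paths, exclude_patterns)
```

fnmatchcase is used instead of fnmatch so matching is case-sensitive on every platform. Note that * in these globs also matches across /, so drafts/* excludes nested subdirectories too.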
Processing
- YAML frontmatter (lines 34-45):
    ---
    title: "Page Title"
    type: lesson
    ---
- Component markers (lines 60-85):
    :::homework{id="A.1" title="Task"}
    Content here...
    :::
- Title extraction, a fallback chain tried in order:
    1. title field in the frontmatter
    2. First H1 heading
    3. Filename
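The component and title rules above can be approximated with a few regular expressions. This is a sketch, not the code at lines 60-85: the regexes, the function names, and the assumption that the closing ::: sits on its own line are all illustrative. extract_title takes an already-parsed frontmatter dict (for example, the result of a YAML loader), which keeps the example free of third-party dependencies.

```python
import re
from pathlib import Path

# Hypothetical patterns approximating the documented syntax.
COMPONENT_RE = re.compile(
    r"^:::(\w+)\{([^}]*)\}\s*\n(.*?)^:::\s*$", re.DOTALL | re.MULTILINE
)
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')
H1_RE = re.compile(r"^#[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def parse_components(body, preview_len=200):
    """Collect :::type{attrs} ... ::: blocks with a truncated content preview."""
    components = []
    for m in COMPONENT_RE.finditer(body):
        ctype, attr_text, content = m.groups()
        components.append({
            "type": ctype,
            "attrs": dict(ATTR_RE.findall(attr_text)),
            "content_preview": content.strip()[:preview_len],
        })
    return components


def extract_title(frontmatter, body, filename):
    """Fallback chain: frontmatter title, then first H1, then the file stem."""
    title = (frontmatter or {}).get("title")
    if title:
        return title
    m = H1_RE.search(body)
    if m:
        return m.group(1)
    # Assumption: the stem is used as-is; the script may clean it further.
    return Path(filename).stem
```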
Output: metadata.json
{
  "a_stack/01_intro/01_concepts.md": {
    "path": "clase/a_stack/01_intro/01_concepts.md",
    "title": "Conceptos",
    "type": "lesson",
    "order": 1,
    "components": [
      {
        "type": "homework",
        "attrs": {"id": "A.1.1", "title": "..."},
        "content_preview": "First 200 chars..."
      }
    ],
    "has_frontmatter": true
  }
}
2. generate_indices.py
Builds a hierarchical tree of the content for site navigation.
Sort Key Algorithm (lines 25-50)
def get_sort_key(name):
    # Returns tuple: (category, number, sub_category, name)
    # "01_intro" → (0, 1, 0, '')    # Numbered
    # "01_a_sub" → (0, 1, 1, 'a')   # Sub-section
    # "a_stack"  → (2, 999, 0, 'a') # Appendix (letter prefix)
    ...
Priority:
- Numeric prefixes (00_, 01_, 02_)
- Letter sub-prefixes (a, b)
- Appendix prefixes (a_, b_)
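One way to produce tuples that match the three examples in the comments is a cascade of prefix regexes, checked from most to least specific. The regexes, and the fallback category 3 for names that match none of the documented patterns, are assumptions made for this sketch; the script's actual lines 25-50 may differ.

```python
import re

# Hypothetical regexes reproducing the documented examples.
SUB_SECTION_RE = re.compile(r"^(\d+)_([a-z])_")  # e.g. 01_a_sub
NUMBERED_RE = re.compile(r"^(\d+)_")             # e.g. 01_intro
APPENDIX_RE = re.compile(r"^([a-z])_")           # e.g. a_stack


def get_sort_key(name):
    """Return (category, number, sub_category, name) for a file or directory name."""
    if m := SUB_SECTION_RE.match(name):
        return (0, int(m.group(1)), 1, m.group(2))
    if m := NUMBERED_RE.match(name):
        return (0, int(m.group(1)), 0, "")
    if m := APPENDIX_RE.match(name):
        return (2, 999, 0, m.group(1))
    # Assumption: unrecognized names sort after appendices, alphabetically.
    return (3, 999, 0, name)
```

The sub-section pattern is tested before the plain numeric one because 01_a_sub would otherwise match the numbered rule first and lose its sub_category.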
Output: hierarchy.json
{
  "name": "clase",
  "type": "root",
  "children": [
    {
      "name": "a_stack",
      "type": "directory",
      "path": "a_stack",
      "has_index": true,
      "title": "Stack",
      "children": [...]
    }
  ]
}
Key Fields
| Field | Description |
|---|---|
| name | Directory or file name |
| path | Path relative to clase/ |
| type | directory or file (root for the top-level node) |
| has_index | Whether the directory contains a 00_index.md |
| title | From metadata, or derived |
| order | Sort tuple |
| children | Nested child items |
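The tree can be built from the list of relative file paths in a single pass, creating directory nodes on demand. This hypothetical build_tree produces the name, type, path, has_index, and children fields from the table; it omits title and order, and accepts any key function (such as a get_sort_key implementation) for ordering siblings, defaulting to plain string order.

```python
from pathlib import PurePosixPath


def build_tree(rel_paths, sort_key=str, root_name="clase"):
    """Nest relative file paths into name/type/path/children dicts (sketch)."""
    root = {"name": root_name, "type": "root", "children": []}
    for rel in rel_paths:
        parts = PurePosixPath(rel).parts
        node = root
        # Find or create one directory node per segment except the last.
        for depth, part in enumerate(parts[:-1]):
            child = next(
                (c for c in node["children"]
                 if c["name"] == part and c["type"] == "directory"),
                None,
            )
            if child is None:
                child = {
                    "name": part,
                    "type": "directory",
                    "path": "/".join(parts[: depth + 1]),
                    "has_index": False,
                    "children": [],
                }
                node["children"].append(child)
            node = child
        # The last segment is the file itself.
        node["children"].append({"name": parts[-1], "type": "file", "path": rel})
        if parts[-1] == "00_index.md" and node is not root:
            node["has_index"] = True
    _sort_children(root, sort_key)
    return root


def _sort_children(node, sort_key):
    """Recursively order siblings by sort_key applied to their names."""
    children = node.get("children")
    if children:
        children.sort(key=lambda c: sort_key(c["name"]))
        for c in children:
            _sort_children(c, sort_key)
```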
3. aggregate_tasks.py
Collects homework, exams, and projects into lists.
Processing
- Reads metadata.json
- Extracts components by type
- Calculates overdue status
- Generates URLs
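A rough sketch of the extraction and URL steps, assuming metadata.json has the shape shown for extract_metadata.py. The component type names exam and project, and reading due and points from the component attributes, are assumptions; chapter and overdue are left out. The summary is cut from content_preview to 100 characters to match the example output.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from component type to tasks.json list name.
TASK_LISTS = {"homework": "homework", "exam": "exams", "project": "projects"}


def file_to_url(rel_path):
    """'a_stack/01_intro/01_cuentas.md' -> '/a_stack/01_intro/01_cuentas/'."""
    return "/" + PurePosixPath(rel_path).with_suffix("").as_posix() + "/"


def aggregate_tasks(metadata):
    """Group task components from metadata.json into per-type lists (sketch)."""
    tasks = {name: [] for name in TASK_LISTS.values()}
    for rel_path, entry in metadata.items():
        for comp in entry.get("components", []):
            list_name = TASK_LISTS.get(comp["type"])
            if list_name is None:
                continue  # not a task component
            attrs = comp.get("attrs", {})
            tasks[list_name].append({
                "id": attrs.get("id"),
                "title": attrs.get("title"),
                "due": attrs.get("due"),        # assumed attribute name
                "points": attrs.get("points"),  # assumed attribute name
                "file": rel_path,
                "url": file_to_url(rel_path),
                "summary": comp.get("content_preview", "")[:100],
                "type": comp["type"],
            })
    return tasks
```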
Output: tasks.json
{
  "homework": [
    {
      "id": "A.1.1",
      "title": "Crear cuentas",
      "due": "2026-02-01",
      "points": null,
      "chapter": "Stack",
      "file": "a_stack/01_intro/01_cuentas.md",
      "url": "/a_stack/01_intro/01_cuentas/",
      "summary": "First 100 chars...",
      "overdue": false,
      "type": "homework"
    }
  ],
  "exams": [],
  "projects": []
}
Overdue Calculation (lines 28-37)
from datetime import datetime

def is_overdue(due_str):
    if not due_str:
        return False
    try:
        due_date = datetime.strptime(due_str, '%Y-%m-%d').date()
        return due_date < datetime.now().date()
    except:
        return False
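Because the bare except: also swallows unrelated errors (see Known Issues), a stricter variant might catch only the exceptions strptime raises for bad input: ValueError for a malformed string and TypeError for a non-string value. The today parameter is a hypothetical addition that makes the function testable without mocking the clock; this is a sketch, not the script's current code.

```python
from datetime import date, datetime


def is_overdue(due_str, today=None):
    """Return True if due_str (YYYY-MM-DD) is strictly before today.

    Missing or malformed dates count as not overdue, but only ValueError
    and TypeError are caught, so unrelated bugs still surface.
    """
    if not due_str:
        return False
    try:
        due_date = datetime.strptime(due_str, "%Y-%m-%d").date()
    except (ValueError, TypeError):
        return False
    return due_date < (today or date.today())
```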
Running Preprocessing
Via Docker
# Full build (includes preprocessing)
docker compose -f uu_framework/docker/docker-compose.yaml run build
# Preprocessing only
docker compose -f uu_framework/docker/docker-compose.yaml run preprocess
Manual
cd uu_framework
python3 scripts/preprocess.py --content ../clase --output eleventy/_data
Error Handling
Current Behavior
- Missing frontmatter: Falls back to H1 or filename
- Invalid YAML: Silently ignored, returns {}
- Missing files: Warning logged, continues
- Invalid dates: Treated as not overdue
Known Issues
- Bare except: blocks catch all errors silently
- No validation of required component attributes
- No duplicate ID detection
See Troubleshooting for fixes.
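The two missing checks could run as a post-pass over metadata.json. The sketch below is hypothetical: REQUIRED_ATTRS and the component type names in it are assumptions, and it reports problems as strings rather than failing the build.

```python
from collections import defaultdict

# Hypothetical: attributes each task component type is expected to carry.
REQUIRED_ATTRS = {
    "homework": ("id", "title"),
    "exam": ("id", "title"),
    "project": ("id", "title"),
}


def validate_components(metadata):
    """Return human-readable problems: missing attributes and duplicate ids."""
    problems = []
    seen = defaultdict(list)  # id -> files that declare it
    for rel_path, entry in metadata.items():
        for comp in entry.get("components", []):
            attrs = comp.get("attrs", {})
            for name in REQUIRED_ATTRS.get(comp.get("type"), ()):
                if not attrs.get(name):
                    problems.append(f"{rel_path}: {comp.get('type')} missing '{name}'")
            if attrs.get("id"):
                seen[attrs["id"]].append(rel_path)
    for comp_id, files in seen.items():
        if len(files) > 1:
            problems.append(f"duplicate id {comp_id!r} in: {', '.join(files)}")
    return problems
```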