morethanadiagnosis-hub/HANDOFF_WEBSITE_SCRAPER.md

# Handoff: Website Content Scraper & Frontend Implementation

**Date**: November 18, 2025
**From**: Claude (CL)
**To**: Claude Web
**Status**: Website content extracted and ready for frontend implementation

---

## Overview

A Puppeteer-based web scraper renders the morethanadiagnosis.org website in a headless browser and extracts its content. The rendered page's text, structure, navigation links, and image URLs have been captured and saved as JSON for frontend replication.

---

## What's Been Completed

### ✅ Web Scraper Created
- **Location**: `/srv/containers/mtad-api/scraper.js`
- **Technology**: Puppeteer (headless browser automation)
- **Purpose**: Render the JavaScript-heavy Wix site and extract its content

### ✅ Content Successfully Extracted
- **Output**: `/srv/containers/mtad-api/website_content.json`
- **Format**: Structured JSON with all page elements

### ✅ Data Captured
- **13 Headings** - All H1-H6 elements across the page
- **24 Paragraphs** - Body text and descriptions
- **22 Buttons/CTAs** - Call-to-action elements
- **34 Links** - Navigation and external links
- **15 Images** - Images with alt text and URLs
- **10 Sections** - Major content sections
- **Full text** - Complete rendered page content

---

## Extracted Website Structure

### Navigation Menu
```
Home
├── Podcast
├── Resources
├── Happy Mail
├── Support Group
├── Support Circle
├── The Journal
├── In Loving Memory
├── Connect With Us
└── Shop
```

### Key Pages & Content Areas

#### 1. **Homepage Hero**
- Title: "You are more than a diagnosis"
- Tagline: "Connecting Through Stories, Thriving Through Community"
- Description: Community for folks with chronic illness and those touched by cancer
- CTA: "Join Our Community"

#### 2. **Happy Mail Section**
- Description: Free joy-filled snail mail program
- Run by: Nerisa, who sends mail to folks navigating cancer or chronic illness
- Who Can Receive:
  - Cancer diagnosis or treatment
  - Chronic illness or rare disease
  - Medical limbo or recovery
- CTA: "Order Happy Mail"

#### 3. **Support & Community**
- Connect Section with quote: "We're here to create a safe, supportive space..."
- Support Circle Login
- Featured Stories: Jes & Den's journey with cancer/FAP

#### 4. **Podcast**
- "More Than A Diagnosis" podcast
- Hosts: Jes and Den
- Content: Real stories about life beyond medical diagnosis

#### 5. **Resources**
- Curated list of helpful resources for navigating a diagnosis
- Financial guidance
- Support materials
- Regularly updated

#### 6. **Wings of Remembrance**
- Memorial tribute section
- Honor those who shaped the journey

#### 7. **Shop Products**
Products with purpose-driven stories:
- **"Worst Club Best Members" Shirt** - Inspired by Nerisa's Happy Mail program
  - Features duck with tattooed cancer ribbon
  - Celebrates community resilience

- **"More Than A Diagnosis" Shirt** - Reminder of strength beyond diagnosis
  - For cancer/chronic illness advocates
  - Proceeds support advocacy work

- **"I Don't Want To / I Get To" Shirt** - Jes's motto during cancer treatment
  - Perspective shift: "I get to because some folks don't get to"
  - Personal empowerment message

- **Ribbon Collection** - Multi-cancer awareness
  - Represents all cancer types equally
  - Community-focused design

---

## Extracted Content File

**Location**: `/srv/containers/mtad-api/website_content.json`

**Structure**:
```json
{
  "title": "Page title",
  "url": "Page URL",
  "headings": [
    { "level": "H1", "text": "..." }
  ],
  "paragraphs": ["..."],
  "buttons": [
    { "text": "...", "href": "...", "class": "..." }
  ],
  "links": [
    { "text": "...", "href": "..." }
  ],
  "images": [
    { "src": "...", "alt": "...", "title": "..." }
  ],
  "sections": [
    { "heading": "...", "content": "..." }
  ],
  "fullText": "Complete rendered page text"
}
```
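
As a quick sanity check on the structure above, the array fields can be counted and compared against the totals listed under "Data Captured". This is a minimal sketch, not part of `scraper.js`; the helper names `summarizeContent` and `summarizeFile` are hypothetical.

```javascript
const fs = require('fs');

// Array-valued fields from the structure above.
const ARRAY_FIELDS = ['headings', 'paragraphs', 'buttons', 'links', 'images', 'sections'];

// Hypothetical helper: count the entries in each array field.
function summarizeContent(content) {
  const summary = {};
  for (const field of ARRAY_FIELDS) {
    // Treat a missing or non-array field as empty rather than throwing.
    summary[field] = Array.isArray(content[field]) ? content[field].length : 0;
  }
  return summary;
}

// Hypothetical helper: read the JSON file from disk and summarize it.
function summarizeFile(filePath) {
  return summarizeContent(JSON.parse(fs.readFileSync(filePath, 'utf8')));
}
```

Calling `summarizeFile('/srv/containers/mtad-api/website_content.json')` returns an object such as `{ headings: ..., paragraphs: ..., ... }`; if the counts differ from the documented totals, the file may be stale or truncated.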

---

## How to Use the Scraper

### Run the Scraper
```bash
cd /srv/containers/mtad-api
node scraper.js
```

### Output
- Displays content summary to console
- Saves full JSON to `website_content.json`
- Shows preview of extracted content
- Lists navigation links found

### Modify the Scraper
Edit `/srv/containers/mtad-api/scraper.js` to:
- Change target URL
- Modify extraction selectors
- Add/remove data fields
- Adjust wait times for slower pages
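
The kinds of changes listed above might look roughly like the following. This is an illustrative sketch only and does not reproduce the actual contents of `scraper.js`: the `DEFAULTS` values (including the `https://` scheme), `buildConfig`, and `scrapeHeadings` are assumptions made for the example. It uses only the standard Puppeteer calls `launch`, `newPage`, `goto`, `evaluate`, and `close`.

```javascript
// Hypothetical defaults; the real values live in scraper.js.
const DEFAULTS = {
  url: 'https://morethanadiagnosis.org', // target URL (scheme assumed)
  timeoutMs: 60000,                      // raise this for slower pages
  waitUntil: 'networkidle2',             // wait until network activity settles
};

// Merge caller overrides over the defaults.
function buildConfig(overrides = {}) {
  return { ...DEFAULTS, ...overrides };
}

// Render the page in headless Chrome and collect its headings.
async function scrapeHeadings(overrides) {
  // Loaded lazily so buildConfig can be used without Puppeteer installed.
  const puppeteer = require('puppeteer');
  const { url, timeoutMs, waitUntil } = buildConfig(overrides);
  const browser = await puppeteer.launch({ headless: true });
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil, timeout: timeoutMs });
    // This callback runs inside the page: gather every H1-H6 with its text.
    return await page.evaluate(() =>
      Array.from(document.querySelectorAll('h1, h2, h3, h4, h5, h6'))
        .map((el) => ({ level: el.tagName, text: el.textContent.trim() }))
        .filter((heading) => heading.text.length > 0)
    );
  } finally {
    // Always release the browser, even if navigation fails.
    await browser.close();
  }
}
```

For example, `scrapeHeadings({ timeoutMs: 120000 })` would give a slow page two minutes to load, and changing the selector string is how a new data field would be added.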

---

## Key Implementation Notes for Frontend

### Design Approach
The website uses:
- Clean, minimalist design
- Community-focused messaging
- Story-driven content
- Purpose-driven product section
- Warm, accessible tone

### Content Themes
1. **Connection & Community** - Central to brand identity
2. **Stories & Authenticity** - Real people, real journeys
3. **Support & Resources** - Practical help alongside emotional support
4. **Resilience & Empowerment** - Beyond medical labels
5. **Inclusivity** - All diagnoses, all experiences matter

### Recommended Components to Build
- [ ] Hero section with tagline
- [ ] Happy Mail card/section
- [ ] Story/testimonial cards (Jes & Den)
- [ ] Podcast section
- [ ] Resources directory
- [ ] Product showcase (with story narratives)
- [ ] Memorial/tribute section
- [ ] CTA buttons throughout

### Copy Guidelines
- Use warm, empathetic language
- Focus on community and connection
- Highlight real stories and experiences
- Emphasize accessibility and support
- Avoid medical jargon where possible

---

## Dependencies

### Already Installed
- Puppeteer (v23.0.0+)
- Node.js (v18+)

### To Scrape Other Sites
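```bash
# Only needed in a fresh environment where Puppeteer is not yet installed
npm install puppeteer
# Run after changing the target URL in scraper.js
node scraper.js
```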

---

## Next Steps for Claude Web

1. **Review extracted content** at `/srv/containers/mtad-api/website_content.json`
2. **Create page components** based on extracted structure:
   - Home page with hero, Happy Mail, Connect sections
   - Podcast page
   - Resources page
   - Shop page with product cards
   - Support pages (group, circle, journal, in-loving-memory)
3. **Implement navigation** based on extracted links
4. **Style with Tailwind** - Use existing design system
5. **Add API integration** - Connect to backend for dynamic content
6. **Test all pages** - Verify navigation, CTAs, responsiveness

---

## Files & Resources

| File | Purpose |
|------|---------|
| `/srv/containers/mtad-api/scraper.js` | Puppeteer scraper script |
| `/srv/containers/mtad-api/website_content.json` | Extracted content (JSON) |
| `/srv/containers/mtad-api/web/` | Frontend code (Next.js) |
| `/srv/containers/mtad-api/HANDOFF_CLAUDE_WEB.md` | Frontend deployment handoff |

---

## Quick Reference: Extracted Data

**Total Content**:
- 13 headings (H1-H6)
- 24 substantive paragraphs
- 22 CTA buttons
- 34 links (navigation and external)
- 15 images (with alt text)
- 10 major page sections

**Navigation Links Extracted**:
1. Join Our Community → /supportgroup
2. Order Happy Mail → /happymail
3. Podcast → /podcast
4. Resources → /resources
5. Support Group → /supportgroup
6. Support Circle → /groups
7. The Journal → /thejournal
8. In Loving Memory → /inlovingmemory
9. Connect With Us → /meetus
10. Shop → /category/all-products
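
A list like the one above could be derived from the `links` array in `website_content.json` along these lines. This is a sketch under assumptions: it assumes hrefs may be absolute or relative, that the site host is `morethanadiagnosis.org` with an optional `www.` prefix, and the function name `toNavItems` is hypothetical.

```javascript
// Hypothetical helper: turn extracted links into internal navigation items.
function toNavItems(links, siteHost = 'morethanadiagnosis.org') {
  const seen = new Set();
  const items = [];
  for (const { text, href } of links) {
    if (typeof href !== 'string' || href === '') continue;
    let url;
    try {
      // Resolve relative hrefs against the site root.
      url = new URL(href, `https://${siteHost}`);
    } catch {
      continue; // skip malformed hrefs
    }
    // Ignore non-web schemes (mailto:, tel:) and external sites.
    if (!/^https?:$/.test(url.protocol)) continue;
    if (url.hostname.replace(/^www\./, '') !== siteHost) continue;
    // Normalize trailing slashes so "/podcast/" and "/podcast" match.
    const path = url.pathname.replace(/\/+$/, '') || '/';
    const label = (text || '').trim();
    const key = `${label}|${path}`;
    if (!label || seen.has(key)) continue;
    seen.add(key);
    items.push({ label, path });
  }
  return items;
}
```

De-duplicating on label plus path collapses links repeated in the header and footer while keeping distinct labels that share a target, such as "Join Our Community" and "Support Group", which both point to `/supportgroup`.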

---

## Troubleshooting

### Scraper Not Finding Content
- Check internet connection
- Verify morethanadiagnosis.org is accessible
- Increase timeout in scraper.js (line 32)
- Check browser console for JavaScript errors

### Missing Images
- Image URLs are stored in website_content.json
- Download and host locally in `/public/images/`
- Update image src paths in components
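
The path update in the last step could be sketched as follows. It assumes the file name can be taken from the last segment of each image URL, which should be checked against the real URLs in `website_content.json`; `localImagePath` and `rewriteImages` are hypothetical names. In Next.js, files placed in `public/images/` are served from `/images/`.

```javascript
const path = require('path');

// Hypothetical helper: map a remote image URL to a local public path.
function localImagePath(src, publicDir = '/images') {
  const { pathname } = new URL(src);
  // Assumption: the last path segment is a usable file name.
  const rawName = path.posix.basename(pathname);
  // Replace characters that are awkward in file names (e.g. "~" or "%").
  const safeName = rawName.replace(/[^A-Za-z0-9._-]/g, '_');
  return `${publicDir}/${safeName}`;
}

// Add a localSrc field to each image entry, leaving the original src intact.
function rewriteImages(images) {
  return images.map((img) => ({ ...img, localSrc: localImagePath(img.src) }));
}
```

Keeping the original `src` alongside `localSrc` makes it possible to download each file first and switch components over once the local copy exists.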

### Layout Questions
- Wix uses a grid-based mesh layout system
- Responsive design adapts to mobile/desktop
- See original site for exact spacing/sizing

---

## Contact/Questions

If you need to:
- Re-run the scraper: `node scraper.js`
- Modify extraction logic: Edit `/srv/containers/mtad-api/scraper.js`
- Add new pages: Use extracted content as template
- Integrate with API: Check `/srv/containers/mtad-api/backend/`

All code is in the GitHub repo on the main branch. The scraper is production-ready and can be re-run at any time to refresh the extracted content.

---

**Status**: CONTENT EXTRACTION COMPLETE ✅
**Ready for**: Frontend implementation in Next.js
**Files Generated**: website_content.json with full page structure and content

Good luck with the frontend implementation! The extracted content is comprehensive and ready to use. 🚀