# Handoff: Website Content Scraper & Frontend Implementation

**Date**: November 18, 2025
**From**: Claude (CL)
**To**: Claude Web
**Status**: Website content extracted and ready for frontend implementation

---

## Overview

A Puppeteer-based web scraper now extracts and analyzes the morethanadiagnosis.org website. The page's content, structure, navigation links, and image URLs have been captured and saved for frontend replication.

---

## What's Been Completed

### ✅ Web Scraper Created
- **Location**: `/srv/containers/mtad-api/scraper.js`
- **Technology**: Puppeteer (headless browser automation)
- **Purpose**: Dynamically render and extract JavaScript-heavy Wix website content

### ✅ Content Successfully Extracted
- **Output**: `/srv/containers/mtad-api/website_content.json`
- **Format**: Structured JSON with all page elements

### ✅ Data Captured
- **13 Headings** - All H1-H6 elements across the page
- **24 Paragraphs** - Body text and descriptions
- **22 Buttons/CTAs** - Call-to-action elements
- **34 Links** - Navigation and external links
- **15 Images** - Images with alt text and URLs
- **10 Sections** - Major content sections
- **Full text** - Complete rendered page content

---

## Extracted Website Structure

### Navigation Menu
```
Home
├── Podcast
├── Resources
├── Happy Mail
├── Support Group
├── Support Circle
├── The Journal
├── In Loving Memory
├── Connect With Us
└── Shop
```

### Key Pages & Content Areas

#### 1. **Homepage Hero**
- Title: "You are more than a diagnosis"
- Tagline: "Connecting Through Stories, Thriving Through Community"
- Description: Community for folks with chronic illness and those touched by cancer
- CTA: "Join Our Community"

#### 2. **Happy Mail Section**
- Description: Free joy-filled snail mail program
- By: Nerisa (sends to folks navigating cancer/chronic illness)
- Who Can Receive:
  - Cancer diagnosis or treatment
  - Chronic illness or rare disease
  - Medical limbo or recovery
- CTA: "Order Happy Mail"

#### 3. **Support & Community**
- Connect Section with quote: "We're here to create a safe, supportive space..."
- Support Circle Login
- Featured Stories: Jes & Den's journey with cancer/FAP

#### 4. **Podcast**
- "More Than A Diagnosis" podcast
- Hosts: Jes and Den
- Content: Real stories about life beyond medical diagnosis

#### 5. **Resources**
- Curated list of helpful resources for diagnosis navigation
- Financial guidance
- Support materials
- Regularly updated

#### 6. **Wings of Remembrance**
- Memorial tribute section
- Honor those who shaped the journey

#### 7. **Shop Products**
Products with purpose-driven stories:
- **"Worst Club Best Members" Shirt** - Inspired by Nerisa's Happy Mail program
  - Features duck with tattooed cancer ribbon
  - Celebrates community resilience

- **"More Than A Diagnosis" Shirt** - Reminder of strength beyond diagnosis
  - For cancer/chronic illness advocates
  - Proceeds support advocacy work

- **"I Don't Want To / I Get To" Shirt** - Jes's motto during cancer treatment
  - Perspective shift: "I get to because some folks don't get to"
  - Personal empowerment message

- **Ribbon Collection** - Multi-cancer awareness
  - Represents all cancer types equally
  - Community-focused design

---

## Extracted Content File

**Location**: `/srv/containers/mtad-api/website_content.json`

**Structure**:
```json
{
  "title": "Page title",
  "url": "Page URL",
  "headings": [
    { "level": "H1", "text": "..." }
  ],
  "paragraphs": ["..."],
  "buttons": [
    { "text": "...", "href": "...", "class": "..." }
  ],
  "links": [
    { "text": "...", "href": "..." }
  ],
  "images": [
    { "src": "...", "alt": "...", "title": "..." }
  ],
  "sections": [
    { "heading": "...", "content": "..." }
  ],
  "fullText": "Complete rendered page text"
}
```
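As a quick sanity check before building components, the file can be loaded and its element counts compared with the totals listed under Data Captured. The sketch below assumes the JSON matches the structure shown above; `summarizeContent` and `loadContent` are hypothetical helper names and are not part of `scraper.js`.

```javascript
const fs = require('node:fs');

// Array-valued keys taken from the JSON structure in this handoff.
const LIST_KEYS = ['headings', 'paragraphs', 'buttons', 'links', 'images', 'sections'];

// Count the entries under each array key; a missing or non-array key counts as 0.
function summarizeContent(content) {
  const counts = {};
  for (const key of LIST_KEYS) {
    counts[key] = Array.isArray(content[key]) ? content[key].length : 0;
  }
  return counts;
}

// Read and parse the scraper output (default path taken from this handoff).
function loadContent(filePath = '/srv/containers/mtad-api/website_content.json') {
  return JSON.parse(fs.readFileSync(filePath, 'utf8'));
}

// Example usage on the server that holds the file:
//   console.table(summarizeContent(loadContent()));
```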

---

## How to Use the Scraper

### Run the Scraper
```bash
cd /srv/containers/mtad-api
node scraper.js
```

### Output
- Displays content summary to console
- Saves full JSON to `website_content.json`
- Shows preview of extracted content
- Lists navigation links found

### Modify the Scraper
Edit `/srv/containers/mtad-api/scraper.js` to:
- Change target URL
- Modify extraction selectors
- Add/remove data fields
- Adjust wait times for slower pages
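The four adjustments above map onto a handful of values in a typical Puppeteer script. The following is a minimal sketch of that shape, not the contents of `scraper.js`: the `CONFIG` object, the `https://` scheme, the selectors, the timeout value, and the function names are all assumptions for illustration.

```javascript
// Hypothetical settings; the real scraper.js may organise these differently.
const CONFIG = {
  url: 'https://morethanadiagnosis.org', // target URL (scheme assumed)
  timeoutMs: 60000,                      // raise for slower pages
  selectors: {                           // add or remove entries to change data fields
    headings: 'h1, h2, h3, h4, h5, h6',
    paragraphs: 'p',
    links: 'a[href]',
  },
};

// Runs inside the page: collect trimmed, non-empty text for each selector.
function extractInPage(selectors) {
  const out = {};
  for (const [field, selector] of Object.entries(selectors)) {
    out[field] = Array.from(document.querySelectorAll(selector))
      .map((el) => el.textContent.trim())
      .filter((text) => text.length > 0);
  }
  return out;
}

async function scrape(config = CONFIG) {
  // Loaded lazily so the helpers above can be used without Puppeteer installed.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch({ headless: true });
  try {
    const page = await browser.newPage();
    // networkidle2 resolves once network activity has mostly stopped,
    // which gives JavaScript-rendered content time to appear.
    await page.goto(config.url, { waitUntil: 'networkidle2', timeout: config.timeoutMs });
    return await page.evaluate(extractInPage, config.selectors);
  } finally {
    await browser.close();
  }
}
```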

---

## Key Implementation Notes for Frontend

### Design Approach
The website uses:
- Clean, minimalist design
- Community-focused messaging
- Story-driven content
- Purpose-driven product section
- Warm, accessible tone

### Content Themes
1. **Connection & Community** - Central to brand identity
2. **Stories & Authenticity** - Real people, real journeys
3. **Support & Resources** - Practical help alongside emotional support
4. **Resilience & Empowerment** - Beyond medical labels
5. **Inclusivity** - All diagnoses, all experiences matter

### Recommended Components to Build
- [ ] Hero section with tagline
- [ ] Happy Mail card/section
- [ ] Story/testimonial cards (Jes & Den)
- [ ] Podcast section
- [ ] Resources directory
- [ ] Product showcase (with story narratives)
- [ ] Memorial/tribute section
- [ ] CTA buttons throughout

### Copy Guidelines
- Use warm, empathetic language
- Focus on community and connection
- Highlight real stories and experiences
- Emphasize accessibility and support
- Avoid medical jargon where possible

---

## Dependencies

### Already Installed
- Puppeteer (v23.0.0+)
- Node.js (v18+)

### To Scrape Other Sites
```bash
npm install puppeteer
node scraper.js
```

---

## Next Steps for Claude Web

1. **Review extracted content** at `/srv/containers/mtad-api/website_content.json`
2. **Create page components** based on extracted structure:
   - Home page with hero, Happy Mail, Connect sections
   - Podcast page
   - Resources page
   - Shop page with product cards
   - Support pages (group, circle, journal, in-loving-memory)
3. **Implement navigation** based on extracted links
4. **Style with Tailwind** - Use existing design system
5. **Add API integration** - Connect to backend for dynamic content
6. **Test all pages** - Verify navigation, CTAs, responsiveness

---

## Files & Resources

| File | Purpose |
|------|---------|
| `/srv/containers/mtad-api/scraper.js` | Puppeteer scraper script |
| `/srv/containers/mtad-api/website_content.json` | Extracted content (JSON) |
| `/srv/containers/mtad-api/web/` | Frontend code (Next.js) |
| `/srv/containers/mtad-api/HANDOFF_CLAUDE_WEB.md` | Frontend deployment handoff |

---

## Quick Reference: Extracted Data

**Total Content**:
- 13 headings
- 24 substantive paragraphs
- 22 CTA buttons
- 34 links (navigation and external)
- 15 images (with alt text)
- 10 major page sections

**Navigation Links Extracted**:
1. Join Our Community → /supportgroup
2. Order Happy Mail → /happymail
3. Podcast → /podcast
4. Resources → /resources
5. Support Group → /supportgroup
6. Support Circle → /groups
7. The Journal → /thejournal
8. In Loving Memory → /inlovingmemory
9. Connect With Us → /meetus
10. Shop → /category/all-products
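For the navigation and routing work, the list above can be kept as a small data module. This is a sketch under stated assumptions: the labels and paths are copied from the list, while `NAV_LINKS` and `labelsByRoute` are hypothetical names. Two labels (Join Our Community and Support Group) point at `/supportgroup`, so the helper groups labels by destination to show that ten links resolve to nine distinct routes.

```javascript
// Labels and paths copied from the extracted navigation links.
const NAV_LINKS = [
  { label: 'Join Our Community', href: '/supportgroup' },
  { label: 'Order Happy Mail', href: '/happymail' },
  { label: 'Podcast', href: '/podcast' },
  { label: 'Resources', href: '/resources' },
  { label: 'Support Group', href: '/supportgroup' },
  { label: 'Support Circle', href: '/groups' },
  { label: 'The Journal', href: '/thejournal' },
  { label: 'In Loving Memory', href: '/inlovingmemory' },
  { label: 'Connect With Us', href: '/meetus' },
  { label: 'Shop', href: '/category/all-products' },
];

// Group labels by destination so each route only needs to be built once.
function labelsByRoute(links) {
  const routes = new Map();
  for (const { label, href } of links) {
    if (!routes.has(href)) routes.set(href, []);
    routes.get(href).push(label);
  }
  return routes;
}
```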

---

## Troubleshooting

### Scraper Not Finding Content
- Check internet connection
- Verify morethanadiagnosis.org is accessible
- Increase timeout in `scraper.js` (line 32)
- Check browser console for JavaScript errors
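When the page is slow rather than unreachable, retrying the navigation with a longer timeout is often enough. A minimal sketch, assuming a Puppeteer `page` object: `gotoWithRetry` and the timeout values are illustrative and do not reflect what `scraper.js` does at line 32.

```javascript
// Try page.goto with progressively longer timeouts; rethrow the last error if all fail.
async function gotoWithRetry(page, url, timeoutsMs = [30000, 60000, 120000]) {
  let lastError = new Error('No timeouts were provided');
  for (const timeout of timeoutsMs) {
    try {
      return await page.goto(url, { waitUntil: 'networkidle2', timeout });
    } catch (err) {
      lastError = err;
      console.warn(`Navigation failed with a ${timeout} ms timeout: ${err.message}`);
    }
  }
  throw lastError;
}
```

It can stand in for a bare `page.goto(...)` call, so a persistent failure still surfaces as an error after the last attempt.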

### Missing Images
- Image URLs are stored in `website_content.json`
- Download and host locally in `/public/images/`
- Update image src paths in components
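The download step can be scripted. The sketch below assumes Node.js 18+ (for the built-in `fetch`) and that each entry in `images` has a `src` URL as in the JSON structure. `localImageName`, `downloadImages`, and the `public/images` output directory (relative to the frontend root) are assumptions for illustration.

```javascript
const fs = require('node:fs/promises');
const path = require('node:path');

// Build a local file name from an image URL, prefixed with its position to avoid collisions.
function localImageName(src, index) {
  const base = path.posix.basename(new URL(src).pathname) || 'image';
  return `${String(index + 1).padStart(2, '0')}-${base}`;
}

// Download every image listed in the content JSON and return a src -> public path map.
async function downloadImages(content, outDir = 'public/images') {
  await fs.mkdir(outDir, { recursive: true });
  const mapping = {};
  for (const [index, image] of content.images.entries()) {
    const response = await fetch(image.src);
    if (!response.ok) {
      console.warn(`Skipping ${image.src}: HTTP ${response.status}`);
      continue;
    }
    const name = localImageName(image.src, index);
    await fs.writeFile(path.join(outDir, name), Buffer.from(await response.arrayBuffer()));
    // Next.js serves files under public/ from the site root.
    mapping[image.src] = `/images/${name}`;
  }
  return mapping;
}
```

The returned mapping can then be used to swap the original URLs for local paths when updating `src` values in components.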

### Layout Questions
- Wix uses a grid-based mesh layout system
- The responsive design adapts to mobile and desktop
- See the original site for exact spacing and sizing

---

## Contact/Questions

If you need to:
- Re-run the scraper: `node scraper.js`
- Modify extraction logic: Edit `/srv/containers/mtad-api/scraper.js`
- Add new pages: Use extracted content as template
- Integrate with API: Check `/srv/containers/mtad-api/backend/`

All code is in the GitHub repo under the main branch. The scraper is production-ready and can be re-run at any time to update content.

---

**Status**: CONTENT EXTRACTION COMPLETE ✅
**Ready for**: Frontend implementation in Next.js
**Files Generated**: website_content.json with full page structure and content

Good luck with the frontend implementation! The extracted content is comprehensive and ready to use. 🚀