- Create Puppeteer-based scraper for morethanadiagnosis.org
- Extract full page structure, content, navigation, and images
- Generate JSON output with 13 headings, 24 paragraphs, 22 CTAs, 34 links, 15 images
- Add comprehensive handoff doc with implementation guide for frontend
- Document all website sections: Happy Mail, Support, Podcast, Resources, Shop
- Include content themes and recommendations for Next.js components
Handoff: Website Content Scraper & Frontend Implementation
Date: November 18, 2025
From: Claude (CL)
To: Claude Web
Status: Website content extracted and ready for frontend implementation
Overview
A Puppeteer-based web scraper renders the morethanadiagnosis.org website and extracts its content. The page's text, structure, navigation links, and image URLs have been captured and saved to website_content.json for frontend replication.
What's Been Completed
✅ Web Scraper Created
- Location: /srv/containers/mtad-api/scraper.js
- Technology: Puppeteer (headless browser automation)
- Purpose: Dynamically render and extract JavaScript-heavy Wix website content
✅ Content Successfully Extracted
- Output: /srv/containers/mtad-api/website_content.json
- Format: Structured JSON with all page elements
✅ Data Captured
- 13 Headings - All H1-H6 elements across the page
- 24 Paragraphs - Body text and descriptions
- 22 Buttons/CTAs - Call-to-action elements
- 34 Links - Navigation and external links
- 15 Images - Images with alt text and URLs
- 10 Sections - Major content sections
- Full text - Complete rendered page content
Extracted Website Structure
Navigation Menu
Home
├── Podcast
├── Resources
├── Happy Mail
├── Support Group
├── Support Circle
├── The Journal
├── In Loving Memory
├── Connect With Us
└── Shop
Key Pages & Content Areas
1. Homepage Hero
- Title: "You are more than a diagnosis"
- Tagline: "Connecting Through Stories, Thriving Through Community"
- Description: Community for folks with chronic illness and those touched by cancer
- CTA: "Join Our Community"
2. Happy Mail Section
- Description: Free joy-filled snail mail program
- By: Nerisa (sends to folks navigating cancer/chronic illness)
- Who Can Receive:
  - Cancer diagnosis or treatment
  - Chronic illness or rare disease
  - Medical limbo or recovery
- CTA: "Order Happy Mail"
3. Support & Community
- Connect Section with quote: "We're here to create a safe, supportive space..."
- Support Circle Login
- Featured Stories: Jes & Den's journey with cancer/FAP
4. Podcast
- "More Than A Diagnosis" podcast
- Hosts: Jes and Den
- Content: Real stories about life beyond medical diagnosis
5. Resources
- Curated list of helpful resources for diagnosis navigation
- Financial guidance
- Support materials
- Regularly updated
6. Wings of Remembrance
- Memorial tribute section
- Honor those who shaped the journey
7. Shop Products
Products with purpose-driven stories:
- "Worst Club Best Members" Shirt - Inspired by Nerisa's Happy Mail program
  - Features duck with tattooed cancer ribbon
  - Celebrates community resilience
- "More Than A Diagnosis" Shirt - Reminder of strength beyond diagnosis
  - For cancer/chronic illness advocates
  - Proceeds support advocacy work
- "I Don't Want To / I Get To" Shirt - Jes's motto during cancer treatment
  - Perspective shift: "I get to because some folks don't get to"
  - Personal empowerment message
- Ribbon Collection - Multi-cancer awareness
  - Represents all cancer types equally
  - Community-focused design
Extracted Content File
Location: /srv/containers/mtad-api/website_content.json
Structure:
{
  "title": "Page title",
  "url": "Page URL",
  "headings": [
    { "level": "H1", "text": "..." }
  ],
  "paragraphs": ["..."],
  "buttons": [
    { "text": "...", "href": "...", "class": "..." }
  ],
  "links": [
    { "text": "...", "href": "..." }
  ],
  "images": [
    { "src": "...", "alt": "...", "title": "..." }
  ],
  "sections": [
    { "heading": "...", "content": "..." }
  ],
  "fullText": "Complete rendered page text"
}
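A quick way to sanity-check the file before building components is to count each array and flag images without alt text. The sketch below assumes only the field names shown in the structure above; `summarizeContent` and the inline `sample` object are hypothetical and not part of scraper.js.

```javascript
// Minimal sketch: summarize an object shaped like website_content.json.
// Field names come from the documented structure; the helper is hypothetical.
const ARRAY_FIELDS = ['headings', 'paragraphs', 'buttons', 'links', 'images', 'sections'];

function summarizeContent(content) {
  const counts = {};
  for (const field of ARRAY_FIELDS) {
    // Treat a missing or non-array field as empty rather than throwing.
    counts[field] = Array.isArray(content[field]) ? content[field].length : 0;
  }
  // Images with an empty or missing alt need attention before reuse.
  const images = Array.isArray(content.images) ? content.images : [];
  const imagesMissingAlt = images.filter((img) => !img.alt || !img.alt.trim()).length;
  return { title: content.title || '', counts, imagesMissingAlt };
}

// Small inline stand-in for the real file. In practice, load it with
// JSON.parse(fs.readFileSync('website_content.json', 'utf8')).
const sample = {
  title: 'Page title',
  url: 'Page URL',
  headings: [{ level: 'H1', text: 'You are more than a diagnosis' }],
  paragraphs: ['...'],
  buttons: [{ text: 'Join Our Community', href: '/supportgroup', class: '' }],
  links: [{ text: 'Podcast', href: '/podcast' }],
  images: [{ src: 'https://example.com/a.png', alt: '', title: '' }],
  sections: [],
  fullText: '...',
};

console.log(summarizeContent(sample));
```

Run against the real file, the counts should match the totals listed under "Data Captured"; a mismatch suggests the scrape was incomplete.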
How to Use the Scraper
Run the Scraper
cd /srv/containers/mtad-api
node scraper.js
Output
- Displays content summary to console
- Saves full JSON to website_content.json
- Shows preview of extracted content
- Lists navigation links found
Modify the Scraper
Edit /srv/containers/mtad-api/scraper.js to:
- Change target URL
- Modify extraction selectors
- Add/remove data fields
- Adjust wait times for slower pages
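The contents of scraper.js are not reproduced in this handoff, so the following is only a sketch of how the modification points listed above (target URL, selectors, data fields, wait time) could be grouped in one place. `CONFIG`, `extractFromDocument`, and `scrape` are hypothetical names, and the `https://` scheme on the URL is an assumption.

```javascript
// Hypothetical sketch of a configurable scraper. The real scraper.js may be
// structured differently; names and defaults here are assumptions.
const CONFIG = {
  url: 'https://morethanadiagnosis.org', // target URL (https scheme assumed)
  timeoutMs: 60000,                      // raise this for slower pages
  selectors: {
    headings: 'h1, h2, h3, h4, h5, h6',
    paragraphs: 'p',
    links: 'a[href]',
  },
};

// Runs inside the page via page.evaluate, so it may only use its arguments
// and browser globals. `doc` defaults to the browser's document; passing a
// stub object makes the function testable in plain Node.
function extractFromDocument(selectors, doc = document) {
  const textOf = (el) => (el.textContent || '').trim();
  const pick = (selector) => Array.from(doc.querySelectorAll(selector));
  return {
    headings: pick(selectors.headings)
      .map((el) => ({ level: el.tagName, text: textOf(el) }))
      .filter((h) => h.text !== ''),
    paragraphs: pick(selectors.paragraphs).map(textOf).filter((t) => t !== ''),
    links: pick(selectors.links)
      .map((el) => ({ text: textOf(el), href: el.getAttribute('href') })),
  };
}

async function scrape(config = CONFIG) {
  // Loaded lazily so the helper above can be used without Puppeteer installed.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch({ headless: true });
  try {
    const page = await browser.newPage();
    // networkidle2: navigation counts as finished once there are no more
    // than 2 network connections for at least 500 ms.
    await page.goto(config.url, { waitUntil: 'networkidle2', timeout: config.timeoutMs });
    return await page.evaluate(extractFromDocument, config.selectors);
  } finally {
    await browser.close(); // always release the browser, even on errors
  }
}

// Usage (requires Puppeteer and network access):
// scrape().then((content) => console.log(content.headings.length));
```

Keeping the selectors in a plain object means a new data field is one selector entry plus one line in `extractFromDocument`, and the extraction logic can be checked against a stub document without launching a browser.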
Key Implementation Notes for Frontend
Design Approach
The website uses:
- Clean, minimalist design
- Community-focused messaging
- Story-driven content
- Purpose-driven product section
- Warm, accessible tone
Content Themes
- Connection & Community - Central to brand identity
- Stories & Authenticity - Real people, real journeys
- Support & Resources - Practical help alongside emotional support
- Resilience & Empowerment - Beyond medical labels
- Inclusivity - All diagnoses, all experiences matter
Recommended Components to Build
- Hero section with tagline
- Happy Mail card/section
- Story/testimonial cards (Jes & Den)
- Podcast section
- Resources directory
- Product showcase (with story narratives)
- Memorial/tribute section
- CTA buttons throughout
Copy Guidelines
- Use warm, empathetic language
- Focus on community and connection
- Highlight real stories and experiences
- Emphasize accessibility and support
- Avoid medical jargon where possible
Dependencies
Already Installed
- Puppeteer (v23.0.0+)
- Node.js (v18+)
To Scrape Other Sites
npm install puppeteer
node scraper.js
Next Steps for Claude Web
- Review extracted content at /srv/containers/mtad-api/website_content.json
- Create page components based on extracted structure:
  - Home page with hero, Happy Mail, Connect sections
  - Podcast page
  - Resources page
  - Shop page with product cards
  - Support pages (group, circle, journal, in-loving-memory)
- Implement navigation based on extracted links
- Style with Tailwind - Use existing design system
- Add API integration - Connect to backend for dynamic content
- Test all pages - Verify navigation, CTAs, responsiveness
Files & Resources
| File | Purpose |
|---|---|
| /srv/containers/mtad-api/scraper.js | Puppeteer scraper script |
| /srv/containers/mtad-api/website_content.json | Extracted content (JSON) |
| /srv/containers/mtad-api/web/ | Frontend code (Next.js) |
| /srv/containers/mtad-api/HANDOFF_CLAUDE_WEB.md | Frontend deployment handoff |
Quick Reference: Extracted Data
Total Content:
- 13 headings (H1-H6)
- 24 substantive paragraphs
- 22 CTA buttons
- 34 links (navigation and external)
- 15 images (with alt text)
- 10 major page sections
Navigation Links Extracted:
- Join Our Community → /supportgroup
- Order Happy Mail → /happymail
- Podcast → /podcast
- Resources → /resources
- Support Group → /supportgroup
- Support Circle → /groups
- The Journal → /thejournal
- In Loving Memory → /inlovingmemory
- Connect With Us → /meetus
- Shop → /category/all-products
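The link list above can be turned into a navigation config for the frontend. The sketch below copies the label and path pairs as listed and orders the menu after the "Navigation Menu" tree. The helper names are hypothetical, and mapping the "Happy Mail" menu item to /happymail is an assumption based on the "Order Happy Mail" link target.

```javascript
// Label/path pairs copied from the extracted navigation links.
const extractedLinks = [
  { text: 'Join Our Community', href: '/supportgroup' },
  { text: 'Order Happy Mail', href: '/happymail' },
  { text: 'Podcast', href: '/podcast' },
  { text: 'Resources', href: '/resources' },
  { text: 'Support Group', href: '/supportgroup' },
  { text: 'Support Circle', href: '/groups' },
  { text: 'The Journal', href: '/thejournal' },
  { text: 'In Loving Memory', href: '/inlovingmemory' },
  { text: 'Connect With Us', href: '/meetus' },
  { text: 'Shop', href: '/category/all-products' },
];

// Menu order follows the navigation tree (Home omitted; it has no extracted path).
const menu = [
  { label: 'Podcast', href: '/podcast' },
  { label: 'Resources', href: '/resources' },
  { label: 'Happy Mail', href: '/happymail' }, // assumed target
  { label: 'Support Group', href: '/supportgroup' },
  { label: 'Support Circle', href: '/groups' },
  { label: 'The Journal', href: '/thejournal' },
  { label: 'In Loving Memory', href: '/inlovingmemory' },
  { label: 'Connect With Us', href: '/meetus' },
  { label: 'Shop', href: '/category/all-products' },
];

// Group labels by path to reveal CTAs that share a route with a menu item.
function groupLabelsByHref(links) {
  const byHref = new Map();
  for (const { text, href } of links) {
    if (!byHref.has(href)) byHref.set(href, []);
    byHref.get(href).push(text);
  }
  return byHref;
}

// Menu items whose path never appeared in the scrape are likely typos.
function findUnmatchedMenuItems(menuItems, links) {
  const known = new Set(links.map((link) => link.href));
  return menuItems.filter((item) => !known.has(item.href));
}

console.log(groupLabelsByHref(extractedLinks).get('/supportgroup'));
console.log(findUnmatchedMenuItems(menu, extractedLinks));
```

Note that "Join Our Community" and "Support Group" both resolve to /supportgroup, so the hero CTA and the menu item can share one page component.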
Troubleshooting
Scraper Not Finding Content
- Check internet connection
- Verify morethanadiagnosis.org is accessible
- Increase timeout in scraper.js (line 32)
- Check browser console for JavaScript errors
Missing Images
- Image URLs are stored in website_content.json
- Download and host locally in /public/images/
- Update image src paths in components
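The path-rewriting half of these steps can be sketched as a pure function that maps each scraped image URL to a local path. This assumes /public/images/ is the Next.js `public` directory, where a file at public/images/x.png is served as /images/x.png. `localImagePath` and `planImageDownloads` are hypothetical helpers; the download itself (for example with `fetch` and `fs.writeFile`) is left out.

```javascript
// Hypothetical helper: derive a local public path for a scraped image URL.
function localImagePath(src, index) {
  let name = '';
  try {
    // Use the last path segment as the file name; the query string is ignored.
    const segments = new URL(src).pathname.split('/').filter(Boolean);
    name = decodeURIComponent(segments[segments.length - 1] || '');
  } catch {
    name = ''; // not an absolute URL, or a malformed escape sequence
  }
  // Keep only characters that are safe in file names.
  name = name.replace(/[^A-Za-z0-9._-]/g, '_');
  if (!name) name = `image-${index}`;
  return `/images/${name}`;
}

// Build a download plan from the "images" array in website_content.json.
function planImageDownloads(images) {
  const used = new Set();
  return images.map((img, i) => {
    let publicPath = localImagePath(img.src, i);
    // Prefix with the index when two URLs end in the same file name.
    if (used.has(publicPath)) publicPath = publicPath.replace('/images/', `/images/${i}-`);
    used.add(publicPath);
    return { src: img.src, alt: img.alt || '', publicPath };
  });
}

console.log(planImageDownloads([
  { src: 'https://a.example/media/x.png', alt: 'First' },
  { src: 'https://b.example/media/x.png', alt: 'Second' },
]));
```

Each entry's `publicPath` is what a component's image `src` would use once the file has been saved under the matching name in /public/images/.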
Layout Questions
- Wix uses a grid-based mesh layout system
- Responsive design adapts to mobile/desktop
- See original site for exact spacing/sizing
Contact/Questions
If you need to:
- Re-run the scraper: node scraper.js
- Modify extraction logic: Edit /srv/containers/mtad-api/scraper.js
- Add new pages: Use extracted content as template
- Integrate with API: Check /srv/containers/mtad-api/backend/
All code is in the GitHub repo under the main branch. The scraper is production-ready and can be re-run at any time to update content.
Status: CONTENT EXTRACTION COMPLETE ✅
Ready for: Frontend implementation in Next.js
Files Generated: website_content.json with full page structure and content
Good luck with the frontend implementation! The extracted content is comprehensive and ready to use. 🚀