
Handoff: Website Content Scraper & Frontend Implementation

Date: November 18, 2025
From: Claude (CL)
To: Claude Web
Status: Website content extracted and ready for frontend implementation


Overview

A Puppeteer-based web scraper has been successfully created to extract and analyze the morethanadiagnosis.org website. All content, structure, navigation, and assets have been captured and saved for frontend replication.


What's Been Completed

Web Scraper Created

  • Location: /srv/containers/mtad-api/scraper.js
  • Technology: Puppeteer (headless browser automation)
  • Purpose: Dynamically render and extract JavaScript-heavy Wix website content

Content Successfully Extracted

  • Output: /srv/containers/mtad-api/website_content.json
  • Format: Structured JSON with all page elements

Data Captured

  • 13 Headings - All H1-H6 elements across the page
  • 24 Paragraphs - Body text and descriptions
  • 22 Buttons/CTAs - Call-to-action elements
  • 34 Links - Navigation and external links
  • 15 Images - Images with alt text and URLs
  • 10 Sections - Major content sections
  • Full text - Complete rendered page content
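
For orientation, here is a minimal sketch of the kind of extraction scraper.js performs. The real script's selectors, wait logic, and output fields (e.g. sections) may differ, so treat the selector choices below as illustrative assumptions rather than the actual implementation:

// scraper-sketch.js - illustrative only; see scraper.js for the real script
const puppeteer = require('puppeteer');
const fs = require('fs');

(async () => {
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();
  await page.goto('https://www.morethanadiagnosis.org', {
    waitUntil: 'networkidle2', // give the Wix scripts time to render
    timeout: 60000,
  });

  // Everything inside evaluate() runs in the page (browser) context
  const data = await page.evaluate(() => ({
    title: document.title,
    url: location.href,
    headings: [...document.querySelectorAll('h1,h2,h3,h4,h5,h6')]
      .map(h => ({ level: h.tagName, text: h.innerText.trim() })),
    paragraphs: [...document.querySelectorAll('p')]
      .map(p => p.innerText.trim()).filter(Boolean),
    buttons: [...document.querySelectorAll('button, a[role="button"]')]
      .map(b => ({ text: b.innerText.trim(), href: b.href || '', class: b.className })),
    links: [...document.querySelectorAll('a[href]')]
      .map(a => ({ text: a.innerText.trim(), href: a.getAttribute('href') })),
    images: [...document.querySelectorAll('img')]
      .map(img => ({ src: img.src, alt: img.alt, title: img.title })),
    fullText: document.body.innerText,
  }));

  fs.writeFileSync('website_content.json', JSON.stringify(data, null, 2));
  await browser.close();
})();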

Extracted Website Structure

Navigation Menu

Home
├── Podcast
├── Resources
├── Happy Mail
├── Support Group
├── Support Circle
├── The Journal
├── In Loving Memory
├── Connect With Us
└── Shop

Key Pages & Content Areas

1. Homepage Hero

  • Title: "You are more than a diagnosis"
  • Tagline: "Connecting Through Stories, Thriving Through Community"
  • Description: Community for folks with chronic illness and those touched by cancer
  • CTA: "Join Our Community"

2. Happy Mail Section

  • Description: Free joy-filled snail mail program
  • By: Nerisa (sends to folks navigating cancer/chronic illness)
  • Who Can Receive:
    • Cancer diagnosis or treatment
    • Chronic illness or rare disease
    • Medical limbo or recovery
  • CTA: "Order Happy Mail"

3. Support & Community

  • Connect Section with quote: "We're here to create a safe, supportive space..."
  • Support Circle Login
  • Featured Stories: Jes & Den's journey with cancer/FAP

4. Podcast

  • "More Than A Diagnosis" podcast
  • Hosts: Jes and Den
  • Content: Real stories about life beyond medical diagnosis

5. Resources

  • Curated list of helpful resources for diagnosis navigation
  • Financial guidance
  • Support materials
  • Regularly updated

6. Wings of Remembrance

  • Memorial tribute section
  • Honor those who shaped the journey

7. Shop Products

Products with purpose-driven stories:

  • "Worst Club Best Members" Shirt - Inspired by Nerisa's Happy Mail program

    • Features duck with tattooed cancer ribbon
    • Celebrates community resilience
  • "More Than A Diagnosis" Shirt - Reminder of strength beyond diagnosis

    • For cancer/chronic illness advocates
    • Proceeds support advocacy work
  • "I Don't Want To / I Get To" Shirt - Jes's motto during cancer treatment

    • Perspective shift: "I get to because some folks don't get to"
    • Personal empowerment message
  • Ribbon Collection - Multi-cancer awareness

    • Represents all cancer types equally
    • Community-focused design

Extracted Content File

Location: /srv/containers/mtad-api/website_content.json

Structure:

{
  "title": "Page title",
  "url": "Page URL",
  "headings": [
    { "level": "H1", "text": "..." }
  ],
  "paragraphs": ["..."],
  "buttons": [
    { "text": "...", "href": "...", "class": "..." }
  ],
  "links": [
    { "text": "...", "href": "..." }
  ],
  "images": [
    { "src": "...", "alt": "...", "title": "..." }
  ],
  "sections": [
    { "heading": "...", "content": "..." }
  ],
  "fullText": "Complete rendered page text"
}
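
A Next.js build step or plain Node script can read this file and pull out what a page needs. A sketch, using the field names shown in the structure above:

// Example: load the extracted content and select pieces for a page
const fs = require('fs');

const content = JSON.parse(
  fs.readFileSync('/srv/containers/mtad-api/website_content.json', 'utf8')
);

// The first H1 captured on the homepage is the hero title
const heroTitle = content.headings.find(h => h.level === 'H1')?.text;

// Relative hrefs are candidates for internal navigation; the live Wix site
// may emit absolute URLs instead, so adjust this filter as needed
const internalLinks = content.links.filter(l => l.href && l.href.startsWith('/'));

console.log(heroTitle, internalLinks.length);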

How to Use the Scraper

Run the Scraper

cd /srv/containers/mtad-api
node scraper.js

Output

  • Displays content summary to console
  • Saves full JSON to website_content.json
  • Shows preview of extracted content
  • Lists navigation links found

Modify the Scraper

Edit /srv/containers/mtad-api/scraper.js to:

  • Change target URL
  • Modify extraction selectors
  • Add/remove data fields
  • Adjust wait times for slower pages

Key Implementation Notes for Frontend

Design Approach

The website uses:

  • Clean, minimalist design
  • Community-focused messaging
  • Story-driven content
  • Purpose-driven product section
  • Warm, accessible tone

Content Themes

  1. Connection & Community - Central to brand identity
  2. Stories & Authenticity - Real people, real journeys
  3. Support & Resources - Practical help alongside emotional support
  4. Resilience & Empowerment - Beyond medical labels
  5. Inclusivity - All diagnoses, all experiences matter

Recommended Components

  • Hero section with tagline
  • Happy Mail card/section
  • Story/testimonial cards (Jes & Den)
  • Podcast section
  • Resources directory
  • Product showcase (with story narratives)
  • Memorial/tribute section
  • CTA buttons throughout
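
As a starting point for the list above, a minimal hero component sketch. The copy comes from the extracted homepage; the component name, file location, and Tailwind classes are placeholders, not the existing design system:

// components/Hero.jsx - illustrative sketch, not the final design
export default function Hero() {
  return (
    <section className="flex flex-col items-center gap-6 px-6 py-24 text-center">
      <h1 className="text-4xl font-semibold">You are more than a diagnosis</h1>
      <p className="text-lg text-gray-600">
        Connecting Through Stories, Thriving Through Community
      </p>
      {/* CTA route taken from the extracted navigation links */}
      <a href="/supportgroup" className="rounded-full bg-teal-600 px-8 py-3 text-white">
        Join Our Community
      </a>
    </section>
  );
}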

Copy Guidelines

  • Use warm, empathetic language
  • Focus on community and connection
  • Highlight real stories and experiences
  • Emphasize accessibility and support
  • Avoid medical jargon where possible

Dependencies

Already Installed

  • Puppeteer (v23.0.0+)
  • Node.js (v18+)

To Scrape Other Sites

npm install puppeteer
node scraper.js
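
scraper.js is currently pointed at morethanadiagnosis.org. To reuse it against another site, one option is to take the target URL from the command line; this is a sketch, assuming you fold the idea into scraper.js:

// Usage: node scraper.js https://example.com
const puppeteer = require('puppeteer');

const url = process.argv[2] || 'https://www.morethanadiagnosis.org';

(async () => {
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: 'networkidle2', timeout: 60000 });
  console.log(await page.title());
  await browser.close();
})();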

Next Steps for Claude Web

  1. Review extracted content at /srv/containers/mtad-api/website_content.json
  2. Create page components based on extracted structure:
    • Home page with hero, Happy Mail, Connect sections
    • Podcast page
    • Resources page
    • Shop page with product cards
    • Support pages (group, circle, journal, in-loving-memory)
  3. Implement navigation based on extracted links
  4. Style with Tailwind - Use existing design system
  5. Add API integration - Connect to backend for dynamic content
  6. Test all pages - Verify navigation, CTAs, responsiveness

Files & Resources

  • /srv/containers/mtad-api/scraper.js - Puppeteer scraper script
  • /srv/containers/mtad-api/website_content.json - Extracted content (JSON)
  • /srv/containers/mtad-api/web/ - Frontend code (Next.js)
  • /srv/containers/mtad-api/HANDOFF_CLAUDE_WEB.md - Frontend deployment handoff

Quick Reference: Extracted Data

Total Content:

  • 13 headings (H1-H6)
  • 24 substantive paragraphs
  • 22 CTA buttons
  • 34 links (navigation and external)
  • 15 images (with alt text)
  • 10 major page sections

Navigation Links Extracted:

  1. Join Our Community → /supportgroup
  2. Order Happy Mail → /happymail
  3. Podcast → /podcast
  4. Resources → /resources
  5. Support Group → /supportgroup
  6. Support Circle → /groups
  7. The Journal → /thejournal
  8. In Loving Memory → /inlovingmemory
  9. Connect With Us → /meetus
  10. Shop → /category/all-products
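
These routes can be lifted directly into a navigation config for the frontend. A sketch; the file location and export name are placeholders:

// lib/navigation.js - route map built from the extracted links
export const NAV_LINKS = [
  { label: 'Podcast', href: '/podcast' },
  { label: 'Resources', href: '/resources' },
  { label: 'Happy Mail', href: '/happymail' },
  { label: 'Support Group', href: '/supportgroup' },
  { label: 'Support Circle', href: '/groups' },
  { label: 'The Journal', href: '/thejournal' },
  { label: 'In Loving Memory', href: '/inlovingmemory' },
  { label: 'Connect With Us', href: '/meetus' },
  { label: 'Shop', href: '/category/all-products' },
];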

Troubleshooting

Scraper Not Finding Content

  • Check internet connection
  • Verify morethanadiagnosis.org is accessible
  • Increase timeout in scraper.js (line 32)
  • Check browser console for JavaScript errors

Missing Images

  • Image URLs are stored in website_content.json
  • Download and host locally in /public/images/
  • Update image src paths in components
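
A sketch of that download step, using the image URLs already captured in the JSON. The web/public/images destination is an assumption; adjust it to the actual repo layout:

// download-images.js - save captured image URLs into the frontend's public folder
const fs = require('fs');
const path = require('path');

const content = JSON.parse(fs.readFileSync('website_content.json', 'utf8'));
const outDir = path.join('web', 'public', 'images');
fs.mkdirSync(outDir, { recursive: true });

(async () => {
  for (const [i, img] of content.images.entries()) {
    if (!img.src || !img.src.startsWith('http')) continue;
    const res = await fetch(img.src); // global fetch is available on Node 18+
    if (!res.ok) continue;
    const ext = path.extname(new URL(img.src).pathname) || '.jpg';
    const file = path.join(outDir, `image-${i}${ext}`);
    fs.writeFileSync(file, Buffer.from(await res.arrayBuffer()));
    console.log('saved', file);
  }
})();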

Layout Questions

  • Wix uses a grid-based mesh layout system
  • Responsive design adapts to mobile/desktop
  • See original site for exact spacing/sizing

Contact/Questions

If you need to:

  • Re-run the scraper: node scraper.js
  • Modify extraction logic: Edit /srv/containers/mtad-api/scraper.js
  • Add new pages: Use extracted content as template
  • Integrate with API: Check /srv/containers/mtad-api/backend/

All code is in the GitHub repo under the main branch. The scraper is production-ready and can be re-run at any time to update content.


Status: CONTENT EXTRACTION COMPLETE
Ready for: Frontend implementation in Next.js
Files Generated: website_content.json with full page structure and content

Good luck with the frontend implementation! The extracted content is comprehensive and ready to use. 🚀