Facebook Page Follower Scraper (view page source)

Overview

This script is an updated version of the Facebook page follower scraper. It gathers additional profile information by reading the page's HTML source (view page source). It automates the process of scraping Facebook profile information, including:

  • Profile names

  • Facebook IDs

  • Profile links

  • Email addresses

  • Phone numbers

The data is collected from the Facebook mobile app via Appium and stored in a Google Sheet.
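The Google Sheet writer itself is not shown in this section; as a rough sketch (not the original implementation), appending one scraped record with the gspread library could look like the following. The sheet name, the service-account key path, and the append_profile_row helper are assumptions for illustration only.

import gspread

def append_profile_row(record, sheet_name="Facebook Followers"):
    # Authenticate with a Google service-account key (path is an assumption).
    client = gspread.service_account(filename="service_account.json")
    sheet = client.open(sheet_name).sheet1

    # Flatten one scraped record into a single row:
    # profile link, Facebook ID, emails, phone numbers.
    sheet.append_row([
        record.get("url", ""),
        record.get("id", ""),
        ", ".join(record.get("emails") or []),
        ", ".join(record.get("phones") or []),
    ])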

Main Differences

The difference between the original and updated versions is that the updated version also extracts email addresses and phone numbers from the view page source. In addition, the Facebook ID is now obtained from the view page source as well.

Below is the updated code:

import re
import time
from urllib.parse import urlparse

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

def get_facebook_details(original_url):
    try:
        # Setup Chrome options
        chrome_options = Options()
        chrome_options.add_argument("--disable-gpu")
        chrome_options.add_argument("--no-sandbox")
        chrome_options.add_argument("--disable-dev-shm-usage")
        chrome_options.add_argument("--headless=new") 
       
        driver = webdriver.Chrome(options=chrome_options)

        # Visit the original URL
        driver.get(original_url)
        time.sleep(5)

        final_url = driver.current_url
        parsed = urlparse(final_url)
        base_url = f"{parsed.scheme}://{parsed.netloc}{parsed.path}"

        # Clean the profile link (query string and fragment were already dropped above)
        clean_url = base_url

        # Get page source for detailed scraping
        page_source = driver.page_source
        
        # Extract Facebook user ID
        user_id = None
        id_match = re.search(r'"userID":"(\d+)"', page_source)
        if id_match and id_match.group(1) != '0':
            user_id = id_match.group(1)
            print(f"✅ Extracted ID: {user_id}")
        
        # Extract email addresses
        emails = set()
        email_matches = re.finditer(EMAIL_REGEX, page_source)
        for match in email_matches:
            email = match.group().lower()
            if not email.endswith('facebook.com') and not email.endswith('fb.com'):
                emails.add(email)
        
        # Extract phone numbers
        phones = set()
        phone_matches = re.finditer(PHONE_REGEX, page_source)
        for match in phone_matches:
            phone = match.group(1).strip()
            
            # Clean and standardize the format
            phone = re.sub(r'[^\d\+]', '', phone)  # Remove all non-digit/non-plus characters
            phone = phone.lstrip('0') if phone.startswith('0') and len(phone) > 10 else phone
            
            # Validate it's a proper phone number (at least 8 digits)
            if sum(c.isdigit() for c in phone) >= 8:
                phones.add(phone)
        
        driver.quit()

        return {
            "url": clean_url,
            "id": user_id,
            "emails": list(emails)[:3] if emails else None,
            "phones": list(phones)[:3] if phones else None
        }
    except Exception as e:
        print("❌ Error in get_facebook_details:", e)
        return None
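As a usage illustration only (the profile link below is hypothetical), the function can be called and its result printed like this, assuming EMAIL_REGEX and PHONE_REGEX from the Configuration section are already defined:

if __name__ == "__main__":
    # Hypothetical profile link used only for illustration.
    details = get_facebook_details("https://www.facebook.com/profile.php?id=100000000000000")
    if details:
        print("Profile link:", details["url"])
        print("Facebook ID:", details["id"])
        print("Emails:", details["emails"])
        print("Phones:", details["phones"])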

Configuration

Regular expressions used to match email addresses and phone numbers:

EMAIL_REGEX = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'
PHONE_REGEX = r'"text":"((?:\+\d{1,3}[-.\s]?)?\(?\d{1,4}\)?[-.\s]?\d{1,4}[-.\s]?\d{1,9})"'
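As a quick illustrative check (the snippet below is invented sample text, not real Facebook page source), the two patterns can be exercised like this:

import re

sample_source = 'contact: jane.doe@example.com ... "text":"+1 555 0137" ...'

print(re.findall(EMAIL_REGEX, sample_source))   # ['jane.doe@example.com']
print(re.findall(PHONE_REGEX, sample_source))   # ['+1 555 0137']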

Execution

To run this script: python facebook_scrapper updated.py
