Puppeteer is a Node library that provides a high-level API to control Chrome or Chromium over the DevTools Protocol. Puppeteer runs headless by default, but it can be configured to run full ("headful") Chrome. It is maintained by the Chrome DevTools team and an active open-source community. When you install Puppeteer, it downloads a recent version of Chromium that is guaranteed to work with the API. One benefit of Puppeteer is that it gives access to the loading and rendering measurements provided by Chrome's performance analysis tools. Most things you can do manually in the browser can also be done with Puppeteer, such as generating screenshots and PDFs of pages, automating form submission, UI testing, and simulating keyboard input.
A Short Demo of Web Scraping and End-to-End Testing with Puppeteer
To use Puppeteer, you must first have Node.js installed on your machine. Then install Puppeteer with:
npm i puppeteer
Web scraping (also called web harvesting or web data extraction) is the practice of extracting data from websites. Scraping a web page involves two steps: fetching the page and extracting data from it. Fetching is the downloading of the page, which is why web crawling is a main component of web scraping. Once a page is fetched, extraction can take place: the content may be parsed, searched, reformatted, or its data copied into a spreadsheet. Web scrapers typically take something out of a page in order to use it for another purpose somewhere else.
Headful Chrome means running the browser with its graphical user interface visible, which is very useful for debugging. To do this, set the headless option to false when launching the browser.
An automation script consists of a launch point, variables with corresponding binding values, and the source code. You can use wizards to create the components of an automation script: either create a script together with its launch points, or create a launch point and associate it with an existing script. Automation provides many benefits, including faster execution of repetitive tasks, the ability to parallelize workloads, and improved test coverage for your website. In the example below, we create a simple login flow: the user enters an email and password and is then redirected to the appropriate page.
The first step of web scraping is to acquire the selectors. A selector is just a path to the data. To get one, inspect the element on the page so that the developer tools window opens; in the Elements tab, right-click the highlighted element and choose Copy > Copy selector. In the example below, we use forEach instead of map to loop over the scraped data, since we only need the iteration for its side effects.