Is Puppeteer-Cluster stealthy enough to pass bot tests?

January 10, 2020, at 9:20 PM

I would like to know whether anyone using puppeteer-cluster can explain how Cluster.launch({settings}) protects against cookies and web data being shared between pages in different contexts.

Do the browser contexts here actually isolate cookies, so that user data is not shared or tracked? Browserless' now infamous page seems to think not, and suggests that .launch({}) should be called inside the task, not ahead of the queue.

So my question is: how do we know whether puppeteer-cluster is sharing cookies or data between queued tasks? And what options does the library offer to lower the chances of being labeled a bot?
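One way I could imagine checking the first part empirically is to plant a marker cookie in one queued job and look for it in a later one. The following is only a sketch under assumptions: `cookiesLeakBetweenTasks`, the cookie name `isolation_probe`, and the example.com URL are made up for illustration, and it relies on `cluster.execute`, `page.setCookie`, and `page.cookies` behaving as they do in the puppeteer-cluster and Puppeteer versions I have seen.

```javascript
// Sketch: does a cookie set in one queued job show up in a later job?
// `cluster` only needs puppeteer-cluster's execute(data, taskFunction) method.
async function cookiesLeakBetweenTasks(cluster, url) {
  // Hypothetical marker cookie; the name is arbitrary.
  const marker = { name: "isolation_probe", value: String(Date.now()), url };

  // Job 1: plant the marker cookie for the target URL.
  await cluster.execute(url, async ({ page }) => {
    await page.setCookie(marker);
  });

  // Job 2: a separately queued job reads the cookies for the same URL.
  const seen = await cluster.execute(url, async ({ page, data }) => {
    return page.cookies(data);
  });

  // true means the second job could see the first job's cookie.
  return seen.some((c) => c.name === marker.name && c.value === marker.value);
}

// Not invoked automatically; call main() to run against a real cluster.
async function main() {
  const { Cluster } = require("puppeteer-cluster");
  const cluster = await Cluster.launch({
    concurrency: Cluster.CONCURRENCY_CONTEXT,
    maxConcurrency: 1, // one worker, so both jobs run in the same place
  });
  const leaked = await cookiesLeakBetweenTasks(cluster, "https://example.com/");
  console.log("cookie visible in second job:", leaked);
  await cluster.idle();
  await cluster.close();
}
```

Running this once per concurrency mode (for example Cluster.CONCURRENCY_PAGE versus Cluster.CONCURRENCY_CONTEXT) should show whether the mode in use keeps jobs apart. With more than one worker, a negative result is weaker evidence, since the two jobs may simply have landed on different browsers.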

Setup: I am using page.authenticate with a proxy service and a random user agent, and I am still occasionally getting blocked (403) by the site I'm testing against.

const { Cluster } = require("puppeteer-cluster");

// executablePath, proxy_user, proxy_pass, headers, and url are defined elsewhere.
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function run() {
  // Create a cluster with 2 workers, each running its own browser
  const cluster = await Cluster.launch({
    concurrency: Cluster.CONCURRENCY_BROWSER, // Cluster.CONCURRENCY_PAGE,
    maxConcurrency: 2, // 5, 25: the number of Chrome instances open
    monitor: false, // true,
    // These delays are cluster options, not Puppeteer launch options
    sameDomainDelay: 1000,
    retryDelay: 3000,
    workerCreationDelay: 3000,
    puppeteerOptions: {
      executablePath,
      args: [
        "--proxy-server=pro.proxy.net:2222",
        "--incognito",
        "--disable-gpu",
        "--disable-dev-shm-usage",
        "--disable-setuid-sandbox",
        "--no-first-run",
        "--no-sandbox",
        "--no-zygote"
      ],
      headless: false
    }
  });

  // Task: set headers, authenticate against the proxy, wait, then load the page
  const extract = async ({ page, data: dataJson }) => {
    // headers must be a plain object mapping header names to values
    await page.setExtraHTTPHeaders(headers);
    await page.authenticate({
      username: proxy_user,
      password: proxy_pass
    });
    // Randomized delay
    await delay(2000 + (Math.floor(Math.random() * 998) + 1));
    const response = await page.goto(dataJson.url);
  };

  // Default task for jobs queued without their own task function
  await cluster.task(extract);

  // Queue each input into the cluster (the loop over inputs is omitted here)
  const dataJson = {
    url: url
  };
  cluster.queue(dataJson, extract);

  // Shut down after everything is done
  await cluster.idle();
  await cluster.close();
}
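To make the occasional 403 visible to the cluster, the task could throw when it sees one, so that the job counts as failed and can be retried. This is a rough sketch, not a tested anti-bot recipe: `pickUserAgent`, `assertNotBlocked`, and `extractWithChecks` are hypothetical helpers, and the list of user agents is assumed to come from the caller. It only uses `page.setUserAgent`, `page.goto`, and `response.status()` from Puppeteer.

```javascript
// Hypothetical helper: pick one entry from a caller-supplied user agent list.
// `random` is injectable so the choice can be made deterministic in tests.
function pickUserAgent(userAgents, random = Math.random) {
  if (userAgents.length === 0) {
    throw new Error("no user agents supplied");
  }
  return userAgents[Math.floor(random() * userAgents.length)];
}

// Hypothetical helper: throw on a 403 so the job is reported as failed.
// page.goto() can resolve to null, so a missing response is tolerated.
function assertNotBlocked(response, url) {
  const status = response ? response.status() : null;
  if (status === 403) {
    throw new Error(`Blocked with 403 at ${url}`);
  }
  return status;
}

// Task sketch: rotate the user agent per job and fail loudly when blocked.
async function extractWithChecks({ page, data }, userAgents) {
  await page.setUserAgent(pickUserAgent(userAgents));
  const response = await page.goto(data.url);
  return assertNotBlocked(response, data.url);
}
```

It could be queued as `cluster.queue(dataJson, (job) => extractWithChecks(job, userAgents))`. As far as I understand, a thrown error only leads to a retry when the cluster is launched with a retryLimit above its default of 0, and the retryDelay setting then spaces out the attempts.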
