Node.js scraping with the request module

January 10, 2020, at 6:30 PM

I want to fetch the HTML of a web page, but the response body looks like this:

<meta http-equiv=refresh content="0;url=http://www.skku.edu/errSkkuPage.jsp">

When I use https://www.naver.com/ instead of https://www.skku.edu/skku/index.do, it works fine.

I want to know the reason.

Here's my code.

var request = require('request');
const url = "https://www.skku.edu/skku/index.do";
request(url, function(error, response, body){
  if (error) throw error;
  console.log(body);
});
Answer 1

The website checks the User-Agent request header and blocks requests that appear to come from a script, redirecting them to an error page. Send the User-Agent that a web browser (e.g. Google Chrome) sends and it should work.

var request = require('request');
var options = {
  method: 'GET',
  url: 'https://www.skku.edu/skku/index.do',
  headers: {
    // Browser-like User-Agent so the site does not treat the request as a script
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.117 Safari/537.36'
  }
};
request(options, function (error, response) {
  if (error) throw error;
  console.log(response.body);
});
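To tell programmatically whether the site served the real page or the redirect stub shown in the question, one option is to look for a meta refresh tag in the body. This is only a rough sketch: `findMetaRefresh` is a hypothetical helper name, and the regular expression assumes the tag has the same simple shape as in the question (http-equiv before content) rather than covering every valid HTML form.

```javascript
// Hypothetical helper: returns the redirect target of a <meta http-equiv=refresh>
// tag, or null when the HTML contains no such tag.
// Assumes http-equiv appears before content, as in the question's output.
function findMetaRefresh(html) {
  const pattern =
    /<meta[^>]*http-equiv\s*=\s*["']?refresh["']?[^>]*content\s*=\s*["']?\s*\d+\s*;\s*url\s*=\s*([^"'>\s]+)/i;
  const match = pattern.exec(html);
  return match ? match[1] : null;
}
```

Inside the request callback, `findMetaRefresh(response.body)` could be checked before using the body; a non-null result suggests the request was still treated as coming from a script.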
Answer 2

I wouldn't recommend the request module, as it is no longer actively maintained. See https://github.com/request/request/issues/3142

You could look at alternatives such as got or axios, which make the code more readable. Most importantly, they have native support for promises and async/await. With got, the code above would look like this:

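// require() works with got 11 and earlier; got 12+ is ESM-only and must be loaded with import.
const got = require('got');
const url = "https://www.skku.edu/skku/index.do";
(async () => {
  // got sends its own default User-Agent, so pass the browser-like one from Answer 1.
  const response = await got(url, {
    headers: { 'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.117 Safari/537.36' }
  });
  console.log(response.body);
})();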
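If adding a dependency is not desirable, the same promise-based style can be approximated with Node's built-in https module. The following is a minimal sketch, not a replacement for got: `BROWSER_UA`, `buildOptions`, and `fetchHtml` are hypothetical names introduced here, the User-Agent string is the one from Answer 1, and the code does not follow redirects or check status codes.

```javascript
const https = require('https');

// Browser-like User-Agent taken from Answer 1 (an assumption; any current browser string may work).
const BROWSER_UA =
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ' +
  '(KHTML, like Gecko) Chrome/79.0.3945.117 Safari/537.36';

// Build the options object that https.get expects for the given URL.
function buildOptions(url, userAgent = BROWSER_UA) {
  const { hostname, port, pathname, search } = new URL(url);
  return {
    hostname,
    port: port || 443,
    path: pathname + search,
    headers: { 'User-Agent': userAgent },
  };
}

// Resolve with the response body as a string; reject on network errors.
function fetchHtml(url, userAgent) {
  return new Promise((resolve, reject) => {
    https
      .get(buildOptions(url, userAgent), (res) => {
        let body = '';
        res.setEncoding('utf8');
        res.on('data', (chunk) => { body += chunk; });
        res.on('end', () => resolve(body));
      })
      .on('error', reject);
  });
}
```

It can then be used with async/await, for example `const html = await fetchHtml('https://www.skku.edu/skku/index.do');` inside an async function.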