[node.js] Web Data Collection ①

■ Downloading a Web Page with Node.js

01. The code for downloading a web page with Node.js is shown below.

download-node.js

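// Download the file at url and save it to savepath.

// Specify the URL to download
const url = "http://blog.wickedmiso.com/";

// Specify where to save the file
const savepath = "test.html";

// Load the modules used
const http = require('http');    // HTTP module
const fs = require('fs');        // file system module

// Open the output file
const outfile = fs.createWriteStream(savepath);

// Download the file at the URL asynchronously
http.get(url, function(res) {
    res.pipe(outfile);    // pipe() ends outfile once the response ends
    // 'finish' fires after all data has been flushed to the file
    outfile.on('finish', function() {
        console.log("ok");
    });
});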

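02. Now run the script from the command prompt as follows. When the download finishes, ok is printed.

> node download-node.js

03. You can confirm that the test.html file has been created.

04. If opening test.html in a browser displays the site, the download succeeded.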
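
The script above assumes that the request always succeeds. As a minimal sketch (not part of the original program), the variant below checks the HTTP status code and reports network or file errors through the callback. The names isSuccess and downloadChecked are hypothetical and were introduced only for this illustration.

```javascript
const http = require('http');
const fs = require('fs');

// Hypothetical helper: treat only 2xx responses as success.
function isSuccess(statusCode) {
    return statusCode >= 200 && statusCode < 300;
}

// Hypothetical variant of the download script with basic error handling.
// callback(err) receives an Error on failure and null on success.
function downloadChecked(url, savepath, callback) {
    let finished = false;
    // Make sure the callback is invoked only once
    function done(err) {
        if (!finished) {
            finished = true;
            callback(err || null);
        }
    }

    http.get(url, function(res) {
        if (!isSuccess(res.statusCode)) {
            res.resume();    // discard the body so the socket is released
            done(new Error('unexpected status code: ' + res.statusCode));
            return;
        }
        const outfile = fs.createWriteStream(savepath);
        res.pipe(outfile);
        // 'finish' fires after all data has been flushed to the file
        outfile.on('finish', function() { done(null); });
        outfile.on('error', done);
    }).on('error', done);
}

// Usage (performs a real network request, so it is left commented out):
// downloadChecked("http://blog.wickedmiso.com/", "test.html", function(err) {
//     console.log(err ? "failed: " + err.message : "ok");
// });
```

With this sketch a redirect (3xx) is reported as an error rather than followed, so the file is only created when the server answers with a 2xx status.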

■ Code Refactoring

01. The program written above is somewhat inconvenient to use as is.

Here, let's wrap it in a function to improve reusability.

download-node-func.js

// Load the modules used
const http = require("http");
const fs = require("fs");

// Download
download(
    "http://jpub.tistory.com/539"
    , "spring.html"
    , function() {
        console.log("ok, spring.");
    }
);

download(
    "http://jpub.tistory.com/537"
    , "angular.html"
    , function() {
        console.log("ok, angular.");
    }
);

// Function that downloads the file at url and saves it to savepath
function download(url, savepath, callback) {
    const outfile = fs.createWriteStream(savepath);

    http.get(url, function(res) {
        res.pipe(outfile);    // pipe() ends outfile once the response ends
        // 'finish' fires after all data has been flushed to the file
        outfile.on("finish", function() {
            callback();
        });
    });
}

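※ Since we only wrapped the download part in a function, not much has changed from the previous version.

Node.js code tends to become deeply indented as callbacks nest, so it is hard to call it an easy-to-read programming language.

02. The execution result is as follows.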

> node download-node-func.js

03. You can then confirm that two files, angular.html and spring.html, have been created.
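
As noted above, callback-based Node.js code tends to grow deeply indented. One common way to flatten it is to wrap the download in a Promise and use async/await. The sketch below assumes a Node.js version that supports async/await (7.6 or later). The names downloadAsync and main are hypothetical and not part of the original program; the URLs and file names are the ones used above.

```javascript
const http = require('http');
const fs = require('fs');

// Hypothetical Promise-based wrapper around the same http.get + pipe logic.
// Resolves with savepath once the file is fully written, rejects on error.
function downloadAsync(url, savepath) {
    return new Promise(function(resolve, reject) {
        http.get(url, function(res) {
            const outfile = fs.createWriteStream(savepath);
            res.pipe(outfile);
            // 'finish' fires after all data has been flushed to the file
            outfile.on('finish', function() { resolve(savepath); });
            outfile.on('error', reject);
        }).on('error', reject);
    });
}

// The downloads run one after another, without nested callbacks.
async function main() {
    await downloadAsync("http://jpub.tistory.com/539", "spring.html");
    console.log("ok, spring.");
    await downloadAsync("http://jpub.tistory.com/537", "angular.html");
    console.log("ok, angular.");
}

// Usage (performs real network requests, so it is left commented out):
// main().catch(function(err) { console.error(err.message); });
```

Here the two downloads are awaited in sequence to keep the flow easy to follow. If the order does not matter, both calls could be started together and awaited with Promise.all.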


사악미소의 현대마법의 공방
