I wrote a script that downloads all the PDFs found on the web page of a particular government agency. I would have chosen bash for such a task, but I want the script to run in Node.js. The downloaded files are a few hundred kilobytes each, if that matters.

This is the first time I have used ES6. I want to learn how to take advantage of its features, and I want to write the script very much in the spirit of ES6. In bash I would have downloaded the list of PDF files first, then downloaded the files themselves at the top level (with indentation back to zero). However, I got the impression that putting everything inside the first `request.get` block is more in the ES6 spirit. Correct me if I am wrong.
The script works and passes ESLint after disabling the following rules:

- `no-console`, because the script is intended to be run on the console
- `no-undef`, because it tells me `require` is not defined
- `one-var`, because it would force me to mix `require` calls and other declarations, thus triggering `no-mixed-requires`
- `sort-vars`, because I want to order the constants in a more sensible way
- `strict`, because it would not accept `"use strict";` as the first line
My code:
```javascript
"use strict";

// Imports
const async = require("async"),
  fs = require("fs"),
  http = require("http"),
  request = require("request");

// Settings
const INDEX_URL = "http://www.kokuminhogo.go.jp/hinan/index.html",
  MAX_PARALLEL_DOWNLOADS = 3,
  PREFECTURES_REGEX = /area.*hinan_(.*)\.pdf/g,
  REMOTE_PREFIX = "http://www.kokuminhogo.go.jp/pdf/hinan_",
  LOCAL_PREFIX = "data/file",
  FORMAT_EXTENSION = ".pdf",
  REGEX_MATCH_INDEX = 1,
  HTTP_STATUS_OK = 200;

// ///////////////////////////////////////////////////////////
// Utility to download a file, with callback
// ///////////////////////////////////////////////////////////
const download = (url, dest, next) => {
  console.log(`Downloading ${url} to ${dest}`);
  const file = fs.createWriteStream(dest);
  http.get(url, (response) => {
    response.pipe(file);
    file.on("finish", () => {
      file.close();
      next();
    });
  });
};

// ///////////////////////////////////////////////////////////
// Get list of available prefectures
// ///////////////////////////////////////////////////////////
request.get(INDEX_URL, (error, response, body) => {
  const prefectures = [];
  if (!error && response.statusCode === HTTP_STATUS_OK) {
    let match = PREFECTURES_REGEX.exec(body);
    while (match !== null) {
      prefectures.push(match[REGEX_MATCH_INDEX]);
      match = PREFECTURES_REGEX.exec(body);
    }
  }
  console.log(`Number of prefectures available: ${prefectures.length}`);

  // ///////////////////////////////////////////////////////////
  // Download the PDF files for all prefectures
  // ///////////////////////////////////////////////////////////
  // Files will be numbered 1, 2, etc
  let fileNumber = 1;
  // Process in parallel
  async.eachLimit(
    prefectures,
    MAX_PARALLEL_DOWNLOADS,
    (prefecture, next) => {
      download(
        REMOTE_PREFIX + prefecture + FORMAT_EXTENSION,
        LOCAL_PREFIX + fileNumber + FORMAT_EXTENSION,
        next
      );
      fileNumber += 1;
    },
    () => {
      console.log("Finished downloading");
    }
  );
});
```
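One thing worth noting about the `download` helper above: it never reports failures. A network error or a non-200 response is silently ignored, and because `next` is only called on `"finish"`, a failed request can leave `async.eachLimit` waiting forever. A minimal sketch of a variant with the same `(url, dest, next)` signature that passes errors on, assuming plain `http` URLs as in the question and using Node's built-in `stream.pipeline` (Node.js 10+) instead of a bare `pipe`:

```javascript
"use strict";

const fs = require("fs");
const http = require("http");
const { pipeline } = require("stream");

const HTTP_STATUS_OK = 200;

// Download `url` to `dest`, then call next(err): err is null on success,
// or an Error for a network failure, a non-200 status, or a write failure.
function download(url, dest, next) {
  console.log(`Downloading ${url} to ${dest}`);
  http
    .get(url, (response) => {
      if (response.statusCode !== HTTP_STATUS_OK) {
        response.resume(); // discard the body so the socket is released
        next(new Error(`HTTP ${response.statusCode} for ${url}`));
        return;
      }
      // pipeline() forwards an error from either stream and destroys both
      // streams if one of them fails.
      pipeline(response, fs.createWriteStream(dest), (err) => next(err || null));
    })
    .on("error", next); // DNS failures, refused connections, etc.
}
```

With this version, `async.eachLimit` stops at the first error and hands it to its final callback, so that callback would need to accept it, e.g. `(err) => console.log(err ? err.message : "Finished downloading")`.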
Answer:
Things that come to my mind after a quick read:
- Do not use `const someFunction = (...) => {...}` instead of `function someFunction(...) {...}`.
- Use `module.exports`, even if it is a script.
- Group functionality into functions; you have comments beautifully spaced, but a function name will be more readable and maintainable.
- Try to have functions that do one thing; if, when explaining to someone what a function does, you have to use "and", then you should split the function.
- Try to use `map`, `filter`, etc. instead of iterating with a variable (e.g. `fileNumber`).
- Whenever you have `someString + someOther`, you could use a template literal instead (i.e. `${someString}${someOther}`).
- Prefer promises to callbacks; the code ends up being more readable (e.g. use bluebird's `promisify`).
- Use a good ESLint config (e.g. Airbnb's).
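Several of the points above can be combined into one sketch: named function declarations, `map` with its index instead of the mutable `fileNumber`, template literals, promises, and `module.exports`. As an illustrative design choice that is not part of the answer, the network calls are passed in as parameters, so the flow stays flat (list first, then downloads, as in the question's bash plan) and can be exercised without the real site. The names `extractPrefectures`, `buildJobs`, `runLimited` and `main` are hypothetical, and `runLimited` is a hand-written promise-based stand-in for `async.eachLimit`.

```javascript
"use strict";

const REMOTE_PREFIX = "http://www.kokuminhogo.go.jp/pdf/hinan_";
const LOCAL_PREFIX = "data/file";
const FORMAT_EXTENSION = ".pdf";

// Prefecture codes captured by the question's regex. A new regex literal is
// created on every call, so no lastIndex state leaks between calls.
// (String.prototype.matchAll is ES2020, available in Node.js 12+.)
function extractPrefectures(body) {
  return Array.from(body.matchAll(/area.*hinan_(.*)\.pdf/g), (match) => match[1]);
}

// One job per prefecture; map()'s index replaces the mutable fileNumber.
function buildJobs(prefectures) {
  return prefectures.map((prefecture, index) => ({
    url: `${REMOTE_PREFIX}${prefecture}${FORMAT_EXTENSION}`,
    dest: `${LOCAL_PREFIX}${index + 1}${FORMAT_EXTENSION}`,
  }));
}

// Run worker(job) for every job with at most `limit` jobs in flight.
// The returned promise rejects with the first error; lanes that are
// still running are not stopped.
async function runLimited(jobs, limit, worker) {
  let nextIndex = 0;
  async function lane() {
    while (nextIndex < jobs.length) {
      const job = jobs[nextIndex];
      nextIndex += 1;
      await worker(job);
    }
  }
  const lanes = Array.from({ length: Math.min(limit, jobs.length) }, () => lane());
  await Promise.all(lanes);
}

// Hypothetical entry point with injected I/O: fetchIndex() resolves to the
// index page HTML, downloadFile(url, dest) resolves once a file is saved.
async function main(fetchIndex, downloadFile, limit = 3) {
  const prefectures = extractPrefectures(await fetchIndex());
  console.log(`Number of prefectures available: ${prefectures.length}`);
  await runLimited(buildJobs(prefectures), limit, (job) => downloadFile(job.url, job.dest));
  console.log("Finished downloading");
}

module.exports = { extractPrefectures, buildJobs, runLimited, main };
```

To wire this to the real script, `fetchIndex` could wrap `request.get(INDEX_URL, ...)` in a `new Promise(...)`, and `downloadFile` could be `util.promisify(download)` applied to the question's callback-style helper; `util.promisify` is Node's built-in counterpart to the bluebird `promisify` mentioned above.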
To enhance collaboration, you could use a GitHub repository and reference it here.
Regarding `no-undef`, tell ESLint it's being run in Node: eslint.org/docs/user-guide/configuring#specifying-environments