I have created a script (mediawiki gadget), which iterates over every redlink in a rendered wiki-page (that is, over every href with class new
, signifying an internal link to an article that does not exist), and checks if the title exists in a set of known pages. If it does, a few attributes are changed. The execution is extremely simple, but I need it to be as fast as possible.
The set of known pages contains nearly 20K strings, here omitted:
const rawPages=new Set([...])
$("#mw-content-text a.new").each(function() {
var b=$(this),a=b.attr("title");
!a||
a.startsWith("קאַטעגאָריע:") ||
a.startsWith("טעקע:") ||
a.startsWith("שמועס:") ||
a.startsWith("באניצער:") ||
a.startsWith("המכלול:") ||
a.startsWith("הילף:") ||
a.startsWith("מעדיעװיקי:") ||
a.startsWith("מוסטער:") ||
a.startsWith("יחידה:") ||
a.startsWith("דרעפט:") ||
a.startsWith("טעמע:") ||
(a=a.replace(" (בלאַט עקזיסטירט נאכנישט)", ""),
rawPages.has(a)&&
b.attr({
"class": "raw",
"title": a + " (רוי)",
"href": "/"+encodeURIComponent("רוי")+":"+encodeURIComponent(a)
}))});
At first, the script checks if the title begins with some prefixes, in which case we are not interested in modifying the link. But the rawPages
set does not contain any such title anyway; would it be faster to check directly for membership, instead of trying to short-circuit with multiple startsWith
calls?
Anything else I can do, to speed it up?
Thanks!
-
1\$\begingroup\$ What is the reason that this is being done on the frontend and not the backend? Wikimedia has an API for this kind of query. \$\endgroup\$Reinderien– Reinderien2023年07月19日 14:43:27 +00:00Commented Jul 19, 2023 at 14:43
-
\$\begingroup\$ @Reinderien, can you elaborate or point me to what API you have in mind? Thanks! \$\endgroup\$philo math– philo math2023年07月19日 15:14:53 +00:00Commented Jul 19, 2023 at 15:14
-
1\$\begingroup\$ Please do not edit the question, especially the code, after an answer has been posted. Changing the question may cause answer invalidation. Everyone needs to be able to see what the reviewer was referring to. What to do after the question has been answered. \$\endgroup\$pacmaninbw– pacmaninbw ♦2023年07月19日 15:38:23 +00:00Commented Jul 19, 2023 at 15:38
-
\$\begingroup\$ api.wikimedia.org/wiki/Core_REST_API/Reference \$\endgroup\$Reinderien– Reinderien2023年07月19日 16:17:19 +00:00Commented Jul 19, 2023 at 16:17
-
\$\begingroup\$ @Reinderien, the set contains a list of pages that do exist, just not in the main namespace (and in the processes of being replaced with main-namespace articles) , and they should therefore be handled as semi-existant, so to speak - including changing the href. I don't see how this can be done more efficiently via API calls; surely sending a request for every redlink is not faster than a prepared list... \$\endgroup\$philo math– philo math2023年07月19日 16:56:49 +00:00Commented Jul 19, 2023 at 16:56
2 Answers 2
The easiest way to get a performance boost in your code is to not use jQuery:
const rawPages = new Set([...]);
const anchors = document.querySelectorAll("#mw-content-text a.new");
for (let i = 0; i < anchors.length; i++) {
const a = anchors[i];
const title = a.title;
if (!rawPages.has(title)) {
continue;
}
a.title = title + " (רוי)";
a.classList.add("raw");
// concatenation might be faster than this interpolation, test it.
a.href = `/${encodeURIComponent("רוי"):${encodeURIComponent(title)}`;
}
I've also removed the startsWith
calls as I'm not sure whether that will be saving computation on average.
-
\$\begingroup\$ +1 for removal of jQuery... Minor points, not a performance boost, the modern version would use
for of
loop rather than the noiseyfor ;;
loop where possible. Egfor (const a of anchors) {
, and generally it's always best to avoidcontinue
especially if the alternative 'if (rawPages.has(title)) { .... a.href = ...; }`code is simpler \$\endgroup\$Blindman67– Blindman672023年07月24日 22:44:40 +00:00Commented Jul 24, 2023 at 22:44 -
\$\begingroup\$ Agreed that
for of
is generally preferred, modern or not. However, it's hard to beat a for loop for performance-critical code. I think thecontinue
vs nesting is a personal preference but I agree with you that nesting is probably clearer here. \$\endgroup\$RobH– RobH2023年07月26日 08:52:09 +00:00Commented Jul 26, 2023 at 8:52
You must match T titles against a dictionary of W words, and your current code runs in O(T ×ばつ W) time.
For at least some of your thousands of strings
(and for all eleven that appeared in your example)
the spec is to find each title that .startsWith
this single-word regex: ^(\w+):
So create a Set()
lexicon containing all those words.
Then for each title, use that regex to parse out the
leading word (if any), and look it up in the the lexicon.
If found, perform the .replace operation.
That's O(1) constant time per title,
total of O(T) time for the page.
If you need a leading phrase rather than leading word,
use ^([ \w]+):
or similar expression.
-
\$\begingroup\$ Perhaps I have not expressed myself clearly (or I don't get what you are saying); English is not my native language. The spec is not to find those that startsWith the provided examples - quite the opposite. The spec is to find those which exist in the rawPages set. I check those that startsWith before, to avoid searching the big set when possible. \$\endgroup\$philo math– philo math2023年07月19日 15:21:29 +00:00Commented Jul 19, 2023 at 15:21
-
\$\begingroup\$ I have now added the first few strings of the set to be searched. The task is to find redlinks pointing to articles existing in this set, and modify said redlinks. \$\endgroup\$philo math– philo math2023年07月19日 15:31:19 +00:00Commented Jul 19, 2023 at 15:31
-
\$\begingroup\$ The code in the question already uses a set for lookups so I'd say it's O(T) already. \$\endgroup\$RobH– RobH2023年07月20日 07:19:35 +00:00Commented Jul 20, 2023 at 7:19
-
\$\begingroup\$ @RobH I took W in the example to be eleven, and in the described problem to be closer to twenty thousand. Subsequent comments suggest that was not the original intent. \$\endgroup\$J_H– J_H2023年07月20日 15:08:40 +00:00Commented Jul 20, 2023 at 15:08
Explore related questions
See similar questions with these tags.