
Grants:Project/Rapid/Hjfocs/soweego 1.1

From Meta, a Wikimedia project coordination wiki
This is an archived version of this page, as edited by Kippelboy (talk | contribs) at 13:26, 20 July 2019 (→Endorsements). It may differ significantly from the current version.
status: proposed
soweego 1.1: Unity is Strength
Put soweego linkers together for the greater good of Wikidata quality
target: Wikidata
start date: August 26
end date: October 4
budget (local currency): 1,780 EUR
budget (USD): 2,000 USD
grant type: individual
grantee: Hjfocs
contact(s): fossati(_AT_)spaziodati.eu


Project Goal

soweego[1] links Wikidata items to large external catalogs. It is an artificial intelligence system built on multiple machine learning[2] algorithms (also known as linkers). Its vision is to make Wikidata the nucleus of the open data landscape.

The main goal of this proposal is to automatically get the highest-quality links by bringing soweego linkers together: unity is strength.

Problem

Much like a human, soweego asserts that a given Wikidata item links to a given catalog identifier with varying levels of confidence.

Currently, it only considers the confidence yielded by one linker (the best), thus not leveraging any relationship or information captured by others. That is to say, the system has only one pair of eyes, but it could indeed benefit from extra viewpoints.

Therefore, we can improve the quality and quantity of links by letting soweego linkers join forces.

Solution

Machine learning algorithms capture information in heterogeneous ways, and they have been shown to perform better together than alone.[3][4][5]

We propose to build an ensemble system,[6] and to implement it as an enhancement of the soweego linker module.[7]

Furthermore, linkers can behave differently depending on the external catalog. Hence, it is important to automatically tune the weight of each linker in the ensemble. Finally, we will automatically set the optimal parameters of each linker through cross-validation[8] techniques.
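The idea of weighting linkers per catalog can be illustrated with a minimal sketch. It is not soweego's implementation: the function names, the weight grid, and the validation data are all hypothetical, and a plain weighted average of confidence scores stands in for whatever ensemble method the project eventually adopts.

```python
from itertools import product


def ensemble_confidence(scores, weights):
    """Weighted average of the per-linker confidence scores for one candidate link."""
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


def accuracy(samples, weights, threshold=0.5):
    """Fraction of labelled samples that the weighted ensemble classifies correctly."""
    correct = 0
    for scores, is_link in samples:
        predicted = ensemble_confidence(scores, weights) >= threshold
        correct += predicted == is_link
    return correct / len(samples)


def tune_weights(samples, n_linkers, grid=(0.0, 0.5, 1.0)):
    """Exhaustive search over a small weight grid (assumed, for illustration only)."""
    best_weights, best_accuracy = None, -1.0
    for weights in product(grid, repeat=n_linkers):
        if sum(weights) == 0:
            continue  # an all-zero weighting is undefined
        current = accuracy(samples, weights)
        if current > best_accuracy:
            best_weights, best_accuracy = weights, current
    return best_weights, best_accuracy


# Hypothetical validation data for one catalog:
# (confidence from linker A, confidence from linker B), true label.
validation = [
    ((0.9, 0.4), True),
    ((0.8, 0.3), True),
    ((0.2, 0.7), False),
    ((0.1, 0.9), False),
]
weights, acc = tune_weights(validation, n_linkers=2)
print(weights, acc)
```

On this toy catalog, linker A is the reliable one, so the search favours it. Running the same search on validation data from another catalog could yield different weights, which is the behaviour the paragraph above asks for.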

Project Plan

Activities

  1. State of the art: explore best practices in ensemble learning and investigate related approaches applied to soweego's task, namely record linkage;[9]
  2. add decision trees[10] to the current pool of linkers;
  3. develop the ensemble system;
  4. implement automatic hyperparameter tuning of linkers;
  5. implement automatic weighting of each linker, for each supported catalog;
  6. evaluate performance and compare to previous results without ensemble;
  7. write reports and include them in a MSc thesis at the University of Trento (Q930528), supervised by Hjfocs.
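Activity 4 relies on cross-validation to pick linker parameters. The sketch below shows the general technique on a toy example, under loud assumptions: the linker is a hypothetical nearest-neighbour classifier over a single similarity score, `n_neighbors` is its only hyperparameter, and the labelled samples are invented. None of these names come from the soweego code base.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def knn_predict(train, x, n_neighbors):
    """Majority vote among the n_neighbors training samples closest to x."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:n_neighbors]
    votes = sum(1 for _, label in nearest if label)
    return votes * 2 > len(nearest)


def cross_val_accuracy(samples, n_neighbors, k=3):
    """Mean held-out accuracy over k folds for one hyperparameter value."""
    accuracies = []
    for held_out in k_fold_indices(len(samples), k):
        held = set(held_out)
        train = [s for i, s in enumerate(samples) if i not in held]
        test = [samples[i] for i in held_out]
        correct = sum(knn_predict(train, x, n_neighbors) == y for x, y in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)


def tune(samples, candidates, k=3):
    """Return the candidate value with the best cross-validated accuracy."""
    return max(candidates, key=lambda c: cross_val_accuracy(samples, c, k))


# Hypothetical labelled pairs: (similarity score, is a correct link).
samples = [
    (0.95, True), (0.10, False),
    (0.85, True), (0.20, False),
    (0.75, True), (0.30, False),
]
candidates = (1, 3)
best = tune(samples, candidates)
print(best, cross_val_accuracy(samples, best))
```

Each candidate value is scored only on samples the model did not see during fitting, so the selected value is less likely to be an artefact of one particular train/test split.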

Outcomes

  1. Release of soweego "Unity is Strength" (version 1.1);
  2. delivery of ready-to-use documentation;
  3. engagement of developers through the standard social coding workflow: understand, fork, make a pull request.

Community notification

Impact

  1. 229k confident Wikidata identifier statements created or referenced;[11]
  2. 124k link candidates uploaded to the Mix'n'match tool[12] for curation;[11]
  3. 4 pull requests submitted to the soweego code repository, under the Wikidata GitHub organization.[13]

Resources

Hjfocs will work closely with Tupini07 and supervise his MSc thesis at the University of Trento (Q930528), together with Prof. Passerini.[14] We will not receive any additional support.

The whole budget is allocated to the implementation efforts.

References

Endorsements

  • Support Can't wait to see it in action! Sannita - not just another it.wiki sysop 18:22, 14 July 2019 (UTC)
  • Strong support (disclaimer: I contributed to the development of soweego 1) I strongly endorse this proposal, because I see it as the natural next step for soweego. We implemented several algorithms, picked the one that performed best, but had to put the others aside. An ensemble would definitely smooth the cons of each algorithm, thus providing the strongest results. MaxFrax96 (talk) 12:15, 15 July 2019 (UTC)
  • Sounds promising. Jonathan Groß (talk) 17:21, 16 July 2019 (UTC)
  • Support --Jaqen (talk) 16:54, 18 July 2019 (UTC)
  • Support Looking forward to it!
