Research:Surviving new editor

From Meta, a Wikimedia project coordination wiki
Surviving new editor
Specification
A $\text{surviving new editor}(n, m, t_1, t_2, t_3)$ is a new editor who completes at least $n$ edits within $t_1$ time since registration ($T$) and also completes at least $m$ edits in the survival period $[T + t_2, T + t_2 + t_3]$.
WMF Standard
  • $n$ = 1 edit
  • $m$ = 1 edit
  • $t_1$ = 1 day
  • $t_2$ = 30 days (~ one month)
  • $t_3$ = 30 days (~ one month)
Measures: Editor retention
Aliases: Retained editor
Related metrics: New editor
Status: draft
SQL
SET @activation_period = 1;   /* One day */
SET @n = 1;                   /* One activation edit */
SET @trial_period = 30;       /* 30 days */
SET @survival_period = 30;    /* 30 days */
SET @m = 1;                   /* One survival edit */
SET @start_date = "20140101"; /* January 1st, 2014 after midnight */
SET @end_date = "20140201";   /* February 1st, 2014 before midnight */

SELECT
    user_id,
    user_name,
    user_registration,
    /* "At least n" and "at least m" edits, hence >= rather than > */
    SUM(activation_edits) >= @n AS activated,
    SUM(activation_edits) >= @n AND SUM(surviving_edits) >= @m AS surviving,
    /* Censored: the survival period has not fully elapsed yet */
    (
        UNIX_TIMESTAMP(NOW()) <
        UNIX_TIMESTAMP(DATE_ADD(user_registration, INTERVAL @trial_period + @survival_period DAY))
    ) AS censored
FROM (
    /* Edits that are still visible (revision table) */
    SELECT
        user_id,
        user_name,
        user_registration,
        SUM(
            rev_timestamp BETWEEN
                user_registration AND
                DATE_FORMAT(DATE_ADD(user_registration, INTERVAL @activation_period DAY), "%Y%m%d%H%i%S")
        ) AS activation_edits,
        SUM(
            rev_timestamp BETWEEN
                DATE_FORMAT(DATE_ADD(user_registration, INTERVAL @trial_period DAY), "%Y%m%d%H%i%S") AND
                DATE_FORMAT(DATE_ADD(user_registration, INTERVAL @trial_period + @survival_period DAY), "%Y%m%d%H%i%S")
        ) AS surviving_edits
    FROM user
    LEFT JOIN revision ON
        user_id = rev_user AND
        (
            rev_timestamp BETWEEN
                user_registration AND
                DATE_FORMAT(DATE_ADD(user_registration, INTERVAL @activation_period DAY), "%Y%m%d%H%i%S") OR
            rev_timestamp BETWEEN
                DATE_FORMAT(DATE_ADD(user_registration, INTERVAL @trial_period DAY), "%Y%m%d%H%i%S") AND
                DATE_FORMAT(DATE_ADD(user_registration, INTERVAL @trial_period + @survival_period DAY), "%Y%m%d%H%i%S")
        )
    WHERE user_registration BETWEEN @start_date AND @end_date
    /* One row per user; without this the SUMs collapse all users into one row */
    GROUP BY user_id, user_name, user_registration
    UNION ALL
    /* Edits to pages that were later deleted (archive table) */
    SELECT
        user_id,
        user_name,
        user_registration,
        SUM(
            ar_timestamp BETWEEN
                user_registration AND
                DATE_FORMAT(DATE_ADD(user_registration, INTERVAL @activation_period DAY), "%Y%m%d%H%i%S")
        ) AS activation_edits,
        SUM(
            ar_timestamp BETWEEN
                DATE_FORMAT(DATE_ADD(user_registration, INTERVAL @trial_period DAY), "%Y%m%d%H%i%S") AND
                DATE_FORMAT(DATE_ADD(user_registration, INTERVAL @trial_period + @survival_period DAY), "%Y%m%d%H%i%S")
        ) AS surviving_edits
    FROM user
    LEFT JOIN archive ON
        user_id = ar_user AND
        (
            ar_timestamp BETWEEN
                user_registration AND
                DATE_FORMAT(DATE_ADD(user_registration, INTERVAL @activation_period DAY), "%Y%m%d%H%i%S") OR
            ar_timestamp BETWEEN
                DATE_FORMAT(DATE_ADD(user_registration, INTERVAL @trial_period DAY), "%Y%m%d%H%i%S") AND
                DATE_FORMAT(DATE_ADD(user_registration, INTERVAL @trial_period + @survival_period DAY), "%Y%m%d%H%i%S")
        )
    WHERE user_registration BETWEEN @start_date AND @end_date
    GROUP BY user_id, user_name, user_registration
) split_edit_counts
GROUP BY user_id, user_name, user_registration;

Surviving new editor is a standardized user class used to measure the number of first-time editors in a wiki project who continue to edit for a substantial period of time. It's used as a proxy for editor retention.
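
As a reading aid, the specification can be expressed as a small predicate. The following is a minimal Python sketch, not the canonical implementation: the function names `is_surviving_new_editor` and `is_censored`, their parameters, and the choice to treat both ends of each window as inclusive (mirroring SQL `BETWEEN`) are assumptions made for illustration. The defaults follow the WMF Standard values.

```python
from datetime import datetime, timedelta


def is_surviving_new_editor(registration, edit_times,
                            n=1, m=1,
                            t1=timedelta(days=1),
                            t2=timedelta(days=30),
                            t3=timedelta(days=30)):
    """Return True if the editor meets the (n, m, t1, t2, t3) criteria.

    registration: datetime of account registration (T).
    edit_times: iterable of datetimes, one per saved edit.
    Window ends are inclusive, like SQL BETWEEN (an assumption).
    """
    edit_times = list(edit_times)
    # Edits made in the activation period [T, T + t1].
    activation_edits = sum(
        registration <= ts <= registration + t1 for ts in edit_times)
    # Edits made in the survival period [T + t2, T + t2 + t3].
    survival_start = registration + t2
    survival_end = registration + t2 + t3
    surviving_edits = sum(
        survival_start <= ts <= survival_end for ts in edit_times)
    return activation_edits >= n and surviving_edits >= m


def is_censored(registration, now,
                t2=timedelta(days=30), t3=timedelta(days=30)):
    """True if the survival period has not fully elapsed at time `now`."""
    return now < registration + t2 + t3


T = datetime(2014, 1, 1, 12, 0)
edits = [T + timedelta(hours=2), T + timedelta(days=45)]
print(is_surviving_new_editor(T, edits))      # True
print(is_surviving_new_editor(T, edits[:1]))  # False
```

An editor whose survival period is still open (`is_censored` returns True) cannot yet be classified reliably, since a qualifying edit may still arrive.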

Discussion

The $t_1$ activation period

The activation period selects users whose retention needs to be measured:

  • Setting $t_1 = 0$ measures the retention (or rather, the delayed activation) of newly registered users, regardless of when they started editing.
  • Setting $t_1 > 0$ restricts the measurement of retention to the subset of users who edited within the given activation period since registration.
  • Setting $t_1 = 1$ day measures the retention of new editors, based on the proposed definition of a new editor; in this case, surviving new editors are effectively a proper subset of new editors.
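
The effect of $t_1$ on who enters the measured cohort can be illustrated with a short sketch. It assumes that $t_1 = 0$ means "no activation requirement" (every newly registered user is in the cohort), which is one reading of the first bullet. The function `activation_cohort` and the sample users are hypothetical.

```python
from datetime import datetime, timedelta


def activation_cohort(users, t1_days, n=1):
    """Select the users whose retention will be measured.

    users: dict mapping user name -> (registration, list of edit datetimes).
    t1_days == 0 is treated as "no activation requirement" (an assumption).
    """
    if t1_days == 0:
        return set(users)
    t1 = timedelta(days=t1_days)
    cohort = set()
    for name, (registered, edits) in users.items():
        # Count edits made within the activation period [T, T + t1].
        early = sum(registered <= ts <= registered + t1 for ts in edits)
        if early >= n:
            cohort.add(name)
    return cohort


T = datetime(2014, 1, 1)
users = {
    "quick": (T, [T + timedelta(hours=3)]),  # edits on the first day
    "late": (T, [T + timedelta(days=10)]),   # first edit after ten days
    "never": (T, []),                        # registers but never edits
}
print(sorted(activation_cohort(users, 0)))  # ['late', 'never', 'quick']
print(sorted(activation_cohort(users, 1)))  # ['quick']
```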

The $t_2$ trial period

During the trial period, new editors are presumed to be testing out Wikipedia and Wikipedians are testing out the editor. This is the time when non-retained editors tend to leave Wikipedia and when retained editors decide to stick around. The longer the duration of this period, the longer an editor will need to remain active in order to be counted.

The $t_3$ survival period

During the survival period, new editors who are retained are expected to show some activity to indicate their survival. The longer the survival period, the more likely we are to notice some activity from editors who are less consistently active. Longer survival periods are also more likely to catch users who left Wikipedia and later reactivated their accounts.
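
The interaction between the trial and survival periods can be seen by sliding the survival window over one editor's history. This is a minimal sketch under assumptions: the helper `edits_in_survival_period`, the inclusive window ends, and the sample edit history are all illustrative.

```python
from datetime import datetime, timedelta


def edits_in_survival_period(registration, edit_times, t2_days, t3_days):
    """Count edits that fall in [T + t2, T + t2 + t3] (ends inclusive)."""
    start = registration + timedelta(days=t2_days)
    end = start + timedelta(days=t3_days)
    return sum(start <= ts <= end for ts in edit_times)


T = datetime(2014, 1, 1)
# A sporadic editor: one edit on the first day, one 100 days later.
edits = [T + timedelta(hours=1), T + timedelta(days=100)]

print(edits_in_survival_period(T, edits, 30, 30))   # 0: window is days 30-60
print(edits_in_survival_period(T, edits, 30, 90))   # 1: window is days 30-120
print(edits_in_survival_period(T, edits, 120, 30))  # 0: window is days 120-150
```

Widening the survival period picks up the late edit, while lengthening the trial period pushes the window past it, so the same editor is counted or not depending on the parameters.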

Analysis

Wikis

German

Survival rate comparison (dewiki). The proportion of surviving newly registered users is plotted by registration date for a set of different trial and survival periods.

English

Survival rate comparison (enwiki). The proportion of surviving newly registered users is plotted by registration date for a set of different trial and survival periods.

Sensitivity

Trial period duration

Trial period factor. The factor of difference between proportions of surviving new editors for different trial periods is plotted (based on trial period = 3 months and locking the survival period to 3 months).

The figure "Trial period factor" plots the number of users who edit after 1, 2, 4, 5 and 6 months as a factor of the number of users who edit after 3 months (the horizontal line at $1$). Both enwiki and dewiki show a slight trend: the number of users surviving a 1- or 2-month trial period is changing relative to the number surviving 3 or more months. The trend is not extreme and therefore might not matter, but it does suggest that even users who survive 1-2 months are becoming less likely to survive 3.
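
The factor described here can be sketched as a simple ratio against the 3-month baseline. This assumes that "factor" means the proportion for a given trial period divided by the proportion for the baseline period; the function `survival_factors` is hypothetical and the numbers below are made up for illustration, not taken from enwiki or dewiki.

```python
def survival_factors(proportions, baseline_months=3):
    """Express each proportion as a factor of the baseline proportion.

    proportions: dict mapping trial period (months) -> proportion of
    surviving new editors for one registration cohort.
    """
    baseline = proportions[baseline_months]
    if baseline == 0:
        raise ValueError("baseline proportion is zero; factor is undefined")
    return {months: p / baseline for months, p in proportions.items()}


# Made-up proportions for a single cohort (not real data).
example = {1: 0.08, 2: 0.05, 3: 0.04, 6: 0.03}
factors = survival_factors(example)
for months in sorted(factors):
    print(months, round(factors[months], 2))
```

By construction the baseline period always has a factor of 1, which corresponds to the horizontal line in the figure. Plotting the other factors by registration date shows whether shorter or longer periods drift relative to the baseline.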

Survival period duration

Survival period factor. The factor of difference between proportions of surviving new editors for different survival periods is plotted (based on survival period = 3 months and locking the trial period to 3 months).

The figure "Survival period factor" plots the number of users who edit within 1-, 2-, 4-, 5- and 6-month windows as a factor of the number of users who edit within a 3-month window (the horizontal line at $1$). For the survival period duration, we don't see any meaningful change over time.

Usage

References
