Releases: Baron-Sun/socialscikit

SocialSciKit v0.1.0 — Zero-code text analysis toolkit

Released 18 Apr 09:23 by @Baron-Sun (tag v0.1.0, commit 4e34d56)


SocialSciKit v0.1.0 — Initial Release

SocialSciKit is an open-source, zero-code toolkit for social-science text analysis. It is operated entirely through a browser-based interface, supports OpenAI (GPT), Anthropic (Claude), and local Ollama backends, and ships with a bilingual UI (English and Chinese).

This initial release covers the full research lifecycle, from raw data to a drafted Methods section, through three independent modules plus a unified visualization dashboard.

📦 Three Core Modules

QuantiKit — Text Classification

End-to-end pipeline for supervised text classification.

Method recommendation (zero-shot, few-shot, or fine-tuning) with citations from the computational social science (CSS) literature
Annotation budget estimation by fitting a power-law learning curve, with 80% confidence intervals and marginal-return curves
Built-in annotator (skip / undo / flag) with a real-time progress chart
Three classification paths: prompt-based classification (with prompt optimization based on APE, Automatic Prompt Engineer), local transformer fine-tuning, and the OpenAI fine-tuning API
Pipeline log export in JSON for downstream tools
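The budget estimate rests on fitting a learning curve to pilot results. The following is a minimal sketch, assuming the curve has the form error(n) = b * n^(-c) with error = 1 - score, so the score approaches 1.0 as the labeled set grows. SocialSciKit's actual model form is not documented here, the 80% interval is omitted, and all function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

# Hypothetical helpers, not SocialSciKit's API: a log-log least-squares fit of
# error(n) = b * n**(-c), where error = 1 - score.

def fit_power_law(sizes: Sequence[int], scores: Sequence[float]) -> Tuple[float, float]:
    """Return (b, c) from pilot (labeled-set size, validation score) pairs."""
    if len(sizes) != len(scores) or len(set(sizes)) < 2:
        raise ValueError("need at least two pairs with distinct sizes")
    if any(not 0.0 <= s < 1.0 for s in scores):
        raise ValueError("scores must lie in [0, 1)")
    xs = [math.log(n) for n in sizes]
    ys = [math.log(1.0 - s) for s in scores]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx               # equals -c in log-log space
    intercept = my - slope * mx     # equals log(b)
    return math.exp(intercept), -slope

def predict_score(b: float, c: float, n: float) -> float:
    return 1.0 - b * n ** (-c)

def labels_needed(b: float, c: float, target: float) -> int:
    """Smallest labeled-set size (rounded up) whose predicted score reaches target."""
    if c <= 0:
        raise ValueError("curve is not improving with more labels")
    # Solve 1 - b * n**(-c) = target  =>  n = (b / (1 - target)) ** (1 / c)
    return math.ceil((b / (1.0 - target)) ** (1.0 / c))

def marginal_gain(b: float, c: float, n: float, step: int = 100) -> float:
    """Predicted score gained by labeling `step` more items after n."""
    return predict_score(b, c, n + step) - predict_score(b, c, n)
```

Because the fitted curve flattens as n grows, `marginal_gain` shrinks with n, which is the diminishing-returns shape a marginal-return curve is meant to expose.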

QualiKit — Qualitative Coding

End-to-end pipeline for interview transcripts, focus groups, and open-ended surveys.

PII de-identification with Chinese and English named-entity recognition (NER), plus per-item review and bulk acceptance
Interactive research framework of research questions (RQs) and sub-themes, with LLM-assisted sub-theme suggestions
LLM batch coding, with each code grounded in a verbatim evidence_span extracted from the source text
Review workflow with confidence ranking, bulk acceptance, and manual coding through cascading RQ / sub-theme dropdowns
Structured Excel export + pipeline log
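A minimal sketch of the review step, assuming each coded item carries a model-reported confidence in [0, 1]. The ascending sort (least confident first), the 0.85 bulk-accept threshold, and all class and function names are hypothetical; the `framework` dict stands in for the RQ / sub-theme structure that cascading dropdowns would constrain.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class CodedItem:
    # Hypothetical record for one LLM-coded segment; not SocialSciKit's schema.
    item_id: str
    rq: str
    sub_theme: str
    confidence: float          # model-reported confidence in [0, 1]
    status: str = "pending"    # "pending", "accepted", or "edited"

def review_queue(items: List[CodedItem]) -> List[CodedItem]:
    """Pending items, least confident first, so reviewers see risky codes early."""
    return sorted((it for it in items if it.status == "pending"),
                  key=lambda it: it.confidence)

def bulk_accept(items: List[CodedItem], threshold: float = 0.85) -> int:
    """Accept every pending item at or above `threshold`; return how many."""
    accepted = 0
    for it in items:
        if it.status == "pending" and it.confidence >= threshold:
            it.status = "accepted"
            accepted += 1
    return accepted

def manual_code(item: CodedItem, rq: str, sub_theme: str,
                framework: Dict[str, List[str]]) -> None:
    """Recode an item; as with a cascading dropdown, the sub-theme must belong to the RQ."""
    if sub_theme not in framework.get(rq, []):
        raise ValueError(f"{sub_theme!r} is not a sub-theme of {rq!r}")
    item.rq, item.sub_theme, item.status = rq, sub_theme, "edited"
```

Validating the sub-theme against the chosen RQ mirrors what a cascading dropdown enforces in the UI: a reviewer can only pick sub-themes that exist under the selected research question.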

Toolbox — Research Methods Tools

Three standalone utilities that work with any CSV or pipeline log.

ICR (inter-coder reliability) Calculator: Cohen's kappa, Krippendorff's alpha, and multi-label Jaccard; supports two or more coders with automatic metric selection
Consensus Coding: dispatch the same coding task to 2–5 LLMs in parallel and aggregate via majority vote
Methods Section Generator: auto-draft a bilingual Methods paragraph from an imported pipeline log or a short form
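Some of the simpler Toolbox computations can be sketched in plain Python: Cohen's kappa (applied to every coder pair when there are more than two coders), mean per-item Jaccard overlap for multi-label codes, and a majority vote over several models' labels. This is an illustration, not the Toolbox implementation. The function names and the tie policy (return None so a human decides) are assumptions, and Krippendorff's alpha and automatic metric selection are not reproduced.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, List, Optional, Sequence, Set, Tuple

def cohen_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa for two coders who labeled the same items in the same order."""
    if len(a) != len(b) or not a:
        raise ValueError("coders must label the same non-empty list of items")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n        # observed agreement
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[k] * cb[k] for k in ca) / (n * n)     # agreement expected by chance
    if p_e == 1.0:
        return math.nan  # both coders used one identical label: kappa is undefined
    return (p_o - p_e) / (1.0 - p_e)

def pairwise_kappa(codings: Dict[str, Sequence[str]]) -> Dict[Tuple[str, str], float]:
    """Kappa for every pair of coders, e.g. the values a pairwise bar chart plots."""
    return {(c1, c2): cohen_kappa(codings[c1], codings[c2])
            for c1, c2 in combinations(sorted(codings), 2)}

def mean_jaccard(a: List[Set[str]], b: List[Set[str]]) -> float:
    """Average per-item Jaccard overlap; two empty label sets count as agreement."""
    if len(a) != len(b) or not a:
        raise ValueError("coders must label the same non-empty list of items")
    scores = [len(x & y) / len(x | y) if x | y else 1.0 for x, y in zip(a, b)]
    return sum(scores) / len(scores)

def majority_vote(labels: Sequence[str]) -> Optional[str]:
    """Most common label across models; None for empty input or a tie for first place."""
    ranked = Counter(labels).most_common()
    if not ranked or (len(ranked) > 1 and ranked[0][1] == ranked[1][1]):
        return None
    return ranked[0][0]
```

For example, two coders who agree on three of four items, with label marginals (2, 2) and (1, 3), have p_o = 0.75 and p_e = 0.5, giving kappa = 0.5.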

📊 Visualization Dashboard

Academic-style matplotlib charts are embedded throughout both pipelines and the Toolbox:

QuantiKit Step 5 (Evaluation): metric summary cards, a row-normalized confusion-matrix heatmap, and a grouped bar chart of per-class precision / recall / F1
QuantiKit Step 3 (Annotation): live progress donut, updated after every action
QualiKit Step 5 (Review): review-progress donut, confidence histogram (with tier shading and a median marker), and a horizontal bar chart of theme distribution
Toolbox ICR: pairwise agreement bar chart with "Good" and "Moderate" reference lines

All charts use a consistent blue / green / orange palette and include full CJK font support.
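The QuantiKit evaluation charts rest on two standard computations, sketched here in pure Python with hypothetical function names (plotting is left out). The first builds a confusion matrix whose rows are normalized so that each true class sums to 1, which places per-class recall on the diagonal. The second derives per-class precision, recall, and F1 from the raw counts.

```python
from typing import Dict, List, Sequence, Tuple

def confusion_matrix(y_true: Sequence[str], y_pred: Sequence[str],
                     labels: Sequence[str]) -> List[List[int]]:
    """Rows are true classes, columns are predicted classes; labels must list every class."""
    idx = {lab: i for i, lab in enumerate(labels)}
    m = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        m[idx[t]][idx[p]] += 1
    return m

def row_normalize(m: List[List[int]]) -> List[List[float]]:
    """Divide each row by its total so rows sum to 1 (empty rows stay at 0)."""
    out = []
    for row in m:
        total = sum(row)
        out.append([c / total if total else 0.0 for c in row])
    return out

def per_class_prf(m: List[List[int]],
                  labels: Sequence[str]) -> Dict[str, Tuple[float, float, float]]:
    """Precision, recall, and F1 for each class from a count confusion matrix."""
    k = len(labels)
    result = {}
    for i, lab in enumerate(labels):
        tp = m[i][i]
        fp = sum(m[r][i] for r in range(k)) - tp   # predicted as lab, but wrong
        fn = sum(m[i]) - tp                        # truly lab, but missed
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        result[lab] = (p, r, f1)
    return result
```

Row normalization is what makes a heatmap readable under class imbalance: a rare class's row is scaled to the same 0 to 1 range as a frequent one.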

🔍 Evidence Highlighting

LLM coding in QualiKit is grounded in verbatim evidence rather than opaque labels.

The coding prompt requires the LLM to return an evidence_span — the exact phrase or sentence from the source text that supports the assigned RQ / sub-theme.
In the review UI, the original text is rendered with the supporting quote highlighted in green at the correct position.
When the quote can't be matched verbatim (e.g. because the LLM paraphrased it), a fallback "Evidence" block displays the cited text so reviewers can still audit the coding decision.
Case-insensitive substring matching makes highlighting robust to minor capitalization differences.

This makes every LLM coding decision auditable, which is critical for qualitative research under IRB (institutional review board) oversight.
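The matching and fallback behavior can be sketched as follows. This is an illustration, not SocialSciKit's code: the function names, the HTML output, and the green `<mark>` styling are assumptions. Running `re.search` with `re.IGNORECASE` on an escaped pattern returns offsets into the original text, so the highlight keeps the source's own capitalization; lowercasing both strings and calling `str.find` instead can shift offsets, because some characters (such as "İ") change length when lowercased.

```python
import html
import re
from typing import Optional, Tuple

def locate_evidence(text: str, span: str) -> Optional[Tuple[int, int]]:
    """Case-insensitive substring search; returns (start, end) offsets in text, or None."""
    span = span.strip()
    if not span:
        return None
    match = re.search(re.escape(span), text, flags=re.IGNORECASE)
    return (match.start(), match.end()) if match else None

def render_evidence(text: str, span: str) -> str:
    """Highlight the quote in place, or fall back to a separate Evidence block."""
    loc = locate_evidence(text, span)
    if loc is None:
        # Quote not found verbatim (e.g. paraphrased): show it separately for auditing.
        return (f"<p>{html.escape(text)}</p>"
                f"<blockquote>Evidence: {html.escape(span)}</blockquote>")
    start, end = loc
    return ("<p>" + html.escape(text[:start])
            + '<mark style="background:#c8e6c9">' + html.escape(text[start:end]) + "</mark>"
            + html.escape(text[end:]) + "</p>")
```

Escaping each slice separately, after locating the match in the raw text, keeps the offsets valid even when the transcript contains characters such as `<` or `&`.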

💾 Project Save & Restore

Save the full state of a research session (loaded DataFrames; annotation sessions, including cursor, history, and elapsed time; extraction review sessions; research questions; and de-identification results) to a single JSON file, then reload it from the Home tab to resume work later. Tagged-union serialization lets complex types (DataFrames, dataclasses, enums) round-trip losslessly.
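Tagged-union serialization wraps each type that JSON cannot represent in an object whose tag says how to rebuild it. The sketch below uses only the standard library and assumes a hypothetical `__type__` tag key and a name-based class registry; it covers dataclasses, enums, and tuples, and is not SocialSciKit's file format. A DataFrame branch could follow the same pattern, for instance by storing `df.to_dict(orient="split")` and rebuilding with `pd.DataFrame(**payload)`, though dtypes such as datetimes would need extra handling.

```python
import dataclasses
import enum
import json
from typing import Any, Dict

# Hypothetical registry so a tag can be mapped back to a concrete class on load.
REGISTRY: Dict[str, type] = {}

def register(cls: type) -> type:
    REGISTRY[cls.__name__] = cls
    return cls

def encode(obj: Any) -> Any:
    """Turn obj into JSON-safe data, tagging types JSON cannot represent."""
    if isinstance(obj, enum.Enum):
        return {"__type__": "enum", "cls": type(obj).__name__, "value": obj.value}
    if dataclasses.is_dataclass(obj) and not isinstance(obj, type):
        fields = {f.name: encode(getattr(obj, f.name)) for f in dataclasses.fields(obj)}
        return {"__type__": "dataclass", "cls": type(obj).__name__, "fields": fields}
    if isinstance(obj, tuple):
        return {"__type__": "tuple", "items": [encode(x) for x in obj]}
    if isinstance(obj, list):
        return [encode(x) for x in obj]
    if isinstance(obj, dict):
        # Known gap in this sketch: non-string keys come back as strings.
        return {str(k): encode(v) for k, v in obj.items()}
    return obj  # str, int, float, bool, and None are already JSON-safe

def decode(data: Any) -> Any:
    """Inverse of encode(); assumes every dataclass field is an __init__ parameter."""
    if isinstance(data, list):
        return [decode(x) for x in data]
    if isinstance(data, dict):
        tag = data.get("__type__")
        if tag == "enum":
            return REGISTRY[data["cls"]](data["value"])
        if tag == "dataclass":
            cls = REGISTRY[data["cls"]]
            return cls(**{k: decode(v) for k, v in data["fields"].items()})
        if tag == "tuple":
            return tuple(decode(x) for x in data["items"])
        return {k: decode(v) for k, v in data.items()}
    return data

@register
class Status(enum.Enum):
    IN_PROGRESS = "in_progress"
    DONE = "done"

@register
@dataclasses.dataclass
class AnnotationSession:
    # Hypothetical stand-in for a saved annotation session.
    cursor: int
    history: list            # (row index, label) pairs, kept for undo
    elapsed_seconds: float
    status: Status

# Round trip through a JSON string, as a save/reload cycle would.
session = AnnotationSession(3, [(0, "pos"), (1, "neg")], 12.5, Status.IN_PROGRESS)
restored = decode(json.loads(json.dumps(encode(session))))
```

The tag is what makes the round trip lossless for these types: without it, a tuple would reload as a list and an enum as a bare string.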

🌐 Runtime

Component	Tested
Python	3.9 – 3.12
Gradio	4.44+
LLM backends	OpenAI (gpt-4o / gpt-4o-mini / gpt-4.1), Anthropic (Claude Sonnet 4 / Haiku 4.5), Ollama (Llama 3 / Mistral / Qwen 2.5)
Test suite	676 tests passing

🚀 Install & Launch

pip install socialscikit
socialscikit # launches the unified UI at http://127.0.0.1:7860
