Releases: dotcommander/defuddle
v0.7.3
[v0.7.3] - 2026-06-10
Fixed
- Sanitize site-specific extractor output before returning Result.Content, matching the generic parser sanitizer path.
- Honor ProcessCode, ProcessImages, ProcessHeadings, ProcessMath, ProcessFootnotes, and ProcessRoles options during standardization.
- Cap ParseFromURL response reads before buffering the body, returning ErrTooLarge for oversized responses.
- Return structured ErrHTTPStatus/HTTPStatusError for non-2xx URL fetches instead of parsing error pages.
- Resolve implicit metadata URLs against the final redirect target while preserving an explicit caller-supplied Options.URL.
- Sync selected upstream parser fixes from kepano/defuddle: ChatGPT split assistant messages, YouTube JSON-LD video metadata selection, markdown link destinations with spaces, and weekday-aware byline cleanup.
Changed
- task verify now runs govulncheck ./... through the new task vuln gate.
v0.7.2
Fixed
- fix(extractors/grok): extract body inner HTML instead of full document wrapper
Changed
- refactor(scoring): single-pass anchor metrics in scoreNonContentBlock
v0.7.1
Defuddle Go v0.7.1
Web content extraction library and CLI tool for Go.
📦 Installation
Download Pre-built Binaries
Download the appropriate binary for your platform from the assets below.
Install with Go
go install github.com/dotcommander/defuddle/cmd/defuddle@v0.7.1
Install from Source
git clone https://github.com/dotcommander/defuddle.git
cd defuddle
make build-cli
Changelog
Bug fixes
- 6a60544: fix(lint): errcheck on test pipes and tidy go.mod
Others
- 1223a0a: chore(taskfile): gate tag target on verify
- 0bb9f37: test(cli): add RunE integration tests for all subcommands
🔍 Usage Examples
# Extract content from URL
defuddle parse https://example.com/article
# Convert to markdown
defuddle parse https://example.com/article --markdown
# Get JSON output with metadata
defuddle parse https://example.com/article --json
# Extract specific property
defuddle parse https://example.com/article --property title
Full Changelog: v0.7.0...v0.7.1
v0.6.0
Full Changelog: v0.5.3...v0.6.0
v0.5.3
Full Changelog: v0.5.2...v0.5.3
v0.5.2
Defuddle Go v0.5.2
Web content extraction library and CLI tool for Go.
📦 Installation
Download Pre-built Binaries
Download the appropriate binary for your platform from the assets below.
Install with Go
go install github.com/dotcommander/defuddle/cmd/defuddle@v0.5.2
Install from Source
git clone https://github.com/dotcommander/defuddle.git
cd defuddle
make build-cli
Changelog
Bug fixes
- c748dab: fix(removals): subdomain-aware same-site hostname matching
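The idea behind subdomain-aware same-site matching can be sketched as a suffix comparison on a label boundary. This is an assumption about the general technique, not the logic in commit c748dab: sameSite is a hypothetical helper, and it deliberately ignores the public suffix list, so it would treat "example.com" and "com" as the same site.

```go
package main

import (
	"fmt"
	"strings"
)

// sameSite reports whether two hostnames are equal or one is a subdomain of
// the other. Hypothetical helper; a production version would likely consult
// a public suffix list to avoid matching on bare TLDs.
func sameSite(a, b string) bool {
	a = strings.ToLower(strings.TrimSuffix(a, "."))
	b = strings.ToLower(strings.TrimSuffix(b, "."))
	if a == "" || b == "" {
		return false
	}
	if a == b {
		return true
	}
	// Require a leading dot so "notexample.com" does not match "example.com".
	return strings.HasSuffix(a, "."+b) || strings.HasSuffix(b, "."+a)
}

func main() {
	fmt.Println(sameSite("blog.example.com", "example.com"))
	fmt.Println(sameSite("notexample.com", "example.com"))
}
```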
Performance improvements
- 6ee7089: perf: pre-compiled CSS selectors, regex fast-path, and avoid re-parse on word count
Full Changelog: v0.5.1...v0.5.2
v0.5.1
Defuddle Go v0.5.1
Web content extraction library and CLI tool for Go.
📦 Installation
Download Pre-built Binaries
Download the appropriate binary for your platform from the assets below.
Install with Go
go install github.com/dotcommander/defuddle/cmd/defuddle@v0.5.1
Install from Source
git clone https://github.com/dotcommander/defuddle.git
cd defuddle
make build-cli
Changelog
Refactors
- e1191d1: refactor(extractors): split registry.go into per-category files
Full Changelog: v0.5.0...v0.5.1
v0.5.0
Defuddle Go v0.5.0
Web content extraction library and CLI tool for Go.
📦 Installation
Download Pre-built Binaries
Download the appropriate binary for your platform from the assets below.
Install with Go
go install github.com/dotcommander/defuddle/cmd/defuddle@v0.5.0
Install from Source
git clone https://github.com/dotcommander/defuddle.git
cd defuddle
make build-cli
Changelog
Features
- 8521924: feat(extractors): port leetcode, discourse, linkedin — complete upstream parity
- 43bff00: feat(extractors): port lwn, c2_wiki, x_oembed (Batch B)
- e17513b: feat(extractors): port wikipedia, medium, nytimes (Batch A)
Others
- e837e74: chore(scripts): add upstream extractor sync checker
Full Changelog: v0.4.0...v0.5.0
v0.4.0
Defuddle Go v0.4.0
Web content extraction library and CLI tool for Go.
📦 Installation
Download Pre-built Binaries
Download the appropriate binary for your platform from the assets below.
Install with Go
go install github.com/dotcommander/defuddle/cmd/defuddle@v0.4.0
Install from Source
git clone https://github.com/dotcommander/defuddle.git
cd defuddle
make build-cli
Changelog
Features
- b816f48: feat(cli): accept piped HTML in parse command
- 70f6a46: feat(extractors): add Bluesky thread and quoted-post extraction
- 22b4e28: feat(extractors): add Mastodon federation support with shared comment helpers
- 5061a60: feat(extractors): add Threads extractor with dual DOM+JSON paths
Bug fixes
- 8e6f13b: fix(cli): resolve err113 and errcheck lint findings
- a8456fe: fix: prepare repository for public release
Refactors
- 2291b7c: refactor(cli): extract loadResult/renderOutput, replace property switch with map, delete unused helpers
- 391d389: refactor(cli): replace go-json-experiment with stdlib encoding/json
- 92f8bf3: refactor(defuddle): decompose parseInternal, retry ladder as data, extract isProtectedNode
- 5882003: refactor(extractors): DRY conversation extractors (shared title helpers, fallback selectors, single ExtractMessages pass)
- d62ab88: refactor(scoring): extract ScoreElement sub-functions, hoist magic numbers to const
- 79f4593: refactor: Go 1.24+ modernization (SplitSeq, new(bool), slices.Contains)
Others
- faeba9f: chore(ci): bump actions/upload-artifact v4 → v7
- 3b1697f: chore(deps): bump go-task/setup-task from 1 to 2
- 6b94b3b: chore(deps): bump golang.org/x/net from 0.52.0 to 0.53.0
- 0612afc: style(cli): align errNoURLs with sibling error conventions
- 6311269: test(markdown): add golden file harness for 22 custom renderers
Full Changelog: v0.3.1...v0.4.0
v0.3.1
Defuddle Go v0.3.1
Web content extraction library and CLI tool for Go.
📦 Installation
Download Pre-built Binaries
Download the appropriate binary for your platform from the assets below.
Install with Go
go install github.com/dotcommander/defuddle/cmd/defuddle@v0.3.1
Install from Source
git clone https://github.com/dotcommander/defuddle.git
cd defuddle
make build-cli
Changelog
Bug fixes
- 4736085: fix: rename test fixtures to avoid colons in file paths
Full Changelog: v0.3.0...v0.3.1