coURLan

Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters

adrien.barbaresi.eu Source Code Changelog

Popularity: 2.1 (Growing)

Activity: 2.3

Stars: 158

Watchers: 0

Forks: 11

Last commit: about 1 month ago

Description

Avoid losing bandwidth capacity and processing time on web pages which are probably not worth the effort. This library provides an additional brain for web crawling, scraping and management of Internet archives. Specific functionality for crawlers: stay away from pages with little text content, or explicitly target synoptic pages to gather links.
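The kind of pre-fetch filtering described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not coURLan's implementation: the extension list, the path hints and the function names `is_worth_fetching` and `looks_like_navigation` are all hypothetical, and the library's own rules are considerably more extensive.

```python
from urllib.parse import urlsplit

# Hypothetical heuristics chosen for illustration only.
SKIPPED_EXTENSIONS = (".jpg", ".png", ".gif", ".pdf", ".zip", ".mp3", ".mp4", ".css", ".js")
NAVIGATION_HINTS = ("/tag/", "/category/", "/page/", "/archive/")


def looks_like_navigation(url: str) -> bool:
    """Return True if the path suggests a synoptic page (mostly links, little text)."""
    path = urlsplit(url).path.lower()
    return any(hint in path for hint in NAVIGATION_HINTS)


def is_worth_fetching(url: str) -> bool:
    """Return True if the URL plausibly points to an HTML page with text content."""
    parts = urlsplit(url)
    # Only absolute HTTP(S) URLs are considered.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return False
    # Skip obvious non-HTML resources based on the file extension.
    if parts.path.lower().endswith(SKIPPED_EXTENSIONS):
        return False
    # Skip pages that are likely to be link lists rather than articles.
    return not looks_like_navigation(url)
```

A crawler focused on text would keep URLs for which `is_worth_fetching` returns `True`, while a crawler that wants to discover links could instead prioritize URLs for which `looks_like_navigation` returns `True`.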

This navigation aid targets text-based documents (currently web pages expected to be in HTML format) and tries to guess the language of pages to allow for language-focused collection. Additional functions include straightforward domain name extraction and URL sampling.
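To make domain extraction and URL sampling concrete, here is a minimal standard-library sketch. It is not the library's API: `extract_host` and `sample_per_host` are hypothetical names, the host is used as a stand-in for the registered domain (no public suffix handling), and the fixed seed is only there to make the sampling reproducible.

```python
import random
from collections import defaultdict
from urllib.parse import urlsplit


def extract_host(url: str) -> str:
    """Return the lower-cased host name without a leading 'www.'."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def sample_per_host(urls, sample_size, seed=0):
    """Keep at most `sample_size` URLs per host, chosen at random."""
    rng = random.Random(seed)
    by_host = defaultdict(list)
    for url in urls:
        by_host[extract_host(url)].append(url)
    sampled = []
    for host in sorted(by_host):
        group = by_host[host]
        # Only hosts with more URLs than the limit are down-sampled.
        if len(group) > sample_size:
            group = rng.sample(group, sample_size)
        sampled.extend(group)
    return sampled
```

Capping the number of URLs per host keeps a few very large sites from dominating a crawl queue, which is the usual motivation for this kind of sampling.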

Programming language: Python

License: Apache License 2.0

Tags: Natural Language Processing URL Manipulation WWW Validation

Latest version: v0.6.0

coURLan alternatives and similar packages

Based on the "URL Manipulation" category.

furl

Popularity: 6.4, Activity: 3.3, Code quality: L2

🌐 The easiest way to parse and modify URLs in Python.

yarl

Popularity: 5.2, Activity: 9.1

Yet another URL library

webargs

Popularity: 5.1, Activity: 8.2, Code quality: L5

A friendly library for parsing HTTP request arguments, with built-in support for popular web frameworks, including Flask, Django, Bottle, Tornado, Pyramid, webapp2, Falcon, and aiohttp.

pyshorteners

Popularity: 3.1, Activity: 0.0, Code quality: L5

DISCONTINUED. Generating short URLs with Python has never been easier.

purl

Popularity: 2.9, Activity: 0.0, Code quality: L5

A simple, immutable URL class with a clean API for interrogation and manipulation.

short_url

Popularity: 2.5, Activity: 0.0, Code quality: L5

Python implementation for generating Tiny URL- and bit.ly-like URLs.

URL Cleaner

Popularity: 0.9, Activity: 10.0

A package for removing tracking parameters from URLs. It supports automatically updating its filtering rules from AdGuard.

* Code Quality Rankings and insights are calculated and provided by Lumnify.
They vary from L1 to L5 with "L5" being the highest.
