BlackWeb is a project that collects and unifies public blocklists of domains (porn, downloads, drugs, malware, spyware, trackers, bots, social networks, warez, weapons, etc.) to make them compatible with Squid-Cache.
| ACL | Blocked Domains | File Size |
|---|---|---|
| blackweb.txt | 4,772,375 | 118.8 MB |
```bash
git clone --depth=1 https://github.com/maravento/blackweb.git
```
`blackweb.txt` is already updated and optimized for Squid-Cache. Download it, unzip it in the path of your preference, and activate the Squid-Cache rule.
```bash
wget -q -c -N https://raw.githubusercontent.com/maravento/blackweb/master/blackweb.tar.gz && cat blackweb.tar.gz* | tar xzf -
```
```bash
#!/bin/bash
# Variables
url="https://raw.githubusercontent.com/maravento/blackweb/master/blackweb.tar.gz"
wgetd="wget -q -c --timestamping --no-check-certificate --retry-connrefused --timeout=10 --tries=4 --show-progress"
# TMP folder
output_dir="bwtmp"
mkdir -p "$output_dir"
# Download
if $wgetd "$url"; then
    echo "File downloaded: $(basename "$url")"
else
    echo "Main file not found. Searching for multiparts..."
    # Multiparts from .aa to .zz; the first missing part marks the end of the sequence
    parts_downloaded=0
    for part in {a..z}{a..z}; do
        part_url="${url}.$part"   # e.g. blackweb.tar.gz.aa
        if $wgetd "$part_url"; then
            echo "Part downloaded: $(basename "$part_url")"
            parts_downloaded=$((parts_downloaded + 1))
        else
            break
        fi
    done
    if ((parts_downloaded > 0)); then
        # Rebuild the original file in the current directory
        cat blackweb.tar.gz.* > blackweb.tar.gz
        echo "Multipart file rebuilt from $parts_downloaded parts"
    else
        echo "Multipart process cannot be completed"
        exit 1
    fi
fi
# Unzip the file to the output folder
tar -xzf blackweb.tar.gz -C "$output_dir"
echo "Done"
```
```bash
wget -q -c -N https://raw.githubusercontent.com/maravento/blackweb/master/blackweb.tar.gz && cat blackweb.tar.gz* | tar xzf -
wget -q -c -N https://raw.githubusercontent.com/maravento/blackweb/master/blackweb.txt.sha256
LOCAL=$(sha256sum blackweb.txt | awk '{print 1ドル}'); REMOTE=$(awk '{print 1ドル}' blackweb.txt.sha256)
echo "$LOCAL" && echo "$REMOTE" && [ "$LOCAL" = "$REMOTE" ] && echo OK || echo FAIL
```
BlackWeb Rule for Squid-Cache
Edit:
/etc/squid/squid.conf
And add the following lines:
```
# INSERT YOUR OWN RULE(S) HERE TO ALLOW ACCESS FROM YOUR CLIENTS
# Block Rule for Blackweb
acl blackweb dstdomain "/path_to/blackweb.txt"
http_access deny blackweb
```
BlackWeb contains millions of domains, so the following is recommended:
- Use `allowdomains.txt` to exclude essential domains or subdomains, such as `.accounts.google.com`, `.yahoo.com`, `.github.com`, etc. According to Squid's documentation, the subdomains `accounts.google.com` and `accounts.youtube.com` may be used by Google for authentication within its ecosystem. Blocking them could disrupt access to services like Gmail, Drive, Docs, and others.

```
acl allowdomains dstdomain "/path_to/allowdomains.txt"
http_access allow allowdomains
```

- Use `blockdomains.txt` to block any other domain not included in `blackweb.txt`:

```
acl blockdomains dstdomain "/path_to/blockdomains.txt"
http_access deny blockdomains
```

- Use `blocktlds.txt` to block gTLDs, sTLDs, ccTLDs, etc.:

```
acl blocktlds dstdomain "/path_to/blocktlds.txt"
http_access deny blocktlds
```

Input:
```
.bardomain.xxx
.subdomain.bardomain.xxx
.bardomain.ru
.bardomain.adult
.foodomain.com
.foodomain.porn
```
Output:
```
.foodomain.com
```
Use this rule to block Punycode (RFC 3492) and IDN/non-ASCII TLDs or domains, to prevent IDN homograph attacks. For more information, visit welivesecurity: Homograph attacks.
```
acl punycode dstdom_regex -i \.xn--.*
http_access deny punycode
```
Input:
```
.bücher.com
.mañana.com
.google.com
.auth.wikimedia.org
.xn--fiqz9s
.xn--p1ai
```
ASCII Output:
```
.google.com
.auth.wikimedia.org
```
Use this rule to block words (optional; it can generate false positives).
```bash
# Download ACL:
sudo wget -P /etc/acl/ https://raw.githubusercontent.com/maravento/vault/refs/heads/master/blackshield/acl/squid/blockwords.txt
# Squid Rule to Block Words:
acl blockwords url_regex -i "/etc/acl/blockwords.txt"
http_access deny blockwords
```
Input:
```
.bittorrent.com
https://www.google.com/search?q=torrent
https://www.google.com/search?q=mydomain
https://www.google.com/search?q=porn
.mydomain.com
```
Output:
```
https://www.google.com/search?q=mydomain
.mydomain.com
```

- Use `streaming.txt` to block streaming domains not included in `blackweb.txt` (for example: .youtube.com, .googlevideo.com, .ytimg.com, etc.):

```
acl streaming dstdomain "/path_to/streaming.txt"
http_access deny streaming
```

Note: This list may contain overlapping domains. It is important to clean it manually according to the intended objective. Example:
- If your goal is to block Facebook, keep the primary domains and remove specific subdomains.
- If your goal is to block features, like Facebook streaming, keep the specific subdomains and remove the primary domains to avoid impacting overall site access. Example:
```
# Block Facebook
.fbcdn.net
.facebook.com
# Block some Facebook streaming content
.z-p3-video.flpb1-1.fna.fbcdn.net
```
Complete rule set. Squid evaluates `http_access` directives in order (first match wins), so the allow rule must precede the deny rules:

```
# INSERT YOUR OWN RULE(S) HERE TO ALLOW ACCESS FROM YOUR CLIENTS
# Allow Rule for Domains
acl allowdomains dstdomain "/path_to/allowdomains.txt"
http_access allow allowdomains
# Block Rule for Punycode
acl punycode dstdom_regex -i \.xn--.*
http_access deny punycode
# Block Rule for gTLD, sTLD, ccTLD
acl blocktlds dstdomain "/path_to/blocktlds.txt"
http_access deny blocktlds
# Block Rule for Domains
acl blockdomains dstdomain "/path_to/blockdomains.txt"
http_access deny blockdomains
# Block Rule for Patterns (Optional)
# https://raw.githubusercontent.com/maravento/vault/refs/heads/master/blackshield/acl/squid/blockpatterns.txt
acl blockwords url_regex -i "/path_to/blockpatterns.txt"
http_access deny blockwords
# Block Rule for web3 (Optional)
# https://raw.githubusercontent.com/maravento/vault/refs/heads/master/blackshield/acl/web3/web3domains.txt
acl web3 dstdomain "/path_to/web3domains.txt"
http_access deny web3
# Block Rule for Blackweb
acl blackweb dstdomain "/path_to/blackweb.txt"
http_access deny blackweb
```
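After editing `squid.conf`, you can check the syntax and apply the changes without restarting the service, using standard Squid commands:

```bash
# Check squid.conf for syntax errors, then apply the new rules
sudo squid -k parse
sudo squid -k reconfigure
```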
This section only explains how the update and optimization process works; users do not need to run it. The process can take a long time and consume significant hardware and bandwidth resources, so it is recommended to use test equipment.

The update process of `blackweb.txt` consists of several steps, executed in sequence by the script `bwupdate.sh`. The script will request privileges when required.
```bash
wget -q -N https://raw.githubusercontent.com/maravento/blackweb/master/bwupdate/bwupdate.sh && chmod +x bwupdate.sh && ./bwupdate.sh
```
The update requires Python 3.x and Bash 5.x. It also requires the following dependencies:

```
wget git curl libnotify-bin perl tar rar unrar unzip zip gzip python-is-python3 idn2 iconv
```
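On Debian/Ubuntu-based systems, a minimal install sketch (package names are assumed from the list above; `iconv` is provided by `libc-bin`, and `rar`/`unrar` may require the multiverse repository):

```bash
# Install the bwupdate.sh dependencies (assumed Debian/Ubuntu package names)
sudo apt update
sudo apt install -y wget git curl libnotify-bin perl tar rar unrar unzip zip gzip python-is-python3 idn2
```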
Make sure your Squid is installed correctly. If you have any problems, run the following script (`sudo ./squid_install.sh`):
```bash
#!/bin/bash
# kill old version
while pgrep squid > /dev/null; do
    echo "Waiting for Squid to stop..."
    killall -s SIGTERM squid &>/dev/null
    sleep 5
done
# squid remove (if exist)
apt purge -y squid* &>/dev/null
rm -rf /var/spool/squid* /var/log/squid* /etc/squid* /dev/shm/* &>/dev/null
# squid install (you can use 'squid-openssl' or 'squid')
apt install -y squid-openssl squid-langpack squid-common squidclient squid-purge
# create logs (brace expansion does not work inside [[ -f ]], so test each file individually)
mkdir -p /var/log/squid
for log in access cache store deny; do
    [ -f "/var/log/squid/$log.log" ] || touch "/var/log/squid/$log.log"
done
# permissions
chown -R proxy:proxy /var/log/squid
# enable service
systemctl enable squid.service
systemctl start squid.service
echo "Done"
```
Captures domains from downloaded public blocklists (see SOURCES) and unifies them into a single file.
Removes overlapping domains ('.sub.example.com' is a subdomain of '.example.com'), normalizes entries to the Squid-Cache format, and excludes false positives (google, hotmail, yahoo, etc.) with an allowlist (debugwl.txt). A minimal sketch of the de-overlapping idea follows the example below.
Input:
```
com
.com
.domain.com
domain.com
0.0.0.0 domain.com
127.0.0.1 domain.com
::1 domain.com
domain.com.co
foo.bar.subdomain.domain.com
.subdomain.domain.com.co
www.domain.com
www.foo.bar.subdomain.domain.com
domain.co.uk
xxx.foo.bar.subdomain.domain.co.uk
```
Output:
```
.domain.com
.domain.com.co
.domain.co.uk
```
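As an illustration of the de-overlapping step, here is a minimal awk sketch, not the actual bwupdate.sh code; `input.txt` and `output.txt` are hypothetical names holding one Squid-format domain per line:

```bash
# Drop any domain whose parent domain is already in the list.
# Pass 1 records every entry; pass 2 checks each entry's parent suffixes.
awk '
NR == FNR { seen[0ドル]; next }
{
    keep = 1; d = 0ドル
    # Strip labels from the left: ".sub.example.com" -> ".example.com" -> ".com"
    while (match(d, /^\.[^.]+\./)) {
        d = substr(d, RSTART + RLENGTH - 1)
        if (d in seen) { keep = 0; break }
    }
    if (keep) print
}' input.txt input.txt > output.txt
```

This covers only the overlap removal; the hosts-file normalization shown in the input above is a separate step.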
Removes domains with invalid TLDs, using a list of public and private suffix TLDs (ccTLD, ccSLD, sTLD, uTLD, gSLD, gTLD, eTLD, etc., up to 4th-level 4LDs).
Input:
```
.domain.exe
.domain.com
.domain.edu.co
```
Output:
```
.domain.com
.domain.edu.co
```
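A hedged sketch of this validation against a suffix list; `tlds.txt`, `input.txt`, and `output.txt` are hypothetical names, and the real script's suffix handling (up to 4LDs) is more thorough:

```bash
#!/bin/bash
# Keep only domains whose last one or two labels appear in a valid-suffix list.
# tlds.txt: one suffix per line ("com", "edu.co", ...); input.txt: ".domain.tld" per line
while IFS= read -r dom; do
    d="${dom#.}"                  # drop Squid's leading dot
    last1="${d##*.}"              # e.g. "co" from "domain.edu.co"
    rest="${d%.*}"
    last2="${rest##*.}.$last1"    # e.g. "edu.co"
    if grep -qx "$last1" tlds.txt || grep -qx "$last2" tlds.txt; then
        echo "$dom"
    fi
done < input.txt > output.txt
```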
Removes hostnames longer than 63 characters (RFC 1035) and other characters inadmissible under IDN, and converts domains with international (non-ASCII) characters, which are used for homograph attacks, to Punycode/IDNA format.
Input:
```
bücher.com
café.fr
españa.com
köln-düsseldorfer-rhein-main.de
mañana.com
mūsųlaikas.lt
sendesık.com
президент.рф
```
Output:
```
xn--bcher-kva.com
xn--caf-dma.fr
xn--d1abbgf6aiiy.xn--p1ai
xn--espaa-rta.com
xn--kln-dsseldorfer-rhein-main-cvc6o.de
xn--maana-pta.com
xn--mslaikas-qzb5f.lt
xn--sendesk-wfb.com
```
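Since `idn2` is among the listed dependencies, this conversion can be reproduced per domain; a minimal sketch (file names are hypothetical):

```bash
# Convert IDN/non-ASCII domains to Punycode/IDNA
echo "bücher.com" | idn2        # -> xn--bcher-kva.com
echo "президент.рф" | idn2      # -> xn--d1abbgf6aiiy.xn--p1ai
# Convert a whole file, one domain per line, skipping entries idn2 rejects
while IFS= read -r d; do idn2 "$d" 2>/dev/null; done < input.txt > output.txt
```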
Removes entries with invalid encoding, non-printable characters, whitespace, disallowed symbols, and any content that does not conform to the strict ASCII format for valid domain names (CP1252, ISO-8859-1, corrupted UTF-8, etc.). Converts the output to plain text with `charset=us-ascii`, ensuring a clean, standardized list ready for validation, comparison, or DNS resolution:
Input:
```
M-C$ -$ .$ 0$ 1$ 23andmÃa.com .Ã2utlook.com .ălibăbă.com .ămăzon.com .ăvăst.com .amÃ1azon.com .amÉTMzon.com .avalÃ3n.com .bÄonance.com .bitdáo1fender.com .blÃ3ckchain.site .blockchaiÇ1.com .cashpluÈTM.com .dáo1ll.com .diÃ3cesisdebarinas.org .disnáo1ylandparis.com .ebăy.com .ÉTMmÉTMzon.com .evo-bancÃ3.com .goglÄTM.com .gooÄŸle.com .googÄ1⁄4ÄTM.com .googlÉTM.com .google.com .ibáo1ria.com .imgÃor.com .lloydÅŸbank.com .mÃ1⁄2etherwallet.com .mrgreÄTMn.com .myáo1tháo1rwallet.com .myáo1thernwallet.com .myetháo1rnwallet.com .myetheá1TMwallet.com .myethernwalláo1t.com .nÄTMtflix.com .paxfÃ1ll.com .tÃ1⁄4rkiyeisbankasi.com .tÅTMezor.com .westernÃonion.com .yÃ2utube.com .yăhoo.com .yoÃ1⁄4tÃ1⁄4be.co .yoÃ1⁄4tÃ1⁄4be.com .yoÃ1⁄4tu.be
```
Output:
```
.google.com
```
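A rough approximation of this cleanup with standard tools (hypothetical file names; the real script's rules are stricter). GNU grep's `-P` enables Perl-compatible patterns:

```bash
# Keep only printable-ASCII lines that match a strict Squid-format domain pattern
LC_ALL=C grep -P '^[\x20-\x7E]+$' input.txt |
    grep -E '^\.([a-z0-9]([a-z0-9-]*[a-z0-9])?\.)+[a-z]{2,}$' > output.txt
# Confirm the result is plain ASCII
file -i output.txt    # expect: text/plain; charset=us-ascii
```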
Most of the SOURCES contain millions of invalid or nonexistent domains, so each domain is double-checked via DNS (in two steps) to exclude those entries from Blackweb. This process runs in parallel and can be resource-intensive, depending on your hardware and network conditions. You can control the concurrency with the `PROCS` variable:
```bash
PROCS=$(($(nproc)))       # Conservative (network-friendly)
PROCS=$(($(nproc) * 2))   # Balanced
PROCS=$(($(nproc) * 4))   # Aggressive (default)
PROCS=$(($(nproc) * 8))   # Extreme (8 or higher, use with caution)
```
For example, on a system with a Core i5 CPU (4 physical cores / 8 threads with Hyper-Threading):
```
nproc → 8
PROCS=$((8 * 4)) → 32 parallel queries
```
⚠️ High `PROCS` values increase DNS resolution speed but may saturate your CPU or bandwidth, especially on limited networks such as satellite links. Adjust accordingly.

Real-time processing example:
```
Processed: 2463489 / 7244989 (34.00%)
```
Output:
```
HIT google.com
google.com has address 142.251.35.238
google.com has IPv6 address 2607:f8b0:4008:80b::200e
google.com mail is handled by 10 smtp.google.com.
FAULT testfaultdomain.com
Host testfaultdomain.com not found: 3(NXDOMAIN)
```
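A simplified sketch of the parallel check; the real bwupdate.sh logic is more elaborate, and file names here are hypothetical. Each worker calls `host` and tags the domain HIT or FAULT, matching the output above:

```bash
#!/bin/bash
# Resolve domains in parallel and classify them as HIT (resolves) or FAULT
PROCS=$(($(nproc) * 4))    # "Aggressive" default from the table above
check_domain() {
    if host -W 5 "1ドル" > /dev/null 2>&1; then
        echo "HIT 1ドル"
    else
        echo "FAULT 1ドル"
    fi
}
export -f check_domain
# input.txt: one bare domain per line (no leading dot)
xargs -a input.txt -P "$PROCS" -I{} bash -c 'check_domain "1ドル"' _ {} > dnscheck.txt
```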
Removes government domains (.gov) and other related TLDs (.gob, .mil, etc.) from BlackWeb.
Input:
```
.argentina.gob.ar
.mydomain.com
.gob.mx
.gov.uk
.navy.mil
```
Output:
```
.mydomain.com
```
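The effect can be approximated with a negated suffix match (a sketch; the actual exclusion list is broader, and file names are hypothetical):

```bash
# Drop government/military domains, with or without a country-code suffix
grep -Ev '\.(gov|gob|mil)(\.[a-z]{2})?$' input.txt > output.txt
```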
Runs Squid-Cache with BlackWeb and sends any errors to `SquidError.txt`.

```
BlackWeb: Done 06/05/2023 15:47:14
```
- The default path of BlackWeb is `/etc/acl`. You can change it to your preference.
- If you interrupt the execution of `bwupdate.sh` (ctrl + c) and it stopped at the DNS Lookup part, it will restart at that point. If you stop it earlier, you will have to start from the beginning or modify the script manually so that it starts from the desired point.
- If you use `aufs`, temporarily change it to `ufs` during the update, to avoid: `ERROR: Can't change type of existing cache_dir aufs /var/spool/squid to ufs. Restart required.`
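For the `aufs` case, a hedged example of switching the storage scheme before the update (back up `squid.conf` first; the path and `cache_dir` line may differ on your system):

```bash
# Temporarily switch the cache_dir storage scheme from aufs to ufs
sudo cp /etc/squid/squid.conf /etc/squid/squid.conf.bak
sudo sed -i 's/^cache_dir aufs/cache_dir ufs/' /etc/squid/squid.conf
sudo systemctl restart squid.service
# After bwupdate.sh finishes, restore the backup and restart Squid again
```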
- ABPindo - indonesianadblockrules
- abuse.ch - hostfile
- Adaway - host
- adblockplus - advblock Russian
- adblockplus - antiadblockfilters
- adblockplus - easylistchina
- adblockplus - easylistlithuania
- anudeepND - adservers
- anudeepND - coinminer
- AssoEchap - stalkerware-indicators
- azet12 - KADhosts
- BarbBlock - blacklists
- BBcan177 - minerchk
- BBcan177 - MS-2
- BBcan177 - referrer-spam-blacklist
- betterwebleon - slovenian-list
- bigdargon - hostsVN
- BlackJack8 - iOSAdblockList
- BlackJack8 - webannoyances
- blocklistproject - everything
- cert.pl - List of malicious domains
- chadmayfield - porn top
- chadmayfield - porn_all
- chainapsis - phishing-block-list
- cjx82630 - Chinese CJX's Annoyance List
- cobaltdisco - Google-Chinese-Results-Blocklist
- CriticalPathSecurity - Public-Intelligence-Feeds
- DandelionSprout - adfilt
- Dawsey21 - adblock-list
- Dawsey21 - main-blacklist
- developerdan - ads-and-tracking-extended
- Disconnect.me - simple_ad
- Disconnect.me - simple_malvertising
- Disconnect.me - simple_tracking
- dorxmi - nothingblock
- Eallion - uBlacklist
- EasyList - EasyListHebrew
- ethanr - dns-blacklists
- fabriziosalmi - blacklists
- firebog - AdguardDNS
- firebog - Admiral
- firebog - Easylist
- firebog - Easyprivacy
- firebog - Kowabit
- firebog - neohostsbasic
- firebog - Prigent-Ads
- firebog - Prigent-Crypto
- firebog - Prigent-Malware
- firebog - RPiList-Malware
- firebog - RPiList-Phishing
- firebog - WaLLy3K
- frogeye - firstparty-trackers-hosts
- gardar - Icelandic ABP List
- greatis - Anti-WebMiner
- hagezi - dns-blocklists
- hexxium - threat-list
- hoshsadiq - adblock-nocoin-list
- jawz101 - potentialTrackers
- jdlingyu - ad-wars
- joelotz - URL_Blacklist
- kaabir - AdBlock_Hosts
- kevle1 - Windows-Telemetry-Blocklist - xiaomiblock
- liamja - Prebake Filter Obtrusive Cookie Notices
- malware-filter - URLhaus Malicious URL Blocklist
- malware-filter - phishing-filter-hosts
- Matomo-org - spammers
- MBThreatIntel - malspam
- mine.nu - hosts0
- mitchellkrogza - Badd-Boyz-Hosts
- mitchellkrogza - hacked-domains
- mitchellkrogza - nginx-ultimate-bad-bot-blocker
- mitchellkrogza - strip_domains
- molinero - hBlock
- NanoAdblocker - NanoFilters
- neodevpro - neodevhost
- notracking - hosts-blocklists
- Oleksiig - Squid-BlackList
- openphish - feed
- pengelana - domains blocklist
- phishing.army - phishing_army_blocklist_extended
- piperun - iploggerfilter
- quidsup - notrack-blocklists
- quidsup - notrack-malware
- reddestdream - MinimalHostsBlocker
- RooneyMcNibNug - pihole-stuff
- Rpsl - adblock-leadgenerator-list
- ruvelro - Halt-and-Block-Mining
- ryanbr - fanboy-adblock
- scamaNet - blocklist
- simeononsecurity - System-Wide-Windows-Ad-Blocker
- Someonewhocares - hosts
- stanev.org - Bulgarian adblock list
- StevenBlack - add.2o7Net
- StevenBlack - add.Risk
- StevenBlack - fakenews-gambling-porn-social
- StevenBlack - hosts
- StevenBlack - spam
- StevenBlack - uncheckyAds
- Stopforumspam - Toxic Domains
- sumatipru - squid-blacklist
- Taz - SpamDomains
- tomasko126 - Easylist Czech and Slovak filter list
- txthinking - blackwhite
- txthinking - bypass china domains
- Ultimate Hosts Blacklist - hosts
- Université Toulouse 1 Capitole - Blacklists UT1 - Olbat
- Université Toulouse 1 Capitole - Blacklists UT1
- vokins - yhosts
- Winhelp2002 - hosts
- yourduskquibbles - Web Annoyances Ultralist
- yous - YousList
- yoyo - Peter Lowe’s Ad and tracking server list
- zoso - Romanian Adblock List
- google supported domains
- iana
- ipv6-hosts (Partial)
- publicsuffix
- Ransomware Database
- University Domains and Names Data List
- whoisxmlapi
- Awesome Open Source: Blackweb
- Community IPfire: url filter and self updating blacklists
- covert.io: Getting Started with DGA Domain Detection Research
- Crazymax: WindowsSpyBlocker
- egirna: Allowing/Blocking Websites Using Squid
- Jason Trost: Getting Started with DGA Domain Detection Research
- Kandi Openweaver: Domains Blocklist for Squid-Cache
- Kerry Cordero: Blocklists of Suspected Malicious IPs and URLs
- Keystone Solutions: blocklists
- Lifars: Sites with blocklist of malicious IPs and URLs
- Opensourcelibs: Blackweb
- OSINT Framework: Domain Name/Domain Blacklists/Blackweb
- Osintbay: Blackweb
- Reddit: Blackweb
- Secrepo: Samples of Security Related Data
- Segu-Info: Análisis de malware y sitios web en tiempo real
- Segu-Info: Dominios/TLD dañinos que pueden ser bloqueados para evitar spam y #phishing
- Soficas: CiberSeguridad - Protección Activa
- Stackoverflow: Blacklist IP database
- Wikipedia: Blacklist_(computing)
- Xploitlab: Projects using WindowsSpyBlocker
- Zeltser: Free Blocklists of Suspected Malicious IPs and URLs
- Zenarmor: How-to-enable-web-filtering-on-OPNsense-proxy?
- This project includes third-party components.
- Changes must be proposed via Issues. Pull Requests are not accepted.
- BlackWeb is designed exclusively for Squid-Cache. Due to the large number of blocked domains, it is not recommended for other environments (DNSMasq, Pi-Hole, etc.) or for the Windows hosts file, as it could slow them down or crash them. Use it at your own risk. For more information, check Issue 10.
- Blackweb is NOT a blacklist service itself. It does not independently verify domains. Its purpose is to consolidate and reformat public blacklist sources to make them compatible with Squid.
- If your domain appears in Blackweb and you believe this is an error, review the public SOURCES to identify where it is listed and contact the maintainer of that list to request its removal. Once the domain is removed from the upstream source, it will automatically disappear from Blackweb in the next update. You can also use the following script to perform the same verification:
```bash
wget https://raw.githubusercontent.com/maravento/blackweb/refs/heads/master/bwupdate/tools/checksources.sh
chmod +x checksources.sh
./checksources.sh
```
e.g.:
```
[?] Enter domain to search: kickass.to
[*] Searching for 'kickass.to'...
[+] Domain found in: https://github.com/fabriziosalmi/blacklists/releases/download/latest/blacklist.txt
[+] Domain found in: https://hostsfile.org/Downloads/hosts.txt
[+] Domain found in: https://raw.githubusercontent.com/blocklistproject/Lists/master/everything.txt
[+] Domain found in: https://raw.githubusercontent.com/hagezi/dns-blocklists/main/domains/ultimate.txt
[+] Domain found in: https://raw.githubusercontent.com/Ultimate-Hosts-Blacklist/Ultimate.Hosts.Blacklist/master/hosts/hosts0
[+] Domain found in: https://sysctl.org/cameleon/hosts
[+] Domain found in: https://v.firebog.net/hosts/Kowabit.txt
Done
```
We thank all those who have contributed to this project. Those interested can contribute by sending us links to new lists for inclusion in this project.
Special thanks to: Jhonatan Sneider
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Due to recent arbitrary changes in computer terminology, it is necessary to clarify the meaning and connotation of the term blacklist as it relates to this project:
In computing, a blacklist, denylist or blocklist is a basic access control mechanism that allows through all elements (email addresses, users, passwords, URLs, IP addresses, domain names, file hashes, etc.), except those explicitly mentioned. Those items on the list are denied access. The opposite is a whitelist, which means only items on the list are let through whatever gate is being used. Source: Wikipedia
Therefore, blacklist, blocklist, blackweb, blackip, whitelist, and similar terms have nothing to do with racial discrimination.