Commit b0249e4

Merge pull request #61 from DevMahmoud10/main
add: links extractor automation script
2 parents 5725072 + cc6964b commit b0249e4

5 files changed: +90 -0 lines changed

‎links_extractor/README.md‎

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
# Links Extractor

## Objective
This script automates extracting URLs from the content of any ```.txt``` file using a regular expression, then writes the extracted URLs to a ```.txt``` output file, one URL per line.
## Sample
- Sample input is available in ```sample/sample_text_file.txt```
- Sample output is available in ```sample/sample_text_file_links.txt```
## Requirements
```pip install -r requirements.txt```
## How to run the script?
```
python links_extractor.py file_name.txt
```
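
To make the Objective concrete, here is a minimal sketch of what "extracting URLs based on a regex" amounts to. It reuses the pattern from `links_extractor.py` in this commit; the input string is made up for illustration and is not part of the repository. Because the scheme is optional in that pattern, any dotted token such as a version number also matches, which may or may not be what you want.

```python
import re

# Pattern copied from links_extractor.py in this commit.
URL_PATTERN = r"(?:(?:https?|ftp):\/\/)?[\w/\-?=%.]+\.[\w/\-?=%.]+"

# Hypothetical input text, not taken from the repository.
text = (
    "Docs: https://example.com/guide?page=2\n"
    "Mirror: ftp://files.example.org/pub\n"
    "Release v1.2 is out"
)

# findall returns whole matches here because the pattern has no capturing groups.
matches = re.findall(URL_PATTERN, text)
for match in matches:
    print(match)
```

The last printed item is `v1.2`, not a URL. Requiring the scheme (dropping the trailing `?` after the first group) would be one way to tighten this, at the cost of missing bare domains.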

‎links_extractor/links_extractor.py‎

Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
import re
import sys


def get_urls(file_path):
    """[entry point: read the file, extract its urls, export them]

    Arguments:
        file_path {[str]} -- [target text file path]
    """
    text = read_text_file(file_path)
    urls = extract_urls(text)
    export_urls(urls, file_path)


def read_text_file(file_path):
    """[read the whole text file into a string]

    Arguments:
        file_path {[str]} -- [target text file path]

    Returns:
        [str] -- [file content to work on]
    """
    with open(file_path) as f:
        text = f.read()
    return text


def extract_urls(text):
    """[find every substring of the text that matches the url pattern]

    Arguments:
        text {[str]} -- [file content to work on]

    Returns:
        [list] -- [extracted urls]
    """
    url_regex_pattern = r"(?:(?:https?|ftp):\/\/)?[\w/\-?=%.]+\.[\w/\-?=%.]+"
    urls = re.findall(url_regex_pattern, text)
    return urls


def export_urls(urls, file_path):
    """[write the urls, one per line, to <input name>_links.txt]

    Arguments:
        urls {[list]} -- [extracted urls]
        file_path {[str]} -- [source text file path; the output path is derived from it]
    """
    with open(file_path.replace(".txt", "_links.txt"), "w") as f:
        f.write("\n".join(urls))


if __name__ == "__main__":
    file_path = sys.argv[1]
    get_urls(file_path)
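
`export_urls` derives the output name with `str.replace`. When the input path contains no `.txt`, that call returns the path unchanged, so the result would be written over the input file. One possible alternative is sketched below with `pathlib`; `derive_output_path` is a hypothetical helper name and is not part of this commit.

```python
from pathlib import Path


def derive_output_path(file_path):
    """Return '<stem>_links.txt' next to the input file (hypothetical helper)."""
    path = Path(file_path)
    # with_name swaps only the final path component, so directory names
    # that happen to contain ".txt" are left untouched.
    return path.with_name(path.stem + "_links.txt")


print(derive_output_path("sample/sample_text_file.txt").name)
print(derive_output_path("notes.md").name)
```

With this approach the output name always ends in `_links.txt`, regardless of the input extension.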

‎links_extractor/requirements.txt‎

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
regex==2020.9.27
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
New album 'Heart To Mouth" is out now: https://lp.lnk.to/HeartToMouthID


Lost On You: http://smarturl.it/LostOnYouAlbum

----------------------------------

Website: http://iamlp.com
Facebook: http://facebook.com/iamLP
Twitter: http://twitter.com/iamlp
Soundcloud: https://soundcloud.com/iamlpmusic
Suggested by WMG
LP - Muddy Waters [Live Session]
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
https://lp.lnk.to/HeartToMouthID
http://smarturl.it/LostOnYouAlbum
http://iamlp.com
http://facebook.com/iamLP
http://twitter.com/iamlp
https://soundcloud.com/iamlpmusic
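
As a quick consistency check, the sketch below runs the pattern from `links_extractor.py` over the sample input text (inlined here rather than read from disk) and compares the result with the six lines of the sample output. The variable names are illustrative only.

```python
import re

# Pattern copied from links_extractor.py in this commit.
URL_PATTERN = r"(?:(?:https?|ftp):\/\/)?[\w/\-?=%.]+\.[\w/\-?=%.]+"

# Sample input from this commit, inlined as a string.
sample_text = """New album 'Heart To Mouth" is out now: https://lp.lnk.to/HeartToMouthID


Lost On You: http://smarturl.it/LostOnYouAlbum

----------------------------------

Website: http://iamlp.com
Facebook: http://facebook.com/iamLP
Twitter: http://twitter.com/iamlp
Soundcloud: https://soundcloud.com/iamlpmusic
Suggested by WMG
LP - Muddy Waters [Live Session]
"""

# Sample output from this commit, one URL per line.
expected = [
    "https://lp.lnk.to/HeartToMouthID",
    "http://smarturl.it/LostOnYouAlbum",
    "http://iamlp.com",
    "http://facebook.com/iamLP",
    "http://twitter.com/iamlp",
    "https://soundcloud.com/iamlpmusic",
]

found = re.findall(URL_PATTERN, sample_text)
print(found == expected)
```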
