Commit a9a3595

Perform news search of topic on google and send via email
1 parent 4bc64f3 commit a9a3595

4 files changed: +234 -0 lines changed

‎GoogleSearchNewsletter/README.md

Lines changed: 94 additions & 0 deletions
@@ -0,0 +1,94 @@
# GoogleSearchNewsletter

## What this program does
- Performs a Google search for the topic stated in the config file
- Goes to the News tab of the Google search results
- Extracts each news heading and link from the first results page
- Saves the search results to a file
- Reads the file and emails its contents to the address in the config file
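
The save and read steps can be tried without a browser. The sketch below mirrors the file layout the script writes (two newlines, the heading, a newline, the link); the helper names and the sample headlines are hypothetical and only stand in for scraped results.

```python
from pathlib import Path
import tempfile


def write_newsletter(pairs, path):
    """Write (heading, link) pairs: a blank line, the heading, then the link."""
    with open(path, 'w', encoding='utf-8') as file:
        for heading, link in pairs:
            file.write('\n\n')
            file.write(heading)
            file.write('\n')
            file.write(link)


def read_newsletter(path):
    """Read the whole newsletter file back as one string (the email body)."""
    with open(path, 'r', encoding='utf-8') as file:
        return file.read()


# hypothetical sample data standing in for the scraped headings and links
sample_pairs = [
    ('Example headline one', 'https://example.com/one'),
    ('Example headline two', 'https://example.com/two'),
]

with tempfile.TemporaryDirectory() as tmp_dir:
    newsletter_path = Path(tmp_dir) / 'newsletter.txt'
    write_newsletter(sample_pairs, newsletter_path)
    newsletter_text = read_newsletter(newsletter_path)

print(newsletter_text)
```

Opening the file in a `with` block makes sure it is flushed and closed before anything reads it back.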

## How to use

### 1. Set up Python and modules

Python 3, pip, and the modules listed in `requirements.txt` must be installed on the computer running this script.

Install Python and pip:
```
sudo apt-get install python3
sudo apt-get install python3-pip
```

Install selenium:
```
pip3 install -r requirements.txt
```

### 2. Download browser and driver

You need to have either Firefox or Chrome installed, plus the matching driver for that browser. The script loads the driver from the directory it is run from (`./geckodriver` or `./chromedriver`), so place the downloaded driver there.

For Firefox, download geckodriver:
https://github.com/mozilla/geckodriver/releases

For Chrome, download chromedriver:
https://chromedriver.chromium.org/downloads


### 3. Configure config.ini

Example configuration:
```
[your_settings]
driver = geckodriver
search_topic = Nintendo news
email_subject = My newsletter for Nintendo
email_smtp = smtp.live.com
sender_email_address = sendingemail@live.com
email_password = yourpassword
receiver_email_address = receiveremail@live.com
```

Note:
- `driver` must be either `geckodriver` or `chromedriver`.
- `email_smtp` is the SMTP server of the sender's email provider. The script connects to it on port 587 using STARTTLS.
- See [this link](https://www.arclab.com/en/kb/email/list-of-smtp-and-pop3-servers-mailserver-list.html) for a list of SMTP servers and use the one that matches your email address.
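
Before running the script, it can help to check that every setting is filled in. The following is a minimal sketch, not part of the script: `load_settings`, `REQUIRED_KEYS`, and `SUPPORTED_DRIVERS` are hypothetical names, and the check reads the configuration from a string so it can be tried without touching `config.ini`.

```python
from configparser import ConfigParser

# keys the script reads from the [your_settings] section
REQUIRED_KEYS = (
    'driver',
    'search_topic',
    'email_subject',
    'email_smtp',
    'sender_email_address',
    'email_password',
    'receiver_email_address',
)
SUPPORTED_DRIVERS = ('geckodriver', 'chromedriver')


def load_settings(config_text):
    """Parse config text and return the settings, or raise ValueError."""
    # interpolation=None keeps a '%' in a password from being treated specially
    config = ConfigParser(interpolation=None)
    config.read_string(config_text)
    if not config.has_section('your_settings'):
        raise ValueError('missing [your_settings] section')
    settings = dict(config['your_settings'])
    missing = [key for key in REQUIRED_KEYS if not settings.get(key, '').strip()]
    if missing:
        raise ValueError('missing or empty settings: ' + ', '.join(missing))
    if settings['driver'] not in SUPPORTED_DRIVERS:
        raise ValueError('driver not supported: ' + settings['driver'])
    return settings


example_text = """
[your_settings]
driver = geckodriver
search_topic = Nintendo news
email_subject = My newsletter for Nintendo
email_smtp = smtp.live.com
sender_email_address = sendingemail@live.com
email_password = yourpassword
receiver_email_address = receiveremail@live.com
"""

settings = load_settings(example_text)
print(settings['driver'], '|', settings['search_topic'])
```

To check a real file, pass its contents, for example `load_settings(open('config.ini').read())`.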


### 4. Run program

```
python3 google-search-newsletter.py
```

You should receive an email at the receiver address containing the news headlines and links.
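
To preview what that email looks like without logging in to an SMTP server, the message can be built on its own. This sketch follows the same `MIMEText` steps as the script; `build_message` is a hypothetical helper, and the body text and addresses are placeholders.

```python
from email.mime.text import MIMEText


def build_message(body, subject, sender, receiver):
    """Build a plain-text email with the newsletter text as its body."""
    message = MIMEText(body)
    message['Subject'] = subject
    message['From'] = sender
    message['To'] = receiver
    return message


# placeholder body in the same layout as newsletter.txt
message = build_message(
    '\n\nExample headline\nhttps://example.com/article',
    'My newsletter for Nintendo',
    'sendingemail@live.com',
    'receiveremail@live.com',
)

# print the raw message, headers first, then the body
print(message.as_string())
```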

### 5. Set up a running schedule using the Linux crontab utility

Schedule the time and frequency to run this script. See the [crontab man page](https://linux.die.net/man/5/crontab) for more info.

Open the crontab file for editing. Run on the command line:
```
crontab -e
```

Add the following to the end of the crontab file. This example runs the script every day at 07:05. Edit it according to your needs.
```
# needed if headless=false
DISPLAY=:0

# at 07:05 go to directory of script and run. log output and potential errors to file 'crontab.log'
05 07 * * * cd /pathtoscript/ && python3 google-search-newsletter.py > crontab.log 2>&1
```
Save the crontab file and you will see:
```
crontab: installing new crontab
```
The script should now run every day at 07:05.
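
When editing the schedule, it helps to remember that a crontab entry is five schedule fields followed by the command. The sketch below only labels those fields for a line like the one above; `split_crontab_line` is a hypothetical helper and does not handle shortcuts such as `@daily` or environment lines such as `DISPLAY=:0`.

```python
CRON_FIELD_NAMES = ('minute', 'hour', 'day_of_month', 'month', 'day_of_week')


def split_crontab_line(line):
    """Split a crontab entry into its five named schedule fields and the command."""
    # split on whitespace at most five times; the rest of the line is the command
    parts = line.split(None, 5)
    if len(parts) < 6:
        raise ValueError('expected five schedule fields followed by a command')
    schedule = dict(zip(CRON_FIELD_NAMES, parts[:5]))
    return schedule, parts[5]


schedule, command = split_crontab_line(
    '05 07 * * * cd /pathtoscript/ && python3 google-search-newsletter.py > crontab.log 2>&1'
)

for name in CRON_FIELD_NAMES:
    print(name, '=', schedule[name])
print('command =', command)
```

For example, changing the first two fields to `30 18` would run the same command at 18:30 instead.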


### Tips for developers

The browsers run in headless mode, which means the browser GUI is not opened while the program runs.
To open the GUI while running, do the following:
- If using geckodriver, change `firefox_options.headless = True` to `firefox_options.headless = False`
- If using chromedriver, remove the line `chrome_options.add_argument('--headless')`
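
Instead of editing the source each time, the choice could be driven by a setting. The sketch below is one possible approach and not something the script does today: the `headless` key in `config.ini` and both helper names are assumptions made for illustration.

```python
from configparser import ConfigParser


def read_headless_flag(config_text):
    """Return the hypothetical 'headless' setting, defaulting to True when absent."""
    config = ConfigParser()
    config.read_string(config_text)
    return config.getboolean('your_settings', 'headless', fallback=True)


def chrome_arguments(headless):
    """Return the arguments to pass, one by one, to chrome_options.add_argument()."""
    return ['--headless'] if headless else []


headless = read_headless_flag('[your_settings]\nheadless = false\n')
print(headless, chrome_arguments(headless))
```

For geckodriver the same flag could be assigned directly, as in `firefox_options.headless = headless`.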

‎GoogleSearchNewsletter/config.ini

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
[your_settings]
driver =
search_topic =
email_subject =
email_smtp =
sender_email_address =
email_password =
receiver_email_address =

GoogleSearchNewsletter/google-search-newsletter.py

Lines changed: 130 additions & 0 deletions
@@ -0,0 +1,130 @@
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.firefox.options import Options as Options_firefox
from selenium.webdriver.chrome.options import Options as Options_chrome
from email.mime.text import MIMEText
from configparser import ConfigParser
import smtplib


newsletter_file = 'newsletter.txt'
config_file = 'config.ini'
config = ConfigParser()
config.read(config_file)


def scrape_news():

    # get user settings
    driver = config.get('your_settings', 'driver')
    search_topic = config.get('your_settings', 'search_topic')

    # set up driver (the driver binary is expected in the current directory)
    PATH_TO_DRIVER = "./%s" % driver

    if driver == 'geckodriver':
        firefox_options = Options_firefox()

        # run in headless mode
        firefox_options.headless = True

        # disable cookies to prevent popups
        firefox_pref = webdriver.FirefoxProfile()
        firefox_pref.set_preference("network.cookie.cookieBehavior", 2)

        browser = webdriver.Firefox(executable_path=PATH_TO_DRIVER, options=firefox_options, firefox_profile=firefox_pref)

    elif driver == 'chromedriver':
        chrome_options = Options_chrome()

        # run in headless mode
        chrome_options.add_argument('--headless')

        # disable cookies to prevent popups
        chrome_options.add_experimental_option('prefs', {'profile.default_content_setting_values.cookies': 2})

        browser = webdriver.Chrome(executable_path=PATH_TO_DRIVER, options=chrome_options)

    else:
        # stop here: without a browser the code below cannot run
        raise SystemExit('ERROR: driver not supported')

    print('Getting search results...')

    # wait up to 5 seconds for elements to appear (applies to every lookup below)
    browser.implicitly_wait(5)

    # open URL
    browser.get('https://google.com')

    # select google search bar
    google_search = browser.find_element_by_name('q')

    # type news topic to search
    google_search.send_keys(search_topic)
    google_search.send_keys(Keys.ENTER)

    # go to the news tab of the search results
    browser.find_element_by_css_selector('a[data-sc="N"]').click()

    # get all elements containing news title
    all_headings = browser.find_elements_by_xpath('//div[contains(@role, "heading") and contains(@aria-level, "2")]')

    # get all elements containing links for each news title
    all_links = browser.find_elements_by_xpath('//g-card/div/div/div[2]/a')

    # open file for writing; the with block closes it when done
    with open(newsletter_file, 'w') as file:

        # loop over each title and link, print each to the file
        for heading, link in zip(all_headings, all_links):
            file.write('\n\n')
            file.write(heading.text)
            file.write('\n')
            file.write(link.get_attribute('href'))

    # quit() closes the window and also ends the driver process
    browser.quit()
    print('Done. Search results exported to "newsletter.txt"')


def send_email():

    print('Sending email...')

    # get user settings
    email_subject = config.get('your_settings', 'email_subject')
    email_smtp = config.get('your_settings', 'email_smtp')
    sender_email_address = config.get('your_settings', 'sender_email_address')
    email_password = config.get('your_settings', 'email_password')
    receiver_email_address = config.get('your_settings', 'receiver_email_address')

    # newsletter file will be sent by email
    with open(newsletter_file, 'r') as file:
        file_content = file.read()

    # configure mail
    message = MIMEText(file_content)
    message['Subject'] = email_subject
    message['From'] = sender_email_address
    message['To'] = receiver_email_address

    # set smtp server (port 587, upgraded to TLS with STARTTLS)
    server = smtplib.SMTP(email_smtp, 587)
    server.ehlo()
    server.starttls()

    # send email
    server.login(sender_email_address, email_password)
    server.send_message(message)
    server.quit()

    print("Email sent!")


if __name__ == "__main__":
    scrape_news()
    send_email()

‎GoogleSearchNewsletter/requirements.txt

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
selenium==3.141.0
