How to get src link using Selenium?

Question 1

The task is simple: extract the link to the pronunciation audio for a word from a Yahoo Dictionary page, e.g. the Yahoo Dictionary entry for "real".

Using ChroPath, I can locate the XPath of the element that contains the ".mp3" src link. The XPath is

//div[@class='compText ml-10 d-ib']//span[contains(@class,'d-ib dict-sound va-mid audio-0')]

However, when I run the code below, the find_element_by_xpath call seems to return nothing useful. (Note the SoundURL line.)

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# Instantiate a Chrome options object so the headless preference can be set
options = Options()
options.add_argument("--headless")
word = "real"
print("start driver...", end='')
# Raw string so the backslashes in the Windows path are kept literally
driver = webdriver.Chrome(options=options, executable_path=r"F:\Python_Module\chromedriver.exe")
driver.get('https://hk.dictionary.search.yahoo.com/search?p=' + word)
Pronunciation = driver.find_element_by_class_name("fz-14").text
Meaning = driver.find_element_by_xpath("//div[@class='compList mb-25 ml-25 p-rel']//ul").get_attribute('innerHTML')
# This is the lookup that does not return the expected <audio> markup
SoundURL = driver.find_element_by_xpath("//div[@class='compText ml-10 d-ib']//span[contains(@class,'d-ib dict-sound va-mid audio-0')]").get_attribute('innerHTML')
print("Print Function started")
print("begin pronunciation")
print(Pronunciation)
print("end pronunciation")
print("begin Meaning")
print(Meaning)
print("end Meaning")
print("begin sound")
print(SoundURL)
print("end sound")

As shown in the screencap, I would like to extract the following element:

<audio src="https://s.yimg.com/bg/dict/ox/mp3/v1/real@_us_2.mp3" xpath="1"></audio>

Source HTML viewed in Chrome

Answer 1

The problem is that for 1-2 seconds, the span element is present in the DOM, but the audio child element hasn't been injected yet.

You can verify this by adding a time.sleep(3) before grabbing your SoundURL variable.

How you want to solve this problem in your script depends on your requirements. There are basically three options:

time.sleep() - simple but inefficient
selenium implicit wait
selenium explicit wait - more complicated to setup but efficient

If you want to learn about Selenium waits, refer here: link

With a wait strategy, you'll probably want to find the audio element itself rather than getting it through the containing span element. Here's an example along those lines (using implicit wait):

driver.implicitly_wait(3)
sound_url = driver.find_element_by_tag_name('audio').get_attribute('src')
# sound_url now contains 'https://s.yimg.com/bg/dict/ox/mp3/v1/real@_us_2.mp3'
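
For comparison, the explicit-wait option listed above can be sketched as a small polling loop. This is only an illustration of the idea, not Selenium's own implementation: the helper name wait_for_audio_srcs and its timeout and poll parameters are made up for this example, and driver is assumed to be any object exposing find_elements(by, value), as a Selenium WebDriver does.

```python
import time


def wait_for_audio_srcs(driver, timeout=10.0, poll=0.5):
    """Poll until at least one <audio> element with a non-empty src exists.

    `driver` is assumed to be a Selenium WebDriver (any object exposing
    find_elements(by, value) works). Returns the list of src URLs, or
    raises TimeoutError if none appear within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        # "tag name" is the locator string behind Selenium's By.TAG_NAME
        elements = driver.find_elements("tag name", "audio")
        srcs = [el.get_attribute("src") for el in elements]
        srcs = [src for src in srcs if src]  # drop elements with no src yet
        if srcs:
            return srcs
        if time.monotonic() >= deadline:
            raise TimeoutError("no <audio> element with a src after %.1f s" % timeout)
        time.sleep(poll)
```

With a real driver this would be called as sound_url = wait_for_audio_srcs(driver)[0]. Unlike time.sleep(3), the loop returns as soon as the element shows up. In an actual script, Selenium's built-in WebDriverWait (in selenium.webdriver.support.ui) does the same kind of polling and is probably the better choice.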

Comment 1 (KCT, Feb 6, 2019 at 1:45)

Thanks. I tried your method, but after a 10-second delay, the element I get using the XPath of the span class is the following: <selenium.webdriver.remote.webelement.WebElement (session="0ceb1228ebde4600e44bb05202a7f2f0", element="0.2536322356105647-3")>

Comment 2 (Mike B, Feb 6, 2019 at 1:51)

@KCT Is that not the element you want? I can't tell which element that is without looking at its attributes.

Comment 3 (KCT, Feb 6, 2019 at 12:39)

No, that's not the element I want. The element I want is the audio element whose src contains the mp3 URL, as circled in my screencap. Even with the delay, the element still does not contain the mp3 link it should.

Comment 4 (KCT, Feb 6, 2019 at 14:21)

Okay. I tried again using your code (by tag name) and got the URL. However, the same page contains two 'audio' elements, so the tag name 'audio' is not unique. How do I get the second URL? Thanks again.

Comment 5 (Mike B, Feb 6, 2019 at 22:01)

@KCT You can use a CSS selector or XPath to find the second one specifically, or you can use the same approach as above but change element to elements so that it returns a list of all matching elements, e.g. audio_elems = driver.find_elements_by_tag_name('audio'). Then, to get the second URL, do print(audio_elems[1].get_attribute('src')).
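
The list-based approach from the last comment can be wrapped in a small helper. This is a minimal sketch: the names audio_srcs and nth_audio_src are made up for illustration, and driver is assumed to be a Selenium WebDriver. It uses the generic find_elements(by, value) call rather than find_elements_by_tag_name, because the find_elements_by_* helpers are no longer available in newer Selenium 4 releases.

```python
def audio_srcs(driver):
    """Return the src attribute of every <audio> element, in document order."""
    # "tag name" is the locator string behind Selenium's By.TAG_NAME
    audio_elems = driver.find_elements("tag name", "audio")
    return [elem.get_attribute("src") for elem in audio_elems]


def nth_audio_src(driver, index):
    """Return the src of the <audio> element at 0-based position `index`."""
    srcs = audio_srcs(driver)
    if not 0 <= index < len(srcs):
        raise IndexError(
            "page has %d <audio> element(s), index %d requested" % (len(srcs), index)
        )
    return srcs[index]
```

With a real driver, nth_audio_src(driver, 1) would return the second URL. The explicit bounds check gives a clearer error than a bare audio_elems[1] when the second element has not been injected yet. An XPath such as (//audio)[2] should select the same element directly.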

Mike B · Answer 1 · answered 2019-02-05 20:24:34Z · edited Feb 6, 2019 at 1:54



