The task is simple: extract the link to the pronunciation audio for a word from a Yahoo Dictionary web page, e.g. the Yahoo Dictionary entry for "real".

Using ChroPath, I can locate the XPath of the element that contains the ".mp3" src link. The XPath is:

//div[@class='compText ml-10 d-ib']//span[contains(@class,'d-ib dict-sound va-mid audio-0')]

However, when I run the code below, the find_element_by_xpath call seems to return nothing. (Note the SoundURL line.)

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# Create a Chrome options object so the browser can run headless
options = Options()
options.add_argument("--headless")
word = "real"
print("start driver...", end='')
# Raw string so the backslashes in the Windows path are taken literally
driver = webdriver.Chrome(options=options, executable_path=r"F:\Python_Module\chromedriver.exe")
driver.get('https://hk.dictionary.search.yahoo.com/search?p=' + word)
Pronunciation = driver.find_element_by_class_name("fz-14").text
Meaning = driver.find_element_by_xpath("//div[@class='compList mb-25 ml-25 p-rel']//ul").get_attribute('innerHTML')
# This is the lookup that does not return the mp3 link
SoundURL = driver.find_element_by_xpath("//div[@class='compText ml-10 d-ib']//span[contains(@class,'d-ib dict-sound va-mid audio-0')]").get_attribute('innerHTML')
print("Print Function started")
print("begin pronunciation")
print(Pronunciation)
print("end pronunciation")
print("begin Meaning")
print(Meaning)
print("end Meaning")
print("begin sound")
print(SoundURL)
print("end sound")

As shown in the screencap, I would like to extract the following element:

<audio src="https://s.yimg.com/bg/dict/ox/mp3/v1/real@_us_2.mp3" xpath="1"></audio>

Source HTML viewed in Chrome

asked Feb 5, 2019 at 15:21

1 Answer

The problem is that for the first 1-2 seconds after the page loads, the span element is present in the DOM, but its audio child element hasn't been injected yet.

You can verify this by adding time.sleep(3) before you read the SoundURL variable.

How you solve this in your script depends on your requirements. There are basically three options:

  1. time.sleep() - simple but inefficient
  2. Selenium implicit wait
  3. Selenium explicit wait - more complicated to set up but efficient

If you want to learn about Selenium waits, refer here: link

With a wait strategy, you'll probably want to find the audio element itself rather than reaching it through the containing span element. Here's an example along those lines (using an implicit wait):

driver.implicitly_wait(3)
sound_url = driver.find_element_by_tag_name('audio').get_attribute('src')
# sound_url now contains 'https://s.yimg.com/bg/dict/ox/mp3/v1/real@_us_2.mp3'
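
For comparison, here is a minimal sketch of the explicit-wait idea from option 3. It hand-rolls the polling loop so the logic is visible; the function name wait_for_audio_src and its timeout and poll parameters are hypothetical, and the only things assumed about driver are that it has find_elements(by, value) and that its elements have get_attribute(name), as Selenium's WebDriver and WebElement do.

```python
import time


def wait_for_audio_src(driver, timeout=10.0, poll=0.5):
    """Poll until an <audio> element with a non-empty src exists, then return that src.

    Hypothetical helper: `driver` is any object exposing find_elements(by, value),
    such as a Selenium WebDriver.
    """
    deadline = time.monotonic() + timeout
    while True:
        # "tag name" is the string value of selenium's By.TAG_NAME constant
        for elem in driver.find_elements("tag name", "audio"):
            src = elem.get_attribute("src")
            if src:
                return src
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no <audio> element with a src after {timeout} s")
        # Sleep briefly so the loop does not hammer the browser
        time.sleep(poll)
```

In a real script you would normally use Selenium's built-in equivalent instead, e.g. WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.TAG_NAME, "audio"))), with EC being selenium.webdriver.support.expected_conditions. Either way, the wait returns as soon as the element shows up instead of always sleeping for the full delay.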
answered Feb 5, 2019 at 20:24
  • Thanks. I tried your method, but after a 10-second delay, the element I get using the XPath of the span class is the following: <selenium.webdriver.remote.webelement.WebElement (session="0ceb1228ebde4600e44bb05202a7f2f0", element="0.2536322356105647-3")> Commented Feb 6, 2019 at 1:45
  • @KCT Is that not the element you want? I can't tell which element that is without looking at its attributes. Commented Feb 6, 2019 at 1:51
  • No, that's not the element I want. The element I want is the audio element whose src contains the mp3 URL, as circled in my screencap. Even with the delay, the element still does not contain any mp3 link as it should. Commented Feb 6, 2019 at 12:39
  • Okay. I tried again using your code (by tag name) and got the URL. However, the same page has two elements with the tag name 'audio', so the tag name is not unique. How do I get the second URL? Thanks again. Commented Feb 6, 2019 at 14:21
  • @KCT you can use a CSS Selector or Xpath to find the 2nd one specifically, or you can use the same approach as above but change element to elements so it returns a list of all matching elements, e.g. audio_elems = driver.find_elements_by_tag_name('audio'), then to get the 2nd URL, do print(audio_elems[1].get_attribute('src')) Commented Feb 6, 2019 at 22:01
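
The list-based approach from the last comment can be sketched as a small helper. The name audio_urls is hypothetical, and the sketch assumes only that driver exposes find_elements(by, value), as a Selenium WebDriver does. It should be called after the wait has succeeded, since elements injected later will not be in the list.

```python
def audio_urls(driver):
    """Return the src of every <audio> element on the current page, in document order.

    Hypothetical helper: `driver` is any object exposing find_elements(by, value).
    """
    urls = []
    # "tag name" is the string value of selenium's By.TAG_NAME constant
    for elem in driver.find_elements("tag name", "audio"):
        src = elem.get_attribute("src")
        if src:  # skip <audio> elements whose src has not been set
            urls.append(src)
    return urls
```

The second URL is then urls[1], after checking that len(urls) > 1 so a page with a single audio element does not raise an IndexError.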
