I have a list of data and I noticed that every item appears twice. Is there a way to remove the duplicates and simplify the list? Below is the Python code I wrote:

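import requests
import re
import bs4

# Download the tag page and stop early on an HTTP error
r = requests.get("http://as.com/tag/moto_gp/a/")
r.raise_for_status()
html = r.text

# Collect every article URL in the page (the dot before "html" is escaped so it matches literally)
matches = re.findall(r"http://motor\.as\.com/motor/\d+/\d+/\d+/motociclismo/\d+_\d+\.html", html)
print(matches)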
asked May 22, 2016 at 10:32

1 Answer

I assume matches is a list. In that case you can simply convert it to a set and back to a list:

In [1]: a = [1,1,2,2,3,3,4,4,5]
In [2]: list(set(a))
Out[2]: [1, 2, 3, 4, 5]

For your code, only one modification is needed:

matches = list(set(re.findall(r"http://motor\.as\.com/motor/\d+/\d+/\d+/motociclismo/\d+_\d+\.html", html)))
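
One caveat with the set approach is that a set does not keep the original order of the links. If the order in which the URLs appear on the page matters, a possible alternative is dict.fromkeys, which keeps the first occurrence of each item (insertion order is guaranteed for dicts from Python 3.7 on). The sketch below is only an illustration: the html string and the two URLs in it are made-up sample data standing in for the downloaded page, and unique_matches is a name chosen here.

```python
import re

# Hypothetical sample data standing in for r.text; these URLs are invented
html = """
<a href="http://motor.as.com/motor/2016/05/22/motociclismo/1463900000_111111.html">first</a>
<a href="http://motor.as.com/motor/2016/05/22/motociclismo/1463900000_111111.html">first again</a>
<a href="http://motor.as.com/motor/2016/05/21/motociclismo/1463800000_222222.html">second</a>
<a href="http://motor.as.com/motor/2016/05/21/motociclismo/1463800000_222222.html">second again</a>
"""

pattern = r"http://motor\.as\.com/motor/\d+/\d+/\d+/motociclismo/\d+_\d+\.html"
matches = re.findall(pattern, html)

# dict keys are unique and keep insertion order, so this drops
# duplicates while preserving the order of first appearance
unique_matches = list(dict.fromkeys(matches))

for url in unique_matches:
    print(url)
```

If the order does not matter, list(set(matches)) remains the shortest option.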
answered May 22, 2016 at 10:37

1 Comment

@SergeiLebedev You are right. It will work with any iterable.
