I'm looking to pull all the YouTube links from a string of text and was wondering how this looks:
if (preg_match_all('/(https?:\/\/)?(www\.)?(youtube\.com)\/watch\?v=([a-zA-Z0-9_-]+)[^\s]*/im', $this->content, $matches)) {
}
Obviously this doesn't take into account the youtu.be links, as they are formatted differently.
- There is also youtube.com/v/... This is an embed link. – Devon, Apr 8, 2015 at 14:58
2 Answers
- [^\s] is effectively just \S
- For checking against youtu.be links, you can have: youtu(\.be|be\.com)
- If you do not want to store every matched group, use (?:...)
- [A-Za-z0-9_] can be simplified to \w
- Use a different delimiter than / so that you won't have to escape it in the pattern.
Therefore:
preg_match_all(
'@(https?://)?(?:www\.)?(youtu(?:\.be/([-\w]+)|be\.com/watch\?v=([-\w]+)))\S*@im',
$this->content,
$matches
)
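With the pattern above, the video ID lands in a different capture group depending on which alternative matched. As a minimal sketch (not part of the original pattern), a PCRE branch-reset group `(?|...)` makes both alternatives capture into the same group, so the IDs can be read from one array. The helper name `extractYoutubeIds` is hypothetical, and the `m` flag is dropped here only because the pattern has no `^` or `$` anchors.

```php
<?php
// Hypothetical helper: returns the video IDs of youtube.com/watch?v=... and
// youtu.be/... links found in $content, in order of appearance.
function extractYoutubeIds(string $content): array
{
    // (?|...) is a branch-reset group: the capture in each alternative
    // is numbered 1, so the ID is always in $matches[1].
    $pattern = '@(?:https?://)?(?:www\.)?youtu(?|\.be/([-\w]+)|be\.com/watch\?v=([-\w]+))\S*@i';

    preg_match_all($pattern, $content, $matches);

    return $matches[1];
}
```

Calling `extractYoutubeIds($this->content)` would then give a flat list of IDs without having to merge two groups afterwards.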
- Some good tips, thanks! As for youtu.be, I originally had that in there but removed it, as it has a different format and doesn't have the watch?v= part; they are formatted as youtu.be/diJ8dooR, for example. Rather than complicating the pattern I will probably put this in another preg_match_all. – Brett, Apr 8, 2015 at 6:54
A YouTube /watch URL can have query string parameters besides the video ID v. For example, hd=1 requests a high-definition stream. There can be a t=4m33s parameter to make the video player start at the indicated point in the video. You can even use wadsworth=1 to apply the Wadsworth Constant.
Technically, query string parameters can occur in any order, and the resulting URLs are generally considered semantically identical. Therefore, you should handle the possibility that some other query parameters might come between /watch? and v=.
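One way to handle this, sketched under the assumption that only youtube.com/watch links matter here, is to let the regex grab the whole query string and leave the parameter lookup to `parse_str`, which does not care about ordering. The helper name `extractWatchIds` is hypothetical.

```php
<?php
// Hypothetical helper: finds youtube.com/watch links in $content and returns
// the value of the v parameter, wherever it appears in the query string.
function extractWatchIds(string $content): array
{
    $ids = [];

    // Capture everything after "watch?" up to the next whitespace.
    preg_match_all('@(?:https?://)?(?:www\.)?youtube\.com/watch\?(\S+)@i', $content, $matches);

    foreach ($matches[1] as $query) {
        // Drop a fragment such as "#t=30" before parsing the query string.
        $query = explode('#', $query, 2)[0];
        parse_str($query, $params);

        // Keep only the leading run of ID characters, so trailing
        // punctuation from the surrounding text is not included.
        if (isset($params['v']) && is_string($params['v'])
            && preg_match('/^[-\w]+/', $params['v'], $m)) {
            $ids[] = $m[0];
        }
    }

    return $ids;
}
```

A link without a v parameter is simply skipped, so the result only contains IDs that were actually present.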
- Yeah, I'm aware of that and was going to account for it, but in every link I have seen I haven't encountered one where the v doesn't come first. I guess if you absolutely have to make sure to find the links then you should account for this, but in my case I'm not too worried about it. – Brett, Apr 8, 2015 at 6:58