Script to loop through a list of YouTube channels, and output metadata to a Markdown file

Question 1

Overview

I have created a bash script (triggered via GitHub Actions) that does the following:

Parse a list of YouTube channel IDs and nicknames.
Fetch their metadata via YouTube's Channel API.
Build up Markdown tables using this metadata.
Load a Markdown template, and replace a placeholder with the generated Markdown.

Additional functionality:

Display an optional emoji next to specific channel names (${ARRAY_LINE[2]}).
Format numbers to be human-readable (e.g. 1200 -> 1.2K).
Log channels being processed.

Whilst I've run the code through ShellCheck and made other improvements, I suspect there are weaknesses around:

Parsing output.json 5x, fetching a different field each time.
Replacing the placeholder text.

Code

The script itself (youtube-update.sh):

#!/bin/bash
HEADER_PREFIX="#### "
PLACEHOLDER_TEXT="dynamic-channel-data"
OUTPUT=""
# Convert list of channels into Markdown tables
while read -r LINE; do
    if [[ ${LINE} == ${HEADER_PREFIX}* ]]; then
        echo "Adding header ${LINE}"
        OUTPUT="${OUTPUT}\n${LINE}\n\n"
        OUTPUT="${OUTPUT}| Channel | # Videos | Subscribers | Views |\n| --- | --- | --- | --- |\n"
    else
        IFS=';' read -r -a ARRAY_LINE <<< "${LINE}" # Split line by semi-colon
        echo "Adding channel ${ARRAY_LINE[1]} (${ARRAY_LINE[0]})"
        curl "https://youtube.googleapis.com/youtube/v3/channels?part=statistics,snippet&id=${ARRAY_LINE[0]}&key=${API_KEY}" \
            --header 'Accept: application/json' \
            -fsSL -o output.json
        # Pull channel data out of response if possible
        if [[ $(jq -r '.pageInfo.totalResults' output.json) == 1 ]]; then
            TITLE=$(jq -r '.items[0].snippet.title' output.json)
            URL=$(jq -r '.items[0].snippet.customUrl' output.json)
            VIDEO_COUNT=$(jq -r '.items[0].statistics.videoCount' output.json | numfmt --to=si)
            SUBSCRIBER_COUNT=$(jq -r '.items[0].statistics.subscriberCount' output.json | numfmt --to=si)
            VIEW_COUNT=$(jq -r '.items[0].statistics.viewCount' output.json | numfmt --to=si)
            echo "Added ${TITLE}: ${VIDEO_COUNT} videos (${VIEW_COUNT} views)"
            OUTPUT="${OUTPUT}| ${ARRAY_LINE[2]}[${TITLE}](https://youtube.com/${URL}) | ${VIDEO_COUNT} | ${SUBSCRIBER_COUNT} | ${VIEW_COUNT} |\n"
        else
            echo "Failed! Bad response received: $(<output.json)"
            exit 1
        fi
    fi
done < "${WORKSPACE}/automation/channels.txt"
# Replace placeholder in template with output, updating the README
TEMPLATE_CONTENTS=$(<"${WORKSPACE}/automation/template.md")
echo -e "${TEMPLATE_CONTENTS//${PLACEHOLDER_TEXT}/${OUTPUT}}" > "${WORKSPACE}/README.md"
# Debug
cat "${WORKSPACE}/README.md"

For additional context, this script is triggered via a GitHub Actions workflow (metadata-update.yml):

name: Update YouTube stats
on:
  schedule:
    - cron: '0 8 * * *'
  workflow_dispatch:
jobs:
  metadata-update:
    runs-on: ubuntu-latest
    permissions:
      contents: write
    steps:
      - name: Checkout channel config file
        uses: actions/[email protected]
        with:
          sparse-checkout: |
            automation/*
            README.md
          sparse-checkout-cone-mode: false
      - name: Update YouTube data
        run: |
          chmod +x ./automation/youtube-update.sh
          ./automation/youtube-update.sh
        env:
          API_KEY: ${{ secrets.API_KEY }}
          WORKSPACE: ${{ github.workspace }}
      - name: Save changes
        uses: stefanzweifel/git-auto-commit-action@v4
        with:
          commit_message: Updated YouTube statistics
          commit_author: GitHub Actions <[email protected]>
          file_pattern: 'README.md'

The YouTube API response (truncated to relevant fields) looks like:

{
  "pageInfo": {
    "totalResults": 1
  },
  "items": [
    {
      "snippet": {
        "title": "Google for Developers",
        "customUrl": "@googledevelopers"
      },
      "statistics": {
        "viewCount": "234466180",
        "subscriberCount": "2300000",
        "videoCount": "5807"
      }
    }
  ]
}

Examples

A typical run might convert:

#### Stream Archives
UC2oWuUSd3t3t5O3Vxp4lgAA;2018-streams;🐶
UC4ik7iSQI1DZVqL18t-Tffw;2016-2018streams
UCjyrSUk-1AGjALTcWneRaeA;2016-2017streams

into:

#### Stream Archives
| Channel | # Videos | Subscribers | Views |
| --- | --- | --- | --- |
| 🐶[Jerma Stream Archive](https://youtube.com/@jermastreamarchive) | 770 | 274K | 88M |
| [Ster/Jerma Stream Archive](https://youtube.com/@sterjermastreamarchive) | 972 | 47K | 20M |
| [starkiller201096x](https://youtube.com/@starkiller201096x) | 79 | 2.9K | 1.5M |

Answer

Nice script!

Read multiple values by name rather than into an array

Instead of:

IFS=';' read -r -a ARRAY_LINE <<< "${LINE}" # Split line by semi-colon

You could get the values directly into variables with descriptive names:

IFS=';' read -r channel_id channel_name emoji <<< "${line}"
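As a small illustration of how this behaves with the channels.txt format from the question, here is a minimal sketch. The helper name describe_channel is hypothetical and exists only for this demonstration; the sample lines are taken from the question's example.

```bash
#!/bin/bash
# Split one "id;nickname;emoji" line into named variables and print them.
# describe_channel is a hypothetical helper used only for this illustration.
describe_channel() {
    local line=$1
    local channel_id channel_name emoji
    IFS=';' read -r channel_id channel_name emoji <<< "${line}"
    # emoji is simply empty when the line has no third field
    echo "id=${channel_id} name=${channel_name} emoji=${emoji:-none}"
}

describe_channel 'UC2oWuUSd3t3t5O3Vxp4lgAA;2018-streams;🐶'
describe_channel 'UC4ik7iSQI1DZVqL18t-Tffw;2016-2018streams'
```

The second call shows that a missing optional field needs no special handling: the variable is just left empty. One thing to keep in mind is that if a line ever had more than three fields, the last variable would receive the remainder of the line, separators included.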

Read multiple values from a single `jq` call

You could read multiple values with a single call by making jq print all the relevant fields, and using read, for example:

{
 read -r title
 read -r url
} < <(jq -r '.items[0].snippet.title, .items[0].snippet.customUrl' < output.json)

To avoid the line becoming too long, I would put the fields into an array like this:

jq_fields=(
 '.items[0].snippet.title'
 '.items[0].snippet.customUrl'
 '.items[0].statistics.videoCount'
 '.items[0].statistics.subscriberCount'
 '.items[0].statistics.viewCount'
)
{
 read -r title
 read -r url
 read -r video_count
 read -r subscriber_count
 read -r view_count
} < <(IFS=','; jq -r "${jq_fields[*]}" < output.json)
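The same idea can probably be stretched to cover the totalResults check as well, so that the response is parsed only once. The following is a sketch under assumptions, not a drop-in replacement: it writes the sample response from the question to a temporary file, and the variable names response_file and total_results are my own.

```bash
#!/bin/bash
# Sample response copied from the question, written to a temporary file.
response_file=$(mktemp)
cat > "${response_file}" <<'EOF'
{
  "pageInfo": { "totalResults": 1 },
  "items": [
    {
      "snippet": { "title": "Google for Developers", "customUrl": "@googledevelopers" },
      "statistics": { "viewCount": "234466180", "subscriberCount": "2300000", "videoCount": "5807" }
    }
  ]
}
EOF

# A single jq call prints totalResults followed by the five fields, one per line.
{
    read -r total_results
    read -r title
    read -r url
    read -r video_count
    read -r subscriber_count
    read -r view_count
} < <(jq -r '.pageInfo.totalResults,
             (.items[0] | .snippet.title, .snippet.customUrl,
              .statistics.videoCount, .statistics.subscriberCount, .statistics.viewCount)' \
      "${response_file}")

if [[ ${total_results} == 1 ]]; then
    echo "${title} (${url}): ${video_count} videos, ${subscriber_count} subscribers, ${view_count} views"
else
    echo "Unexpected response: totalResults=${total_results}"
fi
rm -f "${response_file}"
```

Because each value is read from its own line, this relies on none of the fields containing a newline, which seems a safe assumption for channel titles and counts.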

Accumulate lines in an array

Accumulating the lines in the string variable OUTPUT is fine.

I prefer to use an array in situations like this. It would look something like this:

output=()
# ...
output+=("${line}")
output+=("")
output+=("| Channel | # Videos | Subscribers | Views |")
output+=("| --- | --- | --- | --- |")
# ...
output+=("| ${emoji}[${title}](https://youtube.com/${url}) | ${video_count} | ${subscriber_count} | ${view_count} |")
# ...
(
 IFS=$'\n'
 echo "${template_content//${placeholder_text}/${output[*]}}"
) >"${WORKSPACE}/README.md"
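To see the array approach end to end, here is a self-contained sketch. It is a variation rather than the exact code above: it joins the array with printf instead of changing IFS, and the template text and the example table row are made-up sample data standing in for template.md and a real channel.

```bash
#!/bin/bash
placeholder_text="dynamic-channel-data"
# Stand-in for the contents of template.md (hypothetical sample text).
template_content=$'# Channels\n\ndynamic-channel-data\n\nUpdated daily.'

# One array element per output line; no embedded \n escapes are needed.
output=(
    "#### Stream Archives"
    ""
    "| Channel | # Videos | Subscribers | Views |"
    "| --- | --- | --- | --- |"
    "| [Example](https://youtube.com/@example) | 770 | 274K | 88M |"
)

# Join the array with newlines; command substitution drops the final newline.
joined=$(printf '%s\n' "${output[@]}")
rendered="${template_content//${placeholder_text}/${joined}}"
printf '%s\n' "${rendered}"
```

Since every element is already a complete line, the result can be written with plain printf or echo, without relying on echo -e to interpret escape sequences.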

Do not use ALL_CAPS names for your variables

To avoid conflict and confusion with system environment variables, it's recommended not to use ALL_CAPS names for script variables.

By making the script's own variables lowercase, it becomes clear what is expected to be present in the environment and what belongs to the script, which improves readability.

Put repeatedly used constant values into variables

output.json is referenced in multiple places. To keep open the option of using a different name or path, I would put it into a variable. This way code editors can also help you avoid typos.

Define important constants in variables early in the file

The paths "${WORKSPACE}/automation/channels.txt" and "${WORKSPACE}/README.md" are key to the behavior of the script. To make them easy to see (and adjust), I would put these values in variables defined near the top of the file, right alongside header_prefix and placeholder_text.
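Putting the last three points together, the top of the script might look roughly like the sketch below. The names channels_file, template_file, readme_file and response_file are only suggestions, and the fallback value for WORKSPACE is an assumption added so the snippet runs on its own; in the workflow, WORKSPACE comes from the environment.

```bash
#!/bin/bash
# Environment variables supplied by the workflow keep their ALL_CAPS names.
# The "." fallback is an assumption so this sketch runs outside the workflow.
WORKSPACE="${WORKSPACE:-.}"

# Script-owned constants: lowercase, defined once near the top.
header_prefix="#### "
placeholder_text="dynamic-channel-data"
channels_file="${WORKSPACE}/automation/channels.txt"
template_file="${WORKSPACE}/automation/template.md"
readme_file="${WORKSPACE}/README.md"
response_file="output.json"

echo "Reading ${channels_file} and ${template_file}"
echo "Writing ${readme_file} (API responses go to ${response_file})"
```

With this in place, the rest of the script refers only to the lowercase names, so changing a path or the response file name is a one-line edit.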

Comment

This is absolutely excellent feedback, thank you! I really appreciate the effort and will work on making the suggested changes. Having the explanation for each point helps a lot too.

janos · Accepted Answer · 2023-09-07 19:44:02Z

