\$\begingroup\$

Overview

I have created a bash script (triggered via GitHub Actions) that does the following:

  1. Parse a list of YouTube channel IDs and nicknames.
  2. Fetch their metadata via YouTube's Channel API.
  3. Build up Markdown tables using this metadata.
  4. Load a Markdown template, and replace a placeholder with the generated Markdown.

Additional functionality implemented is:

  1. Display optional arbitrary emoji next to specific channel names (${ARRAY_LINE[2]}).
  2. Format numbers to be human readable (e.g. 1200 -> 1.2K).
  3. Log channels being processed.

Whilst I've run the code through ShellCheck and made other improvements, I suspect there are weaknesses around:

  • Parsing output.json 5x, fetching a different field each time.
  • Replacing the placeholder text.

Code

The script itself (youtube-update.sh):

#!/bin/bash
HEADER_PREFIX="#### "
PLACEHOLDER_TEXT="dynamic-channel-data"
OUTPUT=""
# Convert list of channels into Markdown tables
while read -r LINE; do
    if [[ ${LINE} == ${HEADER_PREFIX}* ]]; then
        echo "Adding header ${LINE}"
        OUTPUT="${OUTPUT}\n${LINE}\n\n"
        OUTPUT="${OUTPUT}| Channel | # Videos | Subscribers | Views |\n| --- | --- | --- | --- |\n"
    else
        IFS=';' read -r -a ARRAY_LINE <<< "${LINE}" # Split line by semi-colon
        echo "Adding channel ${ARRAY_LINE[1]} (${ARRAY_LINE[0]})"
        curl "https://youtube.googleapis.com/youtube/v3/channels?part=statistics,snippet&id=${ARRAY_LINE[0]}&key=${API_KEY}" \
            --header 'Accept: application/json' \
            -fsSL -o output.json
        # Pull channel data out of response if possible
        if [[ $(jq -r '.pageInfo.totalResults' output.json) == 1 ]]; then
            TITLE=$(jq -r '.items[0].snippet.title' output.json)
            URL=$(jq -r '.items[0].snippet.customUrl' output.json)
            VIDEO_COUNT=$(jq -r '.items[0].statistics.videoCount' output.json | numfmt --to=si)
            SUBSCRIBER_COUNT=$(jq -r '.items[0].statistics.subscriberCount' output.json | numfmt --to=si)
            VIEW_COUNT=$(jq -r '.items[0].statistics.viewCount' output.json | numfmt --to=si)
            echo "Added ${TITLE}: ${VIDEO_COUNT} videos (${VIEW_COUNT} views)"
            OUTPUT="${OUTPUT}| ${ARRAY_LINE[2]}[${TITLE}](https://youtube.com/${URL}) | ${VIDEO_COUNT} | ${SUBSCRIBER_COUNT} | ${VIEW_COUNT} |\n"
        else
            echo "Failed! Bad response received: $(<output.json)"
            exit 1
        fi
    fi
done < "${WORKSPACE}/automation/channels.txt"
# Replace placeholder in template with output, updating the README
TEMPLATE_CONTENTS=$(<"${WORKSPACE}/automation/template.md")
echo -e "${TEMPLATE_CONTENTS//${PLACEHOLDER_TEXT}/${OUTPUT}}" > "${WORKSPACE}/README.md"
# Debug
cat "${WORKSPACE}/README.md"

For additional context, this script is triggered via a GitHub Actions workflow (metadata-update.yml):

name: Update YouTube stats
on:
  schedule:
    - cron: '0 8 * * *'
  workflow_dispatch:
jobs:
  metadata-update:
    runs-on: ubuntu-latest
    permissions:
      contents: write
    steps:
      - name: Checkout channel config file
        uses: actions/[email protected]
        with:
          sparse-checkout: |
            automation/*
            README.md
          sparse-checkout-cone-mode: false
      - name: Update YouTube data
        run: |
          chmod +x ./automation/youtube-update.sh
          ./automation/youtube-update.sh
        env:
          API_KEY: ${{ secrets.API_KEY }}
          WORKSPACE: ${{ github.workspace }}
      - name: Save changes
        uses: stefanzweifel/git-auto-commit-action@v4
        with:
          commit_message: Updated YouTube statistics
          commit_author: GitHub Actions <[email protected]>
          file_pattern: 'README.md'

The YouTube API response (truncated to relevant fields) looks like:

{
    "pageInfo": {
        "totalResults": 1
    },
    "items": [
        {
            "snippet": {
                "title": "Google for Developers",
                "customUrl": "@googledevelopers"
            },
            "statistics": {
                "viewCount": "234466180",
                "subscriberCount": "2300000",
                "videoCount": "5807"
            }
        }
    ]
}

Examples

A typical run might convert:

#### Stream Archives
UC2oWuUSd3t3t5O3Vxp4lgAA;2018-streams;🐢
UC4ik7iSQI1DZVqL18t-Tffw;2016-2018streams
UCjyrSUk-1AGjALTcWneRaeA;2016-2017streams

into:

#### Stream Archives
| Channel | # Videos | Subscribers | Views |
| --- | --- | --- | --- |
| 🐢[Jerma Stream Archive](https://youtube.com/@jermastreamarchive) | 770 | 274K | 88M |
| [Ster/Jerma Stream Archive](https://youtube.com/@sterjermastreamarchive) | 972 | 47K | 20M |
| [starkiller201096x](https://youtube.com/@starkiller201096x) | 79 | 2.9K | 1.5M |
Sara J
asked Aug 15, 2023 at 22:16
\$\endgroup\$

1 Answer

\$\begingroup\$

Nice script!

Read multiple values by name rather than into an array

Instead of:

IFS=';' read -r -a ARRAY_LINE <<< "${LINE}" # Split line by semi-colon

You could get the values directly into variables with descriptive names:

IFS=';' read -r channel_id channel_name emoji <<< "${line}"
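
As a quick illustration of how this behaves, here is a small sketch that feeds two of the sample lines from the question through the same read call. The `lines` array and the `<none>` fallback text are only there for the demonstration; they are not part of the original script.

```shell
#!/usr/bin/env bash
# Sample lines copied from the channels.txt example in the question.
lines=(
    'UC2oWuUSd3t3t5O3Vxp4lgAA;2018-streams;🐢'
    'UC4ik7iSQI1DZVqL18t-Tffw;2016-2018streams'
)

for line in "${lines[@]}"; do
    # Each field lands in its own named variable; a missing third field
    # simply leaves emoji empty.
    IFS=';' read -r channel_id channel_name emoji <<< "${line}"
    echo "id=${channel_id} name=${channel_name} emoji=${emoji:-<none>}"
done
```

Note that if a line ever contained more than three fields, the last variable (emoji here) would receive the remainder of the line, semicolons included.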

Read multiple values from a single jq call

You could read multiple values with a single call by making jq print all the relevant fields, and using read, for example:

{
    read -r title
    read -r url
} < <(jq -r '.items[0].snippet.title, .items[0].snippet.customUrl' < output.json)

To avoid the line becoming too long, I would put the fields into an array like this:

jq_fields=(
    '.items[0].snippet.title'
    '.items[0].snippet.customUrl'
    '.items[0].statistics.videoCount'
    '.items[0].statistics.subscriberCount'
    '.items[0].statistics.viewCount'
)
{
    read -r title
    read -r url
    read -r video_count
    read -r subscriber_count
    read -r view_count
} < <(IFS=','; jq -r "${jq_fields[*]}" < output.json)
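
One step the snippet above leaves out is the human-readable formatting: the original script piped each count through numfmt --to=si. A possible way to keep that with one extra call is sketched below. It assumes GNU numfmt, which accepts several numbers as arguments and prints one result per line, and it uses the raw counts from the sample API response in place of a real jq call.

```shell
#!/usr/bin/env bash
# Raw counts as they would arrive from jq (values taken from the sample response).
video_count=5807
subscriber_count=2300000
view_count=234466180

# One numfmt call formats all three numbers; each result is printed on its
# own line, so the same grouped-read pattern can pick them up again.
{
    read -r video_count
    read -r subscriber_count
    read -r view_count
} < <(numfmt --to=si "${video_count}" "${subscriber_count}" "${view_count}")

echo "${video_count} videos, ${subscriber_count} subscribers, ${view_count} views"
```

This keeps the process count at two (one jq, one numfmt) instead of one pipeline per field.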

Accumulate lines in an array

The way you accumulated the lines in the string value OUTPUT is ok.

I prefer to use arrays in situations like this. It would look something like this:

output=()
# ...
output+=("${line}")
output+=("")
output+=("| Channel | # Videos | Subscribers | Views |")
output+=("| --- | --- | --- | --- |")
# ...
output+=("| ${emoji}[${title}](https://youtube.com/${url}) | ${video_count} | ${subscriber_count} | ${view_count} |")
# ...
(
    IFS=$'\n'
    echo "${template_content//${placeholder_text}/${output[*]}}"
) >"${WORKSPACE}/README.md"
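
To see the array approach end to end, here is a self-contained sketch with an inline template and a made-up table row. It is a slight variant of the code above: the array is first joined into a helper variable (`body`, a name chosen for this example) and then substituted, and the result is captured in `readme` instead of being written to a file. The template text and the example row are invented for the demonstration.

```shell
#!/usr/bin/env bash
placeholder_text="dynamic-channel-data"
# Inline stand-in for the contents of template.md.
template_content=$'# My channels\n\ndynamic-channel-data\n\nFooter'

# One array element per output line; an empty element becomes a blank line.
output=()
output+=("#### Stream Archives")
output+=("")
output+=("| Channel | # Videos |")
output+=("| --- | --- |")
output+=("| [Example](https://youtube.com/@example) | 770 |")

# Join the elements with newlines; IFS is changed only inside the
# command substitution, so the rest of the script is unaffected.
body=$(IFS=$'\n'; echo "${output[*]}")

# Replace the placeholder with the joined lines.
readme="${template_content//${placeholder_text}/${body}}"
printf '%s\n' "${readme}"
```

Because every element is already a complete line, no \n escapes are needed inside the strings and plain echo or printf is enough.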

Do not use ALL_CAPS names for your variables

To avoid conflict and confusion with system environment variables, it's recommended not to use ALL_CAPS names for script variables.

By making the script's own variables lowercase, it becomes clear what is expected to be present in the environment and what belongs to the script, which improves readability.
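
A small sketch of the kind of conflict this avoids: PATH is a name a script author might plausibly pick for "some path", but the shell already uses it to find commands. The path value and the `readme_path` name below are made up for the illustration, and the clash is triggered inside a command substitution so it cannot affect the surrounding shell.

```shell
#!/usr/bin/env bash
# Reusing an ALL_CAPS name that the shell already gives meaning to.
result=$(
    PATH="/nonexistent/readme"   # intended as "a file path", but it replaces the command search path
    if command -v ls >/dev/null 2>&1; then
        echo "ls found"
    else
        echo "ls not found"
    fi
)
echo "${result}"

# A lowercase name carries no special meaning, so command lookup is unaffected.
readme_path="/nonexistent/readme"
command -v ls >/dev/null 2>&1 && echo "ls still found with readme_path=${readme_path}"
```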

Put repeatedly used constant values into variables

output.json is referenced in multiple places. To leave open the option of using a different name or path, I would put this into a variable. This way code editors could also help you avoid typos.
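
A minimal sketch of what that could look like. The variable name `response_file` and the temporary-directory location are my own choices for the example, and a printf stands in for the curl call so the snippet runs without network access.

```shell
#!/usr/bin/env bash
# Hypothetical variable name; the location is an assumption for this sketch.
response_file="${TMPDIR:-/tmp}/youtube-response.json"

# Stand-in for the curl call, which would end with: -o "${response_file}"
printf '%s\n' '{"pageInfo": {"totalResults": 1}}' > "${response_file}"

# Every later reader refers to the same variable, for example:
#   jq -r '.pageInfo.totalResults' "${response_file}"
echo "Response stored in ${response_file}: $(<"${response_file}")"
```

Changing the file name or moving it elsewhere then means editing a single line.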

Define important constants in variables early in the file

The paths "${WORKSPACE}/automation/channels.txt" and "${WORKSPACE}/README.md" are key to the script's behavior. To make them easy to see (and adjust), I would put these values in variables, defined near the top of the file, right alongside header_prefix and placeholder_text.
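
For example, the top of the script might look something like the sketch below. The names `channels_file`, `template_file` and `readme_file` are suggestions rather than anything from the original, and the fallback to the current directory for WORKSPACE is an assumption added only so the snippet runs outside the workflow.

```shell
#!/usr/bin/env bash
# Expected from the environment (set by the workflow); the fallback to the
# current directory is an assumption so this sketch runs standalone.
: "${WORKSPACE:=$(pwd)}"

# Script constants, lowercase, all defined in one place.
header_prefix="#### "
placeholder_text="dynamic-channel-data"
channels_file="${WORKSPACE}/automation/channels.txt"
template_file="${WORKSPACE}/automation/template.md"
readme_file="${WORKSPACE}/README.md"

echo "Headers start with '${header_prefix}', placeholder is '${placeholder_text}'"
echo "Reading ${channels_file} and ${template_file}, writing ${readme_file}"
```

The rest of the script would then use done < "${channels_file}" and > "${readme_file}" instead of repeating the full paths.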

answered Sep 7, 2023 at 19:44
\$\endgroup\$
  • \$\begingroup\$ This is absolutely excellent feedback, thank you! Really appreciate the effort, will work on making the suggested changes, having the explanation for each point helps a lot too. \$\endgroup\$ Commented Sep 8, 2023 at 21:57
