How I determine which movie to watch next

I have created a script that scrapes my IMDB watchlist and figures out which of those movies are available on the streaming services I currently have access to.

tl;dr

If you don't want the technical breakdown, see the result here: https://www.bjornnyborg.com/movies

We all know the struggle of figuring out what to watch next. When we browse our streaming services, nothing seems interesting. So we go to the internet and look for interesting movies, and when we finally find one that looks good, it isn't available on any of our streaming services... Oh, the struggle (first world problem, ikr).

Luckily, I am a programmer, and I think I can solve this first world problem with technology.

Let's break down the problem:

  1. We can't decide what to watch
  2. When we want to watch something specific, we don't know if it's available on any of our streaming subscriptions

Useful services

First a couple of recommendations that can help you decide what to watch:

IMDB: For deciding what to watch

I have used IMDB for many years to keep track of new movies I want to watch and to rate movies I have already watched. So IMDB already contains a list of every movie I have, at some point, added to my watchlist. It also shows the IMDB rating of each movie on the watchlist, which I can use to prioritize them.

JustWatch: For checking what streaming service a movie is available on

Another awesome service is JustWatch: you can look up any movie, and it tells you which streaming services it's available on.

The idea

So, if I could somehow download my IMDB watchlist, look up every movie on JustWatch, filter them by the streaming services I currently subscribe to, sort them by IMDB rating and display them on my website, I would have a pretty decent suggestion for what to watch next!
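
The filter-and-sort part of the idea is small. Here's a minimal sketch, assuming each movie has already been merged into a hypothetical `WatchlistMovie` record holding its IMDB rating and the names of the services JustWatch lists for it. The service names in `MY_SERVICES` and the movie data are placeholders:

```typescript
// Hypothetical merged shape: IMDB data plus the services JustWatch reports
interface WatchlistMovie {
  title: string;
  year: number;
  imdbRating: number;
  services: string[];
}

// Placeholder subscriptions; replace with your own
const MY_SERVICES = new Set<string>(["Netflix", "Disney Plus"]);

// Keep movies available on at least one subscribed service, best rated first
function suggestMovies(
  movies: WatchlistMovie[],
  subscriptions: Set<string>,
): WatchlistMovie[] {
  return movies
    .filter((movie) => movie.services.some((s) => subscriptions.has(s)))
    .sort((a, b) => b.imdbRating - a.imdbRating);
}

// Usage with made-up movies
const suggestions = suggestMovies(
  [
    { title: "Movie A", year: 1995, imdbRating: 8.3, services: ["Netflix"] },
    { title: "Movie B", year: 2021, imdbRating: 8.0, services: ["HBO Max"] },
    { title: "Movie C", year: 1979, imdbRating: 8.5, services: ["Disney Plus"] },
  ],
  MY_SERVICES,
);
console.log(suggestions.map((m) => m.title)); // → ["Movie C", "Movie A"]
```

Movie B drops out because it's only on a service I don't subscribe to; the rest are ordered by rating.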

The build

Step 1: Download the IMDB watchlist

Since I couldn't find an official API for this, I decided to use Puppeteer to log in to my IMDB account and download my watchlist.

The script does the following:

  1. Starts a Chromium browser
  2. Logs into my IMDB account
  3. Exports my IMDB watchlist and, optionally, my IMDB reviews
  4. Downloads the CSV exports
  5. Converts the CSV files to JSON data
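
A note on step 5: by default csvtojson returns every column as a string, so ratings and years need converting before movies can be sorted by score. Here's a minimal sketch, assuming the export uses headers named `Const`, `Title`, `Year` and `IMDb Rating` (check them against your own CSV file); `ImdbMovie` and `parseWatchlistRows` are hypothetical names:

```typescript
// Hypothetical typed shape for one watchlist entry
interface ImdbMovie {
  imdbId: string;
  title: string;
  year: number | null;
  imdbRating: number | null;
}

// One parsed CSV row; csvtojson gives every value back as a string
type ImdbCsvRow = Record<string, string>;

// Empty or non-numeric cells become null instead of NaN
const toNumber = (value: string | undefined): number | null => {
  if (value === undefined || value.trim() === "") return null;
  const parsed = Number(value);
  return Number.isNaN(parsed) ? null : parsed;
};

// Assumed column headers: Const, Title, Year, IMDb Rating
function parseWatchlistRows(rows: ImdbCsvRow[]): ImdbMovie[] {
  return rows.map((row) => ({
    imdbId: row["Const"] ?? "",
    title: row["Title"] ?? "",
    year: toNumber(row["Year"]),
    imdbRating: toNumber(row["IMDb Rating"]),
  }));
}

// Usage with rows shaped like csvtojson output (made-up data)
const parsedMovies = parseWatchlistRows([
  { Const: "ttEXAMPLE1", Title: "Movie A", Year: "1999", "IMDb Rating": "7.5" },
  { Const: "ttEXAMPLE2", Title: "Movie B", Year: "", "IMDb Rating": "" },
]);
console.log(parsedMovies[0].imdbRating); // 7.5
```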

The Puppeteer script looks like this:

import csvtojson from "csvtojson";
import { Page } from "puppeteer";
import path from "path";
import fs from "fs";

const delay = (ms: number) => {
  return new Promise((resolve) => setTimeout(resolve, ms));
};

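const downloadLatestExport = async (page: Page) => {
  await page.waitForSelector("button[aria-label='Export']");
  await page.click("button[aria-label='Export']");
  console.log("found export button");

  await page.waitForSelector("a[aria-label='Open exports page']");
  console.log("Go to export page");

  await delay(2000);
  await page.goto("https://www.imdb.com/exports");

  // IMDB generates exports asynchronously, so reload until nothing is processing
  while (await page.$(".PROCESSING")) {
    console.log("waiting for processing");
    await delay(2000);
    await page.reload();
  }

  console.log("processing done, downloading export");

  // path.join keeps this portable (hard-coded backslashes only work on Windows)
  const downloadPath = path.join(process.cwd(), "public", "tmp");
  fs.mkdirSync(downloadPath, { recursive: true });

  // Tell Chromium to save downloads to downloadPath (via Puppeteer's internal CDP client)
  await (page as any)._client().send("Page.setDownloadBehavior", {
    behavior: "allow",
    downloadPath,
  });

  await page.waitForSelector(".READY");
  await page.click(".READY");

  // Poll for a finished file: Chromium writes *.crdownload while the download is in progress
  let latestFile: string | undefined;
  for (let attempt = 0; attempt < 30 && !latestFile; attempt++) {
    await delay(1000);
    latestFile = fs
      .readdirSync(downloadPath)
      .find((file) => !file.endsWith(".crdownload"));
  }
  if (!latestFile) throw new Error("Export download timed out");

  const content = fs.readFileSync(path.join(downloadPath, latestFile), "utf8");

  // Empty the folder so the next export is the only file in it
  fs.readdirSync(downloadPath).forEach((file) => {
    fs.unlinkSync(path.join(downloadPath, file));
  });

  console.log("Export file downloaded");

  return content;
};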

export const getImdbWatchlist = async (runReviews: boolean) => {
  if (!process.env.IMDB_USER) throw new Error("IMDB_USER not set");
  if (!process.env.IMDB_PASSWORD) throw new Error("IMDB_PASSWORD not set");

  let puppeteer;
  let options = {};

  // Default Chromium is too large for AWS Lambda, so we need to download a smaller version
  if (process.env.AWS_LAMBDA_FUNCTION_VERSION) {
    const chrome = require("@sparticuz/chromium-min");

    puppeteer = require("puppeteer-core");

    console.log("use core");

    chrome.setGraphicsMode = false;

    options = {
      args: chrome.args,
      defaultViewport: chrome.defaultViewport,
      executablePath: await chrome.executablePath(
        "https://github.com/Sparticuz/chromium/releases/download/v116.0.0/chromium-v116.0.0-pack.tar"
      ),
      headless: chrome.headless,
    };
  } else {
    puppeteer = require("puppeteer");
    options = {
      headless: false,
    };
  }

  const browser = await puppeteer.launch(options);

  try {
    const page: Page = await browser.newPage();
    await page.setViewport({
      width: 1600,
      height: 1200,
      deviceScaleFactor: 1,
    });

    console.log("browser launched");

    await page.goto("https://www.imdb.com/registration/signin");
    console.log("navigated to imdb");

    await page.waitForSelector("#signin-options");
    console.log("signin options found");

    await page.click("#signin-options .list-group > a");
    console.log("clicked sign in with imdb");

    await page.waitForSelector("#ap_email");
    console.log("typing ", process.env.IMDB_USER);
    await page.type("#ap_email", process.env.IMDB_USER);
    console.log("typed email");

    await page.waitForSelector("#ap_password");
    await page.type("#ap_password", process.env.IMDB_PASSWORD);
    console.log("typed password");

    await page.waitForSelector("#signInSubmit");
    await page.click("#signInSubmit");

    // The IMDB header only appears once the login has redirected back to IMDB
    await page.waitForSelector("#imdbHeader");
    console.log("logged in");

    // WATCHLIST
    await page.goto("https://www.imdb.com/user/ur63279589/watchlist?view=detail");
    const watchList = await downloadLatestExport(page);
    console.log("downloaded watchlist");

    // REVIEWS (exported from the ratings page)
    let reviewList = "";
    if (runReviews) {
      await page.goto("https://www.imdb.com/user/ur63279589/ratings");
      reviewList = await downloadLatestExport(page);
      console.log("downloaded reviews");
    }

    if (!watchList) throw new Error("Could not find watchList");
    if (!reviewList && runReviews) throw new Error("Could not find reviewList");

    // csvtojson turns each CSV row into an object keyed by the header row
    return {
      reviews: runReviews ? await csvtojson().fromString(reviewList) : null,
      movies: await csvtojson().fromString(watchList),
    };
  } finally {
    // Always close the browser, even if one of the steps above throws
    await browser.close();
  }
};
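
The `while (await page.$(".PROCESSING"))` loop in `downloadLatestExport` has no upper bound, so an export that never finishes keeps the function running until Lambda times it out. A small generic helper can cap the number of attempts. This is a sketch; `pollUntil` and its parameters are hypothetical names, not part of Puppeteer:

```typescript
const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Calls `check` until it resolves to true, waiting `intervalMs` between attempts.
// Throws once `maxAttempts` checks have all resolved to false.
async function pollUntil(
  check: () => Promise<boolean>,
  intervalMs: number,
  maxAttempts: number,
): Promise<void> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    if (await check()) return;
    if (attempt < maxAttempts) await sleep(intervalMs);
  }
  throw new Error(`Condition not met after ${maxAttempts} attempts`);
}
```

Inside `downloadLatestExport`, the check could resolve to true when no `.PROCESSING` element is left and otherwise reload the page and resolve to false, e.g. `pollUntil(check, 2000, 30)` for roughly a one-minute limit.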

Step 2: Look up the movies on JustWatch

Once the IMDB data is downloaded and converted, we need to look up every movie on JustWatch to find out where it's available in our country.

I do this in two steps. First, I make a simple GraphQL search request to https://apis.justwatch.com/graphql with the movie title, which returns the JustWatch ID.

The getJustWatchSearchResult request looks like this:

import { TJustWatchSearchResult } from "@/types/JustWatch";

export const getJustWatchSearchResult = async (title: string) =>
  await fetch("https://apis.justwatch.com/graphql", {
    method: "POST",
    body: JSON.stringify({
      operationName: "GetSuggestedTitles",
      variables: {
        country: "DK",
        language: "en",
        first: 10,
        filter: { searchQuery: title },
      },
      query: `query GetSuggestedTitles($country: Country!, $language: Language!, $first: Int!, $filter: TitleFilter) {
                popularTitles(country: $country, first: $first, filter: $filter) {
                    edges {
                        node {
                            id
                            objectType
                            objectId
                            content(country: $country, language: $language) {
                                title
                                originalReleaseYear
                                posterUrl
                                fullPath
                            }
                        }
                    }
                }
            }
        `,
    }),
    headers: {
      Accept: "application/json, text/plain, */*",
      "Content-Type": "application/json",
    },
  })
    .then((r) => r.json())
    .then((r) => r.data as TJustWatchSearchResult);

When we get a search result, I compare the release year and title from IMDB and JustWatch to make sure it's actually the correct movie.
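
Here's a minimal sketch of that check. The normalization rules (lowercase, accents and punctuation stripped), the one-year tolerance and the names `isSameMovie` and `findMatch` are assumptions for illustration; the candidate type only mirrors the fields requested by the GetSuggestedTitles query above:

```typescript
// Mirrors the fields requested in the GetSuggestedTitles query
interface JustWatchCandidate {
  id: string;
  content: { title: string; originalReleaseYear: number | null };
}

// Lowercase, strip accents and punctuation, collapse whitespace
const normalizeTitle = (title: string): string =>
  title
    .normalize("NFKD")
    .replace(/[\u0300-\u036f]/g, "")
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, " ")
    .trim();

// Assumed tolerance: release years may differ by one between IMDB and JustWatch
function isSameMovie(
  imdbTitle: string,
  imdbYear: number,
  candidate: JustWatchCandidate,
  yearTolerance = 1,
): boolean {
  const { title, originalReleaseYear } = candidate.content;
  const a = normalizeTitle(title);
  const b = normalizeTitle(imdbTitle);
  // Empty after normalization (e.g. non-Latin titles): treat as no match
  if (!a || a !== b) return false;
  if (originalReleaseYear === null) return false;
  return Math.abs(originalReleaseYear - imdbYear) <= yearTolerance;
}

// Pick the first search hit that passes the check
function findMatch(
  imdbTitle: string,
  imdbYear: number,
  candidates: JustWatchCandidate[],
): JustWatchCandidate | undefined {
  return candidates.find((c) => isSameMovie(imdbTitle, imdbYear, c));
}
```

Rejecting titles that normalize to an empty string trades a few missed matches for no false positives, which seems the safer failure for a recommendation list.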

With the JustWatch ID in hand, we can fetch the "watching options". These come from a second GraphQL request (GetTitleOffers), this time with the ID, which asks for the subscription (flatrate), buy, rent and free offers, along with each offer's service name, price and link.

The getWatchingOptions request looks like this:

import { TJustWatchWatchingOptionsResponse } from "@/types/JustWatch";

export const getWatchingOptions = async (justWatchId: string) =>
  await fetch("https://apis.justwatch.com/graphql", {
    method: "POST",
    body: JSON.stringify({
      operationName: "GetTitleOffers",
      variables: {
        platform: "WEB",
        nodeId: justWatchId,
        country: "DK",
        language: "en",
        filterBuy: {
          monetizationTypes: ["BUY"],
          bestOnly: true,
        },
        filterFlatrate: {
          monetizationTypes: [
            "FLATRATE",
            "FLATRATE_AND_BUY",
            "ADS",
            "FREE",
            "CINEMA",
          ],
          bestOnly: true,
        },
        filterRent: { monetizationTypes: ["RENT"], bestOnly: true },
        filterFree: { monetizationTypes: ["ADS", "FREE"], bestOnly: true },
      },
      query: `query GetTitleOffers($nodeId: ID!, $country: Country!, $language: Language!, $filterFlatrate: OfferFilter!, $filterBuy: OfferFilter!, $filterRent: OfferFilter!, $filterFree: OfferFilter!, $platform: Platform! = WEB) {
        node(id: $nodeId) {
          id
          __typename
          ... on MovieOrShowOrSeasonOrEpisode {
            offerCount(country: $country, platform: $platform)
            flatrate: offers(
              country: $country
              platform: $platform
              filter: $filterFlatrate
            ) {
              ...TitleOffer
            }
            buy: offers(country: $country, platform: $platform, filter: $filterBuy) {
              ...TitleOffer
            }
             rent: offers(country: $country, platform: $platform, filter: $filterRent) {
                ...TitleOffer
            }
            free: offers(country: $country, platform: $platform, filter: $filterFree) {
                ...TitleOffer
            }
            }
        }
    }
    fragment TitleOffer on Offer {
        id
        presentationType
        monetizationType
        retailPrice(language: $language)
        type
        package {
            clearName
            technicalName
            icon(profile: S100)
        }
        standardWebURL
        availableTo
    }`,
    }),
    headers: {
      Accept: "application/json, text/plain, */*",
      "Content-Type": "application/json",
    },
  })
    .then((r) => r.json())
    .then((r) => r.data as TJustWatchWatchingOptionsResponse);
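
The response mirrors the query: `node.flatrate`, `node.buy`, `node.rent` and `node.free` are lists of offers, each with a `monetizationType` and a `package.clearName`. Because the `filterFlatrate` variables also ask for ADS, FREE and CINEMA offers, a filter is needed to keep only subscription offers. A minimal sketch, where `TitleOffer` and `TitleOffersNode` are trimmed, assumed stand-ins for the `TJustWatchWatchingOptionsResponse` type (its definition isn't shown here):

```typescript
// Trimmed shape of one offer, based on the TitleOffer fragment
interface TitleOffer {
  monetizationType: string;
  presentationType: string;
  package: { clearName: string; technicalName: string };
  standardWebURL: string;
}

// Trimmed shape of the `node` returned by GetTitleOffers
interface TitleOffersNode {
  id: string;
  flatrate: TitleOffer[];
}

// The flatrate filter also returns ADS, FREE and CINEMA offers, so keep only subscriptions
const SUBSCRIPTION_TYPES = new Set<string>(["FLATRATE", "FLATRATE_AND_BUY"]);

function subscriptionServices(node: TitleOffersNode | null): string[] {
  if (!node) return [];
  const names = node.flatrate
    .filter((offer) => SUBSCRIPTION_TYPES.has(offer.monetizationType))
    .map((offer) => offer.package.clearName);
  // Deduplicate in case a service is listed more than once
  return Array.from(new Set(names));
}
```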

Finally, I merge the data from IMDB and JustWatch and store it in my PostgreSQL database!
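
A minimal sketch of the storage step, assuming node-postgres (`pg`) and a hypothetical `movies` table keyed on the IMDB ID with a `text[]` column for services. The table, its columns and the `buildMovieUpsert` helper are illustrative, since the schema isn't shown. Building a `{ text, values }` object keeps the values parameterized, and `ON CONFLICT ... DO UPDATE` lets repeated runs refresh existing rows instead of duplicating them:

```typescript
// Hypothetical merged record: IMDB fields plus JustWatch availability
interface MovieRecord {
  imdbId: string;
  title: string;
  year: number | null;
  imdbRating: number | null;
  justWatchId: string | null;
  services: string[];
}

// The { text, values } shape that node-postgres accepts in query()
interface QueryConfig {
  text: string;
  values: unknown[];
}

// Insert a movie, or refresh it if the IMDB ID is already stored
function buildMovieUpsert(movie: MovieRecord): QueryConfig {
  return {
    text: `INSERT INTO movies (imdb_id, title, year, imdb_rating, justwatch_id, services)
           VALUES ($1, $2, $3, $4, $5, $6)
           ON CONFLICT (imdb_id) DO UPDATE SET
             title = EXCLUDED.title,
             year = EXCLUDED.year,
             imdb_rating = EXCLUDED.imdb_rating,
             justwatch_id = EXCLUDED.justwatch_id,
             services = EXCLUDED.services`,
    values: [
      movie.imdbId,
      movie.title,
      movie.year,
      movie.imdbRating,
      movie.justWatchId,
      movie.services,
    ],
  };
}
```

With a `pg` `Pool`, each merged movie would then be stored with `await pool.query(buildMovieUpsert(movie))`.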

You can see the result here: https://www.bjornnyborg.com/movies

Video breakdown

(In Danish)