How I determine what movie to watch next
I have created a script that scrapes my IMDB watchlist and figures out which movies are available on the streaming services I currently have access to.
tl;dr: If you don't want a technical breakdown, see the result here: https://www.bjornnyborg.com/movies
We all know the struggle of figuring out what to watch next. When we browse our streaming services, nothing seems interesting. We then go to the internet and look for interesting movies. When we finally find something that looks interesting, that movie is not available on any of our streaming services... Oh, the struggle (first-world problem, ikr).
Luckily, I am a programmer, and I think I can solve this first-world problem with technology.
Let's break down the problem:
- We can't decide what to watch
- When we want to watch something specific, we don't know if it's available on any of our streaming subscriptions
Useful services
First, a couple of recommendations that can help you decide what to watch:
IMDB: For deciding what to watch
I have used IMDB for many years to keep track of new movies I want to watch, and also to rate movies I have already watched. So IMDB already contains a list of every movie I have added to my watchlist at some point. It also has a score for every movie on the watchlist, which I can use to prioritize them.
JustWatch: For checking what streaming service a movie is available on
Another awesome service is JustWatch: here you can look up any movie, and it tells you which streaming services it is available on.
The idea
So, if I could somehow download my IMDB watchlist, look up all the movies on JustWatch, filter them by the streaming services I currently subscribe to, sort them by IMDB score and display them on my website, I would actually have a pretty decent suggestion for what to watch next!
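Stripped of the scraping, the idea is a filter followed by a sort. Here is a minimal sketch of that last step, assuming every movie has already been enriched with its IMDB rating and the streaming services it is available on. The `RankableMovie` type, the `rankMovies` function and the sample data are hypothetical, not taken from the actual script:

```typescript
// Hypothetical shape of a movie after the IMDB and JustWatch data have been merged
type RankableMovie = {
  title: string;
  imdbRating: number; // IMDB score, e.g. 7.8
  providers: string[]; // streaming services the movie is available on
};

// Keep only movies available on at least one service we subscribe to,
// then sort them with the highest IMDB score first
function rankMovies(movies: RankableMovie[], subscriptions: string[]): RankableMovie[] {
  const subscribed = new Set(subscriptions);
  return movies
    .filter((movie) => movie.providers.some((provider) => subscribed.has(provider)))
    .sort((a, b) => b.imdbRating - a.imdbRating);
}

const suggestions = rankMovies(
  [
    { title: "Movie A", imdbRating: 7.1, providers: ["Netflix"] },
    { title: "Movie B", imdbRating: 8.4, providers: ["HBO Max"] },
    { title: "Movie C", imdbRating: 8.0, providers: ["Disney Plus", "Netflix"] },
  ],
  ["Netflix", "Disney Plus"]
);
console.log(suggestions.map((movie) => movie.title)); // [ 'Movie C', 'Movie A' ]
```

Because `filter` returns a new array, the `sort` never reorders the caller's original list.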
The build
Step 1: Download the IMDB watchlist
I decided to use Puppeteer to log in to my IMDB account and download my watchlist, since I couldn't find an official API for this task.
The script does the following:
- Starts a Chromium browser
- Logs into my IMDB account
- Exports my IMDB watchlist and IMDB reviews (ratings)
- Downloads the CSV exports
- Converts the CSV files to JSON data
The Puppeteer script looks like this:
import csvtojson from "csvtojson";
import { Page } from "puppeteer";
import path from "path";
import fs from "fs";
const delay = (ms: number) => {
return new Promise((resolve) => setTimeout(resolve, ms));
};
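const downloadLatestExport = async (page: Page) => {
await page.waitForSelector("button[aria-label='Export']");
await page.click("button[aria-label='Export']");
console.log("found export button");
await page.waitForSelector("a[aria-label='Open exports page']");
console.log("Go to export page");
await delay(2000);
await page.goto("https://www.imdb.com/exports");
// IMDB prepares the export asynchronously, so reload until nothing is still processing
while (await page.$(".PROCESSING")) {
console.log("waiting for processing");
await delay(2000);
await page.reload();
}
console.log("processing done, downloading export");
// path.join works on Linux and macOS as well as on Windows
const downloadPath = path.join(process.cwd(), "public", "tmp");
fs.mkdirSync(downloadPath, { recursive: true });
// Empty the folder first, so the only file in it will be the export we download
fs.readdirSync(downloadPath).forEach((file) => {
fs.unlinkSync(path.join(downloadPath, file));
});
// Tell Chromium to save downloads into downloadPath (public CDP session instead of the private _client())
const client = await page.target().createCDPSession();
await client.send("Page.setDownloadBehavior", {
behavior: "allow",
downloadPath,
});
await page.waitForSelector(".READY");
await page.click(".READY");
// Poll until the CSV is complete (Chromium uses a .crdownload name while the download is in progress)
let csvFile: string | undefined;
for (let attempt = 0; attempt < 30 && !csvFile; attempt++) {
await delay(1000);
csvFile = fs.readdirSync(downloadPath).find((file) => file.endsWith(".csv"));
}
if (!csvFile) throw new Error("Export download did not finish in time");
const content = fs.readFileSync(path.join(downloadPath, csvFile), "utf8");
fs.unlinkSync(path.join(downloadPath, csvFile));
console.log("Export file downloaded");
return content;
};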
export const getImdbWatchlist = async (runReviews: boolean) => {
if (!process.env.IMDB_USER) throw new Error("IMDB_USER not set");
if (!process.env.IMDB_PASSWORD) throw new Error("IMDB_PASSWORD not set");
let puppeteer;
let options = {};
// Default Chromium is too large for AWS Lambda, so we need to download a smaller version
if (process.env.AWS_LAMBDA_FUNCTION_VERSION) {
const chrome = require("@sparticuz/chromium-min");
puppeteer = require("puppeteer-core");
console.log("use core");
chrome.setGraphicsMode = false;
options = {
args: chrome.args,
defaultViewport: chrome.defaultViewport,
executablePath: await chrome.executablePath(
"https://github.com/Sparticuz/chromium/releases/download/v116.0.0/chromium-v116.0.0-pack.tar"
),
headless: chrome.headless,
};
} else {
puppeteer = require("puppeteer");
options = {
headless: false,
};
}
const browser = await puppeteer.launch(options);
const page: Page = await browser.newPage();
await page.setViewport({
width: 1600,
height: 1200,
deviceScaleFactor: 1,
});
console.log("browser launched");
await page.goto("https://www.imdb.com/registration/signin");
console.log("navigated to imdb");
await page.waitForSelector("#signin-options");
console.log("signin options found");
await page.click("#signin-options .list-group > a");
console.log("clicked sign in with imdb");
await page.waitForSelector("#ap_email");
console.log("typing ", process.env.IMDB_USER);
await page.type("#ap_email", process.env.IMDB_USER);
console.log("typed email");
await page.waitForSelector("#ap_password");
await page.type("#ap_password", process.env.IMDB_PASSWORD);
console.log("typed password");
await page.waitForSelector("#signInSubmit");
await page.click("#signInSubmit");
console.log("logged in");
await page.waitForSelector("#imdbHeader");
// WATCHLIST
await page.goto("https://www.imdb.com/user/ur63279589/watchlist?view=detail");
const watchList = await downloadLatestExport(page);
console.log("downloaded watchlist");
// REVIEWS
let reviewList = "";
if (runReviews) {
await page.goto("https://www.imdb.com/user/ur63279589/ratings");
reviewList = await downloadLatestExport(page);
console.log("downloaded reviews");
}
// END
await browser.close();
if (!watchList) throw new Error("Could not find watchList");
if (!reviewList && runReviews) throw new Error("Could not find reviewList");
return {
// csvtojson's converter is awaitable and resolves to one object per CSV row
reviews: runReviews ? await csvtojson().fromString(reviewList) : null,
movies: await csvtojson().fromString(watchList),
};
};
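csvtojson returns one object per CSV row, keyed by the header and with every value as a string. A minimal sketch of turning those rows into typed movies, assuming the export contains columns named `Const`, `Title`, `Year` and `IMDb Rating` (check the header row of your own export). `ImdbMovie`, `toNumber` and `toImdbMovies` are hypothetical names:

```typescript
// Raw row as produced by csvtojson: column header -> string value
type ImdbCsvRow = Record<string, string>;

// Hypothetical typed shape used further down the pipeline
type ImdbMovie = {
  imdbId: string; // e.g. "tt1234567"
  title: string;
  year: number | null;
  rating: number | null;
};

// Parse a numeric column, treating empty or non-numeric values as missing
const toNumber = (value: string | undefined): number | null => {
  if (value === undefined || value.trim() === "") return null;
  const parsed = Number(value);
  return Number.isNaN(parsed) ? null : parsed;
};

// Convert the string columns (assumed column names) into typed fields
const toImdbMovies = (rows: ImdbCsvRow[]): ImdbMovie[] =>
  rows.map((row) => ({
    imdbId: row["Const"],
    title: row["Title"],
    year: toNumber(row["Year"]),
    rating: toNumber(row["IMDb Rating"]),
  }));
```

Unreleased titles often have no year or score yet, which is why those fields are nullable rather than defaulting to 0.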
Step 2: Look up the movies on JustWatch
When we have downloaded and converted the IMDB data, we need to look up all the movies on JustWatch to fetch data about their availability in our country.
I do this in 2 steps. First, I make a simple GraphQL search request to https://apis.justwatch.com/graphql, searching for the movie title and receiving the JustWatch ID.
The getJustWatchSearchResult request looks like this:
import { TJustWatchSearchResult } from "@/types/JustWatch";
export const getJustWatchSearchResult = async (title: string) =>
await fetch("https://apis.justwatch.com/graphql", {
method: "POST",
body: JSON.stringify({
operationName: "GetSuggestedTitles",
variables: {
country: "DK",
language: "en",
first: 10,
filter: { searchQuery: title },
},
query: `query GetSuggestedTitles($country: Country!, $language: Language!, $first: Int!, $filter: TitleFilter) {
popularTitles(country: $country, first: $first, filter: $filter) {
edges {
node {
id
objectType
objectId
content(country: $country, language: $language) {
title
originalReleaseYear
posterUrl
fullPath
}
}
}
}
}
`,
}),
headers: {
Accept: "application/json, text/plain, */*",
"Content-Type": "application/json",
},
})
.then((r) => r.json())
.then((r) => r.data as TJustWatchSearchResult);
When we get a search result, I compare the release year and title from IMDB and JustWatch to make sure it is in fact the correct movie.
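A minimal sketch of that check, assuming the search response has the shape requested in the GetSuggestedTitles query (trimmed to the fields used here). `findJustWatchId`, `normalizeTitle` and the `yearTolerance` parameter are hypothetical, and normalizing titles (lowercasing, stripping accents and punctuation) is an addition of mine so that small spelling differences still match:

```typescript
// Trimmed-down shape of the GetSuggestedTitles response
type SearchNode = {
  id: string;
  content: { title: string; originalReleaseYear: number | null };
};
type SearchResult = { popularTitles: { edges: { node: SearchNode }[] } };

// Lowercase and strip accents and punctuation, so "Amélie" and "Amelie" compare equal
const normalizeTitle = (title: string): string => {
  const simplified = title
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "")
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, " ")
    .trim();
  // Titles without Latin letters or digits would collapse to "", so fall back to the plain title
  return simplified === "" ? title.trim().toLowerCase() : simplified;
};

// Return the JustWatch ID of the first hit whose title and release year match the IMDB movie
const findJustWatchId = (
  result: SearchResult,
  imdbTitle: string,
  imdbYear: number,
  yearTolerance = 0
): string | null => {
  const wanted = normalizeTitle(imdbTitle);
  const match = result.popularTitles.edges.find(({ node }) => {
    const year = node.content.originalReleaseYear;
    return (
      normalizeTitle(node.content.title) === wanted &&
      year !== null &&
      Math.abs(year - imdbYear) <= yearTolerance
    );
  });
  return match ? match.node.id : null;
};
```

If the release years on the two sites turn out to differ now and then, raising `yearTolerance` to 1 is a cheap way to loosen the match without comparing titles alone.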
Once we have the JustWatch ID, we can get the "WatchingOptions". We get these by making another GraphQL request, this time with the ID, asking for all the data that is relevant when deciding where to watch the movie.
The getWatchingOptions request looks like this:
import { TJustWatchWatchingOptionsResponse } from "@/types/JustWatch";
export const getWatchingOptions = async (justWatchId: string) =>
await fetch("https://apis.justwatch.com/graphql", {
method: "POST",
body: JSON.stringify({
operationName: "GetTitleOffers",
variables: {
platform: "WEB",
nodeId: justWatchId,
country: "DK",
language: "en",
filterBuy: {
monetizationTypes: ["BUY"],
bestOnly: true,
},
filterFlatrate: {
monetizationTypes: [
"FLATRATE",
"FLATRATE_AND_BUY",
"ADS",
"FREE",
"CINEMA",
],
bestOnly: true,
},
filterRent: { monetizationTypes: ["RENT"], bestOnly: true },
filterFree: { monetizationTypes: ["ADS", "FREE"], bestOnly: true },
},
query: `query GetTitleOffers($nodeId: ID!, $country: Country!, $language: Language!, $filterFlatrate: OfferFilter!, $filterBuy: OfferFilter!, $filterRent: OfferFilter!, $filterFree: OfferFilter!, $platform: Platform! = WEB) {
node(id: $nodeId) {
id
__typename
... on MovieOrShowOrSeasonOrEpisode {
offerCount(country: $country, platform: $platform)
flatrate: offers(
country: $country
platform: $platform
filter: $filterFlatrate
) {
...TitleOffer
}
buy: offers(country: $country, platform: $platform, filter: $filterBuy) {
...TitleOffer
}
rent: offers(country: $country, platform: $platform, filter: $filterRent) {
...TitleOffer
}
free: offers(country: $country, platform: $platform, filter: $filterFree) {
...TitleOffer
}
}
}
}
fragment TitleOffer on Offer {
id
presentationType
monetizationType
retailPrice(language: $language)
type
package {
clearName
technicalName
icon(profile: S100)
}
standardWebURL
availableTo
}`,
}),
headers: {
Accept: "application/json, text/plain, */*",
"Content-Type": "application/json",
},
})
.then((r) => r.json())
.then((r) => r.data as TJustWatchWatchingOptionsResponse);
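The offers still have to be narrowed down to the services I actually pay for. A minimal sketch, assuming the response shape from the `TitleOffer` fragment above (trimmed to the fields used) and that subscriptions are identified by `package.technicalName`. `getSubscribedOptions` and the sample values in the test are hypothetical. Note that with the `filterFlatrate` used above, the `flatrate` list also contains ad-supported, free and cinema offers:

```typescript
// Trimmed-down shape of one offer, mirroring the TitleOffer fragment
type Offer = {
  monetizationType: string;
  standardWebURL: string;
  package: { clearName: string; technicalName: string };
};

// Trimmed-down shape of the GetTitleOffers response data
type TitleOffersResponse = {
  node: { flatrate: Offer[]; buy: Offer[]; rent: Offer[]; free: Offer[] };
};

type StreamingOption = { service: string; url: string };

// Keep flatrate offers from services we subscribe to, one entry per service
const getSubscribedOptions = (
  response: TitleOffersResponse,
  subscribedTechnicalNames: string[]
): StreamingOption[] => {
  const subscribed = new Set(subscribedTechnicalNames);
  const seen = new Set<string>();
  const options: StreamingOption[] = [];
  for (const offer of response.node.flatrate) {
    const name = offer.package.technicalName;
    if (!subscribed.has(name) || seen.has(name)) continue;
    seen.add(name);
    options.push({ service: offer.package.clearName, url: offer.standardWebURL });
  }
  return options;
};
```

An empty result simply means the movie is not on any of my subscriptions right now, so it can be skipped or shown lower on the page.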
I now simply merge the data from IMDB and JustWatch and store it in my PostgreSQL database!
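One way to store the merged data without creating duplicates when the script runs again is an upsert keyed on the IMDB ID. A minimal sketch, assuming a hypothetical `movies` table with a unique `imdb_id` column and a `text[]` `services` column, and producing the `{ text, values }` query object that node-postgres' `client.query()` accepts. The table, the columns, `MovieRecord` and `buildUpsert` are illustrative, not the actual schema behind the site:

```typescript
// Hypothetical merged record: one row per watchlist movie
type MovieRecord = {
  imdbId: string;
  title: string;
  imdbRating: number | null;
  justWatchId: string | null;
  services: string[]; // subscribed services the movie is currently available on
};

// Parameterized query in the { text, values } form node-postgres accepts
type QueryConfig = { text: string; values: unknown[] };

// Insert the movie, or refresh it if it is already stored (requires a unique constraint on imdb_id)
const buildUpsert = (movie: MovieRecord): QueryConfig => ({
  text: `INSERT INTO movies (imdb_id, title, imdb_rating, justwatch_id, services)
VALUES ($1, $2, $3, $4, $5)
ON CONFLICT (imdb_id) DO UPDATE SET
  title = EXCLUDED.title,
  imdb_rating = EXCLUDED.imdb_rating,
  justwatch_id = EXCLUDED.justwatch_id,
  services = EXCLUDED.services`,
  values: [movie.imdbId, movie.title, movie.imdbRating, movie.justWatchId, movie.services],
});
```

Passing values as parameters instead of concatenating them into the SQL keeps titles with quotes in them from breaking the query.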
You can see the result here: https://www.bjornnyborg.com/movies
Video breakdown
(In Danish)