mtumilowicz/scala-graphql-caliban-workshop

Build Status License: GPL v3

scala-graphql-caliban-workshop

preface

  • goals of this workshop
    • introduction into GraphQL
      • motivation
      • schema
      • query, mutation, subscription
      • data loaders
  • workshop plan
    • task1: implement query for returning all customers (with a size limit)
    • task2: implement subscription for deleted customers
    • task3: implement getting orders in batch

introduction

  • BFF - backend for frontend API
    • different clients need different sets of data
      • Web, iPhone, Android, TV
    • instead of the frontend application aggregating data by calling multiple sources - create a BFF layer
      • layer does the following:
        • receive request from the client application
        • call multiple backend services
        • format the response according to what the client needs
        • respond to the client
    • pros
      • simplify the frontend logic
      • avoid over-fetching or under-fetching
      • reduce the number of network calls from the client perspective
  • real world data representation
    • best way: graph-like data structure
      • data model is usually a graph of objects with relations between them
    • why think of data in terms of resources (in URLs) or tables?

graphql

  • GraphQL = Graph Query Language
  • is an API standard that provides a more efficient, powerful and flexible alternative to REST
  • servers can be implemented in any programming language
  • has two major parts
    • structure = strongly typed schema
      • schema = graph of fields that have types
        • all possible data objects that can be read (or updated)
      • client uses the schema to know what are the capabilities of API
    • behavior = resolver functions
      • each field in a GraphQL schema is backed by a resolver function
      • resolver function defines what data to fetch for its field
      • resolver function represents the instructions on how and where to access raw data
  • takes the custom endpoint idea to an extreme
    • the whole server = single smart endpoint
    • multiple round-trip problem
      • client has to communicate with the server multiple times to gather all data
      • API server knows how to answer questions about a single resource
      • solution: GraphQL
  • 5 key characteristics
    1. hierarchical: queries = hierarchies of data definitions
    2. view-centric: built to satisfy frontend requirements
    3. strongly-typed: typed context (schema) + queries are executed within this context
    4. introspective: type system (schema) itself is queryable
    5. version-free: tools for the continuous evolution
  • vs REST
    • no separate documentation tool like Swagger is needed: the schema is introspectable and documents itself
    • REST = over-fetching (too many fields) and under-fetching (many calls to get the data you need)
      • client cannot specify which fields to select
    • REST: each endpoint represents a resource
      • multiple network requests
    • REST: problem with versioning
      • usually means new endpoints
      • GraphQL: add new fields and types without removing the old ones
        • clients can continue to use older features
          • can also incrementally update their code to use new features
        • important for mobile clients
          • you cannot control the version of the API they are using
        • GraphQL offers a way to deprecate (and hide) old fields
    • REST: turn into a mix of regular REST endpoints plus custom endpoints crafted for performance reasons
  • summary
    • pros
      • decouples clients from servers
        • allows both of them to evolve and scale independently
      • efficiency
      • simplifies client logic: the client communicates only with the GraphQL service
        • GraphQL service communicates with the different services
    • cons
      • malicious queries
        • complex queries
          • solution: analyze AST complexity
        • large result size
          • solution: limit depth
        • long execution time
          • solution: limit the execution time
          • solution: stream the results
      • n+1 problem
        • solution: data loader
      • caching is no longer simple
        • network layer - unsuitable as there is a common URL for all operations
        • solution: granularity (per field)
      • hard to return simple Map

schema

  • tooling similar to Swagger UI: GraphiQL (graphql/graphiql)
  • example
    type Starship {
      id: ID!
      name: String! # non-nullable
      appearsIn: [Episode!]! # non-nullable list of non-nullable items
      length(unit: LengthUnit = METER): Float # field with an argument (default value: METER)
    }
    
  • good practices
    • usually make the types of fields non-null
    • however, make all root fields nullable
      • in this case, nullability means that something went wrong but we’re allowing it to show other fields
  • scalar
    • don't have any sub-fields
    • predefined: ID, Boolean, Int, Float, String

operations

  • three types of operations
    • queries (READ operations)
    • mutations (WRITE-then-READ operations)
      • queries that have side effects
    • subscriptions
      • stream of responses
      • used for real-time data monitoring
      • require the use of a data-transport channel that supports continuous pushing of data
        • usually done with WebSockets

query

  • example
    {
      hero {
        name
        friends {
          name
        }
      }
    }
    
  • steps
    1. validate the request against its schema
    2. traverse the tree of fields and invoke the resolver functions
  • fragment
    • example
      {
        leftComparison: hero(episode: EMPIRE) {
          ...comparisonFields # spread that fragment
        }
        rightComparison: hero(episode: JEDI) {
          ...comparisonFields # spread that fragment
        }
      }
      fragment comparisonFields on Character {
        name
        appearsIn
        friends {
          name
        }
      }
      
    • are the composition units of the language
    • are reusable pieces of any GraphQL operation
    • split operations into smaller parts
    • data required by an application = sum of the data required by individual components
      • makes a fragment the perfect match for a component
      • represent the data requirements for a single component and then compose them

mutation

  • is always a WRITE operation followed by a READ operation
  • vs queries
    • query fields may be resolved in parallel
    • mutation fields are executed sequentially
      • if an API consumer sends two mutation fields, the first is guaranteed to finish before the second begins
  • example
    mutation CreateReviewForEpisode($ep: Episode!, $review: ReviewInput!) {
      createReview(episode: $ep, review: $review) {
        stars
        commentary
      }
    }
    # variables (sent as JSON alongside the operation)
    {
      "ep": "JEDI",
      "review": {
        "stars": 5,
        "commentary": "This is a great movie!"
      }
    }
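
The difference in execution order between query and mutation fields can be sketched with standard-library `Future`s. This is an assumption-laden illustration, not the behaviour of a specific server: `FieldResolver`, `runQuery` and `runMutation` are hypothetical names, and a top-level field is modelled as a thunk so that it only starts when the executor decides to start it.

```scala
import scala.concurrent.{ExecutionContext, Future}

object OperationOrderSketch {
  implicit val ec: ExecutionContext = ExecutionContext.global

  // A top-level field: its name plus an effect that starts only when invoked.
  type FieldResolver = (String, () => Future[Int])

  // Query fields have no side effects, so all of them may be started at once.
  def runQuery(fields: List[FieldResolver]): Future[Map[String, Int]] =
    Future.sequence(fields.map { case (name, run) => run().map(v => name -> v) }).map(_.toMap)

  // Mutation fields: the next field is started only after the previous one has completed.
  def runMutation(fields: List[FieldResolver]): Future[Map[String, Int]] =
    fields.foldLeft(Future.successful(Map.empty[String, Int])) {
      case (acc, (name, run)) =>
        acc.flatMap(done => run().map(v => done + (name -> v)))
    }
}
```

With `runMutation`, a second field can rely on the side effects of the first one; with `runQuery` no such ordering is promised.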
    

subscription

  • client should NOT use subscriptions just to stay up to date with the backend
    • instead, poll intermittently with queries
    • or re-execute queries on demand when a user performs a relevant action (such as clicking a button)
  • use subscriptions for
    • small, incremental changes to large objects
      • polling for a large object is expensive
        • especially when most of the object's fields rarely change
      • fetch the object's initial state with a query
        • server can proactively push updates to individual fields
    • low-latency, real-time updates
      • example: a chat application

data loaders

  • problem
    • querying for authors and books
    • authors "has many" books
    • we would like to achieve two SQL calls
      SELECT *
      FROM authors;
      -- pretend this returns 3 authors
      SELECT *
      FROM books
      WHERE author_id in (1, 2, 3); -- an array of the author's ids
      
    • query
      query { # 1 call
        authors {
          name
          books { # each book resolver only gets its own parent author: N calls
            title
          }
        }
      }
      
    • in REST: an ORM helps, because the endpoint handler sees the whole list of author IDs
    • in GraphQL: each resolver function really only knows about its own parent object
      • ORM won’t have the luxury of a list of author IDs anymore
      • result: N+1 calls
        SELECT *
        FROM authors;
        SELECT *
        FROM books
        WHERE author_id in (1);
        SELECT *
        FROM books
        WHERE author_id in (2);
        SELECT *
        FROM books
        WHERE author_id in (3);
        
  • solutions
    • batching
      • delay asking the database until we will have all appropriate IDs
    • caching
      • no application-level caching shared among requests
      • rather simple memoization in the context of a single request
    • example library: DataLoader (JavaScript utility library)
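
The batching and per-request memoization described above can be sketched without any library. This is a minimal sketch under stated assumptions, not the DataLoader or ZQuery implementation: `BookLoader`, `booksByAuthorIds` and the in-memory `booksTable` are hypothetical, and `dbCalls` exists only to make the number of round trips visible.

```scala
object DataLoaderSketch {
  final case class Book(authorId: Int, title: String)

  // Hypothetical table contents standing in for a database.
  private val booksTable = List(Book(1, "A"), Book(1, "B"), Book(2, "C"))

  // Counts round trips so the effect of batching is observable.
  var dbCalls = 0

  // One round trip, like: SELECT * FROM books WHERE author_id IN (ids)
  def booksByAuthorIds(ids: Set[Int]): Map[Int, List[Book]] = {
    dbCalls += 1
    val found = booksTable.filter(b => ids(b.authorId)).groupBy(_.authorId)
    ids.map(id => id -> found.getOrElse(id, Nil)).toMap
  }

  // Create one loader per request: its cache is never shared between requests.
  final class BookLoader {
    private var cache   = Map.empty[Int, List[Book]]
    private var pending = Set.empty[Int]

    // Called by each book resolver: only registers the key, no database access yet.
    def load(authorId: Int): () => List[Book] = {
      if (!cache.contains(authorId)) pending += authorId
      () => { dispatch(); cache(authorId) }
    }

    // Fetches all pending keys in one call and memoizes the result.
    private def dispatch(): Unit =
      if (pending.nonEmpty) {
        cache ++= booksByAuthorIds(pending)
        pending = Set.empty
      }
  }
}
```

Three resolvers calling `load(1)`, `load(2)` and `load(3)` before any result is forced end up sharing a single `IN (1, 2, 3)` call instead of three separate ones.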

client cache

  • responses from REST are easy to cache
    • dictionary nature
      • specific URL gives certain data
      • use the URL itself as the cache key
  • in GraphQL: graph cache
    • no URL-like primitive (globally unique identifier for a given object)
    • best practice: expose such an identifier for clients to use
      {
        starship(id: "3003") { # id field provides a globally unique key
          id
          name
        }
        droid(id: "2001") {
          id
          name
          friends {
            id
            name
          }
        }
      }
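
A client-side graph cache keyed by that globally unique `id` might look like the sketch below. It is not the API of any real GraphQL client: `Entity`, `NormalizedCache`, `write` and `read` are hypothetical names, and nested objects are stored as references (ids) rather than copies.

```scala
object GraphCacheSketch {
  // A cached object: scalar fields plus references to other objects by id.
  final case class Entity(
    id: String,
    fields: Map[String, String],
    refs: Map[String, List[String]] = Map.empty
  )

  final class NormalizedCache {
    private var store = Map.empty[String, Entity]

    // Merge by id: a later response updates the fields of an already cached object.
    def write(e: Entity): Unit = {
      val merged = store.get(e.id) match {
        case Some(old) => Entity(e.id, old.fields ++ e.fields, old.refs ++ e.refs)
        case None      => e
      }
      store += (e.id -> merged)
    }

    // The id plays the role that the URL plays in a REST cache.
    def read(id: String): Option[Entity] = store.get(id)
  }
}
```

Because every object lives under its own id, two different queries that both return droid `2001` update the same cache entry.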
      

pagination

  • two scenarios
    • UX concern: too many items to display
      • mental overload for the user to see them all at once
    • performance concern: too many items to load
      • it would overload our backend, the connection, or the client to load all of the items at once
  • types of pagination from the UX perspective
    • numbered pages
      • example: book, Google search
      • expect it to be consistent over some period of time
      • sql
        SELECT * FROM posts ORDER BY created_at LIMIT 10 OFFSET 20; -- page 3, with a page size of 10
        SELECT COUNT(*) FROM posts; -- total number of entries (needed to compute the number of pages)
        
      • drawbacks
        • only for mostly static content
        • however, usually items are added and removed while the user is navigating
          • leads to skipping items
          • or displaying the same item twice
            • example: a new item is added at the top of the list
              • the skip-and-limit approach then shows the item at the page boundary twice
    • sequential pages like Reddit
      • aren’t numbered
      • content changes so rapidly - no point in page numbers at all
      • specify the place in the list we want to begin, and how many items we want to fetch
        • it doesn’t matter how many items were added to the top of the list
        • we have a constant pointer to the specific spot where we left off
          • pointer is called a cursor
          • cursor is a piece of data
            • generally some kind of ID
            • represents a location in a paginated list
      • example
        SELECT * FROM posts
        WHERE created_at < $after
        ORDER BY created_at DESC LIMIT $page_size; -- newest first, continuing below the cursor
        -- the same idea in a URL:
        https://www.reddit.com/?count=25&after=t3_49i88b
        
      • good practice: encoded cursor with some metadata or a timestamp (instead of a row ID)
        • resilient to row deletion
        • we don’t want the query to fail if a specific item is removed
    • infinite scroll like Twitter
      • illusion of one very long page
    • modern apps today use either the second or third approach
      • app’s content is constantly changing
      • doesn’t make sense to create the illusion of numbered pages
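
The difference between numbered pages and cursor-based pages can be reproduced on an in-memory list. The sketch assumes a newest-first feed and uses `createdAt` itself as the cursor; `Post`, `offsetPage` and `cursorPage` are hypothetical names.

```scala
object PaginationSketch {
  final case class Post(createdAt: Long, title: String)

  // Newest first, as in a typical feed.
  def sorted(posts: List[Post]): List[Post] = posts.sortBy(p => -p.createdAt)

  // Numbered pages: LIMIT size OFFSET (page - 1) * size
  def offsetPage(posts: List[Post], page: Int, size: Int): List[Post] =
    sorted(posts).slice((page - 1) * size, page * size)

  // Sequential pages: WHERE created_at < after ... LIMIT size
  def cursorPage(posts: List[Post], after: Long, size: Int): List[Post] =
    sorted(posts).filter(_.createdAt < after).take(size)
}
```

If a new post arrives after page 1 has been shown, `offsetPage(..., page = 2, ...)` repeats the last item of page 1, while `cursorPage` with the `createdAt` of the last seen item continues exactly where the user left off.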

relay cursor connections

  • generic specification for how a GraphQL server should expose paginated data
  • generalizes the concepts discussed above
    • friends(first:2 after:$opaqueCursor) // vs friends(first:2 after:$friendId)
      • cursors are opaque and their format should not be relied upon
        • suggestion: base64 encoding
      • additional flexibility for pagination model changes
        • user just uses opaque cursors
  • example
    • request
      {
        user {
          id
          name
          friends(first: 10, after: "opaqueCursor") {
            edges { # each edge has a reference to the user object of the friend, and a cursor
              cursor # every item in the paginated list has its own cursor
              node {
                id
                name
              }
            }
            pageInfo {
              hasNextPage
            }
          }
        }
      }
      
      • notice that if we want to, we can ask for 10 friends starting from the middle of the list we last fetched
    • response
      {
        "data": {
          "products": {
            "pageInfo": {
              "hasNextPage": true,
              "hasPreviousPage": false
            },
            "edges": [
              {
                "cursor": "eyJsYXN0X2lkIjoxMDA3OTc4ODg3NiwibGFzdF92YWx1ZSI6IjEwMDc5Nzg4ODc2In0=",
                "node": {
                  "id": "1",
                  "name": "Michal"
                }
              },
              {
                "cursor": "eyJsYXN0X2lkIjoxMDA3OTc5MzQyMCwibGFzdF92YWx1ZSI6IjEwMDc5NzkzNDIwIn0=",
                "node": {
                  "id": "2",
                  "name": "Marcin"
                }
              },
              {
                "cursor": "eyJsYXN0X2lkIjoxMDA3OTc5NDM4MCwibGFzdF92YWx1ZSI6IjEwMDc5Nzk0MzgwIn0=",
                "node": {
                  "id": "3",
                  "name": "Anna"
                }
              }
            ]
          }
        },
        ...
      }
      
  • glossary
    • connection - paginated field on an object
      • example: friends field on a user
    • edge - metadata about one object in the paginated list
      • includes a cursor to allow pagination starting from that object
    • node - actual object
    • pageInfo - info about more pages of data to fetch
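
The glossary terms map naturally onto a few case classes. The sketch below is a simplification, not the Relay specification itself or Caliban's API: `Edge`, `PageInfo`, `Connection` and `paginate` are hypothetical names, and the cursor is just the base64-encoded list position, which a real server would likely replace with something more resilient to deletions.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.util.Base64

object ConnectionSketch {
  final case class Edge[A](cursor: String, node: A)
  final case class PageInfo(hasNextPage: Boolean)
  final case class Connection[A](edges: List[Edge[A]], pageInfo: PageInfo)

  // The cursor format is an implementation detail (assumed here: base64 of "idx:<position>").
  def encode(index: Int): String =
    Base64.getEncoder.encodeToString(s"idx:$index".getBytes(UTF_8))

  def decode(cursor: String): Int =
    new String(Base64.getDecoder.decode(cursor), UTF_8).stripPrefix("idx:").toInt

  // Corresponds to a field like: friends(first: N, after: cursor)
  def paginate[A](all: List[A], first: Int, after: Option[String]): Connection[A] = {
    val start = after.map(c => decode(c) + 1).getOrElse(0)
    val page  = all.zipWithIndex.slice(start, start + first)
    Connection(
      edges = page.map { case (node, i) => Edge(encode(i), node) },
      pageInfo = PageInfo(hasNextPage = start + first < all.size)
    )
  }
}
```

Clients only pass the cursor string back unchanged, so the server stays free to change the encoding later.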

security

  • critical threat: resource-exhaustion attacks (DoS attacks) with overly complex queries
    • are not specific to GraphQL
    • example: query for deeply nested relationships
      • (user –> friends –> friends –> friends ...)
    • example: use field aliases to ask for the same field many times
      {
        empireHero: hero(episode: EMPIRE) {
          name
        }
        jediHero: hero(episode: JEDI) {
          name
        }
      }
      
  • solution
    • cost analysis on the query
    • enforce limits on the amount of data
    • timeouts
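
A very simple form of cost analysis is to measure the depth and the number of fields of a query before executing it. The sketch below is an assumption about how such a check could look, not the implementation used by any library: `Field`, `depth`, `fieldCount` and `check` are hypothetical names.

```scala
object QueryLimitsSketch {
  final case class Field(name: String, alias: Option[String] = None, selections: List[Field] = Nil)

  // Length of the deepest selection path; a leaf field has depth 1.
  def depth(selections: List[Field]): Int =
    if (selections.isEmpty) 0
    else 1 + selections.map(f => depth(f.selections)).max

  // Total number of requested fields; every alias counts as a separate field.
  def fieldCount(selections: List[Field]): Int =
    selections.map(f => 1 + fieldCount(f.selections)).sum

  // Reject the query before any resolver runs.
  def check(selections: List[Field], maxDepth: Int, maxFields: Int): Either[String, Unit] =
    if (depth(selections) > maxDepth) Left("query too deep")
    else if (fieldCount(selections) > maxFields) Left("too many fields")
    else Right(())
}
```

The depth limit addresses the nested `user -> friends -> friends` case, while the field limit addresses the alias case, where the query stays shallow but repeats the same field many times.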

caliban

  • features
    • minimize boilerplate
    • purely functional (strongly typed, explicit errors)
    • user friendly
    • schema / resolver separation
  • vs sangria
    • sangria: lots of boilerplate (macros to the rescue)
    • sangria: future based (effects are better)
    • sangria: schema and resolver tied together
  • schema is derived automatically from the case classes
    • Magnolia - used to derive the schema for traits and case classes
    val api = graphQL(resolver)
    println(api.render) // prints derived schema
    
  • schema deriving examples
    • case classes
      case class Pug(name: String, nicknames: List[String], pictureUrl: Option[String])
      
      is transformed into
      type Pug {
        name: String!
        nicknames: [String!]!
        pictureUrl: String # Option -> nullable
      }
      
    • enums
      sealed trait Color
      case object FAWN extends Color
      
      is transformed into
      enum Color { FAWN }
      
    • arguments
      case class PugName(name: String)
      case class Queries(pug: PugName => Option[Pug])
      
      is transformed into
      type Queries { pug(name: String!): Pug }
      
  • n+1 problem
    • solution: ZQuery
      • parallelize queries
      • cache identical queries
      • batch queries if batching function provided
  • builtin wrappers
    val api = graphQL(...) @@
     maxDepth(30) @@
     maxFields(200) @@
     timeout(10 seconds) @@
     printSlowQueries(1 second)
    
