Scan vs Query API Call

What is a Query?

  • A Query operation finds items in a table based on the Primary Key attribute (the partition key) and a distinct value to search for
  • e.g. a query for the item where the user ID equals 212 returns all the attributes for that item: first name, last name, email, etc.
  • Use an optional Sort Key name and value to refine the results
  • e.g. if your Sort Key is a timestamp, you can refine the query to only select items with a timestamp within the last 7 days
  • By default, a query returns all the attributes for the items, but you can use the ProjectionExpression parameter to return only the specific attributes you want
  • e.g. if you only want to see the email address rather than all attributes
  • Results are always sorted by the Sort Key
    • numeric sort keys: numeric order, ascending by default (1, 2, 3, 4)
    • string sort keys: order of ASCII character code values
  • You can reverse the order by setting the ScanIndexForward parameter to false
    • this parameter only applies to queries and NOT scan
  • By default, Queries are Eventually Consistent
  • You need to explicitly set the query to be strongly consistent
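
The query options above can be sketched as a single request. This is a minimal illustration, not a definitive implementation: the table name "Users" and the attribute names user_id, timestamp, and email are hypothetical, and the values use the low-level client format in which numbers are passed as strings under "N".

```python
import time


def build_query_params(table_name, user_id, days=7, now=None):
    """Build keyword arguments for a DynamoDB Query (low-level client format).

    Assumes a hypothetical table with partition key `user_id` (number)
    and sort key `timestamp` (number, epoch seconds).
    """
    now = int(time.time()) if now is None else now
    cutoff = now - days * 24 * 60 * 60
    return {
        "TableName": table_name,
        # Partition key equality, refined by a range condition on the sort key.
        "KeyConditionExpression": "#uid = :uid AND #ts >= :cutoff",
        # Placeholders avoid clashes with reserved words such as TIMESTAMP.
        "ExpressionAttributeNames": {
            "#uid": "user_id",
            "#ts": "timestamp",
            "#em": "email",
        },
        "ExpressionAttributeValues": {
            ":uid": {"N": str(user_id)},
            ":cutoff": {"N": str(cutoff)},
        },
        # Return only the email attribute instead of every attribute.
        "ProjectionExpression": "#em",
        # False reverses the default ascending sort key order.
        "ScanIndexForward": False,
        # Queries are eventually consistent unless this is set.
        "ConsistentRead": True,
    }
```

With boto3 installed and credentials configured, the result could be passed on as boto3.client("dynamodb").query(**build_query_params("Users", 212)).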

What is a Scan?

  • A Scan operation examines every item in the table
  • By default, it returns all data attributes
  • Use the ProjectionExpression parameter to refine the scan to only return the attributes you want

Query or Scan?

  • Query is more efficient than a scan
  • Scan reads the entire table, then filters the values to provide the desired result
    • This adds an extra step of removing the data you don't want
    • As the table grows, the scan operation takes longer
  • A Scan operation on a large table can use up the table's provisioned throughput in just a single operation
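
A rough back-of-the-envelope estimate shows why a scan is so much more expensive. The sketch below assumes the standard read capacity rules: one read capacity unit covers a strongly consistent read of up to 4 KB, an eventually consistent read costs half as much, and a Query or Scan is charged for the total data it reads, rounded up to the next 4 KB. The function name is hypothetical, and real consumption also depends on paging and item sizes.

```python
import math


def estimate_read_capacity_units(bytes_read, strongly_consistent=False):
    """Roughly estimate the read capacity units consumed by one Query or Scan.

    The charge is based on the data read (before any filtering),
    rounded up to the next 4 KB.
    """
    units = math.ceil(bytes_read / 4096)
    # Eventually consistent reads cost half as much as strongly consistent ones.
    return units if strongly_consistent else units / 2
```

Under these assumptions, an eventually consistent query that reads ten 1 KB items costs about 1.5 units, while scanning a 1 GiB table costs about 131,072 units, even if the scan's filter ends up returning only a handful of items.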

How To Improve Performance

  • You can reduce the impact of a query or scan by setting a smaller page size, which uses fewer read operations per request
  • e.g. set the page size to return 40 items
  • A larger number of smaller operations allows other requests to succeed without throttling
  • Avoid using scan operations if you can: design tables in a way that you can use the Query, Get, or BatchGetItem APIs
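
Paging with a small page size can be sketched as follows. The page size maps to the Limit request parameter, and each response includes a LastEvaluatedKey when more data remains, which is passed back as ExclusiveStartKey on the next request. The helper name iterate_pages is hypothetical, and fetch_page stands in for a client's query or scan method.

```python
def iterate_pages(fetch_page, params, page_size=40):
    """Yield items from a Query or Scan, one small page at a time.

    `fetch_page` is any callable that accepts the request parameters as
    keyword arguments, e.g. boto3.client("dynamodb").scan.
    """
    request = dict(params, Limit=page_size)
    while True:
        response = fetch_page(**request)
        yield from response.get("Items", [])
        last_key = response.get("LastEvaluatedKey")
        if not last_key:
            # No LastEvaluatedKey means the final page has been read.
            break
        # Resume the next request where this page stopped.
        request["ExclusiveStartKey"] = last_key
```

Because the helper is a generator, the caller can pause between pages or stop early, which leaves room for other requests against the same table.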

How to Improve Scan Performance

  • By default, a scan operation processes data sequentially, returning results in 1 MB increments before moving on to retrieve the next 1 MB of data. It can only scan one partition at a time.
  • You can configure DynamoDB to use Parallel scans instead by logically dividing a table or index into segments and scanning each segment in parallel
  • Best to avoid parallel scans if your table or index is already incurring heavy read/write activity from other applications
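
A parallel scan can be sketched with one worker per segment. Each worker sends the Segment and TotalSegments request parameters and pages through its own segment independently. The function names and the choice of a thread pool are assumptions for illustration; scan stands in for a client's scan method, such as boto3.client("dynamodb").scan.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_segment(scan, table_name, segment, total_segments):
    """Read every page of one logical segment of the table."""
    request = {
        "TableName": table_name,
        "Segment": segment,                # zero-based index of this worker's segment
        "TotalSegments": total_segments,   # how many segments the table is divided into
    }
    items = []
    while True:
        response = scan(**request)
        items.extend(response.get("Items", []))
        last_key = response.get("LastEvaluatedKey")
        if not last_key:
            return items
        request["ExclusiveStartKey"] = last_key


def parallel_scan(scan, table_name, total_segments=4):
    """Scan all segments concurrently and combine the results."""
    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        futures = [
            pool.submit(scan_segment, scan, table_name, segment, total_segments)
            for segment in range(total_segments)
        ]
        return [item for future in futures for item in future.result()]
```

More segments finish the scan sooner but consume read throughput faster, which is why this approach suits tables that are not already under heavy load.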

Scan vs Query Exam Tips

  • A query operation finds items in a table using only the Primary Key attribute
    • You provide the primary key name and a distinct value to search for
  • A scan operation examines every item in the table
    • By default returns all data attributes
    • Use the ProjectionExpression parameter to refine the results
  • Query results are always sorted by the Sort Key if there is one
    • Sorted in ascending order by default
    • Set ScanIndexForward parameter to false to reverse the order - queries only
  • Query operation is generally more efficient than a Scan
  • Reduce the impact of a query or scan by setting a smaller page size which uses fewer read operations
  • Isolate scan operations to specific tables and segregate them from your mission-critical traffic
  • Try Parallel scans, rather than the default sequential scan
  • Avoid using scan operations if you can: design tables in a way that you can use the Query, Get, or BatchGetItem APIs

Extras

  • ProjectionExpression is used for GetItem, Query, or Scan - not just scans - to get attributes of an item