Returns the specified snapshot.

If you set row_limit > 0, the table_rows attribute returns a TableView object with the requested number of snapshot rows (maximum 10,000). To obtain more than 10,000 rows, use snapshot.export_stream() or snapshot.export_dataframe().

Note: The snapshot’s get_rows() method also returns a TableView object. However, it does not use any of the parameters you specified in the original snapshots.get() request (query, row_sort, and so on), so you’ll need to include them as arguments. The returned TableView object includes the snapshot’s first 200 rows, or the number you specify using row_limit.

See the examples below for more information about the use of table_rows and get_rows().

Required arguments

Name Type Description
id string The snapshot ID.

Optional keyword arguments

Name Type Description Required?
geo_query string Query string to locate points within the specified distance of a central point in the specified geo_point column⁴. You must set row_limit > 0. false
highlight boolean When true, for rows returned based on a query string, the endpoint sets the returned highlights attribute with positional information indicating where instances of the specified string occur. Defaults to false. false
include_serialids boolean Set to true to include the Enigma Serial ID column in the exported CSV file (default is false). The serial ID is a row identifier added by Enigma that is unique within this snapshot. false
phrase_distance integer If query_mode=phrase, this specifies the proximity search distance. For example, if query is 'sovereign country' and phrase_distance=0, the two words must be next to each other. If phrase_distance=1, the query also matches 'sovereign island country'. false
query string Query string to return only rows that contain specific information (must set row_limit > 0). false
query_mode string Values: advanced, phrase, simple. Defaults to simple². false
row_limit integer Number of rows to return (defaults to 0; maximum 10,000). false
row_offset integer Number of rows to skip at the beginning of the snapshot (for example, row_offset=10 skips the first 10 rows). If you specify row_sort as well, the records are sorted first and then the offset is applied. false
row_sort string Specifies the field used to sort the records (must set row_limit > 0). If you specify row_offset as well, the records are sorted first and then the offset is applied. Prepend the field name with a minus sign (-) to specify descending order (defaults to ascending). false
stats boolean When true, the endpoint returns a stats attribute with statistical information about the snapshot³. Defaults to false. false

² advanced lets you search specific columns, rather than the entire row. See Advanced query mode below.

³ For information about the stats parameter, see Snapshot statistics below.

⁴ For information about the geo_query parameter, see Location-based searches below.
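Several of the parameters above (query, row_sort, geo_query) only take effect when row_limit > 0, and row_limit itself is capped at 10,000. A pre-flight check along these lines could catch mistakes before the request is sent; the helper name and logic are illustrative only, not part of the Enigma SDK:

```python
def check_snapshot_args(**kwargs):
    """Validate snapshots.get() keyword arguments before sending a request.

    Illustrative helper only -- not part of the Enigma SDK.
    """
    # These parameters are documented as requiring row_limit > 0.
    needs_rows = {'query', 'row_sort', 'geo_query'}
    used = needs_rows & {k for k, v in kwargs.items() if v is not None}
    limit = kwargs.get('row_limit', 0)
    if used and limit <= 0:
        raise ValueError(f"{sorted(used)} require row_limit > 0 (got {limit})")
    if limit > 10_000:
        raise ValueError("row_limit may not exceed 10,000")
```

For example, check_snapshot_args(query='Apple Inc') raises ValueError because row_limit defaults to 0, while check_snapshot_args(query='Apple Inc', row_limit=10) passes.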

Returns

The snapshot model.

Example 1

This example returns the specified snapshot (from the “H-1B Visa Applications - 2015” dataset).

import enigma

public = enigma.Public()
snap_id = public.datasets.get('f4b5323d-37b4-4b6d-ba5c-af9bedc75de0').current_snapshot.id
snapshot = public.snapshots.get(snap_id)

The snapshot’s get_rows() method returns a TableView object with snapshot records (the first 200 rows by default; use row_limit to request a specific number, up to a maximum of 10,000).

tableview = snapshot.get_rows()
tableview[0]

Output (the first row of the snapshot):

[case_number: 'I-200-09121-701936', case_status: 'WITHDRAWN', case_submitted: '2015-02-05T00:00:00', decision_date: '2015-02-05T00:00:00', visa_class: 'H-1B', employment_start_date: '2015-02-09T00:00:00', employment_end_date: '2015-02-28T00:00:00', employer_name: 'MEDTRONIC, INC.', employer_address1: '710 MEDTRONIC PARKWAY NE', employer_address2: None, employer_city: 'MINNEAPOLIS', employer_state: 'MN', employer_postal_code: '55432', employer_country: 'UNITED STATES OF AMERICA'...]

Example 2

This example returns the snapshot model and the first 10 rows containing the string “Apple Inc” from the specified snapshot (from the “H-1B Visa Applications - 2015” dataset). The row_sort parameter specifies that rows be sorted by the “Prevailing Wage” column (prevailing_wage) in descending order.

public = enigma.Public()
snap_id = public.datasets.get('f4b5323d-37b4-4b6d-ba5c-af9bedc75de0').current_snapshot.id
snapshot = public.snapshots.get(
  snap_id, 
  query='Apple Inc', 
  row_limit=10,
  row_sort='-prevailing_wage'
  )

Since we specified a row_limit argument, the snapshot’s table_rows attribute references a TableView object containing the first 10 matching rows. The code below displays the first row of the TableView.

tableview = snapshot.table_rows
tableview[0]

Output (the first row containing ‘Apple Inc’):

[case_number: 'I-200-15064-967479', case_status: 'CERTIFIED', case_submitted: '2015-03-05T00:00:00', decision_date: '2015-03-13T00:00:00', visa_class: 'H-1B', employment_start_date: '2015-09-04T00:00:00', employment_end_date: '2018-09-03T00:00:00', employer_name: 'APPLE INC.', employer_address1: 'ONE INFINITE LOOP', employer_address2: 'N/A', employer_city: 'CUPERTINO', employer_state: 'CA', employer_postal_code: '95014', employer_country: 'UNITED STATES OF AMERICA'...]

Note: If you call the snapshot’s get_rows() method (for example, tableview = snapshot.get_rows()), the SDK returns the snapshot’s first 200 rows, ignoring the query and row_sort parameters specified in the original request. You would need to specify these as arguments on get_rows().

Location-based searches

The geo_query parameter lets you perform location-based searches on snapshots that include location information in geo_point format, where geo_point is a comma-separated latitude/longitude pair (for example, 40.848387, -73.456015).

In the example below, the “Geo Location” column (field name = geo_location) is defined in the snapshot schema as a geo_point field and the values are supplied in the required lat,lng format:

Grade Violations Food Type Geo Location
A 06D 10F Bakery 40.848387,-73.456015
A 04H 10F French 40.662893,-73.361824
A 04H 10F Italian 40.767745,-73.384869
A 06C 10B 10F American 40.579554,-73.782198
The snapshot schema defines the column like this:

  {
    "data_type": "geo_point", 
    "description": "", 
    "display_name": "Geo Location", 
    "is_serialid": false, 
    "name": "geo_location", 
    "visible_by_default": true
  }

Since the geo_location field is of type geo_point, you can use the geo_query string to locate snapshot rows that relate to a specific location. In the geo_query string, you need to specify:

  • The column with the geocoded location information
  • The center point from which to start the search
  • The search radius, in one of the supported distance units:

    Unit Abbreviation or name
    Mile mi or miles
    Yard yd or yards
    Feet ft or feet
    Inch in or inch
    Kilometer km or kilometers
    Meter m or meters
    Centimeter cm or centimeters
    Millimeter mm or millimeters
    Nautical mile NM, nmi, or nauticalmiles
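Because the geo_query string is plain text, it is easy to assemble programmatically. The sketch below validates the distance unit against the table above before building the string; build_geo_query is an illustrative name, not an SDK function:

```python
# Unit abbreviations accepted in the geo_query distance suffix (see table above).
VALID_UNITS = {
    'mi', 'miles', 'yd', 'yards', 'ft', 'feet', 'in', 'inch',
    'km', 'kilometers', 'm', 'meters', 'cm', 'centimeters',
    'mm', 'millimeters', 'NM', 'nmi', 'nauticalmiles',
}

def build_geo_query(column, lat, lng, distance, unit='m'):
    """Assemble a geo_query string such as 'geo_location:40.744460,-73.987340;distance:50m'.

    Illustrative helper -- not part of the Enigma SDK.
    """
    if unit not in VALID_UNITS:
        raise ValueError(f"unsupported distance unit: {unit!r}")
    return f"{column}:{lat:.6f},{lng:.6f};distance:{distance}{unit}"
```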

For example, to search the geo_location column for points within 50 meters of the point (40.744460,-73.987340):

snap_id = public.datasets.get('d8c29d0d-f283-4eb5-b4d4-460c9779d05d').current_snapshot.id
geo_query='geo_location:40.744460,-73.987340;distance:50m'
snapshot = public.snapshots.get(snap_id, geo_query=geo_query, row_limit=20)
tableview = snapshot.table_rows

The SDK returns the resulting rows ordered by distance from the search point, closest first. Each row is a TableRow object (see TableView for more information), and geo_point values are returned in {'lat': lat, 'lng': lng} format, as shown in the example below:

[serial_f2d64ea0_4060_44bb_904f_7a01b1208496: '3259', camis: '40993542', dba: 'Cafe 28', boro: 'Manhattan', address: '245 5th Ave', zipcode: '10016', neighborhood: 'Midtown', phone: '2126867300', cuisine_description: 'American', food_type: 'American', inspection_date: '2018-06-04T00:00:00', violation_code: '04L 04N', grade: 'A', score: '5.0', latitude: '40.74446', longitude: '-73.98734', geo_location: {'lat': 40.74446, 'lng': -73.98734}]
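The SDK already orders results by distance, but if you want to double-check or re-rank rows client-side, the standard haversine formula works directly on the {'lat': ..., 'lng': ...} dicts shown above. This is generic geometry code, independent of the SDK:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_m(a, b):
    """Great-circle distance in meters between two {'lat': ..., 'lng': ...} points."""
    lat1, lng1 = radians(a['lat']), radians(a['lng'])
    lat2, lng2 = radians(b['lat']), radians(b['lng'])
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lng2 - lng1) / 2) ** 2
    return 2 * 6_371_000 * asin(sqrt(h))  # mean Earth radius ~6,371 km

center = {'lat': 40.744460, 'lng': -73.987340}  # search point from the example above
cafe = {'lat': 40.74446, 'lng': -73.98734}      # geo_location of the returned row
```

Here haversine_m(center, cafe) is effectively zero, confirming the returned row falls within the 50 m search radius.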

Snapshot statistics

If you include the query parameter stats=true, the API returns statistical information about the snapshot. If you add query=<query_string>, you’ll get statistics for just the matching rows, as shown in the sample output below.

If you use query=, you don’t need to specify row_limit= unless you want the actual rows as well. If you do specify row_limit=, the stats cover all matching rows, not just the rows that are returned.

  "stats": {
    "api_number": {
      "buckets": [
        {
          "doc_count": 11, 
          "key": "608174119700"
        }
      ], 
      "doc_count_error_upper_bound": 0, 
      "fill_rate": 1.0, 
      "sum_other_doc_count": 0
    }, 
    "gas": {
      "avg": 536440.8181818182, 
      "count": 11, 
      "fill_rate": 1.0, 
      "max": 783719.0, 
      "min": 0.0, 
      "sum": 5900849.0
    }, 
    "month": {
      "avg": 1401604363636.3635, 
      "avg_as_string": "2014-06-01T06:32:43.636Z", 
      "buckets": [
        {
          "doc_count": 1, 
          "key": 1388534400000, 
          "key_as_string": "2014-01-01T00:00:00.000Z"
        }, 
        {
          "doc_count": 1, 
          "key": 1391212800000, 
          "key_as_string": "2014-02-01T00:00:00.000Z"
        }, 
        {
          "doc_count": 1, 
          "key": 1393632000000, 
          "key_as_string": "2014-03-01T00:00:00.000Z"
        }, 
        {
          "doc_count": 1, 
          "key": 1396310400000, 
          "key_as_string": "2014-04-01T00:00:00.000Z"
        }, 
        {
          "doc_count": 1, 
          "key": 1398902400000, 
          "key_as_string": "2014-05-01T00:00:00.000Z"
        }
      ], 
      "count": 11, 
      "doc_count_error_upper_bound": 0, 
      "fill_rate": 1.0, 
      "max": 1414800000000.0, 
      "max_as_string": "2014-11-01T00:00:00.000Z", 
      "min": 1388534400000.0, 
      "min_as_string": "2014-01-01T00:00:00.000Z", 
      "sum": 15417648000000.0, 
      "sum_as_string": "2458-07-26T00:00:00.000Z", 
      "sum_other_doc_count": 6
    }, 
    "oil": {
      "avg": 512950.7272727273, 
      "count": 11, 
      "fill_rate": 1.0, 
      "max": 755074.0, 
      "min": 0.0, 
      "sum": 5642458.0
    }, 
    "serial_ea8cd08a_f55d_4e12_bc35_aa45596c86dc": {
      "avg": 60157.0, 
      "count": 11, 
      "fill_rate": 1.0, 
      "max": 109953.0, 
      "min": 8466.0, 
      "sum": 661727.0
    }, 
    "serialid": {
      "avg": 60158.0, 
      "count": 11, 
      "fill_rate": 1.0, 
      "max": 109954.0, 
      "min": 8467.0, 
      "sum": 661738.0
    }, 
    "water": {
      "avg": 131.54545454545453, 
      "count": 11, 
      "fill_rate": 1.0, 
      "max": 595.0, 
      "min": 0.0, 
      "sum": 1447.0
    }
  }, 

In the JSON response, each child of the stats object represents one column. The information returned for each column depends on the column type:

Type Stats returned Descriptions
integer, decimal avg Mean of non-empty values
  count Number of non-empty values
  fill_rate Fraction of cells that are filled (1.0 = all cells are filled; 0.0 = all cells are empty)
  max Maximum value
  min Minimum value
  sum Sum of all non-empty values
string buckets Top five strings by frequency of occurrence, with the key (string) and doc_count (count) for each
  fill_rate Fraction of cells that are filled (1.0 = all cells are filled; 0.0 = all cells are empty)
datetime buckets Top five datetimes by frequency of occurrence, with the key (datetime), key_as_string (datetime), and doc_count (count) for each
  count Number of non-empty values
  fill_rate Fraction of cells that are filled (1.0 = all cells are filled; 0.0 = all cells are empty)
  max Latest date as milliseconds since January 1, 1970 (midnight UTC/GMT)
  max_as_string Latest date in ISO 8601 YYYY-MM-DDThh:mm:ss.sTZD format
  min Earliest date as milliseconds since January 1, 1970 (midnight UTC/GMT)
  min_as_string Earliest date in ISO 8601 YYYY-MM-DDThh:mm:ss.sTZD format
  sum Sum of dates as milliseconds since January 1, 1970 (midnight UTC/GMT)
boolean avg Mean of non-empty values (true = 1; false = 0)
  avg_as_string True if avg >= 0.5; false otherwise
  buckets Values by frequency of occurrence, with the key (numeric value where 1 = true; 0 = false), key_as_string (boolean value), and doc_count (count) for each
  count Number of non-empty values
  fill_rate Fraction of cells that are filled (1.0 = all cells are filled; 0.0 = all cells are empty)
  max Maximum value (1 or 0)
  max_as_string Maximum value as a string (true or false)
  min Minimum value (1 or 0)
  min_as_string Minimum value as a string (true or false)
  sum Sum of all non-empty values (true = 1; false = 0)
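Because each child of the stats object is keyed by column name and the fields present depend on the column type, a generic pass over the parsed JSON is straightforward. The sketch below pulls out the fill rate for every column, plus the average and the most frequent bucket value where those exist; the sample input is trimmed from the response shown above:

```python
def summarize_stats(stats):
    """Reduce a parsed 'stats' object to fill rates, averages, and top bucket values."""
    summary = {}
    for column, info in stats.items():
        entry = {'fill_rate': info.get('fill_rate')}
        if info.get('buckets'):
            # For string/datetime/boolean columns, report the most frequent value.
            top = max(info['buckets'], key=lambda b: b['doc_count'])
            entry['top_value'] = top.get('key_as_string', top['key'])
        if 'avg' in info:
            entry['avg'] = info['avg']
        summary[column] = entry
    return summary

# Trimmed from the sample response above.
sample = {
    'api_number': {'buckets': [{'doc_count': 11, 'key': '608174119700'}],
                   'fill_rate': 1.0, 'sum_other_doc_count': 0},
    'gas': {'avg': 536440.8181818182, 'count': 11, 'fill_rate': 1.0},
}
```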

Advanced query mode

The advanced query mode lets you search specific columns, rather than entire rows. As with simple query mode, you must set row_limit greater than zero to get back any resulting rows. Search terms are not case sensitive.

To search a specific column, use the query parameter to specify the field name, followed by a colon (:) and the search term, for example:

public = enigma.Public()
snapshot_id = public.datasets.get('f4b5323d-37b4-4b6d-ba5c-af9bedc75de0').current_snapshot.id
snapshot = public.snapshots.get(
    snapshot_id, 
    query_mode='advanced',
    query='employer_name:apple',
    row_limit=10
)

To search for a phrase, put the phrase in double quotes, for example:

snapshot = public.snapshots.get(
    snapshot_id, 
    query_mode='advanced',
    query='employer_name:"apple inc."',
    row_limit=10
)

To search for a partial match, use the * wildcard character as shown here:

snapshot = public.snapshots.get(
    snapshot_id, 
    query_mode='advanced',
    query='employer_name:goog*',
    row_limit=10
)

To search for multiple terms within the same column, use parentheses and the appropriate logical operator (AND or OR).

snapshot = public.snapshots.get(
    snapshot_id, 
    query_mode='advanced',
    query='job_title:(software AND engineer)',
    row_limit=10
)

Within the parentheses, the default operator is AND, so in job_title:(software engineer) the AND operator is implied.

To exclude a term, use a minus (-) character (NOT) as shown in this example:

snapshot = public.snapshots.get(
    snapshot_id, 
    query_mode='advanced',
    query='job_title:(software AND -engineer)',
    row_limit=10
)

For numeric columns, you can specify a range of values using square brackets and the keyword “TO” (uppercase).

snapshot = public.snapshots.get(
    snapshot_id, 
    query_mode='advanced',
    query='total_workers:[2 TO 10]',
    row_limit=10
)

For date columns, ranges use the same [<start> TO <end>] syntax, with dates specified in YYYY-MM-DD format, for example:

snapshot = public.snapshots.get(
    snapshot_id, 
    query_mode='advanced',
    query='decision_date:[2014-11-18 TO 2014-11-19]',
    row_limit=10
)

When specifying a range, you can use * as the range start or range end to specify <= or >= respectively. This example specifies the range >= 5:

snapshot = public.snapshots.get(
    snapshot_id, 
    query_mode='advanced',
    query='total_workers:[5 TO *]',
    row_limit=10
)

To search multiple columns, put the search criteria for each column within parentheses and include the appropriate logical operator (AND or OR), for example:

snapshot = public.snapshots.get(
    snapshot_id, 
    query_mode='advanced',
    query='(employer_name:apple) AND (job_title:"software engineer")',
    row_limit=10
)
snapshot = public.snapshots.get(
    snapshot_id, 
    query_mode='advanced',
    query='(employer_name:apple) OR (employer_city:cupertino)',
    row_limit=10
)
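Advanced-mode query strings for multiple columns follow a regular pattern: parenthesized field:term pairs joined by a logical operator. They can therefore be generated rather than hand-written. The helper below is an illustrative sketch, not an SDK function; it assumes each value is already a valid term, quoted phrase, or range expression:

```python
def advanced_query(criteria, op='AND'):
    """Join per-column criteria into an advanced-mode query string.

    criteria maps field names to search expressions, e.g.
    {'employer_name': 'apple', 'job_title': '"software engineer"'}.
    Illustrative helper -- not part of the Enigma SDK.
    """
    if op not in ('AND', 'OR'):
        raise ValueError("op must be 'AND' or 'OR'")
    return f' {op} '.join(f'({field}:{term})' for field, term in criteria.items())
```

For instance, advanced_query({'employer_name': 'apple', 'job_title': '"software engineer"'}) produces '(employer_name:apple) AND (job_title:"software engineer")', equivalent to the first multi-column example above.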