Returns the snapshot model for the snapshot specified by the {snapshot_id} parameter. This endpoint takes the same query parameters and returns the same data as the GET /datasets/{dataset_id}/snapshots/{snapshot_id} endpoint.

If you set row_limit > 0, the API returns the specified number of data records (rows) from the snapshot as well. row_offset + row_limit must be less than 10,000. To obtain more than this, use GET /export/{snapshot_id}.
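The paging limit above can be sketched as a small helper that splits a snapshot into (row_offset, row_limit) windows. This is only an illustration under the stated constraint that row_offset + row_limit must stay below 10,000; the names page_windows and MAX_WINDOW are hypothetical and not part of the API.

```python
from typing import Iterator, Tuple

# From the text: row_offset + row_limit must be less than 10,000.
MAX_WINDOW = 10_000


def page_windows(total_rows: int, page_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (row_offset, row_limit) pairs that stay inside the paging window."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    # Rows at or beyond this index cannot be reached with row_offset/row_limit.
    reachable = min(total_rows, MAX_WINDOW - 1)
    offset = 0
    while offset < reachable:
        limit = min(page_size, reachable - offset)
        yield offset, limit
        offset += limit


if __name__ == "__main__":
    for row_offset, row_limit in page_windows(total_rows=25, page_size=10):
        print(f"row_offset={row_offset}&row_limit={row_limit}")
```

Rows beyond the window are not produced by this helper; for those, the text points to GET /export/{snapshot_id}.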

If you specify a query string, the API returns matching rows from the snapshot, as well as positional information indicating where instances of the specified query string occur (you can use this to highlight text in a search results page). The geo_query parameter lets you perform location-based searches (see Location-based searches). You must set row_limit > 0 when specifying a query or geo_query string.
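As a rough sketch of how a client might assemble such a request, the helper below builds the URL and enforces the row_limit > 0 rule before anything is sent. The function name snapshot_url is hypothetical; the base URL is taken from the examples in this document. Note that urlencode percent-encodes reserved characters such as ":" and ";", which should decode to the same value on the server, though that is not confirmed here.

```python
from urllib.parse import quote, urlencode

# Base URL as it appears in the examples in this document.
BASE_URL = "https://public.enigma.com/api/snapshots/"


def snapshot_url(snapshot_id, **params):
    """Build a GET URL for a snapshot (illustrative helper, not part of the API)."""
    # The text requires row_limit > 0 whenever query or geo_query is used.
    searching = [name for name in ("query", "geo_query") if params.get(name)]
    if searching and int(params.get("row_limit") or 0) <= 0:
        raise ValueError(
            f"row_limit must be > 0 when using {' or '.join(searching)}"
        )

    cleaned = {}
    for name, value in params.items():
        if value is None:
            continue  # leave unset parameters out so the API defaults apply
        # Booleans are written as true/false, as in include_serialids=true.
        cleaned[name] = str(value).lower() if isinstance(value, bool) else value

    if not cleaned:
        return BASE_URL + snapshot_id
    # quote_via=quote encodes spaces as %20 (the default would produce "+").
    return BASE_URL + snapshot_id + "?" + urlencode(cleaned, quote_via=quote)


if __name__ == "__main__":
    print(snapshot_url(
        "df2d466d-8da4-46bd-bd15-8d5318889f2a",
        query="Apple Inc",
        row_limit=10,
        row_sort="-prevailing_wage",
    ))
```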

Query parameters

Name | Type | Description | Required?
geo_query | string | Query string to locate points within the specified distance of a central point in the specified geo_point column [4]. You must set row_limit > 0. | false
highlight | boolean | When true, for rows returned based on a query string, the endpoint sets the returned highlights attribute with positional information indicating where instances of the specified string occur. Defaults to false (highlights = []). | false
include_serialids | boolean | Set include_serialids=true to include the Enigma Serial ID column in the exported CSV file (default is false). The serial ID is a row identifier added by Enigma that is unique within this snapshot. | false
phrase_distance | integer | If query_mode=phrase, this specifies the proximity search distance. For example, if query=sovereign%20country and phrase_distance=0, the two words must be next to each other. If phrase_distance=1, it also matches 'sovereign island country'. | false
query | string | Query string to return only rows that contain specific information (must set row_limit > 0). | false
query_mode | string | Values: advanced [2], phrase, simple. Defaults to simple. | false
row_limit | integer | Number of rows to return (defaults to 0; maximum 10,000). | false
row_offset | integer | Number of rows to skip at the beginning of the snapshot (for example, row_offset=10 skips the first 10 rows). If you specify row_sort as well, the records are sorted first and then the offset is applied. | false
row_sort | string | Specifies the field used to sort the records (must set row_limit > 0). If you specify row_offset as well, the records are sorted first and then the offset is applied. Prepend the field name with a minus sign (-) to specify descending order (defaults to ascending). | false
stats | boolean | When true, the endpoint returns a stats attribute with statistical information about the snapshot [3]. Defaults to false. | false

[2] The advanced query mode lets you search specific columns, rather than the entire row. See Example 2 and Advanced query mode below.

[3] For information about the stats parameter, see Snapshot statistics below.

[4] For information about the geo_query parameter, see Location-based searches below.

GET https://public.enigma.com/api/snapshots/{id}


Responses

Code | Returns
200 | The snapshot model plus any resulting data
401 | Invalid login credentials
404 | Requested resource not found
405 | Method not allowed

Example 1

This example returns the snapshot attributes and the first 10 rows containing the string “Apple Inc” from the specified snapshot (from the “H-1B Visa Applications - 2015” dataset). The last parameter specifies that rows be sorted by the “Prevailing Wage” column (prevailing_wage) in descending order.

$ curl -X GET 'https://public.enigma.com/api/snapshots/df2d466d-8da4-46bd-bd15-8d5318889f2a?query=Apple%20Inc&row_limit=10&row_sort=-prevailing_wage'

Example 2

This example demonstrates the advanced query mode. It returns the snapshot attributes and the first ten rows from the “H-1B Visa Applications - 2015” snapshot where the “Employer Name” column (employer_name) includes the word “Apple” and the “Job Title” column (job_title) includes the word “engineer”. For more advanced query examples, see Advanced query mode below.

$ curl -X GET 'https://public.enigma.com/api/snapshots/df2d466d-8da4-46bd-bd15-8d5318889f2a?query_mode=advanced&query=(employer_name:apple)AND(job_title:engineer)&row_limit=10'

Location-based searches

The geo_query parameter lets you perform location-based searches on snapshots that include location information in geo_point format.

In the sample data below, the “Geo Location” column (field name = geo_location) is defined in the snapshot schema as a geo_point field:

Grade | Violations | Food Type | Geo Location
A | 06D 10F | Bakery | -73.456015, 40.848387
A | 04H 10F | French | -73.361824, 40.662893
A | 04H 10F | Italian | -73.384869, 40.767745
A | 06C 10B 10F | American | -73.782198, 40.579554

The schema entry for this column is:
  {
    "data_type": "geo_point", 
    "description": "", 
    "display_name": "Geo Location", 
    "is_serialid": false, 
    "name": "geo_location", 
    "visible_by_default": true
  }

Since the geo_location field is of type geo_point, you can use the geo_query string to locate snapshot rows that relate to a specific location. In the geo_query string, you need to specify:

  • The column with the geocoded location information
  • The center point from which to start the search
  • The search radius, in one of the supported distance units:

    Unit | Abbreviation or name
    Mile | mi or miles
    Yard | yd or yards
    Feet | ft or feet
    Inch | in or inch
    Kilometer | km or kilometers
    Meter | m or meters
    Centimeter | cm or centimeters
    Millimeter | mm or millimeters
    Nautical mile | NM, nmi, or nauticalmiles

For example, to search the geo_location column for points within 50 meters of the point (-73.588778, 40.744237):

?geo_query=geo_location:-73.588778,40.744237;distance:50m&row_limit=20
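A minimal sketch of composing this string in code follows, assuming the column:point;distance layout shown in the example above. The helper names are hypothetical. The two coordinates are passed through in the same order as the example; the sample values look like longitude followed by latitude, but that ordering is an inference, not something this document states.

```python
# Abbreviations and names from the distance-unit table above.
SUPPORTED_UNITS = frozenset({
    "mi", "miles", "yd", "yards", "ft", "feet", "in", "inch",
    "km", "kilometers", "m", "meters", "cm", "centimeters",
    "mm", "millimeters", "NM", "nmi", "nauticalmiles",
})


def geo_query(column, center, distance, unit):
    """Return a geo_query value such as 'geo_location:x,y;distance:50m'."""
    if unit not in SUPPORTED_UNITS:
        raise ValueError(f"unsupported distance unit: {unit!r}")
    first, second = center  # kept in the caller's order, as in the example
    return f"{column}:{first},{second};distance:{distance}{unit}"


def geo_query_string(column, center, distance, unit, row_limit):
    """Return the query-string portion of the URL, mirroring the example."""
    if row_limit <= 0:
        # The text requires row_limit > 0 for geo_query searches.
        raise ValueError("row_limit must be > 0 when using geo_query")
    value = geo_query(column, center, distance, unit)
    return f"?geo_query={value}&row_limit={row_limit}"


if __name__ == "__main__":
    print(geo_query_string("geo_location", (-73.588778, 40.744237), 50, "m", 20))
```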

Snapshot statistics

If you include the query parameter stats=true, the API returns statistical information about the snapshot. If you add query=<query_string>, you’ll get statistics for just the matching rows, as shown in the sample output below.

If you use query=, you don’t need to specify row_limit= unless you want the actual rows as well. If you do specify row_limit=, stats will be for all matching rows, not just the rows that are returned.
  "stats": {
    "api_number": {
      "buckets": [
        {
          "doc_count": 11, 
          "key": "608174119700"
        }
      ], 
      "doc_count_error_upper_bound": 0, 
      "fill_rate": 1.0, 
      "sum_other_doc_count": 0
    }, 
    "gas": {
      "avg": 536440.8181818182, 
      "count": 11, 
      "fill_rate": 1.0, 
      "max": 783719.0, 
      "min": 0.0, 
      "sum": 5900849.0
    }, 
    "month": {
      "avg": 1401604363636.3635, 
      "avg_as_string": "2014-06-01T06:32:43.636Z", 
      "buckets": [
        {
          "doc_count": 1, 
          "key": 1388534400000, 
          "key_as_string": "2014-01-01T00:00:00.000Z"
        }, 
        {
          "doc_count": 1, 
          "key": 1391212800000, 
          "key_as_string": "2014-02-01T00:00:00.000Z"
        }, 
        {
          "doc_count": 1, 
          "key": 1393632000000, 
          "key_as_string": "2014-03-01T00:00:00.000Z"
        }, 
        {
          "doc_count": 1, 
          "key": 1396310400000, 
          "key_as_string": "2014-04-01T00:00:00.000Z"
        }, 
        {
          "doc_count": 1, 
          "key": 1398902400000, 
          "key_as_string": "2014-05-01T00:00:00.000Z"
        }
      ], 
      "count": 11, 
      "doc_count_error_upper_bound": 0, 
      "fill_rate": 1.0, 
      "max": 1414800000000.0, 
      "max_as_string": "2014-11-01T00:00:00.000Z", 
      "min": 1388534400000.0, 
      "min_as_string": "2014-01-01T00:00:00.000Z", 
      "sum": 15417648000000.0, 
      "sum_as_string": "2458-07-26T00:00:00.000Z", 
      "sum_other_doc_count": 6
    }, 
    "oil": {
      "avg": 512950.7272727273, 
      "count": 11, 
      "fill_rate": 1.0, 
      "max": 755074.0, 
      "min": 0.0, 
      "sum": 5642458.0
    }, 
    "serial_ea8cd08a_f55d_4e12_bc35_aa45596c86dc": {
      "avg": 60157.0, 
      "count": 11, 
      "fill_rate": 1.0, 
      "max": 109953.0, 
      "min": 8466.0, 
      "sum": 661727.0
    }, 
    "serialid": {
      "avg": 60158.0, 
      "count": 11, 
      "fill_rate": 1.0, 
      "max": 109954.0, 
      "min": 8467.0, 
      "sum": 661738.0
    }, 
    "water": {
      "avg": 131.54545454545453, 
      "count": 11, 
      "fill_rate": 1.0, 
      "max": 595.0, 
      "min": 0.0, 
      "sum": 1447.0
    }
  }, 

In the JSON response, each child of the stats object represents one column. The information returned for each column depends on the column type:

Type | Stat returned | Description
integer, decimal | avg | Mean of non-empty values
integer, decimal | count | Number of non-empty values
integer, decimal | fill_rate | Fraction of cells that are filled (1.0 = all cells are filled; 0.0 = all cells are empty)
integer, decimal | max | Maximum value
integer, decimal | min | Minimum value
integer, decimal | sum | Sum of all non-empty values
string | buckets | Top five strings by frequency of occurrence, with the key (string) and doc_count (count) for each
string | fill_rate | Fraction of cells that are filled (1.0 = all cells are filled; 0.0 = all cells are empty)
datetime | buckets | Top five datetimes by frequency of occurrence, with the key (datetime), key_as_string (datetime), and doc_count (count) for each
datetime | count | Number of non-empty values
datetime | fill_rate | Fraction of cells that are filled (1.0 = all cells are filled; 0.0 = all cells are empty)
datetime | max | Latest date as milliseconds since January 1, 1970 (midnight UTC/GMT)
datetime | max_as_string | Latest date in ISO 8601 YYYY-MM-DDThh:mm:ss.sTZD format
datetime | min | Earliest date as milliseconds since January 1, 1970 (midnight UTC/GMT)
datetime | min_as_string | Earliest date in ISO 8601 YYYY-MM-DDThh:mm:ss.sTZD format
datetime | sum | Sum of dates as milliseconds since January 1, 1970 (midnight UTC/GMT)
boolean | avg | Mean of non-empty values (true = 1; false = 0)
boolean | avg_as_string | True if avg >= 0.5; false otherwise
boolean | buckets | Values by frequency of occurrence, with the key (numeric value where 1 = true; 0 = false), key_as_string (boolean value), and doc_count (count) for each
boolean | count | Number of non-empty values
boolean | fill_rate | Fraction of cells that are filled (1.0 = all cells are filled; 0.0 = all cells are empty)
boolean | max | Maximum value (1 or 0)
boolean | max_as_string | Maximum value as a string (true or false)
boolean | min | Minimum value (1 or 0)
boolean | min_as_string | Minimum value as a string (true or false)
boolean | sum | Sum of all non-empty values (true = 1; false = 0)
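To make the per-type layout concrete, here is a small sketch that reads a stats object once it has been parsed from the JSON response. SAMPLE_STATS copies a few entries from the sample output above; the helper names top_values and numeric_summary are hypothetical, and the code assumes only the attribute names listed in the table.

```python
# A few columns copied from the sample "stats" output in this section.
SAMPLE_STATS = {
    "api_number": {
        "buckets": [{"doc_count": 11, "key": "608174119700"}],
        "doc_count_error_upper_bound": 0,
        "fill_rate": 1.0,
        "sum_other_doc_count": 0,
    },
    "gas": {
        "avg": 536440.8181818182,
        "count": 11,
        "fill_rate": 1.0,
        "max": 783719.0,
        "min": 0.0,
        "sum": 5900849.0,
    },
}


def top_values(column_stats):
    """Return (value, doc_count) pairs from a column's buckets, or [] if none.

    Datetime and boolean buckets also carry key_as_string, which is
    preferred here because it is easier to read than the numeric key.
    """
    return [
        (bucket.get("key_as_string", bucket["key"]), bucket["doc_count"])
        for bucket in column_stats.get("buckets", [])
    ]


def numeric_summary(column_stats):
    """Return whichever of min/max/avg/sum/count the column provides."""
    names = ("min", "max", "avg", "sum", "count")
    return {name: column_stats[name] for name in names if name in column_stats}


if __name__ == "__main__":
    for column, column_stats in SAMPLE_STATS.items():
        print(column, column_stats["fill_rate"],
              top_values(column_stats), numeric_summary(column_stats))
```

String columns come back with an empty numeric summary and numeric columns with an empty bucket list, which matches the table: each type returns only its own subset of attributes.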

Advanced query mode

The advanced query mode lets you search specific columns, rather than entire rows. As with simple query mode, you must set row_limit greater than zero to get back any resulting rows. Search terms are not case sensitive.

To search a specific column, use the query parameter to specify the field name, followed by a colon (:) and the search term, for example:

?query_mode=advanced&query=employer_name:apple&row_limit=10

To search for a phrase, put the phrase in quotes and use URL encoded space characters (%20) within the quoted string, for example:

?query_mode=advanced&query=employer_name:"apple%20inc."&row_limit=10

To search for a partial match, use the * wildcard character as shown here:

?query_mode=advanced&query=employer_name:goog*&row_limit=10

To search for multiple terms within the same column, use parentheses and the appropriate logical operator (AND or OR). You must include a URL encoded space character (%20) before and after the operator, as shown here:

?query_mode=advanced&query=job_title:(software%20AND%20engineer)&row_limit=10

Within the parentheses, the default operator is AND, so in job_title:(software%20engineer) the AND operator is implied.

To exclude a term, use a minus (-) character (NOT) as shown in this example:

?query_mode=advanced&query=job_title:(software%20AND%20-engineer)&row_limit=10

For numeric columns, you can specify a range of values using URL encoded square brackets (%5B %5D) and the keyword “TO” (uppercase). You must include a URL encoded space character (%20) before and after “TO” as shown below. This example specifies the range [2 TO 10]:

?query_mode=advanced&query=total_workers:%5B2%20TO%2010%5D&row_limit=10

For date columns, ranges use the same [<start> TO <end>] syntax, with dates specified in YYYY-MM-DD format, for example:

?query_mode=advanced&query=decision_date:%5B2014-11-18%20TO%202014-11-19%5D&row_limit=10

When specifying a range, you can use * as the range start or range end to specify <= or >= respectively. This example specifies the range >= 5:

?query_mode=advanced&query=total_workers:%5B5%20TO%20*%5D&row_limit=10
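The range syntax and its URL encoding can be generated rather than typed by hand. The sketch below assumes only the [<start> TO <end>] form and the * wildcard described above; range_term and encode_query are hypothetical helper names.

```python
from urllib.parse import quote


def range_term(field, start=None, end=None):
    """Return 'field:[start TO end]', using * for an open start or end."""
    low = "*" if start is None else str(start)
    high = "*" if end is None else str(end)
    return f"{field}:[{low} TO {high}]"


def encode_query(term):
    """Percent-encode brackets and spaces, leaving ':' and '*' as written."""
    return quote(term, safe=":*")


if __name__ == "__main__":
    print(encode_query(range_term("total_workers", 2, 10)))
    print(encode_query(range_term("total_workers", start=5)))
    print(encode_query(range_term("decision_date", "2014-11-18", "2014-11-19")))
```

The encoded value goes after query= in the request, together with query_mode=advanced and a row_limit greater than zero.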

To search multiple columns, put the search criteria for each column within parentheses and include the appropriate logical operator (AND or OR), for example:

?query_mode=advanced&query=(employer_name:apple)AND(job_title:"software%20engineer")&row_limit=10
?query_mode=advanced&query=(employer_name:apple)OR(employer_city:cupertino)&row_limit=10
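As an illustration, the multi-column form can be assembled from per-column terms. The helpers below are hypothetical and mirror the encoding used in the examples above: spaces become %20, while parentheses, colons, quotes, and * are left as written. Some HTTP clients may prefer the quotes percent-encoded as %22 instead.

```python
from urllib.parse import quote


def column_term(field, value):
    """Return 'field:value', quoting the value when it is a phrase."""
    if any(ch.isspace() for ch in value):
        value = f'"{value}"'
    return f"{field}:{value}"


def combine(operator, *terms):
    """Join parenthesized per-column terms with AND or OR."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be AND or OR")
    return operator.join(f"({term})" for term in terms)


def advanced_query_string(query, row_limit):
    """Return the query-string portion of an advanced-mode request."""
    if row_limit <= 0:
        # Advanced mode returns rows only when row_limit is greater than zero.
        raise ValueError("row_limit must be > 0")
    encoded = quote(query, safe='():"*')
    return f"?query_mode=advanced&query={encoded}&row_limit={row_limit}"


if __name__ == "__main__":
    query = combine(
        "AND",
        column_term("employer_name", "apple"),
        column_term("job_title", "software engineer"),
    )
    print(advanced_query_string(query, row_limit=10))
```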