Function: WebGetTable

Extract tabular data (rows and columns) from a source reference, using query conditions to identify the characteristics of the data in the event there are multiple tables that could be identified in the source.

Usage

  • HTTP GET
    https://www.xs.codexframework.com/api/WebGetTable?cc=CLIENT_CODE
    &url=ENCODED_URL
    &query=TABLE_CRITERIA_QUERY
    &tableValueFilter=TABLE_VALUE_FILTER
    &tableValueSort=TABLE_VALUE_SORT
    &includeHeader=true|false
    &headerRenaming=RENAMING_EXPRESSION
    &maxrow=ROW_COUNT
    &maxcol=COLUMN_COUNT
    &sdt=SERVER_DATE_TREATMENT

Parameters

  • CLIENT_CODE
    This is an authorization token that identifies your xSkrape.com account to which any credit charges will be applied. You can locate this code by logging into your xSkrape.com account, under My Account - Queries. (Remember: it's your duty to secure your client code against unauthorized use.) Required.
  • ENCODED_URL
    A reference to the source data to be queried. Can be a reference that is HTTP or HTTPS based. Can optionally be an expression that supports loading data across multiple requests (or performs other special parsing), as described here. The text needs to be URL encoded. Required.
  • TABLE_CRITERIA_QUERY
    A query specification to isolate a specific table from the raw source data. Details about the query syntax are covered here. If there are multiple tables that meet the given criteria, the first table is used. Optional. If this is omitted, uses "tableindex=1" as the default value. The text needs to be URL encoded.
  • TABLE_VALUE_FILTER
    A filter expression that can be applied to the table to help isolate a candidate row or rows of interest. The format for the expression is more or less the same as described here. Optional.
  • TABLE_VALUE_SORT
    A string that contains the column name followed by "ASC" (ascending) or "DESC" (descending). Columns are sorted ascending by default. Multiple columns can be separated by commas. Optional.
  • includeHeader
    If True, table header names are included in the returned data feed. Optional - default behavior is no table headers (just raw contents).
  • RENAMING_EXPRESSION
    An optional set of column renaming expressions of the format "OldColumnName becomes NewColumnName", delimited by "&&" if there are multiple. (See examples.) An empty string or omitted implies header names will be as they are in the data source.
  • ROW_COUNT
    Optionally specify the maximum number of rows to return.
  • COLUMN_COUNT
    Optionally specify the maximum number of columns to return.
  • SERVER_DATE_TREATMENT
    Valid values include "fromdate" which implies there is no date conversion undertaken (or an embedded explicit time zone within a matched date is used), or names a specific time zone which is assumed to apply to any matched date. The time zone name if given must be one of the standard time zone names listed here. Optional.

Return Value(s)

  • Returns JSON in all cases (success and failure).
  • Format: { "c":[{"cn":"columnName","dt":"dataType","fmt":formatString}, ...], "r":[{"v":["val1", "val2", ...],"n":["isnull1", "isnull2", ...]}, ...], "Success":true|false, "Message":"message", "Source":"source", "NoData":true|false, "RobotsTxtWarning":true|false, "Truncated":true|false, "HasHeader":true|false }
  • c: An array of column objects
  • columnName: Name of column
  • dataType: Data type for the column (standard .NET data types are used, e.g. "int32", "decimal", etc.)
  • formatString: A "suggested format string" provided by the service for values in the column (not always applicable)
  • r: An array of row objects
  • val1, 2, etc.: Values for specific cells (columns within rows), positionally based (i.e. first value matches first column, etc.) Date handling is described in more detail here.
  • isnull1, 2, etc.: 0 = false, 1 = true to identify "database nullable" values
  • Success: Indicates overall success or failure status of the request.
  • Message: In the case of failure, provides an error message.
  • Source: In the case of failure, identifies the origin of the message.
  • NoData: Indicates whether the query returned any data or not.
  • RobotsTxtWarning: If true, indicates that the accessed web site has a robots.txt file that may imply disallowed access (check with the site administrator or terms of use).
  • Truncated: If true, indicates that the data result was truncated due to metering settings on your xskrape.com user profile.
  • HasHeader: If true, the data set includes header name.

Remarks

This function is typically used to extract tabular data from a data source that either is tabular by nature (e.g. CSV, Excel), or tabular details can be inferred (e.g. HTML).

Examples

  • cURL

    1. Pull data from a tabular web source

    In this example we use the "curl" command line tool to illustate calling the WebGetTable API.

    curl --data-urlencode "cc=5e.....2e" --data-urlencode "url=https://www.quandl.com/api/v1/datasets/WIKI/AAPL.csv?sort_order=desc&collapse=daily&trim_start=2016-01-01" --data-urlencode "maxrow=2" -G "https://www.xs.codexframework.com/api/WebGetTable"

    Results:

    {"c":[{"cn":"Date","dt":"DateTime","fmt":null}, {"cn":"Open","dt":"Double","fmt":null}, {"cn":"High","dt":"Double","fmt":null}, {"cn":"Low","dt":"Double","fmt":null}, {"cn":"Close","dt":"Double","fmt":null}, {"cn":"Volume","dt":"Int32","fmt":null}, {"cn":"ExDividend","dt":"Double","fmt":null}, {"cn":"SplitRatio","dt":"Int32","fmt":null}, {"cn":"AdjOpen","dt":"Double","fmt":null}, {"cn":"AdjHigh","dt":"Double","fmt":null}, {"cn":"AdjLow","dt":"Double","fmt":null}, {"cn":"AdjClose","dt":"Double","fmt":null}, {"cn":"AdjVolume","dt":"Int32","fmt":null}], "r":[{"v":["1466726400000","92.91","94.655","92.65","93.4","65771655","0","1","92.91","94.655","92.65","93.4","65771655"],"n":["0","0","0","0","0","0","0","0","0","0","0","0","0"]}, {"v":["1466640000000","95.94","96.29","95.25","96.1","31381839","0","1","95.94","96.29","95.25","96.1","31381839"],"n":["0","0","0","0","0","0","0","0","0","0","0","0","0"]}], "Message":null, "Source":null, "NoData":false, "Success":true, "HasHeader":false, "RobotsTxtWarning":false, "Truncated":false}

    Of note, we've omitted most of our client token: you would substitute your own, as found when you log into your xSkrape.com account. Various parameters are encoded as part of our request to WebGetTable, and the result shows a successful completion (Success = true). Tabular data is returned using column definitions, along with arrays of values, per row. Row values are positional with respect to column definitions: the first row value is for the first column, the second value is for the second column, and so on. Nullability status for every row value is returned separately since there is a difference between an empty string and a null value.

  • jQuery

    2. Pull data from a tabular web source

    This is effectively the same example as above, except implemented using a JQuery get call.

    <div id="result"></div>
    <script type="text/javascript">
    var requestUrl = "https://www.xs.codexframework.com/api/WebGetTable";

    var parm = {
        cc: "5e.....2e",
        url: "https://www.quandl.com/api/v1/datasets/WIKI/AAPL.csv?sort_order=desc&collapse=daily&trim_start=2016-01-01",
        maxrow: 20
    };

    $.get(requestUrl, parm).done(function (data) {
        if (data.Success) {
            var i = 0;
            var alltext = "";
            while (i < data.r.length) {
                var j = 0;
                var rowtext = "";
                while (j < data.r[i].v.length) {
                    if (rowtext != "") {
                        rowtext = rowtext + ",";
                    }
                    rowtext = rowtext + data.r[i].v[j];
                    ++j;
                }
                alltext = alltext + rowtext + "<br/>";
                ++i;
            }
            $("#result").html(alltext);
        } else {
            $("#result").text(data.Message);
        }
    });
    </script>

Related