Function: WebGetTable
Extract tabular data (rows and columns) from a source reference, using query conditions to identify the characteristics of the data in the event there are multiple tables that could be identified in the source.
Usage
-
HTTP GET
https://www.xs.codexframework.com/api/WebGetTable?cc=CLIENT_CODE
&url=ENCODED_URL
&query=TABLE_CRITERIA_QUERY
&tableValueFilter=TABLE_VALUE_FILTER
&tableValueSort=TABLE_VALUE_SORT
&includeHeader=true|false
&headerRenaming=RENAMING_EXPRESSION
&maxrow=ROW_COUNT
&maxcol=COLUMN_COUNT
&sdt=SERVER_DATE_TREATMENT
Parameters
-
CLIENT_CODEThis is an authorization token that identifies your xSkrape.com account to which any credit charges will be applied. You can locate this code by logging into your xSkrape.com account, under My Account - Queries. (Remember: it's your duty to secure your client code against unauthorized use.) Required.
-
ENCODED_URLA reference to the source data to be queried. Can be a reference that is HTTP or HTTPS based. Can optionally be an expression that supports loading data across multiple requests (or performs other special parsing), as described here. The text needs to be URL encoded. Required.
-
TABLE_CRITERIA_QUERYA query specification to isolate a specific table from the raw source data. Details about the query syntax are covered here. If there are multiple tables that meet the given criteria, the first table is used. Optional. If this is omitted, uses "tableindex=1" as the default value. The text needs to be URL encoded.
-
TABLE_VALUE_FILTERA filter expression that can be applied to the table to help isolate a candidate row or rows of interest. The format for the expression is more or less the same as described here. Optional.
-
TABLE_VALUE_SORTA string that contains the column name followed by "ASC" (ascending) or "DESC" (descending). Columns are sorted ascending by default. Multiple columns can be separated by commas. Optional.
-
includeHeaderIf True, table header names are included in the returned data feed. Optional - default behavior is no table headers (just raw contents).
-
RENAMING_EXPRESSIONAn optional set of column renaming expressions of the format "OldColumnName becomes NewColumnName", delimited by "&&" if there are multiple. (See examples.) An empty string or omitted implies header names will be as they are in the data source.
-
ROW_COUNTOptionally specify the maximum number of rows to return.
-
COLUMN_COUNTOptionally specify the maximum number of columns to return.
-
SERVER_DATE_TREATMENTValid values include "fromdate" which implies there is no date conversion undertaken (or an embedded explicit time zone within a matched date is used), or names a specific time zone which is assumed to apply to any matched date. The time zone name if given must be one of the standard time zone names listed here. Optional.
Return Value(s)
- Returns JSON in all cases (success and failure).
- Format: { "c":[{"cn":"columnName","dt":"dataType","fmt":formatString}, ...], "r":[{"v":["val1", "val2", ...],"n":["isnull1", "isnull2", ...]}, ...], "Success":true|false, "Message":"message", "Source":"source", "NoData":true|false, "RobotsTxtWarning":true|false, "Truncated":true|false, "HasHeader":true|false }
- c: An array of column objects
- columnName: Name of column
- dataType: Data type for the column (standard .NET data types are used, e.g. "int32", "decimal", etc.)
- formatString: A "suggested format string" provided by the service for values in the column (not always applicable)
- r: An array of row objects
- val1, 2, etc.: Values for specific cells (columns within rows), positionally based (i.e. first value matches first column, etc.) Date handling is described in more detail here.
- isnull1, 2, etc.: 0 = false, 1 = true to identify "database nullable" values
- Success: Indicates overall success or failure status of the request.
- Message: In the case of failure, provides an error message.
- Source: In the case of failure, identifies the origin of the message.
- NoData: Indicates whether the query returned any data or not.
- RobotsTxtWarning: If true, indicates that the accessed web site has a robots.txt file that may imply disallowed access (check with the site administrator or terms of use).
- Truncated: If true, indicates that the data result was truncated due to metering settings on your xskrape.com user profile.
- HasHeader: If true, the data set includes header name.
Remarks
This function is typically used to extract tabular data from a data source that either is tabular by nature (e.g. CSV, Excel), or tabular details can be inferred (e.g. HTML).
Examples
-
cURL
1. Pull data from a tabular web source
In this example we use the "curl" command line tool to illustate calling the WebGetTable API.
curl --data-urlencode "cc=5e.....2e" --data-urlencode "url=https://www.quandl.com/api/v1/datasets/WIKI/AAPL.csv?sort_order=desc&collapse=daily&trim_start=2016-01-01" --data-urlencode "maxrow=2" -G "https://www.xs.codexframework.com/api/WebGetTable"
Results:
{"c":[{"cn":"Date","dt":"DateTime","fmt":null}, {"cn":"Open","dt":"Double","fmt":null}, {"cn":"High","dt":"Double","fmt":null}, {"cn":"Low","dt":"Double","fmt":null}, {"cn":"Close","dt":"Double","fmt":null}, {"cn":"Volume","dt":"Int32","fmt":null}, {"cn":"ExDividend","dt":"Double","fmt":null}, {"cn":"SplitRatio","dt":"Int32","fmt":null}, {"cn":"AdjOpen","dt":"Double","fmt":null}, {"cn":"AdjHigh","dt":"Double","fmt":null}, {"cn":"AdjLow","dt":"Double","fmt":null}, {"cn":"AdjClose","dt":"Double","fmt":null}, {"cn":"AdjVolume","dt":"Int32","fmt":null}], "r":[{"v":["1466726400000","92.91","94.655","92.65","93.4","65771655","0","1","92.91","94.655","92.65","93.4","65771655"],"n":["0","0","0","0","0","0","0","0","0","0","0","0","0"]}, {"v":["1466640000000","95.94","96.29","95.25","96.1","31381839","0","1","95.94","96.29","95.25","96.1","31381839"],"n":["0","0","0","0","0","0","0","0","0","0","0","0","0"]}], "Message":null, "Source":null, "NoData":false, "Success":true, "HasHeader":false, "RobotsTxtWarning":false, "Truncated":false}
Of note, we've omitted most of our client token: you would substitute your own, as found when you log into your xSkrape.com account. Various parameters are encoded as part of our request to WebGetTable, and the result shows a successful completion (Success = true). Tabular data is returned using column definitions, along with arrays of values, per row. Row values are positional with respect to column definitions: the first row value is for the first column, the second value is for the second column, and so on. Nullability status for every row value is returned separately since there is a difference between an empty string and a null value.
-
jQuery
2. Pull data from a tabular web source
This is effectively the same example as above, except implemented using a JQuery get call.
<div id="result"></div>
<script type="text/javascript">
var requestUrl = "https://www.xs.codexframework.com/api/WebGetTable";
var parm = {
cc: "5e.....2e",
url: "https://www.quandl.com/api/v1/datasets/WIKI/AAPL.csv?sort_order=desc&collapse=daily&trim_start=2016-01-01",
maxrow: 20
};
$.get(requestUrl, parm).done(function (data) {
if (data.Success) {
var i = 0;
var alltext = "";
while (i < data.r.length) {
var j = 0;
var rowtext = "";
while (j < data.r[i].v.length) {
if (rowtext != "") {
rowtext = rowtext + ",";
}
rowtext = rowtext + data.r[i].v[j];
++j;
}
alltext = alltext + rowtext + "<br/>";
++i;
}
$("#result").html(alltext);
} else {
$("#result").text(data.Message);
}
});
</script>
Related
- WebGetTable as documented for the xSkrape Excel Web Add-in
- WebGetTable as documented for the xSkrape Excel Desktop Add-in
- WebGetTable as documented for SQL Server