Function: WebGetSingle
Extract a single value (number, text, etc.) from a source reference, using query conditions that identify the value of interest.
Usage
-
Excel (Online)
-
Excel (Desktop)
=WebGetSingle(string url
, string query
, string additionalConfig = ""
, object noDataValue = null) -
SQL Server
WebGetSingle(@serviceUrl nvarchar(1000)
, @parseUrl nvarchar(MAX)
, @query nvarchar(MAX)
, @additionalConfig nvarchar(2000)
, @noDataValue nvarchar(MAX))
Parameters
-
Excel
-
TargetNames the specific cell that will receive the data value, or the target can be a cell range (either 1 row or 1 column wide) that allows multiple data values to be returned, evaluated using each cell of the range. Only applies to Excel "Web" Add-in.
-
urlA reference to the source data to be queried. Can be a reference that is HTTP, HTTPS or FILE based. Can optionally be an expression that supports loading data across multiple requests (or performs other special parsing), as described here. Required. Excel "Web" settings.
-
queryA query specification to isolate the specific data element from the raw source data. Details about the query syntax are covered here. Required. Excel "Web" settings.
-
additionalConfigAn optional list of configuration parameters that control how the query is issued. Details about the configuration parameters syntax are covered here. Optional - provide an empty string or omit the parameter if there are no configuration parameters needed (i.e. default behavior).
-
noDataValueWhen no data value is retrieved, this value is returned as a "default." This value also plays a role depending on your "ErrorHandling" configuration setting. Optional - no value is treated as an empty string.
-
-
SQL Server
-
@serviceUrlA URL reference to your SQL-Hero application server services. Typically this would be "http://servername:46837" where "servername" matches the server on which you installed the SQL-Hero services, with port 46837 being the default port number used by the SQL-Hero installer program.
-
@parseUrlA reference to the source data to be queried. Can be a reference that is HTTP, HTTPS or FILE based.
-
@queryA query specification to isolate the specific data element from the raw source data. Details about the query syntax are covered here.
-
@additionalConfigAn optional list of configuration parameters that control how the query is issued. Details about the configuration parameters syntax are covered here. Provide an empty string or NULL if there are no configuration parameters needed (i.e. default behavior).
-
@noDataValueWhen no data value is retrieved, this value is returned as a "default." This value also plays a role depending on your "ErrorHandling" configuration setting. (NULL is a typical value to use for this parameter.)
-
Return Value(s)
-
Excel
- Returns a value of an appropriate type. For example, a value that looks like a number will be returned as a numeric (i.e. can be formatted as a number). Populates exactly one cell for the desktop add-in - can populate multiple for "Web" add-in (see how).
- Values that can't be parsed to a native type are left as strings.
- No data matching the query expression returns the value of the noDataValue parameter (if none provided, returns an empty string).
- Error values can be returned as well, depending on your "ErrorHandling" configuration setting.
-
SQL Server
- Returns nvarchar(MAX)
- Can be cast to other types, if the returned string is castable.
- No data matching the query expression returns the value of the @noDataValue parameter.
- Error values can be returned as well, depending on your "ErrorHandling" configuration setting.
Remarks
This function is typically used to extract a single value from a non-tabular data source such as HTML, JSON or XML. For tabular sources like CSV files, etc. - one would typically use functions such as WebGetTable or WebGetSingleFromTable. Some pre-parsing of the return value can be done depending on the query expression used. For the Excel "Web" add-in, functionality described here for the Desktop add-in can be carried over based on the parameter names mapping to the fields made available in the data connection in the add-in task pane.
Examples
-
Excel
1. Pull data from a HTML page - based on related text
=WebGetSingle("http://www.marketwatch.com/investing/stock/MSFT", "currencyfollowsnear=Rev\.\ per\ Employee")
Opening the http://www.marketwatch.com/investing/stock/MSFT page, notice there is the text "Rev. per Employee", followed closely by an amount in dollars. With this example, we're extracting the dollar value using the "currencyfollowsnear" query which works here since the underlying HTML looks something like this:
<div class="section">
<p class="column">Rev. per Employee</p>
<p class="data lastcolumn">$737,992</p>
</div>
2. Pull data from a HTML page - based on specific path to the data
=WebGetSingle("http://www.marketwatch.com/investing/stock/MSFT/financials", "xpath=/span[1]/html[1]/body[1]/tr[5]/td[6]/text()")
The use of "xpath" assumes a fixed page location. In the example above, it's the first span element, containing the first html element, containig the first body element, containing the fifth tr element (table row), containing the sixth td element (table column). (If this seems arcane to determine: Page Explorer helps a lot to determine a valid xpath expression.) This happens to match a specific data element that on 7/15/15 is "59.8B". As that value changes, the xpath query will return the current value: however, if the page structure changes - for example, the table is restructured to have additional preceding rows - then the query would not return the expected data (if any).
3. Pull and parse a dollar value with a "suffix" from a HTML page
=WebGetSingle("http://finance.yahoo.com/q/ks?s=GS+Key+Statistics", "numberwithsuffixfollowsnear=yfs_j10_")
In this example, we could have text such as "92.00B" which is reflective of "92 billion", and this would be parsed as "9200000000" when returned to Excel.
4. Pull data from a HTML page, uncached and with long timeout
=WebGetSingle("http://www.marketwatch.com/investing/stock/MSFT", "currencyfollowsnear=Rev\.\ per\ Employee", "Caching=False;Timeout=200")
By default, data returned per URL is cached for later use (by default for 15 minutes, up to 50 pages). Using "Caching=False", the request here will not use potentially cached data. In addition, "Timeout=200" increases the timeout for the request to 200 seconds (default is 90 seconds).
5. Adjusting throttling settings
=WebGetSingle("http://www.marketwatch.com/investing/stock/MSFT", "currencyfollowsnear=Rev\.\ per\ Employee", "MaxRequestCount=2000;MaxRequestIntervalSizeSeconds=300;MaxRequestWaitSeconds=120;MinRequestIntervalMilliseconds=10")
With the above configuration settings, we allow up to 2000 page requests per 300 second interval. Furthermore, we will wait up to 120 seconds if we do exceed the maximum, assuming such a wait could result in falling below the maximum threshold. Additionally, we won't issue roughly more than 100 requests per second since we require a minimum interval between request starts of 10 milliseconds.
6. Pull data from a text file on a local drive and extract a value based on a regular expression
=WebGetSingle("file://c:\temp\sample.txt", "removeregex=(.|\n)+?PROD:\s*484AB && regex=(?<=LOC:\s*)[A-Z0-9-]+")
This example assumes you have a file called sample.txt in your c:\temp directory. Here's an example of what it might contain:
PROD: 091AB TIB: 01/07/2016
LOC: 391-58A POD: 01911
-----------------------------------------
PROD: 484AB TIB: 06/17/2016
LOC: 391-59A POD: 01919
-----------------------------------------
PROD: 810CB TIB: 04/09/2016
LOC: 391-60B POD: 01833Say we wanted to extract the code "391-59A" which is associated with the "LOC" value for the "PROD" value of "484AB". We achieve that by using "removeregex" to strip text that precedes our PROD code of interest. This allows the latter "regex" query to find the first match of the "LOC" field. Our "LOC" regular expression matches the alphanumeric portion that follows the text "LOC:", not including the "LOC:" text (since we used a look-behind match for that).
7. Error handling scenarios
=WebGetSingle("http://wowowowowo", "currencyfollowsnear=XYZ")
The above function call is intended to fail - there is no domain "http://wowowowowo". Default error handling behavior for xSkrape for Excel is to rethrow exceptions such as would happen here. This results in getting a "#VALUE!" message in the target cell. If we change the error handling setting to "ReturnValue", error details are returned within the target cell:
=WebGetSingle("http://wowowowowo", "currencyfollowsnear=XYZ", "ErrorHandling=ReturnValue")
With the above adjustment, we see "xSkrape: The remote name could not be resolved: 'wowowowowo'" in the cell containing the function call. xSkrape errors are preceded by "xSkrape:" as one way to identify a failure. Other error handling options for Excel include "NoData" (use the "no data" value) and "Exception" (default).
-
SQL Server
1. Pull data from a HTML page - based on related text
SELECT dbo.WebGetSingle('http://localhost', 'http://www.marketwatch.com/investing/stock/MSFT', 'currencyfollowsnear=Rev\.\ per\ Employee', NULL, NULL);
Opening the http://www.marketwatch.com/investing/stock/MSFT page, notice there is the text "Rev. per Employee", followed closely by an amount in dollars. With this example, we're extracting the dollar value using the "currencyfollowsnear" query which works here since the underlying HTML looks something like this:
<div class="section">
<p class="column">Rev. per Employee</p>
<p class="data lastcolumn">$737,992</p>
</div>
2. Pull data from a HTML page - based on specific path to the data
SELECT dbo.WebGetSingle('http://localhost', 'http://www.marketwatch.com/investing/stock/MSFT/financials', 'xpath=/span[1]/html[1]/body[1]/tr[5]/td[6]/text()', NULL, NULL);
The use of "xpath" assumes a fixed page location. In the example above, it's the first span element, containing the first html element, containig the first body element, containing the fifth tr element (table row), containing the sixth td element (table column). (If this seems arcane to determine: Page Explorer helps a lot to determine a valid xpath expression.) This happens to match a specific data element that on 7/15/15 is "59.8B". As that value changes, the xpath query will return the current value: however, if the page structure changes - for example, the table is restructured to have additional preceding rows - then the query would not return the expected data (if any).
3. Pull and parse a dollar value with a "suffix" from a HTML page
SELECT dbo.WebGetSingle('http://localhost', 'http://finance.yahoo.com/q/ks?s=GS+Key+Statistics', 'numberwithsuffixfollowsnear=yfs_j10_', NULL, NULL);
In this example, we could have text such as "92.00B" which is reflective of "92 billion", and this would be parsed as "9200000000" when returned to Excel.
4. Pull data from a HTML page, uncached and with long timeout
SELECT dbo.WebGetSingle('http://localhost', 'http://www.marketwatch.com/investing/stock/MSFT', 'currencyfollowsnear=Rev\.\ per\ Employee', 'Caching=False;Timeout=200', NULL);
By default, data returned per URL is cached for later use (by default for 15 minutes, up to 50 pages). Using "Caching=False", the request here will not use potentially cached data. In addition, "Timeout=200" increases the timeout for the request to 200 seconds (default is 90 seconds).
5. Adjusting throttling settings
SELECT dbo.WebGetSingle('http://localhost', 'http://www.marketwatch.com/investing/stock/MSFT', 'currencyfollowsnear=Rev\.\ per\ Employee', 'MaxRequestCount=2000;MaxRequestIntervalSizeSeconds=300;MaxRequestWaitSeconds=120;MinRequestIntervalMilliseconds=10', NULL);
With the above configuration settings, we allow up to 2000 page requests per 300 second interval. Furthermore, we will wait up to 120 seconds if we do exceed the maximum, assuming such a wait could result in falling below the maximum threshold. Additionally, we won't issue roughly more than 100 requests per second since we require a minimum interval between request starts of 10 milliseconds.
6. Pull data from a text file on a local drive and extract a value based on a regular expression
SELECT dbo.WebGetSingle('http://localhost', 'file://c:\temp\sample.txt', 'removeregex=(.|\n)+?PROD:\s*484AB && regex=(?<=LOC:\s*)[A-Z0-9-]+', NULL, NULL);
This example assumes you have a file called sample.txt in your c:\temp directory and that your SQL-Hero services have permission to access it. Here's an example of what it might contain:
PROD: 091AB TIB: 01/07/2016
LOC: 391-58A POD: 01911
-----------------------------------------
PROD: 484AB TIB: 06/17/2016
LOC: 391-59A POD: 01919
-----------------------------------------
PROD: 810CB TIB: 04/09/2016
LOC: 391-60B POD: 01833Say we wanted to extract the code "391-59A" which is associated with the "LOC" value for the "PROD" value of "484AB". We achieve that by using "removeregex" to strip text that precedes our PROD code of interest. This allows the latter "regex" query to find the first match of the "LOC" field. Our "LOC" regular expression matches the alphanumeric portion that follows the text "LOC:", not including the "LOC:" text (since we used a look-behind match for that).
7. Error handling scenarios
SELECT dbo.WebGetSingle('http://localhost', 'http://wowowowowo', 'currencyfollowsnear=XYZ', NULL, NULL);
The above function call is intended to fail - there is no domain "http://wowowowowo". Default error handling behavior for xSkrape for SQL is to return error text such as would happen here. This returns a value of "xSkrape: The remote name could not be resolved: 'wowowowowo'". If we change the error handling setting to "Exception", a SQL error ("A .NET Framework error occurred during execution of user-defined routine or aggregate "WebGetSingle"", severity 16) is returned:
SELECT dbo.WebGetSingle('http://localhost', 'http://wowowowowo', 'currencyfollowsnear=XYZ', 'ErrorHandling=Exception', NULL)
Other error handling options for SQL include "NoData" (use the "no data" value) and "ReturnValue" (default).
Related
- WebGetSingle is mentioned in the Using WebGetSingle and/or WebGetMultiple against an RSS feed example for Excel
- Make a suggestion about this documentation (for Excel)
- WebGetSingle is mentioned in the Using WebGetSingle against an RSS feed example for SQL Server
- WebGetSingle is mentioned in the Pulling a cloud-based Excel file into SQL Server example for SQL Server
- Make a suggestion about this documentation (for SQL Server)