Function: WebGetSingle
Extract a single value (number, text, etc.) from a source reference, using query conditions that identify the value of interest.
Usage
-
SQL Server
WebGetSingle(@serviceUrl nvarchar(1000)
, @parseUrl nvarchar(MAX)
, @query nvarchar(MAX)
, @additionalConfig nvarchar(2000)
, @noDataValue nvarchar(MAX))
Parameters
-
SQL Server
-
@serviceUrlA URL reference to your SQL-Hero application server services. Typically this would be "http://servername:46837" where "servername" matches the server on which you installed the SQL-Hero services, with port 46837 being the default port number used by the SQL-Hero installer program.
-
@parseUrlA reference to the source data to be queried. Can be a reference that is HTTP, HTTPS or FILE based.
-
@queryA query specification to isolate the specific data element from the raw source data. Details about the query syntax are covered here.
-
@additionalConfigAn optional list of configuration parameters that control how the query is issued. Details about the configuration parameters syntax are covered here. Provide an empty string or NULL if there are no configuration parameters needed (i.e. default behavior).
-
@noDataValueWhen no data value is retrieved, this value is returned as a "default." This value also plays a role depending on your "ErrorHandling" configuration setting. (NULL is a typical value to use for this parameter.)
-
Return Value(s)
-
SQL Server
- Returns nvarchar(MAX)
- Can be cast to other types, if the returned string is castable.
- No data matching the query expression returns the value of the @noDataValue parameter.
- Error values can be returned as well, depending on your "ErrorHandling" configuration setting.
Remarks
This function is typically used to extract a single value from a non-tabular data source such as HTML, JSON or XML. For tabular sources like CSV files, etc. - one would typically use functions such as WebGetTable or WebGetSingleFromTable. Some pre-parsing of the return value can be done depending on the query expression used. For the Excel "Web" add-in, functionality described here for the Desktop add-in can be carried over based on the parameter names mapping to the fields made available in the data connection in the add-in task pane.
Examples
-
SQL Server
1. Pull data from a HTML page - based on related text
SELECT dbo.WebGetSingle('http://localhost', 'http://www.marketwatch.com/investing/stock/MSFT', 'currencyfollowsnear=Rev\.\ per\ Employee', NULL, NULL);
Opening the http://www.marketwatch.com/investing/stock/MSFT page, notice there is the text "Rev. per Employee", followed closely by an amount in dollars. With this example, we're extracting the dollar value using the "currencyfollowsnear" query which works here since the underlying HTML looks something like this:
<div class="section">
<p class="column">Rev. per Employee</p>
<p class="data lastcolumn">$737,992</p>
</div>
2. Pull data from a HTML page - based on specific path to the data
SELECT dbo.WebGetSingle('http://localhost', 'http://www.marketwatch.com/investing/stock/MSFT/financials', 'xpath=/span[1]/html[1]/body[1]/tr[5]/td[6]/text()', NULL, NULL);
The use of "xpath" assumes a fixed page location. In the example above, it's the first span element, containing the first html element, containig the first body element, containing the fifth tr element (table row), containing the sixth td element (table column). (If this seems arcane to determine: Page Explorer helps a lot to determine a valid xpath expression.) This happens to match a specific data element that on 7/15/15 is "59.8B". As that value changes, the xpath query will return the current value: however, if the page structure changes - for example, the table is restructured to have additional preceding rows - then the query would not return the expected data (if any).
3. Pull and parse a dollar value with a "suffix" from a HTML page
SELECT dbo.WebGetSingle('http://localhost', 'http://finance.yahoo.com/q/ks?s=GS+Key+Statistics', 'numberwithsuffixfollowsnear=yfs_j10_', NULL, NULL);
In this example, we could have text such as "92.00B" which is reflective of "92 billion", and this would be parsed as "9200000000" when returned to Excel.
4. Pull data from a HTML page, uncached and with long timeout
SELECT dbo.WebGetSingle('http://localhost', 'http://www.marketwatch.com/investing/stock/MSFT', 'currencyfollowsnear=Rev\.\ per\ Employee', 'Caching=False;Timeout=200', NULL);
By default, data returned per URL is cached for later use (by default for 15 minutes, up to 50 pages). Using "Caching=False", the request here will not use potentially cached data. In addition, "Timeout=200" increases the timeout for the request to 200 seconds (default is 90 seconds).
5. Adjusting throttling settings
SELECT dbo.WebGetSingle('http://localhost', 'http://www.marketwatch.com/investing/stock/MSFT', 'currencyfollowsnear=Rev\.\ per\ Employee', 'MaxRequestCount=2000;MaxRequestIntervalSizeSeconds=300;MaxRequestWaitSeconds=120;MinRequestIntervalMilliseconds=10', NULL);
With the above configuration settings, we allow up to 2000 page requests per 300 second interval. Furthermore, we will wait up to 120 seconds if we do exceed the maximum, assuming such a wait could result in falling below the maximum threshold. Additionally, we won't issue roughly more than 100 requests per second since we require a minimum interval between request starts of 10 milliseconds.
6. Pull data from a text file on a local drive and extract a value based on a regular expression
SELECT dbo.WebGetSingle('http://localhost', 'file://c:\temp\sample.txt', 'removeregex=(.|\n)+?PROD:\s*484AB && regex=(?<=LOC:\s*)[A-Z0-9-]+', NULL, NULL);
This example assumes you have a file called sample.txt in your c:\temp directory and that your SQL-Hero services have permission to access it. Here's an example of what it might contain:
PROD: 091AB TIB: 01/07/2016
LOC: 391-58A POD: 01911
-----------------------------------------
PROD: 484AB TIB: 06/17/2016
LOC: 391-59A POD: 01919
-----------------------------------------
PROD: 810CB TIB: 04/09/2016
LOC: 391-60B POD: 01833Say we wanted to extract the code "391-59A" which is associated with the "LOC" value for the "PROD" value of "484AB". We achieve that by using "removeregex" to strip text that precedes our PROD code of interest. This allows the latter "regex" query to find the first match of the "LOC" field. Our "LOC" regular expression matches the alphanumeric portion that follows the text "LOC:", not including the "LOC:" text (since we used a look-behind match for that).
7. Error handling scenarios
SELECT dbo.WebGetSingle('http://localhost', 'http://wowowowowo', 'currencyfollowsnear=XYZ', NULL, NULL);
The above function call is intended to fail - there is no domain "http://wowowowowo". Default error handling behavior for xSkrape for SQL is to return error text such as would happen here. This returns a value of "xSkrape: The remote name could not be resolved: 'wowowowowo'". If we change the error handling setting to "Exception", a SQL error ("A .NET Framework error occurred during execution of user-defined routine or aggregate "WebGetSingle"", severity 16) is returned:
SELECT dbo.WebGetSingle('http://localhost', 'http://wowowowowo', 'currencyfollowsnear=XYZ', 'ErrorHandling=Exception', NULL)
Other error handling options for SQL include "NoData" (use the "no data" value) and "ReturnValue" (default).
Related
- WebGetSingle is mentioned in the Using WebGetSingle against an RSS feed example for SQL Server
- WebGetSingle is mentioned in the Pulling a cloud-based Excel file into SQL Server example for SQL Server
- Make a suggestion about this documentation (for SQL Server)