Function: WebGetSingle

Extract a single value (number, text, etc.) from a source reference, using query conditions that identify the value of interest.

Usage

  • Excel (Online)
  • Excel (Desktop)
    =WebGetSingle(string url
    , string query
    , string additionalConfig = ""
    , object noDataValue = null)
  • SQL Server
    WebGetSingle(@serviceUrl nvarchar(1000)
    , @parseUrl nvarchar(MAX)
    , @query nvarchar(MAX)
    , @additionalConfig nvarchar(2000)
    , @noDataValue nvarchar(MAX))

Parameters

  • Excel
    • Target
      Names the specific cell that will receive the data value, or the target can be a cell range (either 1 row or 1 column wide) that allows multiple data values to be returned, evaluated using each cell of the range. Only applies to Excel "Web" Add-in.
    • url
      A reference to the source data to be queried. Can be a reference that is HTTP, HTTPS or FILE based. Can optionally be an expression that supports loading data across multiple requests (or performs other special parsing), as described here. Required. Excel "Web" settings.
    • query
      A query specification to isolate the specific data element from the raw source data. Details about the query syntax are covered here. Required. Excel "Web" settings.
    • additionalConfig
      An optional list of configuration parameters that control how the query is issued. Details about the configuration parameters syntax are covered here. Optional - provide an empty string or omit the parameter if there are no configuration parameters needed (i.e. default behavior).
    • noDataValue
      When no data value is retrieved, this value is returned as a "default." This value also plays a role depending on your "ErrorHandling" configuration setting. Optional - no value is treated as an empty string.
  • SQL Server
    • @serviceUrl
      A URL reference to your SQL-Hero application server services. Typically this would be "http://servername:46837" where "servername" matches the server on which you installed the SQL-Hero services, with port 46837 being the default port number used by the SQL-Hero installer program.
    • @parseUrl
      A reference to the source data to be queried. Can be a reference that is HTTP, HTTPS or FILE based.
    • @query
      A query specification to isolate the specific data element from the raw source data. Details about the query syntax are covered here.
    • @additionalConfig
      An optional list of configuration parameters that control how the query is issued. Details about the configuration parameters syntax are covered here. Provide an empty string or NULL if there are no configuration parameters needed (i.e. default behavior).
    • @noDataValue
      When no data value is retrieved, this value is returned as a "default." This value also plays a role depending on your "ErrorHandling" configuration setting. (NULL is a typical value to use for this parameter.)

Return Value(s)

  • Excel
    • Returns a value of an appropriate type. For example, a value that looks like a number will be returned as a numeric (i.e. can be formatted as a number). Populates exactly one cell for the desktop add-in - can populate multiple for "Web" add-in (see how).
    • Values that can't be parsed to a native type are left as strings.
    • No data matching the query expression returns the value of the noDataValue parameter (if none provided, returns an empty string).
    • Error values can be returned as well, depending on your "ErrorHandling" configuration setting.
  • SQL Server
    • Returns nvarchar(MAX)
    • Can be cast to other types, if the returned string is castable.
    • No data matching the query expression returns the value of the @noDataValue parameter.
    • Error values can be returned as well, depending on your "ErrorHandling" configuration setting.

Remarks

This function is typically used to extract a single value from a non-tabular data source such as HTML, JSON or XML. For tabular sources like CSV files, etc. - one would typically use functions such as WebGetTable or WebGetSingleFromTable. Some pre-parsing of the return value can be done depending on the query expression used. For the Excel "Web" add-in, functionality described here for the Desktop add-in can be carried over based on the parameter names mapping to the fields made available in the data connection in the add-in task pane.

Examples

  • Excel

    1. Pull data from a HTML page - based on related text

    =WebGetSingle("http://www.marketwatch.com/investing/stock/MSFT", "currencyfollowsnear=Rev\.\ per\ Employee")

    Opening the http://www.marketwatch.com/investing/stock/MSFT page, notice there is the text "Rev. per Employee", followed closely by an amount in dollars. With this example, we're extracting the dollar value using the "currencyfollowsnear" query which works here since the underlying HTML looks something like this:

    <div class="section">
        <p class="column">Rev. per Employee</p>
        <p class="data lastcolumn">$737,992</p>
    </div>

    2. Pull data from a HTML page - based on specific path to the data

    =WebGetSingle("http://www.marketwatch.com/investing/stock/MSFT/financials", "xpath=/span[1]/html[1]/body[1]/tr[5]/td[6]/text()")

    The use of "xpath" assumes a fixed page location. In the example above, it's the first span element, containing the first html element, containig the first body element, containing the fifth tr element (table row), containing the sixth td element (table column). (If this seems arcane to determine: Page Explorer helps a lot to determine a valid xpath expression.) This happens to match a specific data element that on 7/15/15 is "59.8B". As that value changes, the xpath query will return the current value: however, if the page structure changes - for example, the table is restructured to have additional preceding rows - then the query would not return the expected data (if any).

    3. Pull and parse a dollar value with a "suffix" from a HTML page

    =WebGetSingle("http://finance.yahoo.com/q/ks?s=GS+Key+Statistics", "numberwithsuffixfollowsnear=yfs_j10_")

    In this example, we could have text such as "92.00B" which is reflective of "92 billion", and this would be parsed as "9200000000" when returned to Excel.

    4. Pull data from a HTML page, uncached and with long timeout

    =WebGetSingle("http://www.marketwatch.com/investing/stock/MSFT", "currencyfollowsnear=Rev\.\ per\ Employee", "Caching=False;Timeout=200")

    By default, data returned per URL is cached for later use (by default for 15 minutes, up to 50 pages). Using "Caching=False", the request here will not use potentially cached data. In addition, "Timeout=200" increases the timeout for the request to 200 seconds (default is 90 seconds).

    5. Adjusting throttling settings

    =WebGetSingle("http://www.marketwatch.com/investing/stock/MSFT", "currencyfollowsnear=Rev\.\ per\ Employee", "MaxRequestCount=2000;MaxRequestIntervalSizeSeconds=300;MaxRequestWaitSeconds=120;MinRequestIntervalMilliseconds=10")

    With the above configuration settings, we allow up to 2000 page requests per 300 second interval. Furthermore, we will wait up to 120 seconds if we do exceed the maximum, assuming such a wait could result in falling below the maximum threshold. Additionally, we won't issue roughly more than 100 requests per second since we require a minimum interval between request starts of 10 milliseconds.

    6. Pull data from a text file on a local drive and extract a value based on a regular expression

    =WebGetSingle("file://c:\temp\sample.txt", "removeregex=(.|\n)+?PROD:\s*484AB && regex=(?<=LOC:\s*)[A-Z0-9-]+")

    This example assumes you have a file called sample.txt in your c:\temp directory. Here's an example of what it might contain:

    PROD: 091AB              TIB: 01/07/2016
    LOC: 391-58A             POD: 01911
    -----------------------------------------
    PROD: 484AB              TIB: 06/17/2016
    LOC: 391-59A             POD: 01919
    -----------------------------------------
    PROD: 810CB              TIB: 04/09/2016
    LOC: 391-60B             POD: 01833

    Say we wanted to extract the code "391-59A" which is associated with the "LOC" value for the "PROD" value of "484AB". We achieve that by using "removeregex" to strip text that precedes our PROD code of interest. This allows the latter "regex" query to find the first match of the "LOC" field. Our "LOC" regular expression matches the alphanumeric portion that follows the text "LOC:", not including the "LOC:" text (since we used a look-behind match for that).

    7. Error handling scenarios

    =WebGetSingle("http://wowowowowo", "currencyfollowsnear=XYZ")

    The above function call is intended to fail - there is no domain "http://wowowowowo". Default error handling behavior for xSkrape for Excel is to rethrow exceptions such as would happen here. This results in getting a "#VALUE!" message in the target cell. If we change the error handling setting to "ReturnValue", error details are returned within the target cell:

    =WebGetSingle("http://wowowowowo", "currencyfollowsnear=XYZ", "ErrorHandling=ReturnValue")

    With the above adjustment, we see "xSkrape: The remote name could not be resolved: 'wowowowowo'" in the cell containing the function call. xSkrape errors are preceded by "xSkrape:" as one way to identify a failure. Other error handling options for Excel include "NoData" (use the "no data" value) and "Exception" (default).

  • SQL Server

    1. Pull data from a HTML page - based on related text

    SELECT dbo.WebGetSingle('http://localhost', 'http://www.marketwatch.com/investing/stock/MSFT', 'currencyfollowsnear=Rev\.\ per\ Employee', NULL, NULL);

    Opening the http://www.marketwatch.com/investing/stock/MSFT page, notice there is the text "Rev. per Employee", followed closely by an amount in dollars. With this example, we're extracting the dollar value using the "currencyfollowsnear" query which works here since the underlying HTML looks something like this:

    <div class="section">
        <p class="column">Rev. per Employee</p>
        <p class="data lastcolumn">$737,992</p>
    </div>

    2. Pull data from a HTML page - based on specific path to the data

    SELECT dbo.WebGetSingle('http://localhost', 'http://www.marketwatch.com/investing/stock/MSFT/financials', 'xpath=/span[1]/html[1]/body[1]/tr[5]/td[6]/text()', NULL, NULL);

    The use of "xpath" assumes a fixed page location. In the example above, it's the first span element, containing the first html element, containig the first body element, containing the fifth tr element (table row), containing the sixth td element (table column). (If this seems arcane to determine: Page Explorer helps a lot to determine a valid xpath expression.) This happens to match a specific data element that on 7/15/15 is "59.8B". As that value changes, the xpath query will return the current value: however, if the page structure changes - for example, the table is restructured to have additional preceding rows - then the query would not return the expected data (if any).

    3. Pull and parse a dollar value with a "suffix" from a HTML page

    SELECT dbo.WebGetSingle('http://localhost', 'http://finance.yahoo.com/q/ks?s=GS+Key+Statistics', 'numberwithsuffixfollowsnear=yfs_j10_', NULL, NULL);

    In this example, we could have text such as "92.00B" which is reflective of "92 billion", and this would be parsed as "9200000000" when returned to Excel.

    4. Pull data from a HTML page, uncached and with long timeout

    SELECT dbo.WebGetSingle('http://localhost', 'http://www.marketwatch.com/investing/stock/MSFT', 'currencyfollowsnear=Rev\.\ per\ Employee', 'Caching=False;Timeout=200', NULL);

    By default, data returned per URL is cached for later use (by default for 15 minutes, up to 50 pages). Using "Caching=False", the request here will not use potentially cached data. In addition, "Timeout=200" increases the timeout for the request to 200 seconds (default is 90 seconds).

    5. Adjusting throttling settings

    SELECT dbo.WebGetSingle('http://localhost', 'http://www.marketwatch.com/investing/stock/MSFT', 'currencyfollowsnear=Rev\.\ per\ Employee', 'MaxRequestCount=2000;MaxRequestIntervalSizeSeconds=300;MaxRequestWaitSeconds=120;MinRequestIntervalMilliseconds=10', NULL);

    With the above configuration settings, we allow up to 2000 page requests per 300 second interval. Furthermore, we will wait up to 120 seconds if we do exceed the maximum, assuming such a wait could result in falling below the maximum threshold. Additionally, we won't issue roughly more than 100 requests per second since we require a minimum interval between request starts of 10 milliseconds.

    6. Pull data from a text file on a local drive and extract a value based on a regular expression

    SELECT dbo.WebGetSingle('http://localhost', 'file://c:\temp\sample.txt', 'removeregex=(.|\n)+?PROD:\s*484AB && regex=(?<=LOC:\s*)[A-Z0-9-]+', NULL, NULL);

    This example assumes you have a file called sample.txt in your c:\temp directory and that your SQL-Hero services have permission to access it. Here's an example of what it might contain:

    PROD: 091AB              TIB: 01/07/2016
    LOC: 391-58A             POD: 01911
    -----------------------------------------
    PROD: 484AB              TIB: 06/17/2016
    LOC: 391-59A             POD: 01919
    -----------------------------------------
    PROD: 810CB              TIB: 04/09/2016
    LOC: 391-60B             POD: 01833

    Say we wanted to extract the code "391-59A" which is associated with the "LOC" value for the "PROD" value of "484AB". We achieve that by using "removeregex" to strip text that precedes our PROD code of interest. This allows the latter "regex" query to find the first match of the "LOC" field. Our "LOC" regular expression matches the alphanumeric portion that follows the text "LOC:", not including the "LOC:" text (since we used a look-behind match for that).

    7. Error handling scenarios

    SELECT dbo.WebGetSingle('http://localhost', 'http://wowowowowo', 'currencyfollowsnear=XYZ', NULL, NULL);

    The above function call is intended to fail - there is no domain "http://wowowowowo". Default error handling behavior for xSkrape for SQL is to return error text such as would happen here. This returns a value of "xSkrape: The remote name could not be resolved: 'wowowowowo'". If we change the error handling setting to "Exception", a SQL error ("A .NET Framework error occurred during execution of user-defined routine or aggregate "WebGetSingle"", severity 16) is returned:

    SELECT dbo.WebGetSingle('http://localhost', 'http://wowowowowo', 'currencyfollowsnear=XYZ', 'ErrorHandling=Exception', NULL)

    Other error handling options for SQL include "NoData" (use the "no data" value) and "ReturnValue" (default).

Related