Function: WebGetSingleFromTable

Extract a single value (number, text, etc.) from a source reference, using query conditions that identify both a table and the row(s) and column that contain the specific data value of interest.

Usage

  • Excel (Online)
  • Excel (Desktop)
    =WebGetSingleFromTable(string url
    , string tableCriteriaQuery
    , string tableColumnName
    , string tableValueFilter = ""
    , string tableValueSort = ""
    , string additionalConfig = ""
    , object noDataValue = null)
  • SQL Server
    WebGetSingleFromTable(@serviceUrl nvarchar(1000)
    , @parseUrl nvarchar(MAX)
    , @tableCriteriaQuery nvarchar(MAX)
    , @tableColumnName nvarchar(1000)
    , @tableValueFilter nvarchar(1000)
    , @tableValueSort nvarchar(1000)
    , @additionalConfig nvarchar(2000)
    , @noDataValue nvarchar(MAX))

Parameters

  • Excel
    • Target
      Names the specific cell that will receive the data value, or the target can be a cell range (either 1 row or 1 column wide) that allows multiple data values to be returned, evaluated using each cell of the range. Only applies to Excel "Web" Add-in.
    • url
      A reference to the source data to be queried. Can be a reference that is HTTP, HTTPS or FILE based. Can optionally be an expression that supports loading data across multiple requests (or performs other special parsing), as described here. Required. Excel "Web" settings.
    • tableCriteriaQuery
      A query specification to isolate a specific table from the raw source data. Details about the query syntax are covered here. If there are multiple tables that meet the given criteria, the first table is used. Optional. If this is omitted, uses "tableindex=1" as the default value. Excel "Web" settings.
    • tableColumnName
      Either the resolved column name from the matched table (as show for example in Page Explorer), or the column number (position-based, starting at zero). One can optionally apply one of the following aggregates against the column: SUM, AVG, MIN, MAX, COUNT, FIRST, LAST. (e.g. "SUM(ColumnName)") Required.
    • tableValueFilter
      A filter expression that can be applied to the table to help isolate a candidate row or rows of interest. The format for the expression is more or less the same as described here. Optional. Excel "Web" settings.
    • tableValueSort
      A string that contains the column name followed by "ASC" (ascending) or "DESC" (descending). Columns are sorted ascending by default. Multiple columns can be separated by commas. Optional.
    • additionalConfig
      An optional list of configuration parameters that control how the query is issued. Details about the configuration parameters syntax are covered here. Optional - provide an empty string or omit the parameter if there are no configuration parameters needed (i.e. default behavior).
    • noDataValue
      When no data value is retrieved, this value is returned as a "default." This value also plays a role depending on your "ErrorHandling" configuration setting. Optional - no value is treated as an empty string.
  • SQL Server
    • @serviceUrl
      A URL reference to your SQL-Hero application server services. Typically this would be "http://servername:46837" where "servername" matches the server on which you installed the SQL-Hero services, with port 46837 being the default port number used by the SQL-Hero installer program.
    • @parseUrl
      A reference to the source data to be queried. Can be a reference that is HTTP, HTTPS or FILE based. Can optionally be an expression that supports loading data across multiple requests, as described here.
    • @tableCriteriaQuery
      A query specification to isolate a specific table from the raw source data. Details about the query syntax are covered here. If there are multiple tables that meet the given criteria, the first table is used.
    • @tableColumnName
      Either the resolved column name from the matched table (as show for example in Page Explorer), or the column number (position-based, starting at zero). One can optionally apply one of the following aggregates against the column: SUM, AVG, MIN, MAX, COUNT, FIRST, LAST. (e.g. "SUM(ColumnName)")
    • @tableValueFilter
      A filter expression that can be applied to the table to help isolate a candidate row or rows of interest. The format for the expression is more or less the same as described here. Optional - can pass NULL or an empty string for no filtering.
    • @tableValueSort
      A string that contains the column name followed by "ASC" (ascending) or "DESC" (descending). Columns are sorted ascending by default. Multiple columns can be separated by commas. Optional - can pass NULL or an empty string for default ordering.
    • @additionalConfig
      An optional list of configuration parameters that control how the query is issued. Details about the configuration parameters syntax are covered here. Provide an empty string or NULL if there are no configuration parameters needed (i.e. default behavior).
    • @noDataValue
      When no data value is retrieved, this value is returned as a "default." This value also plays a role depending on your "ErrorHandling" configuration setting. (NULL is a typical value to use for this parameter.)

Return Value(s)

  • Excel
    • Returns a value of an appropriate type. For example, a value that looks like a number will be returned as a numeric (i.e. can be formatted as a number). Populates exactly one cell for the desktop add-in - can populate multiple for "Web" add-in (see how).
    • Values that can't be parsed to a native type are left as strings.
    • No data matching the query expression returns the value specified in the noDataValue parameter (or an empty string if no value is provided).
    • Error values can be returned as well, depending on your "ErrorHandling" configuration setting.
  • SQL Server
    • Returns nvarchar(MAX)
    • Can be cast to other types, if the returned string is castable.
    • No data matching the query expression results in NULL.
    • Error values can be returned as well, depending on your "ErrorHandling" configuration setting.

Remarks

This function can be used to extract a single value from a tabular data source such as can be embedded in a larger HTML page or Excel workbooks - or "native tabular" sources such as comma/tab separated files. As opposed to WebGetSingle, focus is given to identifying the value(s) of interest through "stronger" references such as column names, filter expressions, etc. Also, aggregations can be applied over multiple rows. Use of sorting along with functions such as FIRST and LAST can identify "top 1" values. For the Excel "Web" add-in, functionality described here for the Desktop add-in can be carried over based on the parameter names mapping to the fields made available in the data connection in the add-in task pane.

Examples

  • Excel

    1. Pull data from a HTML page - use a row filter and an aggregate over a paged grid

    =WebGetSingleFromTable("{ url:""https://www.codexframework.com/xskrape/sampledata1?page={0}"", method:""step_sequential"", stop_on:""any"", arguments:[{start:1, step:1}] }", "tableindex=1", "AVG(Quantity)", "Quantity > 400")

    Review the page https://www.codexframework.com/xskrape/sampledata1. With this example we're doing several things: a) using a URL expression to load multiple pages of data, stopping when no more data is available, b) accessing the first table on the page (there is only one), c) accessing the Quantity column, but also getting an average over available quantity values, d) filtering to quantities greater than 400 - these are the ones that are averaged using the AVG function.

    2. Pull data from a CSV data feed - use a sort and FIRST

    =WebGetSingleFromTable("http://earthquake.usgs.gov/fdsnws/event/1/query?starttime=2015-06-18&minmagnitude=4.0&format=csv", "tableindex=1", "first(mag)", "", "mag DESC")

    The above function call is logically equivalent to this call:

    =WebGetSingleFromTable("http://earthquake.usgs.gov/fdsnws/event/1/query?starttime=2015-06-18&minmagnitude=4.0&format=csv", "tableindex=1", "max(mag)")

    In the first case, we order the in-memory copy of the retrieved table by the value of the "mag" column, in descending order, thereby making the first row hold the greatest value. We then pick the first row's "mag" value using "first(mag)". ("First" is technically optional here, as it's the default operation if no aggregate is specified.) In the second case, we simply get the maximum "mag" value using the "max" aggregator. (There are other, more complex scenarios where sorting isn't equivalent to using aggregators such as MAX or MIN, such as for example sorting by a date and then picking a different column's value.)

    3. Isolate one of possibly many HTML tables, filter to a specific row, uncached and with a long timeout

    =WebGetSingleFromTable("http://finance.yahoo.com/q/ae?s=BDSI+Analyst+Estimates", "musthaverowfilter=RevenueEst='Avg. Estimate'", "3", "RevenueEst LIKE 'Sales Growth%'", "", "Caching=False;Timeout=200")

    In this example we're pulling from a page that has multiple HTML tables with data. We're locating a specific table of interest by using "musthaverowfilter". This accepts a row filter expression that's applied against each table in the source. Normally you'd try to identify a single table that meets the criteria - and multiple criteria can be combined by concatenating them with "&&" symbols. However, if mutiple tables do end up matching the criteria, the first table is considered the final data source. The "3" refers to the fourth column of the matched table (zero based). "RevenueEst LIKE 'Sales Growth%'" takes the matched table and applies a row filter to identify a subset of data to consider. In this case, we're able to isolate a specific row of interest. "Caching=False;Timeout=200" means this request must be satisfied from the remote data source (no local caching is used), and the remote connection will timeout after 200 seconds (default is 90 seconds).

    4. Error handling scenarios

    =WebGetSingleFromTable("http://finance.yahoo.com/q/ae?s=GOOG+Analyst+Estimates", "musthaverowfilter=RevenueEst='Avg. Estimate'", "wowowow", "RevenueEst LIKE 'Sales Growth%'")

    The above function call is intended to fail - there is no column named "wowowow" in our matched table. Default error handling behavior for xSkrape for Excel is to rethrow exceptions such as would happen here. This results in getting a "#VALUE!" message in the target cell. If we change the error handling setting to "ReturnValue", error details are returned within the target cell:

    =WebGetSingleFromTable("http://finance.yahoo.com/q/ae?s=GOOG+Analyst+Estimates", "musthaverowfilter=RevenueEst='Avg. Estimate'", "wowowow", "RevenueEst LIKE 'Sales Growth%'", "", "ErrorHandling=ReturnValue")

    With the above adjustment, we see "xSkrape: wowowow is neither a DataColumn nor a DataRelation for table Table3." in the cell containing the function call. xSkrape errors are preceded by "xSkrape:" as one way to identify a failure. Other error handling options for Excel include "NoData" (use the "no data" value) and "Exception" (default).

  • SQL Server

    1. Pull data from a HTML page - use a row filter and an aggregate over a paged grid

    SELECT dbo.WebGetSingleFromTable('http://localhost', '{ url:"https://www.codexframework.com/xskrape/sampledata1?page={0}", method:"step_sequential", stop_on:"any", arguments:[{start:1, step:1}] }', 'tableindex=1' , 'AVG(Quantity)', 'Quantity > 400', NULL, NULL, NULL);

    Review the page https://www.codexframework.com/xskrape/sampledata1. With this example we're doing several things: a) using a URL expression to load multiple pages of data, stopping when no more data is available, b) accessing the first table on the page (there is only one), c) accessing the Quantity column, but also getting an average over available quantity values, d) filtering to quantities greater than 400 - these are the ones that are averaged using the AVG function.

    2. Pull data from a CSV data feed - use a sort and FIRST

    SELECT dbo.WebGetSingleFromTable('http://localhost', 'http://earthquake.usgs.gov/fdsnws/event/1/query?starttime=2015-06-18&minmagnitude=4.0&format=csv', 'tableindex=1', 'first(mag)' , NULL, 'mag DESC', NULL, NULL)

    The above function call is logically equivalent to this call:

    SELECT dbo.WebGetSingleFromTable('http://localhost', 'http://earthquake.usgs.gov/fdsnws/event/1/query?starttime=2015-06-18&minmagnitude=4.0&format=csv', 'tableindex=1', 'max(mag)' , NULL, NULL, NULL, NULL)

    In the first case, we order the in-memory copy of the retrieved table by the value of the "mag" column, in descending order, thereby making the first row hold the greatest value. We then pick the first row's "mag" value using "first(mag)". ("First" is technically optional here, as it's the default operation if no aggregate is specified.) In the second case, we simply get the maximum "mag" value using the "max" aggregator. (There are other, more complex scenarios where sorting isn't equivalent to using aggregators such as MAX or MIN, such as for example sorting by a date and then picking a different column's value.)

    3. Isolate one of possibly many HTML tables, filter to a specific row, uncached and with a long timeout

    SELECT dbo.WebGetSingleFromTable('http://localhost', 'http://finance.yahoo.com/q/ae?s=GOOG+Analyst+Estimates', 'musthaverowfilter=RevenueEst=''Avg. Estimate''', '3', 'RevenueEst LIKE ''Sales Growth%''', NULL, 'Caching=False;Timeout=200', NULL)

    In this example we're pulling from a page that has multiple HTML tables with data. We're locating a specific table of interest by using "musthaverowfilter". This accepts a row filter expression that's applied against each table in the source. Normally you'd try to identify a single table that meets the criteria - and multiple criteria can be combined by concatenating them with "&&" symbols. However, if mutiple tables do end up matching the criteria, the first table is considered the final data source. The "3" refers to the fourth column of the matched table (zero based). "RevenueEst LIKE 'Sales Growth%'" takes the matched table and applies a row filter to identify a subset of data to consider. In this case, we're able to isolate a specific row of interest. "Caching=False;Timeout=200" means this request must be satisfied from the remote data source (no local caching is used), and the remote connection will timeout after 200 seconds (default is 90 seconds).

    4. Error handling scenarios

    SELECT dbo.WebGetSingleFromTable('http://localhost', 'http://finance.yahoo.com/q/ae?s=GOOG+Analyst+Estimates', 'musthaverowfilter=RevenueEst=''Avg. Estimate''', 'wowowow', 'RevenueEst LIKE ''Sales Growth%''', NULL, NULL, NULL)

    The above function call is intended to fail - there is no column named "wowowow" in our matched table. Default error handling behavior for xSkrape for SQL is to return error text such as would happen here. This results in seeing a value of "xSkrape: wowowow is neither a DataColumn nor a DataRelation for table Table6.". If we change the error handling setting to "Exception", a hard SQL error is raised instead (severity 16, "A .NET Framework error occurred during execution of user-defined routine or aggregate "WebGetSingleFromTable"").

    SELECT dbo.WebGetSingleFromTable('http://localhost', 'http://finance.yahoo.com/q/ae?s=GOOG+Analyst+Estimates', 'musthaverowfilter=RevenueEst=''Avg. Estimate''', 'wowowow', 'RevenueEst LIKE ''Sales Growth%''', NULL, 'ErrorHandling=Exception', NULL)

    Other error handling options for SQL include "NoData" (use the "no data" value) and "ReturnData" (default).

Related