Function: WebGetDynamicTable

Extract tabular data (rows and columns) from a source reference, using query conditions to identify the characteristics of the data in the event there are multiple tables that could be identified in the source. Results are returned in adjacent cells, relative to the function and the number of rows and columns returned is determined by the size of the source data (i.e. as opposed to a fixed cell range offered by WebGetTable).

Usage

  • Excel
    =WebGetDynamicTable(string url
    , string tableCriteriaQuery
    , bool includeHeaders = false
    , string headerRenaming = ""
    , string additionalConfig = ""
    , object noDataValue = null
    , string tableValueFilter = ""
    , string tableValueSort = "")

Parameters

  • Excel
    • url
      A reference to the source data to be queried. Can be a reference that is HTTP, HTTPS or FILE based. Can optionally be an expression that supports loading data across multiple requests, as described here. Required.
    • tableCriteriaQuery
      A query specification to isolate a specific table from the raw source data. Details about the query syntax are covered here. If there are multiple tables that meet the given criteria, the first table is used. Required.
    • includeHeaders
      If True, table header names are included in the returned data feed. Optional - default behavior is no table headers (just raw contents).
    • headerRenaming
      An optional set of column renaming expressions of the format "OldColumnName becomes NewColumnName", delimited by "&&" if there are multiple. (See examples.) An empty string or omitted implies header names will be as they are in the data source.
    • additionalConfig
      An optional list of configuration parameters that control how the query is issued. Details about the configuration parameters syntax are covered here. Note that there are some configuration parameters specific to WebGetDynamicTable, such as those that let you define where data is placed relative to the cell containing the function. Optional - provide an empty string or omit the parameter if there are no configuration parameters needed (i.e. default behavior).
    • noDataValue
      When no data value is retrieved, this value is returned as the value of function call (i.e. fills the cell containing the function). This value also plays a role depending on your "ErrorHandling" configuration setting. Optional - no value is treated as an empty string.
    • tableValueFilter
      A filter expression that can be applied to the table to help isolate a candidate row or rows of interest. The format for the expression is more or less the same as described here. Optional.
    • tableValueSort
      A string that contains the column name followed by "ASC" (ascending) or "DESC" (descending). Columns are sorted ascending by default. Multiple columns can be separated by commas. Optional.

Return Value(s)

  • Excel
    • Data source values are treated as strings when loaded in Excel (by default).
    • Populates a range of cells. The range size is determined by the size of the data source and by default is placed in the row following the cell containing the function call.
    • No data matching the query expression returns the value specified in the noDataValue parameter for the cell containing the function. (Any pre-existing source data is left in place.)
    • Error values can be returned as well, depending on your "ErrorHandling" configuration setting.

Remarks

This function is typically used to extract tabular data from a data source that either is tabular by nature (e.g. CSV, Excel), or tabular details can be inferred (e.g. HTML).

Data populates a cell range that is sized based on the "shape" of the source data. For example, if the source data has 4 columns and 16 rows and you've elected to return a header row, by default expect to see a header row populate cells on the row following the function, starting with the same column as the function. On the next row and for 15 more rows, source data would populate 4 columns. Options exist to pad additional cells to support the possibility of the source data size decreasing, thereby clearing previously populated cells. The function call being housed in its own cell means status information can be returned into that cell, with source data injected into the worksheet. This means if connectivity is broken to the data source, data from the last refresh will not be lost. This function effectively operates as if data is being entered manually in worksheet cells with automatic calculation suspended until all cells are filled.

The order of the data is determined solely by the order in the source, unless the tableValueSort parameter is used.

Examples

  • Excel

    1. Pull one of several tables from a HTML page, overriding default timeout and throttling settings, renaming columns, and using non-default positioning of data

    =WebGetDynamicTable("http://finance.yahoo.com/q/ae?s=GS+Analyst+Estimates", "columnname=RevenueEst", TRUE, "CurrentQtrSep15 becomes CurrentQtr && NextQtrDec15 becomes NextQtr && CurrentYearDec15 becomes CurrentYr && NextYearDec16 becomes NextYr", "AddinCellOffsetDown=0; AddinCellOffsetRight=1; AddinPadRows=1000; Timeout=200; MaxRequestCount=2000; MaxRequestIntervalSizeSeconds=300; MaxRequestWaitSeconds=120; MinRequestIntervalMilliseconds=10")

    This URL has a number of candidate tables we could get data from. In this example we're picking the one that has a column called "RevenueEst" as shown when using Page Explorer. (In fact, Page Explorer offers this as a suggestion.) Note that the name may not match the exact text on the page - whitespace can be removed. The "TRUE" here indicates we want to include column names as determined from the source, and our cell range selected accounts for this extra row. The next parameter is composed of four column renames where we've effectively stripped out year references. The final parameter overrides the default 90 second request timeout with 200 seconds, and establishes non-default throttling settings which among other things sets the minimum interval between requests to be 10 milliseconds instead of the default of 25. It also changes the default placement of data using "AddinCellOffsetDown=0;AddinCellOffsetRight=1": instead of placing it on the next row, it places it starting in the next column. The "AddinPadRows=1000" configuration parameter changes behavior to fill in an additional 1000 blank rows following the loaded data (versus the default of 200). This is useful if we expect the number of rows in the source data might decrease, thus clearing away defunct data from any prior load.

    2. Pull data from a tab-separated value (TSV) file located on OneDrive

    =WebGetDynamicTable("https://codexenterprisesllc-my.sharepoint.com/personal/sqlheroguy_codexenterprisesllc_onmicrosoft_com/_layouts/15/guestaccess.aspx?guestaccesstoken=ykGxUq9k2htUzjAPlJWCX%2fVPoozyAkPvT70Qk5bpcrc%3d&docid=0a99014156eb240c2be82c206e20e2578", "tableindex=1", FALSE, "", "AddinCellOffsetDown=2")

    This example assumes we've put a tab-separted file on OneDrive and created a sharing link for it. (This is covered in more detail in the Pulling a cloud-based CSV file into Excel example.) The table reference "tableindex=1" is consistent with the fact there's only one table available: formats such as CSV and TSV are tabular by nature. We've also used "AddinCellOffsetDown=2" to place our data two rows down, not the default of one. This allows us to create our own custom headings, while data is filling the range B4:D7 as illustrated here:

    3. Pull data from a paged grid on an HTML page, with pre-sorting and pre-filtering

    =WebGetDynamicTable("{ ""url"":""https://www.codexframework.com/xskrape/sampledata1?page={0}"", ""method"":""step_sequential"", ""stop_on"":""any"", ""arguments"":[{""start"":1, ""step"":1}] }", "tableindex=1", TRUE)

    With the function call illustrated here, we can pull multiple pages of data from a paged grid into a dynamic Excel cell range:

    With WebGetDynamicTable, we're retrieving all available data from the data source, unlike WebGetTable which requries a fixed range size. In order to create a "top 10" report, native Excel functionality won't necessarily give us exactly what we want - but we can use xSkrape functionality as illustrated here:

    =WebGetDynamicTable("{ ""url"":""https://www.codexframework.com/xskrape/sampledata1?page={0}"", ""method"":""step_sequential"", ""stop_on"":""any"", ""arguments"":[{""start"":1, ""step"":1}] }", "tableindex=1", TRUE, "", "AddinMaxDataRows=11", "", "Category='Electronics' OR Category='Clothing'", "Quantity DESC")

    What's changed here is we're using the "AddinMaxDataRows=11" configuration parameter: this limits us to the first ten data rows (plus one header row), after considering any active sorts and/or filter expressions. We're using the non-trivial filter expression "Category='Electronics' OR Category='Clothing'" which yields more than 10 rows of matched data. For sorting, we're using "Quantity DESC" which is giving us the greatest quantities first as the native order of the source data, even before we land it in Excel. The net effect is we're showing the top ten items with the greatest quantity that are either "Electronics" or "Clothing".

    4. Error handling scenarios

    =WebGetDynamicTable("{ ""url"":""https://www.codexframework.com/xskrape/sampledata1?page={0}"", ""method"":""step_sequential"", ""stop_on"":""any"", ""arguments"":[{""start"":1, ""step"":1}] }", "tableindex=1", TRUE, "", "ErrorHandling=ReturnValue", "N/A", "aaa")

    In this example, we've purposely induced an error: the filter expression of "aaa" is not valid (a syntax error, effectively). Default error handling causes the cell holding the function call to show the error message, namely "xSkrape: Cannot find column [aaa]." Other options include using "ErrorHandling=Exception" (cell returns "#VALUE!"), or "ErrorHandling=NoData" (cell returns what was passed in the noDataValue parameter).

Related