Data structure
The best ways to organise data were first put forward in the early 1970s and have changed remarkably little since. The use of data analysis, and the resulting data tables, form the start point for every good database system. There is a standard way of splitting the data into separate tables, which is known as normalisation.
Excel, however, has always been different. The start point is an empty grid, and the user can put anything anywhere - there is nothing to lead the user towards any approach rather than any other. What can happen as a result is that data becomes scattered throughout a workbook and is mixed in with the analysis which is the purpose of the workbook. Such workbooks can become very difficult to extend and maintain.
However the advantages of good data design apply just as much to Excel as to any database system:
- Separating the data from its subsequent analysis ensures that the data is easy to maintain.
- Organising the data in separate (normalised) tables ensures that data is not duplicated.
- Many of the features of Excel are designed to work with data in tables.
To give a simple example of how this would work: imagine a table of invoices, and each invoice is associated with a customer. Customers have codes, also names, addresses and other information. One could include all this information on the invoice table, but doing so would result in a great deal of duplication, as one would need to repeat the address etc. on every invoice for the same customer, and there would also be the risk of error in doing it this way. So instead one would just include the customer code on the invoice, then set up a separate customer table with the other customer details.
The end point of this process is data in a number of different tables. Possibly the most significant improvement to Excel was the introduction of support for Tables in Excel 2007, which among other features, for the first time allowed the data to be referenced by its name rather than its address. However there were still limitations - for example the traditional Pivot Table is designed to work from a single Excel table. And although it is easy to set up a formula to lookup a table based on one column, looking up two columns requires a slightly awkward array formula which can be slow with large tables.
Subsequent versions of Excel have improved on this, for example with the introduction of Power Pivot and Power Query, now known as Get and Transform. However these do not always fit easily into existing workbooks or the types of analysis which the user wants to carry out. And to use Get and Transform for anything beyond what is provided in the user interface, the user has to learn the M language which is unlike anything else in Excel.
Data Manipulation Tools
We use a range of tools which simplify the processing of data in tables. These can be set up either so the user can easily maintain the logic, or can be entirely VBA based, depending on which is best in any situation. Where the logic is set up to be user-maintainable, there is a worksheet where the logic steps are laid out in a table, and it is possible to view the results at every step. On the other hand, an entirely VBA-based approach may be more suitable when working with an existing workbook where the data needs to be extracted from an existing design which is not tables-based. Temporary, virtual tables can be created and manipulated in a succession of steps to produce the final desired result. The facilities provided include:
- Grouping and creation of totals, subtotals and other types of summary.
- Pivot and unpivot.
- Selection and filtering.
- Match and join operations, to bring in data from other tables.
- Sorting.
- Cross product, which creates all combinations of two tables and is useful for evaluating multiple instances of the data.
- Calculation using standard Excel formulae (unlike Get and Transform which requires the user to learn a completely different formula language).
There is a wide variety of output options. The results can be written back to the workbook as a table, a simple list or fed into a pivot table. Alternatively a lookup function is provided so individual results can be placed in any cell.
Use of SQL
In almost every database system, the main way data is handled is through SQL. And with good reason - almost any transformation of the data can be carried out in a single step using a single line of SQL code. But although it is possible to use SQL with Excel, it is not straightforward. We also provide a tool to set up the necessary infrastructure so that SQL can be used easily in a variety of ways:
- At the simplest, the result of a SQL query can be returned as another Table (or simple list output) in the workbook.
- We also provide a UDF (user-defined function) which returns the result of a SQL query. This makes for a simple but very powerful way of producing summaries of the data, and by paramaterising the SQL, such summaries can be made to respond to user selections.
- The output from a SQL query can form the data source for a pivot table.
- Where the output requirements are more complex, the SQL query provides a start point for further analysis using the virtual table approach described above.
Also, sometimes Excel tables are not adequate for the storage of data and a true database is needed. Please see Database Options for our approach to this.