The speed of calculations will be faster on the smaller set of data. What does that mean in practical terms? As one example, if you do any calculations based on the column, the computer hosting the model has to use 25MB to hold the Calculated Column version vs 18KB for the imported version. As a result, even with 2 values, it is almost 25MB in size, or 1,338 times larger than the imported version! The column can be compressed, but not by much, as there are still 100M unique values, which is why it is not quite half the size of the column - roughly 0.86GB vs 1.49GB.īut the column had to resign itself to be compressed at whatever compression algorithm was used by VertiPaq for the imported columns. In this table, the Vertipaq engine only looked at the and columns when optimizing for compression, and there isn’t a lot of compression going on here. Now look at the top table - Calculated Column. At 18KB that is certainly more than 2 text values, but again, this gets deeper in to how VeritiPaq splits the data, and this 200M record table is broken up into 191 segments in Power BI. This means theoritically it could represent this by just two values - “Even” for the first 100M records, and “Odd” for the second 100M records, and that is, in effect, what it did. However, the VertiPaq engine looked at this column and realized it had a low cardinality, only 2, and sorted by that column first. On the bottom table, called “Imported”, the column consumes roughly 18KB. There are only 2 values in the column - “Even” or “Odd.” Note: The scenario gets a bit more complex here as these text values create a dictionary, which is a further optimization I am not going to get into here because the effect is the same on both tables - again, see the Definitive Guide to DAX for more on this. This field only has 2 values.įocus on the Total Size column. 0.5 and 1.0 from the first column are 1 in the second, 1.5 and 2.0 are 2 in the second, etc.Ī third column is a text value of “Even” or “Odd” that is based on the second column. This has 200M unique values.Ī list of numbers that rounds those numbers to whole numbers. I have a 200M record table with three columns:Ī list of numbers from 0 to 100,000,000 in 0.5 increments. It does a thorough job of explaining how the VertiPaq engine works. If you want to know how this works, I suggest you read Chapter 17 in The Definitive Guide to DAX, Second Edition. All calculations are done on the compressed data. Power BI, which includes Power Pivot models and SSAS Tabular models, compresses the data in tables as it is loaded to the service. Note that the data I am going to show is a contrived set of data for demonstration purposes only to explain. I am going to focus on the model size, and thus performance. What I am saying is they should be avoided, and there are several reasons. First off I want to make sure it is clear that I am not saying “never use calculated columns.” I am not.
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |