Tasks on the Relational Model
This document contains answers to self-study tasks of this text on the Relational Model.
Please, put some effort into those tasks, before you read on.
Roughly, a key is a minimal set of attributes that functionally determines all attributes of a relation schema. Phrased differently, key values need to be unique for distinct tuples, i.e., no two tuples can share the same key value, and the key does not contain “superfluous” attributes. (See this introduction to functional dependencies for a formal definition based on superkeys.)
The primary key is a key selected and designated by the database designer. The primary key can be declared in SQL.
In sample scenarios, the scope or context influences whether a set of attributes is a key or not. For example, when considering students of the University of Münster, the matriculation number can serve as key. However, if students of multiple universities are considered, two students from different universities might share the same matriculation number, indicating that the matriculation number alone is no key. A combination of university name (or university identifier) and matriculation number might be a key then. This compound key of the multi-university setting would not be a key in the single-university setting (as the university identifier would not be needed there, violating subset minimality).
3. Functional Dependencies
3.1. Assert FDs
For the sample relation Account, the following FDs are “natural” ones:
- AccID Date \(\to\) AccID Date CustID Type Balance CustAge
- This FD embodies a key constraint: There is at most one row per Account on each day. Neither AccID nor Date alone determines every other attribute.
- AccID \(\to\) CustID Type
- Each Account is owned by one customer and has a type.
- This FD tells us that CustID and Type are independent of dates. Thus, they do not “belong” into normalized relation schemata whose key contains Date. In fact, relation schema Account violates 3NF. (It also violates 2NF.)
- CustID Date \(\to\) CustAge
- The age of each customer is determined by the date (and can be computed if we know the birthday).
- Note that the first FD in this list implies AccID Date \(\to\) CustAge. Again, Account violates 3NF.
3.2. Valid FDs
An FD is valid if no counter-example exists. Counter-examples are given by tuples that agree on the attributes of the left-hand side but disagree on the right-hand side. In particular, each attribute functionally determines itself (e.g., Date \(\to\) Date cannot have counter-examples).
There are no two tuples that agree on the pair of attributes AccID and Date. Thus, this pair functionally determines every attribute.
There are only two tuples that share the same AccID (which is 1). In both cases, the CustID is 42. Thus, we have: AccID \(\to\) CustID
There are no two tuples that share the same Balance. Thus, Balance functionally determines every attribute (in this sample instance; on the next day, two accounts might share the same balance, showing that such FDs are purely accidental—in contrast to semantic constraints that would hold in all instances).
3.3. Invalid FDs
An FD is violated if a counter-example exists.
The first two tuples agree on CustID but disagree on AccID. (In other words, a customer can own multiple accounts.) Thus, the FD CustID \(\to\) AccID does not hold.
The first and fourth tuple agree on AccID but disagree on Balance. Thus, AccID \(\to\) Balance does not hold.
The first two tuples agree on CustID and Date but disagree on Balance. Thus, CustID Date \(\to\) Balance does not hold.
The sample instance shows that customer 42 was born on January 2. (As the age increases on that day.)
The net worth of transactions on the account with ID 1 on 2015-01-02 was 40. (As the balance at the end of the previous day was 1000.)
For two FDs above, 3NF violations are mentioned.
4. Relational Algebra
Queries such as
pi name, balance ( Customer join sigma balance > 1000000 (Account) )
need to be read inside-out, which is supported by the tree representation of RelaX. Conceptually, execution starts by accessing tuples at the leaf nodes (Customer and Account), which are then processed by operators of inner nodes up to the root node, which represents the overall result. How operations are executed in practice, depends on implementation and optimization details, which are not important here.
It should not be hard to see that the query
sigma balance > 1000000 ( pi name, balance ( Account join Customer ) )
always produces exactly the same result as the previous one. Thus, both expressions are equivalent. (In fact, there are algebraic laws/rules that can be applied to prove such equivalences.) You will learn that the query optimizer of a DBMS comes up with several equivalent execution plans, for which it estimates execution costs before executing the cheapest considered plan. (One rule of thumb is to reduce sizes of intermediate as early as possible. E.g., here, applying the selection based on Balance before computing the join is likely to have lower costs and faster execution times than the other way round. Note “rule of thumb” and “likely” in the previous sentences: If only millionaires exist in the database, the selection does not reduce intermediate sizes at all. Also, while in this case we might expect all tuples to have join partners, in other cases joins can actually decrease intermediate sizes …)