Saturday, February 22, 2020

Data Mining


Data Mining

Learning Outcomes:
Be aware of the uses and applications of data mining in daily life and explain how data mining can improve the quality of living.

Relevant past paper:
DSE ICT Elec A(SP-2016): PP 4d. 2013 4f. 2015 4c. 2016 4eii.


Questions on data mining focus on application, rather than the techniques of data mining.
The focus is on:
  • Purpose of data mining
  • What data you are going to collect
  • How you can use the generated information


Purpose

Data mining helps to discover the hidden patterns and trends. This knowledge allows prediction of future trends and behaviours. An organisation can use this information to make appropriate decision.

What data to collect

Depending on what you want to know.
For example, if you want to promote sales, you want to present relevant items to the buyers when they are making purchases on the web. You may see other products as “Related to items you viewed”. So you may want to collect data on buying habits and what items are usually bought together in order to achieve that.

Generated information

With further analysis, you can now group different people into groups with different behaviours. You can present different products to the groups based on their behaviours.



References:

Tuesday, February 18, 2020

Database application development lifecycle

Database application development lifecycle


Learning Outcomes:
  • Understand the stages involved in the database application development lifecycle and the links between them.
    • The main stages involved in the database application development lifecycle include requirements collection and analysis, system definition, application and database design, DBMS selection, form and report design, prototyping, implementation, data migration, conversion and loading, testing, and operational maintenance.
    • Students should recognize that the stages of the database application development lifecycle often involve review and repetition of previous stages through feedback loops.
  • Practise the main activities associated with the stages of the database application development lifecycle in an educational or commercial scenario.
    • Students should be able to apply simple Computer Aided Software Engineering (CASE) tools to automate, manage and simplify the database design process. CASE supports may include tools for data dictionary construction, and those for facilitating data models development, etc.

Relevant past paper:
DSE Elec A(SP-2016): 2012 4a. 2013 3d. 2014 3bc. 2015 4ab. 2016 4bc.

Stages involved in the database application development lifecycle:

Variations exist for what should be included in the lifecycle. This is a general guide.
Components
Deliverables i.e. Products of the development process / Details of the stages
System definition
5Ws (What, Why, Who, When, Where)
Feasibility report, project plan
Requirements collection and analysis
Collect information from potential users
e.g. questionnaires, interview
à know what data to collect for the DBMS, what functions are expected
Application and database design
DBMS selection (database)
Form and report design (application)
Conceptual, logical, physical design…
Conceptual – ER diagrams
Logical – normalisation of tables
Physical - hardwares, physical data design
Prototyping
Implementation
An early model is presented to users for feedback
Errors can be detected earlier and resolved by going through previous stages
Data migration, conversion and loading
Format of original data may not be suitable for use à need conversion to usable format
Ensure data consistency
Testing
Test plan with different test cases (create pseudo-data)
Use of real data after data conversion
Operational maintenance
System put into use
Documentation: system and user manuals (for record and future maintenance)
Backup, monitor for errors, etc.
The lifecycle often involves review and repetition of previous stages through feedback loops.

Computer Aided Software Engineering (CASE) tools

These tools are software tools that assist the development of a database application in different stages of the development lifecycle. A long list of CASE tools exists.
CASE tools have a central repository, function as a data dictionary, source of common information and storage of product specifications, documents and diagrams. Different CASE tools can obtain essential information from the central repository.

Some examples:
Stages
CASE tools
Deliverables from CASE tools
Definition/Planning
Project management tool
Project schedule
Application and database design
Graphical modelling tool
ER diagram
implementation
Query editor
Query command
Testing/maintenance
Query analysis tool
Database performance


References:


Saturday, February 15, 2020

Normalisation

Normalisation


Learning Outcomes:

  • Explain the concepts of data redundancy and discuss the methods or measures used to reduce data redundancy.
    • Students should be able to reduce data redundancy through normalization up to Third Normal Form (3NF).

Relevant past paper:
DSE ICT Elec A(SP-2016): SP 4c. PP 4c. 2012 3c, 4c. 2013 4c. 2014 2f. 2016 3b,dii.
AS CA(2000-2013): 2008 5b. 2009 2bc. 2011 3b. 2013 5a.
AL CS P2: 2004 1b.


To tackle the problem of data redundancy, we can perform normalisation.

First normal form (1NF)

Rules: 1. no multi-valued attribute/no repeating element
2. Has a primary key

StudentSubjects1
StuID(key)
FirstName
LastName
Subjects
001
David
Chan
ICT, Physics
The Subjects field is a multi-valued attribute as it allows multiple values in the attribute.

To convert it into 1NF
StudentSubjects2
StuID(key)
Subjects(key)
FirstName
LastName
001
ICT
David
Chan
001
Physics
David
Chan
(StuID+Subjects as a composite key)

Functional dependency

Before going into 2NF and 3NF, you need to learn the concept of functional dependency.
Club
ClubID(key)
ClubName
1001
Running Club
1002
Swimming Club
ClubID determines exactly one value of ClubName. Therefore, ClubName is said to be functionally dependent on ClubID.
i.e. ClubID 1001 will give you the ClubName Running Club, not Swimming Club.



Second normal form (2NF)

Rules: 1. in 1NF
2. No partial functional dependency between primary key fields and non-key fields
Partial functional dependency means some non-key fields depend on part of the key (instead of the primary key as a whole). (i.e.: Partial dependency can only happen when there is a composite key)
In the table StudentSubjects2, FirstName and LastName are functionally dependent on StuID, which is only a part of the composite key StuID+Subjects.

To convert it into 2NF
StudentSubjects
StuID(key)
Subjects(key)
001
ICT
001
Physics
Students
StuID(key)
FirstName
LastName
001
David
Chan



Third normal form (3NF)

Rules: 1. in 2NF
2. No transitive functional dependency between primary key field(s) and non-key fields
Transitive functional dependency happens when non-key field depends on another non-key field (instead of the primary key field).

ClubLeader
ClubID(key)
StuID
FirstName
LastName
1001
001
David
Chan
StuID is functionally dependent on ClubID.
Yet, the FirstName and LastName are functionally dependent on StuID, but not ClubID.

To convert it into 3NF
ClubLeader
ClubID(key)
StuID
1001
001
Students
StuID(key)
FirstName
LastName
001
David
Chan


Advantage of normalisation: reduce data redundancy
Disadvantage of normalisation: the table(s) are decomposed into smaller tables. Multiple join operation may be needed to retrieve data, which can reduce database performance.

Question:

Is the table in 2NF? (Give the reason(rule), and state an example using the table provided)

Yes – because it is in 1NF and there is no partial dependency. All attributes fully depend on whole key______(write down the primary key).

No – because there is partial dependency. ____(an attribute) depends on part of the key ___(part of a composite key).

No – the table is not in 1NF. (If you have already worked this out in other question)

Thursday, February 13, 2020

Data Redundancy

Data Redundancy

Learning Outcomes:
Explain the concepts of data redundancy and discuss methods or measures used to reduce data redundancy.

Relevant past paper:
DSE ICT Elec A(SP-2016): SP 2a. PP 2a, 4ai. 2012 3a, 4b. 2014 4aii. 2015 2di. 2016 2a.
AS CA(2000-2013): 2000 1a. 2002 4bi. 2005 6c. 2006 9g. 2012 9aii.
AL CS P2: 2004 1a. 2005 1.


A poorly designed database can lead to data redundancy.

Borrow
BookID
StuID
StuName
001
123
John
002
123
John
003
245
Peter

Suppose this is a record of student borrowing books. The attribute StuName is not necessary here because StuName depends on StuID. (This is known as functional dependency and will be covered in normalisation)

Problems occur when a student borrows more than 1 book. (more than 1 entry with StuID 123). StuName John is repeatedly stored.

The expected problems include:
  • data redundancy: repeated data unnecessarily inserted.
  • data inconsistency: when the repeated (redundant) data are not updated at the same time.
  • storage wastage: the unnecessarily repeated data take up extra storage.
  • overheads on database update: more fields, which are redundant, need to be updated when they are changed à affect database performance

 (These problems may also be referred to insertion anomaly, deletion anomaly and modification anomaly)

Derived attribute

Derived attribute is an attribute that is derived from other field(s).

Test
Test1
Test2
TestTotal




The TestTotal is a derived attribute because its content can be calculated from other fields (Test1 and Test2). In this sense, it is also redundant.

Sometimes, the derived attribute may still be included in a table.
  • Advantage: more convenient to access the data.
  • Disadvantage: refer to the 4 points in data redundancy.


Saturday, February 8, 2020

ER diagrams


ER diagrams


Learning Outcomes:
  • Be aware of the different types of relationships among entities in a relational database.
  • Analyse simple scenarios in business, education or other fields and create simple ER diagrams involving binary relationship only in designing databases.
    • The resolution of many-to-many relationship into multiple one-to-many relationships should also be introduced.
  • Transform the ER diagrams to tables in relational databases.

Simple ER diagrams:

Cardinality constraint

  • One-to-one relationship 1:1 : Each A has a B.
  • One-to-many relationship 1:M : Each A can have more than 1 B.
  • Many-to-many relationship M:N : Each A can have more than 1 B. Each B can have more than 1 A.

*Participation constraint

  • Optional participation:
    Question to ask: All A must have a B? No = optional participation

  • Mandatory participation:
    Question to ask: All A must have a B? Yes = mandatory participation


Tips: 
An entity present in your table could be due to their action. E.g. Customers(A) buy products(B).
Customers must have bought some products to be included in your table. Therefore, all customers must be related to certain products. i.e. All A must have at least 1 B. à mandatory

Does it mean all products must be related to customers? Not necessarily. Some products in the shop maybe so outdated that no one wants to buy. i.e. Not all B has an A. à optional

Relevant past paper:
DSE ICT Elect A(SP-2016): SP 2b,d. PP 2b. 2012 3d. 2013 4b. 2014 3a. 2015 2b. 2016 4d.
AS CA(2000-2013): 2007 6. 2008 8f. 2011 10a. 2012 2a.



Resolution of many-to-many relationship

By adding 1 more entity, a M:N relationship can be resolved into 2 1:M relationships.

Relevant past paper:
AS CA(2000-2013): 2009 4. 2012 2b. 2013 2.

Transform the ER diagrams to tables in relational databases


Relevant past paper:
DSE ICT Elect A(SP-2016): SP 2c. PP 2c.

Monday, February 3, 2020

SQL part 6 – Interpretation


SQL part 6 – Interpretation
 

  • In some questions, you are required to explain the purpose of a SQL statement. These should be answered in the context of the scenario.


Relevant past paper:
DSE ICT Elect A(SP-2016): SP 1e. 2012 1cii,2c,4d. 2013 1ei. 2014 2e. 2015 1e. 2016 2bi.
AS CA(2000-2013): 2004 1d. 2009 8f. 2010 8e. 2011 7a. 2012 1c. 2013 9eiii.




  • Some questions require you to make changes to the table. For examples, adding fields, decomposing tables, etc. Need to specify the data type if new fields are created.


Relevant past paper:
DSE ICT Elect A(SP-2016): SP 2e. 2012 3bii. 2014 1c. 2016 1f.
AS CA(2000-2013): 2002 4bii. 2005 6b. 2005 8g.




  • You may be required to write the output of a SQL statement.


Relevant past paper:
AS CA(2000-2013): 2010 5ai.
AL CS(2003-2006): 2004 2. 2005 8a.

Syllabus comparison

 Syllabus comparison   DSE ICT 2025 New syllabus DSE ICT 2012-2024 CE CIT 2005-2011 CE CS 1994-2004 ...