Friday, April 18, 2014

SQL for the Fractured : Starting Point - The FROM clause

Many people, when encountering a language like SQL, run into issues trying to work out what it actually does. I am going to focus on the SELECT statement over the next few posts. a sql statement usually has many clauses.

I am starting with the FROM because I want to emphasis that the SELECT statement does NOT run all at once, so why should we learn it all at once. Most people, myself included, start teaching SELECT with the SELECT clause. Perhaps that's a mistake? The SELECT clause is not even the first clause to get resolved. It is the FROM clause that gets resolved first because, after all, we need to know the structure and location of the data.

Let's look at the ERD for "Tiny U". (You can create the ERD and populate it with data with this script here and, if you don't have an Oracle database handy, you can get one here.)

Each of these Yellow boxes are tables. Tables are like two dimensional spreadsheets. Each table in this set consist of rows and each row has a unique way of identifying it. For example, EXAMPLE_COURSES has a column COURSE_ID. Every row in the Courses table has a unique Course_ID. The Course_ID is the Primary Key (because I DEFINED it as such).

Now, if you have any education, you know that courses have sections (like different terms like Spring '14 and Fall '14 or even have multiple sections in the same term.) If we want to display all the data in the EXAMPLE_COURSES table, it would just be a simple SELECT statement like this -

SELECT *
FROM   EXAMPLE_COURSES;

(The "*" just means EVERYTHING. I am basically asking to show ALL COLUMNS in the EXAMPLE_COURSES table.

And, with the data I have, that would give you this result set.
COURSE_IDFACULTYCOURSETITLECREDIT
1079CCIS2701Database Design & SQL4
1080CCIS2801Systems Analysis4

If we wanted to see the data from EXAMPLE_SECTIONS, that would also be a simple SELECT statement like this -

SELECT *
FROM   EXAMPLE_SECTIONS;

Which would give you the following result set.

SECTIONCAMPUSROOMSTART_DATEEND_DATECOURSE_ID
40Eden PrairieH25716-JAN-201415-MAY-20141079
40Eden PrairieH25513-JAN-201419-MAY-20141080

Look at the last result set. How would you KNOW which section belongs to which Course?

In this case, the Sections table has a column called COURSE_ID which happens to have information that matches the column COURSE_ID in the Courses table. No accident! When I created the Courses table, I needed a way to uniquely identify each course. COURSE_ID does that.

When I created the Sections table, I needed a way to identify which section belonged to which course. COURSE_ID in the Sections table does that. Because I defined the COURSE_ID in the Sections table to point to the Primary Key of the Courses table, COURSE_ID is called the Foreign key.

(I am NOT going into thorough definitions here. If you want a good treatement of this, go to W3SCHOOLS! 

)

If I want to list each Course with its Section, I know which course goes with each section because the section's course_id will match the course's course_id. So, I am combining (JOINING) these tables ON the condition that the course_id in both tables is equal! (And so the JOIN ON clause is BORN!)

SELECT *
FROM   EXAMPLE_COURSES
  INNER JOIN EXAMPLE_SECTIONS
    ON (EXAMPLE_COURSES.COURSE_ID EXAMPLE_SECTIONS.COURSE_ID)

With the above data, this query (SQL Statement) would return the following results.

COURSE_IDFACULTYCOURSETITLECREDITSECTIONCAMPUSROOMSTART_DATEEND_DATECOURSE_ID
1079CCIS2701Database Design & SQL440Eden PrairieH25716-JAN-201415-MAY-20141079
1080CCIS2801Systems Analysis440Eden PrairieH25513-JAN-201419-MAY-20141080

(The result is pretty wide ... thankfully, in the next post, we are going to start trimming this down by picking only the columns we want to see. But, for now, we will have to put up with this.)

We now have combined two tables. We are only going to get a combination of the rows where there is an exact match (A Course that has a Section -and- A Section that is in a Course) This is considered in INNER Join.


With this data, this is good enough. But please be aware that the data model would support Courses that down have sections. If you wanted those, you would want to do an OUTER join. I may do a post on that but, for now, if you are curious, W3Schools has you covered!

One last thing about this code -

SELECT *
FROM   EXAMPLE_COURSES
  INNER JOIN EXAMPLE_SECTIONS
    ON (EXAMPLE_COURSES.COURSE_ID EXAMPLE_SECTIONS.COURSE_ID)


Notice that, in the ON clause, we can't just say COURSE_ID without "prefixing" it with the table name that the COURSE_ID is in. This is because BOTH tables have COURSE_ID. If one table had called it COURSE_NUM, we could have had SQL like this...





    ON (COURSE_ID COURSE_NUM)

... But even this BAD FORM! In the case of SQL that has more than ONE table in it, you really should be in the practice of identifying which column is in which table for EVERY reference of the column in EVERY clause. But that could get bulky! So, thankfully, someone invented the "Table Alias."

SELECT *
FROM   EXAMPLE_COURSES         ec
  INNER JOIN EXAMPLE_SECTIONS  es
    ON (EC.COURSE_ID ES.COURSE_ID)

What if you now want a complete list of all the students in all the courses? The following bullet points are the "business logic" that would match the relationships of the tables in that diagram at the top of the post.

  • Students are Enrolled in courses (Students, Enrollments would be the tables that share the STUDENT_ID)
  • Enrollments are in Sections (Enrollments, Sections would be the tables that share the SECTION and COURSE_ID) 
  • Sections are in Courses (Sections, Courses would be the tables that share the COURSE_ID)
First, let's join our COURSE_SECTION set to Enrollments.

SELECT *
FROM   EXAMPLE_COURSES         ec
  INNER JOIN EXAMPLE_SECTIONS  es
    ON (EC.COURSE_ID ES.COURSE_ID)
  INNER JOIN EXAMPLE_ENROLLMENTS er
    ON (ER.COURSE_ID ES.COURSE_ID
    AND ER.SECTION   ES.SECTION)

In the result set, which is far to wide to show here), there is a STUDENT_ID. We will use that ID to pull the Student info from the STUDENTS set.

SELECT *
FROM   EXAMPLE_COURSES         ec
  INNER JOIN EXAMPLE_SECTIONS  es
    ON (EC.COURSE_ID ES.COURSE_ID)
  INNER JOIN EXAMPLE_ENROLLMENTS er
    ON (ER.COURSE_ID ES.COURSE_ID
    AND ER.SECTION   ES.SECTION)
  INNER JOIN EXAMPLE_STUDENTS    s
    ON (S.STUDENT_ID ER.STUDENT_ID)

And now you have a very wide, but complete set of all the data. Remember, these were INNER joins. What could be missing would be ...
  • Courses with no sections.
  • Students who have not taken any courses
But that is good enough for now. The BEST way to learn this is to try (and fail) with this code, so I would strongly encourage you to try this stuff on your own. You have all the code. You can have access to an environment! It's up to you!!!

(Why do you link to W3SCHOOLS so much? Because W3SCHOOLS is ANSI and do a MUCH BETTER JOB with these topics that I am! So, why reinvent what they have already done?)

No comments:

Post a Comment