SQL is not difficult to learn — which is good since any attempt to bypass learning it when working with a database server will almost certainly end in an inefficient failure. This session will introduce the basics of SQL (Structured Query Language, occasionally pronounced "sequel") grammar, and then progress to include intermediate syntax such as JOINs, aggregate functions. Finally, some Interbase-specific keywords and optimizations will be covered.
Introduction to InterBase SQL
SQL is not difficult to learn — which is good since any attempt to bypass learning it when working with a database server will almost certainly end in an inefficient failure. This session will introduce the basics of SQL (Structured Query Language, occasionally pronounced "sequel") grammar, and then progress to include intermediate syntax such as JOINs, aggregate functions. Finally, some Interbase-specific keywords and optimizations will be covered.
Introduction
SQL is not difficult to learn — which is good since any attempt to bypass learning it when working with a database server will almost certainly end in an inefficient failure. This session will introduce the basics of SQL (Structured Query Language, occasionally pronounced "sequel") grammar, and then progress to include intermediate syntax such as JOINs, aggregate functions. Finally, some Interbase-specific keywords and optimizations will be covered.
All database servers are slightly different, but most of them use SQL as the primary interface for data modification and retrieval. SQL is an ANSI standard, and was designed to provide a database server-independent way of requesting or modifying data. In practice, the standard, in spite of its monstrous length, leaves enough details for the implementers (the developers of the database servers) to decide upon, that writing a complex query which will work on every DB server on the market can range from tricky to impossible. Since most DB producers add proprietary features to their SQL implementation, SQL is only partially successful as a way to write database-independent client software.
While they're significant enough to frustrate developers attempting to make a single query run against different servers, these differences are not so great that you will have to re-learn your SQL skills from scratch when moving from, say, SQL Server to InterBase. Once you've learned basic SQL, you should be able to do productive work with most of the popular commercial database servers.
All of the SQL examples in this paper are designed to work with employee.gdb, one of the example databases which ships with InterBase. Feel free to try each query using the ISQL window in IBConsole. There is a lot of information in this paper, and if you're new to SQL you may find it helpful to try out each section as you go.
The four essential database operations
Generally, users need to be able to do one of four things with a particular record (often referred to as CRUD)
1. Create a new record
2. Read the record
3. Update the data in the record
4. Delete the record
SELECT
The SELECT verb is used to retrieve one or more records from one or more tables in the database. Some database servers allow you to retrieve records from multiple databases in a single SELECT statement, but InterBase limits SELECT statements to tables, views, and procedures within a single database.
A simple SELECT statement looks like this:
SELECT
EMP_NO, LAST_NAME
FROM EMPLOYEE
A SELECT statement produces a result set, or a collection of data produced by the server in response to a query. When dealing with data from a database in a client application, it is helpful to think of data in terms of result sets instead of as tables, since the returned records may or may not correspond to a single physical table in the database.
This particular SELECT statement will produce a result set with two columns, and with one row for every row in the EMPLOYEE table. Each row in the result set will have one column corresponding to EMP_NO, and one column corresponding to LAST_NAME.
Another way to write this query is:
SELECT
EMPLOYEE.EMP_NO, EMPLOYEE.LAST_NAME
FROM EMPLOYEE
In this case, we have specified the table name both in the list of fields we'd like in the result set, and in the FROM clause, where table names need to be listed. It's not necessary to do this in a SELECT statement which only references a single table, but this syntax will be useful later on when we examine how to join data from multiple tables together into a single SQL statement.
Specifying a fields list
In the queries above, we have specified that we'd like two fields from the EMPLOYEE table inour result set. There is a shorthand syntax to indicate that we'd like every field from the EMPLOYEE table in the result set. Just specify "*" instead of a list of fields:
SELECT *
FROM EMPLOYEE
This syntax is useful when running queries against a table when you're not sure what the field names are, but I don't recommend that you use it in client applications. The reason is that network traffic is one of the biggest bottlenecks in an application which works with a database server. The less network traffic, the better! So it is wise to specify only the columns which your application absolutely requires in your SELECT statements. (We'll examine how to retrieve only the rows your application absolutely requires in just a moment.) Even if you need every column in a given table, it is still wise to specify the list of fields explicitly instead of using "*" because additional columns not required by your application might be added to the table later on.
So you should always specify the field names required by your application when writing SELECT statements. You can specify as many fields as your application requires, just separate them with commas like in the example above.
Restricting returned records with WHERE
For the same reason that it is advisable to specify only the columns required by your application in a SELECT statement — network bandwidth — it is even more advisable to restrict the rows returned to only those your user is interested in seeing. It is not at all uncommon to have hundreds of megabytes of data in a single table, and chances are very good that your users only want to see a few of these records at a time. Chances are even better that they don't want to wait for millions of records they're not interested in to cross the network before they can see the 10 or so records that they asked for.
A note for those migrating older applications where users could simply browse entire tables.
I regularly encounter a great deal of resistence to the idea that this mode of operation must change. The usual excuse is "the users need to be able to see all the records" and "they've always done it that way".
The short answer is: No, they do not need to see all the records.
No user can do anything productive browsing back and forth among thousands of records, they are always looking for particular records, or at worse, a small set of related records. The key is to give them an easy to use means of specifying what they are actually looking for. While there are often a few users who continue to insist, the vast majority catch on easily and find it much more productive.
A WHERE clause is the part of a select statement used to restrict the rows returned as a part of a result set. Here is an example of a SELECT statement with a WHERE clause added:
SELECT
LAST_NAME
FROM EMPLOYEE
WHERE
EMP_NO = 20
Imagine that we want to know the name of employee #20. This query will return that information with minimal network traffic. If we looked at the metadata for the employee table we would find that there is a constraint, in the form of a Primary Key, placed on the EMP_NO column which tells the database server to ensure that values in this column are unique — no two employee records can have the same value in the EMP_NO column. Therefore, the query above, which specifies a single value for EMP_NO in the WHERE clause will return at most one row. It would return no rows if no employee with EMP_NO = 20 existed.
Comparison Operators: =, <, < =, > =, >, <>
The above query used the Equals operator. As with programming languages, the other standard comparison operators are also available. The Not Equal operator can be specified as either"<>" or "!=".
Thus…
SELECT
LAST_NAME
FROM
EMPLOYEE WHERE
EMP_NO <> 20
… will bring back all employee names except the one for employee #20.
SELECT
LAST_NAME
FROM
EMPLOYEE WHERE
EMP_NO <= 20 … will bring back names for those employees having an Emp_No of less than or equal to 20. Boolean Operators: AND, OR, NOT Sometimes we need more complex conditions in a WHERE clause. The AND, OR, and NOT boolean operators can be used to combine multiple conditions: SELECT LAST_NAME FROM EMPLOYEE WHERE ((EMP_NO >= 20)
AND
(EMP_NO <= 30)) AND (NOT (JOB_CODE = 'VP')) The use of parentheses when combining multiple conditions is strongly recommended. There is a set order of operations, but using parentheses will save confusion. Note that strings and dates are always specified with single quotes, since double quotes are reserved for a special SQL feature called delimited identifiers. We'll discuss delimited identifiers later. Much later, if you're lucky… Conditions: IN, BETWEEN, LIKE We can also do wildcard matching using LIKE. LIKE works similarly to =, except that it can accept wildcard characters. The character "%" matches a string of any length. The character "_" matches a single character. Here is an example of how to use LIKE: SELECT EMP_NO, LAST_NAME FROM EMPLOYEE WHERE (LAST_NAME LIKE 'S%') OR (LAST_NAME LIKE '___') This query will return one row in its result set for any employee whose name starts with "S" or whose name is three characters in length. (Hey, I didn't say it was a useful query.) Note that, depending upon the collation used, LIKE may be case-sensitive, and 'S' may not equal 's'. Be careful when using LIKE since the database server won't always be able to optimize your query very well. This can be dangerous when selecting from a table with a lot of records in it. We'll cover basic query optimization later on. BETWEEN is useful for selecting ranges of values. It is inclusive of the specified values: SELECT LAST_NAME FROM EMPLOYEE WHERE HIRE_DATE BETWEEN '1/1/2001' AND '12/31/2001' IN allows you to select records matching a specified set of values: SELECT LAST_NAME FROM EMPLOYEE WHERE EMP_NO IN (20, 24, 28, 44) IN can also be used with sub-selects as you will see. Dealing with NULL Please read the next paragraph very carefully: NULL is not a value. It is a state representing no value. A field in a table can have many different values. If it's an integer field, it can have about four billion values. But there are only two states: NULL and NOT NULL. NULL means that either no value has ever been specified for that field in that particular row, or the NULL state has been explicitly assigned to it. NULL is not the same thing as zero, or an empty string. The SQL rules for dealing with NULL in comparisons and operations (addition, subtraction, and the like) are very simple: the result of any operator and NULL is always NULL. So NULL + 100 = NULL, a string concatenated with NULL is also NULL In a WHERE clause, any comparison with NULL will always evaluate to FALSE. So the following query: SELECT EMP_NO, LAST_NAME FROM EMPLOYEE WHERE (HIRE_DATE = NULL) /* Don't do this! */ …will never return any records, since nothing can ever equal NULL, not even NULL itself! However, we sometimes want to return records where a given column is in the NULL state, so there is a special operator for this: IS SELECT EMP_NO, LAST_NAME FROM EMPLOYEE WHERE (HIRE_DATE IS NULL) Likewise, we might want to return the list of records where the HIRE_DATE column is not in the NULL state: SELECT EMP_NO, LAST_NAME FROM EMPLOYEE WHERE (HIRE_DATE IS NOT NULL) Dealing with Dates There is often confusion about the correct format to use when expressing a date as a string literal. InterBase will accept a few different formats, but you should select one and use it consistently to avoid confusion when coding. The most common format is 'mm/dd/yyyy'. Other accepted formats include 'mm-dd-yyyy', 'dd.mm.yyyy' and 'yyyy.mm.dd'. Dates must always be in single quotes. InterBase also accepts a few special quoted strings as the dates they imply, these are 'TODAY', 'YESTERDAY', 'TOMORROW'. Starting with version 6.0, the more standard keyword CURRENT_DATE should be used, you can specify tomorrow with CURRENT_DATE + 1. Within Delphi/BCB, one should always use the provided AsDateTime methods available in database field objects in order to avoid depending on the field objects to convert from string values. Converting strings to dates depends on the settings of individual workstations (e.g. Windows settings). Ordering result sets SQL result sets are always unordered unless you specify otherwise. Put another way, you cannot expect the rows in the result set to come back in the same order even if you run the exact same query twice unless you specify that you'd like the result set ordered. The records might come back in storage order (the order on the disk), they might come back in random order — you just don't know. So how do you tell the server that you'd like the records ordered? Simply specify which columns should be used to sort the records: SELECT EMP_NO, LAST_NAME FROM EMPLOYEE WHERE (HIRE_DATE IS NOT NULL) ORDER BY LAST_NAME This query will return the records in alphabetical order, based upon the employee's name. They'll be in ascending order (that is, A to Z), because that is the default and we did not specify descending order. Note that two employees might have the same name. Since we didn't specify how to order two records with the same value in the LAST_NAME column, we will receive them in the result set in a random order — if there are two employees named "Smith" we simply cannot expect them to sort in any particular order unless we specify more fields in the ORDER BY clause: SELECT EMP_NO, LAST_NAME FROM EMPLOYEE WHERE (HIRE_DATE IS NOT NULL) ORDER BY LAST_NAME, HIRE_DATE DESC In this query, we're telling the server to first order the records by the LAST_NAME column (in ascending order, since there is no DESC next to name), and then order any group of records with the same name in descending order by the value in the HIRE_DATE column. HIRE_DATE is not one of the values we've selected in the result set, but that's OK; there's no requirement that a column be mentioned in the fields list in order to be used in the ORDER BY clause. As of version 6.5, InterBase will always put records with a NULL state at the top of a list of sorted records, whether the list is ascending or descending. Prior versions sorted NULL to the bottom. Parameterized queries It is common in client applications to run very similar queries over and over again. A good example is a master-detail form. As we scroll through the result set representing the master table, the query for the detail result set will be run many times, once for each record. Because this is so common, there is a special optimization for it supported by nearly every database server on the market, called a parameterized query. When you type some SQL and send it to the server, there is a lot which has to happen in order to produce your result set. The server must parse your SQL into an internal structure it can understand. The server's optimizer must examine the indices on the table and determine if any of them can be used to run the query more quickly. This is called preparing a query. On the client application, the component sets you're using to build your application typically do some actions behind the scenes, such as interrogating the system table for field lengths and the like. All of this takes time, and if you know ahead of time that you'll be running the same query over and over again, it's redundant after the first time. When you use a parameterized query, you are in effect telling the server that you'll be running more or less the same query several times, and changing only the values used for comparison in the WHERE clause. For example, let's say we want a list of projects for a given employee. We'll use this query on a master detail form which allows the user to scroll through a list of employees and see the projects each employee is working on.The "master" query will be one of the queries above which returns EMP_NO as part of the result set. Now we need to write a detail query which shows projects for a given master record: SELECT PROJ_ID FROM EMPLOYEE_PROJECT WHERE EMP_NO = :EMP_NO The colon in this query means that the second reference to EMP_NO is not a field but is a parameter. The parameter, in this case, has the same name as the field EMP_NO, but that's not a requirement. We can now tell the server to prepare this query and then run it multiple times changing only the value of the parameter. It would be like running these two, separate queries: SELECT PROJ_ID FROM EMPLOYEE_PROJECT WHERE EMP_NO = 20 SELECT PROJ_ID FROM EMPLOYEE_PROJECT WHERE EMP_NO = 21 …without the overhead of preparing two separate SQL statements. You can specify any number of parameters in a query. Each time the query is executed, you can supply different values. Parameters can also have a NULL state, but remember that if we set the EMP_NO parameter to NULL then this query would return no records since EMP_NO = NULL is false! INSERT In some ways, the discussion of SELECT was premature, since we can't SELECT result sets from the database until there is data to SELECT! The INSERT verb is used to create a new row in a single table. To create a new row in the EMPLOYEE table, we could use the following statement: INSERT INTO EMPLOYEE ( EMP_NO, LAST_NAME ) VALUES ( 123, 'Smith'); This statement, when run will result in the creation of a new record in the table with the value 123 in the EMP_NO column and the value "Smith" in the LAST_NAME column. These two columns will have the non-NULL state, but all other columns in the table will have the NULL state, unless the database administrator has created a DEFAULT value for the column. There is no requirement to specify every column in the table in an INSERT statement, but a column may have a NOT NULL constraint. If there is a NOT NULL constraint on the column and there is not a DEFAULT value for the column then you must specify a value for the column in your INSERT statement or the server will be unable to create the record and you'll get an error. If you will be specifying a value for every column in the new row then it's permissible to omit the column list and just give the values, e.g. INSERT INTO EMPLOYEE VALUES ( 123, 'John', 'Smith', '1234', '05/24/2002', 600, 'VP', '2', 'USA', 100000); But it's crucial to specify the column values in the order they are defined. It is generally advisable to always specify the column list as it helps avoid problems. INSERT statements do not return a result set. Only SELECT statements return a result set. Copying data with INSERT … SELECT It is sometimes necessary to INSERT many records into a table at once. For example, we might want to create a new project for every employee who was hired today. Instead of specifying the VALUES for the new row as constants, like in the example above, we can embed a SELECT statement into the INSERT statement, and one row will be created for each row in the result set of the SELECT statement: INSERT INTO EMPLOYEE_PROJECT ( EMPLOYEE_PROJECT.PROJID, EMPLOYEE_PROJECT.EMP_NO) SELECT 'BORCON', EMPLOYEE.EMP_NO FROM EMPLOYEE WHERE EMPLOYEE.HIRE_DATE = CURRENT_DATE This is a somewhat complex statement, so let's look at it carefully. The first line states that we'll be inserting rows into the EMPLOYEE_PROJECT table. The second line specifies which fields we'll be specifying values for (later in the statement). It's not necessary to specify the table name in the fields list as I have done here, but I wanted to clearly distinguish the columns in the table we're inserting into from the columns coming from the table we're Selecting from, since some of them have the same names. The rest of the statement tells InterBase that the values will be coming from a SELECT statement rather than a list of constants. For each row in the result set, one row will be created in EMPLOYEE_PROJECT. CURRENT_DATE is a reserved word (in InterBase 6 and later) which always returns the current date from the server's clock, so the SELECT portion of the statement will return one row for each employee who was hired today. There is only one item in the fields list of the SELECT portion of the statement which is actually a field in the EMPLOYEE table (EMP_NO). The other item is a constant. If we ran the SELECT by itself, the result set might look like this: COLUMN_1 EMP_NO 1 21 1 22 So these are the values which will be assigned to the two rows which will be inserted into the EMPLOYEE_PROJECT table (presuming that there were two employees hired today). Note on the Select clause of the Insert…Select statement:This is an Insert statement, not a Select statement. Only single table or view selects are allowed in the Select clause, Interbase will not currently allow joins. UPDATE The UPDATE verb is used to change the value of existing rows. Use of a WHERE clause is essential with UPDATE, because, by default, UPDATE changes every row in the table! If we were going to pay everyone in the company the exact same salary, for example, we might run the following statement: UPDATE EMPLOYEE SET SALARY = 5000 Running a statement like this is much more efficient than iterating through the table and updating each record, one at a time, but for the most part we generally do want to change values in only one record or a number if specific records matching some criteria, so most UPDATE statements have WHERE clauses: UPDATE EMPLOYEE SET SALARY = 5000 WHERE EMP_NO = 20 Bad news for employee 20, but the rest of us can breathe easy. It would probably not be a good idea to write an UPDATE statement like this: UPDATE EMPLOYEE SET SALARY = 5000 WHERE FIRST_NAME = 'Bob' …because if there were more than one Bob in the company then we'd be giving a lousy salary to all of them, not just the lazy one. In general, use the primary key or another unique key of the table when writing a WHERE clause designed to affect only one record. It's common to write UPDATE statements which specify the value of every column in the table in the WHERE clause. Since the primary key will always uniquely identify a given row, why would anyone do this? The answer is that this is way of protecting against multiuser conflicts (transactions are another way, but they do it differently). Let's imagine that a user, Emma, creates a new record. Later, another user, Peter, updates the record. Emma returns to her computer, but she hasn't refreshed her display so she doesn't see Peter's changes to the record. Emma decides to change the record. If only the primary key was specified in the WHERE clause of the UPDATE statement, then Emma would write over Peter's changes without realizing it. By specifying all field values in the WHERE clause, and not just the primary key value, the UPDATE statement will have no effect (i.e, it will not change any rows) if the record has been changed since Emma last looked at it. The client application can detect that the statement had no effect and let Emma know that the row has changed (or been deleted) since she last SELECTed it. DELETE DELETE looks just like UPDATE except, for obvious reasons, there is no list of fields: DELETE FROM EMPLOYEE WHERE EMP_NO = 20 Like UPDATE, you can omit the WHERE clause if you want to affect (in this case, clear) the entire table: DELETE FROM EMPLOYEE Moving beyond simple SELECTs Subqueries A subquery is a SELECT inside of another SQL statement and is used to select values from another table to be used for qualifying or setting values in the outer SQL statement. Subqueries can also be used in other statements too but are most commonly used in Select statements: SELECT EMPLOYEE.EMP_NO, EMPLOYEE.LAST_NAME FROM EMPLOYEE WHERE EMPLOYEE.EMP_NO IN (SELECT PROJECT.EMP_NO FROM PROJECT) This query will return the list of employees who have at least one project. Again, it's not strictly necessary to use the table names in the fields list (e.g., EMPLOYEE.EMP_NO), but it helps to clarify where the fields are coming from since EMPLOYEE and EMPLOYEE_PROJECT both have a field called EMP_NO. This query shows an example of using the IN operator with a subquery. We can't use the = comparison here, because the subquery returns many rows, and we can't use an equality comparison to compare a single value with a large number of rows. If we had tried to use = instead of IN, we'd receive the charming error message, "Multiple rows in singleton select." The = comparison could only be used if we were absolutely certain the subquery would return only one row. Subqueries can also be used in UPDATE statements. Let's imagine you want to set the project start date on every row in the EMPLOYEE_PROJECT table equal to the HIRE_DATE for the corresponding employee in the EMPLOYEE table. The following statement would do the trick: UPDATE EMPLOYEE_PROJECT SET EMPLOYEE_PROJECT.START_DATE = (SELECT EMPLOYEE.HIRE_DATE FROM EMPLOYEE WHERE EMPLOYEE.EMP_NO = EMPLOYEE_PROJECT.EMP_NO ) (This query won't actually work since the EMPLOYEE_PROJECT doesn't have a field called START_DATE.) This kind of statement works fairly well for tables which are not especially large. For very large tables, use a stored procedure to do the update (see the Optimization section for details). Attempting this on a large table, especially if more than one subquery must be used to set field values, will perform very poorly because the subqueries must be executed once for each record to be updated. The following statement will delete any employees with no projects: DELETE FROM EMPLOYEE WHERE EMPLOYEE.EMP_NO NOT IN (SELECT EMPLOYEE_PROJECT.EMP_NO FROM EMPLOYEE_PROJECT ) JOINs Everything we have seen thus far is useful, but JOINs are the real power of a relational database server. JOINs allow the developer to combine data from multiple tables in to a single result set. It is common when designing databases to use a lot of separate tables for various attributes of a single entity. For example, you might store records of an employee's salary, residence, etc., in separate tables rather than expanding the EMPLOYEE table. There a lot of advantages to doing this (the employee might have more than one residence, for example), but without being able to combine the information it would be very difficult to create a mailing list. A JOIN is simply a way of telling the database server how to match rows from two different tables. You can then return fields from both tables in your result set. For example, let's imagine we want to return a result set with employee names and the names of the projects they're working on. We need to tell the database server to match EMPLOYEE_PROJECT records with records in the EMPLOYEE table. In particular, given an EMPLOYEE_PROJECT record, we need to find a record in the EMPLOYEE table with the same EMP_NO value. The following query JOINs the two tables together to produce the desired result: SELECT EMPLOYEE.LAST_NAME, EMPLOYEE_PROJECT.PROJ_ID FROM EMPLOYEE INNER JOIN EMPLOYEE_PROJECT ON EMPLOYEE_PROJECT.EMP_NO = EMPLOYEE.EMP_NO ORDER BY EMPLOYEE.LAST_NAME, EMPLOYEE_PROJ_ID This query uses a particular type of JOIN called an INNER JOIN. INNER JOIN means that one or more records with a given EMP_NO must be present in both the EMPLOYEE_PROJECT and EMPLOYEE tables in order to produce a row in the result set. In other words, if there was a row in the EMPLOYEE table for employee #10 but there was no row in the EMPLOYEE_PROJECT table for employee #10 then there would be no row in the result set for employee #10. If no JOIN type is specified in the query (if we write JOIN EMPLOYEE instead of INNER JOIN EMPLOYEE) then an INNER JOIN will be used. INNER JOIN is the default join type. The second most common type of JOIN is a LEFT OUTER JOIN. This type of JOIN returns at least one row for each row in the "left" table, whether or not there is a corresponding row in the "right" table. The terms "left" and "right" often seem confusing at first, but that's what the SQL Standard calls them. The terms are based on the fact that we (at least most of the western world) read from left to right, therefore just remember that left refers to the table mentioned first and right refers to the second table being joined to the first. In the example above, the EMPLOYEE table is the left table, and the EMPLOYEE_PROJECT table is the right table. As previously mentioned, the query above won't return any rows in its result set for employees without projects. What if we wanted to see these employees anyway? Use a LEFT OUTER JOIN instead of an INNER JOIN: SELECT EMPLOYEE.LAST_NAME, EMPLOYEE_PROJECT.PROJ_ID FROM EMPLOYEE LEFT OUTER JOIN EMPLOYEE_PROJECT ON EMPLOYEE_PROJECT.EMP_NO = EMPLOYEE.EMP_NO ORDER BY EMPLOYEE.LAST_NAME, EMPLOYEE_PROJECT.PROJ_ID In this result set, the column EMPLOYEE_PROJECT.PROJ_ID will have a NULL state on rows for employees who don't have any projects. This might seem confusing since there are no records in the EMPLOYEE_PROJECT table with the NULL state in the PROJ_ID field, but it simply indicates that there is no matching record. In other words, if a column is NULL in the result set of a query resulting from a JOIN, one of two things is going on: 1. There was a matching row in the appropriate table, and the state of the corresponding column for that row was NULL, or 2. There was no matching row in the appropriate table. In addition to INNER and LEFT OUTER joins, there are also RIGHT OUTER and FULL OUTER joins (OUTER is optional and can be omitted). A RIGHT join, as it implies, is opposite of a LEFT join - all records are selected from the table on the right, and only matching records are selected from the table on the left. A FULL join is very rarely ever desired, it will select all records from both tables whether or not there is a match in the other. Using aliases for field and table names As we get used to the idea of having multiple tables in a SELECT statement, specifying the table names can get to be a lot of typing. SQL allows us to specify an alias for table names: SELECT E.LAST_NAME, EP.PROJ_ID FROM EMPLOYEE AS E LEFT OUTER JOIN EMPLOYEE_PROJECT AS EP ON EP.EMP_NO = E.EMP_NO ORDER BY E.LAST_NAME, EP.NAME The the alias for a table is defined by specifying it following the table name in the FROM clause. SQL allows you to use the alias before it's defined. Fields can also be given aliases. As with table aliases, they are defined directly following the column name in the field list: SELECT E.LAST_NAME AS SURNAME, EP.PROJ_ID AS PROJECT FROM EMPLOYEE E LEFT OUTER JOIN EMPLOYEE_PROJECT EP ON EP.EMP_NO = E.EMP_NO ORDER BY E.LAST_NAME, EP.NAME The "AS" operator is optional and can be omitted in Interbase. Field aliases can only be used for specifying the name to be returned in the result set, a field alias name cannot be used to reference the field elsewhere in the query (such as in an Order By clause). Aggregate Functions The MAX, MIN, COUNT, SUM, and AVG (average) functions do exactly what their names imply. They are called "aggregate functions" because they work on the entire result set at once instead of a single row. COUNT deserves special mention because it works in two different ways. COUNT(*) will count the number of rows in the table, after the WHERE clause is applied, JOINs are performed, etc. COUNT(FIELD_NAME) counts the number of non-NULL values in a particular column. COUNT(DISTINCT FIELD_NAME), of course, counts the number of distinct values there are in a particular column. For example, if you had a table with four rows in it, containing the following data: ID SOME_VALUE 1 2 1 3 2 4 2 Then the following query: SELECT COUNT(*), COUNT(SOME_VALUE), COUNT(DISTINCT SOME_VALUE), MAX(ID) Would return this result set: COUNT COUNT_2 COUNT_3 MAX 4 3 2 4 Grouping Since aggregate functions operate on all rows of the table at once and produce a single row as a result set, they can't be combined with "normal" selection of fields without grouping the results somehow. In other words, you can't SELECT employee names and the count of employees in the same statement, because the employee names selection produces many rows and the count of employees produces only a single row. You might, however, want to know how many employees have the same name. The GROUP BY clause allows you to tell the aggregate function that it should run once for each distinct value of a certain field or group of fields in the table. SELECT LAST_NAME, COUNT(LAST_NAME) FROM EMPLOYEE GROUP BY LAST_NAME ORDER BY LAST_NAME The GROUP BY clause must contain all columns mentioned in the fields list except for the aggregat functions. So in this example, LAST_NAME must be in the Group By clause. This query tells us what we want — for each name, how many people share the name. But the value for most names will be "1" and these ones would likely be uninteresting to the user of this query. So we need a way to eliminate the rows with unique names. HAVING The Having clause qualifies the Group By clause. After the Group By has done its work, any conditions specified in the Having clause are applied. So we can improve our query as follows: SELECT LAST_NAME, COUNT(LAST_NAME) FROM EMPLOYEE GROUP BY LAST_NAME HAVING COUNT(LAST_NAME) > 1
ORDER BY
LAST_NAME
Now we'll only see Last Names where at least two people share that name. This is accomplished by repeating the Count aggregate funtion on Last_Name.
DISTINCT
Distinct allows you to select unique values, or combinations of values, when there are duplicates. For example, you may want to see all the different cities your customers are from. If you execute the following:
SELECT
CITY
FROM CUSTOMER
ORDER BY
CITY
You will indeed see all the different cities, but each city will be repeated as many times as there are customers from that city. What is really wanted is each unique city:
SELECT DISTINCT
CITY
FROM CUSTOMER
ORDER BY
CITY
Cities are not necessarily unique across different provinces or states, so if your customers are international then the above won't be good enough. Distinct works on multiple columns:
SELECT DISTINCT
CITY, STATE_PROVINCE, COUNTRY
FROM CUSTOMER
ORDER BY
CITY, STATE_PROVINCE, COUNTRY
Note the Order By is not required to include the same columns (or be present at all) but it is very common to do so.
CAST, EXTRACT, and other functions
Cast allows you to coerce one datatype into another. The conversion being asked for must make sense or Interbase will raise an exception. All the valid conversions can be found in the Interbase documentation. A simple example will give the idea:
SELECT
CAST(QTY_ORDERED AS FLOAT) / 0.35
FROM SALES
Cast is not needed very often as Interbase will generally perform an implicit conversion for you. The above statement would give the same results without the cast.
Extract is available starting with Version 6.0. Extract allows parts if date and time fields to be extracted, for example:
SELECT
LAST_NAME, FIRST_NAME, EXTRACT(YEAR FROM HIRE_DATE)
FROM EMPLOYEE
The parts that can be extracted are: YEAR, MONTH, DAY, WEEKDAY, HOUR, MINUTE, and SECOND.
ALL, ANY, SOME, EXISTS, SINGULAR
These operators are used with sub-queries. Earlier you learned about the IN operator which lets you test whether a selected value exists within a set of specified or sub-selected values - it tests for equality.
ALL allows you to perform comparisons other than equality. For example, if you want to find those employees with a salary greater than the salary of all employees in your department, you might try:
SELECT
LAST_NAME, SALARY
FROM EMPLOYEE
WHERE
SALARY >ALL
(SELECT
SALARY
FROM EMPLOYEE
WHERE
DEPT_NO = 623
)
ANY and SOME are synonyms, use whichever you like. To find employees in your department with salaries less than the lowest in any other department, you could find them with:
SELECT
LAST_NAME, SALARY
FROM EMPLOYEE
WHERE
SALARY < ANY (SELECT SALARY FROM EMPLOYEE WHERE DEPT_NO <> 623
)
EXISTS is very valuable. Use it anywhere you need to know something exists but do not need to know the count.
If you want to see a list of Projects where no employees are currently assigned you could do this:
SELECT
PROJ_ID, PROJ_NAME
FROM PROJECT P
WHERE
NOT EXISTS
(SELECT
EMP_NO
FROM EMPLOYEE_PROJECT EP
WHERE
EP.PROJ_ID = P.PROJ_ID
)
SINGULAR returns true if exactly one row from the sub-select matches your test. The following will return projects where there is exactly one employee assigned.
SELECT
PROJ_ID, PROJ_NAME
FROM PROJECT P
WHERE
SINGULAR
(SELECT
EMP_NO
FROM EMPLOYEE_PROJECT EP
WHERE
EP.PROJ_ID = P.PROJ_ID
)
InterBase-specific SQL
STARTING WITH
Use this instead of LIKE wherever possible. This is most important for parameterized queries as a parameterized LIKE cannot use an index while STARTING WITH can.
CONTAINING
CONTAINING is similar to using LIKE with wildcards on both sides, e.g. LIKE '%BASE%'. However CONTAINING is case-insensitive and can also be used to search within text blob columns.
ROWS
Rows is a new feature starting with version 6.5. Rows is very flexible and provides a great deal of power. At its simplest, it less you select the n rows from a query.
SELECT
LAST_NAME,
FIRST_NAME, SALARY
FROM EMPLOYEE
ORDER BY
SALARY DESC, LAST_NAME,FIRST_NAME
ROWS 1 TO 10
This query returns only the first 10 records of the result set (You can also specify just ROWS 10). Strictly speaking, Rows can be used without the Order By, but this woud be of little value. In this example, the 10 employees with the highest salaries will be returned. It's possible you may want all the employees with the 10 highest salaries (more than one employee may have the same salary), in this case you can use the WITH TIES option:
SELECT
LAST_NAME,
FIRST_NAME, SALARY
FROM EMPLOYEE
ORDER BY
SALARY DESC, LAST_NAME,FIRST_NAME
ROWS 1 TO 10 WITH TIES
So if there were 17 employees sharing the top 10 salaries, all 17 would be returned.
Rows can also be specified as a percent:
SELECT
LAST_NAME,
FIRST_NAME, SALARY
FROM EMPLOYEE
ORDER BY
SALARY DESC, LAST_NAME,FIRST_NAME
ROWS 10 PERCENT
Finally, Rows can be used for sampling. The following example returns every 20th record from the result set up to a maximum of 100 records:
SELECT
LAST_NAME,
FIRST_NAME, SALARY
FROM EMPLOYEE
ORDER BY
SALARY DESC, LAST_NAME,FIRST_NAME
ROWS 100 BY 20
Probably one of the most sought after uses of this feature is to allow for very efficient paging of result sets as commonly seen on many web sites; some number of records, typically 10, are displayed with Next and Previous buttons. Rows allows you to return any slice of a result set
The Rows feature can also be used in Update and Delete statements. Normally, the Order By clause is only valid in a Select statement. However, it becomes valid for other statements as long as it is used along with the Rows feature (otherwise using rows in these statements would be of little value). For example, the following gives a raise to the top 10 salaried employees hired before 2002:
UPDATE EMPLOYEE
SET
SALARY = SALARY * 1.10
WHERE
EXTRACT(YEAR FROM HIRE_DATE) < 2002 ORDER BY SALARY DESC ROWS 10 The following deletes the 10 employees with the newest hire date: DELETE FROM EMPLOYEE ORDER BY HIRE_DATE DESC ROWS 10 RDB$DATABASE This is a system table that is guaranteed to only have a single record in it. This makes a few special queries possible such as: Asking Interbase to return the current date and time from the server SELECT CURRENT_TIMESTAMP FROM RDB$DATABASE /* 6.0+ */ SELECT CAST('TODAY' AS DATE) FROM RDB$DATABASE /* 5.6- */ Asking Interbase to return the currently logged-in user SELECT USER FROM RDB$DATABASE There are other situations where such a table is needed, for example to simply test for the existence of a specific record - the EXISTS function can only be used with a sub-query: SELECT 1 FROM RDB$DATABASE WHERE EXISTS ( SELECT * FROM EMPLOYEE_PROJECT WHERE EP.PROJ_ID = 123 ) Generators Generators are a way to create unique values in a multi-user database. In a single-user database it is acceptable to do things such as manually incrementing a value from a "key" table, or adding one to the highest current value in a table, but these schemes will not do in a multi-user environment. Most RDBMS products offer something similar, MS SQL has Identity fields and Oracle has Sequences. The standard way to use a generator from in a query is like this: SELECT GEN_ID(GEN_CUSTOMER, 1) FROM RDB$DATABASE Generators can also be used in insert and update statements. It is also common to see them as part of insert triggers. Taken from : http://blogs.teamb.com/craigstuntz/dtpostname/ibsqlintro/
Introduction to InterBase SQL
SQL is not difficult to learn — which is good since any attempt to bypass learning it when working with a database server will almost certainly end in an inefficient failure. This session will introduce the basics of SQL (Structured Query Language, occasionally pronounced "sequel") grammar, and then progress to include intermediate syntax such as JOINs, aggregate functions. Finally, some Interbase-specific keywords and optimizations will be covered.
Introduction
SQL is not difficult to learn — which is good since any attempt to bypass learning it when working with a database server will almost certainly end in an inefficient failure. This session will introduce the basics of SQL (Structured Query Language, occasionally pronounced "sequel") grammar, and then progress to include intermediate syntax such as JOINs, aggregate functions. Finally, some Interbase-specific keywords and optimizations will be covered.
All database servers are slightly different, but most of them use SQL as the primary interface for data modification and retrieval. SQL is an ANSI standard, and was designed to provide a database server-independent way of requesting or modifying data. In practice, the standard, in spite of its monstrous length, leaves enough details for the implementers (the developers of the database servers) to decide upon, that writing a complex query which will work on every DB server on the market can range from tricky to impossible. Since most DB producers add proprietary features to their SQL implementation, SQL is only partially successful as a way to write database-independent client software.
While they're significant enough to frustrate developers attempting to make a single query run against different servers, these differences are not so great that you will have to re-learn your SQL skills from scratch when moving from, say, SQL Server to InterBase. Once you've learned basic SQL, you should be able to do productive work with most of the popular commercial database servers.
All of the SQL examples in this paper are designed to work with employee.gdb, one of the example databases which ships with InterBase. Feel free to try each query using the ISQL window in IBConsole. There is a lot of information in this paper, and if you're new to SQL you may find it helpful to try out each section as you go.
The four essential database operations
Generally, users need to be able to do one of four things with a particular record (often referred to as CRUD)
1. Create a new record
2. Read the record
3. Update the data in the record
4. Delete the record
SELECT
The SELECT verb is used to retrieve one or more records from one or more tables in the database. Some database servers allow you to retrieve records from multiple databases in a single SELECT statement, but InterBase limits SELECT statements to tables, views, and procedures within a single database.
A simple SELECT statement looks like this:
SELECT
EMP_NO, LAST_NAME
FROM EMPLOYEE
A SELECT statement produces a result set, or a collection of data produced by the server in response to a query. When dealing with data from a database in a client application, it is helpful to think of data in terms of result sets instead of as tables, since the returned records may or may not correspond to a single physical table in the database.
This particular SELECT statement will produce a result set with two columns, and with one row for every row in the EMPLOYEE table. Each row in the result set will have one column corresponding to EMP_NO, and one column corresponding to LAST_NAME.
Another way to write this query is:
SELECT
EMPLOYEE.EMP_NO, EMPLOYEE.LAST_NAME
FROM EMPLOYEE
In this case, we have specified the table name both in the list of fields we'd like in the result set, and in the FROM clause, where table names need to be listed. It's not necessary to do this in a SELECT statement which only references a single table, but this syntax will be useful later on when we examine how to join data from multiple tables together into a single SQL statement.
Specifying a fields list
In the queries above, we have specified that we'd like two fields from the EMPLOYEE table inour result set. There is a shorthand syntax to indicate that we'd like every field from the EMPLOYEE table in the result set. Just specify "*" instead of a list of fields:
SELECT *
FROM EMPLOYEE
This syntax is useful when running queries against a table when you're not sure what the field names are, but I don't recommend that you use it in client applications. The reason is that network traffic is one of the biggest bottlenecks in an application which works with a database server. The less network traffic, the better! So it is wise to specify only the columns which your application absolutely requires in your SELECT statements. (We'll examine how to retrieve only the rows your application absolutely requires in just a moment.) Even if you need every column in a given table, it is still wise to specify the list of fields explicitly instead of using "*" because additional columns not required by your application might be added to the table later on.
So you should always specify the field names required by your application when writing SELECT statements. You can specify as many fields as your application requires, just separate them with commas like in the example above.
Restricting returned records with WHERE
For the same reason that it is advisable to specify only the columns required by your application in a SELECT statement — network bandwidth — it is even more advisable to restrict the rows returned to only those your user is interested in seeing. It is not at all uncommon to have hundreds of megabytes of data in a single table, and chances are very good that your users only want to see a few of these records at a time. Chances are even better that they don't want to wait for millions of records they're not interested in to cross the network before they can see the 10 or so records that they asked for.
A note for those migrating older applications where users could simply browse entire tables.
I regularly encounter a great deal of resistence to the idea that this mode of operation must change. The usual excuse is "the users need to be able to see all the records" and "they've always done it that way".
The short answer is: No, they do not need to see all the records.
No user can do anything productive browsing back and forth among thousands of records, they are always looking for particular records, or at worse, a small set of related records. The key is to give them an easy to use means of specifying what they are actually looking for. While there are often a few users who continue to insist, the vast majority catch on easily and find it much more productive.
A WHERE clause is the part of a select statement used to restrict the rows returned as a part of a result set. Here is an example of a SELECT statement with a WHERE clause added:
SELECT
LAST_NAME
FROM EMPLOYEE
WHERE
EMP_NO = 20
Imagine that we want to know the name of employee #20. This query will return that information with minimal network traffic. If we looked at the metadata for the employee table we would find that there is a constraint, in the form of a Primary Key, placed on the EMP_NO column which tells the database server to ensure that values in this column are unique — no two employee records can have the same value in the EMP_NO column. Therefore, the query above, which specifies a single value for EMP_NO in the WHERE clause will return at most one row. It would return no rows if no employee with EMP_NO = 20 existed.
Comparison Operators: =, <, < =, > =, >, <>
The above query used the Equals operator. As with programming languages, the other standard comparison operators are also available. The Not Equal operator can be specified as either"<>" or "!=".
Thus…
SELECT
LAST_NAME
FROM
EMPLOYEE WHERE
EMP_NO <> 20
… will bring back all employee names except the one for employee #20.
SELECT
LAST_NAME
FROM
EMPLOYEE WHERE
EMP_NO <= 20 … will bring back names for those employees having an Emp_No of less than or equal to 20. Boolean Operators: AND, OR, NOT Sometimes we need more complex conditions in a WHERE clause. The AND, OR, and NOT boolean operators can be used to combine multiple conditions: SELECT LAST_NAME FROM EMPLOYEE WHERE ((EMP_NO >= 20)
AND
(EMP_NO <= 30)) AND (NOT (JOB_CODE = 'VP')) The use of parentheses when combining multiple conditions is strongly recommended. There is a set order of operations, but using parentheses will save confusion. Note that strings and dates are always specified with single quotes, since double quotes are reserved for a special SQL feature called delimited identifiers. We'll discuss delimited identifiers later. Much later, if you're lucky… Conditions: IN, BETWEEN, LIKE We can also do wildcard matching using LIKE. LIKE works similarly to =, except that it can accept wildcard characters. The character "%" matches a string of any length. The character "_" matches a single character. Here is an example of how to use LIKE: SELECT EMP_NO, LAST_NAME FROM EMPLOYEE WHERE (LAST_NAME LIKE 'S%') OR (LAST_NAME LIKE '___') This query will return one row in its result set for any employee whose name starts with "S" or whose name is three characters in length. (Hey, I didn't say it was a useful query.) Note that, depending upon the collation used, LIKE may be case-sensitive, and 'S' may not equal 's'. Be careful when using LIKE since the database server won't always be able to optimize your query very well. This can be dangerous when selecting from a table with a lot of records in it. We'll cover basic query optimization later on. BETWEEN is useful for selecting ranges of values. It is inclusive of the specified values: SELECT LAST_NAME FROM EMPLOYEE WHERE HIRE_DATE BETWEEN '1/1/2001' AND '12/31/2001' IN allows you to select records matching a specified set of values: SELECT LAST_NAME FROM EMPLOYEE WHERE EMP_NO IN (20, 24, 28, 44) IN can also be used with sub-selects as you will see. Dealing with NULL Please read the next paragraph very carefully: NULL is not a value. It is a state representing no value. A field in a table can have many different values. If it's an integer field, it can have about four billion values. But there are only two states: NULL and NOT NULL. NULL means that either no value has ever been specified for that field in that particular row, or the NULL state has been explicitly assigned to it. NULL is not the same thing as zero, or an empty string. The SQL rules for dealing with NULL in comparisons and operations (addition, subtraction, and the like) are very simple: the result of any operator and NULL is always NULL. So NULL + 100 = NULL, a string concatenated with NULL is also NULL In a WHERE clause, any comparison with NULL will always evaluate to FALSE. So the following query: SELECT EMP_NO, LAST_NAME FROM EMPLOYEE WHERE (HIRE_DATE = NULL) /* Don't do this! */ …will never return any records, since nothing can ever equal NULL, not even NULL itself! However, we sometimes want to return records where a given column is in the NULL state, so there is a special operator for this: IS SELECT EMP_NO, LAST_NAME FROM EMPLOYEE WHERE (HIRE_DATE IS NULL) Likewise, we might want to return the list of records where the HIRE_DATE column is not in the NULL state: SELECT EMP_NO, LAST_NAME FROM EMPLOYEE WHERE (HIRE_DATE IS NOT NULL) Dealing with Dates There is often confusion about the correct format to use when expressing a date as a string literal. InterBase will accept a few different formats, but you should select one and use it consistently to avoid confusion when coding. The most common format is 'mm/dd/yyyy'. Other accepted formats include 'mm-dd-yyyy', 'dd.mm.yyyy' and 'yyyy.mm.dd'. Dates must always be in single quotes. InterBase also accepts a few special quoted strings as the dates they imply, these are 'TODAY', 'YESTERDAY', 'TOMORROW'. Starting with version 6.0, the more standard keyword CURRENT_DATE should be used, you can specify tomorrow with CURRENT_DATE + 1. Within Delphi/BCB, one should always use the provided AsDateTime methods available in database field objects in order to avoid depending on the field objects to convert from string values. Converting strings to dates depends on the settings of individual workstations (e.g. Windows settings). Ordering result sets SQL result sets are always unordered unless you specify otherwise. Put another way, you cannot expect the rows in the result set to come back in the same order even if you run the exact same query twice unless you specify that you'd like the result set ordered. The records might come back in storage order (the order on the disk), they might come back in random order — you just don't know. So how do you tell the server that you'd like the records ordered? Simply specify which columns should be used to sort the records: SELECT EMP_NO, LAST_NAME FROM EMPLOYEE WHERE (HIRE_DATE IS NOT NULL) ORDER BY LAST_NAME This query will return the records in alphabetical order, based upon the employee's name. They'll be in ascending order (that is, A to Z), because that is the default and we did not specify descending order. Note that two employees might have the same name. Since we didn't specify how to order two records with the same value in the LAST_NAME column, we will receive them in the result set in a random order — if there are two employees named "Smith" we simply cannot expect them to sort in any particular order unless we specify more fields in the ORDER BY clause: SELECT EMP_NO, LAST_NAME FROM EMPLOYEE WHERE (HIRE_DATE IS NOT NULL) ORDER BY LAST_NAME, HIRE_DATE DESC In this query, we're telling the server to first order the records by the LAST_NAME column (in ascending order, since there is no DESC next to name), and then order any group of records with the same name in descending order by the value in the HIRE_DATE column. HIRE_DATE is not one of the values we've selected in the result set, but that's OK; there's no requirement that a column be mentioned in the fields list in order to be used in the ORDER BY clause. As of version 6.5, InterBase will always put records with a NULL state at the top of a list of sorted records, whether the list is ascending or descending. Prior versions sorted NULL to the bottom. Parameterized queries It is common in client applications to run very similar queries over and over again. A good example is a master-detail form. As we scroll through the result set representing the master table, the query for the detail result set will be run many times, once for each record. Because this is so common, there is a special optimization for it supported by nearly every database server on the market, called a parameterized query. When you type some SQL and send it to the server, there is a lot which has to happen in order to produce your result set. The server must parse your SQL into an internal structure it can understand. The server's optimizer must examine the indices on the table and determine if any of them can be used to run the query more quickly. This is called preparing a query. On the client application, the component sets you're using to build your application typically do some actions behind the scenes, such as interrogating the system table for field lengths and the like. All of this takes time, and if you know ahead of time that you'll be running the same query over and over again, it's redundant after the first time. When you use a parameterized query, you are in effect telling the server that you'll be running more or less the same query several times, and changing only the values used for comparison in the WHERE clause. For example, let's say we want a list of projects for a given employee. We'll use this query on a master detail form which allows the user to scroll through a list of employees and see the projects each employee is working on.The "master" query will be one of the queries above which returns EMP_NO as part of the result set. Now we need to write a detail query which shows projects for a given master record: SELECT PROJ_ID FROM EMPLOYEE_PROJECT WHERE EMP_NO = :EMP_NO The colon in this query means that the second reference to EMP_NO is not a field but is a parameter. The parameter, in this case, has the same name as the field EMP_NO, but that's not a requirement. We can now tell the server to prepare this query and then run it multiple times changing only the value of the parameter. It would be like running these two, separate queries: SELECT PROJ_ID FROM EMPLOYEE_PROJECT WHERE EMP_NO = 20 SELECT PROJ_ID FROM EMPLOYEE_PROJECT WHERE EMP_NO = 21 …without the overhead of preparing two separate SQL statements. You can specify any number of parameters in a query. Each time the query is executed, you can supply different values. Parameters can also have a NULL state, but remember that if we set the EMP_NO parameter to NULL then this query would return no records since EMP_NO = NULL is false! INSERT In some ways, the discussion of SELECT was premature, since we can't SELECT result sets from the database until there is data to SELECT! The INSERT verb is used to create a new row in a single table. To create a new row in the EMPLOYEE table, we could use the following statement: INSERT INTO EMPLOYEE ( EMP_NO, LAST_NAME ) VALUES ( 123, 'Smith'); This statement, when run will result in the creation of a new record in the table with the value 123 in the EMP_NO column and the value "Smith" in the LAST_NAME column. These two columns will have the non-NULL state, but all other columns in the table will have the NULL state, unless the database administrator has created a DEFAULT value for the column. There is no requirement to specify every column in the table in an INSERT statement, but a column may have a NOT NULL constraint. If there is a NOT NULL constraint on the column and there is not a DEFAULT value for the column then you must specify a value for the column in your INSERT statement or the server will be unable to create the record and you'll get an error. If you will be specifying a value for every column in the new row then it's permissible to omit the column list and just give the values, e.g. INSERT INTO EMPLOYEE VALUES ( 123, 'John', 'Smith', '1234', '05/24/2002', 600, 'VP', '2', 'USA', 100000); But it's crucial to specify the column values in the order they are defined. It is generally advisable to always specify the column list as it helps avoid problems. INSERT statements do not return a result set. Only SELECT statements return a result set. Copying data with INSERT … SELECT It is sometimes necessary to INSERT many records into a table at once. For example, we might want to create a new project for every employee who was hired today. Instead of specifying the VALUES for the new row as constants, like in the example above, we can embed a SELECT statement into the INSERT statement, and one row will be created for each row in the result set of the SELECT statement: INSERT INTO EMPLOYEE_PROJECT ( EMPLOYEE_PROJECT.PROJID, EMPLOYEE_PROJECT.EMP_NO) SELECT 'BORCON', EMPLOYEE.EMP_NO FROM EMPLOYEE WHERE EMPLOYEE.HIRE_DATE = CURRENT_DATE This is a somewhat complex statement, so let's look at it carefully. The first line states that we'll be inserting rows into the EMPLOYEE_PROJECT table. The second line specifies which fields we'll be specifying values for (later in the statement). It's not necessary to specify the table name in the fields list as I have done here, but I wanted to clearly distinguish the columns in the table we're inserting into from the columns coming from the table we're Selecting from, since some of them have the same names. The rest of the statement tells InterBase that the values will be coming from a SELECT statement rather than a list of constants. For each row in the result set, one row will be created in EMPLOYEE_PROJECT. CURRENT_DATE is a reserved word (in InterBase 6 and later) which always returns the current date from the server's clock, so the SELECT portion of the statement will return one row for each employee who was hired today. There is only one item in the fields list of the SELECT portion of the statement which is actually a field in the EMPLOYEE table (EMP_NO). The other item is a constant. If we ran the SELECT by itself, the result set might look like this: COLUMN_1 EMP_NO 1 21 1 22 So these are the values which will be assigned to the two rows which will be inserted into the EMPLOYEE_PROJECT table (presuming that there were two employees hired today). Note on the Select clause of the Insert…Select statement:This is an Insert statement, not a Select statement. Only single table or view selects are allowed in the Select clause, Interbase will not currently allow joins. UPDATE The UPDATE verb is used to change the value of existing rows. Use of a WHERE clause is essential with UPDATE, because, by default, UPDATE changes every row in the table! If we were going to pay everyone in the company the exact same salary, for example, we might run the following statement: UPDATE EMPLOYEE SET SALARY = 5000 Running a statement like this is much more efficient than iterating through the table and updating each record, one at a time, but for the most part we generally do want to change values in only one record or a number if specific records matching some criteria, so most UPDATE statements have WHERE clauses: UPDATE EMPLOYEE SET SALARY = 5000 WHERE EMP_NO = 20 Bad news for employee 20, but the rest of us can breathe easy. It would probably not be a good idea to write an UPDATE statement like this: UPDATE EMPLOYEE SET SALARY = 5000 WHERE FIRST_NAME = 'Bob' …because if there were more than one Bob in the company then we'd be giving a lousy salary to all of them, not just the lazy one. In general, use the primary key or another unique key of the table when writing a WHERE clause designed to affect only one record. It's common to write UPDATE statements which specify the value of every column in the table in the WHERE clause. Since the primary key will always uniquely identify a given row, why would anyone do this? The answer is that this is way of protecting against multiuser conflicts (transactions are another way, but they do it differently). Let's imagine that a user, Emma, creates a new record. Later, another user, Peter, updates the record. Emma returns to her computer, but she hasn't refreshed her display so she doesn't see Peter's changes to the record. Emma decides to change the record. If only the primary key was specified in the WHERE clause of the UPDATE statement, then Emma would write over Peter's changes without realizing it. By specifying all field values in the WHERE clause, and not just the primary key value, the UPDATE statement will have no effect (i.e, it will not change any rows) if the record has been changed since Emma last looked at it. The client application can detect that the statement had no effect and let Emma know that the row has changed (or been deleted) since she last SELECTed it. DELETE DELETE looks just like UPDATE except, for obvious reasons, there is no list of fields: DELETE FROM EMPLOYEE WHERE EMP_NO = 20 Like UPDATE, you can omit the WHERE clause if you want to affect (in this case, clear) the entire table: DELETE FROM EMPLOYEE Moving beyond simple SELECTs Subqueries A subquery is a SELECT inside of another SQL statement and is used to select values from another table to be used for qualifying or setting values in the outer SQL statement. Subqueries can also be used in other statements too but are most commonly used in Select statements: SELECT EMPLOYEE.EMP_NO, EMPLOYEE.LAST_NAME FROM EMPLOYEE WHERE EMPLOYEE.EMP_NO IN (SELECT PROJECT.EMP_NO FROM PROJECT) This query will return the list of employees who have at least one project. Again, it's not strictly necessary to use the table names in the fields list (e.g., EMPLOYEE.EMP_NO), but it helps to clarify where the fields are coming from since EMPLOYEE and EMPLOYEE_PROJECT both have a field called EMP_NO. This query shows an example of using the IN operator with a subquery. We can't use the = comparison here, because the subquery returns many rows, and we can't use an equality comparison to compare a single value with a large number of rows. If we had tried to use = instead of IN, we'd receive the charming error message, "Multiple rows in singleton select." The = comparison could only be used if we were absolutely certain the subquery would return only one row. Subqueries can also be used in UPDATE statements. Let's imagine you want to set the project start date on every row in the EMPLOYEE_PROJECT table equal to the HIRE_DATE for the corresponding employee in the EMPLOYEE table. The following statement would do the trick: UPDATE EMPLOYEE_PROJECT SET EMPLOYEE_PROJECT.START_DATE = (SELECT EMPLOYEE.HIRE_DATE FROM EMPLOYEE WHERE EMPLOYEE.EMP_NO = EMPLOYEE_PROJECT.EMP_NO ) (This query won't actually work since the EMPLOYEE_PROJECT doesn't have a field called START_DATE.) This kind of statement works fairly well for tables which are not especially large. For very large tables, use a stored procedure to do the update (see the Optimization section for details). Attempting this on a large table, especially if more than one subquery must be used to set field values, will perform very poorly because the subqueries must be executed once for each record to be updated. The following statement will delete any employees with no projects: DELETE FROM EMPLOYEE WHERE EMPLOYEE.EMP_NO NOT IN (SELECT EMPLOYEE_PROJECT.EMP_NO FROM EMPLOYEE_PROJECT ) JOINs Everything we have seen thus far is useful, but JOINs are the real power of a relational database server. JOINs allow the developer to combine data from multiple tables in to a single result set. It is common when designing databases to use a lot of separate tables for various attributes of a single entity. For example, you might store records of an employee's salary, residence, etc., in separate tables rather than expanding the EMPLOYEE table. There a lot of advantages to doing this (the employee might have more than one residence, for example), but without being able to combine the information it would be very difficult to create a mailing list. A JOIN is simply a way of telling the database server how to match rows from two different tables. You can then return fields from both tables in your result set. For example, let's imagine we want to return a result set with employee names and the names of the projects they're working on. We need to tell the database server to match EMPLOYEE_PROJECT records with records in the EMPLOYEE table. In particular, given an EMPLOYEE_PROJECT record, we need to find a record in the EMPLOYEE table with the same EMP_NO value. The following query JOINs the two tables together to produce the desired result: SELECT EMPLOYEE.LAST_NAME, EMPLOYEE_PROJECT.PROJ_ID FROM EMPLOYEE INNER JOIN EMPLOYEE_PROJECT ON EMPLOYEE_PROJECT.EMP_NO = EMPLOYEE.EMP_NO ORDER BY EMPLOYEE.LAST_NAME, EMPLOYEE_PROJ_ID This query uses a particular type of JOIN called an INNER JOIN. INNER JOIN means that one or more records with a given EMP_NO must be present in both the EMPLOYEE_PROJECT and EMPLOYEE tables in order to produce a row in the result set. In other words, if there was a row in the EMPLOYEE table for employee #10 but there was no row in the EMPLOYEE_PROJECT table for employee #10 then there would be no row in the result set for employee #10. If no JOIN type is specified in the query (if we write JOIN EMPLOYEE instead of INNER JOIN EMPLOYEE) then an INNER JOIN will be used. INNER JOIN is the default join type. The second most common type of JOIN is a LEFT OUTER JOIN. This type of JOIN returns at least one row for each row in the "left" table, whether or not there is a corresponding row in the "right" table. The terms "left" and "right" often seem confusing at first, but that's what the SQL Standard calls them. The terms are based on the fact that we (at least most of the western world) read from left to right, therefore just remember that left refers to the table mentioned first and right refers to the second table being joined to the first. In the example above, the EMPLOYEE table is the left table, and the EMPLOYEE_PROJECT table is the right table. As previously mentioned, the query above won't return any rows in its result set for employees without projects. What if we wanted to see these employees anyway? Use a LEFT OUTER JOIN instead of an INNER JOIN: SELECT EMPLOYEE.LAST_NAME, EMPLOYEE_PROJECT.PROJ_ID FROM EMPLOYEE LEFT OUTER JOIN EMPLOYEE_PROJECT ON EMPLOYEE_PROJECT.EMP_NO = EMPLOYEE.EMP_NO ORDER BY EMPLOYEE.LAST_NAME, EMPLOYEE_PROJECT.PROJ_ID In this result set, the column EMPLOYEE_PROJECT.PROJ_ID will have a NULL state on rows for employees who don't have any projects. This might seem confusing since there are no records in the EMPLOYEE_PROJECT table with the NULL state in the PROJ_ID field, but it simply indicates that there is no matching record. In other words, if a column is NULL in the result set of a query resulting from a JOIN, one of two things is going on: 1. There was a matching row in the appropriate table, and the state of the corresponding column for that row was NULL, or 2. There was no matching row in the appropriate table. In addition to INNER and LEFT OUTER joins, there are also RIGHT OUTER and FULL OUTER joins (OUTER is optional and can be omitted). A RIGHT join, as it implies, is opposite of a LEFT join - all records are selected from the table on the right, and only matching records are selected from the table on the left. A FULL join is very rarely ever desired, it will select all records from both tables whether or not there is a match in the other. Using aliases for field and table names As we get used to the idea of having multiple tables in a SELECT statement, specifying the table names can get to be a lot of typing. SQL allows us to specify an alias for table names: SELECT E.LAST_NAME, EP.PROJ_ID FROM EMPLOYEE AS E LEFT OUTER JOIN EMPLOYEE_PROJECT AS EP ON EP.EMP_NO = E.EMP_NO ORDER BY E.LAST_NAME, EP.NAME The the alias for a table is defined by specifying it following the table name in the FROM clause. SQL allows you to use the alias before it's defined. Fields can also be given aliases. As with table aliases, they are defined directly following the column name in the field list: SELECT E.LAST_NAME AS SURNAME, EP.PROJ_ID AS PROJECT FROM EMPLOYEE E LEFT OUTER JOIN EMPLOYEE_PROJECT EP ON EP.EMP_NO = E.EMP_NO ORDER BY E.LAST_NAME, EP.NAME The "AS" operator is optional and can be omitted in Interbase. Field aliases can only be used for specifying the name to be returned in the result set, a field alias name cannot be used to reference the field elsewhere in the query (such as in an Order By clause). Aggregate Functions The MAX, MIN, COUNT, SUM, and AVG (average) functions do exactly what their names imply. They are called "aggregate functions" because they work on the entire result set at once instead of a single row. COUNT deserves special mention because it works in two different ways. COUNT(*) will count the number of rows in the table, after the WHERE clause is applied, JOINs are performed, etc. COUNT(FIELD_NAME) counts the number of non-NULL values in a particular column. COUNT(DISTINCT FIELD_NAME), of course, counts the number of distinct values there are in a particular column. For example, if you had a table with four rows in it, containing the following data: ID SOME_VALUE 1 2 1 3 2 4 2 Then the following query: SELECT COUNT(*), COUNT(SOME_VALUE), COUNT(DISTINCT SOME_VALUE), MAX(ID) Would return this result set: COUNT COUNT_2 COUNT_3 MAX 4 3 2 4 Grouping Since aggregate functions operate on all rows of the table at once and produce a single row as a result set, they can't be combined with "normal" selection of fields without grouping the results somehow. In other words, you can't SELECT employee names and the count of employees in the same statement, because the employee names selection produces many rows and the count of employees produces only a single row. You might, however, want to know how many employees have the same name. The GROUP BY clause allows you to tell the aggregate function that it should run once for each distinct value of a certain field or group of fields in the table. SELECT LAST_NAME, COUNT(LAST_NAME) FROM EMPLOYEE GROUP BY LAST_NAME ORDER BY LAST_NAME The GROUP BY clause must contain all columns mentioned in the fields list except for the aggregat functions. So in this example, LAST_NAME must be in the Group By clause. This query tells us what we want — for each name, how many people share the name. But the value for most names will be "1" and these ones would likely be uninteresting to the user of this query. So we need a way to eliminate the rows with unique names. HAVING The Having clause qualifies the Group By clause. After the Group By has done its work, any conditions specified in the Having clause are applied. So we can improve our query as follows: SELECT LAST_NAME, COUNT(LAST_NAME) FROM EMPLOYEE GROUP BY LAST_NAME HAVING COUNT(LAST_NAME) > 1
ORDER BY
LAST_NAME
Now we'll only see Last Names where at least two people share that name. This is accomplished by repeating the Count aggregate funtion on Last_Name.
DISTINCT
Distinct allows you to select unique values, or combinations of values, when there are duplicates. For example, you may want to see all the different cities your customers are from. If you execute the following:
SELECT
CITY
FROM CUSTOMER
ORDER BY
CITY
You will indeed see all the different cities, but each city will be repeated as many times as there are customers from that city. What is really wanted is each unique city:
SELECT DISTINCT
CITY
FROM CUSTOMER
ORDER BY
CITY
Cities are not necessarily unique across different provinces or states, so if your customers are international then the above won't be good enough. Distinct works on multiple columns:
SELECT DISTINCT
CITY, STATE_PROVINCE, COUNTRY
FROM CUSTOMER
ORDER BY
CITY, STATE_PROVINCE, COUNTRY
Note the Order By is not required to include the same columns (or be present at all) but it is very common to do so.
CAST, EXTRACT, and other functions
Cast allows you to coerce one datatype into another. The conversion being asked for must make sense or Interbase will raise an exception. All the valid conversions can be found in the Interbase documentation. A simple example will give the idea:
SELECT
CAST(QTY_ORDERED AS FLOAT) / 0.35
FROM SALES
Cast is not needed very often as Interbase will generally perform an implicit conversion for you. The above statement would give the same results without the cast.
Extract is available starting with Version 6.0. Extract allows parts if date and time fields to be extracted, for example:
SELECT
LAST_NAME, FIRST_NAME, EXTRACT(YEAR FROM HIRE_DATE)
FROM EMPLOYEE
The parts that can be extracted are: YEAR, MONTH, DAY, WEEKDAY, HOUR, MINUTE, and SECOND.
ALL, ANY, SOME, EXISTS, SINGULAR
These operators are used with sub-queries. Earlier you learned about the IN operator which lets you test whether a selected value exists within a set of specified or sub-selected values - it tests for equality.
ALL allows you to perform comparisons other than equality. For example, if you want to find those employees with a salary greater than the salary of all employees in your department, you might try:
SELECT
LAST_NAME, SALARY
FROM EMPLOYEE
WHERE
SALARY >ALL
(SELECT
SALARY
FROM EMPLOYEE
WHERE
DEPT_NO = 623
)
ANY and SOME are synonyms, use whichever you like. To find employees in your department with salaries less than the lowest in any other department, you could find them with:
SELECT
LAST_NAME, SALARY
FROM EMPLOYEE
WHERE
SALARY < ANY (SELECT SALARY FROM EMPLOYEE WHERE DEPT_NO <> 623
)
EXISTS is very valuable. Use it anywhere you need to know something exists but do not need to know the count.
If you want to see a list of Projects where no employees are currently assigned you could do this:
SELECT
PROJ_ID, PROJ_NAME
FROM PROJECT P
WHERE
NOT EXISTS
(SELECT
EMP_NO
FROM EMPLOYEE_PROJECT EP
WHERE
EP.PROJ_ID = P.PROJ_ID
)
SINGULAR returns true if exactly one row from the sub-select matches your test. The following will return projects where there is exactly one employee assigned.
SELECT
PROJ_ID, PROJ_NAME
FROM PROJECT P
WHERE
SINGULAR
(SELECT
EMP_NO
FROM EMPLOYEE_PROJECT EP
WHERE
EP.PROJ_ID = P.PROJ_ID
)
InterBase-specific SQL
STARTING WITH
Use this instead of LIKE wherever possible. This is most important for parameterized queries as a parameterized LIKE cannot use an index while STARTING WITH can.
CONTAINING
CONTAINING is similar to using LIKE with wildcards on both sides, e.g. LIKE '%BASE%'. However CONTAINING is case-insensitive and can also be used to search within text blob columns.
ROWS
Rows is a new feature starting with version 6.5. Rows is very flexible and provides a great deal of power. At its simplest, it less you select the n rows from a query.
SELECT
LAST_NAME,
FIRST_NAME, SALARY
FROM EMPLOYEE
ORDER BY
SALARY DESC, LAST_NAME,FIRST_NAME
ROWS 1 TO 10
This query returns only the first 10 records of the result set (You can also specify just ROWS 10). Strictly speaking, Rows can be used without the Order By, but this woud be of little value. In this example, the 10 employees with the highest salaries will be returned. It's possible you may want all the employees with the 10 highest salaries (more than one employee may have the same salary), in this case you can use the WITH TIES option:
SELECT
LAST_NAME,
FIRST_NAME, SALARY
FROM EMPLOYEE
ORDER BY
SALARY DESC, LAST_NAME,FIRST_NAME
ROWS 1 TO 10 WITH TIES
So if there were 17 employees sharing the top 10 salaries, all 17 would be returned.
Rows can also be specified as a percent:
SELECT
LAST_NAME,
FIRST_NAME, SALARY
FROM EMPLOYEE
ORDER BY
SALARY DESC, LAST_NAME,FIRST_NAME
ROWS 10 PERCENT
Finally, Rows can be used for sampling. The following example returns every 20th record from the result set up to a maximum of 100 records:
SELECT
LAST_NAME,
FIRST_NAME, SALARY
FROM EMPLOYEE
ORDER BY
SALARY DESC, LAST_NAME,FIRST_NAME
ROWS 100 BY 20
Probably one of the most sought after uses of this feature is to allow for very efficient paging of result sets as commonly seen on many web sites; some number of records, typically 10, are displayed with Next and Previous buttons. Rows allows you to return any slice of a result set
The Rows feature can also be used in Update and Delete statements. Normally, the Order By clause is only valid in a Select statement. However, it becomes valid for other statements as long as it is used along with the Rows feature (otherwise using rows in these statements would be of little value). For example, the following gives a raise to the top 10 salaried employees hired before 2002:
UPDATE EMPLOYEE
SET
SALARY = SALARY * 1.10
WHERE
EXTRACT(YEAR FROM HIRE_DATE) < 2002 ORDER BY SALARY DESC ROWS 10 The following deletes the 10 employees with the newest hire date: DELETE FROM EMPLOYEE ORDER BY HIRE_DATE DESC ROWS 10 RDB$DATABASE This is a system table that is guaranteed to only have a single record in it. This makes a few special queries possible such as: Asking Interbase to return the current date and time from the server SELECT CURRENT_TIMESTAMP FROM RDB$DATABASE /* 6.0+ */ SELECT CAST('TODAY' AS DATE) FROM RDB$DATABASE /* 5.6- */ Asking Interbase to return the currently logged-in user SELECT USER FROM RDB$DATABASE There are other situations where such a table is needed, for example to simply test for the existence of a specific record - the EXISTS function can only be used with a sub-query: SELECT 1 FROM RDB$DATABASE WHERE EXISTS ( SELECT * FROM EMPLOYEE_PROJECT WHERE EP.PROJ_ID = 123 ) Generators Generators are a way to create unique values in a multi-user database. In a single-user database it is acceptable to do things such as manually incrementing a value from a "key" table, or adding one to the highest current value in a table, but these schemes will not do in a multi-user environment. Most RDBMS products offer something similar, MS SQL has Identity fields and Oracle has Sequences. The standard way to use a generator from in a query is like this: SELECT GEN_ID(GEN_CUSTOMER, 1) FROM RDB$DATABASE Generators can also be used in insert and update statements. It is also common to see them as part of insert triggers. Taken from : http://blogs.teamb.com/craigstuntz/dtpostname/ibsqlintro/
0 comments:
Post a Comment