Two Advanced SQL Techniques That Can Drastically Improve Your Queries | by Jason Chong | Jun, 2023


If you are not interested to learn how I prepared the dataset, feel free to skip past this section and paste the following code onto DB fiddle to generate the schema.

CREATE TABLE taxi (
id varchar,
vendor_id integer,
pickup_datetime datetime,
dropoff_datetime datetime,
trip_seconds integer,
distance float
);

INSERT INTO taxi
VALUES
('id2875421', 2, '2016-03-14 17:24:55', '2016-03-14 17:32:30', 455, 0.93),
('id2377394', 1, '2016-06-12 00:43:35', '2016-06-12 00:54:38', 663, 1.12),
('id3858529', 2, '2016-01-19 11:35:24', '2016-01-19 12:10:48', 2124, 3.97),
('id3504673', 2, '2016-04-06 19:32:31', '2016-04-06 19:39:40', 429, 0.92),
('id2181028', 2, '2016-03-26 13:30:55', '2016-03-26 13:38:10', 435, 0.74),
('id0801584', 2, '2016-01-30 22:01:40', '2016-01-30 22:09:03', 443, 0.68),
('id1813257', 1, '2016-06-17 22:34:59', '2016-06-17 22:40:40', 341, 0.82),
('id1324603', 2, '2016-05-21 07:54:58', '2016-05-21 08:20:49', 1551, 3.55),
('id1301050', 1, '2016-05-27 23:12:23', '2016-05-27 23:16:38', 255, 0.82),
('id0012891', 2, '2016-03-10 21:45:01', '2016-03-10 22:05:26', 1225, 3.19),
('id1436371', 2, '2016-05-10 22:08:41', '2016-05-10 22:29:55', 1274, 2.37),
('id1299289', 2, '2016-05-15 11:16:11', '2016-05-15 11:34:59', 1128, 2.35),
('id1187965', 2, '2016-02-19 09:52:46', '2016-02-19 10:11:20', 1114, 1.16),
('id0799785', 2, '2016-06-01 20:58:29', '2016-06-01 21:02:49', 260, 0.62),
('id2900608', 2, '2016-05-27 00:43:36', '2016-05-27 01:07:10', 1414, 3.97),
('id3319787', 1, '2016-05-16 15:29:02', '2016-05-16 15:32:33', 211, 0.41),
('id3379579', 2, '2016-04-11 17:29:50', '2016-04-11 18:08:26', 2316, 2.13),
('id1154431', 1, '2016-04-14 08:48:26', '2016-04-14 09:00:37', 731, 1.58),
('id3552682', 1, '2016-06-27 09:55:13', '2016-06-27 10:17:10', 1317, 2.86),
('id3390316', 2, '2016-06-05 13:47:23', '2016-06-05 13:51:34', 251, 0.81),
('id2070428', 1, '2016-02-28 02:23:02', '2016-02-28 02:31:08', 486, 1.56),
('id0809232', 2, '2016-04-01 12:12:25', '2016-04-01 12:23:17', 652, 1.07),
('id2352683', 1, '2016-04-09 03:34:27', '2016-04-09 03:41:30', 423, 1.29),
('id1603037', 1, '2016-06-25 10:36:26', '2016-06-25 10:55:49', 1163, 3.03),
('id3321406', 2, '2016-06-03 08:15:05', '2016-06-03 08:56:30', 2485, 12.82),
('id0129640', 2, '2016-02-14 13:27:56', '2016-02-14 13:49:19', 1283, 2.84),
('id3587298', 1, '2016-02-27 21:56:01', '2016-02-27 22:14:51', 1130, 3.77),
('id2104175', 1, '2016-06-20 23:07:16', '2016-06-20 23:18:50', 694, 2.33),
('id3973319', 2, '2016-06-13 21:57:27', '2016-06-13 22:12:19', 892, 1.57),
('id1410897', 1, '2016-03-23 14:10:39', '2016-03-23 14:49:30', 2331, 6.18);

After running SELECT * from taxi, you should get a resulting table that looks like this.

Image by Author.

For the keen beans who are wondering how this table actually came about, I filtered the data to the first 30 rows and only kept the columns that you see above. As for the distance field, I computed the orthodromic distance between the pick-up and drop-off coordinates (latitude and longitude).

The orthodromic distance is the shortest distance between two points on a sphere, so this actually turns out to be an underestimate of the real distance travelled by the taxi. However, for the purpose of what we are doing today, we can ignore this for now.

The formula to calculate the orthodromic distance can be found here. Now, back to SQL.



Source link

Leave a Comment