Always delete by primary key
TL;DR: Deleting records by primary key mitigates the risk of write-lock on the table and full-table scans.
When working on a task to delete multiple records from a populated table based on a value of a column that is not a primary key, it is tempting to write batch flow like this (in pseudo-code):
$rows_deleted = null;
while ( $rows_deleted === null || $rows_deleted > 0 ) {
$rows_deleted = $db->query( 'DELETE FROM big_table WHERE foo="bar" LIMIT 100' );
sleep( 10 );
}
However, there is a hidden problem with this code. Once it deletes all the records it will issue one last DELETE
query which will match zero records. Such call will result in full table scan during which table will be blocked for writing. The writing block happens because database engine doesn’t know yet which records it will have to delete, thus, it can’t let new ones in until it completes the query.
This locking situation is best avoided by using primary keys in the condition for DELETE
query. With primary key there is no need to lock a table, as a record is clearly identified. Consider code like this:
$record_keys = null;
while ( $record_keys === null || $record_keys !== [] ) {
$record_keys = $db->get_column( 'SELECT primary_key FROM big_table WHERE foo="bar" LIMIT 100', 0 );
if ( $record_keys === [] ) {
break;
}
foreach ( $record_keys as $primary_key ) {
$db->query( "DELETE FROM big_table WHERE primary_key=${primary_key}" );
}
// Sleep condition is barely necessary now, but a good precaution.
sleep( 1 );
}