Question
After reading the article https://www.sqlite.org/rtree.html about the R*Tree module in SQLite, I am experimenting with a two-dimensional R-Tree in a Core Data model. In particular, I expected (perhaps naively) to see some kind of select statement on the index table, but none appears in the SQLite debug trace when I execute a fetch on the Region entity using the indexed attributes (see predicateBoundaryIdx in the code below).
My question is: what must a Core Data model (entities, attributes) and the NSPredicate look like in order to benefit from the R-Tree index?
[Xcode v11.4, iOS v13.1, Swift. Switched on com.apple.CoreData.SQLDebug 4]
Model
Index
Corresponding database schema
CREATE TABLE ZPERSON ( Z_PK INTEGER PRIMARY KEY, Z_ENT INTEGER, Z_OPT INTEGER, ZLOCATION INTEGER, Z1CONTACTS INTEGER, ZNAME VARCHAR );
CREATE TABLE ZREGION ( Z_PK INTEGER PRIMARY KEY, Z_ENT INTEGER, Z_OPT INTEGER, ZMAXLATITUDE FLOAT, ZMAXLATITUDEIDX FLOAT, ZMAXLONGITUDE FLOAT, ZMAXLONGITUDEIDX FLOAT, ZMINLATITUDE FLOAT, ZMINLATITUDEIDX FLOAT, ZMINLONGITUDE FLOAT, ZMINLONGITUDEIDX FLOAT, ZNAME VARCHAR );
CREATE INDEX ZPERSON_ZLOCATION_INDEX ON ZPERSON (ZLOCATION);
CREATE INDEX ZPERSON_Z1CONTACTS_INDEX ON ZPERSON (Z1CONTACTS);
CREATE VIRTUAL TABLE Z_Region_RegionIndex USING RTREE (Z_PK INTEGER PRIMARY KEY, ZMINLATITUDEIDX_MIN, ZMINLATITUDEIDX_MAX, ZMAXLATITUDEIDX_MIN, ZMAXLATITUDEIDX_MAX, ZMINLONGITUDEIDX_MIN, ZMINLONGITUDEIDX_MAX, ZMAXLONGITUDEIDX_MIN, ZMAXLONGITUDEIDX_MAX)
/* Z_Region_RegionIndex(Z_PK,ZMINLATITUDEIDX_MIN,ZMINLATITUDEIDX_MAX,ZMAXLATITUDEIDX_MIN,ZMAXLATITUDEIDX_MAX,ZMINLONGITUDEIDX_MIN,ZMINLONGITUDEIDX_MAX,ZMAXLONGITUDEIDX_MIN,ZMAXLONGITUDEIDX_MAX) */;
CREATE TABLE IF NOT EXISTS "Z_Region_RegionIndex_rowid"(rowid INTEGER PRIMARY KEY,nodeno);
CREATE TABLE IF NOT EXISTS "Z_Region_RegionIndex_node"(nodeno INTEGER PRIMARY KEY,data);
CREATE TABLE IF NOT EXISTS "Z_Region_RegionIndex_parent"(nodeno INTEGER PRIMARY KEY,parentnode);
Code for testing
func application(_ application: UIApplication, didFinishLaunchingWithOptions launchOptions: [UIApplication.LaunchOptionsKey: Any]?) -> Bool {
let mainContext: NSManagedObjectContext
mainContext = persistentContainer.viewContext
mainContext.mergePolicy = NSMergeByPropertyObjectTrumpMergePolicy
mainContext.undoManager = nil
mainContext.shouldDeleteInaccessibleFaults = true
mainContext.automaticallyMergesChangesFromParent = true
var personObj: Person
var locationObj: Region
let n = 1000000
let personNr = stride(from: 1, through: n+1, by: 1).map(String.init).shuffled()
for i in 1...n
{
personObj = Person(context: mainContext)
locationObj = Region(context: mainContext)
locationObj.name = "Region \(i)"
locationObj.minlatitude = 40.000000 - Float.random(in: 0 ..< 5)
locationObj.minlongitude = 9.000000 - Float.random(in: 0 ..< 5)
locationObj.maxlatitude = 40.000000 + Float.random(in: 0 ..< 5)
locationObj.maxlongitude = 9.000000 + Float.random(in: 0 ..< 5)
locationObj.minlatitudeidx = locationObj.minlatitude
locationObj.minlongitudeidx = locationObj.minlongitude
locationObj.maxlatitudeidx = locationObj.maxlatitude
locationObj.maxlongitudeidx = locationObj.maxlongitude
personObj.name = "Person \(personNr[i])"
personObj.location = locationObj
if i % 1000 == 0 {
saveContext()
}
}
saveContext()
let request: NSFetchRequest<Region> = Region.fetchRequest()
let requestIdx: NSFetchRequest<Region> = Region.fetchRequest()
let eps : Float = 1.0
let predicateBoundaryIdx = NSPredicate(format: "(minlatitudeidx >= %lf and maxlatitudeidx =< %lf) and (minlongitudeidx >= %lf and maxlongitudeidx =< %lf)",40.000000-eps,40.000000+eps,9.000000-eps,9.000000+eps)
let predicateBoundary = NSPredicate(format: "(minlatitude >= %lf and maxlatitude =< %lf) and (minlongitude >= %lf and maxlongitude =< %lf)",40.000000-eps,40.000000+eps,9.000000-eps,9.000000+eps)
requestIdx.predicate = predicateBoundaryIdx;
request.predicate = predicateBoundary;
print("fetch index:")
do {
let result = try mainContext.count(for:requestIdx)
print("Count = \(result)")
} catch {
print("Error: \(error)")
}
print("fetch no index:")
do {
let result = try mainContext.count(for:request)
print("Count = \(result)")
} catch {
print("Error: \(error)")
}
for store in (persistentContainer.persistentStoreCoordinator.persistentStores) {
os_log("Store URL: %@", log: Debug.coredata_log, type: .info, store.url?.absoluteString ?? "No Store")
}
return true
}
Core Data SQL Trace
CoreData: sql: SELECT COUNT( DISTINCT t0.Z_PK) FROM ZREGION t0 WHERE ( t0.ZMINLATITUDEIDX >= ? AND t0.ZMAXLATITUDEIDX <= ? AND t0.ZMINLONGITUDEIDX >= ? AND t0.ZMAXLONGITUDEIDX <= ?)
Answer 1:
Core Data support for R-Tree indexes was introduced in 2017. WWDC 2017 session 210 covers it and provides an example. As you will see, the key is that you need to use a function in the predicate format string to indicate that the index should be used. There's another example in WWDC 2018 session 224.
Take a slightly simpler variation of your example: an entity with location attributes (latitude and longitude) and a name attribute:
Add a Fetch Index named "bylocation", set its type to "R-Tree", and add Fetch Index Elements for latitude and longitude:
Modify your code slightly, to reflect the different attributes etc. Prepare two separate predicates, one using the index, the other without, and run them both to compare:
let mainContext: NSManagedObjectContext
mainContext = persistentContainer.viewContext
mainContext.mergePolicy = NSMergeByPropertyObjectTrumpMergePolicy
mainContext.undoManager = nil
mainContext.shouldDeleteInaccessibleFaults = true
mainContext.automaticallyMergesChangesFromParent = true
var locationObj: Region
let n = 10 // Just for demo purposes
for i in 1...n
{
locationObj = Region(context: mainContext)
locationObj.name = "Region \(i)"
locationObj.latitude = 40.000000 + 5.0 - Float.random(in: 0 ..< 10)
locationObj.longitude = 9.000000 + 5.0 - Float.random(in: 0 ..< 10)
if i % 1000 == 0 {
saveContext()
}
}
saveContext()
mainContext.reset()
let request: NSFetchRequest<Region> = Region.fetchRequest()
let requestIdx: NSFetchRequest<Region> = Region.fetchRequest()
let eps : Float = 1.0
let predicateBoundaryIdx = NSPredicate(format: "indexed:by:(latitude, 'bylocation') between { %lf, %lf } AND indexed:by:(longitude, 'bylocation') between { %lf, %lf }", 40.0-eps, 40.0+eps, 9.0-eps, 9.0+eps)
let predicateBoundary = NSPredicate(format: "latitude between { %lf, %lf } AND longitude between { %lf, %lf} ",40.000000-eps,40.000000+eps,9.000000-eps,9.000000+eps)
requestIdx.predicate = predicateBoundaryIdx;
request.predicate = predicateBoundary;
print("fetch index:")
do {
let result = try mainContext.fetch(requestIdx)
print("Count = \(result.count)")
} catch {
print("Error: \(error)")
}
mainContext.reset()
print("fetch no index:")
do {
let result = try mainContext.fetch(request)
print("Count = \(result.count)")
} catch {
print("Error: \(error)")
}
Run that with SQLDebug = 4, and you can then see a bit of what's going on in the logs. First, the database is created and the Region table is added, followed by the RTree index. Triggers are created to add the relevant data to the index whenever the Region table is amended:
CoreData: sql: CREATE TABLE ZREGION ( Z_PK INTEGER PRIMARY KEY, Z_ENT INTEGER, Z_OPT INTEGER, ZLATITUDE FLOAT, ZLONGITUDE FLOAT, ZNAME VARCHAR )
CoreData: sql: CREATE VIRTUAL TABLE IF NOT EXISTS Z_Region_bylocation USING RTREE (Z_PK INTEGER PRIMARY KEY, ZLATITUDE_MIN, ZLATITUDE_MAX, ZLONGITUDE_MIN, ZLONGITUDE_MAX)
CoreData: sql: CREATE TRIGGER IF NOT EXISTS Z_Region_bylocation_INSERT AFTER INSERT ON ZREGION FOR EACH ROW BEGIN INSERT OR REPLACE INTO Z_Region_bylocation (Z_PK, ZLATITUDE_MIN, ZLATITUDE_MAX, ZLONGITUDE_MIN, ZLONGITUDE_MAX) VALUES (NEW.Z_PK, NEW.ZLATITUDE, NEW.ZLATITUDE, NEW.ZLONGITUDE, NEW.ZLONGITUDE) ; END
CoreData: sql: CREATE TRIGGER IF NOT EXISTS Z_Region_bylocation_UPDATE AFTER UPDATE ON ZREGION FOR EACH ROW BEGIN DELETE FROM Z_Region_bylocation WHERE Z_PK = NEW.Z_PK ; INSERT INTO Z_Region_bylocation (Z_PK, ZLATITUDE_MIN, ZLATITUDE_MAX, ZLONGITUDE_MIN, ZLONGITUDE_MAX) VALUES (NEW.Z_PK, NEW.ZLATITUDE, NEW.ZLATITUDE, NEW.ZLONGITUDE, NEW.ZLONGITUDE) ; END
CoreData: sql: CREATE TRIGGER IF NOT EXISTS Z_Region_bylocation_DELETE AFTER DELETE ON ZREGION FOR EACH ROW BEGIN DELETE FROM Z_Region_bylocation WHERE Z_PK = OLD.Z_PK ; END
Then when it comes to the fetches, you can see the two different queries being sent to SQLite:
With the index:
CoreData: sql: SELECT 0, t0.Z_PK, t0.Z_OPT, t0.ZLATITUDE, t0.ZLONGITUDE, t0.ZNAME FROM ZREGION t0 WHERE ( t0.Z_PK IN (SELECT n1_t0.Z_PK FROM Z_Region_bylocation n1_t0 WHERE (? <= n1_t0.ZLATITUDE_MIN AND n1_t0.ZLATITUDE_MAX <= ?)) AND t0.Z_PK IN (SELECT n1_t0.Z_PK FROM Z_Region_bylocation n1_t0 WHERE (? <= n1_t0.ZLONGITUDE_MIN AND n1_t0.ZLONGITUDE_MAX <= ?)))
and the logs even include the query plan used by SQLite:
2 0 0 SEARCH TABLE ZREGION AS t0 USING INTEGER PRIMARY KEY (rowid=?)
6 0 0 LIST SUBQUERY 1
8 6 0 SCAN TABLE Z_Region_bylocation AS n1_t0 VIRTUAL TABLE INDEX 2:D0B1
26 0 0 LIST SUBQUERY 2
28 26 0 SCAN TABLE Z_Region_bylocation AS n1_t0 VIRTUAL TABLE INDEX 2:D2B3
Without the index:
CoreData: sql: SELECT 0, t0.Z_PK, t0.Z_OPT, t0.ZLATITUDE, t0.ZLONGITUDE, t0.ZNAME FROM ZREGION t0 WHERE (( t0.ZLATITUDE BETWEEN ? AND ?) AND ( t0.ZLONGITUDE BETWEEN ? AND ?))
2 0 0 SCAN TABLE ZREGION AS t0
What you can see from this is that using the index involves some pretty messy subselects. I found the result was that for small datasets, the index actually slows things down. Likewise if the result set is large. But if the dataset is large and the result set is small, there is an advantage. I leave it to you to play and work out whether the game is worth the candle. One thing I can't quite fathom is that using the index requires two separate subselects, one for the longitude and one for the latitude. That seems to me (though maybe I'm missing something) to undermine the whole point of R-Trees, namely their multidimensionality.
Answer 2:
I've slightly modified the database from the OP to test the (recently learned) indexed:by: function and to take some time measurements:
Database:
Index:
Use Case:
Count people who visited a region.
Here for Region R42 the result should be 2 (Person 1 and 3):
Code:
func application(_ application: UIApplication, didFinishLaunchingWithOptions launchOptions: [UIApplication.LaunchOptionsKey: Any]?) -> Bool {
let mainContext: NSManagedObjectContext
mainContext = persistentContainer.viewContext
mainContext.mergePolicy = NSMergeByPropertyObjectTrumpMergePolicy
mainContext.undoManager = nil
mainContext.shouldDeleteInaccessibleFaults = true
mainContext.automaticallyMergesChangesFromParent = true
var bounds: Bounds
var location: Bounds
var person: Person
var region: Region
let longstep = 2
let latstep = 2
let minlong = 0
let maxlong = 20
let minlat = 20
let maxlat = 55
let createSomeData: Bool = false
if(createSomeData) {
// create some regions
var hotsptLvl : Dictionary<String,Int> = [:]
var regionNr: Int = 0
for long in stride(from: minlong, to: maxlong, by: longstep)
{
for lat in stride(from: minlat, to: maxlat, by: latstep) {
regionNr += 1
region = Region(context: mainContext)
bounds = Bounds(context: mainContext)
bounds.minlongitude = Float(long)
bounds.maxlongitude = Float(min(long + longstep,maxlong))
bounds.minlatitude = Float(lat)
bounds.maxlatitude = Float(min(lat + latstep,maxlat))
region.bounds = bounds
region.name = "Region \(regionNr)"
// hotsptLvl["Region \(regionNr)"] = Int.random(in: 0 ... 100)
print("region.name = \(String(describing: region.name))")
if regionNr % 1000 == 0 {
saveContext()
}
}
}
saveContext()
// create persons and visited locations
var k = 0
let n = 100000
let personNr = stride(from: 1, through: n+1, by: 1).map(String.init).shuffled()
for i in 1...n
{
person = Person(context: mainContext)
person.name = "Person \(personNr[i])"
let isInfected = Float.random(in: 0 ..< 1000)
person.infected = isInfected < 1 ? true : false
// create locations
let m = 10
for _ in 1...m
{
k += 1
location = Bounds(context: mainContext)
location.minlatitude = Float.random(in: Float(minlat + 3 * latstep) ... Float(maxlat)) - Float.random(in: 0 ... Float(3 * latstep))
location.minlongitude = Float.random(in: Float(minlong + 3 * longstep) ... Float(maxlong)) - Float.random(in: 0 ... Float(3 * longstep))
location.maxlatitude = min(location.minlatitude + Float.random(in: 0 ... Float(3 * latstep)),Float(maxlat))
location.maxlongitude = min(location.minlongitude + Float.random(in: 0 ... Float(3 * longstep)),Float(maxlong))
person.addToLocations(location)
if k % 1000 == 0 {
saveContext()
}
}
}
saveContext()
}
let start = Date()
for regionName in ["Region 1","Region 13","Region 43","Region 101","Region 113","Region 145"] {
print("\(Calendar.current.dateComponents([Calendar.Component.second], from:start, to:Date()).second!) Region: \(regionName)")
let requestOnRegion: NSFetchRequest<Region> = Region.fetchRequest()
let someRegion = NSPredicate(format: "(name = %@)",regionName)
requestOnRegion.predicate = someRegion
do {
let regionResA : [Region] = try mainContext.fetch(requestOnRegion) as [Region]
let regionRes : Region = regionResA[0]
print("\(Calendar.current.dateComponents([Calendar.Component.second], from:start, to:Date()).second!) Region: L1 = (\(regionRes.bounds!.minlongitude),\(regionRes.bounds!.minlatitude)) R1 = (\(regionRes.bounds!.maxlongitude),\(regionRes.bounds!.maxlatitude))")
let someBounds1 = NSPredicate(format: "(region = nil) && (minlongitude <= %lf && maxlongitude >= %lf && minlatitude <= %lf && maxlatitude >= %lf)",
regionRes.bounds!.maxlongitude,
regionRes.bounds!.minlongitude,
regionRes.bounds!.maxlatitude,
regionRes.bounds!.minlatitude)
let someBounds2 = NSPredicate(format: "(region = nil) && (indexed:by:(minlongitude, 'BoundsIndex') between { %lf, %lf } && " +
"indexed:by:(maxlongitude, 'BoundsIndex') between { %lf, %lf } && " +
"indexed:by:(minlatitude, 'BoundsIndex') between { %lf, %lf } && " +
"indexed:by:(maxlatitude, 'BoundsIndex') between { %lf, %lf} )",
Float(minlong),
regionRes.bounds!.maxlongitude,
regionRes.bounds!.minlongitude,
Float(maxlong),
Float(minlat),
regionRes.bounds!.maxlatitude,
regionRes.bounds!.minlatitude,
Float(maxlat))
let requestOnBounds: NSFetchRequest<NSDictionary> = NSFetchRequest<NSDictionary>(entityName:"Bounds")
requestOnBounds.resultType = NSFetchRequestResultType.dictionaryResultType
requestOnBounds.propertiesToFetch = ["person.name"]
requestOnBounds.returnsDistinctResults = true
requestOnBounds.predicate = someBounds1
print("\n")
print("\(Calendar.current.dateComponents([Calendar.Component.second], from:start, to:Date()).second!) Start - Fetch (no index):")
var boundsRes = try mainContext.fetch(requestOnBounds)
var uniquePersons : [String] = boundsRes.compactMap { $0.value(forKey: "person.name") as? String };
print("\(Calendar.current.dateComponents([Calendar.Component.second], from:start, to:Date()).second!) Number of Persons in this Region: \(uniquePersons.count)")
print("\n")
requestOnBounds.predicate = someBounds2
print("\(Calendar.current.dateComponents([Calendar.Component.second], from:start, to:Date()).second!) Start - Fetch (with index):")
boundsRes = try mainContext.fetch(requestOnBounds)
uniquePersons = boundsRes.compactMap { $0.value(forKey: "person.name") as? String };
print("\(Calendar.current.dateComponents([Calendar.Component.second], from:start, to:Date()).second!) Number of Persons in this Region: \(uniquePersons.count)")
print("\n")
} catch {
print("Error: \(error)")
}
}
for store in (persistentContainer.persistentStoreCoordinator.persistentStores) {
os_log("Store URL: %@", log: Debug.coredata_log, type: .info, store.url?.absoluteString ?? "No Store")
}
return true
}
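The two predicates in the code above (someBounds1 and someBounds2) are meant to select the same rows: someBounds1 is the usual rectangle-overlap test, while someBounds2 rewrites each of its four comparisons as a closed range so that it can be expressed with `between`. A minimal sketch of that equivalence in plain Swift follows. The `Box` type and both function names are hypothetical and not part of the Core Data model, and the equivalence assumes that every stored box lies inside the global limits (minlong, maxlong, minlat, maxlat), as the generated test data does.

```swift
import Foundation

// Hypothetical value type standing in for the four attributes of the Bounds entity.
struct Box {
    var minLongitude: Float
    var maxLongitude: Float
    var minLatitude: Float
    var maxLatitude: Float
}

// Overlap test in the form used by the non-indexed predicate (someBounds1).
func overlaps(_ b: Box, _ r: Box) -> Bool {
    return b.minLongitude <= r.maxLongitude && b.maxLongitude >= r.minLongitude
        && b.minLatitude <= r.maxLatitude && b.maxLatitude >= r.minLatitude
}

// The same test written as four closed ranges, the form used by the
// indexed predicate (someBounds2). `world` holds the global limits.
// Assumption: `b` lies inside `world`; otherwise the two tests can differ.
func overlapsAsRanges(_ b: Box, _ r: Box, world: Box) -> Bool {
    return b.minLongitude >= world.minLongitude && b.minLongitude <= r.maxLongitude
        && b.maxLongitude >= r.minLongitude && b.maxLongitude <= world.maxLongitude
        && b.minLatitude >= world.minLatitude && b.minLatitude <= r.maxLatitude
        && b.maxLatitude >= r.minLatitude && b.maxLatitude <= world.maxLatitude
}
```

Because each range has one bound fixed at a global limit, every indexed attribute is constrained on one side only, which may be part of why the index does not narrow the search much in this test.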
Output:
Leading number is time in seconds.
0 Region: Region 1
0 Region: L1 = (0.0,20.0) R1 = (2.0,22.0)
0 Start - Fetch (no index):
2 Number of Persons in this Region: 267
2 Start - Fetch (with index):
10 Number of Persons in this Region: 267
10 Region: Region 13
10 Region: L1 = (0.0,44.0) R1 = (2.0,46.0)
10 Start - Fetch (no index):
11 Number of Persons in this Region: 4049
11 Start - Fetch (with index):
13 Number of Persons in this Region: 4049
13 Region: Region 43
13 Region: L1 = (4.0,32.0) R1 = (6.0,34.0)
13 Start - Fetch (no index):
14 Number of Persons in this Region: 28798
14 Start - Fetch (with index):
17 Number of Persons in this Region: 28798
17 Region: Region 101
17 Region: L1 = (10.0,40.0) R1 = (12.0,42.0)
17 Start - Fetch (no index):
18 Number of Persons in this Region: 46753
18 Start - Fetch (with index):
22 Number of Persons in this Region: 46753
22 Region: Region 113
22 Region: L1 = (12.0,28.0) R1 = (14.0,30.0)
22 Start - Fetch (no index):
22 Number of Persons in this Region: 45312
22 Start - Fetch (with index):
28 Number of Persons in this Region: 45312
28 Region: Region 145
28 Region: L1 = (16.0,20.0) R1 = (18.0,22.0)
28 Start - Fetch (no index):
28 Number of Persons in this Region: 3023
28 Start - Fetch (with index):
34 Number of Persons in this Region: 3023
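The timings above are truncated to whole seconds, so a fetch that finishes in under a second shows up as 0. A minimal sketch of a helper with sub-second resolution is shown below. The name `measure` is hypothetical, and it uses only `Date` from Foundation.

```swift
import Foundation

// Runs `work` once and returns its result together with the
// elapsed wall-clock time in seconds (fractional).
func measure<T>(_ work: () throws -> T) rethrows -> (result: T, seconds: TimeInterval) {
    let start = Date()
    let result = try work()
    return (result, Date().timeIntervalSince(start))
}

// Possible use with the fetch from the code above (not executed here):
// let timed = try measure { try mainContext.fetch(requestOnBounds) }
// print("rows: \(timed.result.count), seconds: \(timed.seconds)")
```

Wrapping each fetch this way would make the indexed and non-indexed runs comparable even when both complete within the same second.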
Result:
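- indexed:by: causes Core Data to use the R*Tree index.
- Using the R*Tree was really disadvantageous for query execution time: in the runs above, each indexed fetch took between 2 and 8 seconds, while the same fetch without the index took at most 2 seconds.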
Open question:
What type of query and Core Data model takes advantage of an R*Tree index?
Source: https://stackoverflow.com/questions/61627719/what-kind-of-queries-in-core-data-can-profit-from-r-tree-index-on-attributes