Question
I was just migrating a module from the old java dates to the new java.time API, and noticed a huge drop in performance. It boiled down to parsing of dates with timezone (I parse millions of them at a time).
Parsing a date string without a time zone (yyyy/MM/dd HH:mm:ss) is fast: about 2 times faster than with the old Java date API, at about 1.5M operations per second on my PC.
However, when the pattern contains a time zone (yyyy/MM/dd HH:mm:ss z), the performance drops about 15 times with the new java.time API, while with the old API it is about as fast as without a time zone. See the performance benchmark below.
Does anyone have an idea how I can parse these strings quickly using the new java.time API? At the moment, as a workaround, I am using the old API for parsing and then converting the Date to an Instant, which is not particularly nice.
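For reference, the workaround described above could look roughly like the following. This is a minimal sketch, not the asker's actual code: the class name LegacyParseWorkaround and the method parseToInstant are made up for illustration, and the choice of Locale.ENGLISH and a per-thread formatter are assumptions.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.Instant;
import java.util.Date;
import java.util.Locale;

// Hypothetical helper illustrating "parse with the old API, then convert".
public class LegacyParseWorkaround {

    // SimpleDateFormat is not thread-safe, so each thread gets its own instance.
    private static final ThreadLocal<SimpleDateFormat> FORMAT = ThreadLocal.withInitial(
            () -> new SimpleDateFormat("yyyy/MM/dd HH:mm:ss z", Locale.ENGLISH));

    // Parses with the legacy API and converts the resulting Date to an Instant.
    public static Instant parseToInstant(String text) {
        try {
            Date legacy = FORMAT.get().parse(text);
            return legacy.toInstant();
        } catch (ParseException e) {
            // Rethrown unchecked, similar in spirit to java.time's DateTimeParseException.
            throw new IllegalArgumentException("Cannot parse: " + text, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseToInstant("2000/12/12 12:12:12 CET"));
    }
}
```

The formatter is created once per thread rather than once per call, which matters when parsing millions of strings.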
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OperationsPerInvocation;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.infra.Blackhole;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@BenchmarkMode(Mode.AverageTime)
@OperationsPerInvocation(1)
@Fork(1)
@Warmup(iterations = 3)
@Measurement(iterations = 5)
@State(Scope.Thread)
public class DateParsingBenchmark {

    private final int iterations = 100000;

    @Benchmark
    public void oldFormat_noZone(Blackhole bh, DateParsingBenchmark st) throws ParseException {
        SimpleDateFormat simpleDateFormat =
                new SimpleDateFormat("yyyy/MM/dd HH:mm:ss");
        for (int i = 0; i < iterations; i++) {
            bh.consume(simpleDateFormat.parse("2000/12/12 12:12:12"));
        }
    }

    @Benchmark
    public void oldFormat_withZone(Blackhole bh, DateParsingBenchmark st) throws ParseException {
        SimpleDateFormat simpleDateFormat =
                new SimpleDateFormat("yyyy/MM/dd HH:mm:ss z");
        for (int i = 0; i < iterations; i++) {
            bh.consume(simpleDateFormat.parse("2000/12/12 12:12:12 CET"));
        }
    }

    @Benchmark
    public void newFormat_noZone(Blackhole bh, DateParsingBenchmark st) {
        DateTimeFormatter dateTimeFormatter = new DateTimeFormatterBuilder()
                .appendPattern("yyyy/MM/dd HH:mm:ss").toFormatter();
        for (int i = 0; i < iterations; i++) {
            bh.consume(dateTimeFormatter.parse("2000/12/12 12:12:12"));
        }
    }

    @Benchmark
    public void newFormat_withZone(Blackhole bh, DateParsingBenchmark st) {
        DateTimeFormatter dateTimeFormatter = new DateTimeFormatterBuilder()
                .appendPattern("yyyy/MM/dd HH:mm:ss z").toFormatter();
        for (int i = 0; i < iterations; i++) {
            bh.consume(dateTimeFormatter.parse("2000/12/12 12:12:12 CET"));
        }
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder().include(DateParsingBenchmark.class.getSimpleName()).build();
        new Runner(opt).run();
    }
}
And the results for 100K operations:
Benchmark                                Mode  Cnt     Score     Error  Units
DateParsingBenchmark.newFormat_noZone    avgt    5    61.165 ±  11.173  ms/op
DateParsingBenchmark.newFormat_withZone  avgt    5  1662.370 ± 191.013  ms/op
DateParsingBenchmark.oldFormat_noZone    avgt    5    93.317 ±  29.307  ms/op
DateParsingBenchmark.oldFormat_withZone  avgt    5   107.247 ±  24.322  ms/op
UPDATE:
I just did some profiling of the java.time classes, and indeed, the time zone parser seems to be implemented quite inefficiently. Just parsing a standalone timezone is responsible for all the slowness.
@Benchmark
public void newFormat_zoneOnly(Blackhole bh, DateParsingBenchmark st) {
    DateTimeFormatter dateTimeFormatter = new DateTimeFormatterBuilder()
            .appendPattern("z").toFormatter();
    for (int i = 0; i < iterations; i++) {
        bh.consume(dateTimeFormatter.parse("CET"));
    }
}
There is a class called ZoneTextPrinterParser in the java.time package, which internally makes a copy of the set of all available time zones in every parse() call (via ZoneRulesProvider.getAvailableZoneIds()), and this accounts for 99% of the time spent in zone parsing.
Well, an answer then might be to write my own zone parser, which would not be too nice either, because then I could not build the DateTimeFormatter via appendPattern().
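One way to sidestep the zone parser while still using a pattern for the date-time part is to parse the local part with a zone-less formatter and resolve the trailing zone text separately through a cache. The following is only a rough sketch under stated assumptions: the zone is the last token of the input, and it is a region ID or short ID that ZoneId.of accepts (unlike the z pattern, this does not understand localized zone names). SplitZoneParser and its members are hypothetical names.

```java
import java.text.ParsePosition;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper: zone-less formatter plus a cached zone lookup.
public class SplitZoneParser {

    // Parses only the local date-time; contains no zone pattern letter.
    private static final DateTimeFormatter LOCAL_PART =
            DateTimeFormatter.ofPattern("yyyy/MM/dd HH:mm:ss");

    // Each distinct zone string is resolved once and then reused.
    private static final Map<String, ZoneId> ZONE_CACHE = new ConcurrentHashMap<>();

    public static Instant parse(String text) {
        ParsePosition pos = new ParsePosition(0);
        // parse(CharSequence, ParsePosition) stops where the pattern ends
        // instead of requiring the whole string to match.
        LocalDateTime local = LocalDateTime.from(LOCAL_PART.parse(text, pos));
        String zoneText = text.substring(pos.getIndex()).trim();
        // SHORT_IDS adds aliases such as "PST"; other IDs are looked up as-is.
        ZoneId zone = ZONE_CACHE.computeIfAbsent(
                zoneText, id -> ZoneId.of(id, ZoneId.SHORT_IDS));
        return local.atZone(zone).toInstant();
    }

    public static void main(String[] args) {
        System.out.println(parse("2000/12/12 12:12:12 CET"));
    }
}
```

Whether this is actually faster than the z pattern on a given JDK would need to be confirmed with a benchmark like the one above.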
Answer 1:
As noted in your question and in my comment, ZoneRulesProvider.getAvailableZoneIds() creates a new set of the string representations of all available time zones (the keys of the static final ConcurrentMap<String, ZoneRulesProvider> ZONES) each time a time zone needs to be parsed (see footnote 1 below).
Fortunately, ZoneRulesProvider is an abstract class that is designed to be subclassed. The method protected abstract Set<String> provideZoneIds() is responsible for populating ZONES. Thus, a subclass can provide only the needed time zones if it knows ahead of time which time zones will be used. Since such a class provides fewer entries than the default provider, which contains hundreds of entries, it has the potential to significantly reduce the invocation time of getAvailableZoneIds().
The ZoneRulesProvider API provides instructions on how to register one. Note that providers can't be deregistered, only supplemented, so it is not a simple matter of removing the default provider and adding your own. The system property java.time.zone.DefaultZoneRulesProvider defines the default provider. If it is null (as returned by System.getProperty("...")), then the JVM's built-in provider is loaded. Using System.setProperty("...", "fully-qualified name of a concrete ZoneRulesProvider class") one can supply their own provider, which is the one described above.
To conclude, I suggest:
- Subclass the abstract class ZoneRulesProvider.
- Implement protected abstract Set<String> provideZoneIds() with only the needed time zones.
- Set the system property to this class.
I did not do it myself, but I think it will work.
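To make the shape of such a subclass concrete, here is a minimal sketch. It is not a drop-in replacement for the default provider: it exposes a single fixed-offset zone under a made-up ID (Example/FixedPlusOne) and registers it in addition to the built-in provider. A provider meant to replace the default one through the system property would have to supply complete rules, including daylight-saving transitions, for every zone it offers (CET, for example). The class name MinimalZoneRulesProvider and the version string are also hypothetical.

```java
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.zone.ZoneRules;
import java.time.zone.ZoneRulesException;
import java.time.zone.ZoneRulesProvider;
import java.util.Collections;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical provider that knows exactly one fixed-offset zone.
public class MinimalZoneRulesProvider extends ZoneRulesProvider {

    // Made-up region ID, used only for this illustration.
    public static final String ZONE_ID = "Example/FixedPlusOne";

    // Fixed +01:00 rules; a real zone such as CET would need DST transitions.
    private static final ZoneRules RULES = ZoneRules.of(ZoneOffset.ofHours(1));

    @Override
    protected Set<String> provideZoneIds() {
        // Only the zones that are actually needed.
        return Collections.singleton(ZONE_ID);
    }

    @Override
    protected ZoneRules provideRules(String zoneId, boolean forCaching) {
        if (!ZONE_ID.equals(zoneId)) {
            throw new ZoneRulesException("Unknown zone: " + zoneId);
        }
        return RULES;
    }

    @Override
    protected NavigableMap<String, ZoneRules> provideVersions(String zoneId) {
        NavigableMap<String, ZoneRules> versions = new TreeMap<>();
        if (ZONE_ID.equals(zoneId)) {
            versions.put("example-1", RULES); // hypothetical version label
        }
        return versions;
    }

    public static void main(String[] args) {
        // Supplements the already-loaded providers; registering the same ID twice fails.
        ZoneRulesProvider.registerProvider(new MinimalZoneRulesProvider());
        System.out.println(ZoneId.of(ZONE_ID).getRules());
    }
}
```

To use a class like this as the default provider instead, the property java.time.zone.DefaultZoneRulesProvider would have to be set to its fully-qualified name before ZoneRulesProvider is first loaded, for example with a -D option on the command line.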
1 It is suggested in the comments of the question that the exact nature of the invocation might have changed between 1.8 versions.
Edit: more information found
The aforementioned default ZoneRulesProvider is final class TzdbZoneRulesProvider, located in java.time.zone. The regions in that class are read from the path JAVA_HOME/lib/tzdb.dat (in my case it's in the JDK's JRE). That file indeed contains many regions; here is a snippet:
TZDB 2014cJ Africa/Abidjan Africa/Accra Africa/Addis_Ababa Africa/Algiers
Africa/Asmara
Africa/Asmera
Africa/Bamako
Africa/Bangui
Africa/Banjul
Africa/Bissau Africa/Blantyre Africa/Brazzaville Africa/Bujumbura Africa/Cairo Africa/Casablanca Africa/Ceuta Africa/Conakry Africa/Dakar Africa/Dar_es_Salaam Africa/Djibouti
Africa/Douala Africa/El_Aaiun Africa/Freetown Africa/Gaborone
Africa/Harare Africa/Johannesburg Africa/Juba Africa/Kampala Africa/Khartoum
Africa/Kigali Africa/Kinshasa Africa/Lagos Africa/Libreville Africa/Lome
Africa/Luanda Africa/Lubumbashi
Africa/Lusaka
Africa/Malabo
Africa/Maputo
Africa/Maseru Africa/Mbabane Africa/Mogadishu Africa/Monrovia Africa/Nairobi Africa/Ndjamena
Africa/Niamey Africa/Nouakchott Africa/Ouagadougou Africa/Porto-Novo Africa/Sao_Tome Africa/Timbuktu Africa/Tripoli Africa/Tunis Africa/Windhoek America/Adak America/Anchorage America/Anguilla America/Antigua America/Araguaina America/Argentina/Buenos_Aires America/Argentina/Catamarca America/Argentina/ComodRivadavia America/Argentina/Cordoba America/Argentina/Jujuy America/Argentina/La_Rioja America/Argentina/Mendoza America/Argentina/Rio_Gallegos America/Argentina/Salta America/Argentina/San_Juan America/Argentina/San_Luis America/Argentina/Tucuman America/Argentina/Ushuaia
America/Aruba America/Asuncion America/Atikokan America/Atka
America/Bahia
If one finds a way to create a similar file with only the needed zones and load that one instead, the performance issues will surely be resolved.
Answer 2:
This problem is caused by ZoneRulesProvider.getAvailableZoneIds(), which copied the set of time zones on each call. Bug JDK-8066291 tracked the issue, and it has been fixed in Java SE 9. It will not be backported to Java SE 8 because the bug fix involved a specification change (the method now returns an immutable set instead of a mutable one).
As a side note, some other performance issues with parsing have been backported to Java SE 8, so always use the latest update release.
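To see which behavior a given runtime has, one can probe whether the returned set rejects modification. This is a small sketch: the class name AvailableZoneIdsCheck and the dummy ID are made up, and the result depends on the JDK version it runs on.

```java
import java.time.zone.ZoneRulesProvider;
import java.util.Set;

// Hypothetical probe for the specification change described above.
public class AvailableZoneIdsCheck {

    // Returns true if the set from getAvailableZoneIds() cannot be modified.
    public static boolean returnsImmutableSet() {
        Set<String> ids = ZoneRulesProvider.getAvailableZoneIds();
        try {
            // On a mutable copy this only changes the local copy, which is discarded.
            ids.add("Example/NotARealZone");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("zone IDs: " + ZoneRulesProvider.getAvailableZoneIds().size());
        System.out.println("immutable set: " + returnsImmutableSet());
    }
}
```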
Source: https://stackoverflow.com/questions/34374464/extremely-slow-parsing-of-time-zone-with-the-new-java-time-api