Reading a xml file multithreaded

こ雲淡風輕ζ 提交于 2019-12-06 01:40:31

I agree with most everyone's comments. Reading a 38Kb file should not take so long. Do you have something else running on the machine, antivirus / etc, that could be interfering with the processing?

The amount of time it would take you to create a thread will be far greater than the amount of time spent reading the file. If you could post the actual code used to read the file and the file itself, it might help analyze performance bottlenecks.

I think you can't parse XML in multiple threads, at least not in a way that would bring performance benefits, because to read from some point in the file, you need to know everything that comes before it, if nothing else, to know at what level you are.

Your code, if tit worked, would do something like this:

main  season1  season2

read
read
skip   read
skip   read
read
skip             read
skip             read

Note that to do “skip”, you need to fully parse the XML, which means you're doing the same amount of work as before on the main thread. The only difference is that you're doing some additional work on the background threads.

Regarding the slowness, just parsing such a small XML file should be very fast. If it's slow, you're most likely doing something else that is slow, or you're parsing the file multiple times.

If I am understanding how your .xml file is being used, you have essentially created an .xml database.

If correct, I would recommend breaking your Xml into different .xml files, with an indexed .xml document. I would think you can then query - using Linq-2-Xml - a set of .xml data from a specific .xml source.

Of course, this means you will still need to load an .xml file; however, you will be loading significantly smaller files and you would be able to, although highly discouraged, asynchronously load .xml document objects.

Your XML schema doesn't lend itself to parallelism since you seem to have node names (Season1, Season2) that contain the same data but must be parsed individually. You could redesign you schema to have the same node names (i.e. Season) and attributes that express the differences in the data (i.e. Number to indicate the season number). Then you can parallelize i.e. using Linq to XML and PLinq:

XDocument doc = XDocument.Load(@"TVShowSeasons.xml");
var seasonData = doc.Descendants("Season")
                    .AsParallel()
                    .Select(x => new Season()
                    {
                        Number = (int)x.Attribute("Number"),
                        Descripton = x.Value
                    }).ToList();
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!