How much work should the constructor for an HTML parsing class do?

悲&欢浪女 2020-12-23 09:55

How much work is it reasonable for an object constructor to do? Should it simply initialize fields and not actually perform any operations on data, or is it okay to have it

19 answers
  •  醉梦人生
    2020-12-23 10:28

    I would not do the parsing in the constructor. I would do everything necessary to validate the constructor parameters, and to ensure that the HTML can be parsed when needed.

    But I'd have the accessor methods do the parsing if the HTML has not been parsed by the time they need it. The parsing can wait until then; it does not need to be done in the constructor.


    Suggested code, for discussion purposes:

    using System.IO;   // TextReader, StreamReader
    
    public class MyHtmlScraper {
        private readonly TextReader _htmlFileReader;
        private bool _parsed;
    
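        public MyHtmlScraper(string htmlFilePath) {
            // Opening the reader checks that the file exists (a missing file
            // throws here), but nothing is read or parsed yet.
            _htmlFileReader = new StreamReader(htmlFilePath);
            // If done in the constructor, DoTheParse would be called here
        }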
    
        private string _parsedValue1;
        public string Accessor1 {
            get {
                EnsureParsed();
                return _parsedValue1;
            }
        }
    
        private string _parsedValue2;
        public string Accessor2 {
            get {
                EnsureParsed();
                return _parsedValue2;
            }
        }
    
        private void EnsureParsed(){
            if (_parsed) return;
            DoTheParse();
            _parsed = true;
        }
    
        private void DoTheParse() {
            // Read the whole file, then release the file handle.
            string html = _htmlFileReader.ReadToEnd();
            _htmlFileReader.Dispose();
            // parse html into _parsedValue1, _parsedValue2, etc.
        }
    }
    

    With this code in front of us, we can see there's very little difference between doing all the parsing in the constructor and doing it on demand. The only additions are the test and setting of a boolean flag, plus a call to EnsureParsed in each accessor. I'd be surprised if that extra code were not inlined.
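    To make that comparison concrete, here is a minimal sketch, assuming the input arrives as a TextReader so it can be exercised with an in-memory StringReader instead of a file. EagerScraper, LazyScraper, Title and ParseCount are hypothetical names, and ReadToEnd().Trim() is only a stand-in for real HTML parsing. The lazy class differs from the eager one only by the flag test, the flag set, and the call in the accessor.

```csharp
using System;
using System.IO;

// Hypothetical sketch: "parsing" here is just ReadToEnd().Trim(), a placeholder.
// ParseCount records how many times the placeholder parse has run.
public class EagerScraper
{
    public int ParseCount { get; private set; }
    private readonly string _title;

    public EagerScraper(TextReader reader)
    {
        // All the work happens here, whether or not Title is ever read.
        _title = reader.ReadToEnd().Trim();
        ParseCount++;
    }

    public string Title => _title;
}

public class LazyScraper
{
    public int ParseCount { get; private set; }
    private readonly TextReader _reader;
    private bool _parsed;
    private string _title;

    public LazyScraper(TextReader reader)
    {
        // Only store the argument; nothing is read or parsed yet.
        _reader = reader ?? throw new ArgumentNullException(nameof(reader));
    }

    public string Title
    {
        get
        {
            EnsureParsed();   // the extra call in each accessor
            return _title;
        }
    }

    private void EnsureParsed()
    {
        if (_parsed) return;              // the extra flag test
        _title = _reader.ReadToEnd().Trim();
        ParseCount++;
        _parsed = true;                   // the extra flag set
    }
}
```

    Constructing an EagerScraper leaves ParseCount at 1 before any accessor is called. A LazyScraper stays at 0 until Title is first read, and reading Title again does not parse a second time.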

    This isn't a big deal, but my inclination is to do as little as possible in the constructor. That leaves room for scenarios where construction needs to be fast, and some of those will no doubt be situations you have not considered, such as deserialization.

    Again, it's not a big deal, but you can avoid doing the work in the constructor, and it's not expensive to do it elsewhere. Admittedly, you're not doing network I/O in the constructor (unless, of course, a UNC file path is passed in), and you won't wait long in the constructor (unless there are networking problems, or you generalize the class to read HTML from sources other than a file, some of which might be slow).
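    One way to allow for the generalization mentioned above without paying for it in the constructor is to accept a factory for the source instead of a file path, so that even opening the file (or a UNC path, or some slower source) waits until a value is needed. This is a minimal sketch under that assumption; it swaps the hand-written _parsed flag for System.Lazy<T>, which runs its delegate at most once and is thread-safe by default. DeferredSourceScraper and Title are hypothetical names, and the "parse" is again a placeholder.

```csharp
using System;
using System.IO;

// Hypothetical sketch: the constructor stores a factory and does no I/O.
// Lazy<T> replaces the hand-written _parsed flag from the answer's code.
public class DeferredSourceScraper
{
    private readonly Lazy<string> _title;

    public DeferredSourceScraper(Func<TextReader> openSource)
    {
        if (openSource == null) throw new ArgumentNullException(nameof(openSource));

        // Nothing is opened or read here. Lazy<T> runs this delegate at most
        // once, on the first access to .Value.
        _title = new Lazy<string>(() =>
        {
            using (TextReader reader = openSource())
            {
                // Placeholder "parse": real HTML parsing would go here.
                return reader.ReadToEnd().Trim();
            }
        });
    }

    public string Title => _title.Value;
}
```

    For a file, the factory could be `() => File.OpenText(path)`. One trade-off of deferring the open: an error such as a missing file now surfaces on the first access to Title rather than in the constructor, and in its default mode Lazy<T> caches that exception, so later accesses rethrow it.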

    But since you don't have to do it in the constructor, my advice is simply: don't.

    And if you do, it could be years before it causes an issue, if at all.
