How much work is it reasonable for an object constructor to do? Should it simply initialize fields and not actually perform any operations on data, or is it okay to have it
"Should the parsing code be placed within a void parseHtml() method and the accessors only return valid values once this method is called?"
Yes.
"The design of the class is such that the class' constructor does the parsing"
This prevents customization, extension, and -- most importantly -- dependency injection.
There will be times when you want to do the following:
1. Construct a parser.
2. Add features to the parser: business rules, filters, better algorithms, strategies, commands, whatever.
3. Parse.
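The three steps above can be sketched as follows. This is a minimal Java illustration under stated assumptions: `TagParser`, `addFilter` and the regex-based tag extraction are hypothetical stand-ins, not a real HTML parser. The point is only that construction, configuration and parsing are separate calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser: the constructor only initializes fields; nothing is parsed yet.
class TagParser {
    private final List<Predicate<String>> filters = new ArrayList<>();
    private List<String> tags = null; // stays null until parse() has run

    // Step 2: add features (here, tag filters) before parsing.
    TagParser addFilter(Predicate<String> filter) {
        filters.add(filter);
        return this;
    }

    // Step 3: parse. Collects opening-tag names with a deliberately naive regex.
    void parse(String html) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile("<([a-zA-Z][a-zA-Z0-9]*)").matcher(html);
        while (m.find()) {
            String tag = m.group(1).toLowerCase();
            if (filters.stream().allMatch(f -> f.test(tag))) {
                result.add(tag);
            }
        }
        tags = result;
    }

    // The accessor only returns valid data once parse() has been called.
    List<String> getTags() {
        if (tags == null) {
            throw new IllegalStateException("call parse() first");
        }
        return tags;
    }
}

class ConstructConfigureParseDemo {
    public static void main(String[] args) {
        TagParser parser = new TagParser();              // 1. construct
        parser.addFilter(tag -> !tag.equals("script"));  // 2. add features
        parser.parse("<html><body><script></script><p>hi</p></body></html>"); // 3. parse
        System.out.println(parser.getTags()); // prints [html, body, p]
    }
}
```

Because the constructor does nothing but initialize fields, a client (or a subclass) can attach any number of filters between constructing and parsing.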
Generally, it's best to do as little as possible in a constructor so that you are free to extend or modify.
"Couldn't extensions simply parse the extra information in their constructors?"
Only if they don't have any kind of features that need to be injected. If you want to add features -- say, a different strategy for constructing the parse tree -- your subclasses also have to manage this feature addition before they parse. That may not amount to a simple super() call, because the superclass constructor does too much.
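A small Java sketch of why a plain super() call is not enough, assuming a hypothetical `EagerParser` that parses inside its constructor and exposes an overridable `parse` hook. In Java, a subclass's fields are assigned only after super(...) returns, so the override runs while the subclass's fields are still unset.

```java
// Hypothetical base class that parses inside its constructor and lets
// subclasses customize parsing through an overridable hook.
class EagerParser {
    // Initialized before EagerParser's constructor body runs, so it is safe to use there.
    protected final StringBuilder log = new StringBuilder();

    EagerParser(String html) {
        parse(html); // dispatches to the subclass override before the subclass is initialized
    }

    protected void parse(String html) {
        log.append("base:").append(html.length());
    }
}

// Subclass that wants to inject a feature (a prefix to strip) before parsing.
class PrefixStrippingParser extends EagerParser {
    private final String prefix;

    PrefixStrippingParser(String html, String prefix) {
        super(html);          // parse() has already run at this point
        this.prefix = prefix; // too late to influence parsing
    }

    @Override
    protected void parse(String html) {
        // When called from EagerParser's constructor, prefix is still null.
        if (prefix == null) {
            log.append("prefix not set yet");
        } else {
            log.append("sub:").append(html.replace(prefix, "").length());
        }
    }
}

class EagerParserDemo {
    public static void main(String[] args) {
        PrefixStrippingParser p = new PrefixStrippingParser("<p>x</p>", "<p>");
        System.out.println(p.log); // prints "prefix not set yet"
    }
}
```

The injected feature exists on the object, but only after the one moment it was needed.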
"Also, parsing in the constructor allows me to fail early"
Kind of. Failing during construction is a weird use case, and it makes it difficult to construct a parser like this:
class SomeClient {
    Parser p = new Parser();
    void aMethod() { /* ... */ }
}
Usually a construction failure means you're out of memory. There's rarely a good reason to catch construction exceptions because you're doomed anyway.
You're forced to build the parser in a method body instead, because its constructor takes arguments that are too complex for a field initializer.
In short, you've removed options from the clients of your parser.
"It's inadvisable to inherit from this class to replace an algorithm."
That's funny. Seriously. It's an outrageous claim. No algorithm is optimal for all possible use cases. Often a high-performance algorithm uses a lot of memory. A client may want to replace the algorithm with a slower one that uses less memory.
You can claim perfection, but it's rare. Subclasses are the norm, not an exception. Someone will always improve on your "perfection". If you limit their ability to subclass your parser, they'll simply discard it for something more flexible.
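To illustrate the kind of replacement described above, here is a hypothetical Java sketch in which the counting algorithm is an overridable method and parsing happens in `parse()`, not in the constructor. The default `countTags` builds an intermediate list of every opening tag; the `LowMemoryTagCounter` subclass counts in place without that list. Both are deliberately naive (they only match attribute-free tags such as `<p>`), and all names are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser whose algorithm is an overridable method. Parsing is
// deferred to parse(), so any subclass state is fully initialized by then.
class TagCounter {
    private int count = -1; // -1 until parse() has run

    // Default algorithm: collect every opening tag into a list, then count matches.
    // The list grows with the document, which is the memory cost a client may not want.
    protected int countTags(String html, String tag) {
        List<String> all = new ArrayList<>();
        for (String piece : html.split("<")) {
            int end = piece.indexOf('>');
            if (end > 0 && piece.charAt(0) != '/') {
                all.add(piece.substring(0, end));
            }
        }
        int n = 0;
        for (String t : all) {
            if (t.equals(tag)) {
                n++;
            }
        }
        return n;
    }

    void parse(String html, String tag) {
        count = countTags(html, tag);
    }

    int getCount() {
        return count;
    }
}

// Replacement algorithm: scan for the literal "<tag>" without building a list.
class LowMemoryTagCounter extends TagCounter {
    @Override
    protected int countTags(String html, String tag) {
        String needle = "<" + tag + ">";
        int n = 0;
        for (int i = html.indexOf(needle); i >= 0; i = html.indexOf(needle, i + 1)) {
            n++;
        }
        return n;
    }
}

class TagCounterDemo {
    public static void main(String[] args) {
        String html = "<div><p>a</p><p>b</p></div>";
        TagCounter listBased = new TagCounter();
        TagCounter lean = new LowMemoryTagCounter();
        listBased.parse(html, "p");
        lean.parse(html, "p");
        System.out.println(listBased.getCount() + " " + lean.getCount()); // prints 2 2
    }
}
```

Because nothing happens in the constructor, swapping the algorithm is just a matter of choosing which class to instantiate.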
"I don't see needing step 2 as described in the answer."
A bold statement. Dependencies, strategies, and related injection design patterns are common requirements. Indeed, they're so essential for unit testing that a design which makes injection difficult or complex often turns out to be a bad design.
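As a sketch of why injection matters for unit testing, assume a hypothetical `DocumentSource` dependency that supplies the HTML. Because `TitleParser` only stores that dependency in its constructor, a test can hand it a stub instead of a network-backed source. All names here are illustrative, and the title extraction is intentionally naive.

```java
// Hypothetical dependency: wherever the HTML comes from (network, disk, ...).
interface DocumentSource {
    String fetch(String url);
}

class TitleParser {
    private final DocumentSource source;

    // The constructor only stores the injected dependency; nothing is fetched or parsed.
    TitleParser(DocumentSource source) {
        this.source = source;
    }

    // Naive title extraction; returns "" when no <title>...</title> pair is found.
    String parseTitle(String url) {
        String html = source.fetch(url);
        int start = html.indexOf("<title>");
        int end = start < 0 ? -1 : html.indexOf("</title>", start);
        if (start < 0 || end < 0) {
            return "";
        }
        return html.substring(start + "<title>".length(), end);
    }
}

class TitleParserStubDemo {
    public static void main(String[] args) {
        // A stub stands in for the real source, as it would in a unit test.
        DocumentSource stub = url -> "<html><head><title>Hello</title></head></html>";
        TitleParser parser = new TitleParser(stub);
        System.out.println(parser.parseTitle("http://example.invalid/")); // prints Hello
    }
}
```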
Limiting the ability to subclass or extend your parser is a bad policy.
Bottom Line.
Assume nothing. Write a class with as few assumptions about its use cases as possible. Parsing at construction time makes too many assumptions about client use cases.
A constructor should set up the object to be used.
So do whatever that requires. That may include acting on some data or just setting fields; it will vary from class to class.
In the case you describe, an HTML parser, I would opt for creating the class and then calling a ParseHtml method. The reason is that it gives you a future opportunity to set options on the class that control how the HTML is parsed.
Why not just pass the parser to the constructor? This would allow you to change the implementation without changing the model:
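using System.Collections.Generic;

public interface IParser
{
    Dictionary<string, object> ParseDocument(string document);
}

public class HtmlParser : IParser
{
    // Properties, etc...
    public Dictionary<string, object> ParseDocument(string document)
    {
        // Do what you need to, then return the collection of properties
        var someDictionaryOfHtmlObjects = new Dictionary<string, object>();
        return someDictionaryOfHtmlObjects;
    }
}

public class HtmlScrapper
{
    private readonly IParser parser;
    private readonly string htmlDocument;
    private Dictionary<string, object> myDictionaryOfHtmlObjects;

    // Properties, etc...
    public HtmlScrapper(IParser parser, string htmlDocument)
    {
        // Only store the dependencies; no parsing happens here
        this.parser = parser;
        this.htmlDocument = htmlDocument;
    }

    public void ParseDocument()
    {
        this.myDictionaryOfHtmlObjects =
            parser.ParseDocument(this.htmlDocument);
    }
}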
This should give you some flexibility in changing/improving how your application performs without needing to rewrite this class.
A possible option is to move the parsing code to a separate function, make the constructor private, and provide a static function parse(html) that constructs the object and immediately calls the parse function.
This way you avoid the problems of parsing in the constructor (inconsistent state, problems when calling overridden functions, ...), while the client code still gets all the advantages (one call to get the parsed HTML, or an 'early' error).
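A minimal Java sketch of this static-factory approach, with hypothetical names (`ParsedHtml`, `parse`, `parseParagraphs`) and a deliberately naive `<p>` extraction. The class is final and the parsing method is private, so no overridden method can run against a half-built object, and errors still surface on the client's single call.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ParsedHtml {
    private final List<String> paragraphs = new ArrayList<>();

    private ParsedHtml() {
        // Only field initialization happens here; nothing can fail.
    }

    // The one call clients make: construct, then parse immediately.
    static ParsedHtml parse(String html) {
        if (html == null) {
            throw new IllegalArgumentException("html must not be null");
        }
        ParsedHtml result = new ParsedHtml();
        result.parseParagraphs(html);
        return result;
    }

    // Naive extraction of <p>...</p> contents, for illustration only.
    private void parseParagraphs(String html) {
        int from = 0;
        while (true) {
            int start = html.indexOf("<p>", from);
            if (start < 0) {
                break;
            }
            int end = html.indexOf("</p>", start);
            if (end < 0) {
                throw new IllegalArgumentException("unclosed <p> at index " + start);
            }
            paragraphs.add(html.substring(start + 3, end));
            from = end + 4;
        }
    }

    List<String> getParagraphs() {
        return Collections.unmodifiableList(paragraphs);
    }
}

class ParsedHtmlDemo {
    public static void main(String[] args) {
        System.out.println(ParsedHtml.parse("<p>a</p><p>b</p>").getParagraphs()); // prints [a, b]
    }
}
```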
I think that when you create an object ($obj = new class), the constructor should not affect the page at all and should do relatively little processing.
For instance:
If you have a user class, it should be checking for incoming login/logout parameters, along with cookies, and assigning them to class variables.
If you have a database class, it should make the connection to the database so it is ready when you are going to start a query.
If you have a class that deals with a particular form, it should go get the form values.
In a lot of my classes, I check for certain parameters to define an 'action', like add, edit or delete.
None of these things really affect the page, so it wouldn't matter much whether you created the objects or not. They are simply ready for when you call that first method.
I agree with the posters here who argue for minimal work in the constructor: just put the object into a non-zombie state, then provide verb functions like parseHTML().
One point I'd like to make, although I don't want to start a flame war, is to consider the case of a non-exception environment. I know you're talking about C#, but I try to keep my programming models as similar as possible between C++ and C#. For various reasons (think embedded video game programming), I don't use exceptions in C++; I use error return codes instead.
In that case, I can't throw exceptions from a constructor, so I avoid having a constructor do anything that could fail. I leave that to the accessor functions.
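A hypothetical sketch of this no-exceptions style. It is written in Java, but the same shape works in C++ built without exceptions: the constructor cannot fail, `parse` returns a status code, and the accessor signals "not parsed" through its return value. `ParseStatus`, `ReturnCodeParser` and the tag-balancing check are invented for illustration; the check naively treats every `<` not followed by `/` as an opening tag, so self-closing and void tags are not handled.

```java
// Hypothetical status codes, standing in for C++-style error return values.
enum ParseStatus { OK, EMPTY_INPUT, UNBALANCED_TAGS }

class ReturnCodeParser {
    private int openTags;
    private boolean parsed; // false until parse() returns OK

    ReturnCodeParser() {
        // Nothing here can fail: fields keep their default values.
    }

    // All fallible work happens here and is reported through the return value.
    ParseStatus parse(String html) {
        parsed = false;
        if (html == null || html.isEmpty()) {
            return ParseStatus.EMPTY_INPUT;
        }
        int depth = 0;
        int opens = 0;
        for (int i = 0; i < html.length() - 1; i++) {
            if (html.charAt(i) == '<') {
                if (html.charAt(i + 1) == '/') {
                    depth--;
                } else {
                    depth++;
                    opens++;
                }
                if (depth < 0) {
                    return ParseStatus.UNBALANCED_TAGS;
                }
            }
        }
        if (depth != 0) {
            return ParseStatus.UNBALANCED_TAGS;
        }
        openTags = opens;
        parsed = true;
        return ParseStatus.OK;
    }

    // Returns -1 instead of throwing when no successful parse has happened.
    int getOpenTagCount() {
        return parsed ? openTags : -1;
    }
}

class ReturnCodeDemo {
    public static void main(String[] args) {
        ReturnCodeParser parser = new ReturnCodeParser();
        ParseStatus status = parser.parse("<div><p>x</p></div>");
        if (status == ParseStatus.OK) {
            System.out.println(parser.getOpenTagCount()); // prints 2
        } else {
            System.out.println("parse failed: " + status);
        }
    }
}
```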