My program needs to do 2 things.
Extract stuff from a webpage.
Do stuff with a webpage.
However, there are many webpages, su
It's up to you. I personally try to stay away from Java-style classes when programming in Python. Instead, I use dicts and/or simple objects.
For instance, after defining these functions (the ones you defined in the question), I'd create a simple dict, maybe like this:
{ 'facebook' : { 'process' : facebookProcess, 'extract': facebookExtract },
.....
}
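A minimal sketch of that dict-based dispatch, in case it helps. The two function bodies and the `handle` helper are made up for illustration; only the shape of the dict comes from the lines above.

```python
# Hypothetical per-site functions; real ones would parse actual pages.
def facebookExtract(page):
    # Pretend extraction: take whatever follows the "name=" marker.
    return page.split("name=", 1)[1]


def facebookProcess(data):
    # Pretend processing: tidy up the extracted value.
    return data.strip().title()


# One entry per site, mapping each step to the function that performs it.
SITES = {
    "facebook": {"extract": facebookExtract, "process": facebookProcess},
}


def handle(sitename, page):
    """Look up the functions for a site and run extract, then process."""
    handlers = SITES[sitename]  # raises KeyError for an unknown site
    return handlers["process"](handlers["extract"](page))


result = handle("facebook", "id=1;name= alice smith ")
print(result)
```

Adding a site then means adding two functions and one dict entry; the generic `handle` code stays untouched.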
or, better yet, use introspection to get the process/extract function automatically:
def processor(sitename):
    return getattr(module, sitename + 'Process')
def extractor(sitename):
    # suffix matches the facebookExtract naming used in the dict above
    return getattr(module, sitename + 'Extract')
Where module is the current module (or the module that has these functions).
To get this module as an object:
import sys
module = sys.modules[__name__]
Assuming, of course, that the generic main function does something like this:
1. figure out sitename based on input
2. get the extractor function for the site
3. get the processor function for the site
4. call the extractor, then the processor
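Those steps could be sketched roughly as follows. To keep the example self-contained, a `types.ModuleType` object stands in for "the module that has these functions" (in a real project that would be `sys.modules[__name__]` or an imported module). The `twitter*` functions and the host-to-sitename rule are assumptions made for the example.

```python
import types
from urllib.parse import urlparse

# Stand-in for the module holding the per-site functions (an assumption;
# normally this would be sys.modules[__name__]).
sites = types.ModuleType("sites")


def twitterExtract(page):
    # Hypothetical extraction step: normalise the page text.
    return page.lower()


def twitterProcess(data):
    # Hypothetical processing step: measure the extracted text.
    return len(data)


sites.twitterExtract = twitterExtract
sites.twitterProcess = twitterProcess


def extractor(sitename):
    return getattr(sites, sitename + "Extract")


def processor(sitename):
    return getattr(sites, sitename + "Process")


def sitename_for(url):
    # Assumed rule: drop a leading "www." and use the first host label.
    host = urlparse(url).hostname or ""
    if host.startswith("www."):
        host = host[4:]
    return host.split(".")[0]


def main(url, page):
    site = sitename_for(url)      # 1. figure out the site name
    extract = extractor(site)     # 2. get the extractor
    process = processor(site)     # 3. get the processor
    return process(extract(page))  # 4. call extractor, then processor


print(main("http://www.twitter.com/someone", "Hello World"))
```

An unknown site surfaces as an `AttributeError` from `getattr`, which the caller can catch or let propagate.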
"My program needs to do 2 things."
When you start out like that, the objects cannot be seen. Your perspective isn't right.
Change your thinking.
"My program works with stuff"
That's OO thinking. What "stuff" does your program work with? Define the stuff. Those are your basic classes. There's a class for each kind of stuff.
"My program gets the stuff from various sources"
There's a class for each source.
"My program displays the stuff"
This is usually a combination of accessor methods of the stuff plus some "reporting" classes that gather parts of the stuff to display it.
When you start out defining the "stuff" not the "do", you're doing OO programming. OO applies to everything, since every single program involves "doing" and "stuff". You can choose the "doing" POV (which can be procedural or functional), or you can choose the "stuff" POV (which is object-oriented).
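To make the "stuff first" view a bit more concrete, here is a small sketch with one class for the stuff, one for a source, and one for reporting. All three class names and their fields are invented for illustration; the real "stuff" depends on your program.

```python
from dataclasses import dataclass


@dataclass
class Post:
    """The 'stuff': one item the program works with (hypothetical fields)."""
    author: str
    text: str


class ListSource:
    """A source of stuff. A real source might fetch and parse a web page."""

    def __init__(self, rows):
        self.rows = rows

    def posts(self):
        return [Post(author, text) for author, text in self.rows]


class AuthorReport:
    """A reporting class that gathers parts of the stuff for display."""

    def __init__(self, posts):
        self.posts = posts

    def lines(self):
        return [f"{p.author}: {len(p.text)} chars" for p in self.posts]


source = ListSource([("ann", "hello"), ("bob", "hi")])
report = AuthorReport(source.posts())
print("\n".join(report.lines()))
```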
Put as much of the common stuff together in a single function. Once you've factored as much out as possible, build a mechanism for branching to the appropriate function for each website.
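One possible way to do this is with Python's if/else clauses, but if you have many such functions, you may want something more elegant such as
import importlib
F = importlib.import_module('yourproject.facebookmodule')
This lets you put the code that's specific to Facebook in its own area. Since you pass import_module() a string, you can build that string at runtime based on which site you're accessing, and then just call the functions in module F from your generic worker code. (The built-in __import__() also takes a string, but for a dotted name it returns the top-level package rather than the submodule, so importlib.import_module() is usually the easier choice.)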
More on that here: http://effbot.org/zone/import-confusion.htm
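A small sketch of picking a module by name at runtime. Since `yourproject.facebookmodule` doesn't exist here, the standard-library module `os.path` plays its role; `load_site_module` is a made-up helper name. The last lines show why a dotted name needs care with the built-in `__import__()`.

```python
import importlib


def load_site_module(dotted_name):
    """Import a module whose dotted name is only known at runtime."""
    return importlib.import_module(dotted_name)


# Stand-in: 'os.path' plays the role of 'yourproject.facebookmodule'.
site_module = load_site_module("os.path")
print(site_module.basename("/pages/profile.html"))

# For a dotted name, the built-in returns the top-level package ('os'),
# not the submodule that was asked for.
top_level = __import__("os.path")
print(top_level.__name__)
```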
You use OOP when it makes sense, when it makes developing the solution quicker and when it makes the end result easier to read, understand and maintain.
In this case it might make sense to create a generic Extractor interface/class and then have subclasses for Twitter, MySpace, Facebook, etc., but this really depends on how site-specific the extraction is. The idea of this kind of abstraction is to hide such details. If you can do it, it makes sense. If you can't, you probably need a different approach.
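One way such a generic Extractor could look, as a rough sketch. The page formats and the returned field names are pure assumptions; the point is only that the generic code at the bottom never needs to know which site it is dealing with.

```python
from abc import ABC, abstractmethod


class Extractor(ABC):
    """Generic interface: callers depend only on extract()."""

    @abstractmethod
    def extract(self, page):
        """Return a dict of fields pulled from the raw page text."""


class TwitterExtractor(Extractor):
    def extract(self, page):
        # Hypothetical page format: "@handle: message"
        handle, message = page.split(": ", 1)
        return {"user": handle.lstrip("@"), "text": message}


class FacebookExtractor(Extractor):
    def extract(self, page):
        # Hypothetical page format: "name|message"
        name, message = page.split("|", 1)
        return {"user": name, "text": message}


def users(pairs):
    """Generic code that works with any Extractor subclass."""
    return [extractor.extract(page)["user"] for extractor, page in pairs]


print(users([(TwitterExtractor(), "@ann: hi"),
             (FacebookExtractor(), "bob|hello")]))
```

If the per-site details can't be squeezed behind a single `extract()` signature like this, that's the sign the abstraction isn't earning its keep.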
It may also be that similar benefits can be obtained from good decomposition of a procedural solution.
Remember at the end of the day that all these things are just tools. Pick the best one for that particular job rather than picking the hammer and then trying to turn everything into a nail.
I regularly define classes for solving problems, for a few reasons; I'll model an example of my thinking below. I have no compunctions about mixing OO models and procedural styles; the mix is often more a reflection of your workplace culture than of personal religion. It often works to have a procedural facade for a class hierarchy if that's what other maintainers expect. (Please excuse the PHP syntax.)
// batch process method
function MunchPages( $list_of_urls )
{
    foreach( $list_of_urls as $url )
    {
        $muncher = MuncherForUrl( $url );
        $muncher->gather();
        $muncher->process();
    }
}
// factory function encapsulates strategy selection
function MuncherForUrl( $url )
{
    // strpos() returns false when there is no match
    if( strpos( $url, 'facebook.com' ) !== false )
        return new FacebookPageMuncher( $url );
    if( ... )
        return new .... ;
}
// common tasks defined in base PageMuncher
class PageMuncher
{
    function gather() { /* use some curl or what */ }
    function process() {}
}
class FacebookPageMuncher extends PageMuncher
{
    function process() { /* I do it 'this' way for FB */ }
}
class PageMuncherUtils
{
    static function begin( $html, $context )
    {
        // process assertions about html and context
    }
    static function report_fail( $context ) {}
    static function exit_retry( $context ) {}
}
// elsewhere I compose the methods in cases I don't wish to inherit them
class TwitterPageMuncher
{
    function validateAnchor( $html, $context )
    {
        if( ! PageMuncherUtils::begin( $html, $context ))
            return PageMuncherUtils::report_fail( $context );
    }
}
class WeatherAPI
{
    const URL = 'http://weather.net';
    const URI_TOMORROW = '/nextday/';
    const URI_YESTERDAY = '/yesterday/';
    const API_KEY = '123';
}
class WeatherService
{
    function get( $uri ) { }
    function forecast( $dateurl ) { }
    function alerts( $dateurl )
    {
        return new WeatherAlert(
            $this->get( WeatherAPI::URL.$dateurl
                ."?api=".WeatherAPI::API_KEY ));
    }
}
class WeatherAlert
{
    function refresh() {}
}
// exercise:
$service = new WeatherService();
$alert = $service->alerts( WeatherAPI::URI_TOMORROW );
$alert->refresh();
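Since the question is about Python, here is a rough Python rendering of the same idea: a class hierarchy, a factory that picks the class, and a procedural facade on top. The names mirror the PHP above but are otherwise made up, `gather()` is a stub that does no network access, and the host check is just one assumed way of choosing a strategy.

```python
from urllib.parse import urlparse


class PageMuncher:
    """Common tasks live in the base class."""

    def __init__(self, url):
        self.url = url
        self.html = None

    def gather(self):
        # Stub: a real version would download self.url.
        self.html = f"<html>{self.url}</html>"

    def process(self):
        return ("generic", len(self.html))


class FacebookPageMuncher(PageMuncher):
    def process(self):
        # Site-specific processing overrides the generic one.
        return ("facebook", len(self.html))


def muncher_for_url(url):
    """Factory: choose the strategy class from the host name."""
    host = urlparse(url).hostname or ""
    if host == "facebook.com" or host.endswith(".facebook.com"):
        return FacebookPageMuncher(url)
    return PageMuncher(url)


def munch_pages(urls):
    """Procedural facade over the class hierarchy."""
    results = []
    for url in urls:
        muncher = muncher_for_url(url)
        muncher.gather()
        results.append(muncher.process())
    return results


print(munch_pages(["http://facebook.com/a", "http://example.org/"]))
```

Callers who prefer plain functions only ever see `munch_pages`; the classes stay an implementation detail.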
My favorite rule of thumb: if you're in doubt (unspoken assumption: "and you're a reasonable person rather than a fanatic";-), make and use some classes. I've often found myself refactoring code originally written as simple functions into classes -- for example, any time the simple functions' best way of communicating with each other is through globals, that's a code smell, a strong hint that the system's factoring is not really good -- and often refactoring the OOP way is a reasonable fix for that.
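A tiny before/after sketch of that kind of refactoring. The link-counting task and all names here are invented for the example; what matters is that the state shared between the two steps moves from a module-level global onto an instance.

```python
# Before: two functions that communicate through a global.
_last_links = []


def extract_links(page):
    global _last_links
    _last_links = [w for w in page.split() if w.startswith("http")]


def count_links():
    return len(_last_links)


# After: the shared state lives on an instance instead.
class LinkPage:
    def __init__(self, page):
        self.page = page
        self.links = []

    def extract(self):
        self.links = [w for w in self.page.split() if w.startswith("http")]
        return self  # allow chaining extract().process()

    def process(self):
        return len(self.links)


first = LinkPage("see http://a.example and http://b.example").extract()
second = LinkPage("nothing here").extract()
print(first.process(), second.process())
```

With the class version, two pages can be handled side by side without one overwriting the other's state, which the global version can't offer.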
Python is multi-paradigm, but its central paradigm is OOP (much like, say, C++). When a procedural or functional approach (maybe through generators &c) is optimal for some part of the system, that generally stands out -- for example, static functions are also a code smell, and if your classes have any substantial amount of those THAT is a hint to refactor things to avoid that requirement.
So, assuming you have a rich grasp of all the paradigms Python affords -- if you're STILL in doubt, that suggests you probably want to go OOP for that part of your system! Just because Python supports OOP even more wholly than it supports functional programming and the like.
From your very skeletal code, it seems to me that each extract/process pair belongs together and probably needs to communicate state, so a small set of classes with extraction and processing methods seems a natural fit.