Question
I can't get the CsvReader in the Bonobo ETL library to yield anything other than tuples. The documentation seems to indicate that it should yield dicts, not tuples, but whatever I try it only passes tuples. I'd really like to have access to the column names attached to each value. It throws an error that suggests the column names are present when the row is passed, but in the transform method I have defined, only the values themselves are available.
import bonobo


def printer(*csv):
    print(csv)


def get_graph(**options):
    graph = bonobo.Graph()
    graph.add_chain(
        bonobo.CsvReader('csv.txt'),
        printer
    )
    return graph


def get_services(**options):
    return {}


if __name__ == '__main__':
    parser = bonobo.get_argument_parser()
    with bonobo.parse_args(parser) as options:
        bonobo.run(get_graph(**options), services=get_services(**options))
Does it have something to do with the arguments of the printer method? I understand that *csv as the argument unpacks the arguments of an iterable, but any other possible declaration of arguments just throws a TypeError.
Any suggestions? Would it be better to avoid using the built in Bonobo CsvReader completely and just create an extract method that uses DictReader or something?
Edit: Here is the error that gets thrown using anything other than *csv as the argument to printer().
CRIT|0002|bonobo.execution.contexts.base:
Traceback (most recent call last):
  File "X:\Programming\pyWarehouse\warehouse_env\lib\site-packages\bonobo\config\processors.py", line 102, in call
    bound = self._bind(_input)
  File "X:\Programming\pyWarehouse\warehouse_env\lib\site-packages\bonobo\config\processors.py", line 89, in _bind
    return bind(*self.args, *_input, **self.kwargs)
  File "C:\Users\Accounting Admin\AppData\Local\Programs\Python\Python37-32\lib\inspect.py", line 3002, in bind
    return args[0]._bind(args[1:], kwargs)
  File "C:\Users\Accounting Admin\AppData\Local\Programs\Python\Python37-32\lib\inspect.py", line 2923, in _bind
    raise TypeError('too many positional arguments') from None
TypeError: too many positional arguments

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "X:\Programming\pyWarehouse\warehouse_env\lib\site-packages\bonobo\execution\contexts\node.py", line 102, in loop
    self.step()
  File "X:\Programming\pyWarehouse\warehouse_env\lib\site-packages\bonobo\execution\contexts\node.py", line 132, in step
    results = self._stack(input_bag)
  File "X:\Programming\pyWarehouse\warehouse_env\lib\site-packages\bonobo\config\processors.py", line 112, in call
    )) from exc
bonobo.errors.UnrecoverableTypeError: Input of does not bind to the node signature.
Args: ()
Input: Bag(id='1', name='Alice',age='20', height='62', weight='120.6')
Kwargs: {}
Signature: (csv)
Answer 1:
There may be an issue with the documentation, but the CsvReader does indeed yield tuple-like objects (in fact, something very similar to namedtuples), for one simple reason: dicts in Python 3.5 do not preserve field order, so yielding dicts would mean that even a simple csvread->csvwrite pipeline changes the field order in a non-reproducible way.
If you want to retrieve the "raw" input (aka the tuple object, not expanded to args), you can use the @use_raw_input decorator.
from bonobo.config import use_raw_input


@use_raw_input
def some_node(row):
    for f in row._fields:
        ...
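If the goal is a dict keyed by column name, the `_fields` attribute is enough to build one. Below is a minimal sketch that uses a `collections.namedtuple` as a stand-in for the row object, on the assumption that Bonobo's row behaves like a namedtuple as described above (iterable values plus `_fields`). `Row` and `row_to_dict` are hypothetical names, and the sample values are taken from the error message in the question.

```python
from collections import namedtuple

# Stand-in for the row object CsvReader passes along. Assumption: it behaves
# like a namedtuple, i.e. it iterates over values and exposes `_fields`.
Row = namedtuple("Row", ["id", "name", "age", "height", "weight"])


def row_to_dict(row):
    """Pair each column name with its value, keeping the column order."""
    return dict(zip(row._fields, row))


row = Row("1", "Alice", "20", "62", "120.6")
record = row_to_dict(row)
print(record["name"])
```

Inside a graph, the same helper would presumably be called from a node decorated with `@use_raw_input`, so that the node receives the whole row as a single argument.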
Another option, if you know the expected fields, is to be explicit and declare them as named parameters of the node.
def some_node(id, name, value):
    ...
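The traceback in the question passes through `inspect.py` and a call of the form `bind(*self.args, *_input, **self.kwargs)`, which suggests that each row is spread across the node's parameters by signature binding. The sketch below reproduces that behaviour with the standard library alone, so it is an illustration of the mechanism rather than Bonobo's actual code. `binds`, `one_param`, `star` and `explicit` are hypothetical names.

```python
import inspect


def one_param(csv):
    return csv


def star(*csv):
    return csv


def explicit(id, name, age, height, weight):
    return name


def binds(func, values):
    """Return True if `values`, spread as positional arguments, fit func's signature."""
    try:
        inspect.signature(func).bind(*values)
    except TypeError:
        return False
    return True


row = ("1", "Alice", "20", "62", "120.6")
print(binds(one_param, row))  # False: five values, one parameter
print(binds(star, row))       # True: *csv collects all five values
print(binds(explicit, row))   # True: one parameter per column
```

This is why `printer(*csv)` works and `printer(csv)` fails with "too many positional arguments": a five-column row needs either a variadic parameter or five named ones.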
Hope that helps.
Source: https://stackoverflow.com/questions/51673963/why-does-bonobos-csvreader-method-yield-tuples-and-not-dicts