We have a web application that does time-tracking, payroll, and HR. As a result, we have to write a lot of fixed-width data files for export into other systems (state tax filing
I don't know of any PHP library that specifically handles fixed-width records. But there are some good libraries for filtering and validating a row of data fields if you can do the job of breaking up each line of the file yourself.
Take a look at the Zend_Filter and Zend_Validate components from Zend Framework. I think both components are fairly self-contained and require only Zend_Loader to work. If you want you can pull just those three components out of Zend Framework and delete the rest of it.
Zend_Filter_Input acts like a collection of filters and validators. You define a set of filters and validators for each field of a data record which you can use to process each record of a data set. There are lots of useful filters and validators already defined and the interface to write your own is pretty straightforward. I suggest the StringTrim filter for removing padding characters.
To break up each line into fields I would extend the Zend_Filter_Input class and add a method called setDataFromFixedWidth(), like so:
class My_Filter_Input extends Zend_Filter_Input
{
public function setDataFromFixedWidth($record, array $recordRules)
{
if (array_key_exists('regex', $recordRules) {
$recordRules = array($recordRules);
}
foreach ($recordRules as $rule) {
$matches = array();
if (preg_match($rule['regex'], $record, $matches)) {
$data = array_combine($rule['fields'], $matches);
return $this->setData($data);
}
}
return $this->setData(array());
}
}
And define the various record types with simple regular expressions and matching field names. ICESA might look something like this:
$recordRules = array(
array(
'regex' => '/^(A)(.{4})(.{9})(.{4})/', // This is only the first four fields, obviously
'fields' => array('recordId', 'year', 'federalEin', 'taxingEntity',),
),
array(
'regex' => '/^(B)(.{4})(.{9})(.{8})/',
'fields' => array('recordId', 'year', 'federalEin', 'computer',),
),
array(
'regex' => '/^(E)(.{4})(.{9})(.{9})/',
'fields' => array('recordId', 'paymentYear', 'federalEin', 'blank1',),
),
array(
'regex' => '/^(S)(.{9})(.{20})(.{12})/',
'fields' => array('recordId', 'ssn', 'lastName', 'firstName',),
),
array(
'regex' => '/^(T)(.{7})(.{4})(.{14})/',
'fields' => array('recordId', 'totalEmployees', 'taxingEntity', 'stateQtrTotal'),
),
array(
'regex' => '/^(F)(.{10})(.{10})(.{4})/',
'fields' => array('recordId', 'totalEmployees', 'totalEmployers', 'taxingEntity',),
),
);
Then you can read your data file line by line and feed it into the input filter:
$input = My_Filter_Input($inputFilterRules, $inputValidatorRules);
foreach (file($filename) as $line) {
$input->setDataFromFixedWidth($line, $recordRules);
if ($input->isValid()) {
// do something useful
}
else {
// scream and shout
}
}
To format data for writing back to the file, you would probably want to write your own StringPad filter that wraps the internal str_pad function. Then for each record in your data set:
$output = My_Filter_Input($outputFilterRules);
foreach ($dataset as $record) {
$output->setData($record);
$line = implode('', $output->getEscaped()) . "\n";
fwrite($outputFile, $line);
}
Hope this helps!