At the end of Part 3, my Employee class was fleshed out.

But I still need to be able to parse employees from the sample text file.

The sample file consists of a header line, followed by two employee records.
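
The exact contents of employee_data.txt aren’t reproduced here, but based on the column order and the values asserted in the tests below, it looks something like this (the second record is my guess):

last_name, first_name, date_of_birth, email
Doe, John, 1982/10/08, john.doe@foobar.com
Ann, Mary, 1975/09/11, mary.ann@foobar.com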

TestCase subclass: FileEmployeeRepositoryTestCase [
    <comment: 'tests for parsing employee data from a csv'>
    <category: 'BirthdayKata-FileEmployeeRepositoryTests'>

    testReadEmployeesFromFileParsesEmployeeRows [
        <category: 'tests'>
        | repository records |
        repository := FileEmployeeRepository newFilename: 'employee_data.txt'.

        records := repository employees.

        self assert: records size equals: 2
    ]
]

This instantiates a FileEmployeeRepository with employee_data.txt as the filename, reads the employees from it, and asserts that we get two records back.

Object subclass: FileEmployeeRepository [
  <comment: 'I represent an employee database based on a text file.'>
  <category: 'BirthdayKata'>
  |fileName|

  employees [
    "Read employees as a collection from the database file."
    <category: 'database'>
    | f employees parts |
    f := (FileStream open: fileName mode: FileStream read) lines.
    f skip: 1.
    employees := f collect: [:aLine |
      aLine
    ].
    f close.
    ^employees contents
  ]
]

I’ve elided the constructor here, to focus on employees.
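
For reference, the class-side constructor used by the tests could look something like this. It’s a minimal sketch rather than the elided original, and the setFileName: setter is my own naming:

FileEmployeeRepository class extend [
  newFilename: aFileName [
    <category: 'instance creation'>
    "Answer a repository reading from the file named aFileName.
     (Sketch: the real constructor was elided from the post.)"
    ^self new setFileName: aFileName
  ]
]

FileEmployeeRepository extend [
  setFileName: aFileName [
    <category: 'private'>
    fileName := aFileName
  ]
]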

Streaming from a file

f := (FileStream open: fileName mode: FileStream read) lines.

This opens the file named by fileName in read-only mode, which answers a FileStream.

FileStream is a subclass of Stream, and sending it lines answers a new stream that yields the opened file’s contents line by line.
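
Spelled out on its own, the same pattern looks like this; the ensure: wrapping is my addition, to guarantee the file is closed even if the block fails:

| stream |
stream := FileStream open: 'employee_data.txt' mode: FileStream read.
[stream lines do: [:each | Transcript showCr: each]]
  ensure: [stream close]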

f skip: 1.

This is a Stream method that, unsurprisingly, skips the next item. The first item is the header line, so skipping it means we never have to special-case the header when parsing.
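
A quick illustration of skip: with a toy in-memory stream (my own example, not from the kata):

| s |
s := #('header' 'row1' 'row2') readStream.
s skip: 1.
s next printNl  "prints 'row1'"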

employees := f collect: [:aLine |
  aLine
]

This is a Stream operation, but Stream derives from Iterable, and so the collect: message comes from Iterable.

Answer a new instance of a Collection containing all the results of evaluating aBlock passing each of the receiver’s elements
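
For example:

(#(1 2 3) collect: [:each | each * each]) printNl  "prints (1 4 9 )"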

This same functionality is available in most modern languages.

Ruby

Comparing this with the collect method of Ruby’s Enumerable mixin (which is an alias for #map):

collect { |obj| block } → array Returns a new array with the results of running block once for every element in enum.

Scala

Scala, too, has this in its Traversable:

xs map f The collection obtained from applying the function f to every element in xs

Clojure

Clojure, too, has map:

Returns a lazy sequence consisting of the result of applying f to the set of first items of each coll, followed by applying f to the set of second items in each coll, until any one of the colls is exhausted. Any remaining items in other colls are ignored. Function f should accept number-of-colls arguments. Returns a transducer when no collection is provided.

(I’m fairly sure this is a Lisp-derived operation originally; Lisp has had maplist and mapcar since its earliest days.)

Java

Even modern Java has the same map method, on its Stream interface.

Returns a stream consisting of the results of applying the given function to the elements of this stream.

Executing the tests

With this code in place, the test passes…

$ gst-sunit -f EmployeeRepository.st -f EmployeeRepositoryTest.st FileEmployeeRepositoryTestCase
1 run, 1 passes

But, getting the correct number of employees is only half the story.

Further testing

testReadEmployeesParsesEmployeeCorrectly [
    <category: 'tests'>
    | repository employee |
    repository := FileEmployeeRepository newFilename: 'employee_data.txt'.

    employee := repository employees contents first.

    self assert: (employee firstName) equals: 'John'.
    self assert: (employee lastName) equals: 'Doe'.
    self assert: (employee email) equals: 'john.doe@foobar.com'.
    self assert: (employee birthDate) equals: (Date newDay: 8 month: #Oct year: 1982).
]

This test ensures that we’re actually parsing the CSV correctly. I’d normally split the assertions out into separate tests, but to save space I’ve merged them; I’ll come back to this later, using a TestResource to manage “expensive” state in a test.

This test currently fails, as the implementation just passes through the CSV lines unparsed.

To get the tests to pass, this is what the pass-through collect: block needs to become.

employees := f collect: [:aLine |
  parts := (aLine tokenize: ',') collect: [:part | part trim].
  Employee
    newBirthdate: (Date readFrom: (parts at: 3) readStream)
    firstName: (parts at: 2)
    lastName: (parts at: 1)
    email: (parts at: 4).
]

Breaking down the logic.

parts := (aLine tokenize: ',') collect: [:part | part trim].

The aLine value that is passed to the collect: block is the raw text line, and I’m responsible for splitting it up into its component parts.

So the tokenize: method on String comes in useful: it splits a string on a provided regex pattern.

collect: is used again, this time to clean up the string values with the trim method.
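
Putting tokenize: and trim together on a row shaped like the sample data (the literal row text here is my reconstruction):

| parts |
parts := ('Doe, John, 1982/10/08, john.doe@foobar.com' tokenize: ',')
  collect: [:part | part trim].
parts printNl  "the four cleaned-up fields: 'Doe' 'John' '1982/10/08' 'john.doe@foobar.com'"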

Opening up existing classes

In Smalltalk it’s possible to extend an existing class by adding methods to it, and every object that derives from the class will grow the new methods. This can be confusing and doesn’t scale particularly well, but this feels like core logic, and I’m only extending String.

Ruby and Python have this too, and I’ve seen it lead to confusion and collisions in code, so I would caution against it unless the extension is fairly generic, as it is here.

(Kotlin also has Extension Functions which are a significant improvement on this approach, as they have to be explicitly applied in the code).

String extend [
   ltrim [
      "I trim leading whitespace from the left of the receiver."
      <category: 'string-manipulation'>
      ^self replacingRegex: '^\s+' with: ''
   ]
   rtrim [
      "I trim trailing whitespace from the right of the receiver."
      <category: 'string-manipulation'>
      ^self replacingRegex: '\s+$' with: ''
   ]
   trim [
      "I trim whitespace from both the left and right of the receiver."
      <category: 'string-manipulation'>
      ^self ltrim rtrim
   ]
]

By extending String, I’ve added three methods; this simplifies cleaning up the parts split out of each line of the input stream.
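
For example:

'  padded  ' ltrim printNl.  "prints 'padded  '"
'  padded  ' rtrim printNl.  "prints '  padded'"
'  padded  ' trim printNl    "prints 'padded'"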

Employee
  newBirthdate: (Date readFrom: (parts at: 3) readStream)
  firstName: (parts at: 2)
  lastName: (parts at: 1)
  email: (parts at: 4).

This instantiates a new Employee using the stripped and tidied-up row parts; collect: gathers up the instantiated Employees, and we reach the end of the code.

One curio in this code:

(Date readFrom: (parts at: 3) readStream)

This converts the String into a ReadStream, and Date readFrom: then “reads” the stream, parsing its contents into a Date.
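
Standalone, that looks like this; the literal date string is my assumption about how dates appear in the sample file:

| d |
d := Date readFrom: '1982/10/08' readStream.
d printNl  "a Date equal to (Date newDay: 8 month: #Oct year: 1982), assuming this format parses as year/month/day"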

^employees contents

In my original test I had:

self assert: records size equals: 2

This test fails because Streams don’t understand the size message: Streams are lazy, i.e. they are not realised until the last moment.

From the comment on contents:

“Answer the whole contents of the receiver, from the next object to the last”

In order to get my test to pass as-is, I returned the contents, but…returning the unrealised stream is more useful.

This needs a tiny tweak to the test.

testReadEmployeesFromFileParsesEmployeeRows [
    <category: 'tests'>
    | repository records |
    repository := FileEmployeeRepository newFilename: 'employee_data.txt'.

    records := repository employees contents.

    self assert: records size equals: 2
]

This doesn’t actually require any changes in the BirthdayService driver, because unrealised streams and arrays both respond to do: in the same way.
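
A quick illustration of that claim (my own toy example): both of these print a and then b.

#('a' 'b') do: [:each | Transcript showCr: each].
#('a' 'b') readStream do: [:each | Transcript showCr: each]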