Smalltalk Birthday Kata Part 4
At the end of Part 3 my Employee
class was fleshed out.
But, I need to be able to parse these from the sample text file.
The sample file consists of a header, and then 2 employee records.
TestCase subclass: FileEmployeeRepositoryTestCase [
<comment: 'tests for parsing employee data from a csv'>
<category: 'BirthdayKata-FileEmployeeRepositoryTests'>
testReadEmployeesFromFileParsesEmployeeRows [
<category: 'tests'>
| repository records |
repository := FileEmployeeRepository newFilename: 'employee_data.txt'.
records := repository employees.
self assert: records size equals: 2
]
]
This instantiates a FileEmployeeRepository
with employee_data.txt
as the
filename and then reads from it, and asserts that we get two items.
Object subclass: FileEmployeeRepository [
<comment: 'I represent an employee database based on a text file.'>
<category: 'BirthdayKata'>
|fileName|
employees [
"Read employees as a collection from the database file."
<category: 'database'>
| f employees parts |
f := (FileStream open: fileName mode: FileStream read) lines.
f skip: 1.
employees := f collect: [:aLine |
aLine
].
f close.
^employees contents
]
]
I’ve ellided the constructor here, so focussing on employees
.
Streaming from a file
f := (FileStream open: fileName mode: FileStream read) lines.
This opens the fileName
in read-only mode, which returns a FileStream
.
FileStream
is a subclass of the Stream
class, and it calls lines
to get the stream streaming back lines from the opened file.
f skip: 1.
This is a Stream
method, that unsurprisingly causes it to skip the next item,
the first item is the header, so this means we don’t need to check for parsing
the header.
employees := f collect: [:aLine |
aLine
]
This is a Stream
operation, but Stream
derives from Iterable
, and so the
collect:
message comes from Iterable
.
Answer a new instance of a Collection containing all the results of evaluating aBlock passing each of the receiver’s elements
This same functionality is available in most modern languages.
Ruby
Comparing this with the collect
method of Ruby’s Enumerable
mixin (which is an alias for #map
)
collect { |obj| block } → array Returns a new array with the results of running block once for every element in enum.
Scala
Scala too, has this in it’s Traversable
xs map f The collection obtained from applying the function f to every element in xs
Clojure
Clojure too, has map
.
Returns a lazy sequence consisting of the result of applying f to the set of first items of each coll, followed by applying f to the set of second items in each coll, until any one of the colls is exhausted. Any remaining items in other colls are ignored. Function f should accept number-of-colls arguments. Returns a transducer when no collection is provided.
(I’m fairly sure this is a Lisp-derived operation originally).
Java
Even modern Java has the same map
method.
Returns a stream consisting of the results of applying the given function to the elements of this stream.
Executing the tests
With this code in place, the test passes…
$ gst-sunit -f EmployeeRepository.st -f EmployeeRepositoryTest.st FileEmployeeRepositoryTestCase
1 run, 1 passes
But, getting the correct number of employees is only half the story.
Further testing
testReadEmployeesParsesEmployeeCorrectly [
<category: 'tests'>
| repository employee |
repository := FileEmployeeRepository newFilename: 'employee_data.txt'.
employee := repository employees contents first.
self assert: (employee firstName) equals: 'John'.
self assert: (employee lastName) equals: 'Doe'.
self assert: (employee email) equals: 'john.doe@foobar.com'.
self assert: (employee birthDate) equals: (Date newDay:8 month: #Oct year: 1982).
]
This test ensure that we’re actually parsing the CSV correctly, I’d split this out into separate tests, but to save space, I’ve merged them, I’ll come back to this, using a TestResource
to manage “expensive” state in a test.
This test currently fails, as the implementation just passes through the CSV lines unparsed.
To get the tests to pass, this is the code that needs to be inserted.
employees := f collect: [:aLine |
parts := (aLine tokenize: ',') collect: [:part | part trim].
Employee
newBirthdate: (Date readFrom: (parts at: 3) readStream)
firstName: (parts at: 2)
lastName: (parts at: 1)
email: (parts at: 4).
]
Breaking down the logic.
parts := (line tokenize: ',') collect: [:part | part trim].
the line
value that is passed to the collect:
block is the text line, and
I’m responsible for splitting that up into it’s component parts.
So, the tokenize:
method of Strings comes in useful, it splits a string on
provided pattern.
Again, collect:
is used to clean up the string values, in this case using
trim
method.
Opening up existing classes
It’s possible to extend an existing class, by adding methods in Smalltalk, and all objects that derive from the class will grow the new method, this can be confusing, and doesn’t scale particularly well, but this feels like core-logic, and I’m only extending String
.
Ruby and Python have this too, and I’ve seen it lead to confusion and collisions in code, so I would caution against doing this, unless it’s something fairly generic like this.
(Kotlin also has Extension Functions which are a significant improvement on this approach, as they have to be explicitly applied in the code).
String extend [
ltrim [
<ccomment: 'I trim leading whitespace from the left of the receiver.'>
<category: 'string-manipulation'>
^self replacingRegex: '^\s+' with: ''.
]
rtrim [
<ccomment: 'I trim leading whitespace from the right of the receiver.'>
<category: 'string-manipulation'>
^self replacingRegex: '\s+$' with: ''.
]
trim [
<ccomment: 'I trim leading whitespace from the left and right of the receiver.'>
<category: 'string-manipulation'>
^self ltrim rtrim.
]
]
By extending String
, I’ve added 3 methods, this simplifies the cleaning up of
the parts split out in each line of the input stream.
Employee
newBirthdate: (Date readFrom: (parts at: 3) readStream)
firstName: (parts at: 2)
lastName: (parts at: 1)
email: (parts at: 4).
This instantiates a new Employee
, using the stripped and tidied up row parts,
collect:
gathers up the instantiated Employee
s and we get to the end of the
code.
One curio in this code:
(Date readFrom: (parts at: 3) readStream)
This converts the String
into a stream, and then “reads” the date from the
string to parse the string into a Date
.
^employees contents
In my original test I had:
self assert: records size equals: 2
This test fails, because Streams don’t understand the size
message, Streams are lazy, i.e. they are not realised until the last moment:
From contents
“Answer the whole contents of the receiver, from the next object to the last”
In order to get my test to pass as-is, I returned the contents, but…returning the unrealised stream is more useful.
This needs a tiny tweak to the test.
testReadEmployeesFromFileParsesEmployeeRows [
<category: 'tests'>
| repository records |
repository := FileEmployeeRepository newFilename: 'employee_data.txt'.
records := repository employees contents.
self assert: records size equals: 2
]
This doesn’t actually require any changes in the BirthdayService
driver,
because unrealised streams and arrays have the same do:
functionality.