I do a lot of exploratory work poking around in Kubernetes clusters and Git repositories, looking for information to radiate about GitOps deployments.

Currently, I’m exploring git repository directory structures, trying to detect structure in the files and directories. In the past I’ve looked at ways of describing this in a “manifest”, and while I think that’s useful, it leads to duplication of the data.

So, what can we figure out from a git repository?

If a git repository contains this simplified directory structure, I want to detect the single app demo-app.

apps
└── demo-app

To do this, I created a tarball of this structure (I’m working with Flux GitRepository objects, which serve repository contents as gzipped tarballs) and wrote this test…

func TestFindApplications(t *testing.T) {
	ts := httptest.NewServer(http.FileServer(http.Dir("testdata/archives")))
	t.Cleanup(func() {
		ts.Close()
	})

	apps, err := FindApplications(ts.URL+"/first.tar.gz")
	if err != nil {
		t.Fatal(err)
	}
	want := sets.NewString("demo-app")
	if diff := cmp.Diff(want, apps); diff != "" {
		t.Fatalf("failed to parse apps:\n%s", diff)
	}
}

This basically sets up a test HTTP server to serve the generated tarball from the testdata directory.
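For reference, the sets package here is the Kubernetes string-set helper and cmp is Google’s go-cmp, so the test imports look something like this:

import (
	"net/http"
	"net/http/httptest"
	"testing"

	"github.com/google/go-cmp/cmp"
	"k8s.io/apimachinery/pkg/util/sets"
)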

The simplest thing I could write to implement this looks like this:

func FindApplications(archiveURL string) (sets.String, error) {
	resp, err := http.Get(archiveURL)
	// error handling trimmed for brevity
	defer resp.Body.Close()

	gzipReader, err := gzip.NewReader(resp.Body)
	// error handling trimmed for brevity

	tarReader := tar.NewReader(gzipReader)
	apps := sets.NewString()
	for {
		header, err := tarReader.Next()
		if err == io.EOF {
			break
		}
		// error handling trimmed for brevity
		switch header.Typeflag {
		case tar.TypeDir:
			if strings.HasPrefix(header.Name, "apps") {
				appPath := strings.TrimPrefix(header.Name, "apps")
				pathElements := strings.Split(appPath, "/")
				if len(pathElements) == 2 {
					apps.Insert(pathElements[1])
				}
			}
		}
	}
	return apps, nil
}

I am removing most of the error handling and logging from this code to keep it clear.

I could download the tarball and unarchive it to a temporary directory, but I’ll parse the streamed tarball in this case.

This gets the test passing, great!

From here, we need tests for error handling: what happens if the file is a 404, or isn’t a valid gzip stream?
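A sketch of one of those might look like this, assuming the error handling I trimmed above surfaces the failure (missing.tar.gz is just a hypothetical path that the test server will 404 on):

func TestFindApplicationsWithMissingArchive(t *testing.T) {
	ts := httptest.NewServer(http.FileServer(http.Dir("testdata/archives")))
	t.Cleanup(func() {
		ts.Close()
	})

	_, err := FindApplications(ts.URL + "/missing.tar.gz")
	if err == nil {
		t.Fatal("expected an error when the archive does not exist")
	}
}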

But the main goal of this code is parsing directory trees, and I don’t want to create tarballs for every case I want to test… it’s too expensive.

I could write a tarball generation mechanism, and I might need to, but the first thing I really care about is making sure that it parses the correct things from the names (the code doesn’t read the files in the archive yet).
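If I do get to that point, a minimal generator might look something like this (writeTestArchive is a hypothetical helper, not something in the code yet):

// writeTestArchive writes a gzipped tarball containing only directory
// entries, one per provided path, just enough to exercise the parsing code.
func writeTestArchive(t *testing.T, filename string, dirs ...string) {
	t.Helper()
	f, err := os.Create(filename)
	if err != nil {
		t.Fatal(err)
	}
	defer f.Close()

	gzw := gzip.NewWriter(f)
	tw := tar.NewWriter(gzw)
	for _, dir := range dirs {
		if err := tw.WriteHeader(&tar.Header{
			Name:     dir,
			Typeflag: tar.TypeDir,
			Mode:     0755,
		}); err != nil {
			t.Fatal(err)
		}
	}
	if err := tw.Close(); err != nil {
		t.Fatal(err)
	}
	if err := gzw.Close(); err != nil {
		t.Fatal(err)
	}
}

Something like writeTestArchive(t, "testdata/archives/first.tar.gz", "apps", "apps/demo-app") would then generate the fixture from within the test itself.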

So, I refactor the FindApplications function to separate out the functionality, which gets me code that looks like this:

func FindApplications(archiveURL string) (sets.String, error) {
	// identical to the earlier code, opening the stream and decoding
	return parseTarball(tarReader, appsPrefix)
}

func parseTarball(tarReader *tar.Reader, base string) (sets.String, error) {
	apps := sets.NewString()
	for {
		header, err := tarReader.Next()
		if err == io.EOF {
			break
		}
		// error handling trimmed for brevity
		switch header.Typeflag {
		// case tar.TypeReg:
		case tar.TypeDir:
			if strings.HasPrefix(header.Name, appsPrefix) {
				appPath := strings.TrimPrefix(header.Name, appsPrefix)
				pathElements := strings.Split(appPath, "/")
				if len(pathElements) == 2 {
					apps.Insert(pathElements[1])
				}
			}
		}
	}
	return apps, nil
}

My tests still pass, but now I can isolate the behaviour of the directory parsing.

All I need from the *tar.Reader is this simple interface.

type tarIterator interface {
	Next() (*tar.Header, error)
}
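That means parseTarball can take the interface rather than the concrete *tar.Reader, something like:

func parseTarball(reader tarIterator, base string) (sets.String, error) {
	// ...body unchanged...
}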

Because of this, I can build a fake tar reader…

type fakeTarReader struct {
	headers []*tar.Header
	sync.Mutex
}

// Next implements the tarIterator interface.
func (s *fakeTarReader) Next() (*tar.Header, error) {
	s.Lock()
	defer s.Unlock()
	if len(s.headers) == 0 {
		return nil, io.EOF
	}
	r := s.headers[len(s.headers)-1]
	s.headers = s.headers[0 : len(s.headers)-1]
	return r, nil
}

This is a simple stack implementation, a technique I use a fair bit in test fakes.
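One small thing to note: because Next pops from the end of the slice, the headers come back in reverse order, so a fake seeded like this returns the “apps” header first (not that parseTarball cares about ordering):

reader := &fakeTarReader{headers: []*tar.Header{
	{Name: "apps/demo-app", Typeflag: tar.TypeDir},
	{Name: "apps", Typeflag: tar.TypeDir},
}}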

I can then cheaply set up a set of table tests…

func Test_parseTarball(t *testing.T) {
	parseTests := []struct {
		headers []*tar.Header
		prefix  string
		want    sets.String
	}{
		{
			headers: []*tar.Header{
				{Name: "apps/demo-app", Typeflag: tar.TypeDir},
			},
			want:    sets.NewString("demo-app"),
			prefix:  "apps",
		},
		{
			headers: []*tar.Header{
				{Name: "team/apps/test-app", Typeflag: tar.TypeDir},
			},
			want:    sets.NewString("test-app"),
			prefix:  "team/apps",
		},
	}

	for i, tt := range parseTests {
		t.Run(fmt.Sprintf("test-%d", i), func(t *testing.T) {
			reader := &fakeTarReader{headers: tt.headers}
			apps, err := parseTarball(reader, tt.prefix)
			if err != nil {
				t.Fatal(err)
			}
			if diff := cmp.Diff(tt.want, apps); diff != "" {
				t.Fatalf("failed to parse apps:\n%s", diff)
			}
		})
	}
}

The second test, with the team/apps prefix, fails, and I’m into the classic TDD “write a failing test”, “make the tests pass”, “refactor” cycle.
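One way to get it passing (a sketch of where the code might go next, rather than the final version) is to actually use the base argument instead of the package-level appsPrefix, and be a bit stricter about what counts as a direct child of the base directory:

func parseTarball(reader tarIterator, base string) (sets.String, error) {
	apps := sets.NewString()
	for {
		header, err := reader.Next()
		if err == io.EOF {
			break
		}
		// error handling trimmed for brevity
		switch header.Typeflag {
		case tar.TypeDir:
			if strings.HasPrefix(header.Name, base) {
				// only directories immediately below the base count as apps
				appPath := strings.Trim(strings.TrimPrefix(header.Name, base), "/")
				if appPath != "" && !strings.Contains(appPath, "/") {
					apps.Insert(appPath)
				}
			}
		}
	}
	return apps, nil
}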

There’s a lot of discussion about whether or not you should test “private methods”, but I’m pragmatic :-)

In this case it’s not exported, so I don’t feel too bad, but this is a tricky line to tread in TDD, as the test is strongly tied to the underlying tar parsing.