Jan Erik and Steve met and discussed some of the materials posted on this wiki. See the starting page for more details:
Jan Erik liked the way se-example2.zip came out. Steve suggested that if Jan Erik added some of the additional files to the example for his interviews, it would make a nice package too, particularly with demographic data on the subjects in the data.csv file (everything in the metadata summary regarding subject background info, or at least whatever parts of it are available).
Jan Erik also suggested that a more structured form for transcript data would be valuable, since it would make the transcripts easier to process with tools. Is there a more structured text format for the interviews you would recommend?
I agree that it would be preferable to use a more structured format, but the problem is that if there is not a standard format available, it will be difficult to get many contributors (besides us ;-)) to use it.
I think the term in text mining for this is “semi-structured text”. Most text mining tools try to work in a flexible/forgiving way with such documents.
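To make "flexible/forgiving" a bit more concrete, here is a minimal sketch of how a tool might read a loosely formatted transcript. It assumes a hypothetical "Speaker: text" convention for each turn, which is only an illustration and not a format we have agreed on; the function name `parse_transcript` and the sample text are made up for the example.

```python
import re

# Hypothetical convention: a turn starts with a short label followed by a colon.
# The label must begin with a letter, so timestamps like "10:30" are not labels.
TURN_RE = re.compile(r"^\s*([A-Za-z][\w .-]{0,30}?)\s*:\s*(.*)$")


def parse_transcript(text):
    """Split a semi-structured transcript into speaker turns, tolerantly."""
    turns = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines are ignored rather than treated as errors
        match = TURN_RE.match(line)
        if match:
            turns.append({"speaker": match.group(1), "text": match.group(2)})
        elif turns:
            # No label: treat the line as a continuation of the previous turn.
            turns[-1]["text"] = (turns[-1]["text"] + " " + line).strip()
        else:
            # Text before any label is kept, with an unknown speaker.
            turns.append({"speaker": None, "text": line})
    return turns


sample = (
    "Interviewer: How do you test?\n"
    "Subject 1: Mostly by hand.\n"
    "Sometimes with scripts.\n"
    "\n"
    "interviewer :  OK."
)
turns = parse_transcript(sample)
```

The parser never rejects a line: unlabeled lines are folded into the previous turn, and odd spacing around the colon is accepted. The cost of being this forgiving is that any line starting with a short phrase and a colon (for example "He said: ...") would be mistaken for a new turn, which is one argument for eventually agreeing on a stricter format.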
Action items for next week:
- I think if we each (well, Brian, Jan Erik, and Kat) could try to apply this strawman format to some dataset, then we’d start to uncover where things are missing or don’t fit well. Steve didn’t have time to package up the interview data that Brian sent, so Brian, you can start there if you like.
- Steve will try to draft up some of the stuff he wrote about the format in the form of a “how to upload” help page for a hypothetical web site.