‘Found Poem’ Generator – Loading Data From A File

#poem #poetry #python #programming

The next stage in my Found Poem Generator project was to work on item number 6 in my list of enhancements. This is to update the program to read in the found poem lines from a csv text file rather than having them hard-coded in the program itself:

1. Output the created poems to a larger screen, ie a graphical display with a large font rather than the console window.
2. Provide two methods of running:
a. Auto-generate a new poem every 5 minutes or so
b. Provide buttons for user input to create poems on demand
3. Package the program up to be self-contained
4. Deploy the program onto some sort of platform that wasn’t my Mac laptop, ie a Raspberry Pi and large screen monitor or TV
5. Make the program auto-run when the device is started / restarted

Further enhancements could then be:

6. Use a text / csv file to load in the words and lines of the poems rather than the hardcoded ones used currently
7. Check the first letter of each line to capitalise it for poems and un-capitalise it for haikus, and add commas to the end of lines if they are missing or remove for haikus.

A tutorial on reading csv files got me most of the way there, opening the csv file and reading in the records. I could then access the individual fields of each record through the ‘row’ variable, ie ‘row[0]’ gives me the data from column 0 of the current row, ‘row[1]’ gives me the data from column 1 etc. I was then ready to read the lines from the csv file into my data structure in the program. This is where I came a little unstuck.
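A minimal sketch of that reading step, using Python's built-in csv module. The sample data and the column order (id, line, word count, syllable count) are assumptions made up for illustration, not the contents of the real file:

```python
import csv
import io

# Stand-in for the real csv file. The column order
# (id, line, words, syllables) is an assumption for this example.
SAMPLE_CSV = """1,the sea remembers,3,5
2,"salt on the window, again",5,7
3,a gull turns inland,4,5
"""


def read_rows(csv_file):
    """Return every record in the csv as a list of string fields."""
    rows = []
    for row in csv.reader(csv_file):
        rows.append(row)
    return rows


rows = read_rows(io.StringIO(SAMPLE_CSV))
for row in rows:
    # row[0] is column 0 of the current row, row[1] is column 1, and so on.
    print(row[0], row[1])
```

With a file on disk, the same function would take something like `open("poem_lines.csv", newline="")` (a hypothetical filename) in place of the `io.StringIO` object. Note that every field comes back as a string, so the counts need converting with `int()` before any arithmetic.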

 

My project was based on each poem line or phrase being an individual object. Each object was created one after the other and given a sequential name using code like this:

p1 = phrase(blah, blah, blah)
p2 = phrase(blah, blah, blah)

So ‘p1’ becomes an object which has some attributes, ie the poem phrase or line plus its word and syllable counts. Once all the objects are created they get loaded into a registry so I can keep track of them and mark them off as used once they have been selected and displayed for a particular poem. My problem here was auto-generating the ‘p1 =’ part of the code for each record in the csv file.

There just didn’t seem to be an easy way of generating the new object names in order to instantiate objects for them. A re-think was required.

In my previous life I would have read the data from the csv into a multi-dimensional array and worked with that, so I thought I would explore this option first. Unfortunately Python doesn’t have a built-in multi-dimensional array type. It has lists, which are ordered one-dimensional sequences, and dictionaries, which map keys to values, but in order to make either of these multi-dimensional one has to make a list of lists or a dictionary of dictionaries. That started to look quite complex for my dataset, which has 4 columns plus the ‘line has been used’ flag in the registry. So I googled for possible solutions and came across someone asking a similar question, and one of the answers intrigued and surprised me: Python has SQLite built in (the sqlite3 module in the standard library) and it can be instantiated and run solely in memory.
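A rough sketch of what an in-memory SQLite table for the poem lines might look like. The table name, column names and sample rows are all hypothetical; only the `sqlite3` calls themselves are standard:

```python
import sqlite3

# ":memory:" asks SQLite for a database that lives only in RAM
# and disappears when the connection is closed.
conn = sqlite3.connect(":memory:")

# Table and column names are made up for this example.
conn.execute(
    """CREATE TABLE phrases (
           id        INTEGER PRIMARY KEY,
           line      TEXT    NOT NULL,
           words     INTEGER NOT NULL,
           syllables INTEGER NOT NULL,
           used      INTEGER NOT NULL DEFAULT 0
       )"""
)

# Rows as they might arrive from csv.reader: every field is a string.
csv_rows = [
    ["1", "the sea remembers", "3", "5"],
    ["2", "salt on the window, again", "5", "7"],
    ["3", "a gull turns inland", "4", "5"],
]

# Convert the numeric fields, then insert using ? placeholders.
records = [(int(r[0]), r[1], int(r[2]), int(r[3])) for r in csv_rows]
conn.executemany(
    "INSERT INTO phrases (id, line, words, syllables) VALUES (?, ?, ?, ?)",
    records,
)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM phrases").fetchone()[0]
print(count, "lines loaded")
```

The `used` column takes over the job of the ‘line has been used’ flag from the registry, so the four csv columns and the flag all sit in one flat table.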

Neat. Now all I have to do is chuck out most of my code and rewrite using a SQL table in memory instead of instantiating all those phrase objects and using a registry to track them all.

This is of course a lesson in how not to develop, or one of the pitfalls of having bespoke software written, and potentially a pitfall of the agile software development methodology. If we don’t know at the start where we are going to end up then we can end up writing ourselves into a corner. Ever asked for what you felt would be a small, simple change to a piece of software and been quoted an extortionate amount of effort for it? This is where I was right at this moment: my whole design no longer supported the new functionality I needed to write into it.

In reading the poem phrases in from a file I could no longer use my object based data structure. Now that I was no longer using this structure it meant that my method and code for selecting the lines of poem no longer worked and were no longer appropriate either. 

I began working methodically, finishing the function that reads the data in from the csv file and then working through the two functions that created the poem output and the Haiku output. Once I had finished the poem function, I was able to reuse much of the code to update the haiku function.

The previous methods used a random number generator to pick a line number and then simply iterated through all the poem line objects until they reached that line, then checked a flag to see if it had already been used. If it hadn’t been used before, the line was selected and the flag updated. Either way the process then ran again until all the required lines had been selected. This sort of continuous scrolling through data is just about passable as a solution when working with a small array of data in memory, but with the data storage converted to a SQL based system this method was not really best practice. Sure it would work but it wouldn’t be pretty under the bonnet. It just would not do.
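For comparison, here is a loose reconstruction of that older object-based approach. The `Phrase` class, the `pick_lines` function and the list used as a registry are stand-ins invented for this sketch, not the original code:

```python
import random


class Phrase:
    """Hypothetical stand-in for the original phrase object."""

    def __init__(self, line, words, syllables):
        self.line = line
        self.words = words
        self.syllables = syllables
        self.used = False  # the 'line has been used' flag


def pick_lines(registry, how_many, rng=random):
    """Pick `how_many` unused lines by repeated random draws."""
    unused = sum(1 for phrase in registry if not phrase.used)
    if how_many > unused:
        raise ValueError("not enough unused lines left")
    chosen = []
    while len(chosen) < how_many:
        target = rng.randrange(len(registry))
        # Walk the registry from the top until the drawn position is reached.
        for position, phrase in enumerate(registry):
            if position != target:
                continue
            if not phrase.used:
                phrase.used = True
                chosen.append(phrase.line)
            break  # either way, go round again with a new random number
    return chosen


registry = [
    Phrase("the sea remembers", 3, 5),
    Phrase("salt on the window, again", 5, 7),
    Phrase("a gull turns inland", 4, 5),
]
print(pick_lines(registry, 2))
```

Every draw walks the list from the start, and a draw that lands on a used line is simply thrown away, which is why this only stays tolerable for a small dataset.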

The change was relatively straightforward. The random line number is still generated, but the line is now read directly from the table using a SELECT and then checked for validity. If it passes it gets used; if it fails then we go round again until the requisite number of lines have been retrieved. After working on this method for the ‘poem’ function I made it a little more efficient by updating the SQL to only select lines that had not been used and had the correct number of syllables. This still meant the program spent some time firing ‘SELECT’ statements and getting empty recordsets back whenever the random line selected had already been used or didn’t have the required number of syllables. A future update to this method would be to first identify all the valid lines and then select one randomly from that subset. This would bring the SQL interaction down to a maximum of two SELECTs per poem line rather than the variable number happening now.
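The two selection strategies could be sketched roughly as follows. The table layout, function names and sample lines are assumptions for illustration, and the retry version assumes ids run from 1 to the row count with no gaps:

```python
import random
import sqlite3


def make_db():
    """Build a tiny in-memory table; names and rows are illustrative only."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE phrases ("
        "id INTEGER PRIMARY KEY, line TEXT, syllables INTEGER, "
        "used INTEGER NOT NULL DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO phrases (id, line, syllables) VALUES (?, ?, ?)",
        [
            (1, "the sea remembers", 5),
            (2, "salt on the window, again", 7),
            (3, "a gull turns inland", 5),
            (4, "rain writes its name on the glass", 7),
        ],
    )
    return conn


def pick_by_retry(conn, syllables, rng=random, max_attempts=1000):
    """Draw random ids until one is unused and has the right syllable count.

    max_attempts is a safety net added for this sketch so the loop
    cannot spin forever when no valid line is left.
    """
    total = conn.execute("SELECT COUNT(*) FROM phrases").fetchone()[0]
    if total == 0:
        return None
    for _ in range(max_attempts):
        candidate = rng.randint(1, total)
        row = conn.execute(
            "SELECT line FROM phrases "
            "WHERE id = ? AND used = 0 AND syllables = ?",
            (candidate, syllables),
        ).fetchone()
        if row is not None:  # an empty result means we go round again
            conn.execute("UPDATE phrases SET used = 1 WHERE id = ?", (candidate,))
            return row[0]
    return None


def pick_from_subset(conn, syllables, rng=random):
    """Fetch all valid ids in one SELECT, then choose one at random."""
    valid_ids = [
        row[0]
        for row in conn.execute(
            "SELECT id FROM phrases WHERE used = 0 AND syllables = ?",
            (syllables,),
        )
    ]
    if not valid_ids:
        return None  # nothing suitable left
    chosen = rng.choice(valid_ids)
    line = conn.execute(
        "SELECT line FROM phrases WHERE id = ?", (chosen,)
    ).fetchone()[0]
    conn.execute("UPDATE phrases SET used = 1 WHERE id = ?", (chosen,))
    return line


conn = make_db()
haiku = [pick_from_subset(conn, n) for n in (5, 7, 5)]
print("\n".join(haiku))
```

`pick_from_subset` issues exactly two SELECTs per line however many rows have already been used, whereas the number issued by `pick_by_retry` depends on luck and grows as the table fills up with used lines.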

This version also contains a little tidying up. I’ve learnt more about the ‘Pack’ command and organised the screen a little better, and I now change the font size based on the number of lines of poem that need to be displayed, which prevents the buttons disappearing off the end of the screen on long poems. In the back of my mind I’m also wondering what will happen if I have long lines as well. These are likely to disappear off the sides of the screen, so I ought to write in some sort of dynamic font sizing routine to check for and handle these potential issues.
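One possible shape for the font sizing calculation, kept free of any GUI code so it can be tried on its own. The function name, the available height, the size limits and the line spacing factor are all invented numbers for this sketch, not values from the program:

```python
def font_size_for(line_count, available_height=600,
                  max_size=48, min_size=12, line_spacing=1.5):
    """Return a font size at which `line_count` lines should fit vertically.

    Each line is assumed to take up size * line_spacing units of height,
    in whatever units the GUI toolkit uses for fonts and windows.
    """
    if line_count < 1:
        return max_size
    size = int(available_height / (line_count * line_spacing))
    # Clamp so short poems don't get silly-big text and long ones stay readable.
    return max(min_size, min(max_size, size))


for lines in (3, 14, 60):
    print(lines, "lines ->", font_size_for(lines))
```

Handling long lines would need a second check of the same kind against the available width, using the length of the longest line, with the smaller of the two sizes winning.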

The updated source code has been placed in Dropbox for those of you enjoying working through it yourselves.

One final thought I had to add to the ‘To Do’ list is for the program to output each of the poems created to a text or log file to create a permanent record of all the combinations created.

 

 

 
