Monday, May 30, 2016

Automating the build and run on my local machine - GSoC 2016

Wrote a small bash script today. Just a few lines. Specifically, it does these things:
  • Builds the tajo-storage-mongodb module and copies the .jar file to the snapshot. The snapshot is already configured to use Mongo storage.
  • Removes the logs.
  • Starts Tajo.
  • Waits a little and opens the log file with gedit.
  • Stops Tajo.
Lol. It really is a small script, but it has simplified my work a lot.
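The script itself isn't shown here, but the same steps can be sketched in Java as an ordered list of shell commands. This is only a sketch. The module path, the snapshot directory, the `lib` destination for the jar and the 10-second wait are all hypothetical placeholders. It also assumes the rest of Tajo is already installed in the local Maven repository, so the module can be built on its own. By default the program only prints the commands. Passing `--run` executes them one by one through `bash -c`.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical Java port of the build-and-run script. Every path here is a
// placeholder assumption, not the real layout of the Tajo tree or snapshot.
class BuildAndRun {
    static final String MODULE_DIR = "tajo/tajo-storage/tajo-storage-mongodb"; // assumed
    static final String SNAPSHOT = "tajo-snapshot";                             // assumed

    // The five steps from the list above, in order, as shell command lines.
    static List<String> plan() {
        return Arrays.asList(
            // 1. Build only the MongoDB storage module, skipping tests...
            "mvn -f " + MODULE_DIR + "/pom.xml clean package -DskipTests",
            //    ...and copy the jar into the snapshot (destination dir assumed).
            "cp " + MODULE_DIR + "/target/*.jar " + SNAPSHOT + "/lib/",
            // 2. Remove the old logs.
            "rm -f " + SNAPSHOT + "/logs/*",
            // 3. Start Tajo.
            SNAPSHOT + "/bin/start-tajo.sh",
            // 4. Wait a little, then open the log in gedit.
            "sleep 10",
            "gedit " + SNAPSHOT + "/logs/*.log",
            // 5. Stop Tajo.
            SNAPSHOT + "/bin/stop-tajo.sh");
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        boolean run = args.length > 0 && args[0].equals("--run");
        for (String cmd : plan()) {
            System.out.println("$ " + cmd);
            if (!run) {
                continue; // dry run by default: only print the commands
            }
            // Run through bash so that the * globs are expanded by the shell.
            Process p = new ProcessBuilder("bash", "-c", cmd).inheritIO().start();
            if (p.waitFor() != 0) {
                System.err.println("step failed, stopping: " + cmd);
                return;
            }
        }
    }
}
```

Keeping the commands in a plain list makes the dry run and the real run share one definition, so the printed plan is exactly what would be executed.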

Anyway, I was able to run Tajo with this configuration.
Of course, the tablespace doesn't do anything yet, but seeing something like this makes me really happy. :D

The First Week (GSoC)

The beginning of the coding period was not as rushed as I expected. This week was allocated to discussing the architecture of the module with my mentors. Actually, that was done long before. Of course, there are still questions about the architecture, but they can't be answered beforehand. They will be solved during the implementation. It's agile, guys!

Project at the moment

Created a new module for the MongoDB storage plugin, which I am going to implement throughout the summer ;)
Created the following main classes by implementing the corresponding storage interfaces and abstract classes:
  • MongoDbTableSpace
  • MongoDbFragment
  • MongoDbScanner 
  • MongoDbAppender
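To keep the roles of these four classes straight, here is a deliberately simplified toy model. None of the names or signatures below are Tajo's real `Tablespace`, `Fragment`, `Scanner` or `Appender` APIs. They are hypothetical stand-ins. An in-memory list plays the collection. The "tablespace" splits it into fragments, a "scanner" reads exactly one fragment, and an "appender" writes rows.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Toy model of the roles of the four classes. Every name and signature here is
// illustrative; these are NOT Tajo's Tablespace/Fragment/Scanner/Appender APIs.
class ToyStorage {
    // Fragment role: describes one slice of a table, here rows [start, end).
    static final class Fragment {
        final String table;
        final int start;
        final int end;

        Fragment(String table, int start, int end) {
            this.table = table;
            this.start = start;
            this.end = end;
        }
    }

    // In-memory stand-in for a MongoDB collection: one list of values per row.
    private final List<List<Object>> rows = new ArrayList<>();

    // Tablespace role: split a table into fragments that could be scanned in parallel.
    List<Fragment> split(String table, int fragmentSize) {
        List<Fragment> fragments = new ArrayList<>();
        for (int s = 0; s < rows.size(); s += fragmentSize) {
            fragments.add(new Fragment(table, s, Math.min(s + fragmentSize, rows.size())));
        }
        return fragments;
    }

    // Scanner role: iterate over the rows of exactly one fragment.
    Iterator<List<Object>> scan(Fragment f) {
        return rows.subList(f.start, f.end).iterator();
    }

    // Appender role: write one row into the table.
    void append(List<Object> row) {
        rows.add(row);
    }
}
```

With five rows and a fragment size of 2, `split` yields three fragments, and scanning the last one returns a single row. That is the point of the split: each fragment can go to a different worker.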
Also implemented a class called ConnectionInfo to hold the MongoDB connection details. When implementing it, I copied a lot from the JDBC connection info class. Thank you, blrunner. Hope you won't be mad at me about that. ;)
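A connection-info holder of this kind could look roughly like the sketch below. It is not the project's class. The name `MongoConnectionInfo`, its fields and the URI layout `mongodb://host:port/database` are assumptions made for illustration. It parses a single-host URI with `java.net.URI` and falls back to 27017, MongoDB's default port, when no port is given.

```java
import java.net.URI;

// Hypothetical ConnectionInfo-like holder; the class name, fields and URI layout
// are illustrative assumptions, not the plugin's actual ConnectionInfo.
class MongoConnectionInfo {
    static final int DEFAULT_PORT = 27017; // MongoDB's default port

    final String host;
    final int port;
    final String database; // null when the URI has no path

    MongoConnectionInfo(String host, int port, String database) {
        this.host = host;
        this.port = port;
        this.database = database;
    }

    // Parses a single-host URI such as mongodb://localhost:27017/tajo_db
    static MongoConnectionInfo fromUri(String uri) {
        URI u = URI.create(uri);
        if (!"mongodb".equals(u.getScheme())) {
            throw new IllegalArgumentException("not a mongodb URI: " + uri);
        }
        if (u.getHost() == null) {
            // e.g. a multi-host authority, which java.net.URI cannot split into host/port
            throw new IllegalArgumentException("expected a single host: " + uri);
        }
        int port = u.getPort() == -1 ? DEFAULT_PORT : u.getPort();
        String path = u.getPath();
        String db = (path == null || path.length() <= 1) ? null : path.substring(1);
        return new MongoConnectionInfo(u.getHost(), port, db);
    }
}
```

For example, `fromUri("mongodb://db.example.com/test")` gives host `db.example.com`, port 27017 and database `test`.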

Problems and Solutions

Let's discuss some questions that came up in the first week. The first one was about the newly created module. When I built it with the mvn command, it said the module was built successfully, but the corresponding jar was not in the snapshot. I couldn't figure out why. I built it around 10 times, changing the pom.xml file each time. The problem was not with the module's mvn configuration. The module was built in its own directory, but the jar also has to be copied into the snapshot directory, and that copy is done by a command in the pom.xml of the tajo-storage module. Anyway, I added the line and it started to work fine.

The next question is replication. It is something complex. ;) In the HDFS configuration you can provide multiple hosts, and MongoDB can also have multiple hosts as replicas. Should the storage plugin I write include that functionality? If so, how should the URI be passed? A tablespace's details are given as a URI, and by default Java's java.net.URI can't parse an authority with multiple hosts into a host and port. Then how does HDFS do that? It is something to be studied.
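The URI problem is easy to reproduce. MongoDB connection strings list replica-set members as comma-separated `host:port` pairs. Given such an authority, `java.net.URI` does not fail. It falls back to a registry-based authority, so `getHost()` returns null and `getPort()` returns -1, while `getAuthority()` still holds the whole host list. The `hosts` helper below is a hypothetical workaround, not an established API. It takes the raw authority, drops optional `user:password@` info, and splits on commas.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Shows how java.net.URI treats a multi-host authority, plus one possible
// workaround (a hypothetical helper): split the raw authority on commas.
class MultiHostUri {
    // Returns the "host:port" entries of a MongoDB-style multi-host URI.
    static List<String> hosts(String uriString) {
        URI u = URI.create(uriString);
        String authority = u.getRawAuthority(); // e.g. "h1:27017,h2:27017"
        List<String> out = new ArrayList<>();
        if (authority == null) {
            return out;
        }
        // Drop optional user info ("user:pass@") before splitting.
        int at = authority.lastIndexOf('@');
        for (String h : authority.substring(at + 1).split(",")) {
            if (!h.isEmpty()) {
                out.add(h);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        URI u = URI.create("mongodb://h1:27017,h2:27017/tajo_db");
        System.out.println(u.getHost());      // null: not a single server-based authority
        System.out.println(u.getPort());      // -1
        System.out.println(u.getAuthority()); // h1:27017,h2:27017
        System.out.println(hosts(u.toString())); // [h1:27017, h2:27017]
    }
}
```

Parsing the authority by hand keeps the single-URI configuration style, at the cost of doing host and port validation ourselves.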

Optional

I set up Travis for my GitHub account. It is cool. I mean great. It could be named one of the coolest things on the internet. Travis automatically builds the projects in my GitHub repositories. We can configure it with a .travis.yml file. And the best thing is that it is completely free for open source projects. :D :D

Sunday, May 8, 2016

The Simple Contribution - GSoC 2016

Got a reply from the MongoDB community. It seems I have to learn a lot about mapping document-based databases to column-based databases.
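One common starting point for that mapping is to flatten nested documents into columns. The sketch below borrows the dot notation MongoDB uses to address embedded fields, so `{"address": {"city": ...}}` becomes a column named `address.city`. This is an illustrative choice, not a design decision from the project. Plain `Map`s stand in for BSON documents, keys are assumed to be strings, and arrays are left as single values.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Flattens nested documents into dotted column names. Illustrative only:
// plain Maps stand in for BSON documents, and keys are assumed to be Strings.
class DocumentFlattener {
    static Map<String, Object> flatten(Map<String, Object> doc) {
        Map<String, Object> columns = new LinkedHashMap<>(); // keeps field order
        flattenInto("", doc, columns);
        return columns;
    }

    @SuppressWarnings("unchecked")
    private static void flattenInto(String prefix, Map<String, Object> doc, Map<String, Object> out) {
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            String name = prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey();
            if (e.getValue() instanceof Map) {
                // Embedded document: recurse, extending the dotted prefix.
                flattenInto(name, (Map<String, Object>) e.getValue(), out);
            } else {
                // Scalar (arrays are kept as one value in this sketch).
                out.put(name, e.getValue());
            }
        }
    }
}
```

A document `{name: "Ann", address: {city: "Colombo", zip: "00100"}}` flattens to the columns `name`, `address.city` and `address.zip`.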

Yesterday something marvelous happened. My mentor asked me to make a commit. Actually, he told me how to do it. At first I couldn't even understand the issue, but he explained it really well. I made the changes in my repo yesterday, and today I made the pull request. Lol, I should have done it yesterday, but I had doubts. Anyway, Travis the bot runs the tests automatically, so I don't have to worry about that. 😂 I think my edits don't affect the unit tests or integration tests, but Jaehwa said that after testing with a MariaDB server, he'll prepare to commit my patch.

Also, I got an email from a student (Subashini Hariharan) who is doing her Master's. She wants to add a Cassandra plugin for Tajo as her Master's project. I don't know whether that is enough for a Master's project, but I think it's a great idea. I think it is possible, and it will be easier to do than MongoDB. So I introduced her to my mentors.


I still need to understand the storage module architecture. It can't be that complicated. I want to understand how to map Mongo collections to Tajo tables. The thing is, we don't have much time: so many assignments and submissions. I am going to go through the storage module again today. That's it for today.
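One basic question in mapping a collection to a table is that documents in one collection need not share the same fields, while a table has a fixed schema. The sketch below shows one possible policy, assuming the table's column list is declared up front. Each document becomes one row, a missing field becomes null, and fields outside the schema are ignored. `CollectionToRows` is a hypothetical name, and `Map`s again stand in for documents.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Projects schemaless documents onto a fixed column list (hypothetical helper):
// one row per document, null for missing fields, extra fields ignored.
class CollectionToRows {
    static List<Object[]> toRows(List<String> columns, List<Map<String, Object>> docs) {
        List<Object[]> rows = new ArrayList<>();
        for (Map<String, Object> doc : docs) {
            Object[] row = new Object[columns.size()];
            for (int i = 0; i < columns.size(); i++) {
                row[i] = doc.get(columns.get(i)); // null when the field is absent
            }
            rows.add(row);
        }
        return rows;
    }
}
```

With columns `name, age`, the documents `{name: "Ann", age: 30}` and `{name: "Bob", city: "Kandy"}` become the rows `["Ann", 30]` and `["Bob", null]`. The other obvious policy, inferring the schema by sampling documents, trades this predictability for convenience.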