Sunday, May 18, 2008

Detik Parser

I had some free time this Saturday, so I decided to write a useful program. Here it is: a program that reads the HTML source of the Detik News website, parses all the news titles, stores them in a list, and returns that list. Each title is stored in an entity-like Java class, so it will be much easier to persist later through an ORM such as Hibernate.
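As a rough illustration of the "parse all the titles and return them in a list" step, here is a minimal sketch that works on an in-memory HTML string. It is not the actual parser from this post: the class name `TitleSketch`, the method `extractTitles`, and the sample HTML are all made up for the example, and it assumes that each title is simply the text of an `<a>` element, which may not match Detik's real markup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TitleSketch {

    // Assumption: every title is the text inside an <a ...>...</a> element.
    private static final Pattern ANCHOR =
            Pattern.compile("<a\\s[^>]*>(.*?)</a>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Collects the trimmed text of every anchor in the given HTML, in document order. */
    public static List<String> extractTitles(String html) {
        List<String> titles = new ArrayList<String>();
        if (html == null) {
            return titles; // nothing to parse
        }
        Matcher matcher = ANCHOR.matcher(html);
        while (matcher.find()) {
            String text = matcher.group(1).trim();
            if (!text.isEmpty()) {
                titles.add(text);
            }
        }
        return titles;
    }

    public static void main(String[] args) {
        // Made-up sample markup, not copied from the Detik site.
        String html = "<ul>"
                + "<li><a href=\"/news/1\">First headline</a></li>"
                + "<li><a href=\"/news/2\"> Second headline </a></li>"
                + "</ul>";
        System.out.println(extractTitles(html)); // prints [First headline, Second headline]
    }
}
```

A regular expression keeps the sketch short, but it is fragile against nested or malformed markup, so for anything beyond a quick experiment a proper HTML parser is probably the safer choice.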

Here's the NewsEntity.java:

[sourcecode language='java']
package org.gandhim.news.model;

/**
 * A single news item: its title, subtitle, link and publication date.
 *
 * @author gandhim
 */
public class NewsEntity {

    private Long id;

    private String datePublished;
    private String link;
    private String subTitle;
    private String title;

    // No-argument constructor: ORM tools such as Hibernate need one to instantiate the entity.
    public NewsEntity() {
    }

    public NewsEntity(String datePublished, String link, String subTitle, String title) {
        this.datePublished = datePublished;
        this.link = link;
        this.subTitle = subTitle;
        this.title = title;
    }

    public Long getId() {
        return id;
    }

    public void setId(Long id) {
        this.id = id;
    }

    public String getDatePublished() {
        return datePublished;
    }

    public void setDatePublished(String datePublished) {
        this.datePublished = datePublished;
    }

    public String getLink() {
        return link;
    }

    public void setLink(String link) {
        this.link = link;
    }

    public String getSubTitle() {
        return subTitle;
    }

    public void setSubTitle(String subTitle) {
        this.subTitle = subTitle;
    }

    public String getTitle() {
        return title;
    }

    public void setTitle(String title) {
        this.title = title;
    }

}
[/sourcecode]

And this is the parser, DetikParser.java:

[sourcecode language='java']
package org.gandhim.news.parser;

import java.io.*;
import java.net.*;
import java.util.List;
import java.util.LinkedList;
import org.gandhim.news.model.NewsEntity;

/**
 * Downloads the Detik news index page and parses the news titles out of it.
 *
 * @author gandhim
 */
public class DetikParser {

    private static List<NewsEntity> newsList = new LinkedList<NewsEntity>();
    private static String url = "http://www.detik.com/indexberita/index.php?fuseaction=indeks.berita&idkanal=10";

    public static List<NewsEntity> doParse() {
        setupProxy();
        return parseHtml(getRawHtml());
    }

    // Only needed behind an HTTP proxy; replace the placeholder host and port with your own.
    public static void setupProxy() {
        System.setProperty("http.proxyHost", "your_proxy_server_ip");
        System.setProperty("http.proxyPort", "8080");
    }

    // Downloads the page and returns its HTML, or an empty string if the download fails.
    public static String getRawHtml() {
        // StringBuilder avoids creating a new String for every line read.
        StringBuilder rawHtml = new StringBuilder();

        try {
            URL detik = new URL(url);

            URLConnection detikConnection = detik.openConnection();

            BufferedReader in = new BufferedReader(new InputStreamReader(detikConnection.getInputStream()));
            try {
                String readLine;
                while ((readLine = in.readLine()) != null) {
                    rawHtml.append(readLine).append("\n");
                }
            } finally {
                // Always release the connection's stream, even if reading fails.
                in.close();
            }

        } catch (IOException e) {
            System.err.println("Error occurred: " + e.getMessage());
        }

        return rawHtml.toString();
    }

    public static List<NewsEntity> parseHtml(String rawHtml) {
        if (rawHtml == null || rawHtml.trim().equals("")) {
            // Nothing to parse (for example, the download failed):
            // return the list as it is instead of terminating the JVM.
            return newsList;
        }

        String startNews = "namakanalindex";
        int startPos, endPos;

        startPos = rawHtml.indexOf(startNews);
        startPos = rawHtml.indexOf("
[/sourcecode]

2 comments:

  1. I think Hibernate is a nightmare for me, sir...
    ever since I got it in my class.. :D

    but the taste of NHibernate is good for me... hahaha...

    mm... yami..yami..

  2. Then I guess it's because MS provided tools that make coding with NHibernate easier, right?

    It doesn't matter whether you use Hibernate or NHibernate: once you grasp the concept, the tool is "only" a matter of syntax :)
