Option C: Web science


C.1 Creating the Web

Command Term | Level | Definition
Describe | 2 | Give a detailed account.
Distinguish | 2 | Make clear the differences between two or more concepts or items.
Outline | 2 | Give a brief account or summary.
Evaluate | 3 | Make an appraisal by weighing up the strengths and limitations.
Explain | 3 | Give a detailed account including reasons or causes.
Identify | 2 | Provide an answer from a number of possibilities.

C.1.1Distinguish between the internet and World Wide Web (web).
• The internet is the infrastructure which enables computers, servers and other devices to establish communication by means of cables and satellite connections.
• The World Wide Web uses the internet to access data and enable data exchange between users all over the globe; applications of the web include web pages and web-based email.
C.1.2Describe how the web is constantly evolving.
• The web began as a platform for data exchange among a limited number of users, with applications such as online libraries at universities.
• Commercial applications, such as online shopping, were then added to the web.
• With Web 2.0, users' demand for social features was met by social platforms such as Facebook and Myspace, and work began on the semantic web (which helps computers understand the meaning behind web pages and the interaction between the computer and its users).
• Due to the progress of technology and the availability of high-speed internet, mobile devices and connected things such as fridges, houses and cars will play a bigger role.
C.1.3Identify the characteristics of the following:
  • hypertext transfer protocol (HTTP)
    hypertext transfer protocol (HTTP): is a protocol that describes the data exchange on the World Wide Web (which port to use and how the data should be formatted).
    Port 80 is the standard port for HTTP, though other ports can be used.
  • hypertext transfer protocol secure (HTTPS)
    hypertext transfer protocol secure (HTTPS): is the same as HTTP but extended with a security component that encrypts the data exchange between sender and receiver.
    Port 443 is the standard port for HTTPS, though other ports can be used.
  • hypertext mark-up language (HTML)
    hypertext mark-up language (HTML): is the standard for formatting content that is to be displayed in web browsers.
  • uniform resource locator (URL)
    Uniform Resource Locator (URL): is the address of a web page, usually chosen to be easy to remember. It consists of at least a second-level domain such as "facebook" and a top-level domain such as .com or .de.
  • extensible mark-up language (XML)
    Extensible mark-up language (XML) : is a tag-based syntax which is used to structure and describe information
  • extensible stylesheet language transformations (XSLT)
    Extensible stylesheet language transformations (XSLT): is a language which transforms XML documents into different output formats required by browsers such as Google Chrome and Internet Explorer.
  • JavaScript.
    JavaScript: is a programming language commonly used in web applications.
    We used this during the first year of IBCS to validate form elements, among other things (a small validation sketch follows this list).
  • cascading style sheet (CSS).
    cascading style sheet (CSS): is the central source of formatting instructions for the content and layout of a web page.
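
As a concrete tie-back to the year-1 work mentioned above, here is a minimal client-side validation sketch in JavaScript; the "email" field id and the message are made-up examples, not taken from our actual lesson files.

```javascript
// Minimal client-side validation sketch; the "email" id is a hypothetical example.
// Used as a form's onsubmit handler, it blocks submission of an empty field
// without any round trip to the server.
function validateForm() {
  var email = document.getElementById("email").value;
  if (email.trim() === "") {
    alert("Please enter an email address.");
    return false;   // returning false cancels the form submission
  }
  return true;
}
```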
Activities Covering the Content
hypertext transfer protocol (HTTP)
hypertext transfer protocol secure (HTTPS)
Outline the principal difference between HTML and HTTP. HTML is a programming/scripting/markup language;
HTTP is a protocol/standard;
hypertext mark-up language (HTML)
uniform resource locator (URL)
extensible mark-up language (XML)
Identify one characteristic of XML.

It does not contain a fixed set of tags, therefore new ones can be added; [1 mark]

extensible stylesheet language transformations (XSLT)  
JavaScript (see year 1 JavaScript coverage)
cascading style sheet (CSS)
  • See agenda items for Wednesday November 9, 2016, which include examples and lesson instructions for creating HTML pages in Cloud9 that use CSS
  • Reference: Adding CSS to HTML
  • Open-note CSS Quiz
  • Open-note CSS Quiz - Key
C.1.4Identify the characteristics of the following:
  • uniform resource identifier (URI)
  • URL.
C.1.5Describe the purpose of a URL.
C.1.6Describe how a domain name server functions.
C.1.7Identify the characteristics of:
  • internet protocol (IP)
  • transmission control protocol (TCP)
  • file transfer protocol (FTP).
C.1.8Outline the different components of a web page.
C.1.9Explain the importance of protocols and standards on the web.
A protocol is a set of rules and procedures that both sender and receiver must adhere to in order to allow coherent data transfer; without protocols, lossless data transfer cannot be established.

Standards such as HTML allow interoperability between different systems and components.

C.1.10Describe the different types of web page.
personal pages, blogs, search engine pages, forums, social media platforms, news pages, media sources, trading/e-commerce pages, customer service platforms, information pages of public authorities
C.1.11Explain the differences between a static web page and a dynamic web page.
• Static HTML web pages keep the same content and layout until the web designer changes them.
• Dynamic web pages, which make use of technologies such as PHP, ASP.NET or Java Servlets, change their appearance and content depending on user input.
C.1.12Explain the functions of a browser.
A web browser (commonly referred to as a browser) is a software application for retrieving, presenting, and traversing information resources on the World Wide Web.
C.1.13Evaluate the use of client-side scripting and server-side scripting in web pages.
A client-side script does not require access to a remote server, so any processing it does is done more quickly and uses less bandwidth; this also reduces the load on the server. Server-side scripts, on the other hand, can work with data and logic that must stay on the server (a small server-side sketch follows).
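
For contrast, a very small server-side sketch, assuming a Node.js runtime (the greeting idea, the port and the parameter name are invented for illustration). Here the processing happens on the server, so it costs bandwidth and server load but keeps the logic and data away from the client.

```javascript
// Minimal server-side sketch using Node's built-in http module.
const http = require("http");
const url = require("url");

const server = http.createServer((req, res) => {
  const query = url.parse(req.url, true).query;   // e.g. /greet?name=Ada
  const name = query.name || "world";
  // processing happens on the server; the client only ever sees the finished HTML
  res.writeHead(200, { "Content-Type": "text/html" });
  res.end(`<p>Hello, ${name}!</p>`);
});

server.listen(8080);   // port chosen arbitrarily for this sketch
```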
C.1.14Describe how web pages can be connected to underlying data sources.
A web page can be connected to a database server (for example an SQL server), from which the web server can retrieve information that is to be displayed to the user. In IBCS year one we created a series of C programs that could read and write a series of files as part of a grade book project. A web front end was used to connect to a web server, and PHP programs executed by the web server would in turn call the C programs that could access the grade book files. (A minimal sketch of the same idea follows.)
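
A minimal sketch of that idea, assuming a Node.js server and a plain JSON file standing in for the underlying data source (the file name and its contents are invented):

```javascript
// Sketch: a web server that answers requests by reading an underlying data file.
// "grades.json" is a hypothetical file such as {"alice": 92, "bob": 85}.
const http = require("http");
const fs = require("fs");

http.createServer((req, res) => {
  const grades = JSON.parse(fs.readFileSync("grades.json", "utf8"));  // underlying data source
  res.writeHead(200, { "Content-Type": "application/json" });
  res.end(JSON.stringify(grades));   // a real front end would format this for the user
}).listen(8080);
```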
C.1.15Describe the function of the common gateway interface (CGI).
CGI makes executable programs that are installed on a server available to a client: the web server runs the program and sends its output back as the response.
Perl was one of the first programming languages to be used for CGI programming. Web servers could execute Perl programs on the server and direct the program output back to the user. Perl can connect to databases, creating a "gateway" to data sources. (A minimal CGI-style sketch follows.)
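
A CGI program can be written in almost any language; the sketch below assumes the web server is set up to run it with Node.js. The server passes request details in environment variables such as QUERY_STRING, and whatever the program prints (headers, a blank line, then the body) is returned to the client.

```javascript
#!/usr/bin/env node
// Minimal CGI-style program: read the query string from the environment and
// write an HTTP header block followed by the response body to standard output.
const query = process.env.QUERY_STRING || "";

console.log("Content-Type: text/html");
console.log("");                                  // blank line separates headers from body
console.log("<html><body>");
console.log(`<p>You asked for: ${query}</p>`);
console.log("</body></html>");
```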
C.1.16Evaluate the structure of different types of web pages.

C.2 Searching the Web

Command Term | Level | Definition
Define | 1 | Give the precise meaning of a word, phrase, concept or physical quantity.
Describe | 2 | Give a detailed account.
Distinguish | 2 | Make clear the differences between two or more concepts or items.
Outline | 2 | Give a brief account or summary.
Discuss | 3 | Offer a considered and balanced review that includes a range of arguments, factors or hypotheses. Opinions or conclusions should be presented clearly and supported by appropriate evidence.
Explain | 3 | Give a detailed account including reasons or causes.
Suggest | 3 | Propose a solution, hypothesis or other possible answer.

Vocabulary for Searching the Web

C.2.1Define the term search engine.
A web search engine is a software system that is designed to search for information on the World Wide Web
C.2.2Distinguish between the surface web and the deep web.
  • The Surface Web is that portion of the World Wide Web that is readily available to the general public and searchable with standard web search engines.
  • The deep web consists of the parts of the World Wide Web whose contents are not indexed by standard search engines for any reason; it is the opposite of the surface web.
    It is much larger than the surface web: only a fraction of the data on the web is accessible by conventional means.
      The deep web includes:
    • dynamically generated pages (produced as a result of queries or by JavaScript, or downloaded from servers using AJAX/Flash)
    • password-protected pages (and subscriptions)
    • pages without any inlinks
C.2.3Outline the principles of searching algorithms used by search engines.
    Google's PageRank algorithm
  • each page is given a score (rank) for a particular search
  • the score determines how high up the list the page will appear
  • the score is primarily determined by the number (and importance) of inlinks
  • the value of an inlink from page A is proportional to PR(A)/C(A), where PR(A) is the PageRank of page A and C(A) is the number of outlinks from page A
  • values are calculated when pages are indexed

    Google's ranking also includes other factors such as:
  • the time that the page has existed
  • the frequency of the search keywords on the page
  • other unknown factors (the exact algorithm is not made public by Google)
  • PageRank is a link-analysis algorithm that assigns a numerical weighting to each element of a set of hyperlinked documents; PR(E) denotes the PageRank of page E. A hyperlink to a page counts as a vote of support for that page: importance by association. A page's rank is built from the ranks of the pages that link to it, each divided by the number of outgoing links on that linking page. Taken together, the PageRank values sum to 1, so they form a probability distribution. (A minimal iterative sketch follows this list.)
    Ref: wikibooks.org: IB CS Web_Science
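
A minimal iterative PageRank sketch in JavaScript; the three-page link graph and the damping factor of 0.85 are illustrative assumptions for this example, not exam content.

```javascript
// Iterative PageRank sketch over a tiny hand-made link graph.
const links = { A: ["B", "C"], B: ["C"], C: ["A"] };   // page -> pages it links out to
const pages = Object.keys(links);
const d = 0.85;                                        // assumed damping factor

let rank = {};
pages.forEach(p => (rank[p] = 1 / pages.length));      // start with equal ranks

for (let i = 0; i < 20; i++) {                         // repeat until the values settle
  const next = {};
  pages.forEach(p => (next[p] = (1 - d) / pages.length));
  pages.forEach(p => {
    const share = rank[p] / links[p].length;           // PR(p) / C(p)
    links[p].forEach(q => (next[q] += d * share));     // each outlink passes on a share
  });
  rank = next;
}

console.log(rank);   // the ranks sum to 1: a probability distribution over pages
```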

    HITS algorithm
  • has been superseded by PageRank
  • based upon hubs and authorities
  • a hub is a page that leads to many authoritative pages
  • an authority is a page that is linked to by many hubs
  • page ranking determined by the sum of the hub score and the authority score
  • authority score is the sum of the hub scores of each node pointing to it
  • hub score is the sum of authority scores of every node that it points to
  • The HITS algorithm is an iterative process that is executed at query time (therefore relatively slow)
  • HITS is a link-analysis algorithm that also rates web pages using hubs and authorities: a good hub points to many authoritative pages, and a good authority is a page linked to by many good hubs. Each page is assigned two scores: an authority score, which estimates the value of its content, and a hub score, which estimates the value of its links to other pages. The algorithm first generates a root set (the most relevant pages) using a text-based algorithm, then a base set by augmenting the root set with the pages linked from it or to it. The base set and all the hyperlinks among its pages form a focused subgraph upon which HITS is performed. (A small iteration sketch follows.)
    Ref: wikibooks.org: IB CS Web_Science
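
A small sketch of the hub/authority iteration on a made-up three-page graph; the normalisation step keeps the scores from growing without bound.

```javascript
// HITS iteration sketch (hypothetical pages A, B, C).
const outlinks = { A: ["B", "C"], B: ["C"], C: ["A", "B"] };
const pages = Object.keys(outlinks);

let hub = {}, auth = {};
pages.forEach(p => { hub[p] = 1; auth[p] = 1; });

for (let i = 0; i < 20; i++) {
  // authority score: sum of the hub scores of every page pointing to it
  const newAuth = {};
  pages.forEach(p => (newAuth[p] = 0));
  pages.forEach(p => outlinks[p].forEach(q => (newAuth[q] += hub[p])));

  // hub score: sum of the authority scores of every page it points to
  const newHub = {};
  pages.forEach(p => (newHub[p] = outlinks[p].reduce((s, q) => s + newAuth[q], 0)));

  // normalise so the scores stay bounded from one iteration to the next
  const normA = Math.sqrt(pages.reduce((s, p) => s + newAuth[p] ** 2, 0));
  const normH = Math.sqrt(pages.reduce((s, p) => s + newHub[p] ** 2, 0));
  pages.forEach(p => { auth[p] = newAuth[p] / normA; hub[p] = newHub[p] / normH; });
}

console.log({ hub, auth });   // good hubs and good authorities reinforce each other
```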

    Things to also know and understand
  • the consequential effects that a change in PageRank of one page will have on others and that the calculation of PageRanks is an iterative process.

    Things that you will not be asked on an IBCS Paper
  • mathematical examples - you will encounter math concepts in the algorithm reading assigned, and when we talk about graph theory.

C.2.4Describe how a web crawler functions.
A Web crawler is an Internet bot which systematically browses the World Wide Web, typically for the purpose of Web indexing. The "spider" checks for the standard filename robots.txt, addressed to it, before sending certain information back to be indexed depending on many factors, such as the titles, page content, JavaScript, Cascading Style Sheets (CSS), headings, as evidenced by the standard HTML markup of the informational content, or its metadata in HTML meta tags.
    Web Crawlers:
  • creates a copy of every web page that it visits (for later indexing by the search engine)
  • usually starts at a popular site
  • searches a page for links to other pages
  • follows these links and repeats process
  • initially looks for the file robots.txt for instructions on pages to ignore (duplicate content, irrelevant pages)
  • also used to retrieve email addresses (for spam)
  • also used by webmasters for checking the integrity of a site (it can find links that are no longer valid or files that are missing); a toy crawler sketch follows this list
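
A toy crawler sketch along these lines, assuming a recent Node.js runtime with a global fetch; a real crawler would first fetch and obey robots.txt and would use a proper HTML parser rather than a regular expression.

```javascript
// Toy web crawler sketch: start somewhere, copy pages, follow links, repeat.
async function crawl(startUrl, maxPages = 10) {
  const frontier = [startUrl];   // links waiting to be followed
  const visited = new Set();     // pages already copied for indexing

  while (frontier.length > 0 && visited.size < maxPages) {
    const url = frontier.shift();
    if (visited.has(url)) continue;
    visited.add(url);
    try {
      const html = await (await fetch(url)).text();          // copy of the page for later indexing
      for (const match of html.matchAll(/href="(https?:\/\/[^"]+)"/g)) {
        frontier.push(match[1]);                             // follow links and repeat the process
      }
    } catch (err) {
      // unreachable or non-HTML pages are simply skipped
    }
  }
  return [...visited];
}

crawl("https://example.org").then(pages => console.log(pages));
```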

    Activities
  • Synonyms: web robots, bots, web spiders
  • Introduction to web crawlers.
    Class Note form and example using wget to download java files.
      Wget notes
    • can access sites using http, https, and ftp protocols
    • supports connecting using a userid and password
    • supports identifying itself as a particular browser, which is useful for downloading browser-specific versions of a web site/web page (for example, a Firefox or Internet Explorer version)
    • is available from the Cloud9 environment we are using.
C.2.5Discuss the relationship between data in a meta-tag and how it is accessed by a web crawler.
Google says: "Currently we don't trust metadata because we are afraid of being manipulated". So meta tags can only be one source of information for indexing a web site; modern crawlers mostly rely on content-based algorithms.
    Some spiders pay more attention to words occurring in
  • titles
  • sub-titles
  • metatags
    while other spiders/indexes may index every word found on a page
      Meta tags
    • are inserted by web designer/owner
    • contain keywords and concepts (helps to clarify meaning)
    • description / title can be shown in the search results
    • noindex and nofollow in the 'robots' meta tag can instruct crawlers not to index pages or follow their links

      Things to also remember
    • keywords can be misleading

C.2.6Discuss the use of parallel web crawling.
The expansion of the web has led to new search engine initiatives which include parallelization of web crawlers.
    Parallel web crawlers are designed to:
  • maximize performance
  • minimise overheads
  • avoid duplication
  • communicate with each other (to avoid above)
  • can work on different geographical areas (see the sketch after this list)
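
The sketch below only hints at the idea: several crawl workers share a "visited" set so they do not duplicate each other's work. Real parallel crawlers run as separate processes or machines, often split by region or by site, and coordinate over the network (assumes Node.js with a global fetch; the URLs are invented).

```javascript
// Sketch of coordinated crawl workers sharing one "visited" set.
const visited = new Set();

async function worker(urls) {
  for (const url of urls) {
    if (visited.has(url)) continue;        // coordination: skip pages another worker has taken
    visited.add(url);
    await fetch(url).catch(() => {});      // fetch and, in a real crawler, index the page
  }
}

// each worker is handed a different slice of the web
Promise.all([
  worker(["https://example.org/a", "https://example.org/b"]),
  worker(["https://example.org/b", "https://example.org/c"]),
]).then(() => console.log([...visited]));  // /b is only fetched once
```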
C.2.7Outline the purpose of web-indexing in search engines.
  • Web indexing allows a search engine to quickly give the user search results based on each web page's metadata, content or other sources (a tiny index sketch follows this list).
  • Web-crawlers retrieve copies of each web page visited
  • Each page is inspected to determine its ranking for specific search terms.
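
A tiny inverted-index sketch showing why indexing makes lookups fast: the crawler's copies are processed once, and queries are then answered from the index instead of re-reading every page (the two pages and their text are invented).

```javascript
// Build an inverted index: every word maps to the set of pages it appears on.
const pages = {
  "page1.html": "lossless compression of audio files",
  "page2.html": "lossy compression reduces file size",
};

const index = {};
for (const [url, text] of Object.entries(pages)) {
  for (const word of text.toLowerCase().split(/\W+/)) {
    if (!word) continue;
    (index[word] = index[word] || new Set()).add(url);
  }
}

console.log([...index["compression"]]);   // -> [ 'page1.html', 'page2.html' ]
```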
C.2.8Suggest how web developers can create pages that appear more prominently in search engine results.
  • use good keywords in the content
  • get the page linked to from many source pages
  • Allow search engines to find your site - submit your web site for indexing to the search engines, make sure search engines have authorization to reach the pages you would like indexed
  • set the robots.txt file appropriately (a small example follows this list)
  • Have a link-worthy site - so other web sites will link to yours, making it more relevant and increasing your page rank
  • Identify key words, metadata
  • Ensure search-friendly architecture
  • Have quality content - and don't let it stagnate. Updating the content regularly will make it more timely.
  • Remove outdated material
  • See also C.2.11
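
Relating to the robots.txt point above, a small example file (the paths are invented). It sits at the root of the site and tells well-behaved crawlers which areas to skip and where to find the sitemap:

```
User-agent: *
Disallow: /drafts/
Disallow: /duplicate-content/
Sitemap: https://example.org/sitemap.xml
```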
C.2.9Describe the different metrics used by search engines.
  • Keyword rankings
  • Backlinks
  • Organic search traffic
  • Average time on-page
  • Pages per visitor
  • Trustworthiness of linking domain/hub
  • Popularity of linking page
  • Relevancy of content between source and target page
  • Anchor text used in link
  • Amount of links to the same page on source page
  • Amount of domains linking to target page
  • Relationship between source and target domains
  • Variations of anchor text in link to target page
    How do different search engines compare? Parameters to look at include:
  • recall (finding the relevant page in an index)
  • precision (ranking a page correctly)
  • relevance
  • coverage
  • customization
  • user experience
C.2.10Explain why the effectiveness of a search engine is determined by the assumptions made when developing it.
This is a topic worth thinking about. Think about the vocabulary for the topic "searching the web" and how that vocabulary applies to answering this question. Here are some things to consider:
  • What is a search engine?
  • How does a search engine work?
  • If you were creating a search engine, what are the assumptions that you would make?
  • Who are the targeted users of search engines?
  • Who are the content providers for search engines?
  • Do all content providers follow the "rules?" What are the rules?
  • As a search engine designer, what "rules" would you want your content providers to follow?
  • What would you do with content that doesn't follow the rules?
  • You need not write an essay (because the command term is Explain), but connect some of these ideas while addressing the question directly. Read the question again carefully and pick some key concepts to include in your answer.
C.2.11Discuss the use of white hat and black hat search engine optimization.
    Things to know
  • The difference between white hat search engine optimization and black hat
  • The degree of success achieved by either white hat or black hat optimization efforts

    White hat (links from C.2.8)
  • new sites can send XML site map to Google
  • include a robots.txt file
  • add site to Google's Webmaster Tools to warn you if site is uncrawlable
  • make sure the H1 tag contains your main keyword
  • page titles contain keywords
  • relevant keywords with each image
  • site has suitable keyword density (but no keyword stuffing)
  • White hat techniques are "within" guidelines and considered ethical - long-term return. Examples: guest blogging, link baiting, quality content, site optimization.
  • "In search engine optimization (SEO) terminology, white hat SEO refers to the usage of optimization strategies, techniques and tactics that focus on a human audience opposed to search engines and completely follows search engine rules and policies.

    For example, a website that is optimized for search engines, yet focuses on relevancy and organic ranking is considered to be optimized using White Hat SEO practices. Some examples of White Hat SEO techniques include using keywords and keyword analysis, backlinking, link building to improve link popularity, and writing content for human readers.

    White Hat SEO is more frequently used by those who intend to make a long-term investment on their website. Also called Ethical SEO."
    Ref: White Hat SEO

    Black-hat
  • hidden content
  • keyword stuffing
  • link farms
  • other tricks to get page rankings higher than they should be, or to get pages marked as hits when they may have nothing to do with a particular search.
  • Black hat techniques use aggressive SEO strategies that exploit search engines rather than focusing on a human audience - short-term return. They include blog spamming, parasite hosting and cloaking.

White hat search engine optimization means filling the web page with relevant data, whereas black hat search engine optimization stuffs the page with keywords that make barely any sense. To give a good product to the user, good page content should be the main tool for achieving a high search engine ranking.

C.2.12Outline future challenges to search engines as the web continues to grow.
Issues include error management and the lack of quality assurance of uploaded information. Since the number of web pages and the number of authors are increasing rapidly, it is becoming more and more important for search engines to filter out the information the user actually wants. Due to the growing amount of data on the World Wide Web, crawlers also have to be designed more efficiently.
    Areas being developed are:
  • concept-based searching
  • natural language queries (e.g Ask.Jeeves.com)
Review Materials: Outline of things to know - ReviewTopics-C-2-Searching-The-Web.pdf

Classwork/Homework: Searching the Web Review Questions

C.3 Distributed Approaches to the Web

Command Term | Level | Definition
Define | 1 | Give the precise meaning of a word, phrase, concept or physical quantity.
Describe | 2 | Give a detailed account.
Distinguish | 2 | Make clear the differences between two or more concepts or items.
Compare | 3 | Give an account of the similarities between two (or more) items or situations, referring to both (all) of them throughout.
Evaluate | 3 | Make an appraisal by weighing up the strengths and limitations.
Explain | 3 | Give a detailed account including reasons or causes.

C.3.1Define the terms: mobile computing, ubiquitous computing, peer-2-peer network, grid computing.
C.3.2Compare the major features of:
  • mobile computing
  • ubiquitous computing
  • peer-2-peer network
    Explain one advantage of the use of a peer-2-peer (P2P) network for obtaining and downloading music and movie files.
    • Easier to set up; Less time will need to be spent in configuring the network;
    • Other advantages could deal with the increased range of available files and the lower (or even zero) costs involved (depending upon the network).
  • grid computing.
C.3.3Distinguish between interoperability and open standards.
C.3.4 Describe the range of hardware used by distributed networks.
C.3.5 Explain why distributed systems may act as a catalyst to a greater decentralization of the web.
C.3.6Distinguish between lossless and lossy compression.
Discuss two factors that would affect the decision to use either lossless or lossy compression when transferring files across the Internet.

  • Lossless compression is used when loss of data is unacceptable when transferring files such as audio files (a sketch of one simple lossless scheme follows this list);
  • Lossy compression may not significantly affect the final version of the file when it is decompressed;
  • Lossy compression will reduce file size;
  • Reduced file size may be an important requirement such as in the use of MP3 music files;
  • Lossy compression results in faster file transfer; Which is important when Internet connections are slow or files are large;
  • If lossy compression is used the original file cannot be reinstated;
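
A sketch of one very simple lossless scheme, run-length encoding, to make the "nothing is lost" point concrete; the sample string is invented, and real schemes (ZIP, PNG, FLAC) are far more sophisticated.

```javascript
// Run-length encoding: a lossless scheme, so decoding restores the original exactly.
function rleEncode(s) {
  let out = "";
  let i = 0;
  while (i < s.length) {
    let run = 1;
    while (i + run < s.length && s[i + run] === s[i]) run++;   // count the repeated character
    out += run + s[i];
    i += run;
  }
  return out;
}

function rleDecode(s) {
  let out = "";
  for (const [, count, ch] of s.matchAll(/(\d+)(\D)/g)) {
    out += ch.repeat(Number(count));
  }
  return out;
}

const original = "aaaabbbcc";
const packed = rleEncode(original);                    // "4a3b2c" - smaller than the original
console.log(packed, rleDecode(packed) === original);   // true: nothing was lost
```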
C.3.7Evaluate the use of decompression software in the transfer of information.

C.4 The Evolving Web

Command Term | Level | Definition
Describe | 2 | Give a detailed account.
Discuss | 3 | Offer a considered and balanced review that includes a range of arguments, factors or hypotheses. Opinions or conclusions should be presented clearly and supported by appropriate evidence.
Explain | 3 | Give a detailed account including reasons or causes.

C.4.1Discuss how the web has supported new methods of online interaction such as social networking.
C.4.2Describe how cloud computing is different from a client-server architecture.
Define the term Private Cloud:
Cloud computing services that are provided for a particular group with a limited number of users;
C.4.3Discuss the effects of the use of cloud computing for specified organizations.
C.4.4Discuss the management of issues such as copyright and intellectual property on the web.
C.4.5Describe the interrelationship between privacy, identification and authentication.
C.4.6Describe the role of network architecture, protocols and standards in the future development of the web.
C.4.7Explain why the web may be creating unregulated monopolies.
C.4.8Discuss the effects of a decentralized and democratic web.

HL Extension C.5 Analysing the Web

Command Term | Level | Definition
Describe | 2 | Give a detailed account.
Outline | 2 | Give a brief account or summary.
Discuss | 3 | Offer a considered and balanced review that includes a range of arguments, factors or hypotheses. Opinions or conclusions should be presented clearly and supported by appropriate evidence.
Explain | 3 | Give a detailed account including reasons or causes.

C.5.1Describe how the web can be represented as a directed graph.
C.5.2Outline the difference between the web graph and sub-graphs.
C.5.3Describe the main features of the web graph such as bowtie structure, strongly connected core (SCC), diameter.
C.5.4Explain the role of graph theory in determining the connectivity of the web.
C.5.5Explain that search engines and web crawling use the web graph to access information.
C.5.6Discuss whether power laws are appropriate to predict the development of the web.

HL Extension C.6 The Intelligent Web

Command Term | Level | Definition
Define | 1 | Give the precise meaning of a word, phrase, concept or physical quantity.
Describe | 2 | Give a detailed account.
Distinguish | 2 | Make clear the differences between two or more concepts or items.
Discuss | 3 | Offer a considered and balanced review that includes a range of arguments, factors or hypotheses. Opinions or conclusions should be presented clearly and supported by appropriate evidence.
Evaluate | 3 | Make an appraisal by weighing up the strengths and limitations.
Explain | 3 | Give a detailed account including reasons or causes.

C.6.1Define the term semantic web.
A "web of data" that can be read and analysed by machines. Students should appreciate the difference between this and a "web of documents" which would describe the present state of the web (pre-Semantic Web).
C.6.2Distinguish between the text-web and the multimedia-web.
This is part of the evolution of the web.

Describe some of the tools that allow the use of multimedia.

C.6.3Describe the aims of the semantic web.
  • The web should become the ultimate (machine-readable) database.
  • The facility to link data across different enterprises.
  • The web should become a highly collaborative medium.
  • Common vocabularies and methods for handling and querying data need to be developed and agreed upon.
  • Students should explore the above and also understand the principal features of the RDF model (a toy illustration follows this list).
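
A toy illustration of the RDF idea in JavaScript: data expressed as subject-predicate-object triples that a machine can query and combine. The URIs are invented; dc: and foaf: stand for the Dublin Core and FOAF vocabularies commonly used on the semantic web.

```javascript
// The web as machine-readable data: a list of subject-predicate-object triples.
const triples = [
  ["http://example.org/book/1",    "dc:title",   "Web Science Notes"],
  ["http://example.org/book/1",    "dc:creator", "http://example.org/person/42"],
  ["http://example.org/person/42", "foaf:name",  "A. Author"],
];

// because the data is structured, a program can follow the links between facts
const creatorsOf = subject =>
  triples.filter(t => t[0] === subject && t[1] === "dc:creator").map(t => t[2]);

console.log(creatorsOf("http://example.org/book/1"));   // -> [ 'http://example.org/person/42' ]
```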
C.6.4Distinguish between an ontology and folksonomy.
An ontology is a standardised vocabulary (one that avoids ambiguities) for use on the web, allowing data from different enterprises to be usefully combined. It includes the use of relationships (e.g. creator = author).

A folksonomy is a more informal ontology that has evolved through the use of tags posted by ordinary users.

Students should look at specific examples of each (e.g. ontologies: DBPedia/ incorporating book data from different booksellers; folksonomy: use in photo albums, blogs, delicious.com).

C.6.5Describe how folksonomies and emergent social structures are changing the web.
Following on from above, students should discuss how sites that allow users to tag the elements on those sites make the data more accessible. Sites to look at include:
  • Technorati
  • Delicious
  • Flickr
  • MetaFilter

Are these new structures a viable alternative to search engines?

C.6.6Explain why there needs to be a balance between expressivity and usability on the semantic web.
Discuss the balance between creating web pages for the benefit of people or for the benefit of machines.
C.6.7Evaluate methods of searching for information on the web.
Can YouTube be classed as a search engine? Google’s Panda puts the focus on quality. Cloud Kite (Open Drive) for searching the cloud. Multimedia search engines (visual / audio).
C.6.8Distinguish between ambient intelligence and collective intelligence.
Ambient intelligence collects and processes data from the physical surroundings in order to provide a unique user experience.

Collective intelligence collects and processes data about a particular topic from around the web.

C.6.9Discuss how ambient intelligence can be used to support people.
Be able to discuss different examples looking at both positive and negative consequences. The discussion should include the technology needed for this, such as nanotechnology, biometrics, sensors etc.
C.6.10Explain how collective intelligence can be applied to complex issues.
Research examples such as climate change, social bookmarking, and stock market fluctuations.